Thursday, February 28, 2008

Disconnecting from E-Mail

One nice thing about traveling to New Zealand was that I spent several days disconnected from e-mail. I went a whole 80+ hours without checking my mail on both ends of the trip. It was nice. (I could have checked mail, but in New Zealand it seems like all the hotels make you pay for Internet access. I didn't feel like paying. Of course, I had e-mail available at the workshop.)

The experience has suggested to me that I ought to try to limit my e-mail during the day. It seems more productive to set aside time specifically to deal with e-mail, maybe first thing in the morning and last thing before leaving the office. I'm not sure how this would work with other people, though; many people (including my wife) treat e-mail like it's a phone call, an immediate connection, when of course it's not. I often see "emergencies" pop up in my e-mail box; while I was gone, I was asked to call somewhere for my opinion of a job candidate, and an NSF director wanted me to expand on my research nugget (as usual, by yesterday preferred, by tomorrow would be OK). These people didn't need to get in contact with me that second, but there seemed to be a clear expectation that I'd see their message and move on it within a small number of hours. Is that a realistic expectation?

Of course, these weren't real emergencies, and making a practice of setting e-mail aside and blocking out my e-mail time would probably increase my efficiency, and possibly my happiness, since I would feel less like I was constantly being interrupted.

Wednesday, February 27, 2008

Of Mice and Computers

In the last of my posts about New Zealand, I'll talk about Mike Langston's talks on computational biology. He talked a lot about the style of computational biology: the difficulty of getting data unless you form a real partnership with people (who view their data as very valuable), the noisiness of the underlying problems, the need to worry about entire computation systems with memory/compute power/specialized hardware rather than individual algorithms, the compromises one has to make between theory and making things actually work, and so on. As I listened, it dawned on me that there's another area of research where I hear about similar issues -- search engines and computational advertising. The similarities are uncanny.

The similarities reached the level of specific problems. The technical aspects of the talk were about heuristics/methods for maximum clique and biclique (bipartite clique) on real data. I've certainly heard of bicliques coming up in search engine papers. Langston claimed to have the fastest bipartite clique algorithm in practice (at least on the biological data he was interested in). I wonder if there's room for crossover between these two fields.
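
For anyone who wants the definition pinned down, here's a toy brute-force sketch in Python -- my own illustration, emphatically not Langston's algorithm. In a bipartite graph, a biclique is a pair of subsets with every cross edge present, and for any fixed left subset the best right subset is simply the common neighborhood.

```python
from itertools import combinations

# A toy brute-force biclique finder -- exponential in the left side, so only
# good for pinning down the definition, nothing like a practical heuristic.
def max_edge_biclique(left_adj):
    """left_adj maps each left vertex to its set of right neighbors.
    Returns the biclique (S, T) maximizing |S| * |T|."""
    best = (0, frozenset(), frozenset())
    left = list(left_adj)
    for r in range(1, len(left) + 1):
        for S in combinations(left, r):
            # The largest T compatible with S is the common neighborhood of S.
            T = set.intersection(*(left_adj[v] for v in S))
            if r * len(T) > best[0]:
                best = (r * len(T), frozenset(S), frozenset(T))
    return best[1], best[2]

# Tiny example with overlapping neighborhoods.
g = {'a': {1, 2, 3}, 'b': {2, 3, 4}, 'c': {2, 3}}
print(max_edge_biclique(g))  # ({'a','b','c'}, {2, 3}) -- six edges
```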

The talk left me with a feeling that I've had before when seeing computational biology talks. The field seems interesting, but it looks like, to have a real impact, you really need a lot of devotion to learning the underlying biology and making connections with working biologists. It doesn't seem like a good field for dabbling. So far, I haven't found myself ready to take the plunge and try to get involved with the field. Luckily for me, I still have other interesting things to do.

Tuesday, February 26, 2008

Practical Graph Isomorphism

Continuing my efforts to prove I was not just sitting out by the pool in New Zealand, I'll mention a highlight from the talks of Brendan McKay (of the Australian National University). Brendan has written code for graph isomorphism called nauty that works quite well in practice, handling graphs with up to millions of nodes. So while the complexity of the graph isomorphism problem is not known, it can be pretty hard to come up with bad examples that are actually hard for it to solve. As someone who's happy with good heuristics, I found it quite amusing to watch it in action. (On the other hand, it seems that many of the ideas behind this program are "old" by computer science standards. Are there any competing programs/approaches out there? Seems like a possible research direction to revisit the algorithm engineering for this problem...)

While settling whether graph isomorphism is in P would certainly be interesting, Brendan pointed out that the question of whether it is in coNP is also quite important. Are there polynomial-sized certificates that can be used to prove that two graphs are not isomorphic? And one can imagine that in practice one would like some methodology for easily convincing someone that two graphs aren't isomorphic.
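
To make the certificate question concrete, here's a minimal Python sketch (my own illustration, emphatically not how nauty works) of cheap one-sided tests: if any invariant differs, that difference is a short, checkable proof of non-isomorphism; if everything agrees, you learn nothing. The gap between "inconclusive" and "provably non-isomorphic" is exactly what the coNP question asks about.

```python
# Cheap one-sided tests for non-isomorphism (not nauty's algorithm).
# A mismatch is a short certificate the graphs are not isomorphic;
# a match is inconclusive.

def degree_sequence(adj):
    """adj maps each vertex to a set of neighbors."""
    return sorted(len(nbrs) for nbrs in adj.values())

def refined_degrees(adj):
    """One round of color refinement: each vertex's sorted tuple of
    neighbor degrees."""
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    return sorted(tuple(sorted(deg[u] for u in nbrs)) for nbrs in adj.values())

def maybe_isomorphic(adj1, adj2):
    if degree_sequence(adj1) != degree_sequence(adj2):
        return False  # certified non-isomorphic
    if refined_degrees(adj1) != refined_degrees(adj2):
        return False  # certified non-isomorphic
    return True       # inconclusive

# A 6-cycle vs. two disjoint triangles: both 2-regular, so these invariants
# can't separate them, even though the graphs are not isomorphic.
cycle = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
triangles = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4, 5}, 4: {3, 5}, 5: {3, 4}}
print(maybe_isomorphic(cycle, triangles))  # True -- inconclusive
```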

As Brendan is a co-editor-in-chief of the Electronic Journal of Combinatorics, we also talked a bit at lunch about electronic journals (in particular the EJC). The EJC continues to grow successfully. (I'm happy to say I've published in it, back in 2004.) Remember to support your electronic journals.

Monday, February 25, 2008

Voting and Anti-Voting

While at NZIMA, I listened to Dominic Welsh's talk on Markov chains. It was a general talk, with a large chunk devoted to rapid mixing and applications. At some point he brought up the voter and anti-voter models, problems that I heard about in the distant past, and that raised some questions in my mind.

The voter model can be expressed as follows. We have an undirected graph G, where each vertex has a label. For convenience, let's assume the labels are in {0,1}, although larger alphabets are possible. At each time step, a vertex x is chosen uniformly at random, and x chooses a neighbor y uniformly at random. Then x changes its label to match y's label. This is a Markov chain whose absorbing states are those in which all vertices have the same label. It's natural to ask things like the probability that you end up in the all 0/all 1 state from a given configuration.

If the graph is d-regular, and you start in a state with k 0's and n-k 1's, a simple argument shows that the probability of eventually being absorbed into the all 0 state is k/n. (Exercise Left To Reader.) Things seem to get more complicated for more general graphs. Here are some interesting questions that come to mind. Note: I don't know which are open and which aren't...
  1. Given a graph G and a starting configuration of labels, is there a polynomial time algorithm for computing the probability of being absorbed into the all 0 state? Or could a hardness result be proved? [My guess is there's a hardness result in there somewhere.]
  2. Given a star graph (one node in the center, connected to n-1 other vertices) and a starting configuration, what is the probability of being absorbed into the all 0 state (preferably in some natural closed form)? In particular, if you start with a 1 in the middle and a 0 everywhere else, what is this probability as a function of n? Notice that by symmetry here one can easily compute these values in polynomial time; the question is whether there are pleasant equations. [On some reflection, it's clear there will be nice equations; this is almost a birth-death chain with an added "phase" needed to track the center vertex. Exercise left to reader...]
  3. Similar questions to 1/2, but ask about the expected time/distribution of time until absorption.
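
Before moving on to the anti-voter model, here's a quick simulation sketch (my own toy code, just for experimenting) that can sanity-check the k/n claim above, or be pointed at questions 1-3 on other graphs:

```python
import random

def voter_absorb(adj, labels):
    """Run the voter model to absorption; return the common final label.
    adj maps each vertex to a list of neighbors; labels maps vertex -> 0/1."""
    labels = dict(labels)
    ones = sum(labels.values())
    n = len(labels)
    vertices = list(adj)
    while 0 < ones < n:
        x = random.choice(vertices)
        y = random.choice(adj[x])
        ones += labels[y] - labels[x]  # track the number of 1's incrementally
        labels[x] = labels[y]
    return 0 if ones == 0 else 1

# An 8-cycle (2-regular) started with k = 3 zeros: the probability of
# absorption into the all 0 state should come out near k/n = 3/8.
n, k = 8, 3
cycle = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
start = {i: 0 if i < k else 1 for i in range(n)}
trials = 20000
hits = sum(voter_absorb(cycle, start) == 0 for _ in range(trials))
print(hits / trials, "vs. predicted", k / n)
```
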
Perhaps more interesting is the anti-voter model, which works just like the voter model, except that x changes its label to disagree with y's label. (Here the {0,1} alphabet is the most sensible.) Now we do not have absorbing states, at least when the graph is not bipartite -- a proper 2-coloring, where every vertex already disagrees with all its neighbors, would be absorbing. Instead we have a stationary distribution. But again one can ask similar questions (a simulation sketch follows the list):
  1. Given a graph G and a configuration of labels, is there a polynomial time algorithm for computing the stationary probability of that state?
  2. Are there natural families of graphs for which the stationary distribution for the anti-voting model has an easily expressible form? [d-regularity no longer seems to be enough...]
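
Here's that sketch. Assuming the graph is non-bipartite (so there are no absorbing states), one can estimate stationary probabilities empirically from the long-run frequencies of a single run; a 5-cycle is small enough to eyeball all the states.

```python
import random
from collections import Counter

def anti_voter_frequencies(adj, steps=200000, burn_in=10000):
    """Estimate the anti-voter stationary distribution from one long run."""
    vertices = list(adj)
    labels = {v: random.randint(0, 1) for v in vertices}
    counts = Counter()
    for t in range(steps):
        x = random.choice(vertices)
        y = random.choice(adj[x])
        labels[x] = 1 - labels[y]  # x now disagrees with y
        if t >= burn_in:
            counts[tuple(labels[v] for v in vertices)] += 1
    total = sum(counts.values())
    return {state: c / total for state, c in counts.items()}

# A 5-cycle: odd, hence non-bipartite, hence no absorbing states.
cycle = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
freqs = anti_voter_frequencies(cycle)
for state, p in sorted(freqs.items(), key=lambda kv: -kv[1])[:5]:
    print(state, round(p, 4))
```
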
Obviously, one can argue about the importance of these questions, but I would just take the point of view that such easily expressible and seemingly natural questions about basic Markov chains are always interesting in their own right.

Saturday, February 23, 2008

New Zealand (NZIMA)

For the last week, I've been at a workshop in New Zealand, sponsored by the New Zealand Institute of Mathematics and Its Applications. They're having a programme in algorithms, including this workshop, and they invited me to give a couple of talks. (I'll be giving versions of my "Brief History of Power Laws" talk and my "Bloom Filters Survey" talk.)

OK, OK, I know, this violates my "avoid-travel" rule. Occasionally, I find myself suckered into travel for various reasons. In this case, they managed to time the workshop over my kids' winter break, in the middle of the Boston winter (and the end of the New Zealand summer). (My wife, also a native Californian, does not enjoy winter.) Besides the weather motivation, I've never been to New Zealand, and I don't see a lot of future opportunities to go. So why not?

I'm happy to say we've had a really nice time in New Zealand, and I encourage visitors. Let me advertise that there will be a similar workshop here at the end of 2008 as well. Perhaps if you're interested you might contact Mark Wilson, who will also leave comments with pointers and information. And over the rest of the week I'll post some notes/ideas/open problems from the workshop.

Wednesday, February 20, 2008

Conference Reviewing: Another Rant on Competitive Analysis

Because of inadequate attention to conference deadlines on my part, I find myself currently reading lots of papers while concurrently serving on the SPAA and ICALP Program Committees.

And somehow, this means I'm reading lots of papers with competitive analyses of algorithms. (Both the standard variety, and the more new-fangled but generally similar "price of anarchy" variety.) Looking back on it, I should simply have told the PC chairs that it might be best not to assign me any such papers, as I'm not sure it's fair to the authors who get me as a reviewer.

OK, I'm exaggerating. Don't get me wrong; I'm not automatically assigning negative scores to any such paper. Some of these papers I really like. But in general, as I've expressed frequently elsewhere, I'm a skeptic when it comes to competitive analysis. Why exactly should anyone care that you have an algorithm that in the worst case always gets within a factor of 6.5 of the best possible solution? Why is that the right way to judge an algorithm? (Never mind the offshoots, like those where you have an algorithm that's competitive if it has twice as much memory as you allow for the optimal algorithm, and so on...)
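
As a concrete reference point for what a competitive guarantee actually says, here's the classic ski-rental toy problem (the standard textbook example, not anything from the papers under review): renting costs 1 per day, buying costs B, and you don't know how many days d you'll ski. The break-even strategy never pays more than twice what an omniscient strategy pays, and a quick brute-force check confirms the factor of 2:

```python
# Ski rental: the classic warm-up for competitive analysis. Renting costs 1
# per day, buying costs B, and the number of ski days d is unknown up front.

def break_even_cost(d, B):
    # Online strategy: rent for the first B-1 days, buy on day B if still skiing.
    return d if d < B else (B - 1) + B

def optimal_offline_cost(d, B):
    # With hindsight: either rent every day or buy immediately.
    return min(d, B)

B = 10
worst = max(break_even_cost(d, B) / optimal_offline_cost(d, B)
            for d in range(1, 1000))
print(worst)  # 1.9 here; in general 2 - 1/B, so always below 2
```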

In my mind, the problem is that competitive analysis has become such a standard in Theoretical Computer Science that people don't feel the need to justify anywhere in the paper why they use competitive analysis, when in fact these papers scream out for motivation.

Here are some motivations that tend to work for me when reading competitive analysis papers.

1. The resulting problem is just so mathematically interesting that we really have to look at it. (One person I can think of that manages to pull off this motivation with me time and again is Susanne Albers.) Generally, to pull off this motivation, your problem should be simple, clean, and natural; if you start to bog down in details, claiming that you're giving a "more realistic" model of the problem, you've already missed the boat in my mind. (If you really wanted to be realistic, you wouldn't be using competitive analysis...)
2. You are considering the performance of real, existing algorithms. This seems reasonable to me; here's something people actually do (like, say, Shortest Job First scheduling), and you want to gain some insight into its worst-case performance for this notion of worst-case. Many price-of-anarchy results actually correspond to the behavior of reasonable algorithms one might consider in practice, so I admit I often find myself more sympathetic to Price of Anarchy results, which seem more motivated.
3. You are using competitive analysis for an essentially (complexity)-theoretical result.
4. You are using competitive analysis to gain significant insight into the problem (and algorithms for the problem) that could be useful in understanding real performance or designing real algorithms. I think most papers implicitly claim to fit into this last category, just because they obviously don't fit into the other three, but usually this isn't explicitly discussed. I think this motivation works best when the problem is inherently very hard.

I'm sure there are other reasonable motivations. But overall I would call for more restraint from the community; not everything should be subjected to competitive analysis, just because it can.

Sunday, February 17, 2008

Some News from Harvard

A few interesting pieces of news from Harvard.

First, Harvard is setting up a scheme to avoid the restrictive access policies of some journals. Essentially, as I imperfectly understand it, Harvard is obtaining from the faculty a non-exclusive right to disseminate articles written by the faculty. The intention is that a Harvard faculty member should be able to say to any journal that insists on having an exclusive copyright to an article, "That's fine, but I work at Harvard, and as such Harvard has a non-exclusive right to my work, which will be placed in an open repository." Stuart Shieber, a Harvard computer science professor and a strong proponent of open access, was behind this faculty legislation. Perhaps all the universities can get together and send a message to the journals: the universities are the ones that actually pay the faculty, and they will work against journals whose business model depends on restricting access to research to only those who pay a monopoly-based fee.

Second, the Dean of the School of Engineering and Applied Sciences at Harvard, Dean Venky, is stepping down. Venky and I began at Harvard at about the same time, and he's done a lot to improve the visibility and status of computer science and engineering here. I hope we can find a new Dean who can continue the push to build up these areas at Harvard.

On Valiant

I was avoiding posting on Les Valiant's EATCS award and the obvious question about the somewhat odd fact that he hasn't been given his Turing Award yet, because of the amazingly obvious bias I would have (such as being on the same faculty with him). But now that it's shown up on the complexity blog, I feel unrestricted.

From my standpoint, Les is an "old-school" scientist. He spends a long time thinking very deeply about difficult problems, trying to come up with results or frameworks that will fundamentally change how we think about something. And as the list of his results shows (see the complexity blog post and comments; whatever you think Les has done, he's done even more than that), he's been amazingly successful at it; he's had several once-in-a-lifetime results. As part of his style, he doesn't care about his paper count or h-index. This type of science is incredibly risky, and not for everyone. But it's a remarkable example of what's possible.

Tuesday, February 12, 2008

Class Demographics

This year, my undergraduate algorithms and data structures class is about 25% women. This is significantly more than average (enough that I noticed). Based on previous experience (and initial impressions from the first few lectures), these women will be very strong candidates for graduate school should they choose to apply when they graduate (one or two years from now). (I could simply be suffering mental bias from the various studies I've read, but my recollection from prior years is that when there's a larger cohort of women in the class, they generally perform better overall. I leave others to postulate on causes and effects, or to argue whether my data point is unrepresentative.)

Saturday, February 09, 2008

Why are You Doing Research (and the CRA blog)

Despite the last round of budget horrors, the CRA is optimistic about 2009. While I'd like to believe it, I'll believe it when I see it. The other good news is that CDI seems to be going forward roughly as planned.

In an effort to spur further comments, I have to say I was disappointed by the discussion that accompanied my last pointer to the CRA blog. It seems we have a lot of self-deprecating sorts in our field who don't think what we do is important enough to be funded by the government, even after the great successes of the last 20 years, many of which have direct and indirect ties to government funding. I don't get it.

In my opinion, government's most important role is to do things for its citizens that individuals can't adequately do by themselves. That's why national defense is a government job. And so is basic research. Basic research is important for national defense, as well as for the economy -- and the economy matters both in national defense terms (the bigger/better our economy, the better our national defense) and for feeding the homeless (unless we keep moving forward and developing, there's going to be a lot more homeless to feed). For those who think that feeding/caring for the poor is more important than funding basic research, I'd ask: 1) isn't it more efficient for charities/local organizations rather than the national government to do this (except in extreme, Katrina-like circumstances)? and 2) where do you think the economic advancement that will keep the country going (so your kids aren't hungry or homeless) is going to come from?

Some comments were of the form "there are so many other things to be funded, why fund us?" (Let's say us means "computer science", though one could make the case for basic science more generally.) First, our success record is pretty darn good. (I'm confused by people who don't recognize that -- as if none of the work we've done has had an impact on the world.) Second, all the other sciences are becoming more computational; I believe Chazelle's riff that algorithms will increasingly become the language of science. Funding us should help all the sciences. Third, well, see the above paragraph.

So (and remember, following recent comments, I'm aiming to be "controversial" and "less nice"), I'll end on the following thought. Certainly, I do research because I enjoy it and am reasonably successful at it. But if I thought it wasn't actually important, I'd either go find a job that paid a lot more, or go find a job that I thought meant a lot more. I've known people that have done each of those. For those of you who really, honestly feel that CS research is mostly a waste -- and are still working in the area -- why are you still around?

Thursday, February 07, 2008

Class Reviews

I just received the class reviews for my Randomized Algorithms class. Historically, I have found my reviews are bimodal. Some students describe me as their favorite professor (and end up taking all my classes). Some students describe me in fairly unflattering terms. This year was pretty much the same.

What signal there was amid the noise was worth hearing, although it was pretty much stuff I already knew. First, students would like more of my time. I apologize for being busy, I suppose. Second, it would have been much better to have had a TA. I agree (you think I enjoyed grading problem sets???). Sadly, my first-choice TA was (rightly) busy working on his thesis, and there wasn't really another appropriate graduate (or undergraduate) student. But going without a TA is not something I'll try again.

A lot of the rest of the comments are lost in the jumble that comes from teaching a class that serves a wide audience, not just theory folks. Some thought it was too fast, some too slow. Some liked that we pretty much followed my textbook, some thought I should let people read the textbook on their own and cover other stuff. Most found the problem sets a "challenge", which I think is just fine.

Class reviews never answer the question I really want answered. Five years out, was my class worthwhile? Has it had any impact on your research/job/worldview/life? Or even if you don't see an impact, do you remember it as a worthwhile learning experience? I don't care if you come out of the class thinking I'm some sort of sadistic homework machine (but really, I'm not...), if a few years down the road you find it was all worthwhile.

So ex-students -- not just MY ex-students, but all ex-students -- if you have a class you particularly liked several years back, a class that sticks in your mind or a class where the material has proven really useful to you later on, please send an e-mail to that professor and let them know. Anonymously is fine. That's the kind of feedback that means a lot to teachers. And you can even do it if you had a course you particularly disliked -- at least I wouldn't mind hearing from students who years down the line still think that I did a terrible job if they have constructive contributions to offer.

Wednesday, February 06, 2008

On Comments

I've noticed a phenomenon surprising to me on this blog: comments trickle in regularly on older posts. Whether that's from new readers discovering the blog and going back over old posts or regular readers who take a while to think about what they want to say, I thank you, and welcome your comments and questions always. (OK, I don't always have time to answer all the questions...) But it's interesting to me that the active lifetime of a post can be substantially more than a week.

More generally, I would like to figure out how to get more people commenting on the blog. I try to discuss issues here where there can be a variety of opinions and room for tangents, and I think the most interesting part of the blog is in the discussions. I'm genuinely surprised when I go to conferences and people tell me they read the blog, since in my mind the corresponding comment-level seems sparse.

So feel free to comment on what I can do to make the blog more comment-friendly.

Tuesday, February 05, 2008

More on the Shopping Period

As I mentioned two posts ago, Harvard has an unusual tradition of a "shopping period" -- you don't preregister for classes; you choose them after the first week, once you've had a chance to go to a lecture or two and see how you like them.

As a student, I loved it. What better way to get an idea of whether you'll like a class than to go, hear the professor, check out the syllabus, see who else is thinking of taking the class, etc.? It makes choosing classes much more flexible. Instead of switching out of classes you chose without seeing them, you simply choose later. Part of that benefit might just be psychological -- most schools allow you to change courses in the first few weeks fairly easily -- but there is a marked difference between changing your classes and choosing your classes, especially if a student has to fill out forms or get signatures to change a class. The openness of the first week is a real benefit to students. I can understand why something like shopping period just might not be feasible for some very large schools, but I think it's a shame more schools don't do it.

Several years ago, there was a movement by the administration to introduce preregistration and get rid of shopping period. I was on the Committee for Undergraduate Education and the Faculty Council, and when the idea was first brought up I spoke against it, only to find that the issue didn't seem up for discussion; it apparently had been "decided" higher up. (It's things like this that helped make the Presidency of Larry Summers so unpopular, as opposed to some of the supposed reasons popularized in the press.) I was surprised that so many faculty on these advisory committees seemed willing to go along with the idea. It was massively unpopular among Computer Science faculty; we like students being able to choose their courses.

Overall, naturally, students didn't seem to like the idea. The administration's main argument seemed to be that it would allow more accurate predictions of class sizes in advance, so Teaching Assistants (and, in some classes, classrooms) could be assigned more readily and efficiently. (Here's an old Crimson opinion giving both sides of the issue.) This was around a time period when there were murmurs of graduate student unionization, and that might have been influencing the administration's mindset. Of course, nobody in the administration had an answer when I asked what prediction mechanisms they were using now, and whether there was any evidence that preregistration would help predictions at all. (I wasn't the only one asking this question. This was another reason the CS faculty in particular were against the idea; they saw no reason for it. It's an interesting problem to design an enrollment predictor; one semester, Stuart Shieber ended up running a projects class to find solutions for the problem.)

A funny thing happened, though. The change had to be approved by the faculty, and while I seemed to be a lonely voice with objections in these committee meetings, apparently a lot of faculty didn't actually like the idea. Instead of it being a quick and simple vote like the administration seemed to expect, the faculty meeting was a disaster. Eleven faculty spoke on the issue; ten spoke against it (including, I'm happy to say, me). Quietly, pre-registration was dropped as an issue, and shopping period continues.

Monday, February 04, 2008

Microsoft Cambridge Lab II

It's officially hit the wires: Jennifer Chayes and Christian Borgs will be heading up a new Microsoft lab in Cambridge II -- that is, here by Harvard and MIT. The proposed theme is interesting (and, happily, very appealing to me) -- algorithms, with an emphasis on social networks and algorithmic game theory.

It opens this summer. I can't wait to see how it plays out.

Saturday, February 02, 2008

Shopping Period Lecture

One of the great Harvard "traditions" that I enjoyed as a student is the "shopping period". Students don't pre-register for classes; they spend the first week "shopping" whatever classes they like, and then choose what ones they want to take. (I'll talk more about the "politics" of shopping -- the pros and cons -- next time.)

Because of this, rather than diving right into material in the first class of my Algorithms and Data Structures course, besides going over the syllabus and requirements, I do something that at least I consider fun. We talk about how to get fair bits from a biased coin. The class starts with the classic brain-teaser: suppose you have a coin that may be biased. How can we flip that coin to decide something fairly, like who should pay for lunch?

[The simple solution to this question, unless I'm mistaken, is commonly attributed to von Neumann.]
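
For concreteness, here's a minimal sketch of the trick (my code, but the trick itself is the standard one): flip the coin in pairs, output the first flip whenever the two flips differ, and discard matched pairs. Since the two mismatched outcomes each occur with probability p(1-p), the output is fair no matter what the bias is.

```python
import random

def fair_bit(flip):
    """flip() returns a possibly biased 0/1; the result is an unbiased bit."""
    while True:
        a, b = flip(), flip()
        if a != b:
            return a  # Pr[(0,1)] = Pr[(1,0)] = p(1-p), so 0 and 1 are equally likely

biased = lambda: 1 if random.random() < 0.9 else 0  # a 90/10 coin
print(sum(fair_bit(biased) for _ in range(10000)))  # should be close to 5000
```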

Starting from there, I try to take the class through a series of questions, leading up to how to efficiently extract lots of random bits from a sequence of biased flips. The method I base the lecture on is due to Yuval Peres (see "Iterating von Neumann's Procedure for Extracting Random Bits," Annals of Statistics, March 1992), and I learned about it at some point in graduate school at Berkeley. I try to run this lecture in a very back-and-forth manner, asking questions of the students and trying to get them to answer. (I also do this a bunch during the semester, with varying degrees of success...) Here's a version of my notes, with the various questions.
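
For the curious, here's one way to write the Peres recursion in Python -- a sketch of my own, under my reading of the paper, not the lecture notes themselves. The von Neumann trick throws away two sources of entropy: whether each pair matched, and the common value of the matched pairs. Peres's observation is that you can recursively run the same extraction on both leftover sequences, and the extraction rate then approaches the entropy of the coin.

```python
import random

def peres(bits):
    """Iterated von Neumann extraction (Peres 1992): returns fair bits."""
    if len(bits) < 2:
        return []
    out, xors, sames = [], [], []
    for i in range(0, len(bits) - 1, 2):
        a, b = bits[i], bits[i + 1]
        if a != b:
            out.append(a)    # the classic von Neumann bit
        else:
            sames.append(a)  # matched pair: keep the shared value
        xors.append(a ^ b)   # whether the pair matched still carries entropy
    return out + peres(xors) + peres(sames)

flips = [1 if random.random() < 0.9 else 0 for _ in range(100000)]
bits = peres(flips)
print(len(bits) / len(flips))  # approaches H(0.9) ~ 0.47 bits per flip
```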

For the students who decide not to take the course, I figure at the very least they've learned something interesting that they can take with them. Also, it's conceivably the only time students will hear the word "entropy" in a computer science class, so I think it's worthwhile for that alone. Somehow, this problem fascinates people. Way back when I had more energy, I wrote a Dr. Dobb's article on it to show it to a wider audience, and there's been lots of related research on the problem. In some sense, this problem is the pre-history of all the randomness extraction work that has come since.