My Biased Coin: June 2007

Saturday, June 30, 2007

Research labs vs. academia

A discussion over at Muthu's blog led to the following great question:

You have a talented graduate who has a faculty job offer and a research lab job offer. Modulo the specifics of the univ and the lab, where should they go?

I think it's a great question, in part because it points out fantastic biases in the system. Of course almost all professors will almost always favor the university position; after all, they chose academia, so there's an inherent bias toward that direction. In my experience, students also are biased toward academia. Almost all students who work for a Ph.D. go in from the beginning with the mindset that they're going to be a professor, and that can be hard to let go of.

But very few people in computer science (both in general, and theory in particular) go on to become professors. There just aren't that many professor jobs. Perhaps one of the first conversations advisors ought to have with students would start, "I'm sure you want to be a professor when you're done here. But just so we're clear, what would be your backup plan?"

Of course, research labs shouldn't be just a backup plan. They're different from academic positions in ways that can be very appealing to many people : no need to continually find funding, no teaching, no tenure pressure, and generally more emphasis on accomplishing things as a group rather than individually. There's no single right answer to the question of whether to go to the lab or the university; it should depend on the personality, life goals, and needs of the student.

And a final thought -- life goals can change, and one can change one's mind later. Many people switch from labs to academia or academia to labs. Some people -- as Muthu himself knows -- can make multiple such switches throughout their career.

Thursday, June 28, 2007

Update from ISIT 2

The winner of the Claude E. Shannon Award, given by the IEEE Information Theory Society for "profound and consistent contributions to information theory", went to Sergio Verdu from Princeton. He is the youngest person to win the award (he is under 50). Apparently, all previous winners are now over 65. As the award recipient, he gave this year's Shannon lecture. His lecture went into ways he likes to teach information theory and relationships to his research. From my standpoint, he talked about various formulas and bounds and how to interpret them; with the right interpretation, the results become simple, and moreover, armed with the insight from finding the right interpretation, you are led to new results and generalizations. Another key point in this mix was the relationship between bounds for fixed length sequences and asymptotic bounds, and how the right interpretations make the connections between the two more transparent. I found it a very interesting talk, although I have to admit, my lack of facility with information-theoretic notation means I missed a lot that would have been nice to get. (The talk went over a lot, fast. I suppose I should teach information theory at some point so that I'll understand it better, and then listen to the talk again.) I highly recommend the talk, especially to those interested in coding/compression/entropy; I'm sure the slides/video will be appearing online shortly.

My student Adam Kirsch gave a talk about a paper with my former student Eleni Drinea on an information theoretic and improved approach to lower bounds on the deletion channel.

I gave a talk on upper bounds for the deletion channel.

I promise to return to the question of "interesting open questions in network coding", preferably with a guest blogger or bloggers more expert than I, in the near future.

After ISIT, I flew into London, where I'm giving a (networking) talk at UCL, hosted by Brad Karp (no relation to Dick Karp), and a group of us went out to fantastic London dinner. (I promised Brad I'd put the fantastic dinner part in the blog.)

Tuesday, June 26, 2007

Update from ISIT

I have been hanging out mostly at the network coding sessions. For those who haven't seen the term, the one sentence description of network coding is that you combine routing with coding -- intermediate nodes do not just store and forward, they get to evaluate functions of received packets and send those on as needed. Why aren't there more CS theory people working on network coding (myself included!)? It's a huge relatively new area in information theory and networking. There are lots of interesting, challenging open problems. Lots more info here. Also, there's a Scientific American article this month by three of the leaders in the field -- Michelle Effros, Ralf Koetter, and Muriel Medard (three top names in information theory that CS people should become familiar with if they are not...) -- that gives excellent high-level background.

The most interesting talk so far was the work by Koetter and Kschischang, which reduces network coding with erasure and errors to an interesting coding variant. In this setting, codewords are vector subspaces; erasures delete a basis vector from the subspace and errors add a spurious basis vector to the subspace. What do codes in this world look like? (Hopefully I'm doing the question some justice -- it was a great talk.)

Michael Luby and Amin Shokrollahi won the IEEE Eric E. Sumner Award "For bridging mathematics, internet design, and mobile broadcasting as well as successful standardization." Go Digital Fountain.

I gave a talk on good codes for limited insertion/deletion channels.

At the lunch, I talked with some people about how the information theory community doesn't really have a "competitive" conference -- like a FOCS/STOC, or a SIGCOMM. They were wondering about how to and the impact of adding one. I suggested adding special single-track sessions (as opposed to the eight parallel sessions) for the key papers to ISIT.

Sunday, June 24, 2007

Traveling: International Symposium on Information Theory

I am a big believer that traditional information theory and computer science theory are becoming much more intertwined. Perhaps, someday, the boundaries will be so fuzzy that we won't really distinguish them. Since I don't think that day is here, I'll describe the International Symposium on Information Theory -- a conference, it seems, most CS people don't know about -- where I'm headed this week.

ISIT is the major information theory conference of the year, by which I mean it's huge. I mean, ridiculously huge. Eight parallel sessions huge. I don't know the actual numbers, but I'd guess attendance is in the large hundreds, nearing one thousand, maybe more. I'll ask the IEEE people when I get there.

We don't have a conference like this in CS theory. If we get 300 people to a conference, it's a big deal. We prefer to split things up into smaller, more coherent units. There are pros and cons to both approaches. The obvious benefits of the big conference is that it gets everyone together.

With smaller conferences, CS theory conferences ostensibly have a higher quality bar. As you might expect at such a big conference, with acceptance rates of 50% or higher, the paper quality varies dramatically. You tend to see a lot more preliminary ideas or work that's not as fleshed out as a STOC/FOCS/SODA paper would be -- of course, in ISIT, you only have 5 pages. On the other hand, you'll also get breakthroughs presented here, like Amin Shokrollahi's Raptor Codes.

I am still a bit of an outsider at ISIT, as it is not my regular crowd, although every year I seem to know more faces. So you could view it as simply self-interest that I encourage more CS people to submit and attend. If you're doing anything related to codes (network coding, list decoding, low-density parity-check coding -- all topics many CS theory people are working on), you probably already know about ISIT -- there are lots of session on coding, naturally. But they also have sessions in cryptography, quantum information theory, compression, sequences and complexity, and network information theory. From that list of topics, you might think it's a CS theory conference! But that's the point -- the two areas are increasingly close, and more interaction, including CS people attending ISIT, would help both sides.

I hope to have interesting news to report.

Friday, June 22, 2007

CCC Update and Funding Opportunity

The Committee for the Advancement of Theoretical Computer Science (or CATCS, a commitee under SIGACT) is trying to spread the word about a new proposal of the CRA Computing Community Consortium (CCC). The proposal is designed to provide seed funding for new visions in computer science, with the goal of developing NSF programs to support these visions.

This may sound rather grandiose, but there's clearly room for theory to contribute. CATCS has been putting a lot of effort into providing vision for the Cyber-Enabled Discovery and Innovation (CDI) program that will be beginning soon, as well as trying to inject more theory vision into the GENI program (as in this report). As individuals and as a community, we have to make an effort to promote theoretical computer science, especially in the early stages when NSF programs are being developed. Funding apprarently begins with having your vision understood and accepted.

If you have a vision idea, please apply. If you're concerned about doing it on your own, contact a CATCS member, and we'll see if we can help get a group together, or otherwise assist in organizing a larger effort. There's an (out-of-date) list of members here, but you can always mail me if you like, and I'll pass things on.

This effort is a further sign that the CCC train is moving full steam ahead. Theory should be hopping on for the ride, and making sure we don't get left at the station. Since I know nothing about trains, I'll stop the metaphor there.

Finally, I have been informed that the slides from the CCC talks at FCRC are on-line. Enjoy!

Thursday, June 21, 2007

What We're Doing Wrong : Programming in Theory Classes

My student Adam expressed some disappointment in my first few weeks of blogging. Knowing how opinionated I am, he was hoping for more incendiary posts. I was planning to avoid controversy until after a few more weeks of building readership, but what the heck, let's see if anyone's listening.

Looking at what I think are currently the major textbooks for algorithms and data structures, Algorithm Design by Kleinberg/Tardos and Introduction to Algorithms by Cormen, Leiserson, Rivest, and Stein, I find tremendous number of mathematical exercises, but essentially no programming exercises. (The textbooks do have pseudocode, which is good, but different.) And looking at the web sites of introductory classes in this subject, it seems most don't have programming assignments.

Here's my claim: theory does untold damage to itself every year by not having programming assignments in the introductory classes on algorithms and data structures.

While I could list many reasons why having some programming is didactically a good idea for such classes, here I'm targeting the theory-self-interest point of view. Most majors come into their theory classes having primarily done programming. We like to think we're showing them the beauty of computer science by teaching them the fundamentals of mathematics and modeling that make computer science amazing. But for most students, by not connecting it to what they've previously learned -- programming -- and not explicitly showing them the practical implications of that beauty -- efficiency -- we make it seem like theory is divorced from the rest of computer science. It's an impression many of them seem to carry with them. We're not getting our message across as powerfully as we should, because of a bizarre but apparently common mindset that says theory classes should be above programming.

I urge you, put a major programming assignment or a least a few programming exercises into your algorithms/data structures classes. Theory will be better off in the long run.

Tuesday, June 19, 2007

Cuckoo Hashing, Theory and Practice: Part 3

In two previous posts, I have discussed cuckoo hashing, a natural generalization of multiple-choice hashing that give better space utilization by moving elements around as needed. There were three issues to address in using such schemes in hardware: moving is expensive, the rare failure cases must be considered more carefully, and the analysis should be as tight as possible. I suggested that for hardware implementation, looking at much simpler schemes that allow just one move would be a step forward.

The main reason for this approach is that moves are expensive. It will therefore take some convincing to get practitioners to accept the value of moves. By looking at just one move, we make the cost seem low, and the law of diminishing returns suggests that the gains from that first move should still be high. If we can first talk the practitioners into a single move, and they find it helpful, then perhaps we can see if additional moves are worthwhile. In any case, we can avoid the problem of moves being expensive by limiting them severely.

If we're limiting ourselves to a single move, it stands to reason that sooner or later there's going be trouble, in the way of overflows. We'll try to place an item in the hash table, but all of its buckets will be filled, and there won't be any way to move any of those items out of the way, because all of the other buckets for these items will be filled. Since one move is less flexible, we need to be able to handle overflow problems.

In hardware, overflow can be handled by a small content addressable memory, or CAM. A CAM is a fully associative memory; you can find anything in it essentially instantaneously. Of course, they're expensive, so they have to be small. If overflows are sufficiently rare, a small CAM can hold all of them. In fact, one point of our paper is that CAM's are not generally utilized in theoretical analysis, but they should be; they're a natural addition to the model. In this way, we circumvent the problem of rare but annoying failures due to overflows.

Another advantage in studying single-move schemes is that many variations are nicely amenable to fluid limit analyses. That is, we can set up a collection of differential equations that accurately give what fraction of buckets in the table are loaded at every point in time, allowing us to give very accurate numbers (generally within a few percent) detailing their final performance. Arguably, one could do this with simulation as well, but the fluid limit approach is orders of magnitude faster than simulations. The fluid limit approach gives a compellingly accurate performance description that may persuade practitioners more than big-O arguments. Moreover, it allows us to do interesting optimizations, such as figure out how to size sub-tables within our table using black-box optimization techniques.

If all this seems a little vague, that's because this is a blog. For details, you can now see the paper Adam and I are submitting. The end result is that allowing a single move can easily save 1/3 of the space or more; things like optimizing the memory layout can give savings of roughly 10% in space. Our approach seems ready for implementation, and we're eager to get feedback from the real world.

Sunday, June 17, 2007

Anti-PC Travel

Over at the new Computational Geometry Discussion Forum, they are discussing program committees, and Ileana Streinu argues that every PC should have a face to face meeting for final decisions. I couldn't disagree more.

Academic life involves a significant amount of necessary travel -- conferences, talks, NSF meetings, research collaborations. Requiring travel for a PC should be avoided, especially these days when any air travel is an unpleasant hassle, and electronic PC tools (like easychair) are becoming reasonably good. My other main objections:

Travel wastes PC members' time. For people with families, the cost is particularly severe, as overnight travel specifically takes away from family time.
PC travel costs. It either increases the cost of the conference, or PC members have to pay with their own grant money.
Extra air travel is ostensibly bad for the environment. Be aware of your carbon footprint!

What are the supposed benefits of face-to-face PC meetings, and how to respond to them?

It leads to better decisions. This seems to be a matter of faith with some people, but I have seen no real evidence for it. In my experience, online discussions can be as good as face-to-face discussions. In both cases, there are various types of noise, and decisions will not be perfect, but hopefully nearly so.
It saves time, by gathering people all in the same room to make decisions. Given the time required for actual travel, I doubt this is true. Even granting this, it isn't worth the loss in time flexibility. Getting on a plane means missing class, family dinners, etc.
It leads to better feedback. I do not see how this can be possible. Electronic PC meetings allow recording of comments and discussions, which should make it easier to provide better feedback, in the form of appropriate cuts and pastes, in the reviews to the authors.
It provides an opportunity for colleagues to get together. This I can't deny. Face-to-face PC meetings are usually more fun and informative than electronic ones. Does that really make up for the time/cost/annoyance of travel?

So in general, if a face-to-face PC meeting is required, I'll likely be saying no, unless I have some ulterior motive that makes the travel worthwhile.

Friday, June 15, 2007

Cuckoo Hashing, Theory and Practice: Part 2

In the last post, we examined cuckoo hashing, a natural generalization of multiple-choice hashing that give better space utilization by moving elements around as needed. Cuckoo hashing has worst-case constant access time and constant insertion time on average.

In this post, we'll look at some issues that arise when trying to put cuckoo hashing into practice. If I can wax a bit philosophical here (and I can, since it's my blog), putting things into practice is rarely as easy as taking the theoretical result and implementing it. There are always details to be dealt with. Often theorists shy away from (or run away screaming from) such details, which is unfortunate. I believe that more ideas from theory would be applied much more readily if we theory folk put in a little extra legwork to make things easier for the non-theory folk. If we want them to try something new and potentially challenging, we could help by making the benefits as clear and convincing as possible, and by attempting to address natural implementation concerns ahead of time. As a bonus, sometimes new theory problems come out of such efforts.

A natural place to try to apply cuckoo hashing ideas in practice is in routers. Routers can use hash tables for IP lookups, network measurement and monitoring, and other related tasks. If the hashing is to be done frequently, say on a per packet basis, then really the only way to do it without slowing the router down beyond reason is to build the hash table directly into the hardware. So now we have to think in terms of hardware implementation, a mindset which is generally not present in the theoretical analysis.

By the way, besides compelling applications, there's another reason to try to apply these ideas to routers: multiple-choice hashing is already there! I have heard the technique is known and in use at Cisco, though sadly being an outsider I am not privy to any top-secret details.

In thinking about hardware implementations, there are some key positives.

There's no need for pointers. The hash table can be simply stored in an array or a collection of arrays. This is apparently very useful in hardware, and there's no wasted pointer space.
Lookups are trivial, as one knows exactly where to look for an element. The multiple accesses to memory required can be easily parallelized.

So what is holding back cuckoo hashing?

The biggest impediment is cuckoo hashing's biggest strength: all that moving around of items. In many cases, the hash table will be stored in a chip's slow memory. We simply don't have time t0 do all those moves. The average number of moves is constant, which already may be too high. And with some non-trivial probability you'll need a logarithmic number of moves. In most hardware implementations, the worst-case time for an operation is what's important, and the worst-case insertion time is a weakness of cuckoo hashing, even if the table is stored in fast memory.
The failure mode of cuckoo hashing is unpleasant. One thing I glossed over last time is that there's some (small) chance that cuckoo hashing gets caught in a loop; in moving things around to try to find an empty space, it can get stuck in a cycle, moving the same things back and forth repeatedly without ever being able to squeeze in a new element. In the theoretical analysis, this can swept under the rug by saying that after a certain number of steps, declare a failure and re-hash everything with new hash functions. In practical settings, like a router, this is often not a feasible option.
The analysis of advanced cuckoo variants that perform well remains incomplete. Generally, proofs give that you can hold n elements with O(n) space, but the analysis gives up something non-trivial in the constant factors. Arguably, this is only a theoretical concern, but this gap in the analysis makes designing, testing, and optimizing such structures much harder, in that one must rely almost exclusively on simulation. More accurate theory can not only help convince others to use these schemes, it can actively aid in the development and deployment.

The starting point of our solution to these problems is to think small. Specifically, a small number of moves. In fact, the focus of of our submission is on what can be done with just one move. We'll discuss how this helps with the above problems, and what we lose by allowing just one move, next post.

Wednesday, June 13, 2007

Cuckoo Hashing, Theory and Practice : Part 1

One recent line of research I've really been enjoying is the work on cuckoo hashing variants. Currently, my graduate student Adam Kirsch and I are working on a submission tackling the question of how to design cuckoo hashing variants that are more practical in the setting of hardware implementation, such as for routers. So this gives me an excuse to write about cuckoo hashing, explain the problems we see in applying them in practice, and describe the interesting theoretical and practical questions that result. I'll do this over a series of posts.

To start, let us recall the basics of the power of two choices, or multiple-choice hashing. In the standard setting, we are hashing n elements into a hash table with n buckets. Normally, we would hash each item once to determine its bucket; this gives a maximum load, or largest number of elements in any bucket, of about log n / log log n. (All the results I state hold with high probability, so I'll avoid repeating the phrase.) Suppose instead we use two hash functions, giving two choices of locations for each element. Elements are sequentially placed in the least loaded of their two buckets. When we do a lookup, we'll now have to look in two places for the item. In return, the maximum load now falls to about log log n / log 2 -- we've got an extra log in there! (With d > 2 choices, the maximum load would be log log n / log d, just a constant factor better than d = 2.) Results of this type are due to Azar, Broder, Karlin, and Upfal, in their paper Balanced Allocations. (There was some earlier work with a similar result by Karp, Luby, and Meyer auf der Heide.)

In the course of these results, the question arose as to whether one could do better if placing the elements offline. That is, if you had all n elements and their hashes in advance, could you place each element in one of their bucket choices and achieve a better load than the O(log log n) obtained by the sequential placement? The answer turns out to be yes. The best placement only has a constant maximum load. Another way to think of this is to say that if we place the elements sequentially, but retain the power to move them later, as long as each element ends up at one of its bucket choices, we could get down to constant maximum load.

The idea of cuckoo hashing is that such movements can actually be done online quite efficiently. Cuckoo hashing is, essentially, multiple-choice hashing plus moves, and it appears very effective. Let us consider an extreme case, analyzed in the original Cuckoo Hashing paper by Pagh and Rodler. (If my description is unsatisfactory, there is a good explanation already on Wikipedia!) Each bucket can contain just 1 item, and we will have n buckets, split into two subtables of size n/2. Each element has one choice of bucket in each subtable, where the choices are given by independent hash functions. Each hashed element resides at one of its two choices. A lookup therefore always requires at most two accesses into the hash table.

What to do, however, on an insertion -- what if both spaces for an item are already full? When we insert a new item, we put it in the first subtable. If another item is already there, we kick it out, and make it move to the second subtable. If another item is already there, we kick it out, and make it move back to the first table, and so on. This is where the name cuckoo hashing comes from; as the Wikipedia article says,

The name derives from the behavior of some species of cuckoo, where the mother bird pushes eggs out of another bird's nest to lay her own.

Pagh and Rodler prove that if the number of elements inserted is at most (1/2 - epsilon)n for some constant epsilon, then this process finishes in constant expected time! In other words, one can achieve memory utilization of up to 50 percent of capacity, with guaranteed constant lookup time and average constant insertion time.

Further variants significantly improve the memory utilization that can be achieved. Specifically, allowing more than two choices of location helps (more choices of where to go), as does having more than one item per bucket (more choices of what to move). Memory utilization of over 90 percent is easily possible in practice. This paper and this one have more information.

That summarizes the theoretical background, and hopefully convinces you that the idea of cuckoo hashing is a good one. In the next post, we'll look at why you might want to implement cuckoo hashing in hardware, and what obstacles lie therein.

Monday, June 11, 2007

Mission Statement

What will this blog be about?

The main thrust that I hope will differentiate this blog from others is that it will focus on the convergence of algorithms (or, more generally, "CS theory"), information theory, and networks. While I was trained as a computer science theorist, and that's still my primary home, I have been and will continue to be working in all three areas. Intellectually, I think there is a great deal connecting these research areas, and I am often surprised how little physical crossover there is among them, in terms of people and ideas. On the positive side, I think this has been changing dramatically over the last five or ten years; I can throw out many examples of topics like peer-to-peer networks, low-density parity check/expander codes, list decoding, network coding, power law distributions, and Bloom filters where at least two and usually all three of these areas have large intersecting interests. I think this convergence will continue substantially over the years to come, as all of these examples show that exciting things can happen when multiple communities bring their various expertise to bear on problems.

This blog will aim to be a place for people from these research areas (and hopefully others!) to meet and get to know a bit more about each other. While there is likely to be a bit of blog bias toward the CS theory side, I will try to cover important news and events of interest for all three communities.

Other than that, I hope to take a cue from Lance Fortnow's blog, and provide a forum for discussing issues related to research and academics. Teaching, reviewing, editing, consulting, researching, writing, advising, and any other appropriate -ing will be considered fair game. As the blog title suggests, I often have strong if not well-informed opinions on these matters, and I hope that these opinions will inspire some debate within the community.

Saturday, June 09, 2007

CCC Talks at FCRC

I want to encourage everyone at FCRC to go to Christos Papadimitriou's talk on the Algorithmic Lens (Monday, 6-7 pm), one of the five talks being given under the auspices of the new Computing Community Consortium (CCC). The SIGACT Funding Committee helped arrange two workshops in the last year on this general theme; there is promise that there will be a related major NSF initiative in the near future, which (hopefully) will yield more money for research in theoretical computer science.

Personally, I like the phrase the algorithmic lens. Let's all start using it whenever and wherever possible and see if we can get scientists in other fields to adopt it too. Maybe people will start using the word algorithm everywhere, which can only be good for theoretical CS.

Since this blog covers the intersection of theory and networking, I would be remiss to not also promote Scott Shenker's CCC talk on GENI, the Global Environment for Network Innovations (Thursday, same awkward 6-7 pm). There should be opportunities for interesting research (and funding) for theorists embedded in the GENI project as well...

Friday, June 08, 2007

Harvard's New Dean

If you haven't heard the news, Michael Smith, a computer science professor renowned for his work in compilers, programming languages, and security, has been named the next Dean of the Faculty of Arts and Sciences here at Harvard. This is great for many reasons:

It's great for Harvard. Mike has served as associate dean (which equals chair in Harvard-speak) for EE and CS here; he is a fantastic administrator with great insight and vision. He knows how to build consensus and get positive things done. We need that here.
It's great for computer science. Mike will have a large presence in the world of higher education; this will improve the visibility of computer science in the always-important administrative circles.
It's great for applied science at Harvard. Harvard has been talking a lot about improving its applied science programs; Mike has the background and knowledge to continue to make these improvements happen.

However, I can't help but be a bit saddened by his selection.

It's a bit of a blow to computer science at Harvard. We're losing one of the key figures in our department. Sure, technically he'll still be around, but we won't have the immediate benefit of his leadership.
It's a deep personal loss. Mike has been in the office next to me pretty much since I arrived at Harvard. He's been a mentor and a friend. Sure, technically he'll still be around, but I won't be able to just pop my head in next door anymore. I'll probably have to make an appointment.

The times are a-changing for the better here. Best of luck, Mike!

Best papers for a theory/networking class?

When I recently visiting Princeton, the esteemed Jennifer Rexford mentioned a great idea for a graduate class. The theme would be "Theory papers every networking person ought to read." Naturally, I have some biased thoughts on the matter. But I would like to collect suggestions from others here. This would be a class that I'd love to teach!

For my own purposes, I'm also interested in the other direction -- what are the networking papers every theory person ought to read?

Wednesday, June 06, 2007

Starting to Blog

I've threatened for some time to start blogging. I think, now, over the lazy days of summer, is the time to try. Please comment, send feedback, and give me time to make some mistakes.

My Biased Coin