Monday, April 13, 2009

New Preprint: An Analysis of Random-Walk Cuckoo Hashing

Some people have asked if being the chair of STOC destroyed my work life for several months. Not really. It certainly was work, and like other administrative duties, it took time away from research. But it didn't completely obliterate my research time. (Arguably, it had less effect than, say, having a new baby daughter...)

So here's an example of new research output: An Analysis of Random-Walk Cuckoo Hashing with Alan Frieze an Pall Melsted. For background, you can read one of my earlier posts on cuckoo hashing. The question we were aiming to answer is what I consider to be one of the big ones left for cuckoo hashing: why does the random-walk approach -- where when you have to kick out an element you choose one randomly and keep going -- work so well? The natural intuition is that in such a setting an insertion of a new element should take at worst logarithmic time with high probability -- each time an element is kicked out to make room for another element, it should have an empty space available with constant probability. But nobody has been able to make that intuition stick.

We obtained a polylogarithmic bound on the time for insertion with random walk hashing. The main idea is to use a 2-step approach: show that the set of items "close" to an empty space (within an augmenting path of length O(log log n) of an empty space) is reasonably large, and then show that a random walk reaches one of those items quickly (polylogarithmic number of steps). Once we get to a close item, there is an inverse polylogarithmic probability of taking the augmenting path to get to the empty space, so breaking it up in this way gives us the desired bound.

It takes a fair bit of analysis of the cuckoo graph to get this result (which has been open for a surprisingly long time). There's still plenty of room to work with in both directions -- a better upper bound, and is a super-logarithmic lower bound possible for some value for the number of choices each item has -- but it was gratifying to make progress on what I think is turning out to be a much more challenging problem than people originally might have expected.


Anonymous said...

Your link seems broken, but works.

Pall Melsted said...

A small theorem in the paper, not directly related to the random-walk, is that a better bound on the running time of the BFS insertion method.

We can now show (1/epsilon)^O(1) instead of (1/epsilon)^O(log d), and
in fact the O(1) is decreasing in d and converges to 1.

This partially answers some of the questions in the comments of your earlier post on cuckoo hashing.