Wednesday, September 26, 2007

Allerton Conference, Part II

At Allerton, I'll be talking about some follow-on work to some previous work my graduate student Adam and I did on cuckoo-like (or better said cuckoo-light) hashing schemes. In that work, we focused on schemes that allowed only one additional move per insert, and used a CAM to store the rare element that couldn't be placed. Besides simplicity, an advantage of such an approach is that we can analyze many basic one-move schemes.

In our new paper, we look at a different approach; here we use the CAM as a queue, so the CAM holds elements that are being moved but have not reached a resting place. Using the CAM as a queue is somewhat more complicated in terms of an implementation, but it allows one to essentially implement a full cuckoo hashing scheme. The queue is used to de-amortize the amortized performance of cuckoo hashing. That is, on average inserting an item in the queue take a constant number of moves, but in the worst case it can take Omega(log n) moves (where n is the number of items being stored). If we allow the queue a constant number of moves per insert, then the queue can simply buffer the insertions that take too long. The result is even better space utilization (naturally, since more moves are used per insert). The paper and talk slides are available.

Sadly, this ended up being essentially an experimental paper. The problem is that there are too many open questions in the setting of cuckoo hashing. For example, in the practical implementations I know of, for cuckoo hashing with d > 2 choices of locations for an element, when an element has to be kicked out, one chooses the element randomly. The analyses I know of are based on doing a breadth-first-search to find an element to kick out, which isn't really practical (certainly not in hardware). As a result, I believe it's still open to prove things like constant average moves/worst-case logarithmic moves for standard cuckoo hashing when d > 2. To really do an analysis of the queueing process, we'd need even more detailed information about the distribution of the number of moves required per inserted element. Lots of good open questions left here...

2 comments:

Rasmus Pagh said...

Speaking of hardware dictionaries: The idea of cuckoo hashing actually can be found in a patent from 1999 that describes a HW implementation of cuckoo hashing with d=4. No queue used, though.

If I recall correctly, an early use of a "queue" for real-time dictionary updates was in the paper of Dietzfelbinger and Meyer auf der Heide on "Dynamic hashing in real time".

Indeed there are still many open questions around cuckoo-like schemes, and I think you mention some of the most important ones.

Michael Mitzenmacher said...

Thanks for the pointers, Rasmus. I hadn't seen that patent before!