Tuesday, June 19, 2007

Cuckoo Hashing, Theory and Practice: Part 3

In two previous posts, I have discussed cuckoo hashing, a natural generalization of multiple-choice hashing that gives better space utilization by moving elements around as needed. There were three issues to address in using such schemes in hardware: moving is expensive, the rare failure cases must be considered more carefully, and the analysis should be as tight as possible. I suggested that for hardware implementation, looking at much simpler schemes that allow just one move would be a step forward.
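
As a quick refresher, here is a minimal sketch of standard two-choice cuckoo hashing with one item per bucket. The hash functions, table size, and eviction limit below are just placeholders for illustration, not what you would actually use (and certainly not what you would build in hardware).

```python
class CuckooTable:
    """A minimal two-choice cuckoo hash table: one item per bucket,
    inserting by evicting and relocating resident items as needed."""

    def __init__(self, size, max_kicks=500):
        self.size = size
        self.max_kicks = max_kicks       # give up after this many evictions
        self.buckets = [None] * size     # each bucket holds at most one key

    # Two placeholder hash functions; a real implementation would use
    # stronger, independent hash functions.
    def _h1(self, key):
        return hash(("h1", key)) % self.size

    def _h2(self, key):
        return hash(("h2", key)) % self.size

    def lookup(self, key):
        # A lookup probes only the two candidate buckets.
        return self.buckets[self._h1(key)] == key or self.buckets[self._h2(key)] == key

    def insert(self, key):
        """Return True on success, False if we hit the eviction limit
        (at which point a real table would rehash or rebuild)."""
        slot = self._h1(key)
        for _ in range(self.max_kicks):
            if self.buckets[slot] is None:
                self.buckets[slot] = key
                return True
            # Kick out the resident and send it to its other bucket.
            key, self.buckets[slot] = self.buckets[slot], key
            h1, h2 = self._h1(key), self._h2(key)
            slot = h2 if slot == h1 else h1
        return False
```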

The main reason for this approach is that moves are expensive. It will therefore take some convincing to get practitioners to accept the value of moves. By looking at just one move, we make the cost seem low, and the law of diminishing returns suggests that the gains from that first move should still be high. If we can first talk the practitioners into a single move, and they find it helpful, then perhaps we can see if additional moves are worthwhile. In any case, we can avoid the problem of moves being expensive by limiting them severely.

If we're limiting ourselves to a single move, it stands to reason that sooner or later there's going to be trouble, in the form of overflows. We'll try to place an item in the hash table, but all of its buckets will be filled, and there won't be any way to move any of those items out of the way, because all of the other buckets for those items will be filled as well. Since a single move gives us less flexibility, we need to be able to handle overflow problems. A sketch of what this looks like follows.
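
To make the one-move idea concrete, here is a rough sketch of a single-move insertion for a d-choice table with one item per bucket. This is meant only to convey the flavor of such schemes; it is not the specific variants analyzed in the paper, and the names and parameters are illustrative.

```python
def insert_one_move(table, key, hashes, overflow):
    """Multiple-choice insertion allowing at most one move.
    `table` is a list of buckets (None = empty), `hashes` is the list of
    d hash functions, and `overflow` is a fallback list standing in for
    the CAM discussed below.  All names here are illustrative."""
    choices = [h(key) % len(table) for h in hashes]

    # Step 1: place the key in any empty choice.
    for b in choices:
        if table[b] is None:
            table[b] = key
            return True

    # Step 2: all choices are full -- try to relocate one resident
    # to one of *its* alternate buckets (the single allowed move).
    for b in choices:
        resident = table[b]
        for alt in (h(resident) % len(table) for h in hashes):
            if table[alt] is None:
                table[alt] = resident   # the one move
                table[b] = key
                return True

    # Step 3: nothing can move; this is the overflow case.
    overflow.append(key)
    return False
```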

In hardware, overflow can be handled by a small content-addressable memory, or CAM. A CAM is a fully associative memory; you can find anything in it essentially instantaneously. Of course, they're expensive, so they have to be small. If overflows are sufficiently rare, a small CAM can hold all of them. In fact, one point of our paper is that CAMs are not generally utilized in theoretical analyses, but they should be; they're a natural addition to the model. In this way, we circumvent the problem of rare but annoying failures due to overflows.
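
In software terms, the CAM behaves like a small overflow stash that is searched alongside the table. A toy lookup might look like the following; here the stash is just a Python set standing in for the CAM, whereas in hardware the CAM search would happen in parallel with the table reads.

```python
def lookup_with_stash(table, stash, key, hashes):
    """Check the d candidate buckets, then the small stash that plays
    the role of the hardware CAM."""
    for h in hashes:
        if table[h(key) % len(table)] == key:
            return True
    return key in stash   # the CAM: small, fully associative, checked last
```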

Another advantage of studying single-move schemes is that many variations are nicely amenable to fluid limit analyses. That is, we can set up a collection of differential equations that accurately give the fraction of buckets in the table that are loaded at every point in time, allowing us to give very accurate numbers (generally within a few percent) detailing their final performance. Arguably, one could do this with simulation as well, but the fluid limit approach is orders of magnitude faster than simulations. The fluid limit approach gives a compellingly accurate performance description that may persuade practitioners more than big-O arguments. Moreover, it allows us to do interesting optimizations, such as figuring out how to size the sub-tables within our table using black-box optimization techniques.
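
As a toy example of the flavor of these calculations (much simpler than the equations in the paper): suppose each item picks d random capacity-one buckets and goes into an empty choice if there is one, with no moves at all, overflowing to the CAM otherwise. If x(t) is the fraction of occupied buckets after tn insertions, then dx/dt = 1 - x^d, and the instantaneous overflow rate is x^d. A few lines of Euler integration give the numbers.

```python
def fluid_limit_overflow(d=2, load=0.9, dt=1e-4):
    """Toy fluid-limit calculation (not the equations from the paper):
    each item picks d random capacity-one buckets and goes into an empty
    choice if possible; otherwise it overflows to the CAM, with no moves.
    If x(t) is the fraction of occupied buckets after t*n insertions,
    then dx/dt = 1 - x**d and the overflow rate is x**d."""
    x, t, overflow = 0.0, 0.0, 0.0
    while t < load:
        overflow += (x ** d) * dt   # items sent to the CAM, as a fraction of n
        x += (1.0 - x ** d) * dt    # newly occupied buckets
        t += dt
    return x, overflow

# Example: insert 0.9n items with d = 2 choices each, then report the
# fraction of buckets occupied and the overflow (as a fraction of n).
occupied, overflowed = fluid_limit_overflow(d=2, load=0.9)
print(f"occupied = {occupied:.4f}, overflow = {overflowed:.4f}")
```

(In this no-move toy model the overflow at high load is far too large for a small CAM, which is exactly the gap that allowing even one move is meant to close.)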

If all this seems a little vague, that's because this is a blog. For details, you can now see the paper Adam and I are submitting. The end result is that allowing a single move can easily save 1/3 of the space or more; things like optimizing the memory layout can give savings of roughly 10% in space. Our approach seems ready for implementation, and we're eager to get feedback from the real world.

10 comments:

Anonymous said...

In the paper, Section III, paragraph 2, shouldn't the multiplications be over F_k(.) and not F_i?

Michael Mitzenmacher said...

Thanks anonymous; you're right, we have an index error. Nice to get feedback before the final deadline!

MM

Qin said...

Your posts are very nice! I am very interested in and appreciate this kind of research direction. I hope to learn more from you in the future.

If you have more research information of this kind (that is, on how to turn theory into practice), do share with us :)

qin
HKUST

Wim van Dam said...

Is it just me, or is there indeed no option to comment on your latest posting "What we're doing wrong?"

Michael Mitzenmacher said...

Sorry about that, Wim; fixed now, and I'm making sure the defaults are really set for comments! (Another few months and I'll have the basics down.)

Mihai said...

Content addressable memory is not explicitly used in theory, but the idea is everywhere --- namely, do whatever you can for most elements, and then fall back to a high performance dictionary, or something like that. We don't need any augmentation to the model here, since there are (theoretical) dictionaries that give you O(n) space, O(1) worst-case query time and O(1) update whp.

As one who's thought a lot about cuckoo hashing, I'm surprised you don't mention anything about the type of hash functions needed. Right now, we have no nontrivial guarantee (lg N is kind of trivial). From a security perspective, using heuristic hash functions is not nice. Randomizing hash tables (which allows rebuilding, say), instead of using heuristic hash functions, is a great theoretical contribution, and we must teach it to "practical people".

Speaking of hash functions, I recommend the STOC paper by Pagh/Pagh/Ruzic, which gives an excellent independence result for linear probing.

As a general direction for research, I think getting a practical dictionary with theoretical guarantees and worst-case lookup time is of huge importance. The paradigm of worst-case lookup has really not made it into practice (at least for dynamic dictionaries), and we can't blame them. Cuckoo hashing is the only choice that seems practical, but unfortunately we don't have theoretical guarantees (about the hash functions needed).

Michael Mitzenmacher said...

Mihai,

Lots of good points in your comment.

1) While I agree we don't necessarily need to change the theoretical model, I think adopting something like a CAM in the description would be of great help in bridging theory/practice. It's a common language in hardware that we're not using, causing a translation problem.

In essence, maybe this boils down to the age-old difference: theory provides O(x)-type results, while practitioners want to know and understand the constants, especially for these applications.

2) Regarding what hash functions to use -- I'm actually working on something related to that right now; hopefully I'll have something to report after the SODA deadline.

3) The Pagh/Pagh/Ruzic result is extremely nice, and I recommend the paper (as well as the many other nice results from the land of Pagh).

Mihai said...

Michael, regarding your (1), I believe it is okay if theory and practice use different approaches that lead to the same behavior. If, in theory, we say we have a high-performance dictionary, with a footnote saying that in some practical settings you might want to replace it with a CAM, that's fine.

You see this behavior in a lot of places. For instance, we know radix sort works great, and we've developed the theory of integer sorting to have a theory behind it. The fact that all the sketching ideas used there don't beat radix sort yet just means the data isn't big enough for the lower-order terms to really become smaller. But there's no disagreement between practice and theory: we know that mathematically there is an advantage to these schemes, and we even have practical things that can use the advantage.

Unknown said...

I want to use cuckoo hashing in a packet classification program. I used TCAM before, but it seems that TCAM is very expensive and power-hungry. I use the Snort rule set, but the problem is that cuckoo hashing returns only one rule that matches the incoming packet....

Unknown said...

can you suggest a hash function. I'm using res = res ^ (res << 5 + res >> 2 + Ci) - ramakrishna SAX hash function. but it seem that this hash function is just proper for string processing. can u suggest a better hash function for bit string (exact from incoming packet). I want to use cuckoo hashing to classify the source port and destination port fields and TCAM with source address, destination address, protocol fields.