In two previous posts, I have discussed cuckoo hashing, a natural generalization of multiple-choice hashing that gives better space utilization by moving elements around as needed. There were three issues to address in using such schemes in hardware: moves are expensive, the rare failure cases must be considered more carefully, and the analysis should be as tight as possible. I suggested that for hardware implementation, looking at much simpler schemes that allow just one move would be a step forward.
The main reason for this approach is that moves are expensive, so it will take some convincing to get practitioners to accept their value. By allowing just one move, we keep the cost low, and the law of diminishing returns suggests that the first move should yield the largest gain. If we can first talk practitioners into a single move, and they find it helpful, then perhaps we can see whether additional moves are worthwhile. In any case, by limiting moves severely, we sidestep the problem of their expense.
If we're limiting ourselves to a single move, it stands to reason that sooner or later there's going to be trouble, in the form of overflows. We'll try to place an item in the hash table, but all of its buckets will be filled, and there won't be any way to move any of those items out of the way, because all of the other buckets for those items will be filled too. Since one move gives us less flexibility, we need a way to handle overflows.
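To make the mechanics concrete, here is a minimal sketch of insertion with at most one move, in Python. Everything here (the two-choice setup, one item per bucket, the names NUM_BUCKETS, D, and choices) is illustrative rather than taken from the paper.

```python
import random

NUM_BUCKETS = 1024
D = 2  # number of hash choices per item
table = [None] * NUM_BUCKETS

def choices(item):
    # Stand-in for D independent hash functions: a deterministic
    # pseudo-random set of D distinct buckets for each item.
    rng = random.Random(hash(item))
    return rng.sample(range(NUM_BUCKETS), D)

def insert_one_move(item):
    # First, try the item's own buckets.
    buckets = choices(item)
    for b in buckets:
        if table[b] is None:
            table[b] = item
            return True
    # All are full: try to relocate one occupant to one of *its*
    # alternative buckets. This is the single allowed move.
    for b in buckets:
        occupant = table[b]
        for alt in choices(occupant):
            if table[alt] is None:
                table[alt] = occupant
                table[b] = item
                return True
    return False  # overflow: blocked even after one move
```

When insert_one_move returns False, we have exactly the overflow situation described above.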
In hardware, overflow can be handled by a small content-addressable memory, or CAM. A CAM is a fully associative memory; you can find anything in it essentially instantaneously. Of course, CAMs are expensive, so they have to be small. If overflows are sufficiently rare, a small CAM can hold all of them. In fact, one point of our paper is that CAMs are not generally considered in theoretical analyses, but they should be; they're a natural addition to the model. In this way, we circumvent the problem of rare but annoying failures due to overflows.
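Continuing the sketch above, the CAM can be modeled as a small, fully associative store that catches whatever the table rejects; CAM_SIZE is an illustrative parameter, not a value from the paper.

```python
CAM_SIZE = 64
cam = set()  # stand-in for a small fully associative memory

def insert(item):
    if insert_one_move(item):
        return True
    if len(cam) < CAM_SIZE:
        cam.add(item)  # rare overflow absorbed by the CAM
        return True
    return False  # the CAM itself is full; should be vanishingly rare

def contains(item):
    # A real CAM is searched in parallel with the main table,
    # so lookups stay effectively constant time.
    return item in cam or any(table[b] == item for b in choices(item))
```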
Another advantage of studying single-move schemes is that many variations are nicely amenable to fluid limit analyses. That is, we can set up a collection of differential equations that accurately describe what fraction of the buckets in the table are loaded at each point in time, allowing us to give very accurate numbers (generally within a few percent) for their final performance. Arguably, one could do this with simulation as well, but the fluid limit approach is orders of magnitude faster than simulation. It gives a compellingly accurate performance description that may persuade practitioners more than big-O arguments. Moreover, it allows us to do interesting optimizations, such as figuring out how to size sub-tables within our table using black-box optimization techniques.
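To give a flavor of the technique, here is a toy model, not the equations from our paper: items each hash to d = 2 buckets of capacity one, with no moves and no CAM. If y(t) is the fraction of empty buckets after t*n insertions into n buckets, an item is blocked only when both its choices are full, giving dy/dt = -(1 - (1 - y)^d). A few lines of Euler integration then give the empty fraction and the fraction of items lost:

```python
def fluid_limit(d=2, t_end=0.5, dt=1e-4):
    y = 1.0      # fraction of empty buckets; all empty at t = 0
    lost = 0.0   # blocked insertions accumulated, per bucket
    t = 0.0
    while t < t_end:
        p_blocked = (1.0 - y) ** d   # all d choices already occupied
        y -= (1.0 - p_blocked) * dt  # each successful placement fills a bucket
        lost += p_blocked * dt
        t += dt
    return y, lost / t_end           # empty fraction, fraction of items lost

empty, lost = fluid_limit()
print(f"empty fraction: {empty:.4f}, fraction of items lost: {lost:.4f}")
```

Integrating a handful of such equations takes a fraction of a second, versus simulating millions of buckets; that speed is what makes black-box optimization over table parameters feasible.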
If all this seems a little vague, that's because this is a blog. For details, you can now see the paper Adam and I are submitting. The end result is that allowing a single move can easily save 1/3 of the space or more; things like optimizing the memory layout can give savings of roughly 10% in space. Our approach seems ready for implementation, and we're eager to get feedback from the real world.