Comments on My Biased Coin: Cuckoo Hashing, Theory and Practice : Part 1

With IP Address on routers the need is to find the...

2013-07-03T19:30:25.085-04:00

With IP Address on routers the need is to find the longest prefix match, not an exact one. Have you considered how to use hashing to solve this?
hashing a full (say IPv4) 32 bit IP address may give us a miss but that doesn't mean we can not route the pkt. We need to look for a shorter than 32 bit prefix with the first k bits of the address.
I can imagine this could be done by hashing, however it would need a hash for each possible prefix length... and then selecting the longest one.

Sound great, but I suspect this method goes hard o...

2010-06-04T09:49:58.011-04:00

Sound great, but I suspect this method goes hard on the cash lines.
Regards, Frank Hirsch

It sounds like if this is done in hardware (I'm as...

2009-03-11T23:48:00.000-04:00

It sounds like if this is done in hardware (I'm assuming a use case is for routing tables), it might be possible to build a sequence of inserts that would give some worse case data sets that would cause a lot of moves. I suppose randomizing the hash function could solve this, though.

Also, what happens when the hash table is full? It seems like when utilization exceeds a certain percentage, the time per insert would degenerate badly.

Aravind,It seems like a great conjecture. The Fot...

2007-06-15T14:35:00.000-04:00

Aravind,

It seems like a great conjecture. The Fotakis/Pagh/Sanders/Spirakis paper seems to have the best known results, and (if I'm reading correctly) they have that they can find an augmenting shortest path by doing a Breadth First Search that hits (1/epsilon)^O(log d) vertices for d choices. Their simulation results, though, seem to suggest that with enough choices, you could get close to full utilization with something like O(1/epsilon) probes.

Mike, yes, I understand that dependency issues are...

2007-06-15T13:36:00.000-04:00

Mike, yes, I understand that dependency issues are likely to crop up. I was suggesting the heuristic idea that since epsilon * n slots are always empty and since independent probes will find an empty slot in 1/epsilon expected time, whether we can believe (heuristically) that a scheme such as Cuckoo will do well even with (1 - epsilon) * n occupancy. And yes, I understand the slippery slope of such "heuristics". My question was, do we think that this (1 - epsilon)*n conjecture is true but we don't yet know how to prove it, or is it false to start with?

thanks, aravind

Aravind,I'm not sure exactly what your model is fo...

2007-06-15T10:16:00.000-04:00

Aravind,

I'm not sure exactly what your model is for "finding an empty slot" and "random process" in your comment; could you elaborate?

I haven't got a good handle on how well intuition maps to reality for cuckoo schemes, as there's a lot of unpleasant dependencies that generally need to be worked around. For example, once you "find" a long sequence of moves to insert an item, it tells you something significant -- if you ever hash into one of the locations on the chain, you'll likely have to do a lot of moves. I think that sort of intuition may be useful for thinking of the possible limitations of these schemes, but I'm not sure it's the right direction for analysis.

hi mike, i am sure you will cover the following "i...

2007-06-15T09:51:00.000-04:00

hi mike, i am sure you will cover the following "intuition" at some point, but couldn't resist this comment .. is it reasonable to conjecture that if there are at most (1 - epsilon)*n elements to hash, then you expect to spend only O(1/epsilon) time to hash an element -- basically because only (1 - epsilon)*n locations are occupied, and the random process will find an empty slot in expected
O(1/epsilon) time? is such a result known to be true?

aravind

Josiah --The relocation cycle is a problem I gloss...

2007-06-14T14:39:00.000-04:00

Josiah --

The relocation cycle is a problem I glossed over (it will show up in part 2), but you're right, it is a problem!

The within k steps approach is also useful for many types of hash tables, though maybe not for hardware. (Again, part 2 is coming...)

In practical work, I usually choose open-address d...

2007-06-14T14:32:00.000-04:00

In practical work, I usually choose open-address double hashing, if only because I already have an implementation ;).

In any case, cuckoo hashing is very interesting. My personal concern is that unless one chooses hashes carefully, one could get into a situation where there is a relocation cycle, though this is perhaps only of concern with two tables, and perhaps very unlikely in practice.

A simple variant of this, which considers entries within k steps of a single computed hash per key without relocation, seems to perform reasonably well in tests (I get a tested maximum load of 3 with k=2, and 2 with k=6, both at 90% full).

Hi Professor Mitzenmacher,This blog really helps a...

2007-06-14T14:03:00.000-04:00

Hi Professor Mitzenmacher,

This blog really helps a lot to find the right things to read!

Zhenming