tag:blogger.com,1999:blog-8890204.post2052958000611653164..comments2024-10-03T18:59:49.239-04:00Comments on My Biased Coin: Why Simple Hash Functions Work WellMichael Mitzenmacherhttp://www.blogger.com/profile/06738274256402616703noreply@blogger.comBlogger6125tag:blogger.com,1999:blog-8890204.post-1811360096044866412011-10-16T12:23:09.302-04:002011-10-16T12:23:09.302-04:00Hi Prof. Mitzenmacher,
Where could I find a versi...Hi Prof. Mitzenmacher,<br /><br />Where could I find a version with the appendix ... there are some theorems in the paper that I'm having trouble figuring out myself, and I can't find the appendix in any of the versions online ...<br /><br />Thanks!Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-8890204.post-53988290324062057532007-07-24T12:12:00.000-04:002007-07-24T12:12:00.000-04:00Anonymous 2: I'm sure 5-wise independence helps a...Anonymous 2: I'm sure 5-wise independence helps a bit, but I don't have a clean way to add it in to the analysis; those moment inequalities always work out much easier with even moments.Michael Mitzenmacherhttps://www.blogger.com/profile/02161161032642563814noreply@blogger.comtag:blogger.com,1999:blog-8890204.post-80886652830542280892007-07-23T20:28:00.000-04:002007-07-23T20:28:00.000-04:00I don't know about you, but I've found that even u...I don't know about you, but I've found that even universal hash functions (weak ones) perform measurably worse on "real" data (in a constant factor sense) than "good" ones unless your hash table size is prime or close to it.<BR/><BR/>(Using a prime hash table size covers up many sins, but a) it requires computing primes, b) it requires a division operation rather than a bit mask operation to determine the bucket number, and c) it fails for any situation where you don't get to choose the table size, such as in external hashing.)<BR/><BR/>I think the important point here is that constant factors sometimes really do matter. Wasting a small amount of CPU time per operation is one thing if that's not the bottleneck, but adding an extra disk seek per operation is a big deal.Pseudonymhttps://www.blogger.com/profile/04272326070593532463noreply@blogger.comtag:blogger.com,1999:blog-8890204.post-30661969184208670332007-07-23T17:10:00.000-04:002007-07-23T17:10:00.000-04:00From the paper:In the future, we hope that there w...From the paper:<BR/><BR/><I>In the future, we hope that there will be a collaboration between theory and systems researchers aimed at fully understanding the behavior of hashing in practice.</I><BR/><BR/>This truly sounds like you :)<BR/><BR/>By the way, Thorup's data structure can evaluate 5-wise independent hash functions as quickly at 4-wise (a point not made in the paper). Does that help?Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-8890204.post-87459752748186626722007-07-23T16:35:00.000-04:002007-07-23T16:35:00.000-04:00Anonymous #1:Yes, the analysis will work for cucko...Anonymous #1:<BR/><BR/>Yes, the analysis will work for cuckoo hashing as well. (This is mentioned in the paper.)<BR/><BR/>We mention 2-wise and 4-wise because they are well-known in theory and have efficient implementations. (Mikkel Thorup, in particular, has written some very nice papers describing how to do 2-wise and 4-wise independence efficiently.) The analysis in the paper, however, is general, and just depends on the "collision probability" of the hash function family.Michael Mitzenmacherhttps://www.blogger.com/profile/02161161032642563814noreply@blogger.comtag:blogger.com,1999:blog-8890204.post-76685707833125507792007-07-23T15:37:00.000-04:002007-07-23T15:37:00.000-04:00Hello.I have a few questions on your paper : 1-You...Hello.<BR/>I have a few questions on your paper : <BR/>1-You didn't mention cuckoo hashing. Is the analysis for multiple choice hash functions also include cuckoo hashing.<BR/>2-In your paper you mention 2-wise and 4-wise hashing. Is the k-wise independence the only measure of strength of hash functions : Perhaps the space that is used to describe a hash function could also be a good measure of strength of hash functions : for example zobrist hash is 3-wise independent but uses a lot of space 256*l for a keys of l bytes.Anonymousnoreply@blogger.com