A paper that's been sitting around in the pipeline far too long is finally out in its journal version.
Using the Power of Two Choices to Improve Bloom Filters, by Lumetta/Mitzenmacher.
This paper pretty much came about from a dare. I was visiting my friend Steve Lumetta while at Allerton, and as he was ribbing me about all my papers being on either Two Choices or Bloom filters, he suggested I should be able to combine the two in one paper. It took a few minutes for me to work out that he wasn't entirely teasing me, he actually had an idea.
The basic idea is the following: instead of having one group of k hash functions we use to insert an item into a Bloom filter, we have two groups of hash functions. We insert items sequentially into the Bloom filter, but we use our power of two choices as follows: when we insert an item, we use the group of k hash functions that sets the smaller number of bits to 1 (since fewer 1s means fewer false positives). When checking if an item in the Bloom filter, we have to see if all corresponding bits are set to 1 for either group of k hash functions, since we don't know which group we used initially for each item.
It turns out you can analyze this scheme, and somewhat surprisingly to me, there are settings where it leads to improved false positive rates. That is, the gain you get from putting in fewer ones actually more than makes up for the fact that now you have to check two groups of k hash locations. We also explore even better schemes (based on better choosing which group to use for each item) that we can't analyze.
I admit I took this as a funny mathematical curiosity more than a practical scheme. And I couldn't resist the title. (If you know my work, it really does sound like a joke.) But someone saw the paper and ran with the idea: the follow-on SIGMETRICS paper Building high accuracy bloom filters using partitioned hashing gives a potentially practical variation. (It's not clear to me when you'd use this variation over a perfect-hash-function-based Bloom filter; I suppose their variation is more amenable to adding a small number of additional items on the fly.)
My hope is perhaps it will inspire other non-obvious valuable uses of the power of two choices.