I'm putting online a paper with Zhenming Liu and Kai-Min Chung (both graduate students at Harvard) that extends one of the results from Declaring Independence via the Sketching of Sketches by Indyk and McGregor (SODA 2008).
They consider a model where a stream of pairs (i,j) in [n]^2 arrive, giving a joint distribution (X,Y), and the goal is to determine how close the joint distribution is to the product of the marginal distributions under various metrics, which naturally corresponds to how close X and Y are to being independent. All the normal goals in the streaming setting hold (small space, small computation per new pair, one or small number of passes, etc.). We extended one of their main results to higher dimensions (a problem they had left open) : the stream is now k-dimensional vectors in [n]^k, and we wish to approximate the L_2 distance between the joint distribution and the product of the marginal distributions in a single pass. We give a randomized algorithm that is a (1 plus-minus epsilon) approximation (with probability 1-\delta) that requires space logarithmic in n and the stream size m and proportional to 3^k. The paper should be of interest to those who know and like the original Indyk/McGregor paper, which I assume is a signficant part of the streaming community.
The Indyk/McGregor proof required a clever application of the Arithmetic Mean-Geometric Mean inequality. To move the proof up into higher dimensions, we end up having to use the AM-GM inequality twice, along with a clever partitioning of the objects (edges in [n]^k) to get the right constant factor 3^k out. It also seems that this is the "right" dependence on k, at least for this algorithmic approach. (A previous result by Braverman/Ostrovsky also tackled this problem, but had a 2^O(k^2) dependence on k.)
We submitted this to SODA, where it didn't get in, but seemed to be on the borderline. The reviews were on the whole quite reasonable. My take is that in this paper, we (by which, of course, one should read "Kai-Min and Zhenming") took a nice previous SODA result, presented a non-trivial extension, and really wrote it up very clearly and nicely -- I'd like to think we really clearly explained the essence of the argument and why it works. The reviewers, I think, recognized this, but on the other hand had to deal with the issue that it was just a single result. The original paper had more results, and, at the risk (or hope) of instigating commentary, I'll suggest that SODA as a whole is more interested in quantity over writing quality ; a hodgepodge of lemmas and theorems stands a better chance of getting in than a single well-written idea. (The original Indyk/McGregor paper, I should be clear, both has a number of results and excellent ideas, and is not what I'm referring to in this statement.) I understood going in that this would probably be borderline for SODA; I had hoped it would make it in, but this is certainly not a case where I would complain about the decision or the review content by the PC. (One might argue that perhaps SODA should take more papers, and in particular seek out more papers of this type, but that's a different, higher-level argument.)
There are other conferences we could submit to, but the conferences with upcoming deadlines didn't seem particularly suitable. (In particular, they weren't in the US, and would require either extensive travel for me, or visa issues for the students.) Since the result seemed to us essentially done as it was, we're submitting to a journal. But it's available now for anyone who is interested.