We recently posted a new draft on arXiv, "Theoretical Foundations of Equitability and the Maximal Information Coefficient". This is follow-on work to a paper that appeared in Science a couple of years ago, where we introduced the idea of equitability. Essentially, in that Science paper (link to page where you can access the paper), we wanted a statistic that, given samples from a noisy functional relationship, would return a score corresponding to the amount of noise (or, in that case, to the R^2 of the noisy data relative to the relevant noiseless function), regardless of the relationship type. The idea was that this would be useful in data exploration settings, where we might have a large number of candidate variable pairs, and in particular a number of non-trivially related ones, and we'd want to score them in a way that is fair across the possible types of relationships (linear, parabolic, sinusoidal, etc.), so that we could choose the most promising ones to look at. We also wanted the statistic to do reasonable things for non-functional relationships. And, finally, we wanted a pony. (But we couldn't find a way to put that in the paper.) The maximal information coefficient (MIC), which we built on top of mutual information, was our proposed statistic.
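To make the "R^2 relative to the noiseless function" target concrete, here is a minimal simulation sketch. It does not compute MIC; it only shows the quantity an equitable statistic would be expected to track, next to squared Pearson correlation, which is not equitable across relationship types. Everything in it is an illustrative assumption rather than something taken from the paper: the function names, the three example relationships, the uniform sampling of x on [0, 1], the Gaussian noise model, and the use of the usual coefficient of determination (1 - SS_res / SS_tot) as the definition of R^2.

```python
import math
import random

# Hypothetical example relationships on [0, 1]; chosen only for illustration.
RELATIONSHIPS = {
    "linear": lambda x: x,
    "parabolic": lambda x: 4.0 * (x - 0.5) ** 2,
    "sinusoidal": lambda x: math.sin(4.0 * math.pi * x),
}


def pearson_r2(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy * sxy / (sxx * syy)


def simulate(f, target_r2, n=5000, seed=0):
    """Sample y = f(x) + Gaussian noise, with the noise scaled so that the
    R^2 of y relative to the noiseless f(x) is roughly target_r2.

    Returns (r2_vs_noiseless, pearson_r2_of_x_and_y).
    """
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    clean = [f(x) for x in xs]

    # Pick the noise level from the spread of the noiseless values:
    # R^2 ~ var(f) / (var(f) + sd^2)  =>  sd^2 = var(f) * (1 - R^2) / R^2
    mean_clean = sum(clean) / n
    var_clean = sum((c - mean_clean) ** 2 for c in clean) / n
    noise_sd = math.sqrt(var_clean * (1.0 - target_r2) / target_r2)
    ys = [c + rng.gauss(0.0, noise_sd) for c in clean]

    # Coefficient of determination of the noisy data against f(x).
    mean_y = sum(ys) / n
    ss_res = sum((y - c) ** 2 for y, c in zip(ys, clean))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot, pearson_r2(xs, ys)


if __name__ == "__main__":
    for name, f in RELATIONSHIPS.items():
        r2, p2 = simulate(f, target_r2=0.8)
        print(f"{name:>10}: R^2 vs noiseless = {r2:.2f}, Pearson r^2 = {p2:.2f}")
```

All three relationships are generated at about the same noise level, so an equitable statistic should give them similar scores. Squared Pearson correlation does that only for the linear case, and scores the parabola near zero even though it is no noisier than the line.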
The paper has gotten some interest. One thing
that we heard was that people wanted a richer theoretical framework for
these ideas. So now we're finally delivering one. It took a while,
because the students involved -- Yakir Reshef and David Reshef -- were off doing crazy, wacky young-people things like going to
medical school, making it hard to get cycles for the project. On the
other hand, the time did some good, giving us room to explore and settle on
the formulation we wanted.
The result is, I hope, an interesting mix of ideas from statistics and
computer science. We're eager for feedback, as we hope to formally
submit it somewhere soon.
In a couple of weeks we should have another
paper out on the same topic that is more empirical. Naturally, when
working through the theory, we came up with better algorithms for
computing MIC, and it made sense to separate those results (and some
others) into another paper.