Tuesday, August 26, 2014

New Article on arXiv on Equitability and MIC

We recently put on arXiv a new draft on "Theoretical Foundations of Equitability and the Maximal Information Coefficient".  This is follow-on work to a paper that appeared in Science a couple of years ago, where we introduced the idea of equitability.  Essentially, in that Science paper (link to page where you can access the paper), we wanted a statistic that would give back, for samples from a noisy functional relationship, a score corresponding to the amount of noise (or, in that case, to the R^2 of the noisy data relative to the relevant noiseless function), regardless of the relationship type.  The idea was that this would be useful in data exploration settings, where we might have a large number of candidate variable pairs, many of them non-trivially related, and we'd want to score them in a way that is fair across the possible types of relationships (linear, parabolic, sinusoidal, etc.), so that we could choose the most promising ones to look at.  We also wanted the statistic to do reasonable things for non-functional relationships.  And, finally, we wanted a pony.  (But we couldn't find a way to put that in the paper.)  The maximal information coefficient (MIC), which we built on top of mutual information, was our proposed statistic.
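To make the equitability goal concrete, here is a rough sketch (not code from either paper, and it does not compute MIC). It builds three noisy functional relationships whose R^2 relative to the noiseless function is about the same, then shows that the ordinary squared Pearson correlation between x and y scores them very differently. The particular functions, the helper names, the target of 0.8, and the choice to measure R^2 as the squared correlation between noisy and noiseless values are all assumptions made for illustration.

```python
import math
import random


def squared_correlation(us, vs):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(us)
    mu = sum(us) / n
    mv = sum(vs) / n
    suv = sum((u - mu) * (v - mv) for u, v in zip(us, vs))
    suu = sum((u - mu) ** 2 for u in us)
    svv = sum((v - mv) ** 2 for v in vs)
    return (suv * suv) / (suu * svv)


def variance(vals):
    """Population variance of a sequence."""
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / len(vals)


# Hypothetical example relationships on x in [0, 1]; chosen for illustration.
RELATIONSHIPS = {
    "linear": lambda x: x,
    "parabolic": lambda x: 4.0 * (x - 0.5) ** 2,
    "sinusoidal": lambda x: math.sin(4.0 * math.pi * x),
}


def compare(n=2000, target_r2=0.8, seed=0):
    """Score each relationship at (roughly) the same noise level.

    Gaussian noise is scaled so that var(f) / (var(f) + var(noise)) is
    about target_r2 for every relationship type.
    """
    rng = random.Random(seed)
    results = {}
    for name, f in RELATIONSHIPS.items():
        xs = [rng.random() for _ in range(n)]
        clean = [f(x) for x in xs]
        noise_sd = math.sqrt(variance(clean) * (1.0 - target_r2) / target_r2)
        noisy = [c + rng.gauss(0.0, noise_sd) for c in clean]
        results[name] = {
            # What an equitable statistic should track: noisy y vs. f(x).
            "r2_vs_noiseless": squared_correlation(clean, noisy),
            # What plain linear correlation reports: x vs. noisy y.
            "pearson_r2": squared_correlation(xs, noisy),
        }
    return results


if __name__ == "__main__":
    for name, scores in compare().items():
        print(name, scores)
```

All three relationships are equally noisy by construction, yet the squared Pearson correlation is high only for the linear one and close to zero for the parabola. An equitable statistic would instead give the three roughly the same score.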

The paper has gotten some interest.  One thing that we heard was that people wanted a richer theoretical framework for these ideas.  So now we're finally delivering one.  It took a while, because the students involved -- Yakir Reshef and David Reshef -- were off doing crazy, wacky young-people things like going to medical school, making it hard to get cycles for the project.  On the other hand, the time did some good, giving us room to explore before settling on the formulation we wanted.  The result is, I hope, an interesting mix of ideas from statistics and computer science.  We're eager for feedback, as we hope to formally submit somewhere soon.

In a couple of weeks we should have another paper out on the same topic that is more empirical.  Naturally, when working through the theory, we came up with better algorithms for computing MIC, and it made sense to separate those results (and some others) into another paper.
