We recently posted a new draft on arXiv, "Theoretical Foundations of Equitability and the Maximal Information Coefficient". This is
follow-on work to a paper that appeared in Science a couple of
years ago, where we introduced the idea of equitability. Essentially,
in that Science paper, we wanted a statistic that would give back, for
samples from a noisy functional relationship, a score corresponding to
the amount of noise (or, in that case, to the R^2 of the noisy data
relative to the relevant noiseless function), regardless of the
relationship type. The idea was that this would be useful in data
exploration settings, where we might have a large number of possible
variable pairs, and in particular a number of pairs with non-trivial
relationships, and we'd want to score them in some fair way
across the possible types of relationships (linear, parabolic,
sinusoidal, etc.), so that we could choose the most promising to look
at. We also wanted the statistic to do reasonable things for
non-functional relationships. And, finally, we wanted a pony. (But we
couldn't find a way to put that in the paper.) The maximal information
coefficient (MIC), which we built on top of mutual information, was our
proposed statistic.
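To make the target a bit more concrete, here is a minimal Python sketch of the quantity an equitable statistic is supposed to track. It is not code from the paper and it does not compute MIC. It only generates noisy samples from a few relationship types and measures the R^2 of the noisy data relative to the noiseless function, taken here (as an assumption) to be the squared Pearson correlation between the noisy values and the noiseless ones. The function names, the particular functions, and the noise levels are all made up for illustration.

```python
import math
import random


def r_squared(ys, fs):
    """Squared Pearson correlation between noisy values ys and noiseless values fs."""
    n = len(ys)
    mean_y = sum(ys) / n
    mean_f = sum(fs) / n
    cov = sum((y - mean_y) * (f - mean_f) for y, f in zip(ys, fs))
    var_y = sum((y - mean_y) ** 2 for y in ys)
    var_f = sum((f - mean_f) ** 2 for f in fs)
    return cov * cov / (var_y * var_f)


def noisy_sample(f, n, noise_sd, rng):
    """Draw x uniformly from [0, 1]; return x, f(x), and f(x) plus Gaussian noise."""
    xs = [rng.random() for _ in range(n)]
    fs = [f(x) for x in xs]
    ys = [v + rng.gauss(0.0, noise_sd) for v in fs]
    return xs, fs, ys


# Hypothetical relationship types, chosen only for illustration.
RELATIONSHIPS = {
    "linear": lambda x: x,
    "parabolic": lambda x: 4.0 * (x - 0.5) ** 2,
    "sinusoidal": lambda x: math.sin(4.0 * math.pi * x),
}


def r2_by_relationship(noise_sd, n=2000, seed=0):
    """R^2 of noisy data relative to the noiseless function, per relationship type."""
    rng = random.Random(seed)
    scores = {}
    for name, f in RELATIONSHIPS.items():
        _, fs, ys = noisy_sample(f, n, noise_sd, rng)
        scores[name] = r_squared(ys, fs)
    return scores


if __name__ == "__main__":
    for noise_sd in (0.1, 0.5, 1.0):
        print(noise_sd, r2_by_relationship(noise_sd))
```

Roughly speaking, an equitable statistic computed from just the (x, y) samples would give similar scores to relationships with similar R^2 values here, whether the underlying function is the line, the parabola, or the sinusoid.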
The paper has gotten some interest. One thing
that we heard was that people wanted a richer theoretical framework for
these ideas. So now we're finally delivering one. It took a while,
because the students involved -- Yakir Reshef and David Reshef -- were off doing crazy, wacky young-people things like going to
medical school, making it hard to get cycles for the project. On the
other hand, the time did some good, giving us room to explore and settle on
the formulation we wanted.
The result is, I hope, an interesting mix of ideas from statistics and
computer science. We're eager for feedback as we hope to formally
submit somewhere soon.
In a couple of weeks we should have another
paper out on the same topic that is more empirical. Naturally, when
working through the theory, we came up with better algorithms for
computing MIC, and it made sense to separate those results (and some
others) into another paper.
Tuesday, August 26, 2014