Friday, December 16, 2011

This Week, I Am A Scientist

This week, I am a scientist; I know this because I have an article in Science.

It's nice to have something to point to that my parents can understand.  I don't mean they'll understand the paper, but they'll understand that getting a paper in Science is important, more so than my papers that appear other places.  And because they're probably reading this, I will also point them to the Globe article appearing today about the work, even though the article rightly focuses on the cool and unusual fact that the lead authors are two brothers, David and Yakir Reshef.

I had been planning to write one or more posts about the paper -- both the technical stuff and the great fun I've been having working with David, Yakir, my long-time colleague Hilary Finucane, the truly amazing systems biology professor Pardis Sabeti, and others on this project.  (See Pardis's Wikipedia page or this site to see how Pardis rocks!)  But between Science's "news embargo" policies and my own end-of-semester busy-ness, I blew it.  I'll try to have them out next week.  But for now, for those of you who might be interested, here's the link to the project web page, and here's the abstract:

Detecting Novel Associations in Large Data Sets

Abstract: Identifying interesting relationships between pairs of variables in large data sets is increasingly important. Here, we present a measure of dependence for two-variable relationships: the maximal information coefficient (MIC). MIC captures a wide range of associations both functional and not, and for functional relationships provides a score that roughly equals the coefficient of determination (R^2) of the data relative to the regression function. MIC belongs to a larger class of maximal information-based nonparametric exploration (MINE) statistics for identifying and classifying relationships. We apply MIC and MINE to data sets in global health, gene expression, major-league baseball, and the human gut microbiota and identify known and novel relationships.
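
For readers who want a feel for the idea before reading the paper, here is a rough sketch in Python. It is not the algorithm from the paper or the released MINE software. It assumes the general recipe of laying a grid over the scatterplot, computing the mutual information of the binned data, normalizing by the log of the smaller grid dimension, and taking the best score over grids whose cell count stays under a budget of roughly n^0.6. To keep it short, it only tries equal-frequency grids, whereas the real method searches over grid placements much more cleverly. The function names (`equal_frequency_bins`, `mutual_information`, `grid_score`) and the `exponent` parameter are made up for illustration.

```python
import math
from collections import Counter


def equal_frequency_bins(values, k):
    """Label each value with a bin in 0..k-1 so bins hold roughly equal counts.

    Binning is by rank, so tied values may be split across adjacent bins.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * k // n
    return bins


def mutual_information(bx, by):
    """Mutual information, in bits, between two sequences of bin labels."""
    n = len(bx)
    joint = Counter(zip(bx, by))
    px = Counter(bx)
    py = Counter(by)
    return sum(
        (c / n) * math.log2(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )


def grid_score(xs, ys, exponent=0.6):
    """Best normalized mutual information over equal-frequency grids.

    Only grids with kx * ky <= n ** exponent are tried (an assumed budget).
    This is a simplified stand-in, not the MIC algorithm itself.
    """
    budget = len(xs) ** exponent
    best = 0.0
    kx = 2
    while kx * 2 <= budget:
        bx = equal_frequency_bins(xs, kx)
        ky = 2
        while kx * ky <= budget:
            by = equal_frequency_bins(ys, ky)
            mi = mutual_information(bx, by)
            best = max(best, mi / math.log2(min(kx, ky)))
            ky += 1
        kx += 1
    return best
```

With 200 evenly spaced points, `grid_score(xs, xs)` and `grid_score(xs, [(x - 0.5) ** 2 for x in xs])` both come out at about 1, while independent random data scores much lower. That is the "functional and not" flavor the abstract describes: a noiseless parabola scores as well as a noiseless line, even though its linear correlation is near zero.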

8 comments:

Jeffe said...

getting a paper in Science is important, more so than my papers that appear other places

[citation needed]

Anonymous said...

Hi Mike,

Congratulations -- not just on the placement of this paper in Science, but on the method and analysis in the paper (& suppl mats, of course) itself.

I've been very excited about MIC/MINE since I saw David Reshef's talk at the Broad Institute, and am poring over the materials now. Can't wait to read your insights as well.

Anonymous said...

I took the "more so" clause to apply to "they'll understand that getting a paper in Science is important" not "getting a paper in Science is important".

Kevin O'Neill said...

Serendipity.

I've just in the past week posed the question: How do EOFs or PCA capture non-linear relationships in climate data?

I've read and appreciate the paper. I'm thrilled the software is available for download. I haven't had a chance to play with it yet, but I'm eagerly looking forward to it.

Congratulations and thanks to you and all the others involved.

Anonymous said...

Hi, is MINE available in an API form and not just as a command line tool, so that it can be plugged in to a user interface? Thanks!

Anonymous said...

Is there a freely available paper which covers this algorithm? I cannot access the Science one.

Yaroslav Bulatov said...

Is there a free version of this paper?

Michael Mitzenmacher said...

Anons 6/7: Please check the project web page; Science has given us a link there so people can freely download the article.

Jeffe/Anon 3: you sort of both got the twist. I meant for one to read it as the "more so" applying to "they'll understand that getting a paper in Science is important", but I did purposely leave it ambiguous, playing on the fact that Science is not understood as "more important" by default to a CS audience...