Comments on My Biased Coin: "MIC and MINE, a short description" (Michael Mitzenmacher)

Anonymous (2012-01-11 16:03):
Multiple testing is really not that simple, and it is conceptually tricky. It is not a solved problem in statistics.

Anonymous (2012-01-03 05:19):
Thanks for a good post and a chance to discuss your article. To me, MIC and MINE seem like great exploration tools, but I'm a bit puzzled about the magic parameter B, which limits the complexity of the overlaid grid.

As is evident from Figure S1 in the supplement, MIC tends to 1 as B tends to the size of the data. Isn't this due to overfitting the histogram to the data? Why not use something like MDL to control the complexity of the irregular 2D histogram, e.g., by adapting "MDL Histogram Density Estimation" by Kontkanen and Myllymaki (http://cosco.hiit.fi/Articles/aistat07.pdf; see Kontkanen's thesis for an analytical approximation)?

In my view, because of the overfitting, MINE is a much more important contribution than MIC. That being said, these methods are likely to become the tools of choice where Pearson correlation fails.

Michael Mitzenmacher (2011-12-27 08:40):
Anon: We discuss this in the paper. It's a standard issue for these sorts of problems. There's a whole subfield of statistics devoted to this issue; see http://en.wikipedia.org/wiki/Multiple_testing for details.

Anonymous (2011-12-27 01:01):
Thanks for this description. The main application of MIC as you describe it (and what seems to be the MINE approach described in the paper) is to take a high-dimensional dataset, compute the correlation for all pairs of variables, and then select the top hits.

Does this run into the multiple comparisons problem? That is, as the number of pairwise comparisons N increases, I'd expect the frequency of high MIC scores (close to high correlation or high anticorrelation) to also increase. Curious if you had any thoughts on this. --YK
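The overfitting concern about the parameter B can be illustrated with a toy experiment. The sketch below is not the actual MINE implementation (MIC maximizes over irregular grids, with B tied to the sample size); it just computes a plug-in mutual information estimate on a regular b-by-b histogram of independent data, normalized by log2(b) as MIC normalizes a grid score. Even though the true dependence is zero, the normalized score climbs as the grid is refined:

```python
import numpy as np

def plugin_mi(x, y, bins):
    """Plug-in mutual information (bits) from a bins x bins 2D histogram."""
    counts, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = counts / counts.sum()              # empirical joint distribution
    px = pxy.sum(axis=1, keepdims=True)      # marginal of x (column vector)
    py = pxy.sum(axis=0, keepdims=True)      # marginal of y (row vector)
    nz = pxy > 0                             # avoid log(0) on empty cells
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(0)
n = 500
x, y = rng.random(n), rng.random(n)          # independent: true MI is 0

for b in (2, 5, 10, 20, 50):
    # Normalize by log2(b), mirroring MIC's normalization for a b x b grid.
    print(b, plugin_mi(x, y, b) / np.log2(b))
```

The rising scores are pure estimation bias: with enough cells, the histogram fits the noise, which is the behavior the comment attributes to letting B grow with the data size.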
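YK's intuition about pairwise screening can be checked with a minimal null simulation, using plain Pearson correlation as a stand-in for MIC (the variable counts and sample size here are arbitrary choices, not from the paper). All variables are independent noise, so every high score is spurious, yet the top score grows with the number of pairs examined:

```python
import numpy as np

def max_null_score(n_samples, n_vars, seed=1):
    """Largest |Pearson r| among all pairs of independent noise variables."""
    rng = np.random.default_rng(seed)
    data = rng.standard_normal((n_samples, n_vars))
    corr = np.corrcoef(data, rowvar=False)
    iu = np.triu_indices(n_vars, k=1)        # upper triangle = distinct pairs
    return float(np.abs(corr[iu]).max())

# More variables -> quadratically more pairs -> a larger best spurious score.
for p in (10, 50, 200):
    n_pairs = p * (p - 1) // 2
    print(p, n_pairs, round(max_null_score(100, p), 3))
```

This is exactly the multiple testing issue the reply points to: with many comparisons, significance thresholds have to be adjusted (the subfield linked above covers standard corrections such as false discovery rate control).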