Sunday, April 15, 2018

New Papers/Code for MIC and MINE

Several years ago, I worked on a project where the goal was to come up with an "equitable" measure of dependence. The idea was that you could take a large multi-dimensional data set, score the dependence for each pair of variables, rank the pairs by their score, and then look at the top-scoring pairs to determine the most interesting relationships to follow up on in further work. We were motivated by the need for data exploration tools for multi-dimensional data sets.
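As a rough illustration of that score-every-pair-and-rank workflow, here is a minimal Python sketch. It is not our MIC code: the scoring function is just the absolute Pearson correlation, used as a stand-in so the example runs with the standard library alone, and the names `abs_pearson`, `rank_pairs`, and the toy `data` are made up for this example.

```python
from itertools import combinations
from math import sqrt


def abs_pearson(xs, ys):
    """Stand-in dependence score: |Pearson correlation| (this is NOT MIC)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear dependence signal
    return abs(cov) / sqrt(var_x * var_y)


def rank_pairs(columns, score=abs_pearson):
    """Score every pair of named columns and return (score, a, b) best-first."""
    scored = [
        (score(columns[a], columns[b]), a, b)
        for a, b in combinations(sorted(columns), 2)
    ]
    return sorted(scored, key=lambda item: item[0], reverse=True)


# Hypothetical toy data set: three variables, five samples each.
data = {
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "linear": [2.0, 4.0, 6.0, 8.0, 10.0],
    "noise": [3.0, 1.0, 4.0, 1.0, 5.0],
}

ranking = rank_pairs(data)
for s, a, b in ranking[:2]:
    print(f"{a} vs {b}: {s:.3f}")
```

Any pairwise score can be passed in through the `score` argument. Correlation only picks up linear relationships, which is exactly the kind of limitation that motivates looking for a more equitable measure.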

After many years, we've updated the site http://www.exploredata.net/ with some (finally) recently published papers and new versions of the code that are faster, more accurate, and can do additional tasks (what we call TIC as well as MIC). Our technical information subpage has links to papers, including the relatively recent papers in JMLR and the Annals of Applied Statistics. Our MINE-Application page contains links to our new version of the code, as well as links to other versions (such as minepy, a library that has APIs in Python and MATLAB).

The incentive for all this was, in part, one of the co-authors, Yakir Reshef, finishing up his PhD thesis.  Congratulations Yakir!


Wednesday, April 11, 2018

Sublinear Algorithms Workshop

I was asked to announce the workshop/bootcamp on Sublinear Algorithms, June 10-13 at MIT. I plan to be there and possibly talk about some new work.

From the web page (which is also where you should register, if you plan to attend!):

Synopsis

As big data is getting bigger, there is a need for analyzing data with sublinear constraints -- that is, for algorithms which require only sublinear time, space, measurements and/or samples. The goal of this workshop is to bring together experts in various areas of Computer Science, Electrical Engineering, Statistics and Mathematics to discuss recent work and exciting new challenges. It is hoped that the multidisciplinary nature of the workshop will highlight common goals and themes, as well as to facilitate an interchange of technical ideas that may be of use more widely than previously thought. The workshop will be preceded by a one day bootcamp on June 10, with the goal of presenting the basic techniques, definitions and goals in several of the communities.