Tuesday, December 27, 2011

Some of The Fun of Collaborating...The Other People

One of the things I enjoy most about working in algorithms is the breadth of work I get to do.  And while I've worked on a variety of things, almost all have been squarely in Computer Science.  (If one counts information theory as EE then perhaps not, but CS/EE at the level I work on them are nearly equivalent from my point of view.)  But at Harvard there are plenty of opportunities to work with people outside of computer science, and my work on MIC and MINE was my biggest cross-cultural collaboration at Harvard.  In particular, the other co-primary-advisor on the work was Pardis Sabeti, who works in systems biology at Harvard -- here's the link to her lab page.

Pardis is no stranger to mathematical thinking -- some of the work she's best known for is designing statistical tests to detect mutations at the very short time scale of humans over the last 10,000 years, with the goal being to identify parts of the genome that have undergone natural selection in response to diseases.  So while my take was a bit more CS/math focused (what can we prove? how fast/complex is our algorithm?) and hers was a bit more biology/statistics focused (what is this telling us about the data?  what data sets can we apply this to?), the lines were pretty blurry.  But it did mean I got to pick up a little biology and public health on the way.  Who knew that natural selection could be so readily found in humans?  Or that the graph of weight vs. income by country has some really unusual characteristics (weight goes up with national income up to some point, but then goes down again;  the outliers are the US and Pacific island nations)?  And that you can find genes that are strongly tied to susceptibility to certain viral diseases?

I must admit, it is a bit intimidating to talk with a colleague about our other projects when I'm explaining cuckoo hashing and Groupon while she's discussing flying off for the Nth time this year to Africa for her project on how genes are changing in response to Ebola and Lassa viruses.  But it's just encouraged me to keep looking for these sorts of broadening opportunities.  Maybe someday I'll find a project I too can work on that will help us understand viruses and diseases.  (Hint, Pardis, hint!)

The primary student collaborators were David and Yakir Reshef.  David was a student at MIT when he took my graduate class Algorithms at the End of the Wire, which covers information theory as one of the units.  He had already been working with Pardis on data mining problems and ended up working on an early version of MINE as his project for the class.  I told him he should continue the project and I'd be happy to help out.  It's always nice when class projects turn into papers -- something I try to encourage.  This one turned out better than most. 

David continued working with his brother Yakir (along with me and Pardis).  David and Yakir are both cross-cultural people;  Yakir is currently a Fulbright scholar in the Department of Applied Math and Computer Science at Weizmann, after getting his BA in math at Harvard -- but is now applying to MD/PhD programs.  David is getting his MD/PhD here at the Harvard-MIT Health Sciences and Technology Program, after having spent the last few years at Oxford on a Marshall doing graduate work in statistics.  So between them they definitely provided plenty of glue between me and Pardis.  Both of them pushed me to learn a bunch of statistics along the way.  I'm not sure I was the best student, but I read a bunch of statistics papers for this work, to know what work was out there on these sorts of problems.

Others also came into the project, the most notable for me being Hilary Finucane -- who I wrote multiple papers with when she was a Harvard undergrad, and who was simultaneously busy obtaining her MSc in theoretical computer science at Weizmann.  And who is now engaged to Yakir (they were a couple back at Harvard).  With two brothers and a fiancee in the mix, the paper was a friendly, family-oriented affair.  I also got to meet and work with Eric Lander, who was one of the leaders in sequencing the human genome, and quickly found out how amazing he is.

Like many of my collaborations these days, the work would have been impossible without Skype -- at various points on the project, we had to set up meetings between Boston, Oxford, and Israel, which generally meant at least one group was awake at an unusual hour.  But that sort of group commitment helped keep the energy up for the project over the long haul.

The work on MIC and MINE was a multi-year project, and was definitely more effort than most papers I am involved with.  On the other hand, because it was a great team to work with, I enjoyed the process.   I got to work with a bunch of people with different backgrounds and experiences, all of whom are immensely talented;  they pushed me to learn new things so I could keep up.  I'm glad the work ended up being published in a prestigious journal -- especially because the students deserve the payoff.  But even if it hadn't, it would have been a successful collaboration for me.
