Thursday, July 12, 2012


I'm hanging out at MMDS, the Workshop on Algorithms for Modern Massive Datasets, at Stanford. The crowd is surprisingly large, with more people from "adjacent" areas (math/statistics/machine learning) and from industry than I normally see at the workshops I attend. It's very exciting to see such wide-scale interest.

Right now, though, the non-theorists are having to listen to a very theoretical session:
11:00 - 11:30 Ping Li
Probabilistic Hashing for Efficient Search and Learning on Massive Data
11:30 - 12:00 Ashish Goel
Real Time Social Search and Related Problems
12:00 - 12:30 Andrew Goldberg
Hub Labels in Databases: Shortest Paths for the Masses

Fun stuff!

I just enjoyed an interesting aspect of Ashish's talk. He put the social search problem in a framework where you first preprocess the social graph (using distance oracles for shortest paths) and then handle incremental updates (for example, when someone posts a new Tweet, you update the keywords associated with that user). I like it because I talk about the preprocessing + query answering approach (with examples like suffix trees and least common ancestor data structures) in my undergrad class. This preprocessing + incremental update + query answering example, in the context of social search, would make a nice addition that students can hopefully appreciate, if I could simplify it in a reasonable way.
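
To make the three phases concrete, here is a rough classroom-sized sketch in Python. It is not the construction from the talk: the class name `SocialSearch`, its method names, and the choice of landmark-based BFS estimates as a stand-in for a real distance oracle are all assumptions made for illustration. The estimate is only an upper bound on the true shortest-path distance.

```python
from collections import defaultdict, deque


class SocialSearch:
    """Toy preprocessing + incremental update + query answering example.

    Hypothetical sketch: landmark BFS distances stand in for a distance oracle.
    """

    def __init__(self, graph, landmarks):
        # Preprocessing: one BFS per landmark over the (undirected) social graph,
        # given as a dict mapping each user to a list of neighbours.
        self.dist = {lm: self._bfs(graph, lm) for lm in landmarks}
        # Keyword index, filled in incrementally as tweets arrive.
        self.keyword_users = defaultdict(set)

    @staticmethod
    def _bfs(graph, source):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph.get(u, ()):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return dist

    def estimate_distance(self, u, v):
        # Upper bound on d(u, v): route through the best landmark.
        if u == v:
            return 0
        best = float("inf")
        for d in self.dist.values():
            if u in d and v in d:
                best = min(best, d[u] + d[v])
        return best

    def add_tweet(self, user, text):
        # Incremental update: associate each word of the tweet with the user.
        for word in text.lower().split():
            self.keyword_users[word].add(user)

    def search(self, searcher, keyword):
        # Query: users who mentioned the keyword, closest (estimated) first;
        # ties are broken by user name.
        users = self.keyword_users.get(keyword.lower(), set())
        return sorted(users, key=lambda u: (self.estimate_distance(searcher, u), u))


# Small example: hub "h" is friends with a, b, c; c is also friends with d.
graph = {
    "h": ["a", "b", "c"],
    "a": ["h"],
    "b": ["h"],
    "c": ["h", "d"],
    "d": ["c"],
}
engine = SocialSearch(graph, landmarks=["h"])
engine.add_tweet("d", "Hub labels rock")
engine.add_tweet("b", "reading about hub labels")
print(engine.search("a", "hub"))  # ['b', 'd']
```

The graph is preprocessed once, each tweet touches only the keyword index, and a query combines the two. A real system would also need to handle changes to the graph itself, which this sketch ignores.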
