Thursday, October 27, 2011

Lisa Randall on the Daily Show

Last night's Daily Show (link to full episode) was on fire.

The first segment was focused on SCIENCE!  The part with Aasif Mandvi was simultaneously hysterical and very, very, very sad.

The last segment had guest Lisa Randall -- Harvard physicist and author of the new book Knocking on Heaven's Door: How Physics and Scientific Thinking Illuminate the Universe and the Modern World as well as the old book Warped Passages: Unraveling the Mysteries of the Universe's Hidden Dimensions -- also talking about science.  While I understand Lisa has many tremendous accomplishments, appearing on the Daily Show is the one I am jealous of.

Wednesday, October 26, 2011

Students are Awesome(ly Productive Right Now)

What's the use of a blog if you can't brag about your students?  And my students have all been doing great stuff, so I'm excited to let others know about their work. In random order...

Zhenming Liu's paper Information Dissemination via Random Walks in d-Dimensional Space will be appearing in SODA 2012.  It looks at a very natural random walk diffusion question that hadn't been solved.  Here's the arxiv version, and a slightly edited down abstract:
We study a natural information dissemination problem for multiple mobile agents in a bounded Euclidean space. Agents are placed uniformly at random in the $d$-dimensional space $\{-n, ..., n\}^d$ at time zero, and one of the agents holds a piece of information to be disseminated. All the agents then perform independent random walks over the space, and the information is transmitted from one agent to another if the two agents are sufficiently close. We wish to bound the total time before all agents receive the information (with high probability). Our work extends Pettarin et al.'s work, which solved the problem for $d \leq 2$. We present tight bounds up to polylogarithmic factors for the case $d \geq 3$.
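For intuition, here's a toy simulation of the model (my own illustrative sketch, not code from the paper -- the grid size, the clamped walks, and the rule that transmission requires L1 distance at most 1 are all simplifications I've chosen):

```python
import random

def dissemination_time(n=10, d=2, num_agents=50, seed=0):
    # agents start uniformly at random in {-n, ..., n}^d; agent 0 is informed
    rng = random.Random(seed)

    def step(p):
        # one step along a uniformly random axis, clamped to stay in the box
        axis = rng.randrange(d)
        coord = max(-n, min(n, p[axis] + rng.choice((-1, 1))))
        return p[:axis] + (coord,) + p[axis + 1:]

    pos = [tuple(rng.randint(-n, n) for _ in range(d)) for _ in range(num_agents)]
    informed = [i == 0 for i in range(num_agents)]
    t = 0
    while not all(informed):
        t += 1
        pos = [step(p) for p in pos]
        # transmit whenever an informed agent is within L1 distance 1 of another
        for i in range(num_agents):
            if informed[i]:
                for j in range(num_agents):
                    if not informed[j] and sum(
                            abs(a - b) for a, b in zip(pos[i], pos[j])) <= 1:
                        informed[j] = True
    return t

print(dissemination_time())  # steps until everyone is informed
```

Even this crude version shows the qualitative behavior the paper quantifies: more agents, or a smaller box, means faster dissemination.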
Justin Thaler's paper on Practical Verified Computation with Streaming Interactive Proofs was accepted to ITCS 2012.  I think part of its "innovation" is how well Justin puts together the theory with the implementation, showing how practical this line of work could be. Here's the arxiv version, and again a slightly edited down abstract:
When delegating computation to a service provider, as in cloud computing, we seek some reassurance that the output is correct and complete. Yet recomputing the output as a check is inefficient and expensive, and it may not even be feasible to store all the data locally. We are therefore interested in proof systems which allow a service provider to prove the correctness of its output to a streaming (sublinear space) user, who cannot store the full input or perform the full computation herself. Our approach is two-fold. First, we describe a carefully chosen instantiation of one of the most efficient general-purpose constructions for arbitrary computations (streaming or otherwise), due to Goldwasser, Kalai, and Rothblum. This requires several new insights to make the methodology more practical. Our experimental results demonstrate that a practical general-purpose protocol for verifiable computation may be significantly closer to reality than previously realized. Second, we describe techniques that achieve genuine scalability for protocols fine-tuned for specific important problems in streaming and database processing. 
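The workhorse underneath the Goldwasser-Kalai-Rothblum construction is the classic sum-check protocol: the prover convinces the verifier of the value of a sum of a low-degree polynomial over the boolean hypercube, while the verifier only ever evaluates the polynomial itself at a single random point.  Here's a self-contained toy version (my own sketch, with a made-up example polynomial over a made-up prime field -- not code from Justin's paper):

```python
import random

P = 2**31 - 1  # arithmetic is over the field of integers mod this prime
V = 3          # number of variables
DEG = 2        # max degree of g in any single variable

def g(x):
    # made-up example polynomial in three variables
    x1, x2, x3 = x
    return (x1 * x1 * x2 + 3 * x2 * x3 + x3) % P

def sum_over_cube(prefix):
    # prover helper: sum g over all boolean settings of the unfixed variables
    free = V - len(prefix)
    return sum(g(list(prefix) + [(m >> i) & 1 for i in range(free)])
               for m in range(2 ** free)) % P

def interpolate(points, x):
    # evaluate at x the degree-<=DEG polynomial through the given points (mod P)
    total = 0
    for i, (xi, yi) in enumerate(points):
        num = den = 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

def sumcheck():
    claim = sum_over_cube([])  # the prover's claimed sum
    current, rs = claim, []
    for _ in range(V):
        # prover sends the round polynomial as DEG+1 evaluations
        evals = [(t, sum_over_cube(rs + [t])) for t in range(DEG + 1)]
        # verifier checks g_j(0) + g_j(1) against the running claim
        assert (evals[0][1] + evals[1][1]) % P == current
        r = random.randrange(P)          # verifier's random challenge
        current = interpolate(evals, r)  # new claim: g_j(r)
        rs.append(r)
    # final check: a single evaluation of g at the random point
    assert g(rs) == current
    return claim

print(sumcheck())
```

A cheating prover gets caught with high probability because two distinct polynomials of degree at most DEG agree on at most DEG points of the field.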
Finally, Giorgos Zervas's work on Daily Deals:  Prediction, Social Diffusion, and Reputational Ramifications, which I've already discussed on the blog since it got so much press play, was also just accepted to WSDM.  Giorgos is no longer my student, but working with me and Joan Feigenbaum as a postdoc, so I'll count him in the mix. 

Tuesday, October 25, 2011

An Exceptional Exponential Embedding

This week in class I get to teach one of my favorite probability arguments, which makes use of a very unusual embedding.  Here's a short description (for the longer description, see the book or the original paper, where the idea is ascribed to Rubin).

The setting is balls and bins with feedback:  I have two bins, and I'm repeatedly throwing balls into the bins one at a time.  When there are x balls in bin 1 and y balls in bin 2, the probability the ball I throw lands in bin 1 is x^p/(x^p+y^p), and the probability it lands in bin 2 is y^p/(x^p + y^p).  Initially both bins start with one ball.  The goal is to show that when p is greater than 1, at some point, one bin gets all the remaining balls thrown.  That is, with super-linear positive feedback -- the more balls a bin has, the more likely it is to get the next one -- the system eventually becomes winner-take-all.
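To see the effect concretely before proving it, here's a quick simulation (a minimal sketch; the parameter choices are mine):

```python
import random

def feedback_process(p=2.0, num_balls=100000, seed=0):
    # balls and bins with feedback: with loads x and y, the next
    # ball lands in bin 1 with probability x^p / (x^p + y^p)
    rng = random.Random(seed)
    x, y = 1, 1  # both bins start with one ball
    for _ in range(num_balls):
        if rng.random() < x**p / (x**p + y**p):
            x += 1
        else:
            y += 1
    return x, y

# for p > 1, one bin grabs essentially everything; for p = 1 this is
# the classic Polya urn, where the fraction in bin 1 converges to a
# random limit strictly between 0 and 1
print(feedback_process(p=2.0))
print(feedback_process(p=1.0))
```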

We use the following "exponential embedding".  Consider the following process for bin 1.  At time 0, we associate an exponentially distributed random variable X_1 with mean 1/1^p = 1 with the bin.  The "time" at which bin 1 receives its next ball is X_1.  Now it has two balls.  We then associate an exponentially distributed random variable X_2 with mean 1/2^p with the bin, so bin 1 receives its third ball at time X_1 + X_2.  And so on.

Do the same thing with bin 2, using random variables Y_1, Y_2, ...

Now, at any point in time, due to the properties of the exponential distribution -- namely, it's memoryless, and the minimum of two exponentials with means a_1 and a_2 is the first with probability proportional to 1/a_1 and the second with probability proportional to 1/a_2 -- if the loads are x for bin 1 and y for bin 2, then the probability the next ball falls into bin 1 is x^p/(x^p+y^p), and the probability it falls into bin 2 is y^p/(x^p + y^p).  That is, the exponential process is equivalent to the original balls and bins process.

Now let X = sum X_i and Y = sum Y_i.  Since p > 1, the infinite sums converge with probability 1, and they are unequal with probability 1.  So suppose X < Y.  Then by the finite "time" X in our exponential embedding, bin 1 has received an infinite number of balls while bin 2 has received only a finite number;  similarly if Y < X.  So eventually, one bin becomes the "winner" and takes all the remaining balls.
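You can even "run" the embedding directly: sample the exponentials, and the smaller of the two sums identifies the winner.  A minimal sketch (I truncate the infinite sums, which is harmless here since for p > 1 the tails are vanishingly small):

```python
import random

def embedding_winner(p=2.0, num_terms=100000, seed=0):
    # X_i is exponential with mean 1/i^p, i.e., rate i^p;
    # random.expovariate takes the rate as its argument
    rng = random.Random(seed)
    X = sum(rng.expovariate(i**p) for i in range(1, num_terms + 1))
    Y = sum(rng.expovariate(i**p) for i in range(1, num_terms + 1))
    return ("bin 1" if X < Y else "bin 2", X, Y)

print(embedding_winner())  # the winning bin and the two total "times"
```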

In my mind, that's a beautiful proof. 

Saturday, October 22, 2011

ITCS Review

The list of accepted papers for ITCS (Innovations in Theoretical Computer Science) is up. 

Some thoughts:

1)  I have expressed reservations in the past about ITCS, based on the idea that it was creating another conference similar to FOCS and STOC, when instead we should be "fixing" FOCS and STOC, for example by expanding them.  I suppose my reservations this year are muted.  While the titles don't suggest to me that ITCS is necessarily a home for more "innovative" papers than FOCS/STOC, there seems to be no inclination to expand those conferences, so why not have yet another conference where 40 very good papers can go?  (Indeed, why not make it a bit larger?  I'm not sure how many submissions there were;  hopefully someone can confirm, but I'd guess the acceptance rate was roughly 20-25%?)
2)  Another issue was that ITCS was held in China its first two years, making it seem a bit "exclusive".  (Not to Chinese researchers, of course, and not to authors, who were given funds for the trip.  But it is a long way to go for others.)  This year it will be at MIT, which hopefully will attract people from up and down the East Coast (weather permitting) and help it build a longer-term audience.
3)  5 out of the 40 papers have Quantum in the title.  Should this be telling us something?
4)  Talk I'm most looking forward to:  Compressed Matrix Multiplication by Rasmus Pagh.  (I've already read and enjoyed the paper.)  But I'm also looking forward to seeing Algorithms on Evolving Graphs, if only based on the title. 

Thursday, October 20, 2011

An Apple a Day

Reviewing papers for a conference is a slow, time-consuming process.  Suppose you had 20 reviews due and about 4 weeks to do them.  What's your approach?

I take the tortoise approach.  I first try to do a quick pass over all the papers to at least have some idea of the topics and themes I'll be dealing with in the papers.  This lets me find papers that, at least on a fast first reading, seem unusually good or unusually bad, and lets me see if any papers are sufficiently related that they should be compared to each other implicitly or explicitly when I do my reviews.  But then, I try to set aside time to do one review a day, more or less.  I'll enter the review, press the button, and put it up for others to see.  I won't go back and revise things until the first round is over unless another paper I'm reading or another review I see makes me rethink substantially.  At the end, I'll go back and check that my scores seem consistent, given that I've seen my full set of papers.  Slow forward progress, with an eventual finish line.

Doing a paper a day does mean I limit the time I put into each review.  While there's some variance, I almost never let myself go down a rabbit hole with a paper.  That's not always a good thing;  sometimes, finding a bug in a proof or a similar serious flaw in a paper takes several hours of careful thought, and unless I pick up that there's a problem right away, I often miss it as I move on to the next review.  (This is just one good reason why we have multiple reviewers!)

Perhaps another reason this is not always a good strategy: I'm told it's been noticed that my reviews are actually done on time, and apparently this leads to people asking me to be on PCs.

Tuesday, October 18, 2011

John Byers on WBUR

Listening to my co-author, John Byers, streamed live on WBUR, discussing our work on Groupon.  Ben Edelman is another participant in the show.

Here's the link.   

Wednesday, October 12, 2011

Listen

A colleague outside theory (but inside computer science) recently brought up an interesting question with me that seemed like a possible research-level issue.  We had some back and forth, trying to figure out what we each meant (naturally, our "vocabularies" are a bit different in how we describe the problem), what the actual question was, and if there was a direction to go.  After a couple of rounds of this, he thanked me.  Paraphrasing:  "I appreciate your patience.  My experience is other theorists are often immediately dismissive to these sorts of questions."

I was taken aback.  First, I'm not sure patient is a word commonly used to describe me.  Second, this colleague is a first-rate genius (with the track record to prove it).  Who wouldn't listen to what they have to say?  Quite frankly, I was happy they were interested in talking to me!

But it's been gnawing at me.  If a high-powered colleague outside theory has this impression of theory and theorists, how do we appear to others?  Was this an isolated opinion, or a common feeling?

I know 20 years ago, back when I was in graduate school, the theory/systems divide was quite large, at least at Berkeley.  There seemed to be minimal communication among the faculty groups.  Indeed, in part that was one reason that, at the time, the LogP paper seemed like such a big deal;  Dick Karp had successfully crossed over and worked with the systems side to build a model for parallel computation!  It was, sadly, notable, if only because such collaborative work had seemed so rare.

I've generally felt that, while the theory/systems divide is still much larger than I might personally like, there has been a lot of progress since my grad student days.  I feel I can point to a significant number of examples.  But perhaps I'm holding the unusual opinion.  Maybe there's still not enough listening going on, in at least one direction.

Monday, October 10, 2011

Reading Confidence Men

My current spare time reading* is Confidence Men, Ron Suskind's book on Wall Street and the Presidency.  Without "taking sides" with regard to Larry Summers, I have to admit enjoying reading this paragraph:

"It all boils down to the classic Larry Summers problem:  he can frame arguments with such force and conviction that people think he knows more than he does.  Instead of looking at a record pockmarked with bad decisions, people see his extemporaneous brilliance and let themselves be dazzled.  Summers's long career has come to look, more and more, like one long demonstration of the difference between wisdom and smarts."

In Summers's defense(?), there are lots of people who would fit this description...

* As a parent who tries to read what my kids are reading, my future spare time reading looks to be the similarly political but less timely Mockingjay and the new-to-me series Artemis Fowl.  Recommendations for 8-10 year-old readers welcome.

Sunday, October 09, 2011

Submissions, A Comparison

I just got my set of NSDI papers to review, and have been looking them over.

One thing that immediately strikes me as I give them a first quick pass is how nice it is that the submissions are 14 double-column pages.  The authors have space to present a meaningful introduction and a reasonably full description of and comparison with related work.  They can include (detailed) pseudocode as well as description of their algorithms.  They have space for a full page (or more) of graphs for their experimental results, and even more space to actually explain them.  The papers actually make sense, in that they're written in sequential order without having to flip over to "appendices" to find results.  The phrase "this will appear in the full paper" appears rarely -- not at all in most papers.  The papers are, as a consequence, a pleasure to read.  (Well, I can't vouch for the actual content yet, but you get what I mean.)

As a reviewer, it's also nice that if I tell the authors they've left out something that I think is important, I'll generally have confidence they'll have space to put it in if the paper is accepted, and that it's a reasonable complaint to make, in that they ostensibly had space to cover my issue.  (There are some papers which fill the 14 pages and perhaps won't have something obvious that could be removed, but experience suggests they'll be rare.)    

So I wonder again why theory conferences have 10-page single-column submission formats ("appendices allowed"!), or, even worse, why conferences like ICALP and ESA have final page counts of 10-12 pages in the over-half-blank-page LNCS format.  (Really, that's just about enough space for an abstract, introduction, and pointer to your arxiv version.)  Interestingly, for my accepted SODA papers this year -- which went with 10-page submissions, "full versions" attached on the back, but allowed 20 pages for accepted papers -- when the final version deadline came around, neither set of co-authors wanted to bother filling the 20 pages, figuring people could just be pointed to the arxiv (or eventual final journal) version.  Why create yet another version of the paper according to arbitrary page limitations?  I certainly couldn't suggest a good reason.

On the theory side, as I've maintained for years, we're doing something wrong with our submissions, with artificial page limits creating more mindless work for authors and making decisions more arbitrary than they need to be.


Thursday, October 06, 2011

Goodbye to Steve Jobs

My Mac laptop froze today.  It was an unusual occurrence;  I turned the machine off, and for a minute it wouldn't turn back on again.  I was in a panic. 

Then it went back to normal. 

After the fact, I was wondering if my machine was having its own minute of silence.  Then I had the morbid idea -- what if Steve Jobs had the power to arrange for all Mac products and iProducts to stop working when he died?  It would be a disaster for so many of us -- not just the loss of individual data, but the loss of the platforms and devices he made reality, that so many of us use and love.

Steve Jobs may be gone, but his legacy lives on.

Saturday, October 01, 2011

New York Times

Our work on daily deals is mentioned and linked to in Sunday's New York Times (front page).  (John Byers even got a quote in!)  It was also mentioned in this week's print edition of Time magazine.  (Behind a paywall, so here's a jpeg.) 

Sadly, the name Mitzenmacher doesn't appear in these items ("...researchers from Boston University and Harvard..." seems to be a common phrase), so I continue to toil happily in relative obscurity.

My brother points out that this is all just an example of why print media is disappearing.  He says most anyone who might have seriously cared about our work would have heard about it two weeks ago on the Internet.  The print media is just getting to it now?   They're two weeks behind.