My Biased Coin: June 2011

Wednesday, June 29, 2011

UC Budget Woes

The new California budget call for $650 million in cuts to the UC systems -- I understand $500 million had already been planned, they added $150 million on top of that, and may add more later if the revenue numbers don't pan out.

I'll admit that as a long-time CA resident (that's where I grew up), and a UC Berkeley alum (for grad school), it pains me to hear this. I'm very sympathetic to statements such as this one quoted in this San Francisco Chronicle piece:

"Some students don't even call the increases "tuition" anymore, but tax increases. They say state lawmakers are deceptive in claiming to have passed the budget without raising taxes."

I grew up in a California where we really believed we were a shining vanguard for the rest of the nation -- and the UC system was a very big part of that. By the time I left California I, at least, didn't have that feeling about the state any more. And these cuts, of course, are painful to hear.

I don't mean to make it sound like the UCs are dead -- they have too many brilliant faculty to count, and will continue to attract many top students. But it seems to me they are being weakened, perhaps (and I hope not) even crippled longer term. I hope that this is just a historical blip, and California's greatness -- including vibrant, powerful, and more healthy UC and CSU systems -- will be on full display the rest of the century.

Addendum:

Michael Goodrich (who I've had the pleasure of working with lately) has suggested that all this points to (high) double-digit percentage increases in UC tuition coming up. He pointed me to several news bits from the UC news site on the cuts generally and tuition specifically, including this, this, and this, and pointed me to the UC Regents site, where you can find things like the minutes to their meetings.

Friday, June 24, 2011

Probability Assignments Using Set

I'm gearing up for teaching my graduate course on randomized algorithms and probabilistic analysis next semester. It's been a while since I've taught it, and I'm somewhat uncertain what to do with the course, precisely since I wrote the textbook. Me lecturing from the textbook is boring, both for me and for them, but of course the textbook contains exactly what I think is important.

Somehow I'll have to have them read the textbook offline and try to do more online learning in class. I'm working through creating some programming exercises based on the game Set. (Amazon picture below.)

Set is a great card came (wikipedia entry, or the Set web site). As it says in Wikipedia, "The deck consists of 81 cards varying in four features: number (one, two, or three); symbol (diamond, squiggle, oval); shading (solid, striped, or open); and color (red, green, or purple)." You turn over 12 cards, and look for a set, which is three cards so that for each feature, EITHER all cards are the same in that feature, or they are different. So below is an example of a set. (different in each feature). If they cards were all green, it would still be a set -- they can be all the same in some features, and all the same in others, and still be a set.

The first player to find a set picks it up, and replacement cards are dealt in to get you back to 12 cards. The player with the most cards at the end of the game wins. My eldest had already seen the game in school, and after a few months, she and I now are pretty competitively matched. I'd put it up there with Mastermind as a good mental exercise game for kids.

So the reason I thought about Set for my class is that the game instructions say that the odds of not getting a set with 12 cards dealt out is 33 to 1. (When this happens, or you can't find one, you can deal another card out; with 15 cards, the claim is more than 2500:1.) But when you play the game, it seems you don't find a set much more often. While my daughters and I are probably missing some sets some of the time, it's also clear that conditional probability is coming into play here. At any point in the game, there aren't 12 random cards on the table. Most clearly, suppose I deal 12 cards, find a set, and replace the 3 cards. What's left isn't 12 random cards; it's 9 cards left after a set was removed, and 3 cards remaining from the deck.**

This seems like a nice way to introduce conditional probability in a concrete but perhaps subtle way. And it seems like there are plenty of other related questions one can ask as well. (What's the probability of not finding a set on the kth turn, given a set has been found in the first k-1 turns...) Feel free to suggest exercises. (Apparently the largest number of cards without a set is 20. I wonder if there's a short proof of that without just doing exhaustive search.)

At the very least, it will be a good excuse to get a couple of boxes of Set for the office.

Have to go. My daughter just asked to play a game of Set...

** Of course I'm not the first to have noticed this. Peter Norvig posted on it as well.

Sunday, June 19, 2011

New Books Worth Looking At

New books are coming out all the time, but here are two big ones that stick out in my mind (perhaps because I've seen the authors recently). [Feel free to mention others worth noting in the comments.]

Recently available (I saw copies at FCRC) is The Design of Approximation Algorithms (Amazon link) by Williamson and Shmoys. Here's a link to the book page. (All books have web pages now, don't you know.) An up-to-the-moment books on approximation algorithms by two well-known experts.

Coming soon is The Nature of Computation (Amazon link) by Cristopher Moore and Stephen Mertens. Here's a link to the book page. About 1000 pages of introduction to computer science (with some statistical physics, and maybe some other physics, mixed in), with lots of problems. I've seen a preview of the text and it looks like an interesting read. (I'm curious if people will use it for courses -- if you have or are considering it, let me know.)

Friday, June 17, 2011

NSF Changing Broader Impacts

The NSF is changing its description of its merit criteria -- specifically, what the Broader Impacts criteria will be. The details are still being worked out, and comments are being collected until July 14. More information can be found on this NSF page.

The current draft text states:

Collectively, NSF projects should help to advance a broad set of important national goals, including:

Increased economic competitiveness of the United States.
Development of a globally competitive STEM workforce.
Increased participation of women, persons with disabilities, and underrepresented minorities in STEM.
Increased partnerships between academia and industry.
Improved pre-K–12 STEM education and teacher development.
Improved undergraduate STEM education.
Increased public scientific literacy and public engagement with science and technology.
Increased national security.
Enhanced infrastructure for research and education, including facilities, instrumentation, networks and partnerships.

This list is much more concrete than how I normally think of the current NSF Broader Impact statement. However, it still seems rather vague and open-ended. Most proposals can easily claim to help enhance increased economic competitiveness or development of a globally competitive STEM workforce. Perhaps that is for the best: proposals will have to make a compelling case that their work has broader impact, but there can be many ways to accomplish that.

It does seem, though, that this list isn't particularly theory-friendly. Cryptographers can point to national security; my algorithmic work can certainly point to academia-industry partnerships and economic competitiveness. But more complexity-related proposals, or algorithmic proposals with less clear immediate practical applications -- where do they fit in? Should it be a national goal to support more theoretical research with long-term and unclear payoffs? (I think so, particularly as that sort of research is generally relatively very cheap and has potential for huge benefits.) How would you place such research in the above Broader Impact context, or should a new bullet be added? What else would you add?

Thursday, June 16, 2011

Dr. Georgios Zervas

Congratulations to Georgios (alternatively, Giorgos), who defended his thesis today.* (There's still some paperwork to get in, but we drank the champagne afterward, so I'm calling it official.) He gave a great presentation covering some of our old work (on analyzing Swoopo, and efficient keyword value computation) plus some of our work-to-be on studying Groupon. (We've put an appetizer up on arxiv; now that this defense is done, we can finish preparing the main course this summer.) I hope to have the chance to swipe the slides and his jokes and give the talk myself sometime; however, if you'd like an interesting talk on data-driven e-commerce analysis, you should of course just invite him to present it instead.

Three cheers for Dr. Georgios Zervas.

*Giorgos is co-advised by me and John Byers.

Monday, June 13, 2011

NY Times on Computing

A couple of days ago there was a NY Times article on computing being cool (again), including a shout-out to Harvard's own CS 50 and of course the Social Network.

Matt Welsh has a take on it, but please also look for my comment (#7) if you read it. The issue seems worthy of further discussion -- yes, learning how to make something useful quickly is a really exciting aspect of computer science, but it's not what defines computer science as a field -- which may be taken up here later (or in comments at Matt's blog).

Sunday, June 12, 2011

2012 PCs

In an effort to return to normalcy, I find myself agreeing to serve on PCs again this year.

NSDI 2012 call for papers is up. Dina Katabi and Steve Gribble went with the old "The PC meeting will be in Boston" ploy -- very clever, made it hard to say no. They also politely asked that I avoid referring to them as stupid or misguided, and I'm expecting that to be an easy requirement to fulfill. Please send interesting papers, so I have something entertaining to read.

I will, finally, be on an ISIT PC, for ISIT 2012. No information up yet that I know of. But it will be held in Cambridge, MA, a little over a year from now. So it rates to be a HUGE conference. (ISIT is normally in the 700-800+ range, as I recall, so I wouldn't be surprised to see it hit 1000+ here.)

For balance, it might be nice to serve on a (CS) theory PC later in the year. (I was recently asked by one, but the timeline overlapped directly with ISIT, so I declined.) Or not -- I'm sure I'll have papers to push around here at Harvard.

Wednesday, June 08, 2011

ITCS

I was asked to post the call for papers for the (renamed) Innovations in Theoretical Computer Science conference. (I'm glad they finally changed the name -- see previous posts here and here.)

This year the conference is moving out of China, where it was well-funded and the logistics were taken care of, to the US. I think this will be a defining year for the conference, as it tries to set out what it should be. The question I continue to have is how it should distinguish itself from and/or relate to FOCS/STOC/SODA. There seems to be a significant bloc in the theory community that is against making these well-established conferences any bigger; to me, this says that we don't need yet-another-conference with the same type of papers unless it's different somehow. So submit and help decide what the identity of ITCS will be.

Tuesday, June 07, 2011

FCRC Continued

I've been enjoying FCRC; of course, I like large conferences where lots of people show, as long as they're run well. Rooms have been pretty full at the talks, and lots of people around to talk to. I still don't understand why people object to the idea of making FOCS/STOC larger (and instead seem to prefer to create new conferences and workshops), but if that's the way the theory community wants it, I'd argue that more conference co-location is a good way to go.

I thought Les Valiant gave an excellent Turing lecture -- I'm sure it will be online soon. He provided a clear scientific challenge -- understanding how evolution could have accomplished so much in so little time (just several billion years) -- and made the case the computational complexity (and in particular learning theory) would be a necessary tool in developing an understanding to this problem. I thought especially he did an excellent job gearing the talk to a general computer science audience, limiting the technical discussion and giving a broad overview of the importance of complexity theory, particularly as it might apply in this setting.

Naturally, my favorite session so far has been the "randomized algorithms" session (1A) for STOC, including The Power of Simple Tabulation Hashing (Patrascu/Thorup), Tight Bounds for Randomized Load Balancing (Lenzen/Wattenhofer), and Social Networks Spread Rumors in Sublogarithmic Time (Doerr/Fouz/Friedrich). The first paper shows that tabulation hashing, while only being 3-wise independent, can provide strong theoretical bounds (and excellent practical performance) for most of the natural application of hashing. The second paper looks at parallel load balancing strategies, and shows that many of the lower bounds proven years ago can be beaten by loosening assumptions that led to the lower bounds in quite natural ways. The third paper considered randomized rumor spreading in social network graphs, including the surprising result that taking care not to (randomly) send a message to the same neighbor in consecutive rounds changes the asymptotic behavior (to just barely sublogarithmic) in this setting. All three of the talks were well-presented, making getting up for an 8:30 session worthwhile.

I'd have to say that the STOC poster session -- done for the first time -- seemed to be a successful experiment. There were plenty of people around talking to the poster presenters, so much so that that it didn't seem to wrap up until more like 11 instead of the scheduled 10:30. Hopefully students who presented will comment on how they liked the experience -- either on this blog, or, more directly, with e-mail to the organizers -- and let them know if they found it valuable. I think getting students to present their work in this way will encourage and benefit them greatly and thereby strengthen the field.

Monday, June 06, 2011

Harvard CS Hires (2011 Edition)

At FCRC, I'm getting asked a lot about hiring. Our hiring season still isn't quite yet finished, but I'm going to go ahead and announce three new faculty who will be joining Harvard. (At least, that's what they've told me!)

On the theory side, Jelani Nelson will be joining us, though in a delayed fashion, as he'll be postdoc-ing first. I know this was reported in a comment on Lance's blog some time ago (before he formally accepted our offer, in fact), so I'm glad to be able to confirm it's correct.

Ryan Adams, who works in machine learning and computational statistics, will also be joining us, and we're thrilled to have hired someone in this increasingly key (and highly competitive) area. Ryan was a student of David MacKay, for whom I've previously expressed worship (most recently here).

On the systems side, Eddie Kohler, well known for his work on the Click router as well as many other systems projects, will be joining us. Non-systems people may know him as the writer of HotCRP, the best conference management software available today. (Yes, I've said it!)

More seriously, we're very excited that all three of these outstanding researchers will be coming to Harvard, allowing us to broaden the department's scope and capabilities significantly. While I'd like to take all the credit (and am totally fine with anyone who wants to give it to me), the real credit goes to David Parkes, who led the search committee this year, and of course all the rest of the highly overworked search committee members.

Sunday, June 05, 2011

FCRC : Awards Banquet

I got to FCRC a little early to attend the ACM Awards Reception and Banquet. (Les Valiant nicely put my name on a list.) For more on the awards you can also go the ACM award page. They have a nice little award booklet they gave out at the reception, but it looks like last year's is still up on the page -- I imagine they'll update it soon. They also had copies of the latest Communications of the ACM out, which is sporting a cover with an extremely nice picture of Les on it.

It was a very nice and extremely well-run event. It was black-tie optional, and I was one of the half in a regular tie. It's odd to see that many computer scientists I know in a tux -- certainly not something I've ever seen before! It was at the Fairmont, in a suitably-large-sized space, and with much better food than I was expecting (and after flying all day, I was suitably hungry to enjoy it). While there were a lot of awards to go through, they kept it moving along quite quickly, so it didn't drag on at all. (I'd like to thank the award-winners, for keeping their speeches both short and entertaining.) The theory highlights were of course Les for the Turing Award, but also Craig Gentry received the Grace Murray Hopper Award for his work on fully homomorphic encryption, and Kurt Melhorn received the Paris Kanellakis Theory and Practice Award for his work leading to LEDA. Bryan Parno, a Harvard undergrad and CMU grad student, received the Doctoral Dissertation Award. On the systems side, Frans Kaashoek received the ACM-Infosys Foundation Award. And of course there were the new ACM Fellows.

Thanks to everyone involved.

Wednesday, June 01, 2011

Justin's Poster at FCRC

Justin Thaler, a student I'm advising (or, quite arguably, the other way around), will be at the poster session for STOC, ready to talk about a bunch of his currently-under-submission work on streaming interactive proofs, including the most recent, Practical Verified Computation with Streaming Interactive Proofs (with me and Graham Cormode). Justin has been working for some time on streaming problems with the following motivation: you're gathering a big data set that you're going to store, and compute on, on the cloud. How do you trust that the cloud is going to give you the right answer? A natural theorists' answer is via an interactive proof, and here the natural limitation is to a streaming verifier: a verifier that sees all the data initially, forwarding it to the cloud, but can't store it all due to limited memory, including later when the prover (the cloud) wants to prove that it's done the computation correctly.

We didn't initiate these models, to be sure. There's now been a chain of work, including the "annotations" paper by Chakrabarti, Cormode, and McGregor, which was a primary motivator. For Justin's other theoretical work, see the arxiv version of his ESA paper (with me and Cormode, journal version under submission) and this paper of his with Cormode and Yi (under submission).

Some time ago I challenged Justin to figure out if this work on streaming verifiers was "practical" for real-world cloud settings, and if not, why not. The outcome (so far) is the Practical Verified Computation with Streaming Interactive Proofs paper. We're not the only ones to have noticed that perhaps interactive proofs are ready for deployment in these kinds of real-world systems. Apparently, there was a recent HotOS paper, Toward Practical and Unconditional Verification of Remote Computations, on the same theme, by real-live honest-to-goodness systems people. I'll avoid (biased) comparisons, and leave that to others.

But I can highlight some of what Justin's done. He's considered and implemented both special-purpose "building-block" functions, like self-joins (or F_2 computations), and a fully general approach for arbitrary computations based on the "Muggles" protocol (interactive proofs with polynomial time provers). He's considered and implemented both non-interactive protocols, where the prover sends a single message that includes both the answer and the verification, and interactive protocols, where the prover and verifier have a multi-round conversation to verify the result. Along the way he's had to come up with several "engineering" improvements (with some strong theoretical foundations) to make things practical, including a clever use of FFTs for the non-interactive protocols that seems new, and adopting the approach of linearizing polynomials from previous work on interactive proofs to improve efficiency to these protocols.

Our experience, I think, shows that with some significant engineering, interactive proof systems in the streaming verifier setting are ready for real systems. So if you get a chance stop by and ask Justin more about the work. He's eager to talk about it!

My Biased Coin