My Biased Coin: 2010

Saturday, December 18, 2010

End of Year Update from the Trenches (Area Dean News)

I last posted on Lance's blog about two months ago, so it seems a good time for an update.

My class is (hooray!) finished. I've sent out grades to the students, and just have to record them in the system. (Last chance for students to complain.) This is my survey/seminar/project class, and (as is often the case) there were a number of projects this year that I think have potential to turn into papers. Some of the students will do it on their own, and some I'll try to help along. I admit I'm a bit hesitant to see the class reviews -- I feel I didn't put in as much time into the class as usual, because administration now sucks up more of my time. We'll see if the students felt the same. (Historically, "Area Deans" have not had teaching relief, but the position has changed some with it seems more responsibility now assigned to the role -- so I think of myself as a test case for whether someone can/should teach while doing the job.)

Much of my Area Dean time has gone toward hiring and reviews/promotions. On the down side, we've had to cope with one of our faculty members leaving (see Matt's blog -- my post and his -- if you're interested and weren't aware). On the plus side, this made our requests for targeted future hiring much more acceptable across the School of Engineering. (I'm not saying our future hiring plans weren't already largely supported, but our needs were really highlighted.) Obviously, besides the fact that we're currently running a search, there's not much I can publicly report on hiring right now -- but hopefully we'll have some interesting news going forward.

Other administrative duties have included doing my small bit to make sure our CS 50 Fair went smoothly. CS 50 is our intro course, taught by the ever-energetic David Malan, and had over 500 students this year. Students do final projects, and at the fair students set up their laptops, and amid popcorn, cupcakes, music, and balloons show their projects off to each other, to fellow students, and to anyone else who wants to come by (including companies who set up recruiting tables). With 500+ students, this year required three shifts over the day. Some writeups on the fair are here and here (with videos!), and many projects are available via links at the class web site. The fair, now it's in third year, has become quite the winter event. While all the credit for the great success of this event is David's, I viewed it as my job this year to help, by doing what I could to get administrative barriers out of David's way. (So I helped get the budget increased, for example. I'm the administrator as offensive lineman.)

Research-wise, I think I've successfully increased the pace from a few months ago. Various old items have now made it or are making it through the final stages of the pipeline. Some re-submissions have taken place. New projects are now going on. I'm definitely relying on co-authors and students to push me along -- the research has been a bit more "responsive" than "reflective" -- but I'm happy to be keeping up on research stuff too. I've also continued doing some side consulting -- some of it is research-oriented, and some of it isn't (some expert witness stuff).

Some fun, "educational" writing has recently come out. I have a chapter in Algorithms Unplugged, which is the English translation of the book Taschenbuch der Algorithmen. This is a version of my "dream book" of a computer science book geared for high school students, that I blogged about long ago (old post on it here with other links to other older posts). The editors asked me for a chapter for the English translation, and I took one I had written for the dream book, and they included it (with minor edits). I'm getting my copy soon -- the book, I hear, is about to be released -- and then I'll spring it on my kids to see how good it is. I also have a little writeup on Human-Guided Search in XRDS -- the ACM Magazine for Students, talking about some of our long-ago work on the subject.

So roughly 6 months down on my administrative stint. Again, not that I'm counting. The job is actually just fine -- thanks to both really supportive faculty, and a supportive Dean of the School of Engineering. I always have the feeling that everyone is trying to make this job easier for me, and they're doing a very good job of it.

Wednesday, October 20, 2010

Guest Post over at Computational Complexity Blog

For those still receiving this feed, I have a guest post briefly describing some aspects of life as Area Dean over at the Computational Complexity blog.

Tuesday, August 31, 2010

Blog Retrospective

I started blogging a little over 3 years ago, as something of an experiment. Lance had given up blogging, and I had been a reasonably frequent and opinionated commenter on his blog, enough so that people often asked when I would start my own. I hadn't planned on it, but Lance's stepping down (which, later, turned out to be temporary) felt like it had left a hole. I hoped that I might provide, in my own way, a community forum for discussing issues, and a connection point for the areas I'm interested in -- algorithms (or theory more broadly), networking, and information theory.

In the end I'm not sure how well I achieved the various goals. I don't feel this blog has ever become a strong authority (or hub, in Kleinberg's language) in the way that I might have liked. Commenting has been sparse with infrequent spikes; longer more detailed discussions seem rare. Perhaps this is just hard to do -- people have, on the whole, better things to do with their time. Or perhaps (probably?) it represents flaws in my posts. Certainly one wish I think I had going in is that my posts would be more technical, but technical posts take a great deal of time, and are, quite frankly, hard. I'm ever-impressed by what Dick Lipton is doing, in terms of technical depth, at his blog; it's a wonder to me.

On the other hand, I'm amazed and pleased to find that people read this blog, and have enjoyed the "behind-the-scenes" look at life and work as a professor. Everywhere I've gone in the last few years, there are people who tell me they've been reading it. I've never implemented tools to tell the size of my readership, but anecdotally it must be larger than I think. (The joy of low expectations.) It's opened the doors to lots of interesting discussions about research, the state of computer science, what being a professor is like, and a whole range of various things. And from what I can tell, it has given the different communities I was targeting a better idea of what each of them is like, in terms of culture and process. That perhaps hasn't always been a good thing, but overall I'll view it as a success.

I didn't realize when I began how much blogging would raise my "visibility", but that seems to have been a pleasant side effect. I'll admit, I'm glad to have been able to take advantage of that. Perhaps I'll be invited to give fewer talks now that I'm giving up the blog. Or fewer PCs. Maybe at this point that's not all bad. Or maybe stopping will force me to explore other positive ways of raising my visibility, perhaps by writing another book.

Overall, I've had a great deal of fun blogging, and that alone has made it worthwhile. Over the last several months, however, I've found blogging less enjoyable. Some of that must just be fatigue; I suppose I've been running out of things to write, making writing harder. But also there have been fewer comments, and -- as discussed in this post over at the Complexity blog -- there has been much more of an unpleasant tone in (anonymous) comments (across many blogs) of late. It's a sign for me that, as fun as this all has been, it's time for me to stop. My new position has provided a good excuse, but I probably would have stopped anyway.

Perhaps blogging has just been the latest Internet fad -- perhaps our social networks can't support the number and diversity of blogs that we have, and our attention is now moving elsewhere. (Like, back to work.) I'd like to think not. I think the latest P=NP? phenomenon is an excellent demonstration of the potential power and importance of blogs. (Again, Dick Lipton's blog was a wonder.) I hope that all the bloggers we have in our community keep going, that new bloggers come into the picture, and that we use blogging -- or whatever new tools come along -- to enhance communication within and across our communities. As an example, I've spent some time the past few days looking around at the CS Theory StackExchange Q and A site, prompted by Suresh's posts. I'm not quite sure what to make of it yet, but it's been fun to explore and seems to have interesting potential.

Thanks to all of you who have been reading, and especially to those of you who have been taking the time to provide thoughtful comments. I wouldn't have continued for as long as I did without you. I've enjoyed this experiment, and I'm gratified to think that some of you have enjoyed it to. I'm sure I'll still be around, offering my opinion at other places. And I hope when you see me around (physically), even though I'm not blogging, you'll consider trying to strike up a conversation with me; I'm sure we can find things to talk about, and, without the blog, I'll be missing this type of conversation.

Saturday, August 28, 2010

Doing the Right Thing? (Quick Links Edition)

From Shots in the Dark, a pointer to a new "feature" -- apparently, there's not a tweet system recording and listing books checked out from Harvard libraries. No, they're not putting names with it, just times. But who thought this was a bright idea? Seems like a clear potential privacy-violating nightmare with no upside that I can see. I'll have to find out who to call on Monday to complain and spread the word to other profs...

From the Crimson, Marc Hauser will be teaching his classes in the Harvard Extension School this year. Now, in some sense, this isn't a big deal; Marc's on leave from FAS, and the Extension School is separate from FAS. And trust me, he won't be getting any huge paycheck from the teaching; while the Extension school pays its teachers (naturally), I'm sure it's a small fraction of Marc's Harvard salary (which he may or may not be getting; I haven't heard confirmation one way or another whether he's on paid or unpaid leave). Given that he's been heralded as a great teacher for a number of years, arguably, why shouldn't he teach? But I admit, as someone who works with the Extension School, it's leaving me with an uncomfortable feeling that I'm still trying to process.

Anyone have gossip to tell about why the Crypto 2010 proceedings were put online, but then taken down (apparently once the link got publicized)?

Scott Aaronson answers some questions for MIT news about the P/NP proof. I won't opine on whether his bet was a right thing or not (his own blog has had plenty of discussion on that) -- what's wrong with the article is that it has multiple links to Deolalikar's paper that are now non-functioning. I understand that web-news links aren't going to be kept up to date in perpetuity, but you'd think for this fairly recent article and controversial topic someone might have updated accordingly. One thing I wonder -- given the unusual amount of press that this proof attempt was given, and the current consensus that it's incorrect and recovery isn't possible, how many people are left with the misinformation that this very important problem was solved?

Wednesday, August 25, 2010

Conference/Journal Versions -- Transactions on Networking

I was recently asked to review a paper for Transactions on Networking, and noticed the following bit in the e-mail?

Please note that while this paper may have had a previous conference version, ToN does not mandate any specific differences between conference papers and their versions subsequently submitted to the journal.

Is this new??? Am I reading this right, that there's no mandated "30% new material", or some similar rule It's been a while since I've submitted to ToN, but I seem to recall being explicitly asked by reviewers or editors from ToN before what "new material" there was in the paper over the conference version. I'd be interested to know if this was an actual policy change -- it's one I've called for before, but didn't expect to see implemented anywhere.

Just curious if anyone can share any insight....

Monday, August 23, 2010

Various Quick Pointers, Redux

There were many interesting things at the CRA Snowbird conference for CS chairs (which I missed...), but I haven't heard any blog-level discussion of their call to move up the schedule for hiring (as well as related changes in procedure). Anyhow, lots of slides from various presentations.

UC Campuses are tops in Washington Monthly rankings, which are different in substantial ways from the US News and World Report rankings...

Still time to sign up for Harvard Extension School courses for this semester; here's the list for computer science. Including, for example, E-210.

Dick Lipton's future book, taken from his blog, appears on Amazon (you can pre-order now!).

Nature's take on Hauser's MonkeyBusiness opens with: "When news broke last week that famed Harvard University evolutionary psychologist Marc Hauser had been investigated for scientific misconduct, it was no surprise to many in the field. Rumours had been flying for three years, ever since university officials arrived to snatch computers from Hauser's laboratory at the start of the inquiry. By the time Harvard completed its investigation in January, the gossip had become standard cocktail-hour fare at conferences."
Maybe they're right -- I'm not in his field -- but I'd never heard any sort of rumor at Harvard. I'm clearly not getting invited to the right cocktail hours.

Sunday, August 22, 2010

How's that New Job Treating You? Edition

I told myself I'd quit blogging when summer ended. That's a bit over a week away, as classes start September 1. Also nicely, from the count on the right, I'm nearing 500 blog posts. Seems like a good stopping time.

I'm now not infrequently asked how the new "Area Dean" job is working out. Just fine, thanks. I figured I'd say a little more about it, and perhaps that will also explain why it's a good time to give blogging a rest; I can't imagine people would want regular blog posts on this sort of stuff.

So far, the time commitment is about what I expected, but only because I was told to expect that it would take more time than I would expect. There's a lot of meetings, e-mail, and writing. Pleasantly, the time thus far has been spent on fairly worthwhile endeavors -- most of the time has been spent on hiring and promotion plans. Since, really, managing those issues are the highest priorities of this job, that feels like time well spent. Some time has been spent on letter-writing -- those CAREER, Sloan, and other fellowship letters get written by someone, and now that someone is often me. Finally, some time has been spent as being "voice of the faculty" on certain issues. For example, there are some non-trivial changes supposed to take place on our e-mail system, and unsurprisingly the CS faculty are more concerned than the average faculty member about this. (A little knowledge is a dangerous thing...) My job, where possible, is to be the consensus voice and contact point on faculty-administration issues like this.

Because I'm new -- and because it's summer and we're not having our regular faculty meetings -- there's been a lot of e-mail. We're a consensus-oriented faculty in CS, so I want to represent the consensus. I feel at this point it's important for me to check carefully with other faculty members before expressing a collective opinion (or even my own, since often it will be taken as the collective opinion). Being new at the job, this means -- in my mind -- checking in with the faculty perhaps more than is truly necessary, both so I am secure that I am representing them accurately, and perhaps even more importantly, so that THEY'RE secure that I am representing them accurately. I suspect after a few months, assuming that I've grown into the role and the faculty has developed a trust in how I perform the job, there will be less need for as many explicit checks on things. (I suspect some faculty will just get tired of getting e-mail from me!) On the other hand, maybe they'll appreciate this conservative style, even if it means they get e-mail pings on administrative issues more frequently. We'll see.

I expect further aspects of the job will reveal myself as the semester begins -- more committee meetings, more curricular issues to handle, more faculty concerns. There are also some long-term initiatives that I expect CS to be at the center of that are just starting up but will require my attention. (They're not ready to talk about yet.) And, perhaps, I'll find myself involved in other activities like fund-raising. (I may have to convince my Dean that, although my standard work wardrobe is a simple button-down shirt and jeans -- or a T-shirt and jeans over the summer -- I do own a few suits and ties and can be made to don them for appropriate occasions.)

It is time-consuming, and it will, sadly, clearly eat into my research time. I'm thinking about how best to handle that. And when you're ~~shafted with~~ given a job like this, it really makes you appreciate your predecessors. (I knew Greg Morrisett was doing a great job before, but now I really appreciate it.)

So far, though, it's all fine. I hope to do some good in the position; and I hope I end up being good at the position.

Friday, August 20, 2010

RATS roundup

I didn't see every talk (my brother lives in the area, so I took a break to see family) but I did have a fun day at RATS. There was a brief introduction by Chris Anderson of Wired/The Long Tail fame (on video -- I was disappointed he couldn't make it in person, I wanted to meet him), which was very interesting. I was pleased that as he was talking he kept mentioning power law and lognormal distributions; I knew he mentioned my survey on his blog at one point, but I (and others, as expressed to me later) were still surprised he mentioned them together when discussing long tail issues. That nicely set up my nice "survey talk" on lognormal/power law distributions. This was followed by the excellent talk by Aaron Clauset on power-law distributions in empirical data, discussing the challenging issues of how do you determine, based on your data measurements, whether you're looking at something that seems to be following a power law or some other distribution. (I'm asked this question a lot; happily, I can just point people to Aaron's paper.) Sharad Goel gave a fascinating talk on the implications of the long tail in marketing/web sales, arguing that the "value" for sites like Amazon in offering the "long tail" of items is NOT necessarily in the additional sales, but in the power of locking in customers. (Since Amazon has "essentially everything", at a reasonable if not optimal price, why bother wasting time going anywhere else?) Neel Sundaresan of eBay discussed insights form eBay data about the differing "shape" of different market segments, and the implications for assisting customers to find items in the large landscape that is eBay. Silvio Lattanzi talked about implications of power laws in compressing social networks, and on models for affiliation networks.

The slides, apparently, should all be up at some point on the RATS webpage, or I'll update with an appropriate link.

MonkeyBusiness : Some Resolution

Wow. After days of various speculation and reports from multiple new sources, Dean (Mike) Smith of the Harvard Faculty of Arts and Sciences has made an announcement regarding the investigation of Marc Hauser. The opening paragraph is the key:

"No dean wants to see a member of the faculty found responsible for scientific misconduct, for such misconduct strikes at the core of our academic values. Thus, it is with great sadness that I confirm that Professor Marc Hauser was found solely responsible, after a thorough investigation by a faculty investigating committee, for eight instances of scientific misconduct under FAS standards. The investigation was governed by our long-standing policies on professional conduct and shaped by the regulations of federal funding agencies. After careful review of the investigating committee’s confidential report and opportunities for Professor Hauser to respond, I accepted the committee’s findings and immediately moved to fulfill our obligations to the funding agencies and scientific community and to impose appropriate sanctions."

Rather than reproduce the whole letter here, I can point you to Harvard Magazine, or Science. Mike also discusses the Harvard process and the reason for confidentiality in such cases. I'm glad to see this come out, and I can imagine the difficulties Mike had in deciding to produce such a letter. (As I have stated previously in this blog, I have great respect for Mike Smith, who was in the office next to me before getting proverbially kicked upstairs, and I'm very happy that someone of his talents is serving as Dean of FAS). On the other hand, it's a sad day for Harvard, and arguably science more generally.

Thursday, August 19, 2010

Various Quick Pointers

While it may not be news elsewhere, I'm certainly interested in the "local" case of Marc Hauser, the evolutionary psychologist at Harvard whose work has been "under review". The latest interesting update appears at the Chronicle of Higher Education.

As reported elsewhere, congrats to Dan Spielman for winning the Nevanlinna Prize.

I'll be at the ill-named RATS (Research and Analysis of Tail Phenomena Symposium) tomorrow reviving my introductory talks on power laws, lognormal distributions, and the importance of verification. Stop by and say hi!

While I was away in the UK the Microsoft PR machine must have gone to work, and I've seen a few articles like this describing our work on password popularity. I'm happy to see my name "in lights" a bit -- why not? -- I just think it's interesting that this is the paper that gets it there. (I guess our coding work is also being touted a bit as part of Dan's Nevanlinna Prize, so that's "in the news" as well.)

The Museum of Mathematics is getting more notice -- check out their web page.

Friday, August 13, 2010

In Need of a Few Bad Papers

For my graduate class this semester, there's a lot of paper-reading, and I view learning how to critically and constructively read papers as part of the student goals for the class.

A corollary of this, it seems to me, is that the class should include some bad papers, so students learn to recognize (and, if possible, get something out of) reading those. So I need some really good examples of bad papers. (In one of the areas of the class focus -- web search, compression. coding, streaming data structures...)

Now I should be clear about what I mean by bad papers. I'm looking for something of a higher standard than an automatic journal reject -- I get at least one of those a month in my mailbox, and it's not clear there's much to learn from that. I'm talking about papers that at least superficially look quite reasonable -- indeed, I'm expecting papers that have been published in reasonable places -- but when you think about them more, there are subtle (or not-so-subtle) flaws. In theoretical papers, possibly it might be that the paper starts with a model that sounds really nice but it just clearly wrong for the problem being addressed. For systems papers, it might be a paper where the experiments just don't match up to what ostensibly was being proposed.

[I had a nice example of a bad paper in earlier incarnations of the class, but I don't think it's aged well, and I've removed it.]

Maybe bad is even the wrong term for what I'm looking for. Perhaps I should use a more neutral word, like "controversial" -- indeed, then I can get the students to take sides. (Is the Faloutsos, Faloutsos, Faloutsos paper still considered controversial these days? That could be a nice example, but it's not really on topic for the class.) Or perhaps I just want papers that reached too high for their time -- noble failures. The key is that, in my mind, just showing students examples of great papers doesn't seem didactically sound. Negative examples are important for learning too (especially if they also show that great scientists don't always get it right).

Feel free to mail me rather than post a comment if you're afraid of offending anyone. Naturally, mailing me links to my own papers will be taken with the appropriate humor.

STOC tutorial online

Paul Oka asked me to announce that the STOC 2010 tutorials are now all online. You can find them here.

Thursday, August 12, 2010

Monkey Business

I see Harvard's in the news yet again, as the Boston Globe broke a story about psychologist Marc Hauser, who is "taking a year-long leave after a lengthy internal investigation found evidence of scientific misconduct in his laboratory." One paper has been retracted, others are under examination. As discussed over in Shots in the Dark, an unpleasant issue is that Harvard is being silent regarding its investigation. It's not clear to me what the right approach in such cases are -- what rights to privacy, if any, does an academic have in such situations, or, assuming improper behavior is found, is it incumbent on the institution to correct the scientific record itself? The issue is also raised in a New York Times article. Feel free to discuss the institutional ethics in the comments.

I have no inside insight on what has actually transpired; however, I have served on university committees with Marc in the past, and found him an enjoyable colleague. I hope to the extent possible the issues are resolved satisfactorily.

This controversy provides an interesting contrast with the current hubbub over the P not equal NP paper -- best considered over at Richard Lipton's blog here, here, here, and here. In theory we don't have to worry about people "forging" a proof in the way that experimental data might be forged, but proofs can easily have mistakes or unclear gaps, and this is not viewed as misconduct -- just embarrassing. I wonder what the state is in computer systems work -- I can't recall hearing of cases where there were accusations of misconduct with data, although I've certainly heard mutterings that experiments in papers were carefully chosen to (excessively) highlight positive results. Such cases can lead to heated discussions in PC meetings, and to some interesting discussions post-publication, but I haven't heard people suggest that that level of data manipulation corresponds to misconduct. Our field seems to have, for now, sidestepped these particular issues.

Wednesday, August 11, 2010

Other UK Adventures

While in the UK, I went out to some other places to give talks -- Liverpool and Cambridge.

At both places I gave my talk on our analysis of the auction site Swoopo, which seemed well received. Of course it's a topic that can appeal to a wide, general audience and is just fun to think about, but the major credit has to go to our student Giorgos Zervas. Not only did I swipe his excellent slides, I even shamelessly adopted some jokes from his presentation, and of course they got the biggest laughs. Maybe I need to get him to prep all my talks. (I also talked about some recent work on networking+hashing at UCL and Cambridge as well. Slides are up at my talks page.)

At Liverpool I was hosted by Leslie Goldberg, and it was great fun to ask her questions about the UK system. One issue that came up is a UK policy to use "short-term economic impact" as one of the bases for research funding decisions. Leslie has rallied against the idea -- she has an interesting web page devoted to the issue with a host of opinions on why it's a bad idea. We also discussed the RAE, the Research Assessment Exercise, where schools are scored and ranked based on their research output, and this affects their future government funding for research. (Here's an article from the Guardian in 2008 when the last results came out.) It's interesting that the NSF does not do something like this, but the link between university funding (apparently, even for research) and the government is perhaps more direct in the UK. It's worth pointing out that Cambridge is at the top overall, and my host institution University College London was 5th in the latest rankings; specifically for computer science, if my info is right, Cambridge is still 1st, UCL is still fifth, and Liverpool is 11th.

At Cambridge I was hosted by Jon Crowcroft at the computer lab and Peter Key of Microsoft. The two building are right next to each other, far from the Cambridge center (about 2 miles). They're just past Churchill college, where I spent almost a year after college, so I got to experience the waves of nostalgia as I walked by. (I would have experienced it even more had I had a bike.) More than nostalgia, I felt a twinge of jealousy -- when I was at Churchill XX years ago, it was far removed from everything. Now it's pretty much at the center of the mathematical sciences complex and the computer science buildings, which have moved out to the outskirts (for space reasons, and so nice new modern buildings could be made for them). Why couldn't they have had that when I was around? Re-visiting Cambridge was also a blast -- it's just a lovely city. Hey, come to think of it, why doesn't someone plan a major conference there (not hard to get to from London airports; I'm pretty sure the main conference could be held at University/Microsoft lecture rooms; hotels, though, are probably quite expensive).

Sunday, August 08, 2010

Papers to Teach This Year

This fall I'm again teaching my "introductory" graduate class loosely centered on the themes of big data and communications/networks, Algorithms at the End of the Wire (Computer Science 222). [The real subtitle for the course should be "Things I'm Interested In."] The subthemes include big graphs (PageRank, HITS, link prediction, etc.), compression, data streams/streaming algorithms, and coding. Most of the class involves reading and discussing papers, and I try to have some fraction of them be current rather than historical. Since it's been over a year since I last taught the course, and I'm lazy enough to wish to have other people do my work for me, I thought I'd ask for recommendations -- any good new papers to teach?

One aim of the class is to try to bridge the gap between theory and systems, so papers that fall into that area are highly desirable. For example, this year's Best Paper at SIGCOMM, Efficient Error Estimating Coding: Feasibility and Applications, will surely be added in to this year's reading (assuming I can get a copy -- should be online at the conference site shortly).

Last year's web page is still up here, though I'll be aiming to take it down and update it this week.

Saturday, August 07, 2010

Back from Travels

The slowdown in posting for the past month has been primarily due to travel. For the last month, I've been in England, based primarily at the Computer Science Department at the University College London. Thanks to my host, Brad Karp, I got funding through a Royal Academy of Engineering Distinguished Visiting Fellowship -- that's a mouthful -- to spend time working with the networks research group at UCL (and bring my family). UCL has a strong computer science group, and a particularly strong networking group, with Brad Karp, Mark Handley, Kyle Jamieson, Damon Wischik, and others. If you're going to or through London -- a frequent stopover flight -- you should visit there, maybe give a talk. It's a great place, you'll definitely enjoy it.

Hopefully they'll be some interesting products from the visit in the future. Overall, I had a productive time, and it was nice to break out of my routine and do something different for a month's time. (They managed to put me up walking distance from the CS building -- for most of the time, I was a two minute walk away -- so just not driving for a month was a shake-up from my status quo!) Of course London is a wonderful city, so evenings and weekends were filled with tourist adventures. And I can't thank Brad enough -- besides being great fun to work with, he really helped set everything up so it all went smoothly. (As everyone from the SIGCOMM 2009 PC meeting probably recalls, Brad takes hosting duties very seriously.)

I'll probably have another post about other parts of the trip. Twos thing I found, though, were that I didn't miss blogging so much, and that being Area Dean is indeed going to suck up large chunks of my time (well, it already is). So when classes start, my more permanent break begins...

Thursday, July 29, 2010

The 2050 Calculator

As regular blog-readers know, I'm a tremendous fan of David MacKay, who has gone from being a leader in the general area of Bayesian inference (author of Information Theory, Inference & Learning Algorithms) to the author of the popular book Sustainable Energy - Without the Hot Air and now Chief Scientific Advisor to the Department of Energy and Climate Change in the UK.

David recently showed me a fantastic tool he and his department have made available: Essentially, it's a calculator tool that let's people determine levels of effort they'll put into creating various types of energy on the supply side, as well as effort into curbing the demand side, and figures out based on those inputs whether the resulting configuration will lead to Britain reaching its legally mandated 2050 greenhouse gas goals (as well as other related outputs), all with a pleasant user interface. One could view it as a "game" with the player figuring out what policy decisions will have to be made to reach the desired target. Further, it's all open source! Here's a link to a description page, and a direct link to the calculator. Try it and see...

I've already recommended it to my environmental engineering colleagues as a potential learning tool. Moreover, since it's open source, one could imagine building projects on top of it -- David suggested that developing calculators for various countries (including the US), or providing enhanced user interfaces for various purposes could be interesting.

Relating this back more directly to computer science, does anyone have pointers to similar interesting tools that might be useful for computer science classes? It would seem one could imagine many such things in the networks economics space. Luis von Ahn's work (like the ESP game) had some associated sites that were fun to point students to.

Tuesday, July 27, 2010

Teaching Time Conflicts

As the semester ominously approaches, we've noticed that we're facing a number of class time-slot collisions in computer science.

This is unsurprising. Until recently, we've been fairly ad hoc in assigning class times; for the most part, each faculty member, more or less, just picked their time. Not surprisingly, our Tuesday-Thursday slots are packed. Computer science faculty like to have Monday and Friday free -- often for travel to conferences, but also to minimize teaching days (MWF vs Tu-Th). This also seems to match student desires; many seem to like to avoid Friday classes if possible. (We can sometimes arrange M-W classes, but strictly speaking it's against some policy -- we can't do them in the morning.) And class times are further limited -- many faculty (including myself) and students are put off by classes before 10 am or after 4pm, there's the weekly faculty lunch meeting and various seminars to consider, and so on.

The problem has gotten a little worse in recent years, as we've happily been offering more classes (new faculty, and the benefit of some visitors), making it more noticeable.

We're not so clear that this is a terrible thing. Other classes -- such as popular distribution requirement classes, various math classes, and so on -- appear to have taken the natural MWF slots, so we're avoiding those external conflicts. (Although not entirely -- my class has conflicted the last few years with the 2nd semester math class on algebra, which leads to a few e-mail exchanges each year on what can be done about that.) We tend to avoid really dumb conflicts where it's clear lots of people might want to take both courses naturally.

But it's clear it's become enough of an issue that we have look more carefully at it, and honestly, it wouldn't hurt to put some more reasoning into our scheduling rather than keep following the path tread by historical accident.

While I'm sure the CS faculty will generate plenty of ideas, to get the ball rolling, does anyone have any good suggestions on how to design a procedure to assign class times? In fact, I think we're still at the size where we can all do it together and resolve possible conflicts peacefully and happily, but rather than waste lots of faculty time going through all the permutations, it seems worthwhile to create an approximately good starting point. It seems like intro classes should get top priority, and following that theme, perhaps grad classes should get placed last. Or perhaps an ordering should be tied to faculty members, not classes? I was thinking every faculty member give their three top time orderings, and give preference to junior faculty members. Other insights welcome.

Friday, July 23, 2010

"I Understood Your Talk"

Often, after I give a talk, the feedback I hear back is "I understood your whole talk," or something near that. I take it as a compliment (although I don't think it's always meant that way).

When I present a talk to systems people, I think the statement is actually meant in gratitude. Instead of trying to force some challenging theoretical method or computation on them in the space of an hour, I've tried to give the high-level overview of what we've done (and why), and provide some simple and understandable examples behind the work that went into it. With any luck, there's an idea or technique in there they can use themselves sometime. I have some suspicion that many of them have suffered through a fair number of theoretical talks that have left them behind, and were grateful not to have to sit through one of those. Also, in my experience when systems people say they don't understand a systems talk that's a bad thing; in that case, it's often that there are some significant nagging details that haven't been discussed sufficiently that are making the listener suspicious that there's some flaw or an important side case that hasn't been adequately addressed.

When I present a talk to theory people, it's not always clear to me how to take that comment. There's certainly a subculture in theory CS that seems to think it's important to show how complex your result is (or, perhaps, how smart you are), never mind the audience. On the other hand, some theory results are so complicated it is truly a challenge to try to present a lucid 1-hour talk. In some cases, people saying they understand the talk feels like a real compliment -- thank you for presenting things in an understandable way. In some cases, it feels like a backhanded compliment -- if it's that easy to explain, you're not working on very hard stuff, are you? When people say after a talk they didn't understand it, it's more ambiguous -- did the speaker do a bad job, or is this really exciting new difficult stuff that will take some time to learn?

Generally, when I plan my talks, the aim is to make almost all of it understandable to as large an audience as possible. For specialized audiences, I'm happy to go into details, but for more general audiences -- the very large bulk of my talks -- I'll aim for simple when I can.

Monday, July 19, 2010

New Paper: Popularity is Everything

Another new paper announcement: Popularity is Everything: A New Approach to Protecting Passwords from Statistical-Guessing Attacks, which will appear next month at HotSec 2010, is online. My co-authors are Stuart Schechter and Cormac Herley of Microsoft. The idea here is that the real problem with passwords is that some are too popular, making them easy to guess. Providers respond by forcing users to choose passwords that pass certain rules -- you must have a capital and lower-case letter, you must have a number, etc. These rules are somewhat arbitrary and don't directly tackle the significant problem of popularity. Our paper is about how that can be done. (As you might imagine, from my involvement, some Bloom filter variant -- the count-min filter in this case -- is part of the solution.)

This paper was one of those great examples of serendipity. Stuart (who was a grad student at Harvard, before joining Microsoft) came back to give a talk. I met with him and talked about problems. We found a nice intersection point and, some months later, a paper appears. As faculty we're often cajoling our students to go to the colloquia or to interesting talks outside their direct field -- and to, on occasion, talk with and meet the speakers. I admit that it's time consuming, and not all talks end up being worth going to. But if you give up on the chance to talk to people, including (and perhaps especially) people outside your direct problem space, you miss the chance for these wonderful bits of serendipity, where you can work on something new and different, and ideas can potentially cross between areas.

Friday, July 16, 2010

Facebook Movie Preview

It's nice to see Harvard finally getting some recognition in the mainstream media, with the upcoming Facebook/Zuckerburg movie the Social Network (based on Ben Mezrich's book The Accidental Billionaires: The Founding of Facebook: A Tale of Sex, Money, Genius and Betrayal -- Mezrich is also a Harvard grad... graduated the same year I did!) -- here is (one of many) pages with the latest movie preview with some analysis, and here is a Youtube link. Somewhere in there I saw the words Maxwell-Dworkin (the Harvard CS building). And the scene where he is in front of the Ad Board (the Harvard body responsible for disciplining students) looks amusing -- I want to see how close it matches reality in full.

Thursday, July 15, 2010

Arxiv Paper: Heapable Sequences and Subsequences

I haven't been blogging much; consider me on extended vacation. It's helping me prepare for giving up the blog soonish.

But I thought I'd mention our newly posted arxiv paper Heapable Sequences and Subsequences. I mentioned some time back I was looking at an Aldous/Diaconis survey on Longest Increasing Subsequences. That was for this paper, where we look at a variation on this theme. Say that a sequence is heapable if you can sequentially place the items into a (binary, increasing) heap, so each new item is the child of some item already in the heap. So, for example, 1 4 2 3 5 is heapable, but 1 5 3 4 2 is not. Then, just as you might consider increasing sequences and subsequences, you can consider heapable sequences and subsequences. We (as far as I know) introduce this problem and provide some first results, as well as a lot of interesting open problems. We give a simple (greedy) algorithm for determining if a sequence is heapable, but show that determining if a sequence is perfectly heapable (so the heap is a perfect binary tree) is NP-hard. We show that the longest heapable subsequence of random permutation of [1,n] has length (1-o(1))n. And we show some other thing as well.

The paper ends up being essentially combinatorial in nature, but we were actually thinking of a more economic motivation when we started the problem. It's meant to be a variation of in the space of hiring problems (like the secretary problem); here you are hiring for an organization with an org chart -- say in the shape of a perfect binary tree -- and your hiring restriction is that each node is the boss of its children nodes, and therefore must have a higher ranking than those nodes. (One would, naturally, argue whether in practice a boss is always higher-ranked than the employees directly reporting to them, but one can always consider noisy variations in a future paper.) Now this is like a multi-hire secretary problem, but with restrictions.

Given the large number of papers related to longest increasing subsequences, it seems like there might be a lot of interesting things to find in looking at this variation on the them.

Sunday, July 04, 2010

(n)On Vacation

Despite appearance, the blog hasn't yet disappeared. I was on family vacation somewhere tropical.

Even though I was on vacation, it didn't always feel like vacation. With a consulting engagement that required a full-day phone call, several papers to deal with, some unfortunate emergency administrative stuff that came up for a later trip this summer, and various other administrivia, work was with me more on vacation than I would have liked. I'm not the sort that minds bringing some work along on vacation (often to my wife's annoyance), but this time it felt, even to me, like much more of an intrusion.**

The same seemed to be true for my brothers and sister-in-law, however. They're all in business (of various sorts) and having to deal with mail, sit in on conference calls, and otherwise work while nominally on vacation seemed like the norm. There's been plenty written in mainstream media about this (a little random searching yields examples like this), so it's not exactly new, either for me or my family. But the magnitude of it -- and the expectations of others about one's accessibility -- definitely seems to have increased substantially since our last similar family vacation.

On a day-to-day basis, I don't mind the "blurring" of work and non-work time; I appreciate that my job lets me walk my kids to school in return for having to put in some hours after they go to bed. But as a society the level of this blurring seems very dramatic, and I think we'll be figuring out the consequences for years to come.

**I'd exclude my co-authors from this statement. They've actually been quite lenient and understanding; I admit, however, to some guilt in being busy the week before the SODA deadline.

Monday, June 21, 2010

Colocating Conferences -- What Will It Take?

In the ongoing future of STOC debate, one possibility that seems to have significant support is that we should do more colocation of conferences. (See Suresh's latest post, for instance.) That is, rather than expanding STOC itself, it is thought that the best way to make STOC more of an event that theorists will want to attend is to set up a "pan-TCS" conference, that has several related conferences or workshops the same week at the same location. Arguably this would allow STOC to continue essentially the way it is -- which many people seem to want -- while leading to higher attendance and community building.

Given the success of the STOC/CCC/EC mix this year in Cambridge, it's hard to argue against this idea. So I won't. Heck, I think it's a fine idea. The one thing that people seem to be ignoring (or perhaps just forgetting), however, is that this will require some non-trivial organization to make this work well consistently over time. And I'm left wondering how this sort of organization will come about. (Honestly, it's not quite clear to me how things all managed to work out this year!)

Right now, separate conferences organize themselves. Heck, even getting conferences following each other, like FOCS/SODA coming up, to organize their deadlines so people can send their rejects on to the next conference appears to be a strain. (Anyone else think that the 1 week between when we're supposed to hear back from FOCS -- hopefully with actual reviews, but maybe not -- and the SODA deadline -- which includes the 4th of July weekend -- is maybe not enough to make worthwhile improvements and changes?) Now let's think about what has to be done to make this sort of colocation work:

1) Arranging conference hotels, and meeting space, in roughly the same place at the same time. (On another post somewhere, some anonymous people complained that STOC and CCC/EC were not conveniently located -- they were miles apart, and you'd have to switch hotels to go to both easily. Perish the thought! Add "adjusing people's attitudes" to the general list of what has to get done to make this work...)
2) Synching up schedules, including submission deadlines (will one conference in the set be able to take another's rejects for submissions and still have a workable timeline? ), and where possible avoiding scheduling important plenary/invited talks at the same time (sometimes this might not happen, as with this year's EC tutorials overlapping the STOC conference -- see above about "adjusting people's attitudes"...).
3) Managing these relationships and corresponding decisions years -- possibly multiple years -- in advance; some conferences have different ideas about how often they will be run internationally, whether they should be connecting just to theory or other areas (e.g., EC is also connected to AI), etc. So conferences will be joining and exiting this system regularly, requiring further high-level organization and significantly more advanced planning that we currently have.

I should be clear that I don't think any of these organizational issues can't be overcome. It's just that they'll require, naturally, some organization. Which translates into more time (at least up front -- maybe less time years down the line) spent by people in our community to put these sorts of things together, and/or more costs (as, like with FCRC, the organizational aspects are left to other organizations -- that charge for it). So for all of you eager for this style of change -- as opposed to, for example, simply increasing the number of accepted papers -- I eagerly (very, very eagerly) await to hear your voices chime in when volunteers are sought in the future to help make this sort of thing happen.

Saturday, June 19, 2010

ISIT Day 2 (2010)

First, finishing up day 1, the banquet was, as expected, a fairly nondescript affair involving chicken of some sort, which I happily got to spend with a group of people from EPFL (Lausanne), including Reudiger Urbanke. The big news announced was Shlomo Shamai won the 2011 Shannon Prize, which (as I understand it) is the rough equivalent of the Turing Award for Information Theory.

ISIT, I think, is more like the sort of large conference that Lance Fortnow was trying to push for in his thoughts on re-formatting STOC (as we discussed here and here). I talked a bit with one of the staff who manages the conference (when it's in the US, every other year). This year's attendance was around 750, with over 300 students. He said that was down from previous years, by roughly 10% or so -- 820's was the usual number. His thought was that it was the economy; I opined Austin wasn't the easiest place to get to, especially for people abroad. (Previous years had been in Chicago and Seattle.) At the banquet, they said there was something just under 1000 submissions with about 550 accepted. So they accept over 50% of the papers, but there's still a filter. I'll leave it as a comparison point with STOC -- which has a tougher bar for acceptance, and much less attendance. (And also loses out on several other activities -- see their meeting schedule here. There's a lot of events for students, meetings of various boards, etc. as well as the standard tutorials and plenary speakers.)

The highlight of day 2 for me was going out for Texas barbeque for lunch -- a postdoc I had met at dinner took pity on me when I said I had gone out to a nondescript place by the hotel the day before and been disappointed, and volunteered to take me to an excellent barbeque place she had found earlier in the week. (She was from France, so I accepted her food judgment.) Great food, and we discussed what we liked and didn't like about the conference.

There seemed to be 50+ people in my session when I gave my talk, somewhat surprising given it was the next-to-last session and I'm sure several people had already left the conference. Yashodhan Kanoria gave a great talk after me with some really nice new insights into the deletion channel (paper is on his web page) -- it seems like significant progress in our understanding, as he and his advisor Andrea Montanari seem to have developed insights into what the right "random codebook" looks like, at least for small deletion probabilities. It make me think there's some real chance for significant progress on the deletion channel going forward.

Thursday, June 17, 2010

Fly-By Conferences (ISIT edition)

I'm writing this from ISIT (Int'l Symposium on Inf. Theory) in Austin. In fact, I'm sitting in on the session on for "Coding for Memories", a topic I did some work on last year and could see returning to (if I could figure out a problem I thought was sufficiently fun theoretically and sufficiently close to practice). I'm at the conference to give the talk on "Tight Asymptotic Bounds for the Deletion Channel with Small Deletion Probabilities" (work with Adam Kalai and Madhu Sudan; here's my web page version which might not be entirely up to date) tomorrow. This is one of those massive 8-parallel session conferences. For the record, I count about 50 people in the room, and my guess is this is one of the less popular sessions. I'm finding the talks interesting, though I'd admit that perhaps they're for a specialized audience. The pros-and-cons of the multi-session large conference.

I just got in a few hours ago, and I'm afraid this conference is a "fly-by" for me -- in and out as quickly as possible. (Sadly, from Boston to Austin, there doesn't seem to be a convenient direct flight these days, so even a fly-by is a 2-day affair.) Too much recent conferencing-- Bertinoro/STOC/EC -- meant I was eager to minimize this trip. I'm the first talk in my Friday early afternoon session, and in order to get a flight out to Boston Friday, I have to leave before my session finishes!* I feel a little bad about that, but I'd feel worse about not being home until late afternoon Saturday.

I'll enjoy my time here -- plenty of talks look interesting**, plenty of people to say hi to, and plenty of other work I can get done while at a hotel or in an airplane. But it does make me wish perhaps there was another way. Giving a talk by video? Conferences in an electronic avatar space? I know classes that are recorded and put online generally get smaller attendance, as students don't come to class as often knowing they can catch a stream of the lecture later -- would our way of doing conferences change significantly if people could "go" to conferences electronically without actually traveling to them? (The experience of STOC/EC, where I went home at the end of the day, was vastly different for me than other conferences where I'm there all day. There are pros and cons, but in general, I'm at a stage where I like going home. I'm sure that will change as my kids grow up and tell me they'd rather I stay away.)

Until such conference changes occur, though, I'll continue to be timing my flights carefully.

* Thanks to ITA software, I also found I could leave later, fly west to LA, and then take the red-eye to get to Boston Saturday morning in plenty of time for breakfast. I almost did this -- more frequent flier miles! -- but the ticket was a couple hundred dollars more. And my wife reminded me I'm not exactly at my best after a red-eye.

** For the CS crowd, Zhou/Bruck had a paper on extracting independent uniform random bits from a Markov chain efficiently -- a TR version seems to be here. This is an old problem -- Manuel Blum, for instance, had a paper on it. In fact, my first lecture in my algorithms/data structure class is on the simpler, earlier-studied problem on how to get unbiased bits from biased bits, as I described in this post a while back.

Tuesday, June 15, 2010

"Area Dean" : What's That Mean

In a few weeks, I "officially" take on my new job as "Area Dean" (read, "Chair") of CS. What will in entail?

Formally, there appear to be a number of responsibilities, most of which involve being the "interface" between the rest of the CS faculty and the rest of the administration. These include:

1) Managing our class schedule, and keeping an overall eye on our curriculum -- where we're aiming for continual improvement.
2) Overseeing promotions, mentoring, etc. for our tenure-track faculty.
3) Helping manage searches for tenure-track faculty, and managing the process for various non-ladder positions.
4) Overseeing certain budget items.
5) Representing CS (and the School of Engineering and Applied Sciences) at other university committees where needed.
6) Helping deal with space management, for visitors and others.
7) Various other bureaucratic stuff.

Naturally, I have my own biased view. Obviously there's lots of basic paperwork and other background management stuff that goes with the job. That will get done, and I should have plenty of assistance for those aspects of the job. But in terms of priorities, the way I see it, my job should be focusing on the following:

1) Helping our junior faculty be as prepared as they can be as they go through promotion stages -- especially the tenure stage. And managing the processes so they goes smoothly through the bureaucracy.
2) Pushing as much as possible for hiring, in Computer Science and, as appropriate, closely related fields (like Applied Math).

These are far and away my top two priorities, and they're obvious ones, in terms of long-term improvement of CS at Harvard. Do right by the junior faculty we hire, and hire more. I've long argued that we have a great, but small, department. Our path to becoming better, in my mind, is to grow, and I believe this growth is best achieved by hiring, and tenuring, more great faculty.

As a further priority, my personal take going in is that the next priority is to make Computer Science at Harvard more visible, both internally on campus, and externally to the world. I think we've got a great base to build on here: our intro programming course is now one of the biggest classes at Harvard, we've got various initiatives going with other areas of the university, the Deans of the Faculty of Arts and Sciences and Radcliffe come from CS, we've just hosted EC/CCC (and helped largely with STOC), we're figuring out how to expand our already significant efforts in distance education, and so on. Still, with our size, visibility is always a battle -- we can continue to improve on it.

Of course, these goals might change depending on the feedback of the faculty, who I'm sure will tell me what they're thinking. (Actually, I plan to go ask them. :) ) And for those who think I'm forgetting something important, I'm sure undergraduate and graduate education will also play a big role -- I just view those as shared faculty responsibilities, and not specific priority items for the chair. Hopefully, I'll just help our activities in these areas move along. Though again, I might be told otherwise by my peers. We'll see soon enough.

Cherry Murray appointed to oil spill panel

Dean of Harvard's SEAS, Cherry Murray, is to be appointed to President Obama's oil spill panel.

Monday, June 14, 2010

Crimson Article on SEAS

This article from the Crimson on the state of Harvard's School of Engineering and Applied Sciences from a couple of weeks ago serves as a good primer for me on some of the many things I'll be dealing with as Area Dean.

Saturday, June 12, 2010

Hectic Week, Finally Over!

Co-located conferences make for a hectic week! Again, my thanks to the Microsoft local arrangement team (especially Paul Oka) for setting up STOC, which I thought went very well. (The space worked out very nice -- even with about 75 more registrants than we were expecting!) Big thanks also to Salil Vadhan and the other local organizers of CCC -- I wandered through their setup multiple times on the way to my office and it looked great -- and to David Parkes, Yiling Chen, and Jennifer Wortman Vaughan for the local organization of EC -- I really like that 52 Oxford lower level as a conference space, especially the red couches and the quiet-back-entrance lecture room.

I maintain that Cambridge/Boston is a great place to hold co-located conferences like this, and we should try to do the same thing again (perhaps with a different conference mix) in just a few years. We have lots of researchers in the area, lots of students (to attend and help set things up), and lots of universities that can help host in terms of lecture space. And a very viable airport to travel to. If you look at the list above, though, I do think Harvard really did much more than its fair share this time. Maybe next time around one of our smaller, less-resourced, less visible neighbors, like MIT, can step up to the plate.

Friday, June 11, 2010

STOC/FOCS Opinion : Guest Post by Boaz Barak

Boaz Barak asked to give a guest post on the recent STOC/FOCS issue and the question of accepting more papers. So I hand the floor over to Boaz:

Do we really need more papers in STOC/FOCS?

Continuing the discussion in the STOC business meeting, I wanted to offer a somewhat opposing opinion to Lance's and Michael's view that the number of accepted papers should be significantly increased. (Since STOC and FOCS are at the moment fairly indistinguishable, the discussion below applies to both.)

In my opinion, the primary objective of a "flagship" conference such as STOC/FOCS is to highlight to researchers recent results of high quality and/or interest outside their area. We get updates on our own subfield through specialized conferences, and so the flagship conference is meant to keep the TCS community informed about progress in the entire field, and enable the transfer of ideas, techniques, problems and people across sub-fields.

An important secondary objective is to serve as a meeting place for the community, giving people in different geographic areas a chance to talk and collaborate. Why is this a secondary goal for STOC/FOCS? Because (1) opportunities to meet and collaborate can be achieved in workshops, without the tremendous effort of the refereeing process, and (2) at the moment our community has no alternative conference (or journal: I don't want to enter that debate) that performs the primary objective nearly as well as STOC/FOCS. (As an aside, people in the business meeting raised the possibility of an ICM-style meeting in TCS which sounds like a good idea to me.)

Traditionally conferences were also the primary way to make papers physically available quickly, but that has now been largely supplanted by online archives such as eccc/arxiv/eprint, and so the primary objective of refereed conferences today is indeed filtering and highlighting. But this is a very important role: without STOC/FOCS I would have only heard of papers outside my area if they were by local people, had some "buzz", or the title caught my eye- while the conference review process has its problems, it is certainly less superficial than that.

The fact that Theoretical Computer Science has grown in size and scope only makes this selection role of a flagship conference more important. Since there was no accompanying growth in our free time or attention span, I think the relevant metric is not the acceptance rate but the absolute number of papers accepted. Indeed, already STOC/FOCS together accept about 150 papers per year which is too much for anyone to follow, especially given that we need to follow specialized conferences as well.

Nevertheless, I think STOC/FOCS have on the whole been very successful, and a look at the STOC 2010 program will show some very exciting results in a variety of areas, many of which I would not have heard of otherwise. Throughout the years there were several examples of ideas and techniques transferred across areas, and rapid progress and development of other areas, that were greatly facilitated by STOC/FOCS. Program committees have always made and will always make mistakes, but the current form is still much better than having nothing at all. (E.g., I am not so naive to think my non crypto colleagues are so interested in cryptography so that even without STOC/FOCS they would go through CRYPTO/Eurocrypt/TCC proceedings to learn of the cool theory papers...)

I ignored above one more "objective" of a conference, which is to rank people in the context of hiring/promotion. The pitfalls of publication counting are well known, and here is not the place to repeat them. In any case we should not be trying to optimize our conferences for that purpose. Needless to say, accepting more papers to STOC/FOCS will not create more positions in theoretical computer science.

Are STOC/FOCS perfect? No - they could be improved in several ways. First, while the diversity of areas is perhaps unmatched by any other conference, STOC/FOCS could use better coverage in some areas (e.g., efficient cryptographic constructions, exact algorithms and hardness, and many others). Some great theory work was rejected, even multiple times, from STOC/FOCS, and some great work was never submitted. In that respect I liked a lot Dan Spielman's suggestion in the business meeting to find a way to include in STOC/FOCS great recent theory papers from other conferences. The idea of a poster session is also interesting. All talks should be videotaped and put online, so that even people who cannot make the conference (whether it's due to geography, family, or funding) can follow it. (Again, our goal should be not to maximize registration income but to maximize impact.) Personally I prefer single session rather than parallel sessions, and perhaps slightly longer talks as well, even if it means accepting a bit less papers. A pet peeve is double column papers, and page limits should be rethought now that paper proceedings seem to be on the way to extinction.

As a final note, why do I oppose adding just 10 more papers to the program? I agree this will not make much of a difference, though I think this is a change in the wrong direction. I also doubt there is any way to evaluate the effects of such a minor change. The fact that this STOC had 100 more registrants than last one demonstrates that other factors such as attractive location, strong invited talks/tutorials, and co-located conferences and workshops make much more of a difference in attendance. Perhaps SIGACT can also use some of its $800K surplus for travel support even for non student participants, and in particular people from overseas.

Thursday, June 10, 2010

Branding Your Research (and Yourself)

In honor of being at EC, a post on branding.

In theory, your research will be judged and appreciated based on its intrinsic quality, and your scientific reputation will follow accordingly.

In practice, unsurprisingly, branding helps, both you and your research. I certainly wouldn't claim to be an expert, or even particularly good, at branding, although I'd acknowledge I've benefited from it throughout my career. So you should take this as a starting point, rather than a final word on the subject (and comments are very, very welcome).

By branding I mean any of the sorts of things that might be considered marketing. Perhaps one of the most basic methods of branding involves something as simple as choosing project names. Well-chosen project names are a simple way to get people to remember your work -- when we hear PageRank, Chord, smoothed analysis, or XORS in the Air, for example, it resonates; they've become lasting brand names. Picking good names is hard, though -- it's easy to go overboard, or come up with something that just sounds ugly. (I should note that networking people, perhaps because they build specific systems, seem much better at and more aware of this kind of branding than theory people in my experience.) I'm not sure if cool-sounding project names can help you get research funding (Robobees!), but my guess is it doesn't hurt.

Another basic approach is to become associated with a certain type of research -- especially, if possible, if you've started a chain of research, although you can certainly become well-associated with a line of work you didn't initiate. Often this is a certain subarea -- "they do online algorithms", "they do network coding" -- or can be a certain technique "they're expert in their use of martingales." Perhaps best, early in your career, is to be associated with something fairly specific -- "they work on cuckoo hashing." This is why, as a graduate student, you're very strongly encouraged to find a thesis topic where you can write multiple closely thematically related papers -- it helps you to get known for something, which is an important benefit in a competitive job market. One can argue whether it's really a good thing that people's work is reduced, at times, to such a shorthand, but when dealing with dozens of potential candidates, having a natural and memorable label is truly helpful.

Indeed, the other major time in academia that branding seems important is when you come up for tenure. I was explicitly asked what I was "known for" when I came up for tenure. (Hint: if what you're known for is just your PhD work, that's usually not a good sign.) It makes sense, when your case is being prepared for a broader audience outside your immediate field, that your work be summarized in your tenure case -- preferably into key "brand names", easily describable by key words and phrases (power of two choices, low-density parity-check codes, Bloom filters, etc.) -- that will match the summaries that come in from the external letters. This follows the principle (that I try to emphasize to my students) of making it easy on the people who are grading you. If the letters match your case description, that's good positive reinforcement.

Of course, another aspect of branding is standard self-promotion -- giving talks, for instance. Or writing a blog -- shameless, but effective, self-promotion. (I definitely noticed I was invited to give more talks after starting the blog.) Writing a blog is certainly not for everyone, but you should find other ways of promoting yourself and your work.

One perhaps underestimated and lesser-utilized but powerful branding method is to write a survey -- or, if you're up for it, a longer treatise, or even a book. I've written several surveys, and besides being useful for myself, I've been amazed how many people seem to read (and even cite) the things. It's possibly been my most powerful consciously used branding tool (outside the blog). I'm already thinking now of the next survey I want to write.

An eventual goal, of course, is to obtain many brands that are associated with you -- different research topics, in potentially different areas. This is helpful because of the obvious reason: if you're known in multiple contexts, you'll get more chances for any specific individual to know you. Of course this takes time. As a graduate student, it seems to make more sense to work on developing a single, strong self-brand rather than many. (That's not to say that you just work on one thing -- some breadth is generally appreciated and looked for at hiring time -- but there's usually a clear focus on one line of research.) Once you become a faculty member, you can start developing more brands. If you have many students, this may happen naturally -- each student will (or at least should) be trying to develop their own brand, which you may be associated with as an advisor. In some sense, your students can be a brand extension, although hopefully they can also become their own independent brand.

I've been fortunate, I think, to have over time apparently developed multiple brands in multiple areas. It's fun for me that when I visit someplace to give a talk, the theory/networking/information theory people all seem interested in talking with me. One thing that's clear is that people see your brand from the context of their own work, which is worth keeping in mind. Sometime after I wrote my survey on power laws (and another associated paper on power laws), Barabasi (of various book fame -- Bursts and Linked) invited me to come give a talk to his group. He nicely introduced me as being most famous for my work on power laws -- something that was, I imagine, certainly true from his perspective. People will generally know at most one or maybe two of your brands, and that's fine.

So what are the downsides of branding yourself and your work? You do run the risk of getting pigeonholed, like a child star, known for a small bit of your larger body of work. I'm still widely known as the "balls-and-bins" guy. Worse yet, you may start to believe yourself that you're tied to the area you're known for, and become afraid of trying something else. Becoming "the expert" in an area means you'll get asked to review many of the papers in that area. After all, all the papers cite you, and that's what the editors see. And, of course, branding is a time-consuming activity, that takes patience and energy. Still, for most people, a little conscious attention to research branding efforts can probably go a long way.

I'm not sure how far this analogy can go. Can you develop brand loyalty for your research? I think so -- there are certainly researchers I try to follow regularly. Can you re-position brands? Parallel algorithms are new and exciting again, right? Are there any "iconic" brands in research?

I'm sure some of you think spending a whole post on branding is somewhat silly. For the theorists, let me point you here.

Wednesday, June 09, 2010

EC Day 1

EC is an interesting mix of theory/AI/economics and many other things as well (data mining, social computing, user interfaces....) Definitely a bit of a hodgepodge, but interesting in its variety. If you don't look at the titles/abstracts ahead of time, you never quite know what the next talk will bring. (Maybe I should be looking at the titles and abstracts ahead of time...)

Some quick highlights

Giorgos Zervas gave the talk on our Swoopo paper.

Rahul Sami gave an interesting talk based on the idea of dealing with external incentives in mechanism design settings.

Ramesh Johari gave a talk about a simple model meant to abstract the competing effects of network effects and congestion.

Sven Seuken talked about market design and analysis of a P2P backup system.

I'm sure there were other interesting things, but I was in and out of sessions at times.

Monday, June 07, 2010

STOC 2010, and Some Business Meeting Notes

STOC 2010 appears to be running quite smoothly. I've been popping in and out -- the problem with being a local is that I actually have to go home at reasonable times (and stop by the office quickly). While nominally (thanks to Lance) I was appointed Conference Chair, my role, happily, was minimal. Great thanks to the efforts of the Microsoft Local Arrangements Team -- particularly Paul Oka -- and also Yael Kalai, Adam Kalai, hosts Jennifer Chayes and Christian Borgs, and anyone else I'm forgetting. (Personally, I'm now all for a Microsoft team managing local arrangements every few years... all in favor?)

Thanks to everyone who has offered condolences for my new position, and special thanks to those who said they'd hope my scientific career might revive a few years hence.

A quick word on the business meeting. I wasn't actually there, but I understand Lance briefly discussed "The Future of STOC." Let me briefly give my take, as I understand there might have been some confusion.

1) Several of us on the SIGACT executive committee have been concerned that STOC seems to be at best stable (and perhaps slightly deteriorating) in terms of attendance the last several years, even as our field is ostensibly growing. Perhaps, we think, STOC/FOCS are not adequately fulfilling their roles as the central, flagship events of the theory community. (Needless to say, given this year's attendance, perhaps things are brighter than we think.)
2) So we think about it. For example, some of you might recall my wacky Double One, Half the Other post from November.
3) Lance comes up with an even more drastic proposal, and tries to get some feedback on it from targeted members of the community. I believe he tried to explain this proposal at the business meeting. To be clear, the feedback before the meeting was largely very negative; although many people think things could be structurally improved, there's a lot of different opinions as to what form that should take. This proposal was never meant to be on the table at the business meeting.
4) The one thing that DID seem to have reasonably widespread support is that we could accept more papers. We still have some room if we made it a full 3-day schedule. STOC acceptance rates have dropped from generally well over 30% in the 1990's and early 00's (some years it was over 40%!) to 25-30% the last few years. While upping the number of papers is, admittedly, a small delta, and not likely to change some potentially larger systematic issues with the conference, it seems like a good idea -- it should get some more people to come, and perhaps will allow a bit more leeway in terms of what is accepted. Moreover, if people liked the change, then perhaps we could think of further changes (a 3.5 or 4-day conference? triple sessions like SODA? your idea here?) that could further open up the conference.

So the SIGACT EC came up with the proposal that we advise next year's chair that, as long as the paper quality warrants it, there's no reason to aim for about 75-80 papers (the apparent "target" of the last many years) and we would be happy if they might aim for 85-90 papers.

Again, I wasn't at the business meeting, but I heard even this modest proposal did not receive enthusiastic support. I'm starting to get the feeling there's a non-trivial minority that thinks the conference is going just fine, and are in principle against any change. I can certainly understand trepidation at radical change, but I can't see any reason not to try this modest experiment, which seems to have little to no downside and some reasonable possible upside.

So feel free to explain YOUR opinions in the comments.

Sunday, June 06, 2010

Conference Attendance : How to Keep it Strong?

In honor of the STOC business meeting coming up shortly:

STOC, CCC, and EC are all apparently poised to have very strong attendance this year -- from what I understand, greatly improving over previous years. Why?

I suggest two obvious reasons:

1) Location: Cambridge is clearly a great location, with plenty of major universities and research labs nearby, and a major airport readily accessible.
2) Co-location: From what I can tell, having all these conferences in the same place in the same week is helping boost attendance for all of them.

Can we take something away from this for future years (please!)? Having conferences in easily accessible, researcher-rich cities is a good idea. ("Geographic equality" is overrated.) A reasonable amount of co-location (arguably less than a typical FCRC), appropriately managed, is quite desirable. Let's test out these principles over the next several years as we try to arrange our conferences...

Announcement: Matt Welsh Tenured

I'm very, very pleased to announce that Matt Welsh's Harvard tenure case has been successful.

There's a number of wonderful things I could say about Matt and his work, but I'll limit myself to two highlights. Matt is well known to be a leader in the field of sensor networks/distributed systems, but his work particularly stands out for his ability to look at how these technologies might be used outside of computer science. In particular, he's worked on sensors for medical care (the CodeBlue project and the Mercury project) and for "big science" (monitoring volcanoes and monitoring weather/pollution). At Harvard, because CS is smaller than at some of competing institutions, we want professors who are not only leaders in their area, but who can also take advantage of Harvard's breadth and other strengths, with an eye toward showing the world just what CS can do. Matt is an excellent example of the type of scientist we're looking for.

Outside of his science, Matt is a great colleague. When I think of a word to describe Matt, "leader" comes right to mind. When Matt sees something he wants to improve in our program or how we run things, he studies it, presents his opinions, garners support, and finds a way to make the change happen -- often taking on much of the required work himself. He has both incredible energy, and the ability to persuade people to follow where he wants them to go, two qualities I associate with leadership. Because of this, I know his presence will benefit CS at Harvard in countless ways in years to come.

Congrats to Matt! (Now go read his blog...)

Saturday, June 05, 2010

SIGCOMM Travel Grants

I was asked by the SIGCOMM program chairs to remind everyone that there are travel grants available, especially for students, postdocs, and junior faculty. Since the conference is far away this year (India), they've tried to get a lot of travel support. They've extended the deadline to 11:59 pm PDT on June 12, 2010.

Information can be found here.

Friday, June 04, 2010

Will Google Buy ITA?

As long-time readers of the blog know, I am a fan of ITA software, the company with the fare-finding technology and search engine that powers Orbitz and several other airline web sites. Indeed, for my recent trip to Bertinoro ITA saved my grants several hundred dollars -- they found a pair of flights involving two separate, currently non-partnered airlines at a price that couldn't be duplicated on several other sites I tried. (The flights should, I hope, even meet NSF's unwieldy international flight rules.)

For at least six weeks, there has been buzz that Google was looking at buying ITA, for the tune of roughly 1 billion dollars. But not much has been heard of late, with possible reasons being described in this recent post at tnooz. Indeed, I ran into an ITA person I know "on the street" a while back and asked when I could officially congratulate them, and was politely brushed off.

It's not clear to me if Google/ITA would be a match made in heaven or not, but I admit I'm eagerly awaiting to hear the news (one way or another). I would place ITA in the category of companies that show that cool things that can come out of smart algorithms and data structures (see this old writeup), so I wish them success. I think reaching the billion dollar selling point would certainly be one reasonable measure of success.

Don't be surprised to see more of this story in the news in the future.

Thursday, June 03, 2010

What Work to Give Up?

I'm not yet sure how many hours a week being a Chair (oops, "Area Dean") will chew up, but I think it's safe to say it will be a non-trivial number. I'm generally a fairly busy person, and while I imagine I'll have to (somewhat happily!) cut down on my bad TV-watching habit, I'll also be cutting back on some of my other work responsibilities. Obviously, as much as possible, I'd like "research time" not to be part of those cutbacks. (I've heard that's a problem with this job in general.) As Matt's blog post on secret lives of professors calls attention to, professors are just naturally overloaded; it's good to stop and think once in a while about what you can happily and safely do less of.

So, what could you give up to get yourself back a few hours a week?

For me, the blog is (sadly) one of those things. Here's some other things on my list:

1) Giving up an editor position: I'm going to step down as an editor of SICOMP. I've found I really don't enjoy being an editor -- the only thing worse than reviewing a paper yourself is having to cajole someone into agreeing to review a paper and then, after getting reminded yourself by the automatic system how long it has been since the paper was submitted, having to go back and cajole the reviewers into actually finishing the review. I'm happy to give it up. On the other hand, it won't save me so much time; I dislike the job so much I already avoid spending time on it now...

2) Fewer PC committees: I'm generally a bit too friendly about agreeing to serve on PCs. I counted and it looks like I've done about 35 over the last 10 years. Some have been small and some have been big, but that seems, from what I can tell, to be more than average. (What's the "average"? Is it different for networking people than for theory people?)

I don't want to give up on doing PCs completely. I find they're fun and interesting. But I think I'll cut it to 1 or 2 a year while on the job. Strangely, I'm most likely to say no to "small" conferences (like FUN, or ANALCO, say). They take less time, but I find they're less fun and interesting to do.

There's one thing that would definitely get me to a PC. I chaired STOC last year, and think it could be fun to instead (co)-chair a networking conference. Hint, hint, hint, powers-that-be...

3) Less reviewing. I seem to get asked to review papers a lot. (In part, I suppose, because I'm "well-branded". If it says "Bloom filter" on the title, and I didn't write it, odds are it's being sent to me. To a lesser extent, that's also true for "cuckoo hashing", "deletion codes", and things related to the power of two choices. I aslo get lots of things related to various flavors of LDPC codes and power laws. Like other bloggers I hope to do a post on "branding" in the future, but this is one of the downsides.)

I've gotten much better about saying no to reviewing since tenure, and even moreso since our third child was born. But I'm sure I can do better about saying no. (And I'm sure I'm still doing more than average.)

4) Less "other university work". I'm on one or two university committees that don't take much time but aren't really important. Good time to drop those.

Looking at all these things, it's clear I'm aiming at reducing other service activities, which happily makes sense. Since serving as Area Dean is really a service activity (for Harvard), other service activities are the first to cut in its place. As much as possible, I don't want my home life, research, teaching, advising, and consulting time to suffer.