Friday, February 26, 2010

Conflicts of Interest, Yet Again

I was just asked to serve on the ACM CoNEXT 2010 PC (I'll have to think about it -- NSDI, SIGCOMM, and CoNEXT all in one year?), and the chairs (Muriel Medard and Tim Griffin) sent along a note explaining the conference, the reviewing process, etc. I was struck by the explicit and rigorous conflict of interest policy they stated:

 A program committee member (including the chair of the committee) is
 considered to have a conflict of interest on a submission that has an
 author in any of the following categories:

   1. the person themselves;
   2. a past or current student or academic adviser;
   3. a supervisor or employee in the same line of authority within
   the past five years; 
   4. a member of the same organization (e.g., company, university,
   government agency, etc.) within the past five years; 
   5. a co-author of a paper appearing in publication within the past
   five years; 
   6. someone with whom there has been a financial relationship (e.g.,
   grants, contracts, consultancies, equity investments, stock
   options, etc.) within the past five years; 
   7. someone with whom acceptance or rejection would further the
   personal goals of the reviewer (e.g., a competitor); 
   8. a member of the same family or anyone considered a close
   personal friend; or 
   9. someone about whom, for whatever reason, their work cannot be
   evaluated objectively.
These guidelines are roughly what I've come to expect from other networking conferences, give or take minor variations. I feel I have to point out the remarkable difference between how conflicts are treated in the networking world and the theory world. In the theory community, #1 is a standard conflict; #2 and #3 are also pretty standard although, in my experience, definitely not universally applied; beyond those, conflicts are generally left up to the individual PC member to declare if they happen to feel like it.

There's been debate on this blog about the subject before, and I certainly don't mind there being more.  I maintain that the theory community is far too lax in its handling of conflicts.  We can certainly reasonably argue whether the true impact of conflicts in actual decisions in theory conferences is negligible or substantial -- a matter of appearance or a matter of substance.  I can say that, in terms of appearance, people from the networking side (and other communities) are shocked by the lax approach adopted by the theory community.

Thanks to Muriel and Tim for allowing me to post from their document.   

Thursday, February 25, 2010

STOC Budget Questions

I hate to follow the interesting conversations on FOCS/STOC/SODA (please keep commenting) with the mundane, but Lance forced me, er, asked me to be General Chair for STOC, which means looking over things like the budget. Registration fees will probably end up being around $500 for early registration, $250-$275 for students. The numbers are still being played with.

Here's a bunch of questions that arise. I'm happy to hear input.
  1. Should all PC members' expenses for the PC meeting be paid for? That works out to, roughly, $80-100 on the registration per attendee. Hotels and airfare add up, and keep in mind that, the way the ACM forces us to do the budget, you need to budget over 100% of the nominal cost to deal with contingencies.

    In many other areas, it's assumed you'll pay your own way to the PC meeting. For the networking conferences I've PC'ed, they cover meals, and usually have a very nice dinner after the work is done. For the theory conferences I've helped manage, I've usually aimed to cover everyone's meals and hotel (though the dinner is less nice than for the networking conferences...), and to cover anyone who couldn't fund their own travel. That works out to more like $40 per attendee.
  2. Do we really need both morning and afternoon coffee breaks? The afternoon coffee break every day adds something like $25 per attendee.
  3. When you look at fixed costs, every student who attends is actually a loss that has to be covered from elsewhere. Is this the right way to go? (I like to think that the corporate sponsorships, from Microsoft/Google/IBM/+others, should be first thought of as going to reduce the cost of student attendance, so I think this is still the way to go.)
  4. At what point do registration fees become a noticeable concern?
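
To make the arithmetic behind question 1 a little more concrete, here is a rough sketch. Every number in it (PC size, cost per PC member, number of paying attendees, and the contingency factor) is a made-up assumption, chosen only so that the results land near the ballpark figures quoted above; none of them come from the actual STOC budget.

```python
def surcharge_per_attendee(pc_members, cost_per_member, paying_attendees,
                           contingency_factor=1.15):
    """Registration surcharge needed to cover PC meeting expenses.

    contingency_factor > 1 models having to budget more than the
    nominal cost; the value 1.15 is a hypothetical placeholder.
    """
    if paying_attendees <= 0:
        raise ValueError("need at least one paying attendee")
    budgeted = pc_members * cost_per_member * contingency_factor
    return budgeted / paying_attendees


# Hypothetical scenario A: cover airfare, hotel, and meals for everyone.
full = surcharge_per_attendee(pc_members=25, cost_per_member=1000.0,
                              paying_attendees=300)

# Hypothetical scenario B: cover only meals and hotel.
partial = surcharge_per_attendee(pc_members=25, cost_per_member=400.0,
                                 paying_attendees=300)

print(f"full coverage:    ${full:.2f} per attendee")
print(f"partial coverage: ${partial:.2f} per attendee")
```

The point of the sketch is just that the surcharge scales linearly with what you cover per PC member and inversely with attendance, so the contingency padding is paid by every registrant.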

Tuesday, February 23, 2010

Guest Post from David Karger

Continuing from my last "controversial" post, David Karger offered the following long comment, turned into a guest post:


I wanted to post a comment on Mike's FOCS/STOC post, but fittingly for one of the dinosaurs he mentioned it was too big for the comment length limit. Mike's been kind enough to offer to post my comment as a guest post instead.

Mike's question is one I care about a lot. I still respect theory and do work in it, but as Michael says, much of my attention has been drawn into other areas. Many of them have absolutely nothing to do with theory (see last year's ethnographic study of people's use of pencil and paper for notetaking in TOIS 2009 or our AJAX-flavored interface for visualizing and navigating semistructured data in UIST 2009).

But always, some of my favorite projects are where theory provides the answers to problems that matter in other areas. We just published a paper in Nature Genetics that used some simple applications of max-flow to help biologists visualize the "important" influences in biological networks (we didn't need to find NP-hard integral solutions because the scientists wanted to see all the possibilities in the mix). Before that, we applied a beautiful JACM paper of Alon et al., on finding longish paths in a graph, to a problem in natural language processing---designing a procedure to figure out a best selection and ordering of words for a machine-generated summary of a machine-generated document---and published a paper at NAACL, a linguistics conference. I still remember thinking, when I first read Alon et al., that it was one of the prettiest and cleverest ideas I'd seen in a while, but that it would never be useful for anything practical. Ironically, when my colleague Regina Barzilay outlined the language problem to me, it was exactly the Alon problem with no need for translation; my entire contribution was to know that a solution existed (and thus keep up theoreticians' reputation for being smart enough to solve any problem instantly).

Other applications of theory to practice have required more work. Mike recently wrote about our paper showing how to design "network codes" for efficient multicast; the core insight of this work was to connect it to the beautiful results that we all study in randomized algorithms courses, on finding perfect matchings by placing random numbers in a graph's Tutte matrix. Most substantially, our line of work that led to Danny Lewin and Tom Leighton's founding of Akamai began with a study in STOC of some theoretical problems around handling flash crowds on the internet, and also generated a whole line of research on building robust and scalable peer-to-peer systems.
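
For readers who haven't seen the Tutte matrix trick, here is a minimal sketch of the textbook randomized test. It only decides whether a perfect matching exists (it does not find one), and it is not the network coding construction from the paper. The function names, the choice of prime, and the number of trials are all illustrative assumptions, and the graph is assumed to be simple (no self-loops or repeated edges).

```python
import random

P = 2_147_483_647  # the prime 2^31 - 1; any large prime would do


def det_mod_p(matrix, p=P):
    """Determinant of a square integer matrix modulo a prime p."""
    m = [[x % p for x in row] for row in matrix]
    n = len(m)
    det = 1
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return 0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det = det * m[col][col] % p
        inv = pow(m[col][col], p - 2, p)  # inverse via Fermat's little theorem
        for r in range(col + 1, n):
            factor = m[r][col] * inv % p
            if factor:
                for c in range(col, n):
                    m[r][c] = (m[r][c] - factor * m[col][c]) % p
    return det % p


def probably_has_perfect_matching(n, edges, trials=3, rng=random):
    """One-sided randomized test on vertices 0..n-1.

    True is always correct; False can be wrong with small probability.
    """
    for _ in range(trials):
        t = [[0] * n for _ in range(n)]
        for u, v in edges:
            x = rng.randrange(1, P)  # random nonzero stand-in for x_uv
            t[u][v] = x
            t[v][u] = (-x) % P       # skew-symmetric entry
        if det_mod_p(t) != 0:
            return True
    return False


print(probably_has_perfect_matching(4, [(0, 1), (1, 2), (2, 3)]))  # path
print(probably_has_perfect_matching(4, [(0, 1), (0, 2), (0, 3)]))  # star
```

The symbolic determinant of the Tutte matrix is a nonzero polynomial exactly when the graph has a perfect matching, so a random substitution over a large field is very unlikely to hit a root by accident.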

With the exception of the first paper on consistent hashing, none of this work has appeared in theory conferences. I think there are several reasons for this. Selfishly, for the author it is much more fun (and valuable in generating future research leads) to present the work at non-theory conferences. It holds the same attraction as tourism, going to strange new places and learning new things from the experience. There's also vanity---the allure of being an exotic theoretician among practitioners instead of one of a crowd of better theoreticians than you. Most important, if you want your work to have an impact on the applied areas, you have to publish in their conferences so they'll pay attention---they don't read STOC/FOCS.

But the second reason is more problematic. Many theoreticians would tell you (some have certainly told me) that the above papers are "not theory". That by dint of their having applications, they are no longer suitable for STOC/FOCS. That these papers have a different home, and FOCS/STOC should be reserved for "pure" (i.e. homeless) theory research. I've been on STOC/FOCS PCs that have rejected nice applications of theory on the grounds that the theory part was too elementary.

In part they are right. The biology and NLP papers I mentioned above did not prove any new theorems. But the omission of applications papers from STOC/FOCS means that the theory community is failing to celebrate one of its greatest contributions! There's always been a divide in the theory community between those who are enamored of theory problems that help them understand the deep nature of the universe and computation (scientist/mathematicians) and those who see theory as a way of thinking about solving concrete computational problems that often emerge from other areas (engineers). I think that STOC/FOCS is making a mistake by focusing too much on the science to the exclusion of the engineering.

I would really like to see more "applied algorithms" papers appearing at STOC and FOCS. These are likely papers that have not made a major theoretical advance, but rather have synthesized our existing theory knowledge into a solution to someone's particular problem. These papers are just as important to see as the ones that advance theory; they represent one of the major justifications for doing theory in the first place.

I haven't mentioned SODA yet. That is a conference that was founded in part to attract these kinds of applications papers. But at the same time it was founded to attract more of the theoretical discrete math community, and the multitude of targets makes the outcome diffuse. Possibly as a result, SODA doesn't have the same stature as STOC/FOCS; I'd like to see applied algorithms appearing at our flagship conferences.

Even if the STOC/FOCS community decides to do this, we still have to deal with what I said at the beginning, that the applied conferences are often more attractive for this kind of theory. You might say "fine, if that's what they want, there's no problem." But I think there is a problem: our community, and in particular our theory graduate students, are not being exposed to this important branch of theoretical computer science.

The best solution I can think of is to allow repeat submission. That is, to let the paper appear first in the applied conference, then at STOC/FOCS. Almost by definition, these two venues will not have a lot of overlap, so I really don't see a downside to presenting such a paper at both of them. There are two ways to get this by the copyright police. The first is to accept the paper for presentation but publish only a reference to it in the proceedings. The second is to ask the author to write a new version of the paper aimed at a theory audience. Given the different audience, the paper is likely to be quite different.

Monday, February 15, 2010

FOCS/STOC : What's the Big Deal?

As mentioned recently, the FOCS submission site is now up, and STOC acceptances have come out. Related blog posts have arrived, including an amusing one by Dick Lipton on whether you could create a site that would automatically estimate the chances of your FOCS submission getting in.

So it seemed a good time to ask -- what's the big deal about FOCS/STOC?

FOCS/STOC are, I think, still generally thought of as theory's "flagship conferences." It's hard to get taken seriously as a theorist -- particularly when searching for your first (or even second) job -- without a reasonable number of FOCS/STOC papers under your belt. (SODA papers are, I think, now regarded as a reasonable substitute, but the lack of the heft of something in FOCS or STOC still stands out on a CV.)

But why?

When I was graduating, the colossuses that walked the earth were Jon Kleinberg and David Karger. If you look early in their careers, you'll see a steady and remarkable output of FOCS/STOC papers. (From 1995-1997, Jon had 6 STOC papers, 5 FOCS papers, and 2 SODA papers; from 1994-1996, David had 6 STOC papers, 2 FOCS papers, and 2 SODA papers.) These days, Jon's publications seem focused on the KDD and WWW conferences, and maybe the EC conference. David, who has also always been eclectic in the most positive sense, publishes all over the place. You'll still see them appear in FOCS/STOC/SODA from time to time, certainly, but it's not where their energy seems to go. I don't find them any less colossal; they've just moved on to different things, different problems, that are generally not targeted to FOCS/STOC. Other examples include Bruce Maggs, who mostly does networking these days, and even Dick Karp, who -- while ever eclectic -- has published more in RECOMB than anywhere else in the last decade. I could go on.

My point here is that many of the best theorists I know have, I would say, transcended FOCS/STOC. This does not mean there's not great stuff in FOCS/STOC; it just seems strange, given this, that these conferences are accorded such weight.

Perhaps, in fact, they're accorded less weight than I'm crediting them with these days. Certainly, the Innovations in Computer Science movement demonstrates some dissatisfaction with FOCS/STOC, and there are debates in subcommunities (SoCG, Crypto) about FOCS/STOC vs. the specialized conferences. It still seems to me, though, that FOCS/STOC is where most people would want their best theory results to appear, and it's still the lens through which fresh theory PhDs are viewed.

It seems to me that the theory community, as a whole, needs to think about FOCS/STOC/SODA and the many other conferences, and figure out what it wants them to be. FOCS and STOC haven't changed much over the years, and perhaps they've become just a bit too comfortable; the (theory) world around them has changed considerably, and it's not clear that they've adapted. Should they be the flagship conferences of theory, and if so, what does that mean, and how can they better fulfill that role? If they're not going to be the flagship conferences of theory -- which might be perfectly reasonable -- what is their role to be?

Saturday, February 13, 2010

News Roundup

This just in -- computer science students at Stanford cheat! I love this quote: "Historically, the computer science department accounts for between 20 to 60 percent of all honor-code cases, even though the courses represent about 7 percent of student enrollment." This may be because cheating in computer science is relatively easy -- cut-and-paste. But I think it's also more due to the fact that in computer science we have and use the tools to catch copying. If you're using automatic tools, it's easy to find; if you're not, then you'll find it less.

One of those "denied-tenure-leads-to-shooting" incidents, this time in Alabama. (To be clear, the details about the reasons for the shooting are still, I think, unofficial.) I always feel a twinge when I hear a story like that... it makes me glad that Harvard's tenure process is extremely super-secret. A professor's job is, naturally, very safe, so stories of students or faculty losing it like this hit home. Oh, and I like how most news articles feel it important to point out the shooter was "Harvard-trained".

Not-just-Harvard with budget woes. At least we seem to be reducing our red ink at a good pace.

Speaking of budgets, we have proposed increases in the NSF budget for the coming year. Although it seems that, to take advantage of it, you might want to start working in energy technologies. And there will be a new NSF director; I'm not sure how that affects us academics.

Any other news of note?

Thursday, February 11, 2010

FOCS 2010 Call for Papers is Up

Luca Trevisan sent me the link for the call for FOCS 2010.

Key points: The deadline is April 7. And, in a move that I approve of, there's no page limit on submissions -- instead, "Material other than the abstract, references and the first 10 pages may be considered as supplementary and will be read at the committee's discretion." Having just submitted papers to ICALP where we had to go through the "move things to an appendix" routine, I think this is the right way to go.

Tuesday, February 09, 2010

Recent Award for Network Coding

I opened the January issue of IEEE Transactions on Information Theory and saw (what is probably old news to everyone) that Tracey Ho, Muriel Medard, Ralf Kotter, David Karger, Michelle Effros, Jun Shi, and Ben Leong have won the IEEE Communications Society and Information Theory Society Joint Paper Award for their work A Random Linear Network Coding Approach to Multicast. Congratulations! Always nice to see coding work being recognized, and a "computer scientist" (Karger) winning an "EE" award for work on coding.

Sadly, it's also time to recall the passing of Ralf Kotter, who died of cancer just over a year ago. Communications Theory lost a brilliant mind and a great leader far, far too early.

Guest Post: Giorgos Zervas from WSDM, Part 3

I am back from WSDM and I have to say that, all in all, it was a great experience. I think my talk went fairly well, although I can see some ways in which it could have been better - especially if I were more experienced in public speaking.

Preparing to depart from New York, amidst rumors of a big snowstorm that never seemed to materialize, I was thinking of the other participants who didn't have the luxury of being just a four-hour drive away from home, especially those who had a long return flight ahead of them. After three days packed with talks, lunches, networking and even going out and enjoying what Brooklyn has to offer (a lot!), I am sure most people wouldn't have minded being teleported back. This makes me wonder: what are our main incentives for conference participation?

I am guessing one of them must be attending the actual talks. Which brings me to my next point... Sergej Sizov couldn't attend the conference and instead sent his presentation over: the usual slides accompanied by a video of him presenting the work. To be perfectly honest, I was rather negatively predisposed to the idea of a prerecorded presentation. And judging from the number of people in the auditorium, I think more people may have thought the same. Yet, I was completely wrong. After 30 seconds or so, I was completely immersed and forgot that the presenter was a projection. The presentation itself was clear, finished on time and even got applause at the end - which was absolutely deserved (yet in the absence of the speaker reminded me of the awkward feeling I get when people clap in movie theaters). I would say the only downside was that we had to skip the Q&A session. Technically, though, I don't see why this couldn't have been arranged, save for timezone considerations. So, if a taped delivery doesn't really compromise quality, why don't we use this format more often and minimize travel? Could conference participation eventually evolve to a mixed model of in-person attendance and participation over the web? I do realize the benefits of networking and meeting each other in person, but do we really have to attend every single conference irrespective of cost and time issues?

Looking forward to WSDM'11 in Hong Kong!

Friday, February 05, 2010

Guest Post: Giorgos Zervas from WSDM, Part 2

Day two of WSDM was highlighted by two great presentations which I enjoyed for different reasons. I think the strong features of both could be incorporated in almost any talk.

Carlos Castillo did a fine job of presenting "An Optimization Framework for Query Recommendation" by Anagnostopoulos, Becchetti, Castillo and Gionis. My favorite part was using Cavafy and Machiavelli as presentation vehicles for two different utility functions they evaluated: the former aggregating utility along every step of a multi-step process, the latter ignoring the journey and solely caring about the value derived in the very last step. These utility functions were presented in the context of query reformulation, the query suggestions search engines provide users with to aid them in finding what they are looking for. I am not quite sure how they came up with this great metaphor but it may just be that the authors are Greek and Italian.
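
As a toy illustration of the difference between the two utility functions described above (this is only a sketch of the idea, not the paper's actual definitions, and the path names and per-step values are invented), one can score a multi-step path either by summing the value at every step or by looking only at the last step:

```python
def cavafy_utility(step_values):
    """The journey matters: add up the utility gained at every step."""
    return sum(step_values)


def machiavelli_utility(step_values):
    """Only the destination matters: utility of the very last step."""
    return step_values[-1] if step_values else 0


# Two hypothetical query-reformulation paths with per-step utilities.
paths = {
    "scenic": [3, 4, 2, 1],  # useful along the way, weak ending
    "direct": [0, 0, 0, 6],  # nothing along the way, strong ending
}

best_cavafy = max(paths, key=lambda name: cavafy_utility(paths[name]))
best_machiavelli = max(paths, key=lambda name: machiavelli_utility(paths[name]))

print("Cavafy prefers:", best_cavafy)
print("Machiavelli prefers:", best_machiavelli)
```

With these made-up numbers the two criteria disagree, which is the whole point of the metaphor: the ranking of suggestions depends on whether intermediate steps are worth anything to the user.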

The second presentation I enjoyed was given by Alan Mislove. I think he nailed it by selecting just the right level of abstraction for his talk. Not too much detail, but enough to maintain my interest and entice me to read their paper: "You Are Who You Know: Inferring User Profiles in Online Social Networks". The main idea here is that information that you may consider private and are unwilling to publish can potentially be inferred from information your friends reveal; not necessarily directly about you, but about themselves. Because of the homophily present in social networks, what your friends say can be telling about you. Homophily was definitely the word of the day today; it was featured in three different presentations. All in all I think this paper underlined some concerns anyone with an online presence should be having.

My only gripe so far has been the heat in the auditorium - I think, by the end of the day, it makes everyone feel more tired than they already are. But other than that WSDM has been very enjoyable so far.

PS: The Twitter feed disappeared during Thursday afternoon's sessions but was back this morning. I guess people must be enjoying it!

Guest Post: Giorgos Zervas from WSDM

Since I couldn't get myself to New York for WSDM, I asked my student Giorgos Zervas to report. This is his report from Thursday.


Greetings from Brooklyn.

I am here attending WSDM 2010 where on Saturday I will be presenting the work we did with John & Michael on Adaptive Weighing Designs for Keyword Value Computation. Michael asked for my grad-student perspective on the conference and I happily obliged as he offered me a decent revenue-share deal on any book sales resulting from this post.

On a more serious note, while being here, my primary concern is presenting our work in the best possible light and hopefully getting some people to read the actual paper. A secondary, but equally important, concern is that of networking. My impression is that for most conference participants time is a very scarce resource. Of course this is not limited to WSDM. The sight of a speaker surrounded after his or her talk by a bunch of people - just like myself - intimidates me and makes me feel that by adding myself to the pool, I am becoming an additional burden to someone who might have better things to do than listen to my 30-second blurb (even though personally I'd be flattered, so please surround me after my talk!). Ideally, I'd prefer interaction to be more organic and there are certainly some good opportunities for that. My question to you is: do you prefer some ways of being approached over others? How do you respond to cold introductions? Any advice on how grad students should network at conferences? And if you are a student: what do you find works for you in terms of introducing yourself to others?

On a different note, an interesting feature of WSDM has been the live Twitter feed projected behind the speaker, next to the actual presentation slides. Even though some of the tweets are insightful and they make for great conversation starters over lunch, I find the projection rather distracting. I reckon that, for 20 minutes, focus should be on the speaker and the tweets are almost impossible to ignore. Some of them are also quite repetitive ( anyone?) Those wishing to follow the Twitter stream could always do so from their laptops and phones. What are your thoughts? Do you think this backchannel adds to the discourse? (I should point out that the WSDM community definitely seems mature enough to avoid mishaps like this.)

Finally, on the research front, and even though we are still on the first day of the conference, I've had the opportunity to attend some great talks. In particular Soumen Chakrabarti gave this morning's keynote and I found his vision of extracting structure from the unstructured web fascinating. A few papers that have grabbed my attention (in no particular order) are "Automatic Generation of Bid Phrases for Online Advertising" (Ravi et al.), "SBotMiner: Large Scale Search Bot Detection" (Yu et al.) and "Evolution of Two-Sided Markets" (Kumar et al.); I am looking forward to these and the rest of the presentations. What have been your personal favorites?

Thursday, February 04, 2010

Admissions Handling

I've been spending time on both graduate admissions and undergraduate admissions.

For graduate admissions, we've moved to an all-electronic system; the applications are all online. The system is actually a complete pain to use. Does that surprise anyone? (Some of us get our admins to download everything for us, so we don't have to log in and navigate the system when we need to look at an application, or spend an hour or more ourselves downloading files in a system that wasn't set up to download selected files in a straightforward way.) While I'm absolutely, positively, completely sure that no candidate's confidential information has ever, ever been compromised, or ever will (I believe I've now covered myself and Harvard legally), it seems like a privacy-risk nightmare with all the applications secured by a password that has to get distributed to all the faculty. Still, with all that, it seems slightly better than the paper folder system we had before, where it seemed impossible to track which professor had which folder, never mind actually arranging for folders to be transferred among multiple faculty in a timely manner.

For undergraduate admissions, I'm asked to look at folders -- usually, I'm being used to check that Johnny or Jane's science fair project actually has some interesting science in it, or similarly vouch for math/science talent, but they seem to appreciate if I make other comments as well. It's all paper. An actual person drops folders (a few a week) off to my admin; I type up comments and my admin prints them out, puts them into the folder, and calls to get the folders picked up. Apparently, it's unusual that I type my comments; the admissions officers write their comments by hand. (Often, I can't read them, and my handwriting is worse than theirs.) I've never lost a folder, but I do hope they have back-ups in the home office just in case. (It looks to me like I'm getting the originals; I've never asked. I just assume they can't give the folder out to faculty without keeping a copy of everything.)

Both systems seem flawed, but both also seem designed to fit the way the decision process works. I actually like the paper system, even though it clearly requires a lot of people-hours doing background tasks like getting folders from here to there. It definitely reduces the time and effort I have to put in to review the applications -- which ostensibly should be the goal of the system, since faculty time is (ostensibly) valuable.

Tuesday, February 02, 2010

Does Class Size Matter?

Preliminary stats show 55 students in my algorithms course. That's probably close to the mean and slightly above the median. It's certainly not the largest course in the School of Engineering and Applied Sciences (SEAS), but it's up toward the top.

Why should I, or any faculty member for that matter, care about class size? For junior faculty, at least, there's a clear answer. A tenure case without some teaching of significantly sized classes is one with a weakness, opening the way to the arguments that the faculty member in question is working in an area of little interest (since nobody wants to take classes in their area) or is providing insufficient service to the department (since they haven't taken on a large core course).

For senior faculty, I'm not so clear. While class sizes are listed in our annual review, I've heard no mention that they're considered of any particular importance -- or even of non-zero importance -- in determining annual raises. Indeed, I can't think of any direct benefit to me personally for teaching a large undergraduate class as opposed to a small one.* One might want to take on a big class to support one's department, as ostensibly money (and positions) should, in some way, follow students at the departmental level. I'd like to think that's how it works at many places; however, recent conversations with some higher-ups suggest that that connection is fairly tenuous for SEAS. Harvard's system in that respect seems to be broken. (If it wasn't, I think we'd be further ahead in our hiring in CS.)

So why should I care about my class size? Primarily, I suppose, personal pride. I take satisfaction in teaching students; the more qualified students, the better. (Not the more students the better, though; the more qualified students, the better...) Indeed, I've done the math, and while I'm quite sure Harvard does not calculate things this way, in my mathematical model I'm earning what Harvard's paying based solely on the number of students I teach. That helps me sleep at night.

Overall, however, this seems like an area where the incentive structure doesn't seem set up right. I can understand that class size isn't an end in itself; indeed, I can understand that part of the mission of the University is to preserve knowledge in areas that might be of narrow interest. (The Sanskrit, Slavic, Turkish, and Yiddish courses, for instance, have remarkably low numbers.) But it seems naive to think that size doesn't matter **, so it's slightly disturbing that when I think in terms of incentives, I'm ending up wondering why I should care about my class size at all.

* I do see a potential direct benefit for having my large graduate project class; some student projects can get turned into papers, and often students have me take part in turning their project into a paper, so I may get some research benefit from having a large graduate class. It's not clear that's a big benefit, but at least it's demonstrable.

** Yes, we all knew that was coming before the end of the post....