Friday, January 02, 2009

Consistency in Scoring

Having just finished serving on the NSDI PC, and being hard at work on the STOC PC, I've had one issue on my mind: consistency in scoring papers. Ideally, when submitting to a conference, it shouldn't matter WHO reads your paper; your score should be intrinsic to your work. Of course, we don't expect -- or even necessarily want -- the complete ideal; there will be differences of opinion that occur for natural reasons. Arguably, for example, "visionary" papers are hard to judge and tend to lead to differences of opinion. So you can take solace that if you get widely varying reviews and your paper is rejected, your paper was probably just too visionary. The more cynical among us might instead think that large variances in opinion have to do with how close the reviewer is to the subfield of the paper. In theory, my sense is that referees within a subarea generally give higher marks to papers in that subarea. (At NSDI, I came away with a different impression; if anything, I thought referees working in a subarea tended to be a bit harsher towards papers in that subarea!)

My impression historically has been that there's more variance on the theory side of the world than on the systems side, particularly for the big conferences like FOCS/STOC/SODA. I was amazed, on the whole, at the consistency of NSDI reviews for the papers I handled. STOC reviews are coming in, and I'm looking forward to seeing how consistent things are there, but my early impression is that there's not the same consistency, and it's interesting to hypothesize why. [Perhaps, after it's all over, I'll attempt to gather some real quantitative data on the issue, to see if my impressions match reality.]
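For the curious, here is a minimal sketch of the sort of quantitative check I have in mind: the mean per-paper standard deviation of review scores as a crude consistency measure. The scores and the 1-5 scale below are entirely made up for illustration; nothing here comes from actual PC data.

```python
import statistics

# Hypothetical review scores: paper id -> list of reviewer scores (1-5 scale).
# These numbers are invented purely for illustration.
scores = {
    "paper-001": [4, 4, 5],
    "paper-002": [2, 5, 3],
    "paper-003": [3, 3, 3, 4],
}

# One rough consistency measure: the mean of per-paper standard deviations.
# Lower values suggest reviewers tend to agree on the same papers.
per_paper_sd = {pid: statistics.stdev(s) for pid, s in scores.items() if len(s) > 1}
mean_sd = statistics.mean(per_paper_sd.values())

for pid, sd in sorted(per_paper_sd.items()):
    print(f"{pid}: scores={scores[pid]} sd={sd:.2f}")
print(f"Mean per-paper standard deviation: {mean_sd:.2f}")
```

Comparing this number across two conferences (or across subareas within one conference) would be one simple way to see whether the impression of greater variance on the theory side holds up.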

I'll throw out a straw man for people to discuss (and attack). In theory, all (reasonable) papers start with the same basic grounding -- you have proofs of some theorems. After that point, though, things get a bit hard to judge. How "nice" are your theorems, in terms of technique/mathematical beauty/novelty? How "big" is your improvement over previous work -- for example, is improving an O(log n) competitive ratio to O(log n/log log n) "important" enough? How practical is your solution (hard to say, when nobody actually implements algorithms or gives performance results...)? A lot of this seems subjective, so naturally there are differing opinions.

In networking or other more practical fields, the same issues -- technique/beauty/novelty, size of improvement, practicality -- all come into play. But there's a lot more focus on the bottom line of "Could this lead to a substantial improvement for an important real-world problem?" Perhaps this consistency in the criterion for what makes a good paper leads to more consistency in reviews?

12 comments:

Anonymous said...

"Could this lead to a substantial improvement for an important real-world problem?"

Hah... Are you aware that it's extremely hard to show that a paper possesses this property? I'll posit a different theory: at NSDI, it's almost ALL sales job, without any real investigation into the claims made. There's your consistency.

Anonymous said...

In the past it was the chair's job to ensure consistency across reviews, among other quality considerations, but with 200+ submissions it is no longer realistic for the chair, who is engaged in many other things, to carefully read all 600+ reviews.

Maybe it's time to appoint a person (or persons) for this task (this is something I've been thinking about for LATIN 2010).

Alex Lopez-Ortiz

Anonymous said...

I recently found one of your articles in Internet Mathematics and was really interested and inspired by it, as I am by almost any of your papers, and by the book. In my opinion, you are one of those phenomenal people in Computer Science/Mathematics who not only can think big and important thoughts, but also try to, and successfully do, inspire others. From your blog entries I see that much of your time is spent on committees, paperwork, and policies... There are plenty of people who are able to do the latter. I would guess many of us who read your work would be very happy if you spent as much time as possible on the former two things, which only a few are capable of doing.

Anonymous said...

Well, Anon #1, looking at how much systems work has hit the market as of late, the answer is almost uniformly "No".

Anonymous said...

Well, Anon #1, looking at how much systems work has hit the market as of late, the answer is almost uniformly "No".

How can the answer to a single question be "almost uniformly" no? I guess that allows for the answer to be yes?

If you mean that almost uniformly all systems ideas make it to market, then you're delusional.

If you think that "making it to market" is a short, easily evaluated process, then you are also delusional.

Anonymous said...

Interesting post. As for me, I have a different reaction. When I read a theory paper, I can always easily identify a result in the form of a new theorem and a careful discussion of how it improves upon prior work. (I do agree, though, that judging quality of the result is often difficult.) When I read many (most?) systems papers, I have a hard time identifying the exact contribution of the paper. This is partly because the comparisons to prior work are apples-to-oranges (or maybe it is just written that way?), and sometimes because the problem statement is never made clear.

Michael Mitzenmacher said...

Anon 1 (and possibly 5): Not all systems papers, of course, lead to (or describe already-implemented) improvements for real-world problems. But (in the better systems conferences) many of them do. Indeed, it's somewhat surprising these days how many papers describe deployments of real systems, not just simulations or designs. And that's certainly a clear goal in PC discussions.

Otherwise, while you do make a point that there's often not "real investigation into the claims" -- in the sense that we do not attempt to repeat the experiments, or perform related experiments in other domains -- that would be challenging within the conference review process. As in other areas of science, if your results consistently don't hold up subsequently, I think you pay in reputation later. But even granting you that, and granting the obvious statement that "salesmanship" (in all areas I know of) helps papers get in, I'd disagree with your contention that it's all salesmanship. PC members are a suspicious lot. (At least at NSDI.) If something doesn't sound right, people question it.

Michael Mitzenmacher said...

Anon #3: I thank you for your very kind comments.

I think a non-trivial fraction of my time has always been involved in "administrative work"; I think it's part of the job, and I possibly place a higher value personally on service to the community than average.

But perhaps I've been blogging too much about it, and I should try to blog more about research (mine and others'!).

Thanks again,
MM

Michael Mitzenmacher said...

Anon #7: You raise an interesting point; whereas in theory papers you pretty much HAVE to clearly define the problem (and how it relates to previous results), systems papers don't have that forced on them in quite the same way. I think, however, the style in systems papers is becoming more mathematical. I know that, as a reviewer for NSDI, I "took off" points for papers that did not carefully define terms or try to formalize their goals/contributions, and I felt that other reviewers judged papers similarly. I think strong systems papers are clear about defining the problem they are trying to solve, and about formalizing their contribution theoretically and/or in clear and convincing experimental work.

Anonymous said...

Consistency in networking conferences? You are kidding, right?! I liked what Sigcomm did a few years ago, where the reviews for accepted papers were made public, the identity of the reviewers was made public, and, for a fixed window of time, everybody was allowed to comment on the work. I bet a lot of the usual Sigcomm and Mobicom crowd (ha ha at blind review) would think twice about writing those lucid fairy tales called systems papers. A number of systems papers from famous schools play citation games and consistently keep publishing in top conferences. If the work is so brilliant, why not provide an open forum for debate?

Anonymous said...

What makes you say you value service to the community more than the average person?

Michael Mitzenmacher said...

Anonymous #11:

I've done local arrangements for FOCS (twice). I generally serve on multiple PCs each year. I served on the Theory funding committee. I've done a number of NSF panels. And so on.

Generally, when asked to serve or when volunteers are asked for, I respond. My understanding is that this puts me above average in terms of valuing service (as measured by putting my time into it).