- 1: Bottom 1/2 of submissions.
- 2: Top 1/2 but not top 1/3 of submissions.
- 3: Top 1/3 but not top 1/5 of submissions.
- 4: Top 1/5 but not top 1/10 of submissions.
- 5: Top 1/10 of submissions.
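As a rough illustration of how the scale reads, here is a minimal sketch that maps a paper's rank among all submissions to a score. The function name `score_for_rank` and the idea of an explicit global ranking are assumptions for the example; in practice each reviewer just estimates where a paper falls.

```python
from fractions import Fraction


def score_for_rank(rank: int, total: int) -> int:
    """Map a rank (1 = best) among `total` submissions to the 1-5 scale.

    Hypothetical helper: assumes an explicit ranking exists, which real
    reviewers only approximate.
    """
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    top_fraction = Fraction(rank, total)  # exact, avoids float rounding at cutoffs
    if top_fraction <= Fraction(1, 10):
        return 5  # top 1/10
    if top_fraction <= Fraction(1, 5):
        return 4  # top 1/5 but not top 1/10
    if top_fraction <= Fraction(1, 3):
        return 3  # top 1/3 but not top 1/5
    if top_fraction <= Fraction(1, 2):
        return 2  # top 1/2 but not top 1/3
    return 1      # bottom 1/2
```

For example, with 100 submissions, the paper ranked 21st would get a 3 and the paper ranked 51st would get a 1.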
Overall, I think it worked well. One plus is that I think the scale makes it very easy to find the bottom half of the papers (easy rejects) and top 10-15% of the papers (easy accepts), so that less time needs to be spent discussing those papers.
On the other hand, on day 2, we were left with a bunch of papers whose scores averaged about 3. This makes sense -- a 3 covers papers in the top 1/3 but not the top 1/5, and since we accepted about 25% of the papers, the cutoff fell inside that range, so papers with about a 3 average were, by definition, borderline. In short, a grade of 3 could mean, "I like the paper but it's a reject" or "I like the paper and it's an accept."
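To make the arithmetic concrete, here is a small sketch that finds which score's range a given acceptance rate falls strictly inside. The name `borderline_score` and the `BANDS` table are assumptions for illustration, built only from the scale definition.

```python
from fractions import Fraction
from typing import Optional

# Each score covers papers ranked between these two "top fraction" cutoffs.
BANDS = {
    5: (Fraction(0), Fraction(1, 10)),
    4: (Fraction(1, 10), Fraction(1, 5)),
    3: (Fraction(1, 5), Fraction(1, 3)),
    2: (Fraction(1, 3), Fraction(1, 2)),
    1: (Fraction(1, 2), Fraction(1)),
}


def borderline_score(acceptance_rate: Fraction) -> Optional[int]:
    """Return the score whose range is split by the acceptance cutoff.

    Returns None when the cutoff lands exactly on a boundary between two
    scores, since then no single score is ambiguous.
    """
    for score, (lower, upper) in BANDS.items():
        if lower < acceptance_rate < upper:
            return score
    return None
```

With an acceptance rate of 1/4, this returns 3, since 1/5 < 1/4 < 1/3. With a rate of exactly 1/5 it returns `None`, which is the situation that tweaking the percentages would aim for.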
One solution might be to tweak those percentages (an experiment worth trying) to better match the acceptance boundary. But, at the end of the day, I think the fact is that borderline papers are hard -- that's why we still have face-to-face PC meetings. No matter what voting system you use, these papers are the hardest to deal with. When you get down to these papers, the real question is, "Do you want to accept the paper or not?" I think a mechanism in the review software to allow a second round of voting -- corresponding to the question, "Conditioned on this being one of the X papers left we have to decide on, do you think we should accept or reject?" -- would be useful and would have saved us some time. In practice, we just did that verbally (approximately) in the meeting, as part of the discussion.
I think there are other advantages of this 5-point scale. When a PC member isn't following the scale -- say assigning scores of 1 to much less than 1/2 of their papers, or scores of 4 and 5 to much more than 20% -- it's essentially immediately apparent to everyone. That's more transparent than the 10-point scale. (One can always use software that "re-calibrates" individuals' scores to some sort of baseline -- that also works, but I think is much less transparent.)
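The check described above is simple enough to sketch in a few lines. This is only an illustration: the function name `calibration_report` and the `tolerance` threshold are assumptions, not something we used, and a reviewer's batch of papers may legitimately differ from the overall pool.

```python
from collections import Counter
from typing import List, Tuple


def calibration_report(scores: List[int], tolerance: float = 0.15) -> Tuple[float, float, List[str]]:
    """Compare one reviewer's scores against the scale's target fractions.

    `tolerance` is a hypothetical slack for "much less" / "much more".
    Returns (fraction of 1s, fraction of 4s and 5s, list of warnings).
    """
    if not scores:
        raise ValueError("need at least one score")
    counts = Counter(scores)
    n = len(scores)
    frac_ones = counts[1] / n
    frac_top = (counts[4] + counts[5]) / n

    flags = []
    if frac_ones < 0.5 - tolerance:   # scale expects about half to be 1s
        flags.append("too few 1s")
    if frac_top > 0.2 + tolerance:    # scale expects about 20% to be 4s and 5s
        flags.append("too many 4s and 5s")
    return frac_ones, frac_top, flags
```

A reviewer whose ten scores are `[1, 2, 3, 3, 4, 4, 5, 5, 3, 2]` gets both warnings: only 10% of the scores are 1s, and 40% are 4s or 5s.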
To me, it's clear that the 5-point scale is the better approach. At the end of the day, we have to make a binary decision on each paper. This scale gets us most of the way there, while giving enough room to distinguish the best papers and papers that need more discussion. I would use it again as a chair, and I prefer it as a PC member as well.
There's one downside to this scale -- which I'd appreciate comments on. Do we send the scores with the reviews, or not? It can be disheartening to get back scores of 1. On the other hand, it's always annoying when your paper is rejected, and I think scores provide useful feedback to the authors. (If you got all 1's and 2's, perhaps you should consider a conference other than FOCS for re-submission.) Several PC members said we shouldn't send the scores with the comments. I think we should -- of course, I'm used to getting scores of this form back from networking conferences. What do you think?