## Sunday, February 01, 2009

### STOC PC Meeting : Part II

For this PC, I asked reviewers to use a 5 point scale, corresponding to
1: Bottom 1/2 of submissions.
2: Top 1/2 but not top 1/3 of submissions.
3: Top 1/3 but not top 1/5 of submissions.
4: Top 1/5 but not top 1/10 of submissions.
5: Top 1/10 of submissions.
I'd like to reflect on how that experiment worked.
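
As a rough illustration of the scale above, here is a minimal Python sketch that maps a paper's rank within a reviewer's pile to a score. The function name `score_for_rank`, the 30-paper pile, and the choice to treat each cutoff as inclusive are assumptions made for the example; the scale itself does not say how to handle a paper sitting exactly on a boundary.

```python
from fractions import Fraction

# Cutoffs taken from the scale above, expressed as "top fraction of submissions".
# They are checked from the best score down.
CUTOFFS = [
    (5, Fraction(1, 10)),
    (4, Fraction(1, 5)),
    (3, Fraction(1, 3)),
    (2, Fraction(1, 2)),
]

def score_for_rank(rank: int, total: int) -> int:
    """Return the 1-5 score for a paper ranked `rank` (1 = best) out of `total`.

    Boundaries are treated as inclusive (an assumption for this sketch).
    """
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    top_fraction = Fraction(rank, total)
    for score, cutoff in CUTOFFS:
        if top_fraction <= cutoff:
            return score
    return 1  # bottom half of submissions

# Hypothetical pile of 30 papers for one reviewer.
print([score_for_rank(r, 30) for r in (1, 3, 4, 6, 7, 10, 11, 15, 16, 30)])
# prints [5, 5, 4, 4, 3, 3, 2, 2, 1, 1]
```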

Overall, I think it worked well. One plus is that I think the scale makes it very easy to find the bottom half of the papers (easy rejects) and top 10-15% of the papers (easy accepts), so that less time needs to be spent discussing those papers.

On the other hand, on day 2, we were left with a bunch of papers with scores with about a 3 average. This makes sense -- since we accepted about 25% of the papers, papers with about a 3 average were, by definition, borderline. In short, a grade of 3 could mean, "I like the paper but it's a reject" or "I like the paper and it's an accept."

One solution might be to tweak those percentages (an experiment worth trying) to better match the acceptance boundary. But, at the end of the day, I think the fact of it is that borderline papers are hard -- that's why we still have face-to-face PC meetings. No matter what voting system you use, these papers are the hardest to deal with. When you get down to these papers, the real question is, "Do you want to accept the paper or not?" I think a mechanism in the review software to allow a second round of voting -- corresponding to the question, "Conditioned on this being one of the X papers left we have to decide on, do you think we should accept or reject?" would be useful and would have saved us some time. In practice, we just did that verbally (approximately) in the meeting (as part of the discussion).

I think there are other advantages of this 5 point scale. When a PC member isn't following the scale -- say assigning much less than 1/2 of their papers scores of 1, or much more than 20% scores of 4 and 5 -- it's essentially immediately apparent to everyone. That's more transparent than the 10 point scale. (One can always use software that "re-calibrates" individual's scores to some sort of baseline -- that also works, but I think is much less transparent.)
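
The kind of check described here can be sketched in a few lines of Python. This is only an illustration: the function name `calibration_flags`, the two thresholds, and the sample score lists are hypothetical, and nothing like this was part of the review software as far as the post says.

```python
from collections import Counter

def calibration_flags(scores, ones_floor=0.35, high_ceiling=0.30):
    """Flag a reviewer's scores that stray far from the intended distribution.

    The scale implies about 50% scores of 1 and about 20% scores of 4 or 5.
    Both thresholds are illustrative assumptions, not values set by the PC.
    """
    n = len(scores)
    if n == 0:
        return []
    counts = Counter(scores)
    share_of_ones = counts[1] / n
    share_of_high = (counts[4] + counts[5]) / n
    flags = []
    if share_of_ones < ones_floor:
        flags.append(f"only {share_of_ones:.0%} scores of 1 (scale implies about 50%)")
    if share_of_high > high_ceiling:
        flags.append(f"{share_of_high:.0%} scores of 4 or 5 (scale implies about 20%)")
    return flags

# Two hypothetical reviewers with 20 papers each.
on_scale = [1] * 10 + [2] * 3 + [3] * 3 + [4] * 2 + [5] * 2
generous = [1] * 4 + [2] * 4 + [3] * 4 + [4] * 5 + [5] * 3

print(calibration_flags(on_scale))   # no flags
print(calibration_flags(generous))   # two flags
```

A flag is a prompt for discussion rather than a verdict, since a reviewer may have a real reason for an unusual distribution.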

To me it's just clear the 5-point scale approach must be better. At the end of the day, we have to make a binary decision on each paper. This scale gets us most of the way there, while giving enough room to distinguish the best papers and papers that need more discussion. I would use it again as a chair, and I prefer it as a PC member as well.

There's one downside to this scale -- which I'd appreciate comments on. Do we send the scores with the reviews, or not? It can be disheartening to get back scores of 1. On the other hand, it's always annoying when your paper is rejected, and I think scores provide useful feedback to the authors. (If you got all 1's and 2's, perhaps you should consider a conference other than FOCS for re-submission.) Several PC members said we shouldn't send the scores with the comments. I think we should -- of course, I'm used to getting scores of this form back from networking conferences. What do you think?

Anonymous said...

Without any doubt: send the scores!

(If I'm rejected then I would certainly like to know whether I was in the borderline or a clear reject.)

Anonymous said...

Scores are mildly useful for authors. I would generally appreciate how my work was viewed, esp. if it was "an easy and clear reject for reason X" (X = uninteresting, interesting only to specialists, competent and useful results but unexciting, buggy, suspicious, etc.). In each case, there is implicit (or explicit) advice about what to do with it: respectively, dump it, send it to SODA, send it to a journal, fix the bugs, write it better...

In the STOC/FOCS program committees that I've been on, the issue of the "3" papers (borderline) has always been a bit problematic. I think it's a good thing for the field to have so many competent papers to choose a program from; I just feel that the authors of papers rejected in the last round deserve to know that their paper was very much in consideration, and the precise reason it was dropped was X.

Even the two popular funny comments have some value in them, even though they are somewhat harsh:

"This paper should go to SODA, it will improve the average quality of both conferences"
and
"This paper has both new and interesting results; unfortunately, the new results are not interesting and the interesting results are not new"

Jeffe said...

When a PC member isn't following the scale -- say assigning much less than 1/2 of their papers scores of 1, or much more than 20% scores of 4 and 5 -- it's essentially immediately apparent to everyone.

Scores not fitting the curve doesn't always mean the reviewer isn't following the scale. Some PC members might just get worse/better papers than average (a hot/cold year in their subsubarea), and their off-average judgment is correct.

Several PC members said we shouldn't send the scores with the comments.

I agree with them. Scores without detailed comments to back them up look capricious; you're much more likely to piss people off. But with detailed comments, the actual scores are unnecessary. If you want to tell authors that their paper was seriously considered, then just tell them.

I find it amusing, but not surprising, that the commenters advocating sending scores didn't sign their comments.

Michael Mitzenmacher said...

Jeff says:

Scores not fitting the curve doesn't always mean the reviewer isn't following the scale. Some PC members might just get worse/better papers than average (a hot/cold year in their subsubarea), and their off-average judgment is correct.

Indeed, this is actually a bigger problem with recalibration software, but I don't think it's a big problem with the 5-point scale. This still doesn't excuse assigning 30+% of the papers 4 or 5 scores without some justification (which of course the PC member can give at the meeting if they think it's really an exceptional year in some area). And when a number of your 4/5 scores are on papers where others are giving 2's, it really stands out.

Scores without detailed comments to back them up look capricious; you're much more likely to piss people off. But with detailed comments, the actual scores are unnecessary.

In theory, I agree with you -- detailed comments would obviate the need for scores. In practice, they don't always do so, and there aren't always detailed comments... and the tradeoff is not so clear. But you definitely make a good argument for why scores shouldn't be sent.

Anonymous said...

I'd like to see the scores!

I have often gotten very neutral comments saying that the paper has some nice qualities, but the reviewer seems unexcited. I would like to know if such a comment translated to a 2,3 or 4 which is often ambiguous.

Jeffe's reason for not sending scores, because they are unnecessary under idealized circumstances, does not really argue AGAINST sending them unless we're trying to save bandwidth!

Anonymous said...

I really think sending the scores is very helpful. At least a score is a bit more information. Indeed I think the scores are more reliable than comments. People do not give very bad comments (to avoid any harm to themselves later, since our community is small), but they easily give bad scores, which reflect what they really think.

Anonymous said...

Definitely send the scores. They are very useful, contrary to what Jeff said. He might have forgotten already what it is like to be a grad student.

When my first submissions as a student were rejected it was very difficult from the scant comments to figure out by how much the paper missed the mark. Given the history of reviewers for writing "diplomatic" rejections it is even harder.

Fast forward a few years later, when I started submitting papers to network conferences. We got a couple of rejections from big conferences early on but in this case the scores came along with them together with extensive comments. From the scores it was clear we had just missed the cutoff and from the comments it was easy to see what needed to be done to get them over the threshold.

For the PC chair sending the scores meant little effort, yet for the authors (and eventually for the community) this is very helpful. Good papers are resubmitted to big conferences and get to be known and used by others, middle of the road papers get resubmitted to medium conferences, thus saving reviewing time and bad papers are dropped. What is not to like?

Anonymous said...

I really can't see any good reason not to send out the scores. Here are the arguments so far:
* It can be disheartening to get back scores of 1. -Michael M.
* Scores without detailed comments to back them up look capricious; you're much more likely to piss people off. But with detailed comments, the actual scores are unnecessary. If you want to tell authors that their paper was seriously considered, then just tell them. -Jeffe (who is proud of signing his comment with only his first name?)

It should be disheartening to get back a score of 1. In an ideal world, the detailed comments would be just as disheartening as the scores. In the real world, the comments won't be detailed and will be diplomatic, leaving the submitter with no idea of how good their paper is and how well it is viewed. They'll just resubmit it to another conference. A score of 1 will definitely influence them to rewrite the introduction on significance/motivation at the very least before resubmitting.

In reply to Jeffe, there are two cases: comments are detailed or not. If the comments are detailed, then yes the scores are unnecessary but not harmful. If the comments are not detailed (which is more often the case), then it is much *more* capricious to give a rejection with no information than to at least give out the score.

Anonymous said...

I guess the PC also has to deal with authors that get all 4's, but are still rejected...

Michael Mitzenmacher said...

Anonymous 8: Everyone knows (or should) that JeffE is Jeff Erickson, who, before producing offspring, wrote the funniest theory blog ever, Ernie's 3D pancakes. (I suppose it's still the funniest, but he posts so rarely now.)

Anonymous 9: Another argument that was brought up at the committee meeting was, indeed, that I'll have to deal with people who contact me to say, I got scores X, X, and X on my paper (where each X is say 3 or greater), how could my paper get rejected!!!

Anonymous said...

I think, not many people will write such e-mails. For instance, if I receive scores like 3, >3, >3 I will just think that there could be many papers with scores 4, >3, >3 and that mine was a borderline reject. And after all, if such e-mails are annoying, they can always be ignored. I strongly support sending scores. At least people will know where to send it next, and how their work was viewed by the community. I have seen many people saying "The review says the problem is interesting, the result is nice, then why the hell do they reject it?", and I think the scores will be more informative in such cases.

Anonymous said...

I would also like to receive scores. I would, of course, prefer very detailed comments to just scores, but since it's unlikely we'll get detailed comments, why not include scores? (I imagine that it requires more work to write meaningful comments, and some things might be confidential to the PC, etc.)

I think scores would be particularly helpful to students, who typically haven't yet learnt what the acceptance thresholds for various conferences are, or how to judge whether they're close to these thresholds. As you said, if one gets all 1s and 2s from STOC, one probably shouldn't submit to FOCS; on the other hand, if one gets 4,3,3 and has slightly strengthened the results in the meantime, sending the paper to FOCS might be a good idea.

This would probably be even more helpful to students for papers not written with their advisors, who might be unable to provide their usual guidance because they are not experts in the area.

Anonymous said...

In a previous FOCS (I think), the comments I got back included a short comment containing what seemed like a summary of the PC discussion on the paper. In my view, this was much more useful; the comments marked as comments for authors often do not capture what really led to the decision one way or the other.

MiP said...

Whether the scores "should" or "should not" be sent is irrelevant. I never had problems finding out my scores from a friend on the PC (I have not seen many rejections, but I find it useful to understand the scores and comments even for accepted papers).

Thus, not sending scores only means that we are creating an old-boys club, in which some people can get scores, and others cannot.

(Yes, I did feel like a silly bleeding-heart ideologue to criticize a club I'm in...)

Anonymous said...

In a previous FOCS (I think), the comments I got back included a short comment containing what seemed like a summary of the PC discussion on the paper.

Alistair did this for FOCS 2007. It simply requires assigning one paper to every PC member, who then writes the comments in real time as the PC discusses the paper.

someone said...

SoCG 2004 sent scores. A paper of mine got rejected. The comments were extensive, but I appreciated getting the scores anyway. It just gives extra information.

I never had problems finding out my scores from a friend on the PC

What about people who do not have friends in the PC?

Anonymous said...

There are three groups of people involved here:

1) authors. obviously, every author would love to get more information. it's not like they have to publish all 1's on their web site :). So, as confirmed by the comments, you get overwhelming support for the scores being sent.

2) PC. totally the opposite. in each area, there are usually 2-3 PC members assigned, so it is pretty obvious who is going to review each paper most of the time. If a PC member is friends with the author, and still feels like rejecting the paper, he/she will give a score of 1, but will write a sugar-coated review. Sending the score will remove the hypocrisy, and might create the unwanted tension, since people do not like to see their papers rejected, especially "famous" people whose former student "rejected" their paper. Thus, a vast majority would prefer the "Real stuff" to stay in the house; i.e., not sending score.

3) The PC chair. The chair might go either way. On the one hand, the person is usually senior enough not to worry about people being upset about rejection, plus people understand that the PC chair only orchestrates the discussion, but does not have enough time to really participate in a majority of decisions, deferring to the PC. So people do not take the rejection on the chair. On the other hand, the PC chair is friends with the PC members (dah), and the latter do not want to send scores, so the PC chair has some pressure not to do so. In majority of cases, the PC wins. But here Michael seems to support the (correct) decision to send the scores, so he does the next best thing: write a blog, get overwhelming support, and have a good excuse to "over-rule the PC" :).

Thanks, Michael, the authors would definitely appreciate this! In all honesty, we have 1000+ people in the first category and 30- in the second, so following the majority is the right decision...

Anonymous said...

Don't send the scores!! In many cases the identity of the referees and/or relevant PC members is obvious. If you do send the scores, future referees/PC members will be too careful and won't give scores that truly reflect their opinion. This may cause future PCs to make many more mistakes (and may cause damage for many years).

Anonymous said...

to 11:12 AM Anonymous:

I doubt this would change people's behavior. The PC is responsible for the program produced. Why would they risk not following their judgement/not doing their job?

On the -3--3 scale, it could cause people to give higher scores, raising the mean and making things more complex, but it probably would not change things on the current 1-5 scale as described.

The fact is that if some people already have access to this info (which they do if they are better connected to PC members), then everyone should have access to this info. Otherwise it is just not fair. If you can justify a low score with good comments and questions, there is simply no reason to worry about giving out the score. Anyway, if the system is so messed up that transparency is "bad", then perhaps this whole conference system has to be redesigned.

Anonymous said...

I continue to be amazed at the theory community for finding reasons - any reasons - to resist change. Even change that is positive!

Sending scores? How on earth can this be a bad thing? Yes, I've read the 'arguments' posted above. When committee members argue that people might say "I got scores X, X, and X on my paper (where each X is say 3 or greater), how could my paper get rejected!!!", and really believe that it's a strong argument against sending scores, I'm really at a loss for words.

And by the way - "future referees/PC members will be too careful and won't give scores that truly reflect their opinion."

When people don't want to change, it really looks like they're willing to come up with *any* justification.

And yeah - double blind reviews, people. Now, what are the arguments for not doing what virtually every other community (and not just in CS, mind you) has adopted? I'd love to read these justifications, oh yes.

Anonymous said...

Since some people (outside the PC) seem to be privy to the scores (a fact that I wasn't aware of before), I believe that it is fair to disclose them to all authors. After all, NSF program directors regularly disclose the scores (and often the rankings of the proposals) -- so why not the STOC committee?

On the downside I can visualize the smirk and the condescension on the faces of our colleagues in Mathematics (think Neal Koblitz) once they learn this --

"... did you hear that the theoretical CS people actually receive grades for their scholarly articles on a 5 point scale. Grades I tell you !"

Anonymous said...

"And yeah - double blind reviews, people. Now, what are the arguments for not doing what virtually every other community (and not just in CS, mind you) has adopted? I'd love to read these justifications, oh yes."

I have often heard that double blind reviews will "impede progress" because people will have to try to keep their work from being public in order to preserve anonymity. I don't think this is a good argument, as much (borderline) work is not public until after it has been accepted somewhere.

Additionally, it is interesting because if someone really does get scores such as a 3,>3,>3, they are likely to resubmit to FOCS without much change and have a good chance of the paper being accepted. So the community will just wait another 6 months to learn about the result. How is that for impeding progress? If there are more publishable papers than the number accepted, then we are slowing down the field, hurting people for whom the publicity of their results could lead to collaborations, etc.

Anonymous said...

> I continue to be amazed at
> the theory community for
> finding reasons - any reasons
> - to resist change. Even
> change that is positive!

In the past, 10 years ago or more, the PC usually sent out the scores, so the change did happen!
No, I don't know the reason for changing this 10 or so years ago (even if you're saying the community should support changes).

Claire Mathieu said...

Scores: it doesn't matter to me. But I was quite surprised by the result, so I am very curious to see the comments on my submissions.

When will we get those comments?

Warren said...

How about explicitly telling authors at what stage their paper was rejected rather than forcing them to reverse-engineer that information from the reviews and scores? For example say whether or not it was discussed at the PC meeting. I suspect that's a more accurate measure of a paper's quality than the scores given by reviewers. It also eliminates the possibility that authors could guess which PC member gave them a bad score.

If the choice is between no quantitative feedback and reviewer scores I vote for revealing the reviewer scores.

I also like the idea of a brief summary of the overall PC's opinion. Reviews are often contradictory so it's nice to know what the PC finally thought!

Anonymous said...

I like Warren's idea of letting the authors know if the paper was discussed very much. In addition to useful information, that way, the responsibility of rejection falls less on the shoulders of each reviewer.

Luca said...

At many PC meetings I have participated in, the discussion on controversial papers changes some people's opinion, and then, when a consensus is built, a vote is taken to accept/reject the paper. But the PC members do not change the scores online to reflect their changed opinion. If you are going to send the score you have to make sure this hasn't happened.

Also to consider if you send scores: not only will you get emails saying, "how come my paper was rejected when the scores were 4,4,4", but also "how come my paper was rejected when the scores were 4,4,4 and my officemate's paper was accepted and its scores were 4,3,4??"

Anonymous said...

You should certainly send the scores. In many cases one cannot tell from the comments whether a paper was an easy reject or whether it was rejected after long discussions. The scores may give some indication as to what happened (and a summary of the PC meeting can be even more helpful)

MIP's comment is irrelevant because if you don't want to put your friends in an uncomfortable situation then you don't ask them for your scores.

I agree with Luca that sending the scores may cause some nuisance but the current situation is also not optimal and if I have to choose between concealing information or revealing it then, at least for me, the answer is clear.

Michael Mitzenmacher said...

Luca --

In a perfect theoretical world, yes, everyone would go back and edit their reviews/scores to reflect also the discussion at the PC meeting. In practice, we've just spent 2 full days talking about the papers, plus flying time for many, and we've got our own teaching/research/administration/kids/spouses to get back to. And people want their reviews back as fast as possible, since 3-5 other conferences have made their deadline the week after our announcement. So realistically, you can choose to get the scores or not get the scores, but you can't choose to have the scores reflect the discussion of the PC.

As for the officemate questions, I'm already wording my canned response, which is "Scores may not reflect the discussion at the PC meeting."

hoeteck said...

I strongly agree with Warren's suggestion.

It is also important that we agree on what the issues/premises are:

1. Authors should receive as much information on the review process as is possible and reasonable. This is particularly important for student-authored papers, for single-authored papers and for junior researchers and people new to the field.

2. It's not fair that some people are privy to more information just because "they know the right people".

3. Lots of discussions take place during the PC meeting and are not necessarily reflected in the scores or the text.

Here's a suggestion drawing on my (singular) PC experience on TCC 2008 using Shai's software.

Typically, the PC chair marks each paper as "accept", "maybe accept", "maybe reject" or "reject" at each stage of the discussion. What the chair can do is select milestones in which to save this state information, which will then be sent to the authors. In particular, I think it will be very helpful to know the ranking information:

* right before the PC meeting.

* when the PC is down to discussing the last 10 borderline papers.

This should be very minimal work on the part of the PC (the chair has to click "save" twice and add some short descriptive text), and yet very useful aggregate information for the authors. In addition, it's not tied in to any text and will not reflect on any specific reviewer. Finally, this ranking should be largely monotone, thereby avoiding the problem that Luca raised.

Anonymous said...

Michael, I am really glad you posed this question. For the last few years I have been a big fan of sending out the scores in TCS conferences. The two main (IMHO) benefits were already mentioned here, but they are worth repeating:

1) The comments alone tend to provide factual information (e.g., relevant related work, missing steps in the proof, etc). Occasionally, they provide opinions ("I like this and dislike that"). However, it is often hard for the authors to deduce the weight of the opinions from the text alone. This is in part because many reviewers do not feel comfortable writing negative statements that would be the equivalent of a low score. As a result, the authors are in the dark about the reasoning behind the decisions.

The summaries (e.g., as in FOCS'07) would also help on this front.

2) As it was pointed out, people who are "connected" can find it easier to learn the scores than people who are not, which is plainly unfair.

So yes, we should send the scores. And while we are at it, why not show score distributions for the papers during the PC report at the business meeting ? This would provide useful info (especially if the authors have the scores), and has been done before, e.g., in SODA'05.

Piotr (Indyk)

Anonymous said...

This is yet another anonymous vote for sending out the scores.

Arguments that people will complain that their scores are too high to be rejected are weak, since one can already get reviews suggesting that their (rejected) paper is great. Moreover, I've had accepted papers with bad/lukewarm reviews, which would probably reflect a lower score (but this is unclear... and that is the whole point of releasing scores). The PC should just give as much information as possible, within reason (and privacy constraints).

I am not posting my name because I don't want people to make inferences about my vote based on the outcomes of my STOC submissions...

-- STOC Submitter

Michael Mitzenmacher said...

Hoeteck --

I think you're asking for just far too much information -- and, moreover, that's not how I used the system. But really, just because I think it's reasonable for authors to see their scores doesn't mean I think they should receive every state change their paper was involved in on the way. (That info is too noisy, and too much work/too constraining for me to deal with.)

Warren --

There are various reasons a paper might or might not have been discussed at the meeting. For example, one clear rule I had was that ANY PC member could keep ANY paper open for discussion at the PC meeting (without giving a reason).

But if you think about the given scale for a minute, along with the public knowledge that I tried to reject 1/2 the papers before the meeting, you can probably come up with the appropriate approximate score threshold for being discussed at the meeting.

It would be an interesting experiment for a PC chair to send messages before the meeting that "your paper has already been rejected". Of course, since any PC member could ask for discussion of any paper at the meeting (even those previously considered rejected), I wouldn't have done things that way.

Anonymous said...

In particular, I think it will be very helpful to know the ranking information...when the PC is down to discussing the last 10 borderline papers.

Are you sure you would want to be told "your paper was the very last one we rejected"?!

Warren said...

Do the PC members who object to sharing scores also object to sharing the median of the three scores?

Even if an author learns that their median is 1, they can't be sure that their friend/former advisee/enemy on the PC gave them a 1. The median of {1,1,5} is 1 after all! If people are worried about PC members giving inflated scores to avoid revenge, it seems far more important to prohibit PC members from telling their friends scores than to withholding the median.
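
A small Python sketch of this point, assuming three reviewers and the 1-5 scale (the helper name `possible_scores_for_one_reviewer` is made up for the example): it enumerates every score triple with a given median and collects what any single reviewer could have given.

```python
from itertools import product
from statistics import median

def possible_scores_for_one_reviewer(target_median, scale=range(1, 6)):
    """Scores one (hypothetical) reviewer could have given, over all
    three-reviewer score triples whose median equals `target_median`."""
    return sorted({a for a, b, c in product(scale, repeat=3)
                   if median((a, b, c)) == target_median})

print(median([1, 1, 5]))                    # prints 1
print(possible_scores_for_one_reviewer(1))  # prints [1, 2, 3, 4, 5]
```

So a median of 1 is consistent with any individual reviewer having given anything from 1 to 5, although it does imply that at least two of the three scores were 1.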

Anonymous said...

I'm not sure that sending out the scores is a good idea. The scores can be even more random and noisy than the decision itself. If you are not sending out confidence scores, then the scores are meaningless. As a single bad review pointing out an error/previous work can kill a paper, good average score does not necessarily mean that the paper is a good paper (if the scores are not changed after the meeting).

I think a good compromise would be to say something like "Your paper was rejected in stage x out of 3" without further explanations. Probably this would not be too hard to do. This information is also noisy (for example, a PC member can keep the discussion open for certain papers), but this is not a problem, as it does not give the false impression of being an exact data such as the scores.

Anonymous said...

But *most* papers are not rejected due to an error (most errors are not found by PC) or due to previous work, so what this last poster says is not accurate. Just send the scores.

Anonymous said...

The argument by anonymous 1:16 is actually an instance of the Nirvana fallacy.

It consists of comparing an idea or proposal against an idealized standard instead of comparing it to the status quo that it would replace if adopted.

The second form of this fallacy, also used by anonymous 1:16 is focusing on some odd circumstances in which the idea or proposal fails or can be abused. Nearly any method or proposal can be abused, if enough effort is put into it. The real question is if this would be common place and easy to do.

Michael Mitzenmacher said...

Thanks Anon #38 -- I have noticed when trying experiments with the standard PC setup that indeed I get a lot of arguments based on a comparison to an ideal as opposed to a realistic assessment of whether the proposal would be better or worse than the default, and alternative proposals based on idealized behavior of the individuals involved. Perhaps it's the nature of theoreticians.

Anonymous said...

In continuation to those arguments suggesting that sending the scores might put PC members in a difficult position, I further propose not to send the referees' report to the authors at all. Because, basically, the same argument applies in this case.

To be on the safe side, I also recommend not to inform the authors upon rejection. However, since they might find out that their peers have been accepted, and conclude that in fact they were rejected, I also propose not to send acceptance notifications. Thus, the conference proceedings would consist of ALL submitted papers. Only the PC members would know who was in fact "accepted" and who was not (of course, they would be obliged to attend all talks, in order not to hint on the "real program" of the conference).

Paul Beame said...

A note on the scoring scheme: Do not use this scheme if you are doing an all electronic PC. It only worked because we had a face-to-face meeting where people could explain what a grade of 3 on a particular paper meant to them. A grade of 3 was supposed to cover anywhere from roughly paper 60 to 105. That meant that it could be applied to papers that PC members thought should be accepted as well as those that they didn't.

Because we had a face-to-face meeting, the interpretation could be discussed so it worked but the initial order of papers with average scores near 3 did not have much correlation with the final decisions on those papers.
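
To make the arithmetic behind "roughly paper 60 to 105" concrete, here is a short Python sketch. The total of 315 submissions is a hypothetical figure chosen only so that the top 1/3 ends at rank 105; the 25% acceptance rate is the approximate figure from the post.

```python
import math
from fractions import Fraction

def rank_band(total, top_outer, top_inner):
    """Ranks (1 = best) inside the top `top_outer` fraction but outside the top `top_inner`."""
    first = math.floor(total * top_inner) + 1
    last = math.floor(total * top_outer)
    return first, last

TOTAL = 315                    # hypothetical submission count (assumption)
ACCEPT_RATE = Fraction(1, 4)   # "about 25%" from the post

first, last = rank_band(TOTAL, Fraction(1, 3), Fraction(1, 5))
cutoff = math.floor(TOTAL * ACCEPT_RATE)

print(f"score 3 covers ranks {first}-{last}; accept cutoff near rank {cutoff}")
# prints: score 3 covers ranks 64-105; accept cutoff near rank 78
```

Under these assumptions the acceptance line falls inside the band for a score of 3, which is why the same grade could describe papers on either side of it.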

Michael Mitzenmacher said...

Paul,

I'm unconvinced this scheme is really any worse than the 1-10 scheme for an electronic PC; the difficult papers are the difficult papers. You'll make bad decisions if you just rank by score and draw a line with that scheme as well.

A score of 3 did end up meaning borderline -- to be discussed. And that's what we did. With a 10-point scale, things in the 5.5-7 range or so -- and there are a lot of those -- mean borderline, and you need to hash those out as well, in an electronic or face-to-face PC.

Paul Beame said...

Michael,

These borderline papers are exactly where you want the most information since this is where the real action of the committee decisions will need to be. Why remove all distinguishing information about such borderline ratings?

One thing that all-electronic program committees don't do well is interaction and hashing things out among PC members as a whole. Typical debates are between two or occasionally three members of the PC without an audience to judge the merits of the debate. Other PC members give their initial opinions and retreat from debates because they cannot make time over a long period of days for the interaction.

The only reason not to get this more detailed information from PC members up front is if we somehow believe that these initial distinctions are not valid. Is that your view?

Michael Mitzenmacher said...

Thank you, Paul. I think you've framed the issue nicely. And yes, my contention is the extra distinguishing information from the 10 point scale is at best useless, and potentially harmful.

There are just too many noisy aspects to the original data -- the difference in how people interpret the scale, the use of subreviewers, the fact that it's a first reading, and so on. Every PC meeting I've been on, when you get to the papers in the middle, the scores are often a poor predictor of the final accept/reject decision. People change their minds; they get new information based on other reviews; and they readjust their "scale" based on what else they see going on at the meeting.

Harry Lewis discusses this in one of his books -- the general idea is that a larger scale (1-100) gives the appearance of precision at the cost of accuracy. One experiment he mentioned (I forget whether it was done by him or by someone else) was giving his TAs papers and asking them to score each one from 1-10; there were significant deviations. But if he asked them to divide the papers into A, B, C, D, F, he got much greater reproducibility.

I claim the 1-5 scale is very strong at dividing papers into the 3 basic categories important for a PC meeting: easy reject, easy accept, discuss. You suggest that a 1-10 scale gives more precision for the papers that need to be discussed. And I disagree; it gives the appearance of precision at the expense of reproducibility and accuracy.

The problems you suggest for electronic PC meetings are certainly accurate -- discussion is more difficult in such a forum. Using a 1-10 scale and assuming the scores are accurate is one way to deal with that. I don't think using a 1-5 scale would lose you much -- if anything -- since I'd argue at that point that if you're bypassing the discussion phase you're just flipping random coins with random bias anyway. If you are having discussions, I don't see the problem with the 1-5 scale at all (and see the accuracy/reproducibility as a feature).
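As a rough illustration of the three-way split described above (easy reject, easy accept, discuss), here is a small sketch that buckets a paper by its average score on the 1-5 scale. The thresholds of 2.0 and 4.0 and the function name are assumptions made up for this example; the PC did not state fixed cutoffs, and the "discuss" bucket is exactly where scores alone should not decide.

```python
from statistics import mean


def triage(scores, reject_below=2.0, accept_at_or_above=4.0):
    """Bucket a paper by its mean reviewer score (1-5 scale).

    The two thresholds are hypothetical; tune them to the committee.
    """
    avg = mean(scores)
    if avg < reject_below:
        return "easy reject"
    if avg >= accept_at_or_above:
        return "easy accept"
    return "discuss"


# Hypothetical score sets, one list of reviewer grades per paper.
for paper_scores in ([1, 1, 2], [3, 3, 4], [5, 4, 4]):
    print(paper_scores, triage(paper_scores))
```

The point of the coarse split is only to decide where discussion time goes, not to rank the papers inside the middle bucket.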

Anonymous said...

Thank you very much for reporting the scores! I do think it gives much more information. I could not have guessed the scores from the reviews.

Shai Halevi said...

I personally prefer 4- or 5- or 6-level grading schemes over 9- or 10-level schemes (although my choice of assigning semantics to these scores is different from Michael's).

The important thing is to let PC members express in their score the difference between "hell, no" and "I don't think so" (and similarly the difference between "yes, absolutely" and "I think so"). I doubt that allowing a finer scale than this helps much.

But ultimately, the scoring scheme really doesn't play much of a role. Identifying the "middle range" is rather easy, no matter what scoring system you use. And once you identify these submissions, you must have an actual discussion (either on-line or face-to-face, and preferably both).

Like every committee, a PC is ultimately an exercise in communication and joint decision-making. As long as the chair ensures that all the papers in the middle get the discussion that they deserve, the decisions will be reasonable. (And if the chair does not ensure this, then no scoring system will save you.)

About sending the scores: it seems rather harmless to me. (Although as an author I am never really interested in learning this piece of information.) Of course it will annoy some people, but when your paper is rejected you always get annoyed, whether or not you learn your scores.

Brighten Godfrey said...

Anon 2 wrote: "This paper has both new and interesting results; unfortunately, the new results are not interesting and the interesting results are not new"