Comments on My Biased Coin: STOC PC Meeting : Part II

Anon 2 wrote: "This paper has both new and interes...

2009-02-12T05:22:00.000-05:00

Anon 2 wrote: "This paper has both new and interesting results; unfortunately, the new results are not interesting and the interesting results are not new"
(attributions unknown to me).

That has been attributed to Samuel Johnson who wrote: "Your manuscript is both good and original; but the part that is good is not original, and the part that is original is not good." (This according to Pushcart's Complete Rotten Reviews & Rejections.)

I personally prefer 4- or 5- or 6-level grading sc...

2009-02-05T23:23:00.000-05:00

I personally prefer 4- or 5- or 6-level grading schemes over 9- or 10-level schemes (although my choice of assigning semantics to these scores is different from Michael's).

The important thing is to let PC members express in their score the difference between "hell, no" and "I don't think so" (and similarly the difference between "yes, absolutely" and "I think so"). I doubt that allowing a finer scale than this helps much.

But ultimately, the scoring scheme really doesn't play much of a role. Identifying the "middle range" is rather easy, no matter what scoring system you use. And once you identify these submissions, you must have an actual discussion (either on-line or face-to-face, and preferably both).

Like every committee, a PC is ultimately an exercise in communication and joint decision-making. As long as the chair ensures that all the papers in the middle get the discussion that they deserve, the decisions will be reasonable. (And if the chair does not ensure this, then no scoring system will save you.)

About sending the scores: it seems rather harmless to me. (Although as author I am never really interested in learning this piece of information.) Of course it will annoy some people, but when your paper is rejected you always get annoyed, whether or not you learn your scores.

Thank you very much for reporting the scores! I do...

2009-02-05T19:37:00.000-05:00

Thank you very much for reporting the scores! I do think it gives much more information. I could not have guessed the scores from the reviews.

Thank you, Paul. I think you've frame the issue n...

2009-02-04T12:54:00.000-05:00

Thank you, Paul. I think you've framed the issue nicely. And yes, my contention is that the extra distinguishing information from the 10-point scale is at best useless, and potentially harmful.

There are just too many noisy aspects to the original data -- the difference in how people interpret the scale, the use of subreviewers, the fact that it's a first reading, and so on. Every PC meeting I've been on, when you get to the papers in the middle, the scores are often a poor predictor of the final accept/reject decision. People change their minds; they get new information based on other reviews; and they readjust their "scale" based on what else they see going on at the meeting.

Harry Lewis discusses this in one of his books -- the general idea is that a larger scale (1-100) gives the appearance of precision at the cost of accuracy. One experiment he mentioned (I forget whether it was done by him or by someone else) was giving his TAs papers and asking them to score them from 1-10; there were significant deviations. But if he asked them to divide the papers into A, B, C, D, F, he got much greater reproducibility.

I claim the 1-5 scale is very strong at dividing papers into the 3 basic categories important for a PC meeting: easy reject, easy accept, discuss. You suggest that a 1-10 scale gives more precision for the papers that need to be discussed. And I disagree; it gives the appearance of precision at the expense of reproducibility and accuracy.

The problems you suggest for electronic PC meetings are certainly accurate -- discussion is more difficult in such a forum. Using a 1-10 scale and assuming the scores are accurate is one way to deal with that. I don't think using a 1-5 scale would lose you much -- if anything -- since I'd argue at that point that if you're bypassing the discussion phase you're just flipping random coins with random bias anyway. If you are having discussions, I don't see the problem with the 1-5 scale at all (and see the accuracy/reproducibility as a feature).
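The three-way split into easy reject, easy accept, and discuss can be sketched in a few lines. This is only an illustration: the cutoffs `REJECT_BELOW` and `ACCEPT_ABOVE` and the `kept_open` flag are hypothetical names and values, not the thresholds any actual PC used.

```python
from statistics import mean

# Hypothetical cutoffs on a 1-5 scale; a real PC would choose its own.
REJECT_BELOW = 2.5
ACCEPT_ABOVE = 3.5


def triage(scores, kept_open=False):
    """Sort a paper into one of three bins from its initial scores.

    kept_open models a rule where any PC member may keep any paper
    open for discussion, regardless of its scores.
    """
    if kept_open:
        return "discuss"
    avg = mean(scores)
    if avg < REJECT_BELOW:
        return "easy reject"
    if avg > ACCEPT_ABOVE:
        return "easy accept"
    # Everything in the middle band needs an actual discussion;
    # the scores are not trusted to order papers within it.
    return "discuss"


if __name__ == "__main__":
    for scores in ([1, 2, 2], [3, 3, 4], [4, 5, 4]):
        print(scores, "->", triage(scores))
```

The sketch deliberately does not rank papers inside the "discuss" bin, which matches the claim that scores in the middle are a poor predictor of the final decision.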

Michael, These borderline papers are exactly where ...

2009-02-04T12:25:00.000-05:00

Michael,

These borderline papers are exactly where you want the most information since this is where the real action of the committee decisions will need to be. Why remove all distinguishing information about such borderline ratings?

One thing that all-electronic program committees don't do well is interaction and hashing things out among PC members as a whole. Typical debates are between two or occasionally three members of the PC without an audience to judge the merits of the debate. Other PC members give their initial opinions and retreat from debates because they cannot make time over a long period of days for the interaction.

The only reason not to get this more detailed information from PC members up front is if we somehow believe that these initial distinctions are not valid. Is that your view?

Paul, I'm unconvinced this scheme is really any wor...

2009-02-03T17:28:00.000-05:00

Paul,

I'm unconvinced this scheme is really any worse than the 1-10 scheme for an electronic PC; the difficult papers are the difficult papers. You'll make bad decisions if you just rank by score and draw a line with that scheme as well.

A score of 3 did end up meaning borderline -- to be discussed. And that's what we did. With a 10-point scale, things in the 5.5-7 range or so -- and there are a lot of those -- mean borderline, and you need to hash those out as well, in an electronic or face-to-face PC.

A note on the scoring scheme: Do not use this sch...

2009-02-03T16:29:00.000-05:00

A note on the scoring scheme: Do not use this scheme if you are doing an all electronic PC. It only worked because we had a face-to-face meeting where people could explain what a grade of 3 on a particular paper meant to them. A grade of 3 was supposed to cover anywhere from roughly paper 60 to 105. That meant that it could be applied to papers that PC members thought should be accepted as well as those that they didn't.

Because we had a face-to-face meeting, the interpretation could be discussed, so it worked; but the initial order of papers with average scores near 3 did not have much correlation with the final decisions on those papers.

In continuation to those arguments suggesting that...

2009-02-03T10:08:00.000-05:00

In continuation to those arguments suggesting that sending the scores might put PC members in a difficult position, I further propose not to send the referees' reports to the authors at all. Because, basically, the same argument applies in this case.

To be on the safe side, I also recommend not informing the authors upon rejection. However, since they might find out that their peers have been accepted, and conclude that in fact they were rejected, I also propose not to send acceptance notifications. Thus, the conference proceedings would consist of ALL submitted papers. Only the PC members would know who was in fact "accepted" and who was not (of course, they would be obliged to attend all talks, in order not to hint at the "real program" of the conference).

Thanks Anon #38 -- I have noticed when trying expe...

2009-02-03T09:59:00.000-05:00

Thanks Anon #38 -- I have noticed when trying experiments with the standard PC setup that indeed I get a lot of arguments based on a comparison to an ideal as opposed to a realistic assessment of whether the proposal would be better or worse than the default, and alternative proposals based on idealized behavior of the individuals involved. Perhaps it's the nature of theoreticians.

The argument by anonymous 1:16 is actually an inst...

2009-02-03T09:44:00.000-05:00

The argument by anonymous 1:16 is actually an instance of the Nirvana fallacy.

It consists of comparing an idea or proposal against an idealized standard instead of comparing it to the status quo that it would replace if adopted.

The second form of this fallacy, also used by anonymous 1:16, is focusing on some odd circumstances in which the idea or proposal fails or can be abused. Nearly any method or proposal can be abused, if enough effort is put into it. The real question is whether this would be commonplace and easy to do.

But most papers are not rejected due to an error...

2009-02-03T08:48:00.000-05:00

But *most* papers are not rejected due to an error (most errors are not found by the PC) or due to previous work, so what this last poster says is not accurate. Just send the scores.

I'm not sure that sending out the scores is a good...

2009-02-03T01:16:00.000-05:00

I'm not sure that sending out the scores is a good idea. The scores can be even more random and noisy than the decision itself. If you are not sending out confidence scores, then the scores are meaningless. Since a single bad review pointing out an error or previous work can kill a paper, a good average score does not necessarily mean that the paper is a good paper (if the scores are not changed after the meeting).

I think a good compromise would be to say something like "Your paper was rejected in stage x out of 3" without further explanations. Probably this would not be too hard to do. This information is also noisy (for example, a PC member can keep the discussion open for certain papers), but this is not a problem, as it does not give the false impression of being exact data, as the scores do.

Do the PC members who object to sharing scores als...

2009-02-02T23:15:00.000-05:00

Do the PC members who object to sharing scores also object to sharing the median of the three scores?

Even if an author learns that their median is 1, they can't be sure that their friend/former advisee/enemy on the PC gave them a 1. The median of {1,1,5} is 1 after all! If people are worried about PC members giving inflated scores to avoid revenge, it seems far more important to prohibit PC members from telling their friends scores than to withhold the median.
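A quick way to check the median argument is to enumerate every possible set of three scores. The sketch below assumes a 1-5 scale with exactly three reviewers per paper; the function names are made up for illustration.

```python
from itertools import product
from statistics import median

SCALE = range(1, 6)  # assumed 1-5 scale, three reviewers per paper


def triples_with_median(target):
    """All ordered score triples whose median equals target."""
    return [t for t in product(SCALE, repeat=3) if median(t) == target]


def possible_scores_for_one_reviewer(target, reviewer=0):
    """Scores one particular reviewer could have given, knowing only the median."""
    return sorted({t[reviewer] for t in triples_with_median(target)})


if __name__ == "__main__":
    print(median([1, 1, 5]))                    # 1
    print(len(triples_with_median(1)))          # 13
    print(possible_scores_for_one_reviewer(1))  # [1, 2, 3, 4, 5]
```

Even with the lowest possible median, any single reviewer could have given any score on the scale, so the median alone does not identify who gave the low scores.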

In particular, I think it will be very helpful to ...

2009-02-02T21:33:00.000-05:00

In particular, I think it will be very helpful to know the ranking information...when the PC is down to discussing the last 10 borderline papers.

Are you sure you would want to be told "your paper was the very last one we rejected"?!

Hoeteck -- I think you're asking for just far too ...

2009-02-02T21:00:00.000-05:00

Hoeteck --

I think you're asking for just far too much information -- and, moreover, that's not how I used the system. But really, just because I think it's reasonable for authors to see their scores doesn't mean I think they should receive every state change their paper was involved in on the way. (That info is too noisy, and too much work/too constraining for me to deal with.)

Warren --

There are various reasons a paper might or might not have been discussed at the meeting. For example, one clear rule I had was that ANY PC member could keep ANY paper open for discussion at the PC meeting (without giving a reason).

But if you think about the given scale for a minute, along with the public knowledge that I tried to reject 1/2 the papers before the meeting, you can probably come up with the appropriate approximate score threshold for being discussed at the meeting.

It would be an interesting experiment for a PC chair to send messages before the meeting that "your paper has already been rejected". Of course, since any PC member could ask for discussion of any paper at the meeting (even those previously considered rejected), I wouldn't have done things that way.

This is yet another anonymous vote for sending out...

2009-02-02T20:25:00.000-05:00

This is yet another anonymous vote for sending out the scores.

Arguments that people will complain that their scores are too high to be rejected are weak, since one can already get reviews suggesting that their (rejected) paper is great. Moreover, I've had accepted papers with bad/lukewarm reviews, which would probably reflect a lower score (but this is unclear... and that is the whole point of releasing scores). The PC should just give as much information as possible, within reason (and privacy constraints).

I am not posting my name because I don't want people to make inferences about my vote based on the outcomes of my STOC submissions...

-- STOC Submitter

Michael, I am really glad you posed this question....

2009-02-02T20:21:00.000-05:00

Michael, I am really glad you posed this question. For the last few years I have been a big fan of sending out the scores in TCS conferences. The two main (IMHO) benefits were already mentioned here, but they are worth repeating:

1) The comments alone tend to provide factual information (e.g., relevant related work, missing steps in the proof, etc). Occasionally, they provide opinions ("I like this and dislike that"). However, it is often hard for the authors to deduce the weight of the opinions from the text alone. This is in part because many reviewers do not feel comfortable writing negative statements that would be the equivalent of a low score. As a result, the authors are in the dark about the reasoning behind the decisions.

The summaries (e.g., as in FOCS'07) would also help on this front.

2) As it was pointed out, people who are "connected" can find it easier to learn the scores than people who are not, which is plainly unfair.

So yes, we should send the scores. And while we are at it, why not show score distributions for the papers during the PC report at the business meeting? This would provide useful info (especially if the authors have the scores), and has been done before, e.g., in SODA'05.

Piotr (Indyk)

I strongly agree with Warren's suggestion. It is al...

2009-02-02T19:59:00.000-05:00

I strongly agree with Warren's suggestion.

It is also important that we agree on what the issues/premises are:

1. Authors should receive as much information on the review process as is possible and reasonable. This is particularly important for student-authored papers, for single-authored papers and for junior researchers and people new to the field.

2. It's not fair that some people are privy to more information just because "they know the right people".

3. Lots of discussions take place during the PC meeting and are not necessarily reflected in the scores or the text.

Here's a suggestion drawing on my (singular) PC experience on TCC 2008 using Shai's software.

Typically, the PC chair marks each paper as "accept", "maybe accept", "maybe reject" or "reject" at each stage of the discussion. What the chair can do is select milestones in which to save this state information, which will then be sent to the authors. In particular, I think it will be very helpful to know the ranking information:

* right before the PC meeting.

* when the PC is down to discussing the last 10 borderline papers.

This should be very minimal work on the part of the PC (the chair has to click "save" twice and add some short descriptive text), and yet very useful aggregate information for the authors. In addition, it's not tied in to any text and will not reflect on any specific reviewer. Finally, this ranking should be largely monotone, thereby avoiding the problem that Luca raised.

Luca -- In a perfect theoretical world, yes, everyo...

2009-02-02T16:41:00.000-05:00

Luca --

In a perfect theoretical world, yes, everyone would go back and edit their reviews/scores to reflect also the discussion at the PC meeting. In practice, we've just spent 2 full days talking about the papers, plus flying time for many, and we've got our own teaching/research/administration/kids/spouses to get back to. And people want their reviews back as fast as possible, since 3-5 other conferences have made their deadline the week after our announcement. So realistically, you can choose to get the scores or not get the scores, but you can't choose to have the scores reflect the discussion of the PC.

As for the officemate questions, I'm already wording my canned response, which is "Scores may not reflect the discussion at the PC meeting."

You should certainly send the scores. In many case...

2009-02-02T16:23:00.000-05:00

You should certainly send the scores. In many cases one cannot tell from the comments whether a paper was an easy reject or whether it was rejected after long discussions. The scores may give some indication as to what happened (and a summary of the PC meeting can be even more helpful).

MIP's comment is irrelevant because if you don't want to put your friends in an uncomfortable situation then you don't ask them for your scores.

I agree with Luca that sending the scores may cause some nuisance but the current situation is also not optimal and if I have to choose between concealing information or revealing it then, at least for me, the answer is clear.

At many PC meetings I have participated in, the di...

2009-02-02T16:14:00.000-05:00

At many PC meetings I have participated in, the discussion on controversial papers changes some people's opinion, and then, when a consensus is built, a vote is taken to accept/reject the paper. But the PC members do not change the scores online to reflect their changed opinion. If you are going to send the scores, you have to make sure this hasn't happened.

Also to consider if you send scores: not only will you get emails saying, "how come my paper was rejected when the scores were 4,4,4", but also "how come my paper was rejected when the scores were 4,4,4 and my officemate's paper was accepted and its scores were 4,3,4??"

I like Warren's idea of letting the authors know i...

2009-02-02T16:13:00.000-05:00

I like Warren's idea of letting the authors know if the paper was discussed very much. In addition to being useful information, it means the responsibility for rejection falls less on the shoulders of each reviewer.

How about explicitly telling authors at what stage...

2009-02-02T15:50:00.000-05:00

How about explicitly telling authors at what stage their paper was rejected rather than forcing them to reverse-engineer that information from the reviews and scores? For example, say whether or not it was discussed at the PC meeting. I suspect that's a more accurate measure of a paper's quality than the scores given by reviewers. It also eliminates the possibility that authors could guess which PC member gave them a bad score.

If the choice is between no quantitative feedback and reviewer scores I vote for revealing the reviewer scores.

I also like the idea of a brief summary of the overall PC's opinion. Reviews are often contradictory so it's nice to know what the PC finally thought!

Scores: it doesn't matter to me. But I was quite s...

2009-02-02T14:48:00.000-05:00

Scores: it doesn't matter to me. But I was quite surprised by the result, so I am very curious to see the comments on my submissions.

When will we get those comments?

> I continue to be amazed at > the theory co...

2009-02-02T14:04:00.000-05:00

> I continue to be amazed at
> the theory community for
> finding reasons - any reasons
> - to resist change. Even
> change that is positive!

In the past, 10 years ago or more, PCs usually sent out the scores, so the change did happen!
No, I don't know what the reason was for changing this 10 or so years ago (even if you're saying the community should support changes).