Thursday, September 19, 2013

Reviewing Bad

A pun title, relating to two quick things.

First, I had the wonderful experience of getting to see (through a special deal set up by Harvard's faculty development office) All the Way at the American Repertory Theater.  It's a new play following the history of Lyndon Johnson (and Martin Luther King) from November 1963 to November 1964 (from when Kennedy was assassinated to when Johnson won the presidential election).  I was silly enough not to realize when I got the tickets that Bryan Cranston, of Malcolm in the Middle and Breaking Bad fame, was playing Johnson as the lead.  It was a really fantastic show.  (Hey, the rest of you in Harvard CS who aren't going to these things -- why not?  They're awesome!  Krzysztof was the only one I saw this time...)  Well acted and fascinating history.  The cast was amazing (and large -- I think 17 actors total), and I kept recognizing them from TV.  My inner gushing fan was set off by Michael McKean -- at one point, some of the actors were out in the audience area, and I excitedly noted McKean was about six feet from me.  (I chose not to seek his autograph, given the performance was going on at the time.)

[Note -- sadly, the show is already sold out... at least for this run.]

Ah, then the bad news.  After serving on the executive PC for STOC 2013, I heard from multiple colleagues whose papers were rejected about what they felt was the low quality of the reviewing.  (In my defense, I commiserated with several of them at the time.)   So, after getting the reviews from the SODA PC (for my rejected papers), I feel obliged to comment.  Quality-wise, they're terrible.  (Not universally so... but some of them....)  I was going to put in specific examples, but after the heat of the moment died down, my cooler head prevailed and determined that was inappropriate.   But suffice it to say that beyond the usual "we don't understand the motivation" type stuff, there are comments that are factually wrong and betray fundamental misunderstandings, and opinions regarding "what's important" in the paper that are -- in my experience -- just off the mark.  I've been through it before -- you suck it up, find the useful comments, rewrite, and re-submit.  But it is disturbing (from both sides, as the receiver of reviews and as one helping manage the reviewing process), and worrisome if it's an increasing problem for many submitters.

27 comments:

Grigory Yaroslavtsev said...

+1 for "All the way"! The show is fully sold out now, though.

Michael Mitzenmacher said...

Grigory -- you're right, I should have mentioned the show is now sold out; I'll update accordingly.

Anonymous said...

I am actually not sure why it is inappropriate. Are we under some kind of obligation to keep reviews to ourselves? Why not publish them and point out why they are so bad? Alternatively, you could ask the SODA PC chair to forward your replies to the referees.

Vitaly said...

I believe that misunderstandings in reviews at SODA/STOC/FOCS are to a large degree the result of bad review-process design. Of course, human error and time constraints are also factors, but we have little control over those.

Misunderstandings happen, probably more often these days than in the past, since the field has grown larger, deeper, and more diverse. At the same time, the review process in TCS has become worse. PC members increasingly rely on external reviewers. If a mistake is made by an external reviewer, there is no reasonable mechanism to correct it. An external reviewer does not see the other reviews and usually does not get any feedback from the PC member; such feedback could help the reviewer realize the mistake. PC members often don't have enough expertise, time, or confidence to tell whether the reviewer made a mistake.
The authors have no chance to respond and correct a misunderstanding before the final decision. The response itself (visible to the whole PC) is a good incentive to review more carefully.

Finally, uncorrected and unanswered bad reviews can end up being reused for future submissions.

All this might not be that important in the long run for most researchers, but it does create a lot of additional noise in the system and can be quite significant in the short run (e.g., for students). Not to mention the emotional toll on frustrated authors.

These issues have been recognized by many communities and require relatively little effort to fix. An author-response period allows the authors to respond. A two-tier committee with few external reviewers (as in STOC 2012) removes a degree of indirection and leads to more consistent reviewing (consistency does not guarantee quality, but in the current system neither is guaranteed).

I would recommend that everyone who gets a particularly bad review email a response to the PC chair (as some already do), even in cases where the bad review might not have affected the final decision. At the very least, this would give the PC chair a better idea of the scale of the issue and also indicate to the relevant PC members which reviewers should be trusted less.

Anonymous said...

I'm a mathematician - with a degree in theoretical CS - and I've been on a CS theory conference committee in the past. I've also submitted papers to CS conferences. After this PC experience, and after getting several papers rejected on rather flimsy grounds like the ones you mention, I've decided to stop submitting papers to CS conferences. Why put myself through the stress of deadlines when the reviewing is done at the last minute? I don't think that in the short time frame most of these conferences allow for review, one can expect well-considered reviews, except from a small subset of very conscientious people.

Anonymous said...

Anonymous at September 20, 2013 at 9:19 AM again.

Another thought. Most of the world does not need to know about my groundbreaking work within 6 months of its being done. For the people who do, there is arXiv or CoRR. Post your papers there, submit them to a good journal, and stop with this conference hoopla.

Anonymous said...

A simple rebuttal process would take care of this. I've been on PCs with rebuttal phases and they work fine. The vast majority of the time, the rebuttal takes only a few minutes to read and adds nothing, so no one is the worse for it. Occasionally it does correct some basic misunderstanding, and everyone is the better for it.

I have yet to understand why the theory community is so adamantly opposed to a rebuttal phase, when it has been widely adopted elsewhere.

Anonymous said...

I left TCS for a job in industry because I wanted my work to be evaluated by people whose incentives are aligned with helping me succeed, not by overworked, anonymous reviewers who can admit in reviews that they didn't even bother to read the paper, with no consequences.

Anonymous said...

Observe that highly competitive conferences such as SODA/STOC/FOCS only make things worse. In a more reasonable conference, two out of three positive reviews are enough to get in, whereas at the top tier you need all positive reviews for acceptance.

Now, let's conservatively estimate that the probability of a review being wrong is 1/4. In a good-but-not-great conference, the chances of a good paper being rejected are thus 1/4^2 or 1 in 16, whereas in SODA/STOC/FOCS fully one in four good papers are rejected.
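For what it's worth, those figures read like loose back-of-the-envelope numbers. Here's a minimal sketch of the exact binomial arithmetic, under the comment's own assumptions (three independent reviews per paper, each wrong with probability 1/4 -- the three-review count is my reading of "two-out-of-three"):

    from math import comb

    def p_reject(p_wrong, n_reviews, positives_needed):
        # Chance a good paper is rejected: it fails to collect at least
        # `positives_needed` correct (positive) reviews out of `n_reviews`,
        # where each review is independently wrong with probability `p_wrong`.
        p_right = 1 - p_wrong
        p_accept = sum(comb(n_reviews, k) * p_right**k * p_wrong**(n_reviews - k)
                       for k in range(positives_needed, n_reviews + 1))
        return 1 - p_accept

    print(p_reject(0.25, 3, 2))  # two-of-three rule: 0.15625, about 1 in 6
    print(p_reject(0.25, 3, 3))  # unanimity rule: 0.578125, worse than a coin flip

The exact numbers come out to roughly 1 in 6 under a two-of-three rule and worse than a coin flip under unanimity, so the direction of the comparison holds even if the quoted figures are loose.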

Anonymous said...

I like the calculations of Anon 1:01pm. He is a troll who knows his audience!

Anonymous said...

Vitaly is absolutely spot-on. Author-response is easy to implement and helps correct mistakes, with no downside as far as I can see. Allowing external reviewers to see the other reviews and to discuss would also help massively, not just for correcting mistakes but also for producing more consistent scoring. This is independent of whether to adopt a two-tier PC or not: allowing discussion within the scope of one paper is already helpful.

Anonymous said...

I've learned to realise that factually erroneous reviews and reviews that totally miss the point are invaluable for revising the paper. They point out where I have failed to communicate my results and their significance properly. (Of course, I still hate rejection.) The worst review to get is "This paper is pretty good, but not quite up to (Conference name) standards" because it is no help whatsoever.

From the other side, I have often seen committee meetings where a sub-reviewer's erroneous points are dismissed, but the paper is rejected for another reason (like, it's pretty good, but not quite good enough to get in). The authors still receive the sub-reviewer's review. So just because a reviewer said fallacious thing X and your paper got rejected does not mean that it was rejected due to the stupid reviewer who thought X. I have to admit, sometimes my own reviews have fallacies that are pointed out to me in the PC meeting or in the online discussion, and I forget to go back and change my review before it gets sent to the authors.

Getting rejected is hard on the ego, and I don't think there's any perfect reviewing process. That's not to say we shouldn't try to improve reviewing, but we shouldn't expect perfection.

Anonymous said...

Michael, try the following approach: Of the three reviews, drop the one you found most unhelpful/infuriating. How is the quality of what remains?

Anonymous said...

I think Russell's response is spot on.

Rebuttal could resolve some of the problems, but I don't think the rebuttal would help all that often; much more often, it's up to us to make sure that we're making the point clearly.

Day-dream said...

Some bad reviews should be exposed. The worst reviews, in my view, are the _rude_ ones. If you want to criticize, do it in a professional manner, pointing out specifically what the problem is. In no case use an exaggerated, subjective, emotional tone.
I recently got a review saying general things like "I find the introduction in the paper weak!", without further explanation. What does that mean? Say _specifically_ what is weak about it. If you can't, because you are incapable of intelligent articulation, don't make such general harsh statements. In fact, don't review at all. You can't have it both ways. The review then goes on in an aggressive, emotional tone, with excessive exclamation and question marks (?!?!?!).

Michael Mitzenmacher said...

Russell -- I disagree when you say: "I've learned to realise that factually erroneous reviews and reviews that totally miss the point are invaluable for revising the paper. They point out where I have failed to communicate my results and their significance properly." SOMETIMES this can be the case. But sometimes, no, it's really just the fault of the reviewer. The line that came out when I was talking with some people who had their paper rejected from STOC was (and I can't recall who said it), "I cannot predict ahead of time all the ways that someone can misunderstand my paper."

I do acknowledge and agree that reviews, including erroneous ones, are potentially orthogonal to the final decision to accept/reject the paper. But that's a severe reviewing problem. If the paper is rejected, the purpose of the reviews should be to provide advice on how to improve the paper. Erroneous reviews don't do this. (I find other communities are much more dedicated to the idea that reviews for rejected papers are meant to help make the next iteration of the paper better. I admit I found this idea somewhat foreign when I first encountered it, being used to theory conference program committees.)

Michael Mitzenmacher said...

day-dream:

I agree with your statements about vagueness and rudeness. Vagueness is very frustrating -- as you point out, it doesn't give useful information on how to improve the paper. (The bad reviews I received suffered from vagueness in several places.)

Here's something else I find, which relates to your statements on rudeness. Of late I get reviews telling me that I should take things out of my paper, that the parts of the paper I think are important aren't important, etc. These statements are not made in the form of humble suggestions but, as you say, more aggressively. I believe reviewers should present such opinions if they feel strongly about them. But there should be more humility there -- those are opinions, based on a quick read of the paper. The author may have a better idea than the reviewer as to "what's important". (I admit, I particularly resent comments of this form in areas where I've been writing papers for 15+ years now.) And such opinions, I think, should be a minor issue in accept/reject decisions.

I'd like to see reviewers focus on what they LIKE instead of just on what they dislike. Mikkel Thorup wrote about this recently -- on this blog, as I recall -- the idea of "taking the max" of the paper, not finding the min.

Anonymous said...

In general, one would expect worse reviews if one sends one's papers to inappropriate venues (for instance, venues that are unlikely to accept the paper). The reviewers then feel their time is better spent on papers that are "near the borderline," as opposed to those that clearly won't make the cut.

From an anecdotal perspective (with a decent sample size), about 8% of my conference papers and 10% of my journal papers have been rejected.

Despite being upset with the politics of our field for many reasons, looking back I have always respected the opinions of my colleagues. I've never felt that any rejection was significantly unfounded. Sure, I could have argued that "paper X that got in had less merit than my submission," but that would be a misguided reading of what PCs are meant to accomplish.

So perhaps some advice on getting higher quality reviews would be to submit one's papers only to venues where one is pretty sure they should be accepted.

Anonymous said...

I had a paper rejected recently where the biggest criticism was "what if X happens? how come the authors didn't discuss this?" -- while, funnily enough, we have an entire paragraph on page 3 showing what to do when X happens.

A simple rebuttal round would have dealt with this issue, to the benefit of all. I am aware that the paper might have been rejected nonetheless, as Russell points out, but this is not a valid argument against rebuttals.

stone said...

I agree with Grigory: '+1 for "All the way"! The show is fully sold out now, though.'

Day-dream said...

I second Michael's further comments, of course.

As to the claim by Anon (September 21, 2013 at 2:09 PM) that one should send one's papers to conferences where they have a high chance of getting accepted: I agree with the suggestion, but I disagree with the claim that most bad reviews are a result of not following it. My rude reviewer, for instance, actually gave the paper a "borderline paper" tag, and the other reviews were positive.
Other rude and aggressive reviews I got in the past in fact resulted in acceptance. Perhaps the PC realized that one cannot rely on aggressive and emotional reviewers?

JeffE said...

I agree with Russell. No, of course I can't predict how the reader is going to misunderstand my paper, but it's still my job to try. Even the most clueless reviews -- and I've had some _amazingly_ clueless reviews -- offer some insight into how to improve my papers. (Maybe move that paragraph on X from page 3 to page 2, and add a bold heading "What about X?")

Rejection still hurts, of course. Especially rejection for stupid reasons. But the world doesn't owe me a SODA paper.

I strongly disagree with Anonymous at 2:09. If only 10% of your submissions are getting rejected, you are _dramatically_ underselling your work. Some of my most highly reviewed and highly cited papers were speculative submissions, where I honestly had no idea whether people would like the paper. (Most of my speculative submissions are rejected, of course.)

Anonymous said...

"Maybe move that paragraph on X from page 3 to page 2, and add a bold heading 'What about X?'"

Then you have another reviewer complain that you discussed X in too much detail. [This is a true story, from another paper on its second submission, with added detail. I kid you not.]

I wonder where the resistance to improvements comes from. No system is perfect, and making minor adaptations makes obvious sense.

Rebuttals are one such piece of minor tinkering, one which has been tried elsewhere and worked. In fact, we might have had rebuttal phases from day one if the web had been prevalent back when conferences were first created.

The fact that an apparently erroneous review is sometimes due to improper writing is no refutation of rebuttals, and it is surprising that several opinions here (and elsewhere) keep bringing this fallacious argument forward.

We have all seen a reviewer be wrong from the other side, i.e., as fellow PC members, and likely at least once we have been wrong ourselves. I know I have. Sometimes the error gets caught in the discussion, sometimes it doesn't.

And as Russell points out, even when it is caught, far too often the review is not amended to reflect this -- which is yet another thing that can be improved in the review process: make sure the comments reflect the actual discussion.

Day-dream said...

Let me repeat here the comments I left on S. Har-Peled's blog (which links here):

The problem is that not everyone is doing the same level of reviewing. Some of us are good sub-reviewers and some are less so. That's fine. But the problem is that some of us are extremely bad and illegitimate reviewers. Let me explain: in no way should the reviewer exhibit subjective emotional hostility (e.g., "this is typical in this field!"), a vulgar and haphazard way of speaking (e.g., "so if X, what do we get? Nothing!!!?"), exaggerated exclamation and question marks (see above), or generic and vague accusations without further explanation ("the exposition is weak"). Every criticism should be backed up by explanations, specific argumentation, and a professional tone. Anything else is totally unacceptable.
---

And the favorite of some aggressive reviewers: "This is an oversell of the result."
This is an ethical accusation of a very severe nature. You claim it's an oversell? Fine. Please back it up with a detailed account of the history of the problem and previous attempts, and explain precisely why it is an oversell, comparing the paper to other published results in the field and the way they were described by their authors.

Anonymous said...

I have had quite a few useless reviews recently. Two reviews said that my results were interesting and so on, and therefore suggested acceptance. Then the third reviewer wrote a single short line, "I don't find these results interesting," and suggested rejection. My paper ended up getting rejected.

One may explain this by saying that "reviews do not reflect the PC discussion," but why should any of us tolerate this kind of review? Such a review does nothing but harm to the community: how can I know which venue I should target next if this is the quality of review I get? If the reviews do not reflect the PC discussion, then try to make them do so!!!

I agree with Day-dream that any rude/silly remark should be accompanied by a detailed explanation.

Ryan Williams said...

I mostly agree with Russell: reviews muddled by confusion and errors have been helpful to me, but I would still advocate a "rebuttal" phase in theory conference reviewing as well.

It's certainly been the case that a rejection (with seemingly "clueless" reviews) was the best possible thing that could have happened to my paper, at least in hindsight. The misunderstanding and apathy displayed in the reviews made me rethink what the "center of mass" of the paper really was: what I really wanted to say, and how I should say it. I was personally much happier with the final paper (moreover, it was accepted in the next go-round). Had the paper been accepted the first time, it's likely that I would have missed that bigger picture, and definitely would have had a weaker paper.

I have also gotten reviews where it appeared that the rejection hinged on erroneous doubts about whether a certain lemma was true, rather than confusion about the main message. A (short) rebuttal phase would help those painful cases without much further load on the PC. I am not sure which is more effective: the PC chair handling these cases individually (by email with the authors), or a universal rebuttal phase. Depending on the submissions, it's possible the latter would put a significantly lesser load on the PC chair; also, the latter seems more likely to reveal potential problems that the former could miss. I have served on a PC with a rebuttal phase, and I thought the extra time involved was minor.

For those opposed to a rebuttal (or "author response") phase, what do you think about allowing rebuttal phases for only student-authored papers? That would have been helpful for me; it took me a few years to learn how to write for STOC/FOCS, and getting an additional round of interaction with the PC could have decreased this time interval.

YS said...

Michael,

It is rather disappointing that you, as a PC member, cannot come up with a solution to this problem, or do not even try to propose one.

I think the easiest thing is to blame reviewers. Do not get me wrong, I agree with your comments about bad reviews. But what about the role of PC members?
Why can't they choose reviewers wisely, or at least skim the papers themselves?

Some reviews are direct insults, as I see from the comments here (I feel fortunate not to have gotten such insults, though I definitely have received bad reviews).
These reviewers can easily be detected and blacklisted; they write either bad reviews or adversarial reviews, and in either case this does not help the research community.

There is very little incentive to write better reviews. Some journals, like Communication Letters, publish a best-reviewers list.

There are really good reviewers who do their job well, even though they may make mistakes from time to time, and there are just plain bad or adversarial reviewers.
When I receive a paper to review, if it involves methods/theories/algorithms that I do not know, I read other papers just to learn the fundamentals.

Jeff Mogul, in his paper "Towards More Constructive Reviewing of CS Papers," says that publishing encourages authors to write better papers. What encourages reviewers to write constructive reviews? I think the identities of reviewers should be open. Then people would at least stop writing nasty reviews, as in the contrast between YouTube/Facebook and anonymous boards.
I would be glad to write open reviews, as in my opinion I have never offended anybody.
I have given constructive feedback and pointed out mistakes, which were corrected.

Do we want cool reviewers who do not need to offend anybody, or do we want to show Ph.D. students that there are some uncool people who offend others behind the mask of anonymity?
Where is the integrity?

So, the question is: who is scared of open reviewing? There must also be some system for assessing reviewers. In third-world countries, as in some European countries, I notice that being a reviewer is regarded as an important credential, and people put it on their resumes. How did they qualify? How do we know they are doing a good job? What if we were able to rate their reviews, and everyone could see which reviewers are really bad?

My Ph.D. advisor always told me that reviewers are always right, even if they do not read or understand the paper. So I never write back to PCs, and neither do many other people.
Just as reviewers rate our papers, we should be able to rate their reviews. Is that too hard?

Bottom line: most of the time, I think it is the PC members' fault rather than the reviewers'.

For example, I recently got reviews for a paper. Two reviewers said they were completely familiar with the area and that the paper should be accepted. One reviewer clearly wrote that he was unfamiliar with the area and suggested a weak reject. But you can see that this reviewer is really honest: even though he is clueless about the problem, he tries to give constructive feedback. How can I blame the reviewer? It is totally the PC members' fault that, at a conference in a very narrow area with four major problems, they managed to find a reviewer who is clueless about those major problems (probably an undergraduate student). (It's as if we had submitted a paper on sorting algorithms, and the reviewer thought we were trying to solve shortest paths, and that for some weird reason we were calling shortest-path algorithms "sorting algorithms.")

If even one PC member had just skimmed the paper, he would have seen the problem easily. So I cannot stop myself from pitying these PC members.