Comments on My Biased Coin: STOC 2009 : "Impending Doom"

I am one of the authors who has submitted abstract...

2008-11-14T01:05:00.000-05:00

I am one of the authors who has submitted an abstract to STOC 2009. I have a question about the format of the extended abstract: can one use the fullpage package?
Thanks.

So having used (suffered through?) the SIGCOMM sys...

2008-08-05T14:01:00.000-04:00

So having used (suffered through?) the SIGCOMM system on several PCs, a few comments.

The goal of the SIGCOMM system is to figure out which 40 or 50 papers (out of the hundreds submitted) are to be discussed at the PC meeting.

In that light, there's no point in differentiating between papers that are clearly below the bar -- so a single grade (1) covers papers that are in the bottom 50%. Indeed, one of the most important innovations in SIGCOMM reviewing has been the quick reject process -- in the first round, every paper goes to two reviewers -- if both rate it in the bottom 50% it is rejected. Only papers with at least one rating in the upper 50% get additional reviews. So the additional reviews are concentrated on the papers that have a chance at acceptance.
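The quick-reject rule described above can be sketched in a few lines. This is only an illustration, not SIGCOMM's actual tooling: the function name `first_round`, the paper IDs, and the convention that any grade above 1 means "upper 50%" are assumptions made for the example.

```python
def first_round(ratings):
    """Split papers after the first round of two reviews each.

    ratings: dict mapping a paper id to its two first-round grades.
    Assumption: grade 1 means "bottom 50%"; any higher grade means "upper 50%".
    """
    rejected, advance = [], []
    for paper, grades in ratings.items():
        if all(g == 1 for g in grades):
            # Both reviewers put the paper in the bottom half: quick reject.
            rejected.append(paper)
        else:
            # At least one upper-half rating: the paper gets additional reviews.
            advance.append(paper)
    return rejected, advance


rejected, advance = first_round({"A": (1, 1), "B": (1, 3), "C": (2, 4)})
print(rejected, advance)
```

Only paper "A" is dropped here; "B" survives on the strength of a single upper-half rating, which is the point of the rule.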

The remaining grades distinguish among papers that might be discussed (most in the second 25% are not discussed, but some are) and help focus the PC discussion a bit.

Also, someone asked if the ratings fit the percentages (i.e. is the top rating given in only 5% of all reviews?). Some PC chairs have looked at this issue and the answer, roughly, appears to be yes. But it is only rough.

The issue is that it takes a while for a reviewer to calibrate. Consider that the typical PC member (in a big PC) sees only about 20 papers out of 300. The likelihood that in this sample of 20 they'll get a quality distribution matching that of the 300 is pretty small. (This is compounded by the fact that their selections aren't random, but rather match their reviewing expertise -- and some years are more fertile/innovative than others in a particular sub-field.)
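To put a rough number on the calibration problem, here is a small calculation under a deliberately simplified assumption: the 20 papers are drawn uniformly at random from the 300, and exactly 5% of the pool (15 papers) deserve the top rating. As noted above, real assignments are not random, so treat this as a sketch only. The function name `prob_top_in_sample` is made up for the example.

```python
from math import comb

POOL = 300   # total submissions (figure from the comment)
SAMPLE = 20  # papers seen by one PC member (figure from the comment)
TOP = 15     # assumed: exactly 5% of the pool are "top 5%" papers


def prob_top_in_sample(k, pool=POOL, sample=SAMPLE, top=TOP):
    """Hypergeometric probability that a random sample holds exactly k top papers."""
    return comb(top, k) * comb(pool - top, sample - k) / comb(pool, sample)


p_none = prob_top_in_sample(0)
p_one = prob_top_in_sample(1)
print(f"no top-5% paper in the pile:  {p_none:.2f}")
print(f"exactly one (the 'expected'): {p_one:.2f}")
```

Under these assumptions, roughly a third of reviewers would see no top-5% paper at all, even though the expected number per pile is one.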

hi michael, what is the homepage for stoc 2009? Wh...

2008-07-27T02:52:00.000-04:00

hi michael, what is the homepage for stoc 2009? When I tried googling for stoc 2009 all i got was your blog :-) . Though I enjoyed reading your blog entry and the comments which followed, I was wondering where the actual home page for stoc 2009 was :-) Please let me know.

Sudarshan Iyengar
India

Hi Michael, A big PC, sometimes, becomes unfair to ...

2008-07-08T03:25:00.000-04:00

Hi Michael,

A big PC, sometimes, becomes unfair to students who write papers with a co-author on the PC. As much as I like the idea of having a reasonably big PC, I wonder how this concern will be taken care of. I don't have a good answer...

Omkant

I'm all in favor of a bigger PC; I think there ar...

2008-05-12T22:25:00.000-04:00

I'm all in favor of a bigger PC; I think there are budget limitations since STOC has a PC meeting, but I'll see how large I'm allowed...

Michael, please consider expanding the size of the...

2008-05-11T13:22:00.000-04:00

Michael, please consider expanding the size of the PC. I just don't see the point of having each member of the PC responsible for 40+ papers. Plus with more people on the PC you can get better coverage of more areas.

"Because of concerns that members of the community...

2008-05-11T11:47:00.000-04:00

"Because of concerns that members of the community raised, in the future non-expert sub-referees will not be allowed. Student sub-referees must be experts."

Isn't the last sentence redundant anyway?

We are in agreement. I didn't include the "Studen...

2008-05-10T13:59:00.000-04:00

We are in agreement. I didn't include the "Student sub-referees must be experts" part of the quote because I thought it was actually too strong.

Paul, The exact quote seems to be: "Because of conce...

2008-05-09T23:32:00.000-04:00

Paul,

The exact quote seems to be:

"Because of concerns that members of the community raised, in the future non-expert sub-referees will not be allowed. Student sub-referees must be experts."

And I agree with that...except that there's obviously some room for interpretation. (When is a student -- or anyone -- qualified as an expert? It seems not to be spelled out in the guidelines...)

I would never give a paper to a student who I felt didn't know enough to give me significant useful information. Indeed, whenever I use a subreferee as a PC member, I feel it's incumbent on me to take their report and use it as information for my own decision as a PC member. If someone has much greater expertise than me on a subject (for example, pretty much whenever I'm stuck with a quantum paper) I may not have anything further worthwhile to add, but usually that's not the case, and even then, in the end I'm responsible for the final opinion and judgment. If I use a student as a subreferee, by definition it's because I think they qualify as expert enough to provide me with a useful opinion, but if a student has less experience, I would be responsible as the PC member for taking that into account.

In short, I read (and recall) that situation and the resulting report as being a clear statement about how PC members need to view their responsibility -- which should inherently limit the amount of subrefereeing given to students. I see it less as a dictum drawing a bar meant to prevent students from reviewing -- which is an important experience for graduate students to undertake as they become more advanced.

to ease the load, give students experience, and/or...

2008-05-09T23:00:00.000-04:00

to ease the load, give students experience, and/or get better reviews.

This doesn't seem completely accurate. According to SIGACT guidelines, to "give students experience" is not a legitimate reason for them to be sub-referees. This guideline was instituted way back at the STOC 1999 business meeting (see Sept 1999 SIGACT News):

"Because of concerns that members of the community raised [about the STOC 1999 process] in the future non-expert sub-referees will not be allowed." The exact implementation of the guidelines is supposed to be interpreted by each PC chair.

Generally, students are appropriate sub-referees when they know the area. They can also be appropriate sub-referees when the object is to check correctness of a paper.

thanks for answering that first question!

2008-05-09T16:14:00.000-04:00

thanks for answering that first question!

Carter 1) The Program Chair (in this case, me) wit...

2008-05-09T09:55:00.000-04:00

Carter

1) The Program Chair (in this case, me) with advice from others picks the PC.
2) The PC is responsible for reviews; sometimes PC members request others (experts, or students) to do some of the reviewing for them, to ease the load, give students experience, and/or get better reviews.

Hello Mike, I have two questions: 1) How are the co...

2008-05-07T21:36:00.000-04:00

Hello Mike, I have two questions:

1) How are the committee members and referees typically chosen for theory conferences like stoc?

2) I'm a non-Harvard student doing an REU with Prof. Morrisett this summer. Are there any interesting theory talks going on that I can look forward to?

thanks,
-Carter

Michael, I like your idea of cutting down on the n...

2008-05-01T08:51:00.000-04:00

Michael, I like your idea of cutting down on the number of different rating categories. I also agree with Paul that categories like "top 33% but not top 10%" are difficult to assign without looking at the whole pool. It must be even harder to get an opinion from a subreferee not on the committee and arrive at a grade based on their remarks.

Anyway, however you end up doing it, good luck! Hope to see a great program for STOC '09.

-Amit

Harry Lewis reminded me to go back and look at Exc...

2008-05-01T07:55:00.000-04:00

Harry Lewis reminded me to go back and look at Excellence Without a Soul, page 120, discussing the number of categories used for grading. The main point is the first sentence, "A scale with more categories allows more precise comparisons, but the value assigned to any individual piece of work is more arbitrary." He suggests that even 5 categories is too many. The SIGCOMM scale is designed so that the arbitrariness is focused at the top (differentiating between top 5% and top 10%), but perhaps for STOC 4 categories -- given either as percentages, or nominally in the style of Alan -- would be sufficient.

Anonymous -- Just because I'm no Scott Aaronson do...

2008-05-01T07:50:00.000-04:00

Anonymous -- Just because I'm no Scott Aaronson doesn't mean that I'm not dropping a subtle exaggeration here or there.

"...and, most importantly, I felt it might lead to...

2008-05-01T06:34:00.000-04:00

"...and, most importantly, I felt it might lead to interesting fodder for the blog."

So this blog is the center of your universe?

Something like this would be an interesting experi...

2008-04-30T15:50:00.000-04:00

Something like this would be an interesting experiment. I wonder how the size of the PC will impact things. Even with the large number of submissions that individual committee members are responsible for, they see only a small fraction of the papers. How should a committee member produce their ratings? Should their ratings be relative only to the papers they are assigned? If so, this is a bit of a quota system on ratings divided into subareas associated with committee members. This issue becomes more complicated because external reviewers will not have access to the pool of submissions to compare.

If there is no such quota, which seems more reasonable, how do they judge the pool as a whole? (This is particularly difficult for someone who has not served previously.) It seems that the only unambiguous standard that an inexperienced PC member can use in this case is how a paper rates relative to accepted papers at previous conferences. What will likely happen is that people will not actually abide by percentages and will use something between the top 25% or top 33% as code for something that is up to the standards of previous STOC/FOCS (since that is roughly what the acceptance rates have been). They'll use "top 50%" or something similar for the middle section of good papers and they will save time by not worrying precisely about grading the rest. (Committee members now probably spend too much time overall trying to tease out whether a paper is publishable, a 4/10 rating say, versus garbage 0-2/10. This would allow them to save time for more important things and seems to be the biggest win with the proposal.)

The software will tend to want to produce a list of papers based on "averages" from these scores, but by changing the scale the "averages" probably will have less meaning. (I've never much liked these averages anyway.)

Why might this be any different from SIGCOMM? SIGCOMM has a significantly larger committee and a much higher proportion of reviewers will likely have prior SIGCOMM reviewing experience. On the other hand, I bet every SIGCOMM reviewer knows the typical acceptance percentages, so actual behavior might not be so different.

I think the real measure should not be relative to...

2008-04-30T14:52:00.000-04:00

I think the real measure should not be relative to the pool of submissions, but absolute, and I think there are just 4 real levels needed: "I will fight hard to get this paper in, because it will change our field", "This is worthwhile, and won't lower the standard of the conference, but others might also be worthy; I won't spend effort fighting for it", "Not as good as usual in the conference, but it is a minor contribution which should appear somewhere", "This is unclear, wrong, silly, already known or trivial, it would be an embarrassment to be associated with a conference that publishes this as submitted". [There may also be a fifth category for "something is unclear/wrong/silly/etc, but there is also an idea worth pursuing that could be published here if the paper were rewritten substantially" - unless the conference has a shepherd process allowing revision, this should be treated like "embarrassing if it appears", except for the feedback it sends to authors]

With such an absolute system, it's easy to focus attention on the real debates that matter for the program. One can simply accept every paper where someone will fight for it, and no-one thinks it below the usual standard; reject every paper where no-one will fight for it and someone has doubts. One must resolve the cases of real debate (when someone will fight for a paper, and someone else thinks it is below the usual standard for the conference, or even bad). Finally, one can fill up the program with a random (or better, spread by topic among submissions from authors new to the community) selection from the papers that everyone thinks are worthy to appear but not important enough to fight for.
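The decision rules in the paragraph above are mechanical enough to sketch. This is one possible reading, not a definitive procedure: the names `Level` and `triage` and the outcome labels are invented for the example, the optional fifth category is folded into "embarrassing" as the comment suggests, and each paper is assumed to have at least one rating.

```python
from enum import IntEnum


class Level(IntEnum):
    """The four absolute levels, lowest to highest (names are illustrative)."""
    EMBARRASSING = 0  # unclear, wrong, silly, already known or trivial
    MINOR = 1         # below the usual standard, should appear somewhere
    WORTHY = 2        # worthwhile, but the reviewer won't fight for it
    FIGHT = 3         # the reviewer will fight hard to get it in


def triage(ratings):
    """Classify one paper from its list of reviewer levels."""
    fights = any(r == Level.FIGHT for r in ratings)
    doubts = any(r < Level.WORTHY for r in ratings)
    if fights and not doubts:
        return "accept"   # someone fights, nobody thinks it below standard
    if fights and doubts:
        return "debate"   # the real disagreements the PC must resolve
    if doubts:
        return "reject"   # nobody fights and someone has doubts
    return "fill"         # everyone says worthy: candidate to fill the program


print(triage([Level.FIGHT, Level.WORTHY]))   # accept
print(triage([Level.FIGHT, Level.MINOR]))    # debate
print(triage([Level.WORTHY, Level.MINOR]))   # reject
print(triage([Level.WORTHY, Level.WORTHY]))  # fill
```

The final selection among the "fill" papers (random, or spread by topic and toward authors new to the community) is left out, since the comment does not pin it down.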

I wonder, when asked to use that rating system, ho...

2008-04-30T11:59:00.000-04:00

I wonder, when asked to use that rating system, how different the histograms look. E.g. do 20% of papers get rated as "top 10%"?