Tuesday, July 17, 2007

The Old SODA vs. FOCS/STOC Debate

With the SODA deadline just past, and FOCS coming up nearby at Brown, I got to thinking about bringing up the old SODA vs. FOCS/STOC debate. For those new to the subject, FOCS (Symposium on Foundations of Computer Science) and STOC (Symposium on the Theory of Computing) are generally considered the top two conferences each year in theoretical computer science. SODA (Symposium on Discrete Algorithms) is considered by many (including myself) to be better suited for algorithmic results, and many have argued that the big two theory conferences should really be the big three.

Now, being a quantitative sort of person, I decided the best way to bring something new to the discussion would be to gather some data. So I picked a non-random year -- 2000 -- and decided to look at the citation counts on Google Scholar for all of the papers from the conferences and compare. I figured 2000 was far enough back that we'd get meaningful citation counts for comparison. Yes, I know, citations counts are not the end-all and be-all and all-that, but that's an argument for a different day (or for the comment section). In aggregate, they must tell us something important.

You can imagine my surprise when I looked at the numbers and found SODA completely dominated. Here's a little chart:

Papers Median Cites Max.Cites Total Cites
FOCS 66 38 318 3551
STOC 85 21 459 3393
SODA 122 14 187 2578

FOCS seems to have had an absurdly good year, but the data speaks volumes. At least as of 2000, the best results went overwhelmingly to FOCS and STOC. Overall, I found the results of this experiment rather disappointing for SODA.

It would be nice to check to see what the current trend is, but it was annoyingly time-consuming to do this by hand, and I didn't feel like writing a script. (There seem to be many exceptional cases; if you try to do an exact match on the title, you'll often not find the paper because the title has changed somewhere along the line or a typo or something. But with a regular query you often have to do some searching to find the paper. Perhaps someone more talented will be inspired to write a script?)

I learned something interesting and useful by gathering this data. As a community, should we be paying more attention to numbers like these, so we can make our conferences better? For example, I can't recall anyone ever trying to systematically answer whether a PC made good decisions or not after the fact, but apparently we now have the data available to try to answer such questions in an empirical fashion. (We might check if rejected papers were later accepted and highly cited, for example.) What possible lessons can we learn using these citation tools?


Anonymous said...

Comparing FOCS/STOC to SODA is a bit like comparing apples to oranges, and it could very well be the case that both FOCS/STOC are better theory conferences than SODA, while SODA is a better algorithmics conference than FOCS/STOC.

The main difference with SODA, in my opinion, is that it is rather narrow in scope. As a cryptographer, I find lots of interesting papers in STOC/FOCS (even beyond the small handful of crypto papers accepted), while I rarely (if ever) have occasion to read a SODA paper. Just based on scope alone, I don't think you can call SODA one of the "big three" theory conferences, even if its quality may be on par with STOC/FOCS.

SODA also accepts way more papers than STOC/FOCS. Whether this is good or bad is irrelevant here, but it surely dilutes the average paper quality.

For a more meaningful comparison of the citation counts, you may want to compare citations of the algorithms papers in STOC/FOCS to the papers in SODA.

Anonymous said...

It is interesting to try to evaluate each conference as a whole, and it has some limited value (for a department chair deciding whether to fund travel, or for a newbie trying to decide where to send their work, or where to read to learn about a field), but there is a dismaying tendency to evaluate a piece of research, or a researcher, by using the conference where it appeared as a proxy. Once work is say 5 years old, it should surely be evaluated by the number and type of citations to that paper, or similar paper-specific measures, rather than by measures of the conference as a whole.

Anonymous said...

I'd look at scott's blog, the entry about the Turing article.

Anonymous said...

dang, at first i thought you were using "dominated" as a verb, not an adjective.

Anonymous said...

It is not surprising that a conference accepting 66 papers across a broader portion of the TCS spectrum dominates an algorithms oriented conference with 122 acceptances. The real question is if there is anything gained from the extra selectivity of STOC and FOCS.

Say we could create a conference that would publish only the single best paper that year (like Aaronson's fictional FOCS'36). Such a conference would blow away your median cites and max cites count, but as a conference it would be awful: few people would attend, one would get a rather limited perspective of all recent relevant work, there would be little opportunity for followup work, with low probability of the single paper being relevant to your work, etc.

Then it seems to me that what most people implictly contend when they include SODA in the big three is that as a conference SODA is equally if not substantially more successful than the modern incarnation of STOC/FOCS.

To wit, SODA seems to attract most of the researchers in algorithms (including stars and substantial numbers from overseas), it has broad coverage (as opposed to overly narrow) of the intended areas, almost any result worth noting in algorithms will appear there and as such, regardless of one's research area, there always are several sessions of one's own interesting with papers well worth listening to, among others, and overall the quality of the papers is not that much below of STOC/FOCS.

This is why SODA has claims to being part of the big three. For the good of the field I hope it never goes the ultraselective route of STOC/FOCS which has turned them into nearly conferences which only authors attend.

Anonymous said...

Could it be that there is some reverse causality here? It could be that FOCS/STOC papers simply have more visibility, being more well-regarded conferences. If this caused some valuable papers to go unread by as broad an audience, whereas FOCS/STOC papers get built upon more often, you'd get a lower citation count for SODA.

Anonymous said...

Citation counts are very rough proxies for quality. I wonder if the level of resolution is good enough to distinguish between 14 (SODA) and 21 average citations (STOC). In fact the disparity between STOC and FOCS which are rather similar in scope and quality suggests that the noise factor of citation counts is upwards of 80%.

Anonymous said...

Re: quality of citation count.

Here's a data point: my top eight papers quality-wise --according to my own very subjective opinion-- are number 11, 2, 9, 1, 14, 8, 12, and 13 (in that order) by citation count. IMHO numbers 3, 4, 5, 6, 7 and 10 by citation count are much inferior work, but they just happen to be early work in areas that are currently hot.

Observe that the average rank by citation count is 8.75 while the average rank by quality count is 4.5. The noise factor is then (8.75/4.5-1)*100= 94% which is close to the 80% difference between STOC and FOCS noted by the previous post.

Suresh Venkatasubramanian said...

More here, as well as some raw data to play with.

Anonymous said...

To build on the comment by anonymous (July 18, 2007 12:26 PM), there are at least two potential endogeneity problems here -- in addition to the increased visibility of STOC/FOCS after the fact, there is the effect that researchers might want to send papers first to the venues where they feel they will be more visible...maybe someone wants to check on the fates of SODA papers that were first rejected by STOC/FOCS? ;) (Not that that would solve the first problem...)

Anonymous said...

What possible lessons can we learn using these citation tools?

Given anonymous Jul 18 12:57 little experiment, probably the answer is "not much". I doubt that they give much information even in the aggregate. Anyway, before starting playing with the numbers one would have to try to establish this.

This seems to me a very bad way to evaluate conferences. Probably the single best strategy to increase citation counts would be to publish biology papers.

Anonymous said...

There're a couple qualitative issues to bear in mind:

1. "vision"/"ground-breaking" papers are likely to be submitted and accepted to focs/stoc. these include papers that introduce areas, definitions, fundamental techniques. [think ARV, DDN, etc.]

2. a solution to an open problem posed in an IPCO paper is unlikely to especially interesting. by that reasoning, we'd want to relate our work to other focs/stoc papers in a soda/focs/stoc submission.

Anonymous said...

To "theory grad student": I can guess what DDN is (though I doubt anyone outside of crypto will be able to --- and I wonder how important this paper is outside of the crypto community, but that is a question for another day), but what is ARV?

pálenica said...

ARV == Arora, Rao, Vazirani. O(sqrt(log n)) approx for sparsest cut (and best paper in STOC'04).

Anonymous said...

ARV == Arora, Rao, Vazirani. O(sqrt(log n)) approx for sparsest cut (and best paper in STOC'04).

The paper still doesn't ring a bell, though the accomplides are well known.

Widely used theory acronyms based on initials: RSA, L-Z, AVL, BMP, any others?

Anonymous said...

Well according to http://libra.msra.cn/default.aspx the best CS conference is SOSP (STOC is number 10). .

The best CS journal is TOCS (Transactions on Computer Systems, JACM is no. 7 after CACM).

The best author is Ullman (Karp is #15, Wigderson #187).

Anonymous said...

According to web site above WADS at 171 is better than SoCG at 175. So much for citation counts.