Monday, October 19, 2009

WSDM Paper : Acceptance Rates

I'm happy to announce our paper "Adaptive Weighing Designs for Keyword Value Computation" -- by me, John Byers, and Georgios Zervas -- was accepted to WSDM 2010 -- The Third ACM Int'l Conference on Web Search and Data Mining. (The submission version is available as a technical report here.) The abstract is at the bottom of the post for those who are interested.

The paper's acceptance gives me an excuse to discuss some issues on paper writing, research, conferences, and so on, which I'll do this week. To start, I found it interesting that WSDM had 290 submissions, a 70% increase in submissions over 2009. Apparently, Web Search and Data Mining is a healthy research area in terms of the quantity of papers and researchers. They accepted 45, or just about 15.5%. This turns out not to be too far off from the first two years, where acceptance rates were also in the 16-17% range. I'm glad I didn't know that ahead of time, or I might not have submitted!

I'm curious -- why would a new conference, trying to establish itself and gain a viable, long-term group of researchers who will attend, limit itself to such small acceptance rates when starting out? Apparently they thought the key to success would be a high quality bar, but I find the low acceptance rate quite surprising. I can imagine that the rate is low because there are a number of very poor submissions -- even the very top conferences, I've found, get a non-trivial percentage of junk submitted, and although I have no inside knowledge I could see how a conference with the words "International" and "Web" in the title might receive a number of obviously subpar submissions. But even if I assume that a third of the submissions were immediate rejects, the acceptance rate on the remaining papers is a not particularly large 23.3%.
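For anyone who wants to check the back-of-the-envelope numbers, here is a minimal sketch in Python. The submission and acceptance counts are the ones quoted above; the one-third share of immediate rejects is only the guess made in the previous paragraph, not a reported figure, and the function name is made up for illustration.

```python
# Counts quoted in the post for WSDM 2010.
submissions = 290
accepted = 45

def acceptance_rate(accepted, pool):
    """Return the acceptance rate as a percentage of the given pool."""
    return 100.0 * accepted / pool

overall = acceptance_rate(accepted, submissions)

# ASSUMPTION (a guess, not a reported number): one third of the
# submissions are immediate rejects, leaving two thirds as the real pool.
serious_pool = submissions * (1 - 1 / 3)
adjusted = acceptance_rate(accepted, serious_pool)

print(f"overall: {overall:.1f}%, among remaining papers: {adjusted:.1f}%")
# prints "overall: 15.5%, among remaining papers: 23.3%"
```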

The topic of low acceptance rates for CS conferences has been a subject of some discussion lately -- see Birman and Schneider's article at the CACM, Matt Welsh's thoughts, Dan Wallach's thoughts, and Lance Fortnow's article at the CACM, for instance. Here we have an interesting example case to study -- a new conference that starts out with an accept rate in the 16% range, and an apparent abundance of submissions. Anyone have any thoughts on why that should be? (I'll see if I can get some of the conference organizers to comment.) Or opinions on whether that's the way it should be?

Now for that abstract:
Attributing a dollar value to a keyword is an essential part of running any profitable search engine advertising campaign. When an advertiser has complete control over the interaction with and monetization of each user arriving on a given keyword, the value of that term can be accurately tracked. However, in many instances, the advertiser may monetize arrivals indirectly through one or more third parties. In such cases, it is typical for the third party to provide only coarse-grained reporting: rather than report each monetization event, users are aggregated into larger channels and the third party reports aggregate information such as total daily revenue for each channel. Examples of third parties that use channels include Amazon and Google AdSense.

In such scenarios, the number of channels is generally much smaller than the number of keywords whose value per click (VPC) we wish to learn. However, the advertiser has flexibility as to how to assign keywords to channels over time. We introduce the channelization problem: how do we adaptively assign keywords to channels over the course of multiple days to quickly obtain accurate VPC estimates of all keywords? We relate this problem to classical results in weighing design, devise new adaptive algorithms for this problem, and quantify the performance of these algorithms experimentally. Our results demonstrate that adaptive weighing designs that exploit statistics of term frequency, variability in VPCs across keywords, and flexible channel assignments over time provide the best estimators of keyword VPCs.
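To make the setup in the abstract concrete, here is a toy sketch of the underlying weighing-design idea: each channel's daily revenue total is one linear equation in the unknown keyword VPCs, and changing the keyword-to-channel assignment from day to day yields enough equations to solve for every keyword. This is not the adaptive algorithm from the paper. It assumes a fixed, non-adaptive schedule, noiseless reports, and known click counts, and every number and name in it is hypothetical.

```python
from fractions import Fraction

# HYPOTHETICAL ground truth: value per click (VPC) for four keywords.
true_vpc = [Fraction(1, 2), Fraction(2), Fraction(3, 4), Fraction(5)]
# HYPOTHETICAL daily clicks per keyword (held fixed across days for simplicity).
clicks = [10, 4, 8, 2]

# Fixed (non-adaptive) schedule: each day splits the keywords over two channels.
schedule = [
    [[0, 1], [2, 3]],  # day 1
    [[0, 2], [1, 3]],  # day 2
    [[0, 3], [1, 2]],  # day 3
]

def channel_reports(schedule, clicks, vpc):
    """Simulate the coarse reports: one revenue total per channel per day."""
    rows, totals = [], []
    for day in schedule:
        for channel in day:
            row = [Fraction(clicks[k]) if k in channel else Fraction(0)
                   for k in range(len(clicks))]
            rows.append(row)
            totals.append(sum(c * v for c, v in zip(row, vpc)))
    return rows, totals

def solve_least_squares(rows, totals):
    """Solve the normal equations (A^T A) x = A^T b by Gauss-Jordan elimination."""
    n = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    atb = [sum(r[i] * t for r, t in zip(rows, totals)) for i in range(n)]
    aug = [ata[i] + [atb[i]] for i in range(n)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it into place.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [x / aug[col][col] for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]

rows, totals = channel_reports(schedule, clicks, true_vpc)
estimated_vpc = solve_least_squares(rows, totals)
print(estimated_vpc == true_vpc)
```

With only two channels, a single day gives two equations for four unknowns, so no single day can pin down the VPCs; it is the rotation of assignments across days that makes the system solvable. In the noiseless toy case any full-rank schedule recovers the values exactly, whereas with noisy revenue the choice of schedule affects how good the estimates are, which is the question the paper studies.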


Anonymous said...

Having reviewed for, submitted to, and published at WSDM, WWW, and SIGIR, I have found that the greater the web focus of the conference, the lower the quality of the submissions and often the acceptances.

Anonymous said...

More puzzling to me is that the PC seems huge, for such a small conference.

Brian Davison said...

The WSDM 2010 conference organizers (of which I am one) and steering committee have, of course, discussed many of these issues.

The intent of the WSDM series is to provide a smaller, focused venue for topics related to web search and web data mining. As a result it is single-track. The conference has been fortunate to have attracted plenty of interest even in the first three years to make acceptance rates highly competitive.

The conference has also grown over the years. The first WSDM conference was only two days, the second roughly 2.5 days, and this one will be three full days of the conference. By adding more presentation sessions and shrinking the presentation times, we are almost keeping up this year (70% increase in submissions, about 55% increase in acceptances compared to 2009).

Finally, WSDM acceptance rates are also quite in line with top venues with overlapping interests, like WWW, CIKM, and SIGIR, and a little lower than recent years for KDD.

Anonymous said...

Yeah, I was just going to say that the WSDM acceptance rate is right in line with related conferences like SIGIR, CIKM, and WWW. Since these along with ECIR (around 24%) are my main publication venues, I've started thinking these rates are typical, which affects the way I view the higher acceptance rates that my colleagues seem to enjoy.

Anonymous said...

Hi Michael,

What worries me most is your sentence "I'm glad I didn't know that ahead of time, or I might not have submitted!"

Why should the acceptance rate of a conference matter in the decision of where to submit? Perhaps this is a subject of a different post, but do you aim for: (1) the venue with the best fit (2) the most prestigious venue even if it takes a few tries (read: year/two) to get it in (3) the easiest venue to get into (that's still respectable enough to list on the cv) or (4) whichever deadline is next.

In the social optimum (1) is the best choice. Clearly though, other things cross your mind :)


Michael Mitzenmacher said...

Anonymous 4: Before answering the question, I should point out that "best fit" could keep one from submitting to a conference based on quality; if I know something is almost certainly not going to get into, say, STOC/FOCS, for perceived quality reasons, arguably isn't it best not to submit it there? Maybe to you that smacks of your choice (3); in my mind, it's just efficient handling of papers (on both sides, mine and the reviewers).

Honestly, for me (4) is a big issue; I like to get things done, and not leave papers lying around, so I can get on to the next interesting thing. Otherwise, I try to aim for the best fit (1), although really it's more like good fit. Roughly, I guess my calculation is something like, "What's the next conference where this paper is a good fit, and hence has good odds of getting in."

David Molnar said...

Just a quick comment, not knowing WSDM at all: to me, a 16% rate sounds normal, and 23.5% certainly on the high side. The main conferences in security range in the low teens (and IEEE security and privacy dipped towards 8% one year). I am not arguing that the rates are good or bad, just noting that to me those rates come across differently.

Anonymous said...

What does it mean for an acceptance rate to be good or bad? Obviously, if there are several conferences with the same audience and all rejects from one conference are submitted to the next, etc., then the acceptance rate from the community is much higher than the individual acceptance rate of each conference.

Also, in my experience, recent WSDM papers seem to introduce interesting problems, but have very little in the way of algorithmic ideas, at least this is true for the algorithms papers. This is probably because the PC is so large and the topics so broad, there is a high probability that a paper is refereed by someone who has no clue about the topic in the paper.

Brian Davison said...

Anonymous said: "This is probably because the PC is so large and the topics so broad, there is a high probability that a paper is refereed by someone who has no clue about the topic in the paper."

While conference organization is nicely divorced from the paper review and acceptance process, which means I can't offer any insight on this WSDM, I'm still surprised about the above statement. I would expect exactly the opposite -- modern conferences typically have an extensive paper reviewer-matching process, including both topic matching and explicit paper bidding. So to me, a large PC would help to make sure that qualified reviewers are available. It also means that papers likely received more reviews than in conferences in which program committee members have to review 8-12 papers.

Sylvain said...

I guess that the PC was overwhelmed by the number of submissions since this part of the CFP "Each paper will be reviewed by at least three regular PC members and one or two senior PC members" was not really respected (as far as I know, e.g. for the papers I was involved in).

Also, although I am new to the field of WSDM, I was surprised by the fact that reviews were very short (only a few lines).

Anonymous said...

Acceptance rates in top-tier data-mining, database, and networking conferences are all close to or below 20%. This is mainly because they get loads of junk papers, whereas in conferences like STOC/FOCS/SODA, people don't submit papers unless they know the work is quite good. So don't be afraid to see a 16-17% acceptance rate in WSDM -- if you have a nice idea, and have written your paper properly, your paper has a very good chance of getting in.

Unknown said...

I am wondering whether the acceptance criteria of WSDM are comparable to the acceptance criteria of top conferences like WWW, CIKM, and SIGIR?