Saturday, November 15, 2008

Technical Depth vs. Novelty vs. Potential Impact

Let's say you're the PC chair for a major (theory) conference, about to give instructions to the committee. The standard way to judge a paper in theory is primarily based on its technical depth, but there's certainly a push in the community (and, arguably, from our funders at the NSF) to consider other aspects, including novelty (could this start a new research direction, or give us a new way to look at things?) and potential impact (might people actually use these ideas?). How, exactly, should you instruct the PC to weight these various factors?

Conceivably, we could set up the reviews to have a score for each factor. For example, I'm on the PC for NSDI, a systems conference, and we have to give scores for Overall Merit, Technical Merit, Longevity (= how important this work will be over time), Novelty, and Writing (as if that score matters :) ). Personally, I don't like this, and I'm not intending to do it for STOC. It's more pain for me as a reviewer without, I think, giving meaningful information to the authors (instead of spending time trying to decide if a paper is a 2 or 3 in terms of novelty, let's give another comment in the review text!), and when it comes time to make the decisions, I'm not really sure what I'm supposed to be (Pareto-)optimizing in this multidimensional space.
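
To make the multidimensional worry concrete, here is a minimal Python sketch (my own illustration; the papers and score axes are made up, not NSDI data) of why "Pareto-optimizing" doesn't by itself tell you what to accept: with several score dimensions, you quickly get papers that are mutually incomparable, and some other rule still has to break the ties.

    # Hypothetical papers, each scored 1-5 on (technical merit, novelty, longevity).
    papers = {
        "A": (4, 2, 5),
        "B": (3, 5, 3),
        "C": (5, 3, 2),
        "D": (2, 2, 2),
    }

    def dominates(x, y):
        """True if x is at least as good as y on every axis and strictly better on one."""
        return all(a >= b for a, b in zip(x, y)) and any(a > b for a, b in zip(x, y))

    # A paper is Pareto-optimal if no other paper dominates it.
    frontier = [p for p, s in papers.items()
                if not any(dominates(t, s) for q, t in papers.items() if q != p)]
    print(frontier)  # ['A', 'B', 'C'] -- three incomparable "best" papers; the real decision remains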

I'm a big believer, for conferences, in the "simple" method, as I've said before -- papers just get a score from 1-5, under the following scheme:
1: Bottom 1/2 of submissions.
2: Top 1/2 but not top 1/3 of submissions.
3: Top 1/3 but not top 1/5 of submissions.
4: Top 1/5 but not top 1/10 of submissions.
5: Top 1/10 of submissions.
but that doesn't mean that reviewers shouldn't be using factors such as Longevity and Novelty, and even Writing, in deciding their overall score. So, as you're all finishing your submissions, now is your chance to make a suggestion -- how do you think the PC should weight these various factors?
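
Just to be concrete, here is a minimal sketch (my own illustration, not something the PC actually runs) of what the bucketed scheme amounts to: the only input is where a reviewer believes the paper falls among all the submissions.

    def simple_score(rank, num_submissions):
        """Map a paper's rank (1 = best) among all submissions to the 1-5 scale above."""
        frac = rank / num_submissions      # fraction of submissions ranked at or above this paper
        if frac <= 1/10: return 5          # top 1/10
        if frac <= 1/5:  return 4          # top 1/5 but not top 1/10
        if frac <= 1/3:  return 3          # top 1/3 but not top 1/5
        if frac <= 1/2:  return 2          # top 1/2 but not top 1/3
        return 1                           # bottom 1/2

    print(simple_score(30, 300))  # the 30th-best of 300 submissions sits right at the top 1/10: score 5

Of course, as Suresh points out in the comments, the catch is that an individual reviewer rarely knows a paper's rank in the full pool.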

15 comments:

Suresh Venkatasubramanian said...

This is possibly less of a problem at STOC/FOCS, but at SODA for example, the "simple" strategy you recommend is difficult to execute, because *while reading a single paper*, I have to make a judgement about its place in the entire ensemble. As a PC member, this is hard to do since I can't see even a large fraction of the papers, and as an external reviewer this is completely impossible.

This is also why I have a beef with the "Accept, weak accept, weak reject" categorization popular in the DB community (among others). These are relative notions that require the rater to have a global view.

The irony is that overall I think that this approach is the most truthful when it comes time to report back to the authors. That is to say, a paper is rejected or accepted not because of absolute externalities, but because of its relative position in a zero-sum game.

Anonymous said...

"instead of spending time trying to decide if a paper is a 2 or 3 in terms of novelty, let's give another comment in the review text!"

The issue with that is: how do you instruct the PC to add one more comment to the review text? "Give one more comment than you were originally going to"? Having the scores in addition to the comments forces PC members to do at least one more high-level review of a certain aspect of the paper.

Whether these scores are actually useful or not is an entirely separate question, but I don't think that the parenthetical remark you made is really a good argument against the separate scores.

Having the separate scores also helps encourage PC members to consider all those aspects (longevity, novelty, technical difficulty, etc.) instead of just their favorite aspect. Even if the scores themselves aren't useful, it may be useful to force committee members to go through the process of thinking about those separate aspects of each paper.

Anonymous said...

Btw, what is "technical depth"? The reason I'm asking is that I'm not so sure it's necessarily a good thing...

Anonymous said...

If the statement of the theorem is interesting/useful, it is better that the proof is *not* technically deep, if possible. But this is the age-old weblog back-and-forth: if it is very complicated, aka "technically deep", then it is more likely to get into STOC/FOCS than if you demonstrate that there is a simple proof.

In particular for algorithmic techniques, it is better if the technique is simple, since it is more likely to be used and therefore have "longevity". If it is too "technically deep", what if no one ever reads/uses the result?

Anonymous said...

I'm unhappy with technical depth as a reason for judging papers. The reason is that, when I prove a result, often my first proof is overly complicated; I try to simplify it as much as possible, because simpler proofs are often both easier to understand and more likely to be correct.

I don't want to name specific instances, but I've seen papers that were given high marks on technical depth only because the authors did not appear to be making any attempts at such simplifications. The actual proofs could have been written in a way that came across as simple and elegant, but instead seemed complicated and "deep".

I don't want to reward this kind of lazy author and punish the ones who work to make things clear.

Incidentally and off-topic, your support for OpenID signatures on comments seems to be broken.

D. Eppstein

Anonymous said...

I've been trying to stay out of this argument, because I think not only are the ``too deep/too shallow'' arguments ill-posed, but that the whole thing is really to a large extent an argument about hurt feelings over specific papers, disguised as an argument about abstract principles. I have had papers rejected for being too simple, and papers rejected for being too ``mathematical'', and it hurts about the same either way.

On the one hand, a hard proof of a statement is worth less than a simple proof of the same statement. On the other, a statement that takes five years to prove should be worth more than one that takes five minutes (usually). If you spent five years, it's tempting to think there is no five-minute proof, but sometimes you're wrong. The very difficult problem is to come up with the simplest presentations of the most important ideas. Sometimes that means a great simple idea with a clean simple argument; other times, it's a deep result that actually requires a sixty-page proof.

A few things to keep in mind when you're on program committees or just trying to evaluate them:
1. The system is imperfect, because people are imperfect. Incorrect decisions don't necessarily mean the system can benefit from being overhauled.
2. Program committees are evaluating papers, not ideas. How things are presented makes a difference.
3. The job of the PC is to pick a conference, not to judge quality. The decision is not ``Which papers are best?'' but ``Which papers, if given a spotlight, will best benefit the theory community?''

Coming from that angle, there are several types of papers that are ``too simple'' and several types that are ``too complicated''. The following list is of course partial and subjective.

``Too complicated''
1. The obfuscated paper. The author is either trying to bamboozle the committee into thinking the result is difficult or merely didn't spend much effort on clear explanation. In either case, the reader won't benefit from reading it, so why accept it?

2. Killing flies with machine guns. The authors are using heavy-duty mathematics to get a not-very-cutting-edge result. Great for a term paper, showing you understood the math, but not great research.

3. The intricate machine. There is a difficult and tedious process known for getting a certain kind of result. You spent a lot of time going through that process to get the next result in the sequence... Okay work, but since there is nothing the reader LEARNS except the final result, unless that is astonishing, there's really no reason for a committee to accept it.

``Too simple''
1. Old wine in new bottles. (Whoever made up English cliches was not a connoisseur...) From your statement, I thought this might be original, but after reading your proof, I realize it's isomorphic to known results or almost implicit in known results.

2. Assuming away the difficulty. There was a lemma you couldn't prove, so you changed the model or added an assumption until the rest of the proof went through.

3. Least publishable unit. This is one of a series of closely related papers that really could be a single paper. The delta over the previous paper is insignificant.

4. Ill-motivated or mismatched motivation. You introduced some interesting new ideas, but didn't clearly explain what the point was. Or there's an interesting story as motivation, but you didn't specify how the formal model captures the intuitive story. Since it's so original, you're unlikely to get scooped, so the PC can reject it and encourage you to resubmit with a clearer presentation.

5. Ambiguous presentation. Some interesting ideas, but you didn't actually spell out what terms mean. Some people have argued for such papers, and there certainly are examples of highly influential papers along these lines, but I personally worry. Ambiguous papers can block others' work on the subject, because they can't claim originality and they can't build on what you've done (since they don't know exactly what you've done). The USSR was ahead of the US in complexity, with one author defining ``brute force search'' problems. Then that guy ``proved'' that NP problems required ``brute force search''. This killed Levin's attempt to introduce NP-completeness and the P vs. NP question. Computational complexity in the USSR became a hopeless political tangle, and the area almost died.
(Please correct the details, someone who actually knows the full story.)

So my conclusion is that we need to judge each paper on whether the technical complexity is APPROPRIATE to
the overall paper, and how it affects the UTILITY of the paper for the reader. Often a paper that is either too complex or too simple can be improved with more work.

Russell Impagliazzo

Anonymous said...

Michael, I think you found the answer - just forward Russell Impagliazzo's comment above to all the PC members.
This is by far the most thoughtful note I have ever seen on this topic.

Boaz Barak

P.S. Of course there are no strict rules, and there are examples of great papers that managed to tick off many of Russell's "no-nos".

Michael Mitzenmacher said...

Boaz -- I'll certainly point the PC to Russell's comment. (Well, I'll point them to the whole discussion, but I agree that Russell's comment is particularly thoughtful and well written.)

Russell, I appreciate your taking the time to share your thoughts on the blog!

Anonymous said...

But, wait! After we apply Russell's filters, is there any paper left to accept?

Anonymous said...

Hmm, I know you are TCS people, but it seems like abdicating responsibility for the program to use an algorithm to measure the otherwise intangible factors. Isn't that why you have experts on the PC anyway?

Double-blind the submissions, sort them into tracks, rank the papers using a proportional voting system, and go. Use the experts to [implicitly] resolve the complex value relationships between technical depth, novelty, and impact. No?

Unknown said...

There appears to be some confusion in the discussion here--and perhaps sometimes in the reviewing process--between technical depth (novel and interesting / compelling / impactful mathematical structure) and technical difficulty.

To me it seems technical depth is unambiguously a positive feature while technical difficulty is debatable and certainly can be negative. Actually, I guess the best result is both deep and simple.

Anonymous said...

"To me it seems technical depth is unambiguously a positive feature"

All other things being equal, it is. However, at times it feels like it is a necessary condition at STOC/FOCS, in the sense that non-trivial-yet-not-too-technical solutions to problems of established pedigree seem to be passed over in favor of horrendously complicated 1/8th improvements to the approximation problem du jour.

Mikkel Thorup said...

EXPECTED RELEVANCE TO COMPUTING

In the discussions of values within TCS, I see a lot of T for Theory,
but less attention to the CS in terms of expected relevance to
actual computing, now or in the future. This evaluation factor is, of
course, as fuzzy as factors like mathematical depth and originality, yet
I think it should carry more weight in the evaluation than it currently does.

From my own experience, I have dealt with problems like k-way cut and
integer predecessor search. The latter problem is equivalent to IP-lookup
and is used billions, if not trillions, of times daily. Moreover,
it is a common subproblem in many efficient algorithms. By contrast,
I am not sure if anyone is really using the k-way cut problem for
anything. Both problems are nice and relevant to computing, but predecessor
search is heavily used while k-way cut is lightly used, and the difference
is many orders of magnitude.

It is good for TCS to have impact on heavily used computing problems, and
that suggests paying special attention to the details of these problems,
e.g., sometimes favouring a technique specialized for a single heavily used
problem over a "general technique" for a whole class of lightly used
problems.
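
As a rough illustration of what predecessor search asks (a toy sketch of mine, not the data structures actually used in routers or in the theory results): given a set of stored keys, a query x asks for the largest stored key that is at most x; longest-prefix IP lookup can be reduced to queries of exactly this form on interval endpoints derived from the routing prefixes.

    import bisect

    keys = sorted([3, 17, 42, 99])   # stored keys, e.g., endpoints derived from routing prefixes

    def predecessor(x):
        """Largest stored key <= x, or None if every stored key exceeds x."""
        i = bisect.bisect_right(keys, x)
        return keys[i - 1] if i > 0 else None

    print(predecessor(50))  # 42
    print(predecessor(2))   # None

The sorted-array binary search above is just the naive baseline; the theory is about doing substantially better than this for integer keys.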

Anonymous said...

To illustrate difficulty for difficulty's sake: a while back we got a review highlighting that our technique led to many new theorems of interest that were not known before, but that "on the bad side" none of the results had a sophisticated proof.

Shouldn't the fact that the technique makes some previously hard problems trivial be an argument in favor rather than against it?

Anonymous said...

Like Mikkel, I also favor giving more weight to the potential applications of theoretical results.

Let me offer a few other reasons why more attention ought to be paid to the CS part of TCS: (1) elegant solutions to applied problems tend to be useful in many settings, (2) deep analysis of practical problems leads to fundamental theory advances, and (3) it makes it easier to justify increased funding.