tag:blogger.com,1999:blog-8890204.post3059531502508462618..comments2024-06-14T02:45:02.104-04:00Comments on My Biased Coin: Text-book Algorithms at SODA (Guest Post, Mikkel Thorup)Michael Mitzenmacherhttp://www.blogger.com/profile/06738274256402616703noreply@blogger.comBlogger32125tag:blogger.com,1999:blog-8890204.post-17065312947642152872009-12-23T22:21:05.239-05:002009-12-23T22:21:05.239-05:00Instead of taking papers on this theme (which woul...<i>Instead of taking papers on this theme (which would, incidentally, be a great idea), perhaps the area could serve as the basis for a lighter afternoon entertainment session, providing cool stuff that one could take home and show students.<br /></i><br /><br />Let me play the contrarian here.<br /><br />Just to be sure, I like what you call textbook algorithms. The hashing example is a great one. I think such papers should be accepted to SODA, and we should as a community probably appreciate such works more than we do. <br /><br />That being said, I disagree with the idea of a separate "track" or afternoon session for this area. Where would the resources come from? Would you accept fewer papers in SODA to make room? Would you add even more tracks? <br /><br />Book proofs are also hard to publish, and we usually think more about them and try to use them to generalize. Book algorithms combined with experimental comparison with known techniques would likely be accepted by SODA in the current setting. I do not support giving special treatment to one area, likely at the expense of others.<br /><br />I think we need to make sure the PC has members who are qualified to judge such submissions (most theoreticians aren't). 
In the meantime, such submissions need to work a little harder and make compelling arguments and give empirical support to their claims of efficiency.Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-8890204.post-34004520700079360072009-12-23T09:36:51.610-05:002009-12-23T09:36:51.610-05:00There have been several comments/questions concern...There have been several comments/questions concerning the fastest universal<br />hashing for strings etc. <br /><br />Just for reference, in section 5<br />of the paper below, I survey what<br />I think are the fastest methods known on real computers. I do believe that they outperform most non-universal multiplication-free hacks,<br />both in speed and in quality, e.g.,<br />recently I have used them to replace<br />FNV in applications. Of course, they cannot be as fast as the most naive hacks, e.g., just using the least significant bits.<br /><br />@INPROCEEDINGS{Tho09,<br /> AUTHOR = {Mikkel Thorup}, <br /> TITLE = {String hashing for linear probing},<br /> BOOKTITLE = {Proceedings of the 20th ACM-SIAM Symposium<br /> on Discrete Algorithms (SODA)}, <br /> YEAR = {2009},<br /> PAGES = {655--664},<br />}Mikkel Thoruphttps://www.blogger.com/profile/10495805784088145688noreply@blogger.comtag:blogger.com,1999:blog-8890204.post-71058794912949043882009-12-22T16:59:03.751-05:002009-12-22T16:59:03.751-05:00It sounds to me like what the SODA community needs...It sounds to me like what the SODA community needs is something analogous to Richard Bird's "Functional Pearls", which favour nifty presentations and explicitly discourage the "5 to 10 pages of complications". 
(Declaration of interest: I now <a href="http://www.comlab.ox.ac.uk/people/Jeremy.Gibbons/pearls/" rel="nofollow">edit this column</a>.)Jeremy Gibbonshttps://www.blogger.com/profile/03945885134870183516noreply@blogger.comtag:blogger.com,1999:blog-8890204.post-85641617809507556652009-12-22T02:09:27.978-05:002009-12-22T02:09:27.978-05:00The fastest 2 universal hashing algorithm appeared...The fastest 2 universal hashing algorithm appeared in stoc couple of years back: it runs in linear time. It is simple and elegant.... Though the ideas are nontrivial and use spielman codes. See the work of Ishai at alAnonymousnoreply@blogger.comtag:blogger.com,1999:blog-8890204.post-30351048108807450792009-12-21T11:36:38.835-05:002009-12-21T11:36:38.835-05:00Three points:
0) I think it would be great if we
...Three points:<br /><br />0) I think it would be great if we<br />had, for text-book algorithms,<br />something like the Math Monthly in CS, going out cheaply to a wide audience (IPL is expensive and<br />commercial). Perhaps a good role for <br />Comm. ACM.<br /><br />1) I do not think it is in our interest if the best general text-book algorithms end up in applied conferences. Applied conferences <br />are even less likely to accept something<br />simple and elegant. They prefer to have it tied up in a system, and then it will be even harder to discover that this is a general-purpose algorithm that goes beyond the limits of the applied conference.<br /><br />2) I do think theory of CS can be a bit too tied to certain traditional theoretical measures. One of the points<br />with the hashing scheme [DHKP97] is that<br />it gains a large constant factor in any reasonably realistic mathematical measure, e.g., one taking into account that<br />w-bit multiplication is mod 2^w (discards overflow). This is not an experimental observation, but something that follows from the definition of standard programming languages like C. I do realize that the theory community does not care too much about details of the running times these days, but the hashing I mention is something of major impact, e.g., in the processing of high-volume streams where the info is lost if not handled in time.Mikkel Thoruphttps://www.blogger.com/profile/10495805784088145688noreply@blogger.comtag:blogger.com,1999:blog-8890204.post-65132574411878229462009-12-20T21:02:02.065-05:002009-12-20T21:02:02.065-05:00Dan Spielman wrote:
I would love to see more examp...Dan Spielman wrote:<br /><i>I would love to see more examples of textbook algorithms. If you have more in mind, please post them!</i><br /><br />Here are two examples of papers (of mine) that aren't nearly as useful as that hashing trick but have a little text-book flavor in the naturalness of the problems and simplicity of solutions (in my biased opinion).<br /><br /><a href="http://www.cs.brown.edu/~ws/papers/maxcut.pdf" rel="nofollow">Yet another algorithm for dense max cut: go greedy</a><br />* simple algorithm<br />* implementable, except that constants are universe-sized if you want a theoretical guarantee<br />* problem natural and interesting to theoreticians but not to practitioners<br />* main contribution a simple idea, but with enough complications to make it look hard<br />* previous work got the same results in more complicated ways<br />* SODA accept<br /><br /><a href="http://www.cs.brown.edu/~ws/papers/scc.pdf" rel="nofollow">Finding Strongly Connected Components in Parallel using O(log^2 n) Reachability Queries</a><br />* simple algorithm<br />* very implementable and amenable to heuristic improvements<br />* problem of minor interest to practitioners<br />* analysis quite simple (important parts about 1 page), combining existing techniques in new but non-surprising ways<br />* previous work includes some quite famous people who would have scooped me if they had found my result obvious in foresight<br />* STOC reject, SPAA accept<br /><br />Dan Spielman wrote:<br /><i>As for getting them into SODA, I think we have a bit of a chicken-and-egg problem. Because they don't tend to appear in SODA, it is harder for a reviewer to become familiar with the state of the art. 
This makes it more difficult to make confident reviews.</i><br /><br />Another chicken-and-egg problem: with the current status quo, the only text-book algorithms that are likely to be submitted to STOC/FOCS/SODA are those that the relevant practitioner communities are not inclined to accept at their own conferences. It's even harder for the reviewers to select the best text-book algorithms if the best aren't even submitted!Unknownhttps://www.blogger.com/profile/01106301822827737278noreply@blogger.comtag:blogger.com,1999:blog-8890204.post-68671839796223502432009-12-20T20:06:15.944-05:002009-12-20T20:06:15.944-05:00I know it's just loosely related to the main t...I know it's just loosely related to the main topic of the discussion:<br /><br />Michael wrote:<br /><br />"I don't see why impact is harder to judge than "quality" for other theory papers."<br /><br />In the UK, there is a very interesting discussion about quality vs. impact (usually, in the sense of economic impact, though). See, for example, http://www.csc.liv.ac.uk/~leslie/impact/impact.html<br /><br />Most people in the UK I know wouldn't agree with your claim: it's very difficult to assess "the impact".
Simple algorithm for approximating max independent set: use the leaves of a depth-first search tree. Simple algorithm for approximating TSP when all distances are 1 or 2: use a depth-first ordering on the graph of distance-1 pairs. Although both problems are hard to approximate, one of these two is always a good approximation for any particular graph; the paper contains several similar results. The paper also does contain the requisite "5-10 pages of complications" that, as Mikkel observes, seem to be necessary to get something like this into a good theory conference.<br /><br />If an algorithm is simple and useful, but not theoretically novel (we already know an impractical solution to universal hashing, so why do we need another, more practical one?), then it needs more than theory to be published on its own: algorithm experiments showing it to be much better than the alternative, for instance, leading to a paper at ALENEX. This post seems to be arguing that the difficulty of publishing a purely theoretical algorithms paper for this sort of improvement is a systemic problem, but I'm not entirely convinced: if we want to argue that an algorithm is better than prior alternatives despite not giving us any new theoretical results, shouldn't there be some justification for that claim?Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-8890204.post-25377871671571190422009-12-20T14:24:28.555-05:002009-12-20T14:24:28.555-05:00Notation overload is common in both math and CS pa...
Imagine if TCS couldn't deal with the difference between the definitions of "source code" among programmers and information theorists!Unknownhttps://www.blogger.com/profile/14749446395269735704noreply@blogger.comtag:blogger.com,1999:blog-8890204.post-7661623582100871182009-12-20T13:54:26.137-05:002009-12-20T13:54:26.137-05:00My main suggestion was that deriving good text boo...My main suggestion was that deriving good text book algorithms is<br />something that we should be competitive about, giving the best<br />contributions the same kind of conference credit that we do for other<br />important algorithmic contributions. They should lead to great talks.<br /><br />Authors of text books should be good at collecting and presenting the<br />best publicly known algorithms, but we cannot expect them to come up<br />with ideas for new, more elegant algorithms than those already known.<br /><br />Currently, we don't really know what to do with a cool, simple text book algorithm giving a really useful solution to an important<br />problem. It may just be discussed locally as folklore, and never<br />come out in public, or if it does, it may be well-hidden as part of<br />a solution to a different problem. Without proper published records,<br />it is hard to claim that one has a new, better text book solution.<br /><br />I don't think evaluation is a principal problem. At STOC/FOCS we<br />already handle apples and oranges, e.g., algorithms and crypto. Math<br />has a long tradition of appreciating simpler proofs, and I think we<br />can all appreciate a really nice text book solution. 
If the abstract<br />uses the keyword "text book (style)", then reviewers should stop<br />looking for complexity and instead judge based on simplicity and<br />elegance, ease of implementation, etc.<br />Needless to say, we still want <br />a clear new angle on the problem considered, as in my hashing example, where [DHKP97] find an elegant use of relative primality.Mikkel Thoruphttps://www.blogger.com/profile/10495805784088145688noreply@blogger.comtag:blogger.com,1999:blog-8890204.post-66644192409659601302009-12-20T12:17:20.995-05:002009-12-20T12:17:20.995-05:00Perhaps there could be a special track for papers ...Perhaps there could be a special track for papers on textbook algorithms.rgrighttps://www.blogger.com/profile/02991214367108471744noreply@blogger.comtag:blogger.com,1999:blog-8890204.post-36568455428618848082009-12-20T11:38:04.385-05:002009-12-20T11:38:04.385-05:00I would love to see more examples of textbook algo...I would love to see more examples of textbook algorithms. If you have more in mind, please post them!<br /><br />As for getting them into SODA, I think we have a bit of a chicken-and-egg problem. Because they don't tend to appear in SODA, it is harder for a reviewer to become familiar with the state of the art. This makes it more difficult to make confident reviews. <br /><br />It also makes it harder for the next textbook author to become aware of these algorithms!<br /><br />Perhaps someone should start a wiki or blog about textbook algorithms?Dan Spielmanhttps://www.blogger.com/profile/02430505190600344820noreply@blogger.comtag:blogger.com,1999:blog-8890204.post-80035058802214756892009-12-20T00:12:34.250-05:002009-12-20T00:12:34.250-05:00Hi Adam. 
I'm glad you agree it would be a fun thing to try!<br /><br />I do wish to still register my disagreement with this paragraph:<br /><br /><i> The task of the SODA PC (note: I've also served on one of those) is to evaluate a paper's contribution to our broad algorithmic knowledge base. The notion of impact is much less well-defined, and this is why I think it is likely to be more subjective. </i><br /><br />Sentence 1: I think the type of thing Mikkel is talking about would certainly count as a "contribution to our broad algorithmic knowledge base". I don't think he's talking about programming tricks; he's talking about connecting the theory to implementation. I believe a SODA PC could/should judge these types of papers. Some efforts, though, would have to be made; as I stated in my earlier comment, I'm willing to cede that theorists have, as a community, moved so far from implementation that many theorists would not be suitable to review such papers. [I should point out that I think that's a problem, one we as a community should be trying to fix...] At the same time, I feel underqualified when having to, for example, review a quantum computing paper for FOCS/STOC; that doesn't mean such papers shouldn't be submitted or accepted when I am on the PC, or even that I can't review them!<br /><br />Sentence 2: I don't see why impact is harder to judge than "quality" for other theory papers. I review a lot of papers; I see papers with proofs as valid as (and sometimes more valid than) those in papers in FOCS/STOC -- papers that I nonetheless think should be rejected from LATIN. What separates one class of papers from another? My (or, rather, the community's) subjective notion of what's "interesting", "quality" work. <br /><br />Impact can be demonstrated through actual implementations and deployment (in which case I think such judgments will not be subjective), or potential impact can be gauged by suitable reviewers -- of which there are many available. 
Yes, there will be some subjective judgment in choosing papers for potential rather than demonstrated impact -- but such subjective judgments are made all the time, even for purely theoretical papers.Michael Mitzenmacherhttps://www.blogger.com/profile/02161161032642563814noreply@blogger.comtag:blogger.com,1999:blog-8890204.post-77671199971452059832009-12-19T19:59:50.228-05:002009-12-19T19:59:50.228-05:00Answering Michael M. (NB: threading comments would...Answering Michael M. (NB: threading comments would be helpful here):<br /><br />I understand that systems-oriented communities evaluate impact all the time and fairly reliably. (I have seen this in action on various "applied" security PCs and panels.) However, they do it in an application-specific context. And when they referee papers that step a little outside their specialty, they tend to be much less reliable (I am thinking, for example, of papers on privacy at database, datamining and security conferences). <br /><br />The task of the SODA PC (note: I've also served on one of those) is to evaluate a paper's contribution to our broad algorithmic knowledge base. The notion of impact is much less well-defined, and this is why I think it is likely to be more subjective.<br /><br />All that said, it would be great to do the experiment and try it at some upcoming SODA.<br /><br />PS: As for x mod y: I encountered this notation overload in my first abstract algebra class in college. The professor explained briefly the differences between the two widespread uses of the notation, and from then on I always was able to deduce, from context, which meaning of "mod" was intended. Notation overload is common in both math and CS papers. Sometimes it's bad, sometimes it's good. 
I don't think it has much to do with the relative importance of mathematical depth in different fields.Adam Smithhttp://www.cse.psu.edu/~asmithnoreply@blogger.comtag:blogger.com,1999:blog-8890204.post-84374084457877658492009-12-19T19:44:46.058-05:002009-12-19T19:44:46.058-05:00I'd rather see discussion of Mikkel's poin...<i><br />I'd rather see discussion of Mikkel's points here; feel free to continue this arcane and inane discussion <br /></i><br /><br />This discussion is not arcane and might in fact be pertinent to the point in the original post.<br /><br />Clearly, there are two views about TCS. From one point of view, algorithms are like any other mathematical objects, and proving bounds on their complexity is a natural mathematical endeavor (akin to proving uniform bounds in any other area of mathematics). From this point of view, questions of "practicality" are really outside the domain of TCS, best left to the "practitioners". To people who subscribe to this point of view, writing "a mod b" when one means "rem(a,b)" is a horror of horrors. <br /><br />A second point of view of algorithms (one that perhaps the original poster subscribes to) holds practicality as the more important goal. From this point of view, mathematical rigor (and its associated baggage of culture and discipline etc.) fades into the background, and such basic questions about notation become moot.Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-8890204.post-42046170463252171052009-12-19T19:20:12.982-05:002009-12-19T19:20:12.982-05:00but TCS is not a subfield of number theory -- and ...<i><br />but TCS is not a subfield of number theory -- and hence has its own style and variants of notation.<br /></i><br /><br />TCS is certainly not a subfield of number theory, but it is a mathematical field. For instance, ECCC, the principal repository of papers of TCS, requires that the papers it accepts have ...<br />" ... 
clear mathematical profile and<br /> strictly mathematical format."<br /><br />As a mathematical discipline, TCS should stick to commonly accepted mathematical formalism in its papers -- deviating from these principles usually results in controversies such as Neal Koblitz's well-publicized and justified criticism of TCS-style cryptography.Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-8890204.post-9253238067102189312009-12-19T18:51:09.416-05:002009-12-19T18:51:09.416-05:00I'd rather see discussion of Mikkel's poin...I'd rather see discussion of Mikkel's points here; feel free to continue this arcane and inane discussion about x mod y <a href="http://jonkatz.wordpress.com/2009/12/19/x-mod-y-is-now-controversial/" rel="nofollow">elsewhere</a>.Jonathan Katzhttps://www.blogger.com/profile/07362776979218585818noreply@blogger.comtag:blogger.com,1999:blog-8890204.post-25290530375729209162009-12-19T18:28:40.877-05:002009-12-19T18:28:40.877-05:00"This careless attitude towards mathematical ..."This careless attitude towards mathematical language and notation is a persistent problem in theoretical CS literature (probably due to a lack of proper training in mathematical writing). However, unless such levity with mathematical notation is gotten rid of, it is hard to take such papers seriously."<br /><br />Really?? This, for a notation for x mod y? It amazes me how easy you find it to belittle the mathematical maturity/training of an entire area over the somewhat different notation it uses for such a simple concept, a concept that everyone understood back in high school. Admittedly, the notation is different from what is common in number theory, but TCS is not a subfield of number theory -- and hence has its own style and variants of notation.Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-8890204.post-26786111427929038002009-12-19T14:00:18.083-05:002009-12-19T14:00:18.083-05:00x mod y
(where y is a positive integer), written ...<i><br />x mod y<br /><br />(where y is a positive integer), written in LaTeX using \bmod, is the non-negative integer that is less than y and is congruent to x (mod y).<br /></i><br /><br />This is a very bad and confusing notation that needs to be avoided. It involves an arbitrary choice of representative from a congruence class. Normally, there is no sensible choice of such a representative (for example, when one "mods" out by a vector subspace in a vector space). Moreover, it conflicts with the standard definition of "modulo" as the equivalence relation induced by quotienting by a sub-object.<br /><br />In the number-theoretic context, or in the case of rings of polynomials in one variable or other Euclidean domains, the notation rem(a,b) -- the remainder of a after dividing by b -- is well defined (up to multiplication by units) and should be used instead. <br /><br />Contrary to what some other commentators have said, this is not a matter of mathematical pedantry at all. Language is a very (most?) important aspect of mathematics, and should be respected as such. Choice of the right notation often makes complicated statements look completely obvious -- the ultimate goal of mathematics is to make the "unobvious look obvious", and a huge part of that is the right choice of notation.Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-8890204.post-7999757900177748302009-12-19T11:47:50.231-05:002009-12-19T11:47:50.231-05:00As someone who's spent far more time on comput...As someone who's spent far more time on computer program efficiency than on theoretical computer science, that example comes off to me as pretty cool (though I'm still wondering what the "(*)" is doing there, and a bit distressed that a computer scientist can be so theoretical that he or she forgets what "mod" means in an algorithm as opposed to in number theory). 
It also explains a lot about my publication record: My advisor wrote a defining textbook, renowned for the brevity of its proofs and explanations. But his favorite result of mine, something short and sweet, was rejected by referees. I had to eventually just shoehorn it into a somewhat related paper. Your point makes the case for accepting short papers as a special case. If it were clear what was being asked for and if the acceptance rate were low enough, getting one of them in could be even more prestigious than having a full paper. (Eventually I could envision the following process: Prepare the short result; if that gets rejected as too "uncomplicated," then add to it for the next conference.) But first you have to make room. Sadly, some forums (e.g., IEEE Transactions on Information Theory) have done just the opposite.Unknownhttps://www.blogger.com/profile/14749446395269735704noreply@blogger.comtag:blogger.com,1999:blog-8890204.post-81743629423918144422009-12-19T08:21:15.630-05:002009-12-19T08:21:15.630-05:00Here is my eurocent: The background knowledge of t...Here is my eurocent: The background knowledge of the reader determines what is complicated and what is not. With space constraints, authors must judge how much background to include. Even without space constraints, you run the risk of boring your readers if you include too much background.<br /><br />How hard it is to understand a paper is mainly a function of the overlap between the background knowledge of the writer and that of the reader and secondarily a function of the author's writing skills. It is <i>not</i> a function of the value of the idea being presented.<br /><br />A good paper should be <i>easy to understand</i> and <i>useful</i>. Of course, the trick is that "useful" means different things to different people: can be put in a program, reveals unexpected theoretical connections, etc.<br /><br />Now let me tell you a secret. 
I wrote the above because I felt obliged to contribute to the main subject to earn the right to ask my silly(?) question.<br /><br />My TAoCP copy recommends the hashing scheme <i>h_b(x) = ((bx) mod 2^w) div 2^(w-u)</i>, with <i>b</i> being an odd number close to 2^w/1.62. (The golden ratio is supposed to make it work well for keys in arithmetic progression. And then he says how, for hashing short strings, something different from the golden ratio works better.) Knuth credits Floyd for 'most of these ideas'. So the main observation of [DHKP97] is that random (odd) <i>b</i>s give a universal <i>set</i> of hash functions. Right?<br /><br />PS: The complaint about 'mod' not being a binary operator perfectly illustrated the background mismatch I mentioned earlier. I know I'm evil, but I found it hilarious.rgrighttps://www.blogger.com/profile/02991214367108471744noreply@blogger.comtag:blogger.com,1999:blog-8890204.post-83773576500236188282009-12-18T23:48:34.634-05:002009-12-18T23:48:34.634-05:00Adam,
I think I disagree with your final point:
...Adam,<br /><br />I think I disagree with your final point:<br /><br />"How do we measure impact reliably? Whatever the answer is, it will probably look very different from (and, my gut suggests, more subjective than) the criteria that are currently the norm in theory."<br /><br />Perhaps it is my experience in networking/systems conferences, but I think that reasonable PCs can perfectly well assess how interesting an algorithm implementation idea is. In systems conferences, a major component of what they do is determine which implementations they think are interesting and will have the most impact, in terms of the actual implementation or the underlying ideas. I'd agree some of the criteria are different than those used in judging theory papers, and I'd agree that the theory community, as a whole, has not exercised this type of judgment sufficiently in, say, the last two decades and would have to "re-train those muscles," so to speak. But I think that impact (realized or potential) can be adequately judged by people who work on algorithms "in the real world", and their judgments would likely be no more (and possibly would be less) subjective than those used to decide what papers get into theory conferences currently. <br /><br />The main judgment, I think, would be, "Would I or someone I know potentially find this useful?" People who code and deploy algorithms regularly will, I think, have good judgment about what sort of things pass this test. 
Of course, good papers will attempt to demonstrate that their algorithm is useful in real-world contexts by explaining where it could be used or actually showing its performance in real applications.Michael Mitzenmacherhttps://www.blogger.com/profile/02161161032642563814noreply@blogger.comtag:blogger.com,1999:blog-8890204.post-30447304479550050512009-12-18T23:34:28.301-05:002009-12-18T23:34:28.301-05:00Anon #4: It's pretty clear that Mikkel has us...Anon #4: It's pretty clear that Mikkel has used "math terminology" to contrast with the line later in his text that he labeled "C code" -- the point being that while the two equations look quite similar at the mathematical level, they're quite different at the code implementation level. And yes, he's using mod as a binary operator here. <br /><br />(Still, always nice to have some mathematical pedant point out that mod is an equivalence relation, so how can we computer scientists write things like a = 1 mod p; I get one in my algorithms class most every year, too!) <br /><br />I do hope, as DE suggests, we go back to comments on the actual theme of Mikkel's post, which relates to why there are so few algorithms-as-I-like-to-define-them (that is, algorithms that people might actually use in running programs) at a conference like SODA, and whether perhaps that could be changed.Michael Mitzenmacherhttps://www.blogger.com/profile/02161161032642563814noreply@blogger.comtag:blogger.com,1999:blog-8890204.post-50410243821189927532009-12-18T23:31:49.620-05:002009-12-18T23:31:49.620-05:00To get back on to the topic of the post:
I think ...To get back on to the topic of the post:<br /><br />I think one of the reasons that simple ideas (algorithms, models, whatever) have trouble getting accepted to conferences is that it is much harder to referee them; their value is tricky to assess. <br /><br />Some reasons:<br /><br />1) It is difficult to assess the novelty of something that is obvious in retrospect. (What if it was folklore? What if it was an idea that was "in the air" but not really folklore yet?) It is remarkably easy to convince yourself that you basically knew something but had never bothered to flesh out the idea... unless fleshing out the idea obviously took a month of time from someone talented and technically well-versed in the details of a field.<br /><br />2) There are lots of clean, nontrivial implementation ideas that aren't all that important, since they are not the bottleneck in any crucial applications. <br /><br />In most communities, simple nontrivial ideas can get attention as long as they change the state of the art for some important problem. For example, if last year you had come up with a simple trick that halved the error rate of prediction algorithms for the Netflix prize, you would have gotten $1M. <br />And similarly, simple ideas that solve important open problems in theory (think Moser's STOC best paper) get well-deserved praise. <br /><br />I like the idea of an afternoon at SODA devoted to the type of work described in Mikkel's post. IPL is also a good journal venue for them. <br /><br />My question is: if we want major *theory* conferences to reward this type of work with publication in the main track, how should we evaluate the submissions? <br /><br />Again: improving a widely deployed algorithm (like universal hashing) is pretty clearly a good thing; but what if I improve my own algorithm, which is not yet widely deployed? How do we measure impact reliably? 
Whatever the answer is, it will probably look very different from (and, my gut suggests, more subjective than) the criteria that are currently the norm in theory.Adam Smithhttp://www.cse.psu.edu/~asmithnoreply@blogger.comtag:blogger.com,1999:blog-8890204.post-4018783228699954372009-12-18T20:48:42.957-05:002009-12-18T20:48:42.957-05:00On the off-topic thread: it is clear that C gets m...On the off-topic thread: it is clear that C gets modular arithmetic wrong, but it is not entirely clear what the correct answer is in the unusual case that the modulus is negative. For example, should 5 mod -3 equal 2 or -1?<br /><br />See http://portal.acm.org/citation.cfm?id=128862Unknownhttps://www.blogger.com/profile/01106301822827737278noreply@blogger.com