Saturday, October 03, 2009

"Core" TCS

Enough time has perhaps passed from Mihai's controversial post to consider, constructively I hope, some of the comments that arose here on this blog from it.

One issue that arose is what a PhD in theoretical computer science should know -- what's the "core" of theoretical computer science? The issue arose as some anonymous commenter claimed to have a PhD in TCS but not know of Voronoi diagrams, range counting, etc. after some other commenter claimed that these were topics one learned as an undergraduate. For the record, I think it's the rare undergraduate that learns about these data structures/algorithms; I learned about them in graduate school, and only because I took the computational geometry course (from the fantastic Raimund Seidel, with the occasional guest lecture from fellow-graduate-student Ernie Jeff Erickson).

As TCS expands, so that it's harder and harder to have exposure to everything as a graduate student, the question of what is the TCS core will become more challenging. The same problem, of course, happens at "all levels of the tree" -- what is the core knowledge that a PhD (or undergraduate) in CS should learn across all areas of CS, or the core for an undergraduate in a liberal arts college? Anyone who has served on a faculty committee to deal with this issue knows that this is a challenge -- what is "core" is usually defined as the area the person on the committee is working on. (From my point of view, how could a PhD in TCS not know Azuma's inequality, the fundamentals of entropy, and Bloom filters?... But I am sure there are many that don't.) Arguably, TCS has been small enough until fairly recently that one could describe a core that most everyone knew most of, but I think that's increasingly less true. (I imagine it's been true of mathematics for some time.)

In any case, I think the people who expressed disbelief and dismay that a PhD in theory might not know a fair number of this list of things ("Range counting, predecessor, voronoi diagrams, dynamic optimality, planar point location, nearest neighbor, partial sums, connectivity.") should consider that they might have overreacted -- I believe most PhDs in TCS learn them, but I don't think it's by any means close to mandatory.

This leaves two several questions for comments:

1) What should be the "core" of TCS that (almost) all PhDs should be expected to know? This is going to be a moving target, of course, but what topics would you place on it now? [It's not clear to me whether one would want to put specific examples or thematic concepts in the list -- for example, would you put "Azuma's inequality" or simply "probability tail bounds" -- feel free to suggest both.]

2) How do we enforce that this core gets learned? I find increasingly PhDs expect to get right to research, and view classes as a hindrance rather than an opportunity. I have long found this problematic. I personally find that courses are a great way to inculcate core material. After all, it's because of that course in graduate school that I learned about Voronoi diagrams, and they've proven useful enough that they appeared in a paper I co-authored.


Anonymous said...

("Range counting, predecessor, voronoi diagrams, dynamic optimality, planar point location, nearest neighbor, partial sums, connectivity."

I believe the conclusions drawn from the comments were a little unfair, because of unfortunate terminology. I am a PhD in AI, and I have not heard of "dynamic optimality", and "predecessor search". However, I have heard of splay trees and heaps and binary search trees -- all these were taught in my undergrad data structures class. It is quite possible that this is also the case with many of the commenters in your previous post.

Michael Mitzenmacher said...

Anon 1: I'd agree that heaps and binary search trees are part of the the standard undergrad curriculum, and that splay trees are probably taught a fair amount. I think, though, this is in fact different than knowing about the predecessor problem as a problem in itself and the other approaches to solving it. Which interpretation various commenters were using I leave to your judgment.

It doesn't change the issue raised in the post, though, which is what we think is in the TCS core.

Suresh Venkatasubramanian said...

Have to be careful here: not all universities have the wherewithal to teach everything you'd need in a TCS core. For example, I mix and match seminars and coursework to try and get some coverage of material, but it's hard. Are my students not true theory Ph.Ds ?

having said that, a clear core in my mind is (beyond the basics):

randomized methods, advanced data structures, approximations (LPs etc), complexity theory, geometry, combinatorial optimization, graph alorithms.

this is generated from the principle that the core should be sufficient for you to pick up and read the first part of any STOC/FOCS/SODA paper, and at least understand the problem, and possibly its relative importance.

The current core would not satisfy this for topics like AGT and quantum computing, so I think they're closer to the core now than they would have been maybe 5 years ago.

Anonymous said...

The issue wasn't so much that there are people who haven't heard of problems like predecessor. One commenter was actually proud of not knowing it.

Somebody for some odd reason said "at least i know what connectivity is.." and the sarcastic response was "I envy you. You can now read Mihai's papers". Somehow connectivity = Mihai. Replace connectivity with bloom filters and Mihai with Michael.

Another idiotic comment was about how to gain the respect of the general math community. Sadly, we have some people who consider themselves a part of the tcs community, yet have no respect for the discipline itself. When told 'why be a second rate mathematician when you can be a first rate computer scientist', this person responded he can be a first rate mathematician who does tcs.

This is not unique to CS. It seems the mathematician wanna-be syndrome can also easily be seen in Econ departments for instance.

What is it about mathematicians that makes people desperately envy them?

Also, why don't these mathematicians wanna-bes just go to math departments?

Anonymous said...

"Range counting, predecessor, voronoi diagrams, dynamic optimality, planar point location, nearest neighbor, partial sums, connectivity."

Predecessor is a funny example. Most undergrad algorithms courses i've seen do not cover it. I think this is absurd given how fundamental the problem is.

Does anybody know why it isn't standard material?

As for the other problems, perhaps with the exception of dynamic optimality, they're quite easy to state. People should at least know what the problem is.

Michaël Cadilhac said...

For the record, here is how the CS department of the Universite de Montreal (Quebec, Canada) makes sure the PhD students have a certain (and common) basis in (T)CS. I'm pretty sure only a few universities across the world do the things this way. Please note that I'm just stating facts and not trying to make a point.

The requirement is the following: after some time in the PhD (usually between one year and two), the student is expected to take 3 exams and give one oral presentation on his research. Those exams, called the pre-doctoral exams, focuses on algorithmics, data structures and TCS. The first two are based on undergraduate classes, and the last one is specific to the Theory lab.

Here is the interesting fact: the TCS exam only focuses on complexity, computability and language theory. There, you got it.

Being myself a PhD student in this university, but coming from a different country, I've had lengthy discussions about the purpose of these exams with the professors in charge. The current debate in the department is on whether or not they should only require the credits from the corresponding undergraduate classes and cancel the exams. They are not considering changing the contents of the required knowledge.

In addition to that, the students are expected to take 3 graduate classes, and any will do (I personally took Advanced Complexity, Introduction to Quantum Computing, Semantics of Programming Languages and plan to take an advanced class in graph).

For the record again, I don't (or rather, didn't) know about Voronoi Diagrams, and my research is focusing on small complexity classes, descriptive complexity and automata.

The question is, should I know at least a little about each chapter of, say, the Handbook of TCS (editor
Jan van Leeuwen) ? I do think I know about most of those chapters, but admittedly nothing about computational geometry. How would I know this is a great miss?

JeffE said...

What should be the "core" of TCS that (almost) all PhDs should be expected to know?

(I assume you mean "all theoretical computer science PhDs" here. I'm somewhat conflicted about what theory other CS PhDs should know.)

I'm sure my answer will be horribly biased, but at the very least, it should touch on Voronoi diagrams, Van Emde Boas trees, the dynamic optimality conjecture, Vapnik-Chernovenkis dimension, the Immerman–Szelepcsényi theorem, the Lovasz Local Lemma, spectral partitioning, Bloom filters, boosting, Arrow’s paradox, Nash equilibria, Azuma's inequality, zero-knowledge proofs, the PCP theorem, smoothed analysis, coresets, Reed-Solomon codes, distributed cache consistency protocols, interior-point methods, persistent homology, AKS sorting networks, monadic second-order logic, snake sort, evasive graph properties, the Okamura-Seymour theorem, natural proofs, Ramsey theory, fractional cascading, communication complexity, doubling dimension, derandomization, soft heaps, the power of two choices, the Graph Minor Theorem, Krylov subspace iteration, Euclid's algorithm, Dijkstra's algorithm, Boruvka's algorithm, Dehn's algorithm, Grover's algorithm, Buchberger's algorithm, de Casteljau's algorithm, Khachiyan's algorithm, Bresenham's algorithm, Dinits's algorithm, Lloyd's algorithm, and how to quickly look up lots of impressive-sounding results on Wikipedia.

Paul Beame said...

I agree with Jeff -- including his Wikipedia comment. There are so many worthwhile concepts and results that it is a bit absurd to think that there this is some small "core" that all TCS PhDs can pick up in just a few courses. (Suresh's list is just a start and even it wouldn't fit in a few courses.)

It is true that because TCS problems are often easy to state, the barriers to entry for many TCS topics is small, but ability to solve some TCS problems doesn't mean that one has a clue about most of the rest of the field.

As for the specifics of the data structure problems mentioned: Typical undergraduate data structure courses have a fairly outdated selection of topics and give short shrift to geometric data structures. (Do the texts really need to discuss all of: binary heaps, d-ary heaps, leftist heaps, binomial heaps, and Fibonacci heaps at the expense of these other topics?)

A lot of the discussion in this thread has come down to questions of research taste. I would count the complexity of nearest neighbor data structures in high dimensions as among the tastiest specific open problems in any subarea of TCS. There are two obvious algorithms for the problem: (1) Compare the query point to each stored data point in turn. (2) Precompute and store the answers to all possible queries (in case the space is discrete). Each has terrible complexity in either space or time. Can one beat this or show that it is optimal? A small-space and fast solution would have remarkable consequences in both theory and in practice. The best lower bounds are exponentially weaker than the best known algorithms.

js said...

My recommendation would be this:

(1) Just take a random, diverse set of courses in TCS and math.

(2) Work with other people who have a different background.

This way you are much stronger than a bunch of people who know the "core" (whatevere it means) and little else.

By the way, usually all this happens automatically: even if you tried to carefully plan which courses to take, you will end up taking courses which happen to be offered in the right place at the right time. And if you are working with anyone, especially someone from a different institute, it is likely that they have learned a different subset of things.

Anonymous said...

There is no real advanced core TCS topics one must know. TCS is still too young. For the undergrad there is just the acquaintance with basic concepts and examples like P, NP, SAT, data-structures, some simple algorithms, some first-order logic, and the likes.

And for commenter #4, your arguments are obviously ad-hominem and therefore invalid.
It is immaterial to the question what are the psychological factors motivating some scientist. What matter are the scientific justifications themselves. And I cannot see any real conclusive argument not to consider TCS as a mathematical subject, in contrast to an engineering/natural-science/social-science subject.

Alex Lopez-Ortiz said...

Typical undergraduate data structure courses have a fairly outdated selection of topics and give short shrift to geometric data structures. (Do the texts really need to discuss all of: binary heaps, d-ary heaps, leftist heaps, binomial heaps, and Fibonacci heaps at the expense of these other topics?)

At Waterloo we revised the contents of the basic data structures course a few years back: more geometric data structures, less flavors of binary search trees, more text indexing structures less flavors of grep.

rgrig said...

The ACM/IEEE curriculum was used as a basis by some faculty in UCD (Ireland) to start a discussion about revising the content of lectures.

Sasho said...

Something which should be at the core of _any_ PhD is an attitude towards knowledge which leads you to learn what you don't know rather than proclaim it insignificant. Just a side comment

Yisong Yue said...

I'm a Ph.D. student not studying theoretical CS (machine learning).

1) Here are some topics that I think should be taught in introductory graduate-level courses.

1. Randomized algorithms and their analysis (e.g., the beloved Azuma's inequality)

2. Graph algorithms (everything's a network these days)

3. Optimization and approximation algorithms

4. Advanced data structures, especially hashing and data structures with geometric considerations

At my institution (Cornell), it's natural for CS Ph.D. students to take 2-3 theory courses. It's not clear to me that theory students necessarily take more "core theory" courses than non-theory students. Additional theory courses are generally of the advanced variety and fall in the province of the instructor's research interests.

2) My experience is obviously fairly limited, but here are some things done in my department that I've found appealing. I must confess my naivety here, since I don't know to what extent these tactics are employed elsewhere.

1. Students are not required to choose an advisor until some time during their 2nd year. Students are also encouraged by the faculty to take many different courses to determine what area of CS they find most interesting.

2. There's been some effort to provide fellowships to all first-year students (although I'm not sure how much the current economic climate has thwarted this effort). I believe the reasoning is that, ideally, first-year students would be relieved from TA responsibilities and also would not be beholden to any particular professor. What's left to do but to take more classes?

Anonymous said...

I find it ridiculous to "demand" what a TCS theory guy should know. Rather it should be more of a suggestion (on what set of courses they might want to take and how they can be really helpful) to help them build their skills/tools/understanding to solve problems in whatever sub-area they like; and in addition, interest them in other things that we find exciting and possibly will help them in solving their own problems (like Voronoi diagrams did to Michael).

The metric for a PhD should be to ensure that: (a) you have done good work (b) you are capable of doing good work.

The metric should not be (a) how many things do you know in TCS, (b) how smart you are.

Please not that this comment is not about what should be "taught" to graduate students as "core" TCS. This is a lovely and very constructive thing we do. Imposing/demanding that a TCS graduate must have finished this list is what I find really stupid. Rumor has it that Sanjeev Arora failed his WQE (Written Qualifying Examination -- designed to ensure that you've taken your "core" TCS courses and can be considered "capable" of doing research) at Berkley during his PhD TWICE... TWICE!! Berkley, later on, removed this requirement -- I have heard. I don't have any confirmations about this story, its just what I have heard.