Friday, February 26, 2010

Conflicts of Interest, Yet Again

I was just asked to serve on the ACM CoNEXT 2010 PC (I'll have to think about it -- NSDI, SIGCOMM, and CoNEXT all in one year?), and the chairs (Muriel Medard and Tim Griffin) sent along a note explaining the conference, the reviewing process, etc.  I was struck by the explicit and rigorous conflict of interest policy they stated:

CONFLICT OF INTEREST GUIDELINES
=============================================================================
 A program committee member (including the chair of the committee) is
 considered to have a conflict of interest on a submission that has an
 author in any of the following categories:

   1. the person themselves;
   2. a past or current student or academic adviser;
   3. a supervisor or employee in the same line of authority within
   the past five years; 
   4. a member of the same organization (e.g., company, university,
   government agency, etc.) within the past five years; 
   5. a co-author of a paper appearing in publication within the past
   five years; 
   6. someone with whom there has been a financial relationship (e.g.,
   grants, contracts, consultancies, equity investments, stock
   options, etc.) within the past five years; 
   7. someone with whom acceptance or rejection would further the
   personal goals of the reviewer (e.g., a competitor); 
   8. a member of the same family or anyone considered a close
   personal friend; or 
   9. someone about whom, for whatever reason, their work cannot be
   evaluated objectively.
 
These guidelines are roughly the same (with minor variations) as what I've come to expect from other networking conferences.  I feel I have to point out the remarkable difference between how conflicts are treated in the networking world and the theory world.  In the theory community #1 is a standard conflict; #2 and #3 are also pretty standard although, in my experience, definitely not universally applied;  and after that conflicts are generally, in my experience, up to the individual PC member to declare if they happen to feel like it.

There's been debate on this blog about the subject before, and I certainly don't mind there being more.  I maintain that the theory community is far too lax in its handling of conflicts.  We can certainly reasonably argue whether the true impact of conflicts in actual decisions in theory conferences is negligible or substantial -- a matter of appearance or a matter of substance.  I can say that, in terms of appearance, people from the networking side (and other communities) are shocked by the lax approach adopted by the theory community.

Thanks to Muriel and Tim for allowing me to post from their document.   

Thursday, February 25, 2010

STOC Budget Questions

I hate to follow the interesting conversations on FOCS/STOC/SODA (please keep commenting) with the mundane, but Lance forced me, er, asked me to be General Chair for STOC, which means looking over things like the budget. Registration fees will probably end up being around $500 for early registration, $250-$275 for students. The numbers are still being played with.

Here's a bunch of questions that arise. I'm happy to hear input.
  1. Should all PC members' expenses for the PC meeting be paid for? That works out to roughly $80-100 on the registration fee per attendee. Hotels and airfare add up, and keep in mind that, the way the ACM forces us to do the budget, you need to budget over 100% of the nominal cost to deal with contingencies.

    In many other areas, it's assumed you'll pay your own way to the PC meeting. For the networking conferences I've PC'ed, they cover meals, and usually have a very nice dinner after the work is done. For the theory conferences I've helped manage, I've usually aimed to cover everyone's meals and hotel (though the dinner is less nice than for the networking conferences...), and to cover anyone who couldn't fund their own travel. That works out to more like $40 per attendee.
  2. Do we really need morning and afternoon coffee breaks? The afternoon coffee break every day adds something like $25 per attendee.
  3. When you look at fixed costs, every student who attends is actually a loss that has to be covered from elsewhere. Is this the right way to go? (I like to think that the corporate sponsorships, from Microsoft/Google/IBM/+others, should be first thought of as going to reduce the cost of student attendance, so I think this is still the way to go.)
  4. At what point do registration fees become a noticeable concern?

Tuesday, February 23, 2010

Guest Post from David Karger

Continuing from my last "controversial" post, David Karger offered the following long comment, turned into a guest post:

------------

I wanted to post a comment on Mike's FOCS/STOC post, but, fittingly for one of the dinosaurs he mentioned, it was too big for the comment length limit. Mike's been kind enough to offer to post my comment as a guest post instead.

Mike's question is one I care about a lot. I still respect theory and do work in it, but as Michael says, much of my attention has been drawn into other areas. Many of them have absolutely nothing to do with theory (see last year's ethnographic study of people's use of pencil and paper for notetaking in TOIS 2009 or our AJAX-flavored interface for visualizing and navigating semistructured data in UIST 2009).

But always, some of my favorite projects are where theory provides the answers to problems that matter in other areas. We just published a paper in Nature Genetics that used some simple applications of max-flow to help biologists visualize the "important" influences in biological networks (we didn't need to find NP-hard integral solutions because the scientists wanted to see all the possibilities in the mix). Before that, we applied a beautiful JACM paper of Alon et al., on finding longish paths in a graph, to a problem in natural language processing---designing a procedure to figure out a best selection and ordering of words for a machine-generated summary of a machine-generated document---and published a paper at NAACL, a linguistics conference. I still remember thinking, when I first read Alon et al., that it was one of the prettiest and cleverest ideas I'd seen in a while, but that it would never be useful for anything practical. Ironically, when my colleague Regina Barzilay outlined the language problem to me, it was exactly the Alon problem with no need for translation; my entire contribution was to know that a solution existed (and thus keep up theoreticians' reputation for being smart enough to solve any problem instantly).
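Since the post leans on it, here is a rough sketch of the Alon-Yuster-Zwick color-coding idea for finding long simple paths. This is illustrative Python of my own, not code from the NAACL paper, and the trial count is an arbitrary choice:

```python
import random

def colorful_path_exists(adj, k, trials=100):
    """Color-coding test for a simple path on k vertices.

    Each trial colors the vertices uniformly at random with k colors,
    then looks, by dynamic programming over (vertex, color-set) states,
    for a "colorful" path using each color exactly once; such a path is
    necessarily simple.  A fixed k-vertex path becomes colorful with
    probability k!/k^k per trial, so repeating the cheap DP a modest
    number of times finds it with high probability.
    """
    vertices = list(adj)
    for _ in range(trials):
        color = {v: random.randrange(k) for v in vertices}
        # sets[v]: color sets of colorful paths ending at v
        sets = {v: {frozenset([color[v]])} for v in vertices}
        for _ in range(k - 1):
            new_sets = {v: set() for v in vertices}
            for u in vertices:
                for s in sets[u]:
                    for v in adj[u]:
                        if color[v] not in s:
                            new_sets[v].add(s | {color[v]})
            sets = new_sets
        # after k-1 extensions, any surviving set is a k-color path
        if any(sets[v] for v in vertices):
            return True
    return False
```

Randomness does the heavy lifting here: the DP per trial is cheap (states are color subsets, not vertex subsets), which is what makes the approach practical compared to brute force over simple paths.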

Other applications of theory to practice have required more work. Mike recently wrote about our paper showing how to design "network codes" for efficient multicast; the core insight of this work was to connect it to the beautiful results that we all study in randomized algorithms courses, on finding perfect matchings by placing random numbers in a graph's Tutte matrix. Most substantially, our line of work that led to Danny Lewin and Tom Leighton's founding of Akamai began with a study in STOC of some theoretical problems around handling flash crowds on the internet, and also generated a whole line of research on building robust and scalable peer-to-peer systems.
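The Tutte matrix result alluded to above is fun to see in code. A minimal sketch of the standard randomized matching test (my own toy Python, unrelated to the network coding paper's actual construction): a graph has a perfect matching exactly when the symbolic determinant of its Tutte matrix is nonzero, and by Schwartz-Zippel a random evaluation modulo a large prime detects this with high probability.

```python
import random

def has_perfect_matching(n, edges, p=(1 << 61) - 1, trials=3):
    """Randomized perfect-matching test via the Tutte matrix.

    For each edge {i, j} place a random value x at entry (i, j) and -x
    at (j, i).  If the graph has no perfect matching, the symbolic
    determinant is identically zero, so every evaluation gives 0; if it
    does, a random evaluation mod prime p is nonzero except with
    probability at most n/p per trial.
    """
    for _ in range(trials):
        A = [[0] * n for _ in range(n)]
        for i, j in edges:
            x = random.randrange(1, p)
            A[i][j] = x
            A[j][i] = p - x  # -x mod p, keeping the matrix skew-symmetric
        if det_mod(A, p) != 0:
            return True
    return False

def det_mod(A, p):
    """Determinant mod prime p by Gaussian elimination with pivoting."""
    n = len(A)
    A = [row[:] for row in A]
    det = 1
    for c in range(n):
        pivot = next((r for r in range(c, n) if A[r][c] % p), None)
        if pivot is None:
            return 0
        if pivot != c:
            A[c], A[pivot] = A[pivot], A[c]
            det = -det  # row swap flips the sign
        det = det * A[c][c] % p
        inv = pow(A[c][c], p - 2, p)  # modular inverse via Fermat
        for r in range(c + 1, n):
            f = A[r][c] * inv % p
            for k in range(c, n):
                A[r][k] = (A[r][k] - f * A[c][k]) % p
    return det % p
```

Note the one-sided error: a nonzero determinant certifies a matching exists, while "no matching" can only be wrong with probability at most n/p per trial.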

With the exception of the first paper on consistent hashing, none of this work has appeared in theory conferences. I think there are several reasons for this. Selfishly, for the author it is much more fun (and valuable in generating future research leads) to present the work at non-theory conferences. It holds the same attraction as tourism, going to strange new places and learning new things from the experience. There's also vanity---the allure of being an exotic theoretician among practitioners instead of one of a crowd of better theoreticians than you. Most important, if you want your work to have an impact on the applied areas, you have to publish in their conferences so they'll pay attention---they don't read STOC/FOCS.

But the second reason is more problematic. Many theoreticians would tell you (some have certainly told me) that the above papers are "not theory". That by dint of their having applications, they are no longer suitable for STOC/FOCS. That these papers have a different home, and FOCS/STOC should be reserved for "pure" (i.e. homeless) theory research. I've been on STOC/FOCS PCs that have rejected nice applications of theory on the grounds that the theory part was too elementary.

In part they are right. The biology and NLP papers I mentioned above did not prove any new theorems. But the omission of applications papers from STOC/FOCS means that the theory community is failing to celebrate one of its greatest contributions! There's always been a divide in the theory community between those who are enamored of theory problems that help them understand the deep nature of the universe and computation (scientist/mathematicians) and those who see theory as a way of thinking about solving concrete computational problems that often emerge from other areas (engineers). I think that STOC/FOCS is making a mistake by focusing too much on the science to the exclusion of the engineering.

I would really like to see more "applied algorithms" papers appearing at STOC and FOCS. These are likely papers that have not made a major theoretical advance, but rather have synthesized our existing theory knowledge into a solution to someone's particular problem. These papers are just as important to see as the ones that advance theory; they represent one of the major justifications for doing theory in the first place.

I haven't mentioned SODA yet. That is a conference that was founded in part to attract these kinds of applications papers. But at the same time it was founded to attract more of the theoretical discrete math community, and the multitude of targets makes the outcome diffuse. Possibly as a result, SODA doesn't have the same stature as STOC/FOCS; I'd like to see applied algorithms appearing at our flagship conferences.

Even if the STOC/FOCS community decides to do this, we still have to deal with what I said at the beginning, that the applied conferences are often more attractive for this kind of theory. You might say "fine, if that's what they want, there's no problem." But I think there is a problem: our community, and in particular our theory graduate students, are not being exposed to this important branch of theoretical computer science.

The best solution I can think of is to allow repeat submission. That is, to let the paper appear first in the applied conference, then at STOC/FOCS. Almost by definition, these two venues will not have a lot of overlap, so I really don't see a downside to presenting such a paper at both of them. There are two ways to get this by the copyright police. The first is to accept the paper for presentation but publish only a reference to it in the proceedings. The second is to ask the author to write a new version of the paper aimed at a theory audience. Given the different audience, the paper is likely to be quite different.

Monday, February 15, 2010

FOCS/STOC : What's the Big Deal?

As mentioned recently, the FOCS submission site is now up, and STOC acceptances have come out. Related blog posts have arrived, including an amusing one by Dick Lipton on whether you could create a site that would automatically estimate the chances of your FOCS submission getting in.

So it seemed a good time to ask -- what's the big deal about FOCS/STOC?

FOCS/STOC are, I think, still generally thought of as theory's "flagship conferences." It's hard to get taken seriously as a theorist -- particularly when searching for your first (or even second) job -- without a reasonable number of FOCS/STOC papers under your belt. (SODA papers are, I think, now regarded as a reasonable substitute, but the lack of the heft of something in FOCS or STOC still stands out on a CV.)

But why?

When I was graduating, the colossuses that walked the earth were Jon Kleinberg and David Karger. If you look early in their careers, you'll see a steady and remarkable output of FOCS/STOC papers. (From 1995-1997, Jon had 6 STOC papers, 5 FOCS papers, and 2 SODA papers; from 1994-1996, David had 6 STOC papers, 2 FOCS papers, and 2 SODA papers.) These days, Jon's publications seem focused on the KDD and WWW conferences, and maybe the EC conference. David, who has also always been eclectic in the most positive sense, publishes all over the place. You'll still see them appear in FOCS/STOC/SODA from time to time, certainly, but it's not where their energy seems to go. I don't find them any less colossal; they've just moved on to different things, different problems, that are generally not targeted to FOCS/STOC. Other examples include Bruce Maggs, who mostly does networking these days, and even Dick Karp, who -- while ever eclectic -- has published more in RECOMB than anywhere else the last decade. I could go on.

My point here is that many of the best theorists I know have, I would say, transcended FOCS/STOC. This does not mean there's not great stuff in FOCS/STOC; it just seems strange, given this, that these conferences are accorded such weight.

Perhaps, in fact, they're accorded less weight than I'm crediting them with these days. Certainly, the Innovations in Computer Science movement demonstrates some dissatisfaction with FOCS/STOC, and there are debates in subcommunities (SoCG, Crypto) about FOCS/STOC vs. the specialized conferences. It still seems to me, though, that FOCS/STOC is where most people would want their best theory results to appear, and it's still the lens through which fresh theory PhDs are viewed.

It seems to me that the theory community, as a whole, needs to think about FOCS/STOC/SODA and the many other conferences, and figure out what it wants them to be. FOCS and STOC haven't changed much over the years, and perhaps they've become just a bit too comfortable; the (theory) world around them has changed considerably, and it's not clear that they've adapted. Should they be the flagship conferences of theory, and if so, what does that mean, and how can they better fulfill that role? If they're not going to be the flagship conferences of theory -- which might be perfectly reasonable -- what is their role to be?

Saturday, February 13, 2010

News Roundup

This just in -- computer science students at Stanford cheat! I love this quote: "Historically, the computer science department accounts for between 20 to 60 percent of all honor-code cases, even though the courses represent about 7 percent of student enrollment." This may be because cheating in computer science is relatively easy -- cut-and-paste. But I think it's also because in computer science we have and use the tools to catch copying. If you're using automatic tools, copying is easy to find; if you're not, you'll find less of it.

One of those "denied-tenure-leads-to-shooting" incidents, this time in Alabama. (To be clear, the details about the reasons for the shooting are still, I think, unofficial.) I always feel a twinge when I hear a story like that... it makes me glad that Harvard's tenure process is extremely super-secret. A professor's job is, naturally, very safe, so stories of students or faculty losing it like this hit home. Oh, and I like how most news articles feel it important to point out the shooter was "Harvard-trained".

Not-just-Harvard with budget woes. At least we seem to be reducing our red ink at a good pace.

Speaking of budgets, we have proposed increases in the NSF budget for the coming year. Although, to take advantage of them, it seems you might want to start working in energy technologies. And there will be a new NSF director; I'm not sure how that affects us academics.

Any other news of note?

Thursday, February 11, 2010

FOCS 2010 Call for Papers is Up

Luca Trevisan sent me the link for the call for FOCS 2010.

Key points: The deadline is April 7. And, in a move that I approve of, there's no page limit on submissions -- instead, "Material other than the abstract, references and the first 10 pages may be considered as supplementary and will be read at the committee's discretion." Having just submitted papers to ICALP where we had to go through the "move things to an appendix" routine, I think this is the right way to go.

Tuesday, February 09, 2010

Recent Award for Network Coding

I opened the January issue of IEEE Transactions on Information Theory and saw (what is probably old news to everyone) that Tracey Ho, Muriel Medard, Ralf Kotter, David Karger, Michelle Effros, Jun Shi, and Ben Leong have won the IEEE Communications Society and Information Theory Society Joint Paper Award for their work A Random Linear Network Coding Approach to Multicast. Congratulations! Always nice to see coding work being recognized, and a "computer scientist" (Karger) winning an "EE" award for work on coding.

Sadly, it's also time to recall the passing of Ralf Kotter, who died of cancer just over a year ago. Communications Theory lost a brilliant mind and a great leader far, far too early.

Guest Post: Giorgos Zervas from WSDM, Part 3

I am back from WSDM and I have to say all in all it was a great experience. I think my talk went fairly well, although I can see some ways in which it could have been better - especially if I were more experienced in public speaking.

Preparing to depart from New York, amidst rumors of a big snowstorm that never seemed to materialize, I was thinking of the other participants who didn't have the luxury of being just a four-hour drive away from home. Especially those who had a long return flight ahead of them. After three days packed with talks, lunches, networking and even going out and enjoying what Brooklyn has to offer (a lot!), I am sure most people wouldn't have minded being teleported back. This makes me wonder: what are our main incentives for conference participation?

I am guessing one of them must be attending the actual talks. Which brings me to my next point... Sergej Sizov couldn't attend the conference and instead sent his presentation over: the usual slides accompanied by a video of him presenting the work. To be perfectly honest, I was rather negatively predisposed to the idea of a prerecorded presentation. And judging from the number of people in the auditorium, I think more people may have thought the same. Yet, I was completely wrong. After 30 seconds or so, I was completely immersed and forgot that the presenter was a projection. The presentation itself was clear, finished on time, and even got applause at the end - which was absolutely deserved (yet in the absence of the speaker reminded me of the awkward feeling I get when people clap in movie theaters.) I would say the only downside was that we had to skip the Q&A session. Technically, though, I don't see why this couldn't have been arranged, save for timezone considerations. So, if a taped delivery doesn't really compromise quality, why don't we use this format more often and minimize travel? Could conference participation eventually evolve to a mixed model of in-person attendance and participation over the web? I do realize the benefits of networking and meeting each other in person, but do we really have to attend every single conference irrespective of cost and time issues?

Looking forward to WSDM'11 in Hong Kong!

Friday, February 05, 2010

Guest Post: Giorgos Zervas from WSDM, Part 2

Day two of WSDM was highlighted by two great presentations which I enjoyed for different reasons. I think the strong features of both could be incorporated in almost any talk.

Carlos Castillo did a fine job of presenting "An Optimization Framework for Query Recommendation" by Anagnostopoulos, Becchetti, Castillo and Gionis. My favorite part was using Cavafy and Machiavelli as presentation vehicles for two different utility functions they evaluated: the former aggregating utility along every step of a multi-step process, the latter ignoring the journey and solely caring about the value derived in the very last step. These utility functions were presented in the context of query reformulation, the query suggestions search engines provide users with to aid them in finding what they are looking for. I am not quite sure how they came up with this great metaphor but it may just be that the authors are Greek and Italian.

The second presentation I enjoyed was given by Alan Mislove. I think he nailed it by selecting just the right level of abstraction for his talk. Not too much detail, but enough to maintain my interest and entice me to read their paper: "You Are Who You Know: Inferring User Profiles in Online Social Networks". The main idea here is that information that you may consider private and are unwilling to publish can potentially be inferred from information your friends reveal; not necessarily directly about you, but about themselves. Because of the homophily present in social networks, what your friends say can be telling about you. Homophily was definitely the word of the day; it was featured in three different presentations. All in all I think this paper underlined some concerns anyone with an online presence should be having.

My only gripe so far has been the heat in the auditorium - I think, by the end of the day, it makes everyone feel more tired than they already are. But other than that WSDM has been very enjoyable so far.


PS: The Twitter feed disappeared during Thursday afternoon's sessions but was back this morning. I guess people must be enjoying it!

Guest Post: Giorgos Zervas from WSDM

Since I couldn't get myself to New York for WSDM, I asked my student Giorgos Zervas to report. This is his report from Thursday.

-----------------------

Greetings from Brooklyn.

I am here attending WSDM 2010 where on Saturday I will be presenting the work we did with John & Michael on Adaptive Weighing Designs for Keyword Value Computation. Michael asked for my grad-student perspective on the conference and I happily obliged as he offered me a decent revenue-share deal on any book sales resulting from this post.

On a more serious note, while being here, my primary concern is presenting our work in the best possible light and hopefully getting some people to read the actual paper. A secondary, but equally important, concern is that of networking. My impression is that for most conference participants, time is a very scarce resource. Of course this is not limited to WSDM. The sight of a speaker surrounded after his or her talk by a bunch of people - just like myself - intimidates me and makes me feel that by adding myself to the pool, I am becoming an additional burden to someone who might have better things to do than listen to my 30 second blurb (even though personally I'd be flattered, so please surround me after my talk!). Ideally, I'd prefer interaction to be more organic and there are certainly some good opportunities for that. My question to you is: do you prefer some ways of being approached over others? How do you respond to cold introductions? Any advice on how grad students should network at conferences? And if you are a student: what do you find works for you in terms of introducing yourself to others?

On a different note, an interesting feature of WSDM has been the live Twitter feed projected behind the speaker, next to the actual presentation slides. Even though some of the tweets are insightful and they make for great conversation starters over lunch, I find the projection rather distracting. I reckon that, for 20 minutes, focus should be on the speaker, and the tweets are almost impossible to ignore. Some of them are also quite repetitive (videolectures.net, anyone?). Those wishing to follow the Twitter stream could always do so from their laptops and phones. What are your thoughts? Do you think this backchannel adds to the discourse? (I should point out that the WSDM community definitely seems mature enough to avoid mishaps like this.)

Finally, on the research front, and even though we are still on the first day of the conference, I've had the opportunity to attend some great talks. In particular Soumen Chakrabarti gave this morning's keynote and I found his vision of extracting structure from the unstructured web fascinating. A few papers that have grabbed my attention (in no particular order) are "Automatic Generation of Bid Phrases for Online Advertising" (Ravi et al.), "SBotMiner: Large Scale Search Bot Detection" (Yu et al.) and "Evolution of Two-Sided Markets" (Kumar et al.); I am looking forward to these and the rest of the presentations. What have been your personal favorites?

Thursday, February 04, 2010

Admissions Handling

I've been spending time on both graduate admissions and undergraduate admissions.

For graduate admissions, we've moved to an all-electronic system; the applications are all online. The system is actually a complete pain to use. Does that surprise anyone? (Some of us get our admins to download everything for us, so we don't have to log in and navigate the system when we need to look at an application, or spend an hour or more downloading files ourselves in a system that wasn't set up to download selected files in a straightforward way.) While I'm absolutely, positively, completely sure that no candidate's confidential information has ever, ever been compromised, or ever will (I believe I've now covered myself and Harvard legally), it seems like a privacy-risk nightmare with all the applications secured by a password that has to get distributed to all the faculty. Still, with all that, it seems slightly better than the paper folder system we had before, where it seemed impossible to track which professor had which folder, never mind actually arranging for folders to be transferred among multiple faculty in a timely manner.

For undergraduate admissions, I'm asked to look at folders -- usually, I'm being used to check that Johnny or Jane's science fair project actually has some interesting science in it, or similarly vouch for math/science talent, but they seem to appreciate it if I make other comments as well. It's all paper. An actual person drops folders (a few a week) off to my admin; I type up comments and my admin prints them out, puts them into the folder, and calls to get the folders picked up. Apparently, it's unusual that I type my comments; the admissions officers write their comments by hand. (Often, I can't read them, and my handwriting is worse than theirs.) I've never lost a folder, but I do hope they have back-ups in the home office just in case. (It looks to me like I'm getting the originals; I've never asked. I just assume they can't give the folder out to faculty without keeping a copy of everything.)

Both systems seem flawed, but both also seem designed to fit the way the decision-process is made. I actually like the paper system, even though it clearly requires a lot of people-hours doing background tasks like getting folders from here to there. It definitely reduces the time and effort I have to put in to review the applications -- which ostensibly should be the goal of the system, since faculty time is (ostensibly) valuable.

Tuesday, February 02, 2010

Does Class Size Matter?

Preliminary stats show 55 students in my algorithms course. That's probably close to the mean and slightly above the median. It's certainly not the largest course in the School of Engineering and Applied Sciences (SEAS), but it's up toward the top.

Why should I, or any faculty member for that matter, care about class size? For junior faculty, at least, there's a clear answer. A tenure case without some teaching of significantly sized classes is one with a weakness, opening the way to the arguments that the faculty member in question is working in an area of little interest (since nobody wants to take their classes in their area) or is providing insufficient service to the department (since they haven't taken on a large core course).

For senior faculty, I'm not so clear. While class sizes are listed in our annual review, I've heard no mention that they're considered of any particular importance -- or even of non-zero importance -- in determining annual raises. Indeed, I can't think of any direct benefit to me personally for teaching a large undergraduate class as opposed to a small one.* One might want to take on a big class to support one's department, as ostensibly money (and positions) should, in some way, follow students at the departmental level. I'd like to think that's how it works at many places; however, recent conversations with some higher-ups suggest that that connection is fairly tenuous for SEAS. Harvard's system in that respect seems to be broken. (If it wasn't, I think we'd be further ahead in our hiring in CS.)

So why should I care about my class size? Primarily, I suppose, personal pride. I take satisfaction in teaching students; the more qualified students, the better. (Not the more students the better, though; the more qualified students, the better...) Indeed, I've done the math, and while I'm quite sure Harvard does not calculate things this way, in my mathematical model I'm earning what Harvard's paying based solely on the number of students I teach. That helps me sleep at night.

Overall, however, this seems like an area where the incentive structure doesn't seem set up right. I can understand that class size isn't an end in itself; indeed, I can understand that part of the mission of the University is to preserve knowledge in areas that might be of narrow interest. (The Sanskrit, Slavic, Turkish, and Yiddish courses, for instance, have remarkably low numbers.) But it seems naive to think that size doesn't matter **, so it's slightly disturbing that when I think in terms of incentives, I'm ending up wondering why I should care about my class size at all.



* I do see a potential direct benefit for having my large graduate project class; some student projects can get turned into papers, and often students have me take part in turning their project into a paper, so I may get some research benefit from having a large graduate class. It's not clear that's a big benefit, but at least it's demonstrable.

** Yes, we all knew that was coming before the end of the post....

Sunday, January 31, 2010

Justifying Growth: We Need Better PR...

Background: As part of the "getting a new Dean" process, we're undergoing a "make a 5 year plan" process. The School of Engineering and Applied Sciences (SEAS) had a mini-retreat just before classes started to discuss it all.

I wandered into lunch the other day and saw some other faculty from SEAS -- but from well outside computer science -- already eating, so I joined them. In our friendly discussions, they asked about our growth plan, and asked me to justify it further to them. One point I brought up was that we did a lot of "service" to the rest of the university. Sure, they said, they knew about the very large intro programming class, but that was just one class. What else?

I listed off several of our other classes that they didn't seem really aware of -- our course for non-majors CS 1 (Great Ideas in Computer Science) and our Gen Ed course Bits, our more advanced programming classes CS 51 and CS 61, our new interdisciplinary visualization course CS 171 and our new course on design of usable interactive systems CS 179, and probably a few others. I then mentioned that even our theory courses (introduction to complexity, and introduction to algorithms and data structures) were attracting a lot of non-majors, and pointed out that they each had 80 students last year. (We have about 30-40 or so CS majors a year right now.)

Their jaws literally dropped. One of them asked me, multiple times, how there could be 80 people at Harvard who were interested in taking Algorithms. (While I, of course, always wonder why it's so few.) I still have doubts that they believed me; I think I ought to get something official-looking from the registrar and send it to them.

Now, admittedly, last year's class was big -- my class this year looks to be a more typical 50 or so. But I was still surprised by their surprise. I check out the class sizes around SEAS to get a feel for what's going on every year (we usually get an e-mail with course counts). Also, I'm pretty sure I mention my class size fairly often to other SEAS faculty when the opportunity arises, but apparently less often than I think. I was left with the feeling that I, and maybe the rest of the CS faculty, needed to engage the other SEAS faculty a bit more and let them know more about what we're doing. In particular, as another faculty member said to me, "When they talk about a really big class, they mean 40 students. When we talk about a really big class, we mean over 100."

I'm not exactly a shy, retiring type. (I mean, c'mon, I blog.) But I'll be upping my efforts to make sure others at Harvard have a better idea of what we're doing, especially in terms of teaching our undergraduates.

Friday, January 29, 2010

Paper updates

We updated our Swoopo paper, Information Asymmetries in Pay-Per-Bid Auctions: How Swoopo Makes Bank, mostly making small writing improvements and typo fixes that came up when we were preparing our conference submission. It's still up on the arxiv. In the spirit of student promotion, I'll point you to grad-student-co-author Giorgos Zervas's web site, which also contains links to our condensed submission version and our dataset.

Also now up on the arxiv is a preprint (submitted) by myself and Raissa D'Souza on Local cluster aggregation models of explosive percolation. We were looking at variations of the Achlioptas process. (I've promised myself I'll stop making fun of Dimitris for his keen ability to get a process named after himself -- sometime around 2020.) In the basic Achlioptas process you start with an empty graph of n nodes, and at each step you choose 2 edges at random and add one to the graph according to some deterministic rule, such as adding the edge that minimizes the size of the resulting merged component. Usually there is an associated goal, such as delaying (or speeding up) the emergence of a giant component in the graph. As you might expect, the power of choosing an edge allows one to delay or speed up the emergence of a giant component substantially. Our paper looks at what seems to be a simpler class of similar processes, the most basic of which is that at each step you choose 2 edges at random that are adjacent to some randomly chosen vertex v. That is, we look at local variations, where the choice can be thought of as the vertex v choosing which of 2 random edges to add to the graph. These local variations don't seem to have been analyzed previously. We look at differential equations that describe the underlying process and empirically examine whether such processes have discontinuous phase transitions, like the original Achlioptas process appears to have. We're promoting the idea that these "local variations" may prove simpler to analyze fully and rigorously than the original variations, for which many open questions remain.
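For readers who like to play with these processes, here is a minimal simulation sketch of the basic local variant described above -- my own toy illustration, not code from the paper, and the specific tie-breaking rule (minimize the merged component size) is just one of the deterministic rules one might use. It tracks component sizes with a standard union-find structure:

```python
import random

class UnionFind:
    """Standard union-find with path halving and union by size."""
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]

    def comp_size(self, x):
        return self.size[self.find(x)]

def merged_size(uf, edge):
    """Size of the component that would result from adding this edge."""
    ra, rb = uf.find(edge[0]), uf.find(edge[1])
    return uf.size[ra] if ra == rb else uf.size[ra] + uf.size[rb]

def random_incident_edge(rng, n, v):
    """A random edge adjacent to vertex v (no self-loops)."""
    u = rng.randrange(n)
    while u == v:
        u = rng.randrange(n)
    return (v, u)

def local_process(n, steps, seed=None):
    """Each step: pick a random vertex v and two random edges incident
    to v; add the one whose addition minimizes the merged component.
    Returns the size of the largest component at the end."""
    rng = random.Random(seed)
    uf = UnionFind(n)
    for _ in range(steps):
        v = rng.randrange(n)
        e1 = random_incident_edge(rng, n, v)
        e2 = random_incident_edge(rng, n, v)
        uf.union(*(e1 if merged_size(uf, e1) <= merged_size(uf, e2) else e2))
    return max(uf.comp_size(v) for v in range(n))
```

Plotting the largest component size as the number of steps grows past n/2 is the natural way to eyeball whether the transition looks continuous or discontinuous.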

Addendum: Just in case there's any confusion, I should add that Dimitris in no way named the Achlioptas process after himself. (I was just teasing him.) He was the first to suggest and promote the study of the process and when others wrote the first papers about it they nicely credited him by calling it an Achlioptas process. The name has rightfully stuck.

Tuesday, January 26, 2010

Teaching, Day One

First day of classes for me today. It's "spring", sort of, so I must be teaching Algorithms and Data Structures.

Last year there were about 80 students in the class, a remarkable reversal after several years of decline -- the year before that there were fewer than 40. Of course, the class wasn't a success for everyone, so even though our intro programming class has continued to grow, I'm expecting fewer students this year. The fall theory class (our intro complexity class) had about 60 students this year -- it, too, had about 80 last year -- and my numbers usually run a few below that. So my expectation for the final course size is in the (mid) 50s.

When I got to class I wondered if the room assignment hadn't been posted -- only about 20 people showed up. But lots of people wandered in a few minutes late, and then more and more as the class went on. The joys of Harvard's shopping period (though I can't recall ever seeing quite such a number of late arrivals). I'm pretty sure at least 60 showed up at some point during the class; my initial estimate may be pretty close.

What's very odd this year is that on day one there are 48 people signed up for the Distance Education version of the class through the Extension School. (Still time to sign up!) That's definitely more than normal. A big fraction of those students are likely to drop out -- many quickly figure out the class is more than they can handle -- but that's still a much larger starting number than usual.

Hopefully, by next week, I'll have more accurate numbers for both versions of the class.

Lecture went fine. Very well, in fact, in that a number of students raised their hands in response to my questions, and they had, as a whole, very good answers and insights. Optimistically, I'm looking forward to an above-average class this year.

Sunday, January 24, 2010

On Formatting

Matt Welsh's recent amusing-but-also-sad post on having two recent conference submissions rejected for violating format requirements reminded me how much I hate wasting time dealing with formatting. Matt calls for a standardized template -- as he writes:

But isn't it time that we define a single standard for conference paper formatting that everyone uses?

(With such a system, conferences could still vary the length of the paper they accepted -- but the style file would be the same.)

I think that would be a great step. In the same spirit, I'd also like to see less focus on page limits -- both for submissions and for the final version. (Ostensibly, the final version should be closely related to the actual submission -- so if you're going to relax page limits for the final version, it seems best to do so for the submission version as well.) I think the recent change for SODA -- allowing up to 20-page conference papers -- is a great idea.

I understand that, in some cases (where printing is involved), some sort of page limit may be needed -- although with less and less printing of full proceedings, it's less clear that this is an important consideration. I also believe that page limits can force people to improve their writing -- bad writing and excessively wordy writing generally go together. (If we remove page limits, we'll have to tell people in reviews more frequently when they need to cut things down or even out in their papers.)

But the effort spent on meeting arbitrary page limits -- just the time spent on formatting -- seems silly at this point. And often it forces you to cut content. I can't count the number of times I've had a review say, "You should have included this...", where my response would be, "We did include this, but had to cut it to make it fit into the page limit..." (Yes, you can create multiple versions, and post the full versions online -- except for double-blind conferences, like SIGCOMM, of course -- but that involves yet further overhead, when you have to update to deal with reviewer comments, and...) And if your paper is rejected from one conference, to submit to another, you have to re-format and create another version to meet their arbitrary format and page limit standards.

In some ways, I'm glad that Matt is out there, talking about the papers being rejected for formatting. And I hope it starts happening to more people, more often. Because I worry that the only way for change to happen is for bad things to start happening to good papers so that people will start to realize that the current system is just bizarre and broken, and a revolution needs to occur. Perhaps I'm wrong, and slow evolutionary change -- with 20-page papers at SODA being an example -- will lead us the right way. If so, I encourage PC chairs to experiment with flexibility in their formatting requirements.

Tuesday, January 19, 2010

Housework

The Chronicle of Higher Education has an article up titled Time Crunch for Female Scientists : They Do More Housework Than Men. Worth reading, understanding, and discussing, although I admit some trepidation, as such posts generally receive some bizarre commentary. (One point of note: this is not just an issue that affects scientists, but "...because a successful scientific career demands more than 40 hours a week, she said, female scientists could be especially affected.")

Monday, January 18, 2010

New TCS postdocs/jobs website at CCI

There's a new page http://intractibility.princeton.edu/jobs/ that was set up as a centralized location for advertising postdocs and jobs for theoretical computer science. There's been quite a bit of talk for some time that TCS could use a site like this, so I'd like to advertise it here and encourage people to use it. And thanks to those who are putting in the work behind the scenes.

Thursday, January 14, 2010

Letters and Rights

I generally assume (undergraduate) students know to waive their rights to look at a letter of recommendation when they ask me to write one, but I recently ran into a student who did not. I explained that my default was that they had to waive their rights, and I'd send in the letter after the checkbox had been changed. If you don't look carefully, though, I find it's easy to miss which box they've checked as you're filling the form out online.

I simply prefer my confidential letters to be confidential. I understand that waiving their rights is not any sort of guarantee that they won't see the letter, but I think the understanding is important: I'm writing an evaluation of them to my colleagues, and that information is, as a matter of default policy, not meant for their eyes, so I can be forthright in expressing my opinions.

Wednesday, January 13, 2010

Algorithms and Data Structures : Course Goals

Richard Lipton has an excellent blog post up right now on "What should a course on complexity theory cover?" where he describes what he is putting in his graduate course and what he expects students to get out of it. It seemed like a great idea, making me think that I should write a similar post for my undergraduate course (Algorithms and Data Structures).

I have a somewhat different take on the problem than Richard, though. He ends up creating a list of things (definitions and theorems) he thinks students should know after his class. I admit I think a bit more in terms of high-level concepts and skills I want students to get out of the course. (I think both approaches can work.) My list includes:

1. Students should understand the basics of algorithm/data structure language and notation; in particular: order notation, what it means (and doesn't mean!), and how to calculate running times of algorithms.
2. Students should learn how to model a problem through proper abstraction. In particular, they should learn how to set problems into the language of graphs, linear programs, etc.
3. Students should learn basic algorithmic techniques: greedy, divide and conquer, dynamic programming, linear programming.
4. Students should learn what is required for a formal and rigorous argument (even if they are not so good at writing one themselves). They should learn how to argue/prove correctness of algorithms and data structures in the contexts above.
5. Students should clearly understand the power of reductions -- not just to show problems are hard, but to show problems are easy, by using the solution of one problem to solve another.
6. Students should obtain some insight into the power of randomness in algorithmic thinking.
7. Students should practice translating algorithmic ideas into working code. It is hoped they will learn the importance of thinking before coding (choosing the right approach), and that issues such as memory requirements and running time can and should be reasoned about before implementation.
8. Students should learn that correctness, like computation time and memory, is just a "resource" that can be traded off with the others, by learning the notion of an approximation algorithm.
9. Students should learn the basics of heuristics and heuristic techniques, so that if asked in practice to solve an NP-hard problem (in the sense of coming up with a very good but not optimal solution), they have some idea of what approaches might give them reasonable answers.
10. Students should see some really cool, unusual, interesting algorithms and data structures, and be forced to work on some really hard, challenging problems, to challenge their thinking and demonstrate the power of theory.
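To make point 5 concrete -- reductions that show a problem is easy -- here's a toy example of the kind I have in mind (my own illustration, not an actual course assignment): detecting whether a list contains duplicates reduces to sorting, since after sorting any duplicates must sit next to each other.

```python
def sort_list(xs):
    # The "black box" we reduce to: any correct sorting routine works.
    # Here we simply use Python's built-in sort.
    return sorted(xs)

def has_duplicates(xs):
    """Reduce duplicate detection to sorting: sort, then one linear
    scan checking adjacent pairs decides the question."""
    ys = sort_list(xs)
    return any(a == b for a, b in zip(ys, ys[1:]))
```

The reduction itself is just the linear scan; the heavy lifting is delegated to the sorting black box, which is exactly the habit of mind I want students to pick up.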

There's probably more I could add, but those seem like ten clear objectives I have for the students who take my class. Perhaps in another post I can elaborate on how I try to see that those objectives are met.

Like Richard, I should end the post by asking the important open questions:
What have I left out? What should I leave out? What do you think?