Sunday, January 31, 2010

Justifying Growth : We Need Better PR...

Background: As part of the "getting a new Dean" process, we're undergoing a "make a 5 year plan" process. The School of Engineering and Applied Sciences (SEAS) had a mini-retreat just before classes started to discuss it all.

I wandered into lunch the other day and saw some other faculty from SEAS -- but from well outside computer science -- already eating, so I joined them. In our friendly discussions, they asked about our growth plan, and asked me to justify it further to them. One point I brought up was that we did a lot of "service" to the rest of the university. Sure, they said, they knew about the very large intro programming class, but that was just one class. What else?

I listed off several of our other classes that they didn't seem really aware of -- our course for non-majors CS 1 (Great Ideas in Computer Science) and our Gen Ed course Bits, our more advanced programming classes CS 51 and CS 61, our new interdisciplinary visualization course CS 171 and our new course on design of usable interactive systems CS179, and probably a few others. I then mentioned that even our theory courses (introduction to complexity, and introduction to algorithms and data structures) were attracting a lot of non-majors, and pointed out that they each had 80 students last year. (We have about 30-40 or so CS majors a year right now.)

Their jaws literally dropped. One of them asked me, multiple times, how there could be 80 people at Harvard who were interested in taking Algorithms. (While I, of course, always wonder why it's so few.) I still have doubts that they believed me; I think I ought to get something official-looking from the registrar and send it to them.

Now, admittedly, last year's class was big -- my class this year looks to be a more normal about 50 or so. But I was still surprised by their surprise. I check out the class sizes around SEAS to get a feel for what's going on every year (we usually get an e-mail with course counts). Also, I'm pretty sure I mention my class size fairly often to other SEAS faculty when the opportunity arises, but apparently less often than I think. I was left with the feeling that I, and maybe the rest of the CS faculty, needed to engage the other SEAS faculty a bit more and let them know more about what we're doing. In particular, as another faculty member said to me, "When they talk about a really big class, they mean 40 students. When we talk about a really big class, we mean over 100."

I'm not exactly a shy, retiring type. (I mean, c'mon, I blog.) But I'll be upping my efforts to make sure others at Harvard have a better idea of what we're doing, especially in terms of teaching our undergraduates.

Friday, January 29, 2010

Paper updates

We updated our Swoopo paper, Information Asymmetries in Pay-Per-Bid Auctions: How Swoopo Makes Bank, mostly making small writing improvements and typo fixes that came up when we were preparing our conference submission. It's still up on the arxiv. In an effort for student promotion, I'll point you to grad-student-co-author Giorgos Zervas's web site, which also contains links to our condensed submission version and our dataset.

Also now up on the arxiv is a preprint (submitted) by myself and Raissa D'Souza on Local cluster aggregation models of explosive percolation. We were looking at variations of the Achlioptas process. (I've promised myself I'll stop making fun of Dimitris for his keen ability to get a process named after himself -- sometime around 2020.) In the basic Achlioptas process you start with an empty graph of n nodes, and at each step you choose 2 edges at random, and add one to the graph according to some deterministic rule, such as add the edge that minimizes the size of the resulting merged component. Usually there is an associated goal, such as to delay (or speed up) the emergence of a giant component in the graph. As you might expect, the power of choosing an edge allows one to potentially delay or speed up the emergence of a giant component substantially. Our paper looks at what seem to be a simpler class of similar processes, the most basic of which is that at each step you choose 2 edges at random that are adjacent to some randomly chosen vertex v. That is, we look at local variations, where the choice can be thought of as the vertex v choosing which of 2 random edges to add to the graph. It doesn't seem these local variations have been analyzed previously. We look at differential equations that describe the underlying process and empirically examine whether such processes have discontinuous phase transitions, like the original Achlioptas process appears to have. We're promoting the idea that these "local variations" may prove simpler to analyze fully and rigorously than the original variations, for which there still remain many open questions.

Addendum: Just in case there's any confusion, I should add that Dimitris in no way named the Achlioptas process after himself. (I was just teasing him.) He was the first to suggest and promote the study of the process and when others wrote the first papers about it they nicely credited him by calling it an Achlioptas process. The name has rightfully stuck.

Tuesday, January 26, 2010

Teaching, Day One

First day of classes for me today. It's "spring", sort of, so I must be teaching Algorithms and Data Structures.

Last year there were about 80 students in the class, a remarkable reversal after several years of decline -- the year before that there were fewer than 40. Of course, this was not successful for everyone, so even though our intro programming class has continued to grow, I'm expecting fewer students this year. The fall theory class (our intro complexity class) had about 60 students, and usually my numbers are a few less than that. They had about 80 students last year too. So my expectation for the final course size is in the (mid) 50s.

When I got to class I wondered if the room assignment hadn't been posted -- only about 20 people showed up. But lots of people wandered in a few minutes late, and then more and more as the class went on. The joys of Harvard's shopping period (though I can't recall ever quite such a number of late arrivals). I'm pretty sure at least 60 showed up at some point during the class; my initial estimate may be pretty close.

What's very odd this year is that on day one there's 48 people signed up for the Distance Education version of the class through the Extension School. (Still time to sign up!) That's definitely more than normal. A big fraction of those students are likely to drop out -- many quickly figure out the class is more than they can handle -- but that's still a much larger starting number than usual.

Hopefully, by next week, I'll have more accurate numbers for both versions of the class.

Lecture went fine. Very well, in fact, in that a number of students raised their hands in response to my questions, and they had, as a whole, very good answers and insights. Optimistically, I'm looking forward to an above-average class this year.

Sunday, January 24, 2010

On Formatting

Matt Welsh's recent amusing-but-also-sad post on having two recent conference submissions rejected for violating format requirements reminded me how much I hate wasting time dealing with formatting. Matt calls for a standardized template -- as he writes:

But isn't it time that we define a single standard for conference paper formatting that everyone uses?

(With such a system, conferences could still vary the length of the paper they accepted -- but the style file would be the same.)

I think that would be a great step. In the same spirit, I'd also like to see less focus on page limits -- both for submissions and for the final version. (Ostensibly, the final version should be closely related to the actual submission -- so if you're going to relax page limits for the final version, it seems best to do so for the submission version as well.) I think the recent change for SODA -- allowing up to 20 page conference papers -- is a great idea.

I understand that, in some cases (where printing is involved), some sort of page limit may be needed -- although with less and less printing of full proceedings, it's less clear that this is an important consideration. I also believe that page limits can force people to improve their writing -- bad writing and excessively wordy writing generally go together. (If we remove page limits, we'll have to tell people in reviews more frequently when they need to cut things down or even out in their papers.)

But the effort spent on meeting arbitrary page limits -- just the time spent on formatting -- seems silly at this point. And often it forces you to cut content. I can't count the number of times I've had a review say, "You should have included this...", where my response would be, "We did include this, but had to cut it to make it fit into the page limit..." (Yes, you can create multiple versions, and post the full versions online -- except for double-blind conferences, like SIGCOMM, of course -- but that involves yet further overhead, when you have to update to deal with reviewer comments, and...) And if your paper is rejected from one conference, to submit to another, you have to re-format and create another version to meet their arbitrary format and page limit standards.

In some ways, I'm glad that Matt is out there, talking about the papers being rejected for formatting. And I hope it starts happening to more people, more often. Because I worry that the only way for change to happen is for bad things to start happening to good papers so that people will start to realize that the current system is just bizarre and broken, and a revolution needs to occur. Perhaps I'm wrong, and slow evolutionary change -- with 20 page papers a SODA being an example -- will lead us the right way. If so, I encourage PC chairs to experiment with flexibility in their formatting requirements.

Tuesday, January 19, 2010

Housework

The Chronicle of Higher Education has an article up titled Time Crunch for Female Scientists : They Do More Housework Than Men. Worth reading, understanding, and discussing, although I admit some trepidation, as such posts generally receive some bizarre commentary. (One point of note; this is not just an issue that affects scientists, but "...because a successful scientific career demands more than 40 hours a week, she said, female scientists could be especially affected.")

Monday, January 18, 2010

New TCS postdocs/jobs website at CCI

There's a new page http://intractibility.princeton.edu/jobs/ that was set up as a centralized location for advertising postdocs and jobs for theoretical computer science. There's been quite a bit of talk for some time that TCS could use a site like this, so I'd like to advertise it here and encourage people to use it. And thanks to those who are putting in the work behind the scenes.

Thursday, January 14, 2010

Letters and Rights

I generally assume (undergraduate) students know to waive their rights to look at a letter of recommendation when they ask me to write one, but I recently ran into a student that did not. I explained that my default was that they had to waive their rights, and I'd send in the letter after the checkbox had been changed. If you don't look, though, I find it's easy to miss which box they've checked as you're filling the form out on line.

I simply prefer my confidential letters to be confidential. I understand that waiving their rights is not any sort of guarantee that they won't see the letter, but I think the understanding is important: I'm writing an evaluation of them to my colleagues, and that information is, as a matter of default policy, not meant for their eyes, so I can be forthright in expressing my opinions.

Wednesday, January 13, 2010

Algorithms and Data Structures : Course Goals

Richard Lipton has an excellent blog post up right now on "What should a course on complexity theory cover?" where he describes what he is putting in his graduate course and what he expects students to get out of it. It seemed like a great idea, making me think that I should write a similar post for my undergraduate course (Algorithms and Data Structures).

I have a somewhat different take on the problem than Richard, though. He ends up creating a list of things (definitions and theorems) he thinks student should know after his class. I admit I think a bit more in terms of high-level concepts and skills I want students to get out of the course. (I think both approaches can work.) My list includes:

1. Students should understand the basics of algorithm/data structure language and notation; in particular: order notation, what it means (and doesn't mean!), and how to calculate running times of algorithms.
2. Students should learn how to model a problem through proper abstraction. In particular, they should learn how to set problems into the language of graphs, linear programs, etc.
3. Students should learn basic algorithmic techniques: greedy, divide and conquer, dynamic programming, linear programming.
4. Students should learn what is required for a formal and rigorous argument (even if they are not so good at writing one themselves). They should learn how to argue/prove correctness of algorithms and data structures in the contexts above.
5. Students should clearly understand the power of reductions -- not just to show problems are hard, but to show problems are easy, by using the solution of one problem to solve another.
6. Students should obtain some insight into the power of randomness in algorithmic thinking.
7. Students should practice translating algorithmic ideas into working code. It is hoped they will learn the importance of thinking before coding (choosing the right approach), and that issues such as memory requirements and running time can and should be reasoned about before implementation.
8. Students should learn that correctness, like computation time and memory, is just a "resource" that can be traded off with the others, by learning the notion of an approximation algorithm.
9. Students should learn the basic of heuristics and heuristic techniques, so that if asked in practice to solve an NP-hard problem (in the sense of coming up with a very good but not optimal solution), they have some idea of what approaches might give them reasonable answers.
10. Students should see some really cool, unusual, interesting algorithms and data structures, and be forced to work on some really hard, challenging problems, to challenge their thinking and demonstrate the power of theory.

There's probably more I could add, but those seem like ten clear objectives I have for the students who take my class. Perhaps in another post I can elaborate on how I try to see those objective are met.

Like Richard, I should end the post by asking the important open questions:
What have I left out? What should I leave out? What do you think?

Tuesday, January 12, 2010

New Harry Lewis Opinion at HuffPo

Harry Lewis has a new opinion piece up at Huffington Post. The title,
Larry Summers, Robert Rubin : Will The Harvard Shadow Elite Bankrupt The University And The Country?
gives some idea of where he's coming from...

Saturday, January 09, 2010

Visiting Dartmouth for a Research Symposium

I spent the day at Dartmouth, as I was asked to be "keynote speaker" for their Computer Science Annual Research Symposium. Essentially, it's their own mini-conference for a day, with a mix of faculty and graduate student talks as well as posters, both giving graduate students a chance to present and letting everyone get a better idea of what everyone is working on. They're giving best student talk and best poster awards to motivate students.

They've done a very nice job in organizing it - they seem to have corralled most students and faculty into coming, and they have brought in huge quantities of food (a known motivator for student attendance). [Note to self: Dartmouth CS apparently gets its breakfast muffins from "Lou's". Those things are amazingly good.] The presentations are going well and I'm sure it's a useful experience for students. So I'm trying to learn from this how we might do something similar at Harvard.

We used to have something like this at Harvard -- we had an "Industrial Partners" day where we'd try to bring people from labs/companies in and do posters and presentations (including some by graduate students) in a similar fashion. At some point, it fell below critical mass, but we're perennially thinking of how to bring something like it back. You do have to get people to commit to it, and to find time for it -- notice Dartmouth has chosen to hold it on a Saturday, and at a time where the campus is otherwise pretty dead. Maybe that's the approach that would work for us. I've found it's hard to schedule anything on a regular M-F time slot with a group of faculty, even for a hour, never mind a whole day.

Another issue with scheduling a research symposium is that we, already, suffer from talk overload. We have our own colloquium; the various groups (theory, AI, systems) run their own seminars or lunch-talks; there's plenty of other talks around Harvard run by various departments or organizations; and there's a seemingly endless number of talks nearby (at MIT, Microsoft NE, and other places). The problem about having so many talks is you start to lose interest in organizing more talks, and arguably the graduate students benefit at least as much by presenting in their own group seminars. But visiting Dartmouth today has given me some further evidence that the good that can come out of a symposium day like this might make it worthwhile. Perhaps the key is to let the graduate students plan and run the day, so that they own it, and feel responsible for making it a success.

How many of you have a similar research symposium day, how does it work, and what do you think of it?

Tuesday, January 05, 2010

New Paper: Information Asymmetries in Pay-Per-Bid Auctions

I'm happy to announce a new paper that answers, among other questions, how I spent a huge chunk of my winter holiday break. The paper is by me, John Byers, and the graduate student we're co-advising, Giorgos Zervas, and is entitled Information Asymmetries in Pay-Per-Bid Auctions : How Swoopo Makes Bank (arxiv link, pdf). As you might guess from the title, the paper is about pay-per-bid auctions, the current popular example of which is Swoopo, which you might want to examine if only for cultural awareness. (Besides the Swoopo site, you can read a bit about Swoopo from major news sites here and here.) The trick in pay-per-bid auctions is the seller doesn't make the big money on the final item price, but on the bids themselves, which have a cost. The paper was a lot of work, but it was also a lot of fun both to work on and write; hopefully some of you will find it nearly as fun to read. The full version right now is about 40 pages; a shorter version will be submitted to EC 2010 next week. Our abstract provides a good opening description:

Innovative auction methods can be exploited to increase profits, with Shubik's famous "dollar auction" perhaps being the most widely known example. Recently, some mainstream e-commerce web sites have apparently achieved the same end on a much broader scale, by using "pay-per-bid" auctions to sell items, from video games to bars of gold. In these auctions, bidders incur a cost for placing each bid in addition to (or sometimes in lieu of) the winner's final purchase cost. Thus even when a winner's purchase cost is a small fraction of the item's intrinsic value, the auctioneer can still profit handsomely from the bid fees. Our work provides novel analyses for these auctions, based on both modeling and datasets derived from auctions at Swoopo.com, the leading pay-per-bid auction site. While previous modeling work predicts profit-free equilibria, we analyze the impact of information asymmetry broadly, as well as Swoopo features such as bidpacks and the Swoop it Now option specifically, to quantify the effects of imperfect information in these auctions. We find that even small asymmetries across players (cheaper bids, better estimates of other players' intent, different valuations of items, fully committed players willing to play "chicken") can increase the auction duration well beyond that predicted by previous work and thus skew the auctioneer's profit disproportionately. Finally, we discuss our findings in the context of a dataset of thousands of live auctions we observed on Swoopo, which enables us also to examine behavioral factors, such as the power of aggressive bidding. Ultimately, our findings show that even with fully rational players, if players overlook or are unaware any of these factors, the result is outsized profits for pay-per-bid auctioneers.
Here's a bit more description. As a starting point, there have been a few working papers that present essentially the same basic equilibrium analysis of Swoopo auctions. A problem is that the analysis results in zero profit for Swoopo, and it's easy to check that, instead, Swoopo appears to be raking in the bucks. The working papers offer some possible solutions, but they seemed (to us) fairly unconvincing.

Our starting point was looking at the assumptions of the model, which require remarkable symmetry: the players all know how many other players are in the auction, they all have the same value for the item, and they all have the same cost to make a bid. We argue that in practice none of these assumptions are viable, and in fact there are huge asymmetries (particularly asymmetries in information) that throw off the basic model. A fun example is that if some players pay less to bid than others, and the others don't know or realize that, the profit in a pay-per-bid auction grows. So auction sites can (potentially) make money by giving some people a cheaper price for bids! Analyzing these asymmetries, we find huge profit potential for pay-per-bid auctioneers that may explain Swoopo's profitability. We also consider the very interesting Swoop It Now feature, which lets users turn bids in a lost auction into a down payment for the auction item, and show how it can naturally lead to Swoopo auctions turning into games of chicken -- which also seems to benefit Swoopo's bottom line.

Sunday, January 03, 2010

How Much Grading Should Professors Do?

One of the outcomes of the Harvard budget crisis is that the budget for Teaching Assistants has been brought into line. At Harvard, we've been ridiculously spoiled with TAs for many years now, with a TA for roughly every 12-15 students; last year, I was able to get 6 undergrad TAs for 80 students in my Algorithms and Data Structures class. This year, I'll have a more reasonable TA for every 18-20 students or so. Yes, I know that remains quite luxurious compared to many places; I'm not complaining. But it is a change I have to deal with.

One of the big responsibilities for my TAs is grading. I assign a fairly substantial load of homework in my undergrad class, and unfortunately the number of problems where you can just check the answer without reading through the student's work is small. (They often have to prove things.) And even checking answers takes a long time. Given the new TA/student ratio, it seems unfair (and, quite frankly, unworkable, assuming TAs stick to anything close to their supposed working hours) to have the TAs grade the same homeworks in the past. So, it seems, I'll have to change something. The natural options seem to be:

1) Reduce the assignments. Take what would in the past have been the hardest problem on each assignment and make it an ungraded "challenge problem", for instance.
2) Change the scope/style of the assignments. Less proof-type work, more numerical exercises where you can just check the final answer.
3) Introduce probabilistic grading. Only 5 of the 6 problems will be graded -- you just don't know which 5 in advance.
4) Allow people to work in pairs for assignments. Fewer writeups to grade.
5) Grade more myself.

The first four options all have fairly big negatives associated with them. (Actually, I don't have a problem with probabilistic grading, but I have no doubt it would cause bitter complaints from the students, even if I gave a lecture explaining why it was a reasonable approach. There would always be students coming up afterward to complain that they would have gotten a better grade if I just graded all of their problems, and I don't look forward to having to explain the policy to higher-ups. And working in pairs isn't necessarily a negative, but they can already talk about problems together, and they can work in pairs for programming assignments; I think they should practice writing up proofs themselves.)

The main downside to the final option is, of course, to me personally. I do, already, do some of the grading. It's very time-consuming. Even if the time per assignment is small, multiply by the number of students (I expect 60 this year) and we're talking a good number of hours.

So how much grading should a professor do? How much are others of you doing? Or does anyone have other creative suggestions for solutions before the semester starts?