Tuesday, December 30, 2008
Now, while I don't actually believe that positive thinking alone will allow me to prove P = NP (or the other way) in the coming year -- or, for that matter, win me a lottery! -- I do believe that the act of thinking clearly about the goals you want to achieve, and keeping them firmly in your mind, increases the probability that you will actually accomplish these goals. I personally find that when I set myself goals over multiple time scales -- a task list for the day, for the month, and for the year -- I'm surprisingly much better about getting things done. When I get distracted from setting goals, less happens. The conscious effort of writing tasks down and reminding myself of them makes them easier to accomplish.
So I encourage all my readers to take some time around the New Year and set some tangible, if difficult, work-related goals for the coming year. Maybe it's time to learn a new area or work with that person you've always wanted to work with. Or there's that result you know is just out of reach -- but you should keep reaching for it. Post them somewhere, remind yourself of them, and work to make them come true. I'd bet more do than you'd first think.
Whatever your goals are, best of luck with them, and for the New Year.
Tuesday, December 23, 2008
100 additional papers -- a bit over 7% of the original submissions -- were apparently accepted to the miniconference, covering most of that "top 20-30%" range. The main difference appears to be the labelling (INFOCOM miniconference, not INFOCOM) and that the paper will be limited to 5 pages in the proceedings.
While I'm happy the paper got in, I must admit, I don't understand the reasoning behind the INFOCOM Miniconference, and I hope some readers in the know will explain and elaborate. If the purpose is to have a 2-tier conference, it seems an odd structure -- why not just accept 25-30% of the papers? (A few hundred pages in the proceedings wouldn't seem to matter much since it's on a CD?)
I wonder if theory conferences like STOC, FOCS, or SODA should adopt some sort of 2-tier structure in order to accept more papers. Certainly most people who have papers rejected from these conferences (myself included) believe they should have gotten in, and some fraction of them are probably right. On the other hand, such a structure would seem to lessen the prestige associated with these conferences. Any opinions?
Monday, December 22, 2008
The INFOCOM mail said 282 papers were accepted from 1435 submissions (post-withdrawals). A quick check shows that INFOCOM has been below a 20% acceptance rate regularly in recent years, and even assuming a completely unverified estimate that 10-25% of the submissions are things that really shouldn't have been submitted in the first place, in my opinion that's still a pretty low acceptance rate for what's supposed to be the big, open-tent networking conference of the year. (In the 1990s, the acceptance rate was more commonly around 30%.) I'm sure there were a lot of good papers that got rejected this time around.
Most networking conferences have acceptance rates around 20%. Is this a good thing? Conference competitiveness has been blogged about before, but there doesn't seem to be much of a high-level discussion about the issue -- I recently saw Ken Birman and Fred Schneier wrote an article about it for Communications of the ACM. Any ideas out there?
Thursday, December 18, 2008
Graduate school is a long stretch of time -- 5 years (or more) for most people. There are few clear goals during that time, although the obvious one is to learn how to do good research, with the hope of getting a tenure-track faculty job. With this somewhat singular -- and difficult -- goal, it's easy to fall into the extreme of focusing only on your research to the exclusion of most everything else, or to waste a lot of time not really working. Both of these things may be OK for various individuals. (As some will undoubtedly respond, if you're doing great research, it can excuse a lack of many other skills. And many people don't mind spending an extra year at graduate school with a more relaxed lifestyle than life after graduate school.) But with the new year approaching, I thought it worthwhile to suggest some of the additional skills one should try to develop in graduate school over those stretches where you need to break from research -- skills which, unfortunately and understandably, are often given short shrift by the university. (Please add to the list in comments.)
1) Time management: How much are you working each day? (And how much time do you waste reading -- or worse yet, writing -- blogs?) Even if you don't set yourself to a regular 9-5 or 10-6 schedule, it's a good time to learn to manage your working and non-working patterns. My suspicion is that people who manage a regular work schedule graduate on average a semester or year earlier.
2) Writing/speaking: If ideas are our business, idea presentation is a big contributor to the bottom line. And if you want a faculty position, the ability to give a good talk to a general audience goes a long way. If your institution doesn't have a program for improving writing and speaking, start your own (like a student seminar series, no faculty invited).
3) Leadership: Find a way to lead a research project -- maybe advising/mentoring some undergraduates. Or organize a club or student group to make your department a better place to be. Eventually, the ability to organize people to follow your goals will make you more productive.
4) Entrepreneurship: Have you looked at the economy? And professor's salaries? Graduate school is where you're supposed to learn to be creative, and to develop specialized skills. It's quite reasonable to spend some of those creative efforts or utilize those specialized skills on money-making endeavors. While it's not for everyone, for some the tangible reward of money helps unleash creativity; for others, you may learn the satisfying lesson that your intellectual achievements bring you higher rewards than a big paycheck could (a lesson worth learning early on).
5) The skill to learn additional skills: If you're a theorist, learn to program a little. If you're a systems person, learn some probability or other theory. Maybe set aside a few days to learn time-saving Latex tricks, or some other piece of useful software. There are plenty of skills that will make you a better researcher/teacher/writer in the future.
Monday, December 15, 2008
1) Face-to-face PC meetings involve far too much sitting. Especially if you fly in and out on the same day. (I know there's a time tradeoff in scheduling an "exercise break", but seriously...)
2) Conferences with 20% acceptance rates are, by their nature, a bit depressing -- it's hard to reject so many papers, some of which simply MUST be pretty good.
3) It's easier to argue how a paper is flawed than to argue about how it's making an important contribution.
4) Taking reviewer expertise into account is important, and one outlier can cause problems; sometimes papers live on longer than they should if one reviewer gives too high a score, and sometimes papers are put way lower in the ordered list than they should be because one reviewer gives too low a score. (Of course, one expert reviewer can also bring an otherwise ignored paper back to life.)
5) There are more interesting papers than paper slots.
6) There's generally plenty of down time, when papers you didn't read are being discussed; bring something to work on quietly (but pay attention to what's going on).
7) You really do learn a lot reading 20-30 papers for a conference PC.
8) As you go down the paper list by score rank, eventually (and sooner than you think) you hit a paper that you start to question -- are these scores too high? And as you go up the list from the bottom, you'll hit a paper where you question -- are these scores too low? The rules of randomness tell us that some papers will get comparatively mis-scored in the first round of reviews, so it is good to stop and talk about the papers (and not just take the first X).
9) Hot trends come and go.
10) A steady supply of drinks (mostly caffeinated) and a good lunch can help the PC move happily along.
Thanks to the PC chairs and other members -- I had a good time. But now I'm very tired...
Sunday, December 14, 2008
First, it's very "civilized" -- I had only 20 papers to read the first round, and then had 5 more for the second round. (The first round was designed to get each paper 3 reviews; the second round was for paper missing reviews, or where the scores suggested another review would be helpful.) That's not too much, compared to most theory conferences.
Second, there are a good number of "algorithmically" oriented papers for me to read. Overall, networking has become a lot more theoretical, which is good. It still seems to me, though, for a network-oriented conference, you have to be careful not to go overboard with the theory. They want results -- backed by theory, preferably -- but at the end of the day, it's the results that matter. As usual when serving on a PC, seeing how it works gives insight into how to frame my own papers.
Third, one thing that's impressed me is how long and detailed the reviews are for this conference. I tend to write shorter reviews, covering what I think the high order points are. (And one thing that has been interesting -- there's generally a lot of agreement on these high order points.) But most reviewers go into a lot more detail -- the average review is at least a good page plus of text. Very different than what I usually find in theory conferences -- although I know there's a push to improve that.
I'm not sure why the culture of networking conferences has led to more detailed reviews. Fewer papers per PC member probably helps; maybe because few papers go on to journal papers (but that's equally true in theory, I think). But it's a marked and interesting change.
Anyhow, now I'm looking forward to SIGCOMM... except that I'll need to be ready to write longer reviews.
Friday, December 12, 2008
I was wondering if there would also be an effect on summer internship programs. Summer interns are often a different budget line item, but it's hard to believe that the Microsoft/Google/Yahoo/everywhere else programs, for both undergraduates and graduates, won't be curtailed in this environment.
I haven't heard anything about summer internships yet, and though it's a bit early, late December/early January is usually when I start get reminders from people to have good students apply for the summer. Can anyone comment (anonymously if needed) if they have any actual information?
Wednesday, December 03, 2008
One of the scams he gave for pay per conversion schemes is related to this blog. I often encourage people to buy my book on randomized algorithms -- to remind you all, the Amazon link is here. Now, when I put on that Amazon link, I've embedded a link that includes my Amazon Associates information, so if you buy the book, I get a small cut. Very small.** In fact, if you buy anything else after clicking that link, I get a cut -- I actually think this link alone puts a cookie on your system, so if you happen to buy on anything on Amazon for the next week or so, I get a cut. I think I got about $100 in store credit from Amazon last year, from people buying the book off my home page or buying things after hitting an Amazon link on the blog.
So again, I'll encourage you to click that link. (Heck, really, while you're there, just buy the book!)
For me naturally the gain of doing this is small, but apparently, scammers have found a way to make big money off this sort of thing. They have sneaky ways of getting these cookies onto your system -- not just for Amazon, but for other vendors also, multiple vendors at a time -- so if you click one of their links, and then buy things on the Web, they get a cut for "recommending" the purchase to you, even though they've really done nothing of the sort. When you think about it, a 5% cut from the purchases of a million Amazon customers can really add up...
If Ben Edelman happens to be coming to present at a venue near you, I highly recommend you go. And that's not false advertising.
**Actually, not so small. I get almost as much from Amazon as I do from the publisher if you buy the book through the link. Of course, I don't get much from the publisher either.
Wednesday, November 26, 2008
I try not to continuously send in NSF proposals, but it's seemed like that's the way it's been going recently. I'd prefer to metaphorically let the tank get close to empty as far as funds go, then apply and refill the tank, so I could spend less effort on the grant process. But there's ever-growing competition, and smaller amounts available. And given the long lag between calls, and the high risk factor each year, I feel the need to be more proactive than even a few years back. Especially with the economy tanking, so I'm not clear if corporate money will be available in the coming year.
Anyhow, since a proposal did go in yesterday, I can relax over the holiday weekend. I hope all of you will be as well.
Update -- someone at the site noticed and took down the original page (although it's still in the Google cache here.)
Monday, November 24, 2008
I am writing to tell you about a new Center for Computational Intractability that is funded by the National Science Foundation and is a collaborative effort between Princeton University, Institute for Advanced Study, Rutgers University, and New York University. The PIs are Allender, Arora, Barak, Charikar, Chazelle, Impagliazzo, Khot, Naor, Saks, Szegedy,Wigderson, and Tarjan.
The Center's mission views "intractability" fairly broadly, and covers both algorithms and complexity theory as well as other topics in theoretical CS. I would like to draw your attention specifically to the following: (a) Visiting possibilities for short-to-medium term (up to 1 year). We can pay travel costs and salary. (b) Several workshops a year, with possibilities to visit the center at the same time. (We can pay expenses.) (c) Postdoctoral positions both at the center and at the affiliated institutions. (d) Streaming videos of all center activities and seminars and other web-based resources.
I invite you to visit our website http://intractability.
I was wondering if you could forward this email to others in your department (including students and postdocs) who may be interested in the center's activities. Please note that the deadline for postdoc applications is December 15, and the deadline for postdoc applications to IAS is December 1.
With best regards,
Tuesday, November 18, 2008
Thanks to everyone who withdrew the papers they weren't submitting. Otherwise, I had to go through and withdraw them manually myself.
Thanks to Shai Halevi for continuing to help me with the reviewing system.
Yes, I did get mail from about 10 people who hadn't known they'd need to file an abstract the week before, and I accommodated them. I think this should become a standard and everyone should get used to it. Again, keep in mind there's 40-50 papers per PC member; anything that makes their work easier is a good thing. (And Shai's interface for choosing preferences lets you see a paper's abstract pop up with a mouse rollover, so it's really nice to have abstracts early!)
Assuming things continue to go well, expect e-mail from PC members asking for you to handle a subreview before you head off for Thanksgiving....
Saturday, November 15, 2008
Conceivably, we could set up the reviews to have a score for each factor. For example, I'm on the PC for NSDI, a systems conference, and we have to give scores for Overall Merit, Technical Merit, Longevity (= how important will this work be over time), Novelty, and Writing (as if that score matters :) ). Personally, I don't like this, and I'm not intending to do it for STOC. It's more pain for me as a reviewer without I think giving meaningful information to the authors (instead of spending time trying to decide if a paper is a 2 or 3 in terms of novelty, let's give another comment in the review text!), and when it comes time to make the decisions, I'm not really sure what I'm supposed to be (Pareto-)optimizing in this multidimensional space.
I'm a big believer, for conferences, in the "simple" method, as I've said before -- papers just get a score from 1-5, under the following scheme:
- 1: Bottom 1/2 of submissions.
- 2: Top 1/2 but not top 1/3 of submissions.
- 3: Top 1/3 but not top 1/5 of submissions.
- 4: Top 1/5 but not top 1/10 of submissions.
- 5: Top 1/10 of submissions.
Monday, November 10, 2008
Little was given in way of specifics, but the general theme was clear. About 1/2 of Harvard's budget comes from the endowment payout each year. (The curse of a large endowment -- this number is probably higher than most institutions.) While nobody is giving a number for Harvard, the widely quoted statement from Moody's financial research is that endowments will lose about 30% this year. Given that Harvard has been outperforming the market and most other endowments over an extended time period, you can take your guess as to whether our losses will be above or below average. No matter how you do the math, it's not good.
So that means there will be some belt-tightening, and some delays in various plans. Again, very little in the way of specifics, but I'm sure (and Faust's letter suggested) that Harvard's new progressive financial aid program would not be touched. I imagine most everyone is getting similar messages at other institutions, but feel free to share your stories in the comments.
I've seen over 150 submissions so far, and it's not even close to midnight...
Friday, November 07, 2008
Tuesday, November 04, 2008
David Alderson also gave a very interesting talk on modeling the Internet topology using optimization methods, and using it to study the scale of the damage an adversary could do to the Internet. His point of view is very different (and I think much more compelling in the Internet setting) than the scale-free analysis popularized by for example Barabasi. I recommend checking out his papers (most of them co-authored with Doyle and Willinger, among others) on the theme.
Finally, I ended the day nicely by getting to talk with UCLA faculty Rafail Ostrovsky and Amit Sahai. Rafail expressed his usual enthusiasm insisting that we find a problem to all start working on -- so now I even have homework to keep me busy on the plane ride home!
The highlight for me for Monday was listening to Tim Roughgarden (who always gives excellent, crisp, clear talks) on comparing FIFO vs. FairShare queueing policies using the worst-case price of anarchy over all possible utility functions (and, unfortunately, only quadratic cost functions). This is work in progress, so I don't think Tim will have slides up, but it's a very nice analysis. [Spoiler: FairShare wins!]
David Clark gave a public lectre in the afternoon on the Internet -- what's wrong with it, and his ideas for fixing it. It got a big audience, and he was certainly interesting, amusing, and provocative. I've got to admit I have a lot of skepticism so far about how we get to a better Internet, one with security and designed with the economics of the creature in mind from the start instead of developing by accident. Which reminds me, what's the state of the NSF GENI project? I've been hearing rumors that it's headed to an early rest, but I'd be curious if anyone more in the know can comment anonymously.
Monday, November 03, 2008
In explaining why you are presenting simulation results, you say, "First we wish to check our theoretical analysis..." I don't understand this motivation. Your theoretical analysis is substantiated by mathematical proofs. What more evidence do you need of its validity?Please keep this statement in mind.
I've stated frequently that theorists should actually implement their algorithms. I have some further recent anecdotal evidence to back up this claim.
I have students working on a variety of projects, and recently, I've had a strange convergence: in several of the projects, the students have found what appear to be non-trivial errors in recent theory papers (2 in CS theory, 1 in EE theory). It's not clear that any of the results actually break -- indeed, I don't think any of them do. I don't want to exaggerate. In one of them, a constant factor seems to be messed up -- not disastrous, but it is an important constant factor in context. And perhaps in one of the papers we could chalk it up to really just a sequence of confusing typos rather than an actual error.
Now I'm not naive enough to expect conference papers (or even journal papers) without errors. But the key here is that these errors were either easily found and/or proven to me by the students by having them implement the algorithms described in the paper. Then they sort of jumped out.
Implementing your own algorithm is a good way of checking your work. If you aren't implementing your algorithm, arguably you're skipping a key step in checking your results. Why should a reviewer go to the trouble of checking your work carefully if you're not?
Moreover, imagine not some student but an actual real systems-builder who decides your algorithm is worth implementing for their system. Imagine how they feel when they find things don't quite work as you've stated. That doesn't exactly inspire confidence, and is the sort of thing that discourages someone from using your algorithm. More globally, it gives systems people the feeling that theory (and theorists) aren't important. Why should they be debugging your proofs? You should give them evidence your algorithm can be implemented and works as claimed by actually implementing it if possible or reasonable to do so.
I'm not stating that you personally have to implement your algorithm. Hire a student to do it! It's good experience for them. You can apply for an REU, or other student research funding.
So, to answer the anonymous reviewer, I think there's always plenty of good reason to check our mathematical proofs by experiment and impelementation, and it's suitable to put those results in the journal paper as an aid to the reader (and reviewer!).
Tuesday, October 28, 2008
Guest post by Harry Lewis:
Harry Lewis publicly backed Darcy Burner's statement, and in fact is now featured in an ad explaining Darcy's academic qualifications. You can read the Crimson article here, and see Harry set the record straight here at Youtube, or here, and probably many other places. Thank you, Harry! No matter what party you favor, certainly everyone is entitled to the facts without the spin, and that is what Harry has provided.
Harry is naturally polite and reserved in his statements, but I'll be more clearly biased: any district should be thrilled to have a Congressperson with a computer science degree, and especially one with a degree from Harvard!
Friday, October 24, 2008
The piece was just so over-the-top negative, and blatantly factually wrong (it's hard to find a stated fact in the text that is actually correct), that it makes this season's political ads look good by comparison. So I took it upon myself to respond.
I suppose only my Harvard readers might care about this, but here's the editorial, and the text of my response is below. We'll see next week if the Crimson publishes it.
UPDATE: The Crimson did print my letter here. (It changed a few things, making it shorter and a bit more generic, but the spirit is there.) Also, since it seems from the comments that many people are simply uninformed, let me point to the Student Guide for the Ad Board; I'd encourage people with questions (or complaints) to read that first, as it gives a lot of detail about how the Ad Board works.
Bad Board, No; Bad Editorial, Yes
To the Crimson editorial staff:
Having finished a year-and-a-half of service as a faculty member of the Ad Board last June, I was shocked by the opinion “Bad Board” that appeared in the Crimson on October 22. First, it was simply riddled with factual errors. For example, contrary to your statement that resident deans are “outranked” by the faculty, in fact there are only two or three faculty members on the Ad Board at any time, and over a dozen resident deans; we all get equal votes. Even if we didn’t take the resident deans seriously as you suggest (and we do), they could simply outvote us. As another example, contrary to your statement, in disciplinary cases students are always allowed to present their side of the story, both in written form and by attending an Ad Board meeting, where students can make a statement and, if they choose, respond to questions. I could go on, but there are so many additional factual errors that it would take a letter much longer than the original editorial to go through them all.
Second, your editorial fundamentally misunderstands the Ad Board’s setup and purpose. You complain that students cannot hire an attorney for an Ad Board hearing. That is because the Ad Board is not a legal institution, like a court or the police, but an academic institution, to administrate the rules of the college. The Ad Board’s purpose is fundamentally education, not punishment. As you quote, the Ad Board is “primarily concerned for the educational and personal growth of undergraduates, both as individuals and as members of the Harvard community.” Sometimes, when a rule is broken, a punishment must be given, but we view that as a learning process for the student. Attorneys should not be a part of that learning process, much in the same way you can’t hire a lawyer to complain about or negotiate for a better grade in my class. (Please don’t try.)
Finally, your article includes what I would call errors not of fact but of spirit. You say “rulings are clear before it [the Ad Board] convenes”. That’s a surprise to me, as I regularly spent multiple hours in these meetings each week listening to and deliberating cases. Punishments are not, as you say, “one-size-fits-all”; we discuss the appropriate response for each case, based on the rules of the Faculty and the needs of the individual student. We take seriously both these rules and these needs, with the goal of best serving both the students that come before us and the larger Harvard community.
I understand that, as you say, going before the Ad Board is intimidating and terrifying for a student. They are generally there because there is an accusation that they have broken a rule of the College, and there may be consequences. I know of no system we could possibly set up where that wouldn’t be intimidating and terrifying. But students should know going in that the Ad Board will listen to them, fairly, and that no punishment is given lightly.
Michael Mitzenmacher '91
Professor of Computer Science
Harvard School of Engineering and Applied Sciences
Thursday, October 23, 2008
In the theory community, we generally don't do this. PC members can't submit papers, apparently to avoid any conflict of interest. This is, from what I've seen, unusual, even for CS, which is already unusual in the competitiveness and importance of conferences. Most networking conferences, for instance, allow PC members to submit.
Does conflict of interest really exist in these situations? I don't think so; generally, from what I've seen, it gets handled, and handled appropriately. People on the PC realize it's a competitive process, and understand when their papers aren't accepted. Often when PC members can contribute papers the process is double-blind, so nobody "knows" the PC-authored papers. While I understand allowing PC-authored papers is a risk, I don't think it's much of one. As a comparison point, theory conferences almost never use double-blind reviewing, and I think just the name of a "prestige author" on the paper has a significant effect on the odds of acceptance.
What's the upside? The biggest is that it makes it easier to have larger PCs. I have 20 papers to read for NSDI. I can't remember the last time I had less than 40 papers to review as a PC member for a significant theory conference.
Tuesday, October 21, 2008
Undoubtedly the people happiest with the settlement are the jury. They were going to have to sit through several weeks of trial, and while I'm sure they were all eager and thrilled to fulfill their civic duty, I'm sure they all have their own to-do lists to deal with.
Friday, October 17, 2008
I'm on the NSDI program committee, and we recently got our review assignments. There's an unfortunate overlap between this reviewing cycle and the STOC deadline/paper assignments but I'm hoping that won't be too problematic. (I got asked to be on NSDI before being asked to do STOC. And Jennifer Rexford asked, and as I'm a fan of Jennifer Rexford's work, it pushed me to say yes.) In other PC news, I was asked to be on the SIGCOMM 2009 committee, and again, just couldn't find a way to say no. I'll be finding it easier to turn down PCs the rest of the year...
A case I'm serving as an expert witness on is now in trial. That will take up significant time over the next few weeks. On the positive side, the trial is here in Boston, so I'm not traveling somewhere for it.
I'm scheduled to be at an IPAM workshop in a few weeks, and still have to prepare the talk.
Those NSF deadlines are looming large.
A few papers are in various stages in the journal process (either getting ready for submission, or revising based on reviewer comments). I'd like to get those off the stack.
Overall, it's all good and fun. Well, maybe not the NSF proposal(s), those are always hard to describe as fun. But it is busy. So I think blog posting will be more sporadic than usual.
Guest posters are always welcome....
Monday, October 13, 2008
While papers are due November 17, you must submit an abstract by November 10. Please pass the word.
For those of you with an interest in such things, we're using Shai Halevi's conference software. I was going to go with easychair, but the one major complaint (that I'm very sympathetic to) was that they didn't allow as a default the use of scorecards for offline reviewing. (They said they could do something for us, but we'd have to pay for it.) Shai also very generously offered to help me out with everything, so we all owe him thanks. (But especially me.)
If you notice anything not working right with the site, please let me know right away.
Saturday, October 11, 2008
Wednesday, October 08, 2008
Find time to find a girlfriend.Now, I'm not claiming to be most sensitive Neanderthal in the cave, but I am the father of three daughters, and this raised my hackles. The small number of women in the audience didn't seem to notice or mind, but maybe they did, or maybe one would the next time he gave the talk. As a field, this is exactly the type of throwaway comment that we have to, repeat, have to avoid.
I mentioned it to the speaker afterward, and to his credit, he was not defensive, and said he'd remove it or change "girlfriend" to the gender-neutral "significant other". (I'd vote for removal. Why even go with the stereotype computer scientists can't find love?)
Friday, October 03, 2008
Thursday, October 02, 2008
Still, the last reviews for a grant we didn't get contained just the stupidest comment, I really have to share it, because it just frightens me. I'm used to reviews I don't agree with -- the typical excuse not to fund a theory grant being, "The proposal doesn't explain how it will solve the proposed problems," while if I already knew how to solve the proposed problem, I'd have written the paper already -- but again, this goes beyond that. If this was just an excuse to not fund the proposal -- because the NSF for some reason never says "We only have money for the top 10% this year, and I'm afraid there are some better proposals," but instead has to have a reason -- that's fine, but I hope it's not a real reason.
This was a CDI proposal (so apologies to my co-authors, who do not necessarily share my opinions). The primary theme was mechanism design, but we focused on network formation and ad auctions as examples. One reviewer wrote:
[ad placement] is a very active research area for corporate research labs at places such as Yahoo and Google. Given the vast resources that are being invested at these corporate labs (that have hired top economists and computer scientists) and that have direct access to logs documenting advertiser and user behavior, it is unclear how much of a contribution an academic team can make here.One review might be forgivable. The panel summary listed the following as a weakness:
- There were also questions regarding how this willLet's ignore that the PIs all have relationships with industry, that ad auctions was just an example (of pretty wide interest right now), and that (I had thought) working on problems of interest to industry is, generally, a good thing.
compete with much larger-scale multidisciplinary
efforts (CS-economics) of similar kind in
industry (Google, Yahoo!, MS, etc.).
With this kind of thinking, there's no way a couple of graduate students from Stanford (or, more academically, a future well-known Cornell professor) should have been working on a silly thing like "search engine algorithms", since Altavista was already out there leading the way. (That's my #1 big example, fill in your own.)
Is "industry will do it better than you could" really a good reason not to pursue (or fund) a research topic? How many research areas would that really preclude? I'd assume we should also stop funding research in operating systems, compilers, and even computer security based on that comment, but oddly, I don't see a rush to cancel programs in those areas. Seriously, anonymous reviewer, if you actually meant that, congratulations, you've truly scared me about the future of NSF-sponsored research.
As an addendum, some comments from Harvard colleagues:
1. Where does the reviewer think the people who are going to go work for Google/Yahoo/Microsoft will be coming from?
2. This was the kind of thinking that led Harvard (a CS powerhouse post-WW II) to decide to drop computer science decades ago. IBM was doing it, no need to have a department doing research in the area. It took a while to recover from that decision....
3. That kind of comment is common for chip/architecture research. "Isn't Intel doing that already? How can you possibly contribute?" [I have increased empathy for architecture people.]
Wednesday, October 01, 2008
On the STOC 2009 side, some announcements:
1) Submission date is Nov 17, BUT abstracts will be due Nov 10 -- following the same sort of procedure used for SODA. Spread the word, more info on the Web site shortly.
2) Accept/reject decisions should be out February 6, in response to those who have asked me. (It seems like every other conference in the rough vicinity will have paper deadlines the following week!)
3) The day before the conference is shaping up to be an event, about which I'll hopefully have more news for shortly.
4) It's looking like EasyChair will be the submission/review system. The risks of self-hosting just seems too severe at this point...
Tuesday, September 30, 2008
Part of that might be that we have a new instructor, who is apparently young, dynamic, bringing a more modern perspective, and all those things that inspire students to take introductory courses. Another explanation is that Harvard last year introduced minors (called secondary fields), so now by taking 4 classes you can get a minor in computer science. A more timely explanation is that the financial meltdown is encouraging students to look at computer science and applied math this year (instead of the more commonly popular major of economics).
The gain is starting to translate into other courses. The intro theory course (automata/complexity) is up over 50% from last year.
Needless to say we're excited by this change, although still processing what it means and how to deal with it longer term. Obviously, we're hoping this will help us make the case for more hires...
Friday, September 26, 2008
[The discussion below is, admittedly, probably more for those at least familiar with our previous work...]
Moving an item after another item is deleted is difficult -- generally, you'd want to move some other item deeper in your Multi-Level Hash table into the now empty space higher up in your hash table, but how can you find what item to move? Actually, we'd had ideas on how to do this back when we were working on the insertion paper. But we didn't include it then, for several reasons. First, our ideas required additional space (as explained below). Second, we couldn't analyze them, and one of the points we were aiming for in that paper was to analyze schemes. Third, we didn't have space -- not in the hardware, but in the actual paper. Blame page limits (yes, even for the journal version).
Another recent INFOCOM paper that offered an idea of how to handle moves on deletions inspired me to have us write down our approach -- because I feel our approach is clearly better than that proposed by this other INFOCOM paper, as we discuss in our paper. Even if we can't prove what we'd like, we can still show it works well through simulation.
Our (obvious) idea is just to keep a hint at each hash table cell, if there was a collision there, of where the collided item was placed, so you can find it later on a delete. Hints take space, but not too much. If multiple collisions occur, you have to choose which possible hint to keep; we found most recent collision worked best. Moves on deletes work reasonably well, but the real gains naturally occur from combining moves on insertions and deletes. This gives pretty good performance -- you can get space utilizations more in line with (though of course still not as good as) cuckoo hashing, albeit with more hash functions -- but again, using just 1 move per insert/delete. More details in the paper.
Thursday, September 25, 2008
Part of it is I come most every year. So by now, I know my way around, which is a nice aspect of a conference being held at the same place every year. Got the early flight in, rented my car in about five minutes at the tiny but functional Willard airport in Urbana-Champaign, and had my badge about a half hour later. After a talk and catching up with some people I went to lunch at the Brown Bag, the crowded spot for people who don't like the conference lunch. Back to listen to more talks, and give my talks. After talks, more catching up with people, while they set up the open bar and snacks. (JeffE, forget beer at the business meetings, take a day -- or just an afternoon -- off and join us!) Then dinner with a very old friend (Steve Lumetta) and his family.
Saw interesting talks by Ramesh Johari, Balaji Prabhakar, Jean Walrand, Adam Wierman, and others. I talk with people about hashing, Bloom filters, deletion channels, sticky channels, floating codes, all manner of things. It's fun being at a conference where the usually disparate seeming parts of my research don't seem quite so disparate.
Sadly, a quick trip this year. I fly back early tomorrow.
Wednesday, September 24, 2008
Optical disks were a potential application. Suppose you have a k-bit value that is going to be written t times. How many write-once bits (or "wits") will you need to store it through the t changes?
Fast-forward a few decades, and similar models are coming back into vogue, because of flash memory. My understanding is flash memory uses floating cells; roughly, cells are organized into larger blocks of n cells, and each cell can hold a number of electrons from 0 to q-1. It's easy to add electrons to cells, but not to remove them; to do that, you have to reset the whole block back to 0's first, which is both expensive and wears out the memory, which has a limited lifetime of resets. Suppose you want to store k ell-ary variable values in a block. How many rewrites before you need to reset? Note that a "code" in this setting consists of two things: a mapping from cell states (the state of the entire block) to variable states, and a transition function giving how cell states change when variables change. For more of an introduction, see this paper by Jiang, Bohossian, and Bruck.
In our paper (pdf,talk slides) for Allerton, we introduce (as far as I know) the question of considering floating codes in the "average case"-- the previous work was considering the worst-case number of rewrites before a reset. Given the lifetimes available to flash memory, and the potential to model system behavior before productions, we think analysis of average case performance makes sense here. For our notion of "average case", we assume that the variable values follow a Markov chain, and the goal is to maximize the long-term average time between resets. Given a code, the Markov chain on the variable states induces a Markov chain on the cell states. So the question becomes how to design a code to maximize time between resets on this Markov chain.
Following in the footsteps of the JBB paper, we find some initial interesting results, and leave a lot of open questions -- including the complexity of computing optimal codes for general versions of the problem. I think is yet another potentially interesting coding area where CS ideas should be able to play a strong role. And while I'm not (yet) an expert on the practical implications of these theoretical coding models, the connection to flash memory seems promising.
Monday, September 22, 2008
There was a panel discussion on interdisciplinary research (a theme of the new lab), and a natural question was how such research should be encouraged. Nobody on the panel seemed to make what I thought was an obvious point: one way to encourage such research is to make sure people across the sciences are well trained in mathematics and (theoretical) computer science. Interdisciplinary research depends on finding a common language, which historically has been mathematics, but these days more and more involves algorithms, complexity, and programming.
Luckily, to make the point for me, Erik Demaine next gave a talk entitled "(Theoretical) Computer Science is Everywhere", whipping through examples in arts, business, society, games, other sciences, and so on. It was a really inspiring talk, one that should be given to incoming freshman in the field, or better yet, to high school students who don't know what computer science is about. The talk was taped and hopefully I'll have a link to it soon. Afterwards, as several of us were talking about it, there were some minor issues raised. I myself brought up my common concern that perhaps more theorists should take a more direct role in promoting the practice of CS theory, including in other fields. My colleague Greg Morrisett questioned whether Erik couldn't have replaced "Theoretical Computer Science" with "Applied Math" and given pretty much the same talk. I think it's an interesting point, although I do think TCS's emphasis on complexity and algorithmic thinking does give it a distinct perspective.
Another talk I enjoyed (more than I thought I would from the title) was "Understanding Socio-Technical Phenomena in a Web2.0 Era" by danah boyd, who will be starting at MSR NE this January. She is an ethnographer who studies how adolescents interface with technology. So, to summarize, the talk was about why kids (at least in the US) spend all their time on MySpace and Facebook, and what it all means. It was very well presented (similar in style to Lawrence Lessig, who gives fantastic talks), and perhaps my interest stems from now having three kids; getting a perspective on how adolescents use online spaces to interact and communicate (as well as rebel and establish their own identitites) was quite enlightening.
So while they've been around for most of the summer, it's nice to have an official welcome for the new neighbors!
UPDATE: Link for talks.
Saturday, September 20, 2008
I wasn't sure if I was going to go to Allerton this year. While I enjoy the conference, I have had growing concerns that because many talks/papers are invited and their proceedings aren't widely distributed that the papers I present there aren't as noticed as they would be otherwise. They've alleviated that concern somewhat by having the papers now available on the IEEE online system. At least I know if anyone tries to find the paper, the official copy will be available somewhere (besides my home page). Hopefully the old Allerton proceedings will go online at some point too.
The other motivation for going to Allerton this year was that it forced me to get some things done. Between the faculty search last spring and the offspring this summer, it's been a bit difficult to complete some of my research tasks. There's nothing like promising to have a paper in on a certain date - a non-artificial deadline -- to get things to some sort of stable state.
I'll hopefully have posts up later this week about the new papers.
Wednesday, September 17, 2008
I'd like to offer a contrasting opinion, one I've stated before, but may be new to some. (And since the criticisms above are repeated cyclically, I don't see why I can't repeat my response.)
Having worked some in the area of heuristic algorithms, I've gained a healthy respect for the fundamental approaches of repeated refinement, parallelization of efforts to explore a search space, and the occasional bit of random wandering. Greedy-style heuristics don't get to good answers by big jumps, but a multitude of small steps. Parallelization, leveraging the power of crowds, greatly speeds up such heuristics, and frequent communication between the agents working in parallel helps move everything along. And most good heuristic algorithms require a bit of a random (or explicit!) push to keep moving through the space of possibilities, to find new parts of the search space to yield better solutions than the already plumbed over areas.
The CS conference publication model shares these features. Yes, there are many more small steps than big jumps. Yes, there are times where less fruitful and interesting directions are explored. But the community as a whole moves rapidly, churning through ideas and, I think, rapidly focusing on promising ones. Also, new ideas and directions arise with, I think, remarkable frequency. Looking at any single conference, you might think over 90% of it is junk -- or, at least, no so important in the grand scheme of things. And you'd probably be right.** But that doesn't a priori mean the conference publishing system is broken, and any argument that it is based on such reasoning doesn't even start to convince me.
This doesn't mean that we are excused from having taste and opinions as to what constitutes good work. Indeed, without this, we'd lack an evaluation function to lead us in the right directions. And I'm not saying the the contrasting "mathematics" style of publishing only fuller, complete papers is necessarily worse -- I don't think I've seen evidence one way or another. (I might admit to having a personal preference.) But if you want to argue the point, you need to do more than just look at individual papers in individual conferences, and focus on the whole.
[** A favorite quip from grad school that I still remember was when a systems friend told me "95% of theory papers are crap!", and I simply responded, "So that means we're 4% better than systems." Good times.]
Thursday, September 11, 2008
I admit, I'm no longer just automatically throwing it away. The entire magazine seems more interesting since the editorial refreshening. But deserving nods to the work of theory community will certainly help hold my attention.
Tuesday, September 09, 2008
For both my own curiosity, and probably Claire Mathieu's, does anyone have any ideas or suggestions on how to schedule parallel sessions? It seems like a nasty problem, particularly since you have limited information on what conflicts are, other than very general categories for papers.
Wednesday, September 03, 2008
Surprisingly, it seems to give a number of people significant grief every year. I know that for C and Java it's a pain -- the natural thing to do is to download and install a BigNumber library, which should be easy, but often some trouble arises. (And most students, even though they've had the intro programming courses, do not seem to have learned how to do useful things like download and run useful libraries.) There are other programming languages which handle big numbers essentially transparently -- ah, the days of programming in Lisp -- which are fine for the numerical exercise, but may not be as nice for the larger programming assignment.
My only real point here is that, in those lists of skills I would expect CS majors to have, I'd have to put "being able to do computations with very large numbers" somewhere on the list. Yes, you're unlikely to need it for your standard checkbook program, but it seems to me this issue arises in plenty of places. (It has for me -- and my graduate students -- several times.) And it's not so important that they know how to do this specific task off the top of their head, but more that when they're given a task like this, they are able to find the information they need and complete it in a timely fashion. For better or worse, that's something some of them get out of my class.
Friday, August 29, 2008
There's still time for feedback for the final version; please mail me if you have any constructive suggestions.
Monday, August 25, 2008
I visited Google and gave the CAM talk there also, where it also seemed to find a receptive crowd. (Some challenging questions arose as to whether cuckoo hashing is the right approach for hashing in software, as opposed to hardware.) Visiting Google is now like visiting a college campus, or maybe a small city. I was greeted at the parking lot by someone who directed me to valet parking. (The valet didn't seem to be there, though, so I parked myself.) I passed the volleyball courts, cafes and other places to eat, and that dinosaur on the way in; saw the gym, laundry room, doctor's office, massage area, and many many coffee-soft drink-snack bar areas; and ate at the amazing cafeteria. (The sushi line was too long, so I had to skip it. However, they did have an entire freezer full of It's It ice cream sandwiches, packaged specially with the Google logo. It's It alone is worth coming to they Bay Area for.)
I find myself without envy for the Google campus and its well-publicized perks. My limited impression was that it's too crowded and busy for me; it seems like it would be a hard place for me to concentrate get work done. I'd surely balloon up in size surrounded by open larders of food, even with the gym. I suppose I'm now just too old to enjoy the place properly, though I imagine it's fantastic for recent graduates!
The last stop on my tour was Yahoo! Research. It's always great to catch up with my "mentor" Andrei Broder, who these days always seems to be running at 110% or more. Their research group (like Google's) seems focused on Web-related algorithmics, machine learning, and this new subfield of computational advertising (I believe Andrei coined the term, in any case I like it). I talked with people about some compression-related problems, and perhaps something further will come of that.
As usual, I find myself wishing these trips could last longer. There's always too much to do and too many people to see on these visits, although that's what makes the trip interesting and fun.
Saturday, August 23, 2008
The first stop on my trip was actually a visit to Microsoft Silicon Valley Research Lab (MSRSV). I still haven't figured out how to get research money from Microsoft, but MSRSV "started" when a lot of my colleagues at what had been DEC/Compaq/HP Systems Research Center moved en masse to Microsoft, so I have historical ties and recent collaborations with people there as well. Since my visit last year, MSRSV has moved into a very nice new building. Lots of open spaces and whiteboards everywhere. It seems wonderfully set up for group collaborations. (One very nice space for group-work, though, is a bit too close to a loud and frequently used coffee machine for my taste...) Besides catching up with everybody, Udi Wieder and others indulged me by talking about some of the many variations of trace reconstruction problems that are still open. Hopefully we'll get somewhere on some of them.
Cisco is a huge sprawling collection of buildings, and the visit there itself similarly felt chaotic. They asked me to give two talks, which caused me a bit of stress the week before the trip as I reworked some slides. I ended up talking about my work with Salil Vadhan on Why Simple Hash Functions Work (talk, paper, blog post), and gave a mini-survey covering my work with Adam Kirsch (and part with Udi Wieder) on how to use CAMs to improve cuckoo hashing (talk, various papers on my papers page). [Actually, I have a new survey article covering this stuff I'll put up shortly.] Cisco still seems very, very generally interested in hashing, and applications of hashing in network measurement and monitoring in particular. I had about 40-50 people show up for the first talk, and the second mini-survey talk was broadcast and recorded for Cisco -- about 50 people showed, and apparently more than that were also listening remotely. (Just like when I teach, and my class is taped...) They have a pretty elaborate setup for these recorded technical talks, with a room set up for guests like Steve Wozniak (who was there a couple of weeks ago) rather than me. Besides giving talks there were a lot of high-level discussions about things going on with Cisco where I might be able to collaborate usefully with them.
One thing I noticed at Cisco was a much larger number of women than usual at my talks. Perhaps EE is turning out more female graduates than CS recently, or it's somehow reflective of Cisco's hiring practices.
Visiting Cisco is always very exciting. They're a lot more short-term focused than research labs, but there is this wonderful sense that what you're talking about could become a part of the fundamental network architecture. They keep me away from details, but multiple-choice hash tables and Bloom filters seem to be standard tools in their arsenal now. I'm hoping some form of cuckoo hashing might be as well someday.
Friday, August 22, 2008
Wednesday, August 20, 2008
That being said, I'll express two concerns:
1) It's odd to see so much money for theory concentrated into such a small geographic area. I realize that was the nature of the Expeditions program, and I don't fault the proposal for it. It just strikes me as strange when the general budget for CS theory is so small to earmark such a large sum of money to this project. It feels like an over-concentration of resources in what's already a small community.
The solution to this, of course, is to get more money for the general CS theory program. And I'm sure a significant chunk of the Expeditions money will go to open DIMACS-style collaborations like workshops and other events, minimizing this concern.
2) I know it's just the nature of theory, but reading over the blurbs about the various funded Expeditions proposals, I can't help but notice that while the others seem to have some sort of statement of clear goals to take things in new directions ("hope to create a new field of computational sustainability", "It aims to create an "open" alternative to mobile ubiquitous computing and communication that can spur innovations, which will have a dramatic impact on the choices users will have in the way their data and information is computed, stored and communicated", "The project aims to develop tools and theories for molecular programming--such as programming languages and compilers--that will enable systematic design and implementation of technological and biotechnological applications that require information processing and decision-making to be embedded within and carried out by chemical processes."), the complexity grant will "hope to better understand the boundary between the tractable and the intractable" and "attack some of the deepest and hardest problems in computer science". Doesn't that sound, I don't know, just like business as usual? My concern is that it's probably important to the theory community long-term for this Expedition to have some major concrete success attributed to it at the end of the day. I have no doubt that good things will come out of this, just based on the people, who already do good work -- but will the output be the sort of things that in retrospect justify this investment?
Tuesday, August 19, 2008
Well, FSP has re-worked past blog entries into an on-line book available at lulu.com. I haven't yet bought and downloaded it yet, but from the Table of Contents, it appears to be a particularly worthwhile book for graduate students thinking about a life in academia, and for new faculty. The bulk of the book seems gender-neutral, if that's a concern. I thought I'd give it a free plug.
Sunday, August 17, 2008
There's a paper on analyzing BitTorrent in the game-theoretic incentive-style analysis sense. It will require a more careful reading from me, as I'm not a full-fledged game-theory/CS type researcher, but it sure looks interesting on the first perusal. I'm naturally biased to the idea that if all this current effort on game theory that is going on in computer science (and particularly in theory) is to have payoff, real-world protocols must be considered and analyzed. So in that sense, this should be a really interesting paper.
While it doesn't appear particularly theoretical (it looks like what I like to joke is a standard networking paper -- lots of pictures and tables, no equations...) this paper on spamming botnets from Microsoft includes Rina Panigrahy (well known for his work in both theory and practice) as one of the co-authors. (I figure Rina had something to do with where I saw the words "entropy reduction", but that's just a guess...)
Saturday, August 16, 2008
I notice they used a Bloom filter in the paper without even giving a citation. Have Bloom filters now become so successfully widespread in the networking community that no citation is needed? What a nice thought! (Or maybe the authors just ran out of space for the citation.)
Another SIGCOMM paper continues on the path set out by for example Feigenbaum, Papadimitriou, Sami, and Shenker, on using game theory to study the behavior of BGP. They propose a more realistic model (where, for example, Autonomous Systems can be paid for attracting traffic) which, naturally, leads to more negative results in terms of the truth-telling behavior of ASes. (Why is reality so often disappointing this way?)
Friday, August 15, 2008
Before getting into papers, I thought I'd mention that Don Towsley is being given the ACM SIGCOMM award. This is a great choice, and well deserved. And relevant to this site's audience, Don is, in my mind, primarily a theorist. Not a FOCS/STOC theorist to be sure, but a theorist nonetheless. As the award announcement states:
Towsley, who is Distinguished Professor of Computer Science, has made innovative and pioneering contributions in developing foundational modeling and analysis techniques that have enabled a better understanding of some of the most important aspects of today's computer networks, network protocols and networked applications.Modeling, analysis, understanding... that's what theory is all about. It's people like Don that made networking an open and understanding place for people like me. Thanks! And hooray for Don!
Now for papers. As before, I'll give brief synopses (at the level of the posted abstracts :) ), as I'm just looking at these papers on the fly. The network coding crowd has attacked again with the MIXIT system, which seems to throw together a bunch of ideas in a clever fashion to improve performance on wireless mesh networks. Recall that the basic working definition of network coding is that intermediate nodes do more than store and forward, they can process the packets as they come through (creating encoded packet variations). Here, the basic unit is not taken to be a packet, but a symbol (a small collection of bits), with symbols being packed into a packet. This allows nodes can "take apart" packets; if a whole packet doesn't come in error-free, the node can take symbols that appear to be right with high enough probability (based on information from the physical layer), and re-package (via linear combinations, a la "standard" network coding) and send on only those symbols. Because erroneous symbols might get through, an end-to-end error-correcting rateless code is also used. All of this appears to improve throughput.
The paper seems interesting -- another proof-of-concept paper for network coding in wireless systems, which is where I suspect network coding will be able to make the most inroads over the next few years. I can't tell yet how practical this really seems (without a more detailed reading), but the idea of taking apart packets and sending only the good pieces in combination with multiple coding techniques seems quite nice.
As an aside, the pdf for this paper seems to contain a picture or something that crashes my poor Mac around the 8th or 9th page. Help!
Sunday, August 10, 2008
Of course, Harvard isn't the only institution in Cambridge where students can obtain skills in the security area. Some MIT students, working under the famous Ron Rivest (the R of RSA!), figured out several flaws with the new ticket system for the Boston subway system, including ways to rewrite tickets so that they have lots of money available on them. So, naturally, the subway system sued to keep them from talking about the flaws at a security conference.
In both cases, the systems seem easily breakable (well, at the least the Harvard IDs were easy, not sure about the subway) with a card writer that can be obtained for a couple hundred bucks.
Of course, I'm not surprised, based on previous experience.
I wonder when organizations that want secure cards will realize that perhaps they ought to ask the students to try to break the system before they deploy it, rather than wait for them to break it after.
Wednesday, August 06, 2008
I'm always amazed at how averse theory people seem to be to doing simulations. I find them useful for generating ideas and thinking about problems in the early stages -- cutting off wrong directions and giving insight into the right ones. If you don't like doing simulations for such purposes, because it doesn't work for you, or you're clever enough to not need data, I have no issue with that -- people work differently.
But I also use simulations as a way of checking my work. If I have a theorem that says that a random process will behave a certain way, and it's possible to code a simulation of the process, I'll check my theorem with code. If the theory and the code don't match up, my assumption is that something is wrong somewhere, and the result is not ready until the two match or I know why they don't. Surprisingly, I think it's about 50-50 as to which I end up finding is wrong, the code or the theorem. (By the same token, if I don't have a theorem, and there's more than one way to simulate a process, I'll code multiple simulations, and make sure they match!)
Of course not all results can be checked by coding something up -- but many can. Particularly in the study of random processes, which is my area. And it's clear to me that many researchers don't check by coding -- because I (or students working with me) have several times found mistakes by doing a simple implementation and finding that we get different numbers out than the paper gives. Generally the mistakes aren't "fatal" -- usually a constant is off somewhere, and often eventually the O notation will take care of it -- but of course it is grating as a reader when something in a paper is plain wrong and you're left to figure out why. When someone doesn't do a check-by-code, I must admit, it lowers my trust of and my overall opinion of the person's work. Sure, people make mistakes (myself included) -- but if you're ignoring a straightforward approach for checking your work, that doesn't inspire confidence.
I imagine some theory people are so out of practice coding they "can't" do a simulation. (But hey, that's not really an excuse, that's what grad students are for...) And others probably just consider it a waste of time. If you really are good enough not to need to check your work this way, more power to you. Me, I'll get back to the (admitted drudgery of) coding my things up and seeing if they work the way I think they should....