Tuesday, March 30, 2010

Two Inspiring Posts

UPDATE:  Make that three!

Inspiring me to blog, that is....

Matt blogs about the psychology of program committees.  It's one of those things that would be funny except that it's true.  So true.  So, so true...

One of the commenters raises an interesting option.  Every PC member gets one "trump card" to decide to accept a paper unilaterally.  (I suppose one could also allow a trump card to be used to reject a paper unilaterally.  But then what happens if two people play opposing trump cards?)  What do you think?  I like the idea in principle -- if a PC member has such a strong opinion on the paper, it should be accepted -- but I think in practice it's an idea rife with complications.  Gamesmanship in the committee about when/how to use the trump card and potential abuses from conflicts of interest come to mind.  Might you be inclined to use a trump card rather than let it "go to waste", even if you didn't feel quite so passionately about a paper?  I think if everyone used it in the manner it was intended, it could be a great idea.  On the other hand, if PCs were perfect, we wouldn't have posts like Matt's (and might not need the trump idea in the first place).

Lance complains about traveling too much.  Some academics I know would laugh at the idea that hitting 50K miles gets you anywhere close to being a too-frequent traveler, but (like Lance) that's pretty much my threshold too -- it's roughly a flight across the country or to Europe each month.  Add in some other shorter travel, and it really does add up.

At this point, I avoid trips that keep me away overnight if at all possible.  I've turned down a few colloquium talks this year for this reason.  If we can work out a travel schedule based on my taking an early flight from Boston and a late flight back the same day, I'll definitely try to make it work.  The kids/family can manage school/dinner without me for a day.  (There's also my class to consider, of course, during the semester, but let's temporarily leave that aside.)  I'll sometimes take a late flight out the night before if needed.  But if I have to miss two dinners (or two morning walks to school), it's much, much less likely I'll do the travel.  It just adds up quickly to too much time away.  (Usually the way flights work out, you're stuck either leaving in the early afternoon the day before or getting back mid-afternoon the day after, killing a day either way.  Two days is a lot of time to devote to giving a talk.  And if I'm missing two days in a row, I'm definitely missing a class during the semester...)

Conferences obviously are different -- but it's rare I go to a conference where I don't have a paper, for essentially the same reason.  I do try to arrange longer trips, where I can bring the family -- I arrange a 1-2 week west coast Bay Area tour most every summer and try to give talks at all the major places I can manage -- but it's a difficult balancing act.  I suppose I'll want to travel more again once the kids grow up and leave the house.  I hope I'll still have work worth traveling to talk about then!

A third inspiring post, by way of Muthu:  Tim Roughgarden has won the Grace Murray Hopper Award, and Bellare and Rogaway have won the Paris Kanellakis Theory and Practice Award.  Congratulations all around!

Wednesday, March 24, 2010

Reading a Research Paper

In the category of "waste nothing", here's a handout I wrote way back when (over a decade ago) that I use for my standard graduate seminar class on How to Read a Research Paper.  Others have found it useful enough to use it (with permission) rather than construct their own.  If you like it, feel free to use it, and modify it as you like.  I even have the LaTeX around;  just ping me if you want it.

There are other pages and places with more advice on the topic, so you might want to look around.  Searching on phrases like How to Read a Research Paper (and other variants, like How to Read a Technical Paper) will yield plenty of pages.  I like the "What to Read" section at the bottom of this page from Jason Eisner -- creative Web search and forward/backtracking references are, I think, often underappreciated skills.

Thursday, March 18, 2010

Google-Viacom Documents

I don't blog about my work as an expert witness, but for those who are interested in such things, since I (currently) have nothing to do with the case, I can happily point you to the court documents now out in public for the Google-Viacom case.  You can find pointers to them here.  Both sides are moving for summary judgment;  if you haven't heard of that before, Wikipedia describes it.  I haven't read them over yet, so I don't have any opinion to offer.   

Tuesday, March 16, 2010

More on SIGCOMM

The first round of SIGCOMM is pretty much done.  (As usual, many reviews are still out, though the deadline has passed.)  I had mentioned earlier that my first round papers, in general, seemed pretty terrible.  My colleagues agreed.  This year the rating scale was 1-10 (which I like less than the standard 5-point scale from previous years), and I had five papers that didn't get a score higher than a 3.  (A score of 3 is "Reject";  scores of 1 and 2 are below reject.  A score of 5 is still only a borderline reject, for comparison.)  In some cases, the 3 I gave was the high score.  I did (eventually) read a couple of good papers that may well be accepted.  Hopefully, I'll see better papers in round 2.

Matt Welsh, perhaps at least partially inspired by being on the SIGCOMM PC as well, has suggested an approach for dealing with the large quantity of awful submissions by charging authors to submit;  Suresh disagrees.

Interestingly, the scores (and reviews) on my papers were generally consistent across the board, with a rare exception or two.  Since there's always some overenthusiastic anonymous commenter who thinks it's important to call SIGCOMM an insider's club whenever I blog about it, I'll repeat that I'm not aware of belonging to any such club, and once again, the reviews I see from others not only make sense, they match my own opinions, which I view as independent, to a striking degree.

Monday, March 15, 2010

Kindle Textbooks

I've had a couple of people point out to me that Probability and Computing: Randomized Algorithms and Probabilistic Analysis (my book) is now available on Kindle.  I looked around and saw that several other standard texts are now available that way as well:  Algorithm Design, Algorithms, Approximation Algorithms, Computational Complexity: A Modern Approach, and Introduction to the Theory of Computation.  (Even the omnibus The Princeton Companion to Mathematics is available on Kindle.)  Strangely, several other texts apparently aren't:  Introduction to Algorithms, Third Edition, Randomized Algorithms, Algorithmic Game Theory, and Concentration of Measure for the Analysis of Randomized Algorithms.  While I can understand that older texts might not be easily moved to a Kindle format, the Concentration of Measure book is new, so I'm not sure what separates the Kindle-ized books from their unKindled peers.  Anyone out there have any insights?

I suppose I'll see in my next book sales statement whether any Kindle copies were sold.  In general, the Kindle versions seem to go for just a few dollars less;  Algorithm Design is a big exception, with the Kindle edition going for over $25 less than its hardback counterpart.  Sometimes, it looks like you can buy new copies of a book for less than the Kindle price (usually from Amazon third-party dealers).  I don't own a Kindle (yet), and I wonder how they would be for textbooks.  The textbooks I've listed above I'm happy to have on my shelves for reference.  I suppose if I had them in a universal, pdf-like format that I could access and make use of essentially anywhere, I'd be happy with that too.  Particularly if I had capabilities like search available.  But I wouldn't want my copies of the books tied to a particular piece of hardware.  If I lose my Kindle, do I lose my books?  That's fine for disposable books -- and, probably, for many students many textbooks fall into that category.  (Use them for a semester, then forget about them.)  It wouldn't be fine for me for these books.

Can anyone comment on the Kindle textbook experience?  I'm interested generally, and in the particular issue of the "permanence" of books that might be references one wants to keep.  Someday soon, I'll be getting one of these (or another e-reader), and it would be useful to know whether it's currently worth moving to a system where I try to keep important texts in an electronic, rather than paper, format.        

Saturday, March 13, 2010

And More Fun News....

Right after Stuart Shieber sent me news on CS undergrad earnings, Harry Lewis sent me two additional links.  The first is a nice Boston Globe piece on the increase in undergraduate majors in the sciences at Harvard.

The second is much more amusing.  Apparently, the Harvard Robobees project has made #1 (that's right, we're number 1!) on Sean Hannity's list of the 102 worst ways the government is spending your tax dollars.  Now, I try to stay apolitical on this blog, but I have to say, I'm impressed by Sean Hannity's lack (well, actually, more like a complete absence) of acumen in understanding the nature of scientific research.  A look at the Robobees home page, I would think, would certainly suggest that there are important scientific and engineering questions underlying the long-term challenge of building a robotic bee.  Of course, maybe the Hannity camp just objects to the government spending money on science generally, I don't know.  I'll go on record as suggesting that nobody from the Hannity camp bothered to look at the Robobees home page.

Good News for Computer Science Majors

Undergraduate computer science majors, according to the National Association of Colleges and Employers, are still getting paid well.  If you don't want to be an engineer for an oil or chemical products company, we're still apparently the best way to go.  (I'd like to think careers in CS are more interesting than in these areas as well, but I really don't have the experience to say.)

Thanks to Stuart Shieber for the link. 

Wednesday, March 10, 2010

A Conflict Question

I was recently asked the following question:

Suppose you're the PC chair, and someone who has submitted a paper asks you NOT to have the paper reviewed by a specific person on the PC. Do you honor that request?
It's an interesting question -- and one I hope others will comment on -- though as a default my answer would be yes.  I certainly have had run-ins of sufficient severity with various people through the years that I would not want them reviewing my papers, and would likely ask that they not, if the issue came up.  Looking at it from the other end, if I am on a PC and those people submit a paper, I make sure not to review it.  (Usually it is sufficient simply to rank such papers low on my list of desired papers, but I have also told PC chairs in advance I would not review certain papers if they seemed likely to head my way.)  It is not that I actually think I couldn't give a fair review;  it's that I think it's inappropriate, in such a situation, for me to give a review in the first place.  If as a PC member I have the right (actually, I would say, a responsibility) to refuse to review a paper under such circumstances, it seems fair that a submitter can ask for a specific PC member not to review a paper as well.

Context does matter, though.  In the networking conferences I have served on, this is standard -- PC members and submitters are expected to list their conflicts.  Indeed, one issue that seems to have arisen lately is the suspicion that some people submitting papers are abusing this right, listing people as conflicts when they are not, simply because those people are known to be "challenging" reviewers.  While I'm skeptical this sort of gamesmanship gains anything (challenging reviewers are usually calibrated appropriately at the PC meeting), it is a concern that once you open the door to such requests, you may need to make sure the privilege isn't abused.

For theory conferences, where many people seem painfully unclear on what "conflict of interest" even means, I'd grant such a request as a matter of course.  

Tuesday, March 09, 2010

Congratulations to Chuck Thacker

The news has hit the wires -- Chuck Thacker is this year's Turing Award winner.

I had the great pleasure of getting to know Chuck while I worked at DEC SRC.  He's a character, a tinkerer, and a great and curious mind.  I think recognizing his work -- the Alto -- is a great choice.

EC Papers Up

The list of accepted papers for EC is up.  I'm happy to say that our Swoopo paper made the list.

I was surprised in the acceptance letter to find that there were 45 acceptances out of 136 papers -- an acceptance rate of about 1/3.  (Compare with WSDM.)  This makes it one of the less "competitive" CS conferences I know of, although a little research shows this is a bit unusual -- last year the numbers were 40/160 or so, so they accepted more papers and had fewer submissions this year.  Is that a trend in the making or an accident of timing this year?  Also, while I'm an EC newbie, the list of papers looks very interesting, with plenty of top-tier names.  "Competitive" or not, I'm expecting high quality.  I'm really looking forward to it -- and not just because its location makes it remarkably convenient for those of us in the greater Boston area.      

Sunday, March 07, 2010

Carousel (NSDI Paper)

The "final version" for our NSDI paper, Carousel: Scalable Logging for Intrusion Prevention Systems, is now up.

Here's the main idea.  In an intrusion prevention system (IPS), the logger can get overwhelmed during an attack.  Bad sources (with other related info) need to be recorded for later examination, but there's only a small amount of on-chip memory to buffer bad sources, and only a small amount of bandwidth (compared to the rate data is coming into the system) from that memory to the more robust recording infrastructure.  How do you get all, or almost all, of the sources?

In our model, bad sources are hitting the system repeatedly -- the denial of service attack setting.  On the plus side, this means you don't have to log a bad source the first time it appears - the assumption is it will come back again.  On the negative side, you want to avoid sending the same source to the recorder multiple times, as it wastes your small bandwidth.  (Dups can be removed at the recording side, though.)

The baseline solution is to just grab the next bad source into the buffer as soon as you have room after sending one out.  We consider a random model -- there's a set of bad sources, and the next one to appear is drawn at random from that set -- under which this approach is bad;  by connecting it to the coupon collector's problem, we show it can be a logarithmic factor off of optimal.  Other experiments show it can be even worse than this in realistic situations.
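To give a feel for where the logarithmic factor comes from, here's a toy simulation of my own (just an illustrative sketch of the random model above, not the paper's experiments): if each free slot is filled by a uniformly random bad source, the number of draws needed to see all N distinct sources is exactly the coupon collector's problem, about N ln N, versus the N draws an ideal scheme would need.

```python
import math
import random

def draws_to_see_all(n, trials=50):
    """Average number of uniform random draws needed to see all n sources."""
    total = 0
    for _ in range(trials):
        seen = set()
        draws = 0
        while len(seen) < n:
            seen.add(random.randrange(n))
            draws += 1
        total += draws
    return total / trials

n = 1000
print(draws_to_see_all(n))   # expectation is n * H_n, roughly 7485 for n = 1000
print(n * math.log(n))       # about 6908 -- versus just n = 1000 draws for an ideal scheme
```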

Our solution has two parts.  We hash-and-partition the bad sources, adaptively finding a partition size so that the bad sources in each partition fit into our small memory.  That is, we hash each source and put it in a partition according to the last k bits of the hash.  This breaks our N bad sources into groups of size (roughly) N/2^k, and if N/2^k is small enough so that a partition fits into memory, then (assuming we can avoid duplicates), we no longer have a memory problem.  We just run through all the partitions, giving us our "Carousel".

To deal with duplicates, we do the obvious -- we use a Bloom filter (or equivalent structure) to avoid sending duplicates within each partition.
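To make the idea concrete, here's a minimal Python sketch of my own (with a plain set standing in for the Bloom filter, and none of the paper's adaptive partition sizing):

```python
import hashlib

class Carousel:
    """Toy hash-and-partition logger: the last k bits of a hash pick the partition."""

    def __init__(self, k):
        self.k = k                       # 2**k partitions, chosen so each fits in memory
        self.current = 0                 # partition currently being collected
        self.seen = set()                # stand-in for the per-partition Bloom filter
        self.recorded = []               # stand-in for the slow recording channel

    def _partition(self, source):
        h = int(hashlib.sha256(source.encode()).hexdigest(), 16)
        return h & ((1 << self.k) - 1)   # last k bits of the hash

    def observe(self, source):
        if self._partition(source) != self.current:
            return                       # not this partition's turn; the source will be back
        if source in self.seen:
            return                       # duplicate within the partition -- don't resend
        self.seen.add(source)
        self.recorded.append(source)     # send to the recorder

    def rotate(self):
        """Advance the carousel to the next partition and reset the duplicate filter."""
        self.current = (self.current + 1) % (1 << self.k)
        self.seen.clear()
```

In the real system, k is adjusted adaptively so that each partition fits in the on-chip memory, and an actual Bloom filter keeps the duplicate-suppression state small.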

This hash-and-partition plus Bloom filter framework seems like a potentially useful general trick;  more details -- including both theoretical analysis and experiments -- are in the paper.

An interesting thing came up in the reviews.  We pointed out that the "straw man" approach -- do nothing -- was quite bad.  We mentioned that just using a Bloom filter -- without partitioning -- wouldn't really help, but then ignored that option.  Apparently, this was a big concern for the reviewers;  my understanding is that it almost "sunk" the paper.  They wanted to see more details, including simulations, on this option.  Luckily, it was still accepted, and in the final version, in response to the reviews, we've added some theory and simulations to prove our point.  (The point is you'd need a really, really big Bloom filter -- too big for your memory -- to track all the sources, so eventually you have to clear the Bloom filter, which causes you to lose whatever it was going to gain you in the first place.)  I'm not sure what the takeaway is there.  The right outcome happened -- they should have accepted the paper, and we could add the appropriate stuff.  But I'd have hated for what in my mind is a minor issue to have killed the paper.  Perhaps we should have said more about it, although at some point space prevents you from providing details on every possible variation you could consider (even if you have actually considered them).    
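For a rough sense of the numbers (my back-of-the-envelope figures, not the paper's analysis), a standard Bloom filter needs about -ln(p)/(ln 2)^2 bits per element to achieve false-positive rate p, so tracking every bad source at once quickly blows past a small on-chip memory:

```python
import math

def bloom_filter_bits(n, p):
    """Standard Bloom filter sizing: m = -n * ln(p) / (ln 2)^2 bits."""
    return -n * math.log(p) / (math.log(2) ** 2)

# e.g. tracking a million bad sources at a 1% false-positive rate:
bits = bloom_filter_bits(10**6, 0.01)
print(bits / (8 * 2**20))   # roughly 1.1 MB -- far more than a small on-chip buffer
```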

This seems like a good place to remind people about this book -- Algorithms for Next Generation Networks -- which includes a survey about these sorts of hashing applications in networks by Adam Kirsch, George Varghese, and me.  The book does seem a little pricey;  we have a slightly older version of the survey available online.   

Friday, March 05, 2010

SIGCOMM papers

A commenter asked me a while back to say what I thought of the SIGCOMM submissions this year.

I'm finally getting around to reading and reviewing. (First round reviews aren't due for at least a week!) And so far, by and large, the papers I'm getting are pretty terrible.

This generally seems to happen to me on the first round, but this year is extreme. My first several papers just don't belong at this conference. (Arguably, they don't belong at any conference...) There's some number of papers submitted at every conference that are just not serious submissions, and apparently I got more than my fair share on the first round.

This makes it harder to judge the other papers -- it's hard to calibrate when you start with a lot of junk. Many of my other papers are theoretically oriented, and I'm not too optimistic about them. There's room for theory papers at SIGCOMM, but I think the bar is, rightly, pretty high. When I read a theoretical paper for SIGCOMM, I look for one of two things. First, it could be the paper has a nice theoretical idea that's actually useful. The problem there is that it's incumbent on the paper to clearly demonstrate the utility, and most fall down in that regard. I quote the SIGCOMM call: "SIGCOMM is a highly selective conference where full papers typically report novel results firmly substantiated by experimentation, simulation, or analysis." A mathematical analysis alone generally does not count as a firm substantiation. [Such papers generally have a better chance at INFOCOM -- which I think is a good, and very important, thing! There needs to be an outlet for more theoretical networking work, and perhaps it's just better suited for a big conference. Many such papers will end up having minimal impact, but once in a while, a good idea gets built on and has a real impact.]

Second, it could be the paper really challenges our way of thinking, introducing a new framework that seems a clearly important guide for future work. Such papers are rare, but important. I seem to have a number of economics-networking papers that are very high-level, and I'm really trying to understand if any of them have that character. Again, because SIGCOMM is so selective, I think the bar is very high for such papers. I'm really looking for something that enhances our fundamental understanding of the network.

That's it for now. If you have a submission, don't let my comments make you antsy -- the meeting is still a long way away, and I'm quite sure I'm not reading your paper anyway.

Wednesday, March 03, 2010

Congratulations to David Johnson, Knuth Prize Winner

I'm pleased to hear that David Johnson has won the Knuth Prize for "his contributions to theoretical and experimental analysis of algorithms."  David has done a great many wonderful things, but I thought I'd highlight something that I imagine is underappreciated today, his book Computers and Intractability: A Guide to the Theory of NP-Completeness.  We're a bit spoiled these days, what with Wikipedia pages with lists of NP-complete problems and online compendiums with references.  But in ye olden days, when I was a grad student, if you had a reduction you needed to ponder, you went to Garey and Johnson.  The book was an inspiration, and an invaluable research resource, to many.  It's one of the most cited (the most cited?) references in computer science, and the book will rightly hold a special place in the history of computer science. 

Tuesday, March 02, 2010

Teaching Bloom Filters

As I finished my lecture today for my undergraduate algorithms and data structures course, after spending about half an hour explaining Bloom filters and what they do, I couldn't help but wonder, yet again, why this incredibly useful, simple data structure -- which also didactically demonstrates some very nice concepts, like one-sided error and the ability to trade off speed/memory against correctness -- isn't in the standard undergraduate textbooks.  (Of course, you can always buy this book....)
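In case it helps anyone put together that lecture, here's a minimal, unoptimized Bloom filter sketch (a toy example of my own, not from any textbook) that makes the one-sided error concrete -- a membership query can return a false positive, but never a false negative:

```python
import hashlib

class BloomFilter:
    def __init__(self, m, k):
        self.m = m                    # number of bits in the filter
        self.k = k                    # number of hash functions
        self.bits = [False] * m

    def _positions(self, item):
        # Derive k hash positions by salting a single hash function.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def __contains__(self, item):
        # "Yes" may be a false positive; "no" is always correct.
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter(m=1000, k=4)
bf.add("inserted item")
print("inserted item" in bf)   # always True
print("other item" in bf)      # almost certainly False at this load
```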

If you're teaching algorithms and data structures, do your students a favor and sneak Bloom filters into a lecture.

Monday, March 01, 2010

Stuff in press

The Economist has a special section this month devoted to The Data Deluge.  (Much of it may still be behind their paywall.)  Many computer scientists are quoted/named in the collection of articles, and it gives a nice overview of what's going on generally.  (It makes a horrible mistake, though, by saying that Google's original advance was counting the number of links to a page to score relevance;  that ignores PageRank, and Altavista...)

Bach, Chawla, and Umboh take our previous work on the hiring problem in new directions.  Or, even, new dimensions:  they also consider multidimensional variations of the problem.  It's definitely a general problem with plenty of variations to consider;  perhaps this paper will inspire further looks at the problem.