Tuesday, August 31, 2010

Blog Retrospective

I started blogging a little over 3 years ago, as something of an experiment.  Lance had given up blogging, and I had been a reasonably frequent and opinionated commenter on his blog, enough so that people often asked when I would start my own.  I hadn't planned on it, but Lance's stepping down (which, later, turned out to be temporary) felt like it had left a hole.  I hoped that I might provide, in my own way, a community forum for discussing issues, and a connection point for the areas I'm interested in -- algorithms (or theory more broadly), networking, and information theory.

In the end I'm not sure how well I achieved my various goals.  I don't feel this blog has ever become a strong authority (or hub, in Kleinberg's language) in the way that I might have liked.  Commenting has been sparse, with infrequent spikes;  longer, more detailed discussions seem rare.  Perhaps this is just hard to do -- people have, on the whole, better things to do with their time.  Or perhaps (probably?) it reflects flaws in my posts.  Certainly one wish I had going in was that my posts would be more technical, but technical posts take a great deal of time and are, quite frankly, hard.  I'm ever-impressed by the technical depth of what Dick Lipton is doing at his blog;  it's a wonder to me.

On the other hand, I'm amazed and pleased to find that people read this blog, and have enjoyed the "behind-the-scenes" look at life and work as a professor.  Everywhere I've gone in the last few years, there are people who tell me they've been reading it.  I've never implemented tools to tell the size of my readership, but anecdotally it must be larger than I think.  (The joy of low expectations.)  It's opened the doors to lots of interesting discussions about research, the state of computer science, what being a professor is like, and a whole range of other topics.  And from what I can tell, it has given each of the different communities I was targeting a better idea of what the others are like, in terms of culture and process.  That perhaps hasn't always been a good thing, but overall I'll view it as a success.

I didn't realize when I began how much blogging would raise my "visibility", but that seems to have been a pleasant side effect.  I'll admit, I'm glad to have been able to take advantage of that.  Perhaps I'll be invited to give fewer talks now that I'm giving up the blog.  Or to serve on fewer PCs.  Maybe at this point that's not all bad.  Or maybe stopping will force me to explore other positive ways of raising my visibility, perhaps by writing another book.

Overall, I've had a great deal of fun blogging, and that alone has made it worthwhile.  Over the last several months, however, I've found blogging less enjoyable.  Some of that must just be fatigue;  I suppose I've been running out of things to write about, which makes writing harder.  But also there have been fewer comments, and -- as discussed in this post over at the Complexity blog -- there has been much more of an unpleasant tone in (anonymous) comments (across many blogs) of late.  It's a sign for me that, as fun as this all has been, it's time for me to stop.  My new position has provided a good excuse, but I probably would have stopped anyway.

Perhaps blogging has just been the latest Internet fad -- perhaps our social networks can't support the number and diversity of blogs that we have, and our attention is now moving elsewhere.  (Like, back to work.)  I'd like to think not.  I think the latest P=NP? phenomenon is an excellent demonstration of the potential power and importance of blogs.  (Again, Dick Lipton's blog was a wonder.)  I hope that all the bloggers we have in our community keep going, that new bloggers come into the picture, and that we use blogging -- or whatever new tools come along -- to enhance communication within and across our communities.  As an example, I've spent some time the past few days looking around at the CS Theory StackExchange Q and A site, prompted by Suresh's posts.  I'm not quite sure what to make of it yet, but it's been fun to explore and seems to have interesting potential.

Thanks to all of you who have been reading, and especially to those of you who have been taking the time to provide thoughtful comments.  I wouldn't have continued for as long as I did without you.  I've enjoyed this experiment, and I'm gratified to think that some of you have enjoyed it too.  I'm sure I'll still be around, offering my opinion at other places.  And I hope when you see me around (physically), even though I'm not blogging, you'll consider trying to strike up a conversation with me;  I'm sure we can find things to talk about, and, without the blog, I'll be missing this type of conversation.

Saturday, August 28, 2010

Doing the Right Thing? (Quick Links Edition)

From Shots in the Dark, a pointer to a new "feature" -- apparently, there's now a tweet system recording and listing books checked out from Harvard libraries.  No, they're not putting names with it, just times.  But who thought this was a bright idea?  Seems like a clear potential privacy-violating nightmare with no upside that I can see.  I'll have to find out who to call on Monday to complain and spread the word to other profs...

From the Crimson, Marc Hauser will be teaching his classes in the Harvard Extension School this year.  Now, in some sense, this isn't a big deal;  Marc's on leave from FAS, and the Extension School is separate from FAS.  And trust me, he won't be getting any huge paycheck from the teaching;  while the Extension School pays its teachers (naturally), I'm sure it's a small fraction of Marc's Harvard salary (which he may or may not be getting;  I haven't heard confirmation one way or another whether he's on paid or unpaid leave).  Given that he's been heralded as a great teacher for a number of years, arguably, why shouldn't he teach?  But I admit, as someone who works with the Extension School, it's leaving me with an uncomfortable feeling that I'm still trying to process.

Anyone have gossip to tell about why the Crypto 2010 proceedings were put online, but then taken down (apparently once the link got publicized)? 

Scott Aaronson answers some questions for MIT news about the P/NP proof.  I won't opine on whether his bet was the right thing or not (his own blog has had plenty of discussion on that) -- what's wrong with the article is that it has multiple links to Deolalikar's paper that are now non-functioning.  I understand that web-news links aren't going to be kept up to date in perpetuity, but you'd think for this fairly recent article and controversial topic someone might have updated them accordingly.  One thing I wonder -- given the unusual amount of press that this proof attempt was given, and the current consensus that it's incorrect and recovery isn't possible, how many people are left with the misinformation that this very important problem was solved?

Wednesday, August 25, 2010

Conference/Journal Versions -- Transactions on Networking

I was recently asked to review a paper for Transactions on Networking, and noticed the following bit in the e-mail:

Please note that while this paper may have had a previous conference version, ToN does not mandate any specific differences between conference papers and their versions subsequently submitted to the journal.

Is this new???  Am I reading this right, that there's no mandated "30% new material", or some similar rule?  It's been a while since I've submitted to ToN, but I seem to recall being explicitly asked by reviewers or editors from ToN before what "new material" there was in the paper over the conference version.  I'd be interested to know if this was an actual policy change -- it's one I've called for before, but didn't expect to see implemented anywhere.

Just curious if anyone can share any insight....

Monday, August 23, 2010

Various Quick Pointers, Redux

There were many interesting things at the CRA Snowbird conference for CS chairs (which I missed...), but I haven't heard any blog-level discussion of their call to move up the schedule for hiring (as well as related changes in procedure).  Anyhow, slides from many of the presentations are available.

UC Campuses are tops in Washington Monthly rankings, which are different in substantial ways from the US News and World Report rankings...

Still time to sign up for Harvard Extension School courses for this semester;  here's the list for computer science.  Including, for example, E-210.

Dick Lipton's future book, taken from his blog, appears on Amazon (you can pre-order now!).

Nature's take on Hauser's MonkeyBusiness opens with: "When news broke last week that famed Harvard University evolutionary psychologist Marc Hauser had been investigated for scientific misconduct, it was no surprise to many in the field. Rumours had been flying for three years, ever since university officials arrived to snatch computers from Hauser's laboratory at the start of the inquiry. By the time Harvard completed its investigation in January, the gossip had become standard cocktail-hour fare at conferences."
Maybe they're right -- I'm not in his field -- but I'd never heard any sort of rumor at Harvard.  I'm clearly not getting invited to the right cocktail hours.

Sunday, August 22, 2010

How's that New Job Treating You? Edition

I told myself I'd quit blogging when summer ended.  That's a bit over a week away, as classes start September 1.  Also nicely, from the count on the right, I'm nearing 500 blog posts.  Seems like a good stopping time.

I'm now not infrequently asked how the new "Area Dean" job is working out.  Just fine, thanks.  I figured I'd say a little more about it, and perhaps that will also explain why it's a good time to give blogging a rest;  I can't imagine people would want regular blog posts on this sort of stuff.

So far, the time commitment is about what I expected, but only because I was told to expect that it would take more time than I would expect.  There are a lot of meetings, e-mail, and writing.  Pleasantly, the time thus far has been spent on fairly worthwhile endeavors -- most of the time has been spent on hiring and promotion plans.  Since, really, managing those issues is the highest priority of this job, that feels like time well spent.  Some time has been spent on letter-writing -- those CAREER, Sloan, and other fellowship letters get written by someone, and now that someone is often me.  Finally, some time has been spent being the "voice of the faculty" on certain issues.  For example, there are some non-trivial changes supposed to take place on our e-mail system, and unsurprisingly the CS faculty are more concerned than the average faculty member about this.  (A little knowledge is a dangerous thing...)  My job, where possible, is to be the consensus voice and contact point on faculty-administration issues like this.

Because I'm new -- and because it's summer and we're not having our regular faculty meetings -- there's been a lot of e-mail.  We're a consensus-oriented faculty in CS, so I want to represent the consensus.  I feel at this point it's important for me to check carefully with other faculty members before expressing a collective opinion (or even my own, since often it will be taken as the collective opinion).  Since I'm new at the job, this means -- in my mind -- checking in with the faculty perhaps more than is truly necessary, both so I am secure that I am representing them accurately, and perhaps even more importantly, so that THEY'RE secure that I am representing them accurately.  I suspect after a few months, assuming that I've grown into the role and the faculty has developed a trust in how I perform the job, there will be less need for as many explicit checks on things.  (I suspect some faculty will just get tired of getting e-mail from me!)  On the other hand, maybe they'll appreciate this conservative style, even if it means they get e-mail pings on administrative issues more frequently.  We'll see.

I expect further aspects of the job will reveal themselves as the semester begins -- more committee meetings, more curricular issues to handle, more faculty concerns.  There are also some long-term initiatives that I expect CS to be at the center of that are just starting up but will require my attention.  (They're not ready to be talked about yet.)  And, perhaps, I'll find myself involved in other activities like fund-raising.  (I may have to convince my Dean that, although my standard work wardrobe is a simple button-down shirt and jeans -- or a T-shirt and jeans over the summer -- I do own a few suits and ties and can be made to don them for appropriate occasions.)

It is time-consuming, and it will, sadly, clearly eat into my research time.  I'm thinking about how best to handle that.  And when you're shafted with (er, given) a job like this, it really makes you appreciate your predecessors.  (I knew Greg Morrisett was doing a great job before, but now I really appreciate it.)

So far, though, it's all fine.  I hope to do some good in the position;  and I hope I end up being good at the position. 

Friday, August 20, 2010

RATS roundup

I didn't see every talk (my brother lives in the area, so I took a break to see family) but I did have a fun day at RATS.  There was a brief introduction by Chris Anderson of Wired/The Long Tail fame (on video -- I was disappointed he couldn't make it in person, I wanted to meet him), which was very interesting.  I was pleased that as he was talking he kept mentioning power law and lognormal distributions;  I knew he had mentioned my survey on his blog at one point, but others and I (as several people told me later) were still surprised he mentioned them together when discussing long tail issues.  That nicely set up my "survey talk" on lognormal/power law distributions.  This was followed by the excellent talk by Aaron Clauset on power-law distributions in empirical data, discussing the challenging issue of how to determine, based on your data measurements, whether you're looking at something that seems to be following a power law or some other distribution.  (I'm asked this question a lot;  happily, I can just point people to Aaron's paper.)  Sharad Goel gave a fascinating talk on the implications of the long tail in marketing/web sales, arguing that the "value" for sites like Amazon in offering the "long tail" of items is NOT necessarily in the additional sales, but in the power of locking in customers.  (Since Amazon has "essentially everything", at a reasonable if not optimal price, why bother wasting time going anywhere else?)  Neel Sundaresan of eBay discussed insights from eBay data about the differing "shape" of different market segments, and the implications for helping customers find items in the large landscape that is eBay.  Silvio Lattanzi talked about implications of power laws in compressing social networks, and on models for affiliation networks.
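For a concrete feel for the kind of question Aaron's talk addressed, here is a minimal Python sketch (not Aaron's code) of one ingredient of the approach in his paper with Shalizi and Newman: the maximum-likelihood estimate of a continuous power-law exponent, alpha_hat = 1 + n / sum(ln(x_i / x_min)), followed by a crude log-likelihood comparison against a lognormal fit.  It assumes x_min is known in advance;  the paper also estimates x_min, runs a goodness-of-fit test, and compares against alternatives restricted to the tail using a likelihood-ratio test with a significance level.  The lognormal here is untruncated and fit to all points, so treat the comparison as illustrative only.  All function names are hypothetical.

```python
import math
import random


def fit_power_law_alpha(xs, x_min):
    """MLE of alpha for a continuous power law p(x) ~ x^(-alpha), x >= x_min (x_min assumed known)."""
    tail = [x for x in xs if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)


def power_law_loglik(xs, alpha, x_min):
    # Normalized density: p(x) = ((alpha - 1) / x_min) * (x / x_min)^(-alpha)
    return sum(math.log(alpha - 1.0) - math.log(x_min) - alpha * math.log(x / x_min)
               for x in xs)


def fit_lognormal(xs):
    """MLE of (mu, sigma) for an untruncated lognormal: mean and std dev of log x."""
    logs = [math.log(x) for x in xs]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((y - mu) ** 2 for y in logs) / len(logs))
    return mu, sigma


def lognormal_loglik(xs, mu, sigma):
    # p(x) = exp(-(ln x - mu)^2 / (2 sigma^2)) / (x * sigma * sqrt(2 pi))
    return sum(-math.log(x * sigma * math.sqrt(2.0 * math.pi))
               - (math.log(x) - mu) ** 2 / (2.0 * sigma ** 2)
               for x in xs)


def sample_power_law(n, alpha, x_min, rng):
    # Inverse-transform sampling from the CDF F(x) = 1 - (x / x_min)^(1 - alpha);
    # 1 - rng.random() lies in (0, 1], so we never raise zero to a negative power.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    data = sample_power_law(100_000, alpha=2.5, x_min=1.0, rng=rng)
    alpha_hat = fit_power_law_alpha(data, x_min=1.0)
    mu, sigma = fit_lognormal(data)
    llr = power_law_loglik(data, alpha_hat, 1.0) - lognormal_loglik(data, mu, sigma)
    print(f"alpha_hat = {alpha_hat:.3f}, log-likelihood ratio (power law - lognormal) = {llr:.1f}")
```

On synthetic power-law data like this, the estimate should land near the true exponent and the ratio should favor the power law.  On real measurements the two fits are often close, and that is exactly the hard case where the raw sign of the ratio isn't enough and the paper's significance test is needed.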

The slides, apparently, should all be up at some point on the RATS webpage, or I'll update with an appropriate link.

MonkeyBusiness: Some Resolution

Wow.  After days of various speculation and reports from multiple news sources, Dean (Mike) Smith of the Harvard Faculty of Arts and Sciences has made an announcement regarding the investigation of Marc Hauser.  The opening paragraph is the key:

"No dean wants to see a member of the faculty found responsible for scientific misconduct, for such misconduct strikes at the core of our academic values. Thus, it is with great sadness that I confirm that Professor Marc Hauser was found solely responsible, after a thorough investigation by a faculty investigating committee, for eight instances of scientific misconduct under FAS standards. The investigation was governed by our long-standing policies on professional conduct and shaped by the regulations of federal funding agencies. After careful review of the investigating committee’s confidential report and opportunities for Professor Hauser to respond, I accepted the committee’s findings and immediately moved to fulfill our obligations to the funding agencies and scientific community and to impose appropriate sanctions."

Rather than reproduce the whole letter here, I can point you to Harvard Magazine, or Science.  Mike also discusses the Harvard process and the reason for confidentiality in such cases.  I'm glad to see this come out, and I can imagine the difficulties Mike had in deciding to produce such a letter.  (As I have stated previously in this blog, I have great respect for Mike Smith, who was in the office next to me before getting proverbially kicked upstairs, and I'm very happy that someone of his talents is serving as Dean of FAS).  On the other hand, it's a sad day for Harvard, and arguably science more generally.

Thursday, August 19, 2010

Various Quick Pointers

While it may not be news elsewhere, I'm certainly interested in the "local" case of Marc Hauser, the evolutionary psychologist at Harvard whose work has been "under review".  The latest interesting update appears at the Chronicle of Higher Education.

As reported elsewhere, congrats to Dan Spielman for winning the Nevanlinna Prize.

I'll be at the ill-named RATS (Research and Analysis of Tail Phenomena Symposium) tomorrow reviving my introductory talks on power laws, lognormal distributions, and the importance of verification.  Stop by and say hi!

While I was away in the UK the Microsoft PR machine must have gone to work, and I've seen a few articles like this describing our work on password popularity.  I'm happy to see my name "in lights" a bit -- why not? --  I just think it's interesting that this is the paper that gets it there.  (I guess our coding work is also being touted a bit as part of Dan's Nevanlinna Prize, so that's "in the news" as well.)  

The Museum of Mathematics is getting more notice -- check out their web page.

Friday, August 13, 2010

In Need of a Few Bad Papers

For my graduate class this semester, there's a lot of paper-reading, and I view learning how to critically and constructively read papers as part of the student goals for the class. 

A corollary of this, it seems to me, is that the class should include some bad papers, so students learn to recognize (and, if possible, get something out of) reading those.  So I need some really good examples of bad papers.  (In one of the areas of the class focus -- web search, compression, coding, streaming data structures...)

Now I should be clear about what I mean by bad papers.  I'm looking for something of a higher standard than an automatic journal reject -- I get at least one of those a month in my mailbox, and it's not clear there's much to learn from that.  I'm talking about papers that at least superficially look quite reasonable -- indeed, I'm expecting papers that have been published in reasonable places -- but when you think about them more, there are subtle (or not-so-subtle) flaws.  In theoretical papers, possibly it might be that the paper starts with a model that sounds really nice but is just clearly wrong for the problem being addressed.  For systems papers, it might be a paper where the experiments just don't match up to what ostensibly was being proposed.

[I had a nice example of a bad paper in earlier incarnations of the class, but I don't think it's aged well, and I've removed it.]

Maybe bad is even the wrong term for what I'm looking for.  Perhaps I should use a more neutral word, like "controversial" -- indeed, then I can get the students to take sides.  (Is the Faloutsos, Faloutsos, Faloutsos paper still considered controversial these days?  That could be a nice example, but it's not really on topic for the class.)  Or perhaps I just want papers that reached too high for their time -- noble failures.  The key is that, in my mind, just showing students examples of great papers doesn't seem didactically sound.  Negative examples are important for learning too (especially if they also show that great scientists don't always get it right).

Feel free to mail me rather than post a comment if you're afraid of offending anyone.  Naturally, mailing me links to my own papers will be taken with the appropriate humor. 

STOC tutorial online

Paul Oka asked me to announce that the STOC 2010 tutorials are now all online.  You can find them here.

Thursday, August 12, 2010

Monkey Business

I see Harvard's in the news yet again, as the Boston Globe broke a story about psychologist Marc Hauser, who is "taking a year-long leave after a lengthy internal investigation found evidence of scientific misconduct in his laboratory."  One paper has been retracted, others are under examination.  As discussed over in Shots in the Dark, an unpleasant issue is that Harvard is being silent regarding its investigation.  It's not clear to me what the right approach in such cases is -- what rights to privacy, if any, does an academic have in such situations, or, assuming improper behavior is found, is it incumbent on the institution to correct the scientific record itself?  The issue is also raised in a New York Times article.  Feel free to discuss the institutional ethics in the comments.

I have no inside insight on what has actually transpired;  however, I have served on university committees with Marc in the past, and found him an enjoyable colleague.  I hope to the extent possible the issues are resolved satisfactorily.

This controversy provides an interesting contrast with the current hubbub over the P not equal NP paper -- best considered over at Richard Lipton's blog here, here, here, and here.  In theory we don't have to worry about people "forging" a proof in the way that experimental data might be forged, but proofs can easily have mistakes or unclear gaps, and this is not viewed as misconduct -- just embarrassing.  I wonder what the state is in computer systems work -- I can't recall hearing of cases where there were accusations of misconduct with data, although I've certainly heard mutterings that experiments in papers were carefully chosen to (excessively) highlight positive results.  Such cases can lead to heated discussions in PC meetings, and to some interesting discussions post-publication, but I haven't heard people suggest that that level of data manipulation corresponds to misconduct.  Our field seems to have, for now, sidestepped these particular issues. 

Wednesday, August 11, 2010

Other UK Adventures

While in the UK, I went out to some other places to give talks -- Liverpool and Cambridge.

At both places I gave my talk on our analysis of the auction site Swoopo, which seemed well received.  Of course it's a topic that can appeal to a wide, general audience and is just fun to think about, but the major credit has to go to our student Giorgos Zervas.  Not only did I swipe his excellent slides, I even shamelessly adopted some jokes from his presentation, and of course they got the biggest laughs.  Maybe I need to get him to prep all my talks.  (I also talked about some recent work on networking+hashing at UCL and Cambridge.  Slides are up at my talks page.)

At Liverpool I was hosted by Leslie Goldberg, and it was great fun to ask her questions about the UK system.  One issue that came up is a UK policy to use "short-term economic impact" as one of the bases for research funding decisions.  Leslie has campaigned against the idea -- she has an interesting web page devoted to the issue with a host of opinions on why it's a bad idea.  We also discussed the RAE, the Research Assessment Exercise, where schools are scored and ranked based on their research output, and this affects their future government funding for research.  (Here's an article from the Guardian in 2008 when the last results came out.)  It's interesting that the NSF does not do something like this, but the link between university funding (apparently, even for research) and the government is perhaps more direct in the UK.  It's worth pointing out that Cambridge is at the top overall, and my host institution University College London was 5th in the latest rankings;  specifically for computer science, if my info is right, Cambridge is still 1st, UCL is still 5th, and Liverpool is 11th.

At Cambridge I was hosted by Jon Crowcroft at the computer lab and Peter Key of Microsoft.  The two buildings are right next to each other, far from the Cambridge center (about 2 miles).  They're just past Churchill College, where I spent almost a year after college, so I got to experience the waves of nostalgia as I walked by.  (I would have experienced it even more had I had a bike.)  More than nostalgia, I felt a twinge of jealousy -- when I was at Churchill XX years ago, it was far removed from everything.  Now it's pretty much at the center of the mathematical sciences complex and the computer science buildings, which have moved out to the outskirts (for space reasons, and so nice new modern buildings could be made for them).  Why couldn't they have had that when I was around?  Re-visiting Cambridge was also a blast -- it's just a lovely city.  Hey, come to think of it, why doesn't someone plan a major conference there?  (It's not hard to get to from London airports;  I'm pretty sure the main conference could be held at University/Microsoft lecture rooms;  hotels, though, are probably quite expensive.)

Sunday, August 08, 2010

Papers to Teach This Year

This fall I'm again teaching my "introductory" graduate class loosely centered on the themes of big data and communications/networks, Algorithms at the End of the Wire (Computer Science 222).  [The real subtitle for the course should be "Things I'm Interested In."]  The subthemes include big graphs (PageRank, HITS, link prediction, etc.), compression, data streams/streaming algorithms, and coding.  Most of the class involves reading and discussing papers, and I try to have some fraction of them be current rather than historical.  Since it's been over a year since I last taught the course, and I'm lazy enough to wish to have other people do my work for me, I thought I'd ask for recommendations -- any good new papers to teach?

One aim of the class is to try to bridge the gap between theory and systems, so papers that fall into that area are highly desirable.  For example, this year's Best Paper at SIGCOMM, Efficient Error Estimating Coding: Feasibility and Applications, will surely be added in to this year's reading (assuming I can get a copy -- should be online at the conference site shortly).

Last year's web page is still up here, though I'll be aiming to take it down and update it this week.

Saturday, August 07, 2010

Back from Travels

The slowdown in posting for the past month has been primarily due to travel.  For the last month, I've been in England, based primarily at the Computer Science Department at University College London.  Thanks to my host, Brad Karp, I got funding through a Royal Academy of Engineering Distinguished Visiting Fellowship -- that's a mouthful -- to spend time working with the networks research group at UCL (and bring my family).  UCL has a strong computer science group, and a particularly strong networking group, with Brad Karp, Mark Handley, Kyle Jamieson, Damon Wischik, and others.  If you're going to or through London -- a frequent stopover flight -- you should visit there, maybe give a talk.  It's a great place, you'll definitely enjoy it.

Hopefully there'll be some interesting products from the visit in the future.  Overall, I had a productive time, and it was nice to break out of my routine and do something different for a month's time.  (They managed to put me up within walking distance of the CS building -- for most of the time, I was a two minute walk away -- so just not driving for a month was a shake-up from my status quo!)  Of course London is a wonderful city, so evenings and weekends were filled with tourist adventures.  And I can't thank Brad enough -- besides being great fun to work with, he really helped set everything up so it all went smoothly.  (As everyone from the SIGCOMM 2009 PC meeting probably recalls, Brad takes hosting duties very seriously.)

I'll probably have another post about other parts of the trip.  Two things I found, though, were that I didn't miss blogging so much, and that being Area Dean is indeed going to suck up large chunks of my time (well, it already is).  So when classes start, my more permanent break begins...