Thursday, July 29, 2010

The 2050 Calculator

As regular blog-readers know, I'm a tremendous fan of David MacKay, who has gone from being a leader in the general area of Bayesian inference (author of Information Theory, Inference & Learning Algorithms) to the author of the popular book Sustainable Energy - Without the Hot Air and now Chief Scientific Advisor to the Department of Energy and Climate Change in the UK.

David recently showed me a fantastic tool he and his department have made available:  Essentially, it's a calculator tool that let's people determine levels of effort they'll put into creating various types of energy on the supply side, as well as effort into curbing the demand side, and figures out based on those inputs whether the resulting configuration will lead to Britain reaching its legally mandated 2050 greenhouse gas goals (as well as other related outputs), all with a pleasant user interface.  One could view it as a "game" with the player figuring out what policy decisions will have to be made to reach the desired target.  Further, it's all open source!  Here's a link to a description page, and a direct link to the calculator.  Try it and see...

I've already recommended it to my environmental engineering colleagues as a potential learning tool.  Moreover, since it's open source, one could imagine building projects on top of it -- David suggested that developing calculators for various countries (including the US), or providing enhanced user interfaces for various purposes could be interesting.  

Relating this back more directly to computer science, does anyone have pointers to similar interesting tools that might be useful for computer science classes?  It would seem one could imagine many such things in the networks economics space.  Luis von Ahn's work (like the ESP game) had some associated sites that were fun to point students to.

Tuesday, July 27, 2010

Teaching Time Conflicts

As the semester ominously approaches, we've noticed that we're facing a number of class time-slot collisions in computer science.

This is unsurprising.  Until recently, we've been fairly ad hoc in assigning class times;  for the most part, each faculty member, more or less, just picked their time.  Not surprisingly, our Tuesday-Thursday slots are packed.  Computer science faculty like to have Monday and Friday free -- often for travel to conferences, but also to minimize teaching days (MWF vs Tu-Th).  This also seems to match student desires;  many seem to like to avoid Friday classes if possible.  (We can sometimes arrange M-W classes, but strictly speaking it's against some policy -- we can't do them in the morning.)   And class times are further limited -- many faculty (including myself) and students are put off by classes before 10 am or after 4pm, there's the weekly faculty lunch meeting and various seminars to consider, and so on.

The problem has gotten a little worse in recent years, as we've happily been offering more classes (new faculty, and the benefit of some visitors), making it more noticeable.  

We're not so clear that this is a terrible thing.  Other classes -- such as popular distribution requirement classes, various math classes, and so on -- appear to have taken the natural MWF slots, so we're avoiding those external conflicts.  (Although not entirely -- my class has conflicted the last few years with the 2nd semester math class on algebra, which leads to a few e-mail exchanges each year on what can be done about that.)  We tend to avoid really dumb conflicts where it's clear lots of people might want to take both courses naturally.

But it's clear it's become enough of an issue that we have look more carefully at it, and honestly, it wouldn't hurt to put some more reasoning into our scheduling rather than keep following the path tread by historical accident.

While I'm sure the CS faculty will generate plenty of ideas, to get the ball rolling, does anyone have any good suggestions on how to design a procedure to assign class times?  In fact, I think we're still at the size where we can all do it together and resolve possible conflicts peacefully and happily, but rather than waste lots of faculty time going through all the permutations, it seems worthwhile to create an approximately good starting point.  It seems like intro classes should get top priority, and following that theme, perhaps grad classes should get placed last.  Or perhaps an ordering should be tied to faculty members, not classes?  I was thinking every faculty member give their three top time orderings, and give preference to junior faculty members.  Other insights welcome.

Friday, July 23, 2010

"I Understood Your Talk"

Often, after I give a talk, the feedback I hear back is "I understood your whole talk," or something near that.  I take it as a compliment (although I don't think it's always meant that way).

When I present a talk to systems people, I think the statement is actually meant in gratitude.  Instead of trying to force some challenging theoretical method or computation on them in the space of an hour, I've tried to give the high-level overview of what we've done (and why), and provide some simple and understandable examples behind the work that went into it.  With any luck, there's an idea or technique in there they can use themselves sometime.  I have some suspicion that many of them have suffered through a fair number of theoretical talks that have left them behind, and were grateful not to have to sit through one of those.  Also, in my experience when systems people say they don't understand a systems talk that's a bad thing;  in that case, it's often that there are some significant nagging details that haven't been discussed sufficiently that are making the listener suspicious that there's some flaw or an important side case that hasn't been adequately addressed. 

When I present a talk to theory people, it's not always clear to me how to take that comment.  There's certainly a subculture in theory CS that seems to think it's important to show how complex your result is (or, perhaps, how smart you are), never mind the audience.  On the other hand, some theory results are so complicated it is truly a challenge to try to present a lucid 1-hour talk.  In some cases, people saying they understand the talk feels like a real compliment -- thank you for presenting things in an understandable way.  In some cases, it feels like a backhanded compliment -- if it's that easy to explain, you're not working on very hard stuff, are you?  When people say after a talk they didn't understand it, it's more ambiguous -- did the speaker do a bad job, or is this really exciting new difficult stuff that will take some time to learn?

Generally, when I plan my talks, the aim is to make almost all of it understandable to as large an audience as possible.  For specialized audiences, I'm happy to go into details, but for more general audiences -- the very large bulk of my talks -- I'll aim for simple when I can.  

Monday, July 19, 2010

New Paper: Popularity is Everything

Another new paper announcement:  Popularity is Everything: A New Approach to Protecting Passwords from Statistical-Guessing Attacks, which will appear next month at HotSec 2010, is online.  My co-authors are Stuart Schechter and Cormac Herley of Microsoft.  The idea here is that the real problem with passwords is that some are too popular, making them easy to guess.  Providers respond by forcing users to choose passwords that pass certain rules -- you must have a capital and lower-case letter, you must have a number, etc.  These rules are somewhat arbitrary and don't directly tackle the significant problem of popularity.  Our paper is about how that can be done.  (As you might imagine, from my involvement, some Bloom filter variant -- the count-min filter in this case -- is part of the solution.) 

This paper was one of those great examples of serendipity.  Stuart (who was a grad student at Harvard, before joining Microsoft) came back to give a talk.  I met with him and talked about problems.  We found a nice intersection point and, some months later, a paper appears.  As faculty we're often cajoling our students to go to the colloquia or to interesting talks outside their direct field -- and to, on occasion, talk with and meet the speakers.  I admit that it's time consuming, and not all talks end up being worth going to.  But if you give up on the chance to talk to people, including (and perhaps especially) people outside your direct problem space, you miss the chance for these wonderful bits of serendipity, where you can work on something new and different, and ideas can potentially cross between areas.

Friday, July 16, 2010

Facebook Movie Preview

It's nice to see Harvard finally getting some recognition in the mainstream media, with the upcoming Facebook/Zuckerburg movie the Social Network (based on Ben Mezrich's book The Accidental Billionaires: The Founding of Facebook: A Tale of Sex, Money, Genius and Betrayal -- Mezrich is also a Harvard grad... graduated the same year I did!) -- here is (one of many) pages with the latest movie preview with some analysis, and here is a Youtube link.  Somewhere in there I saw the words Maxwell-Dworkin (the Harvard CS building).  And the scene where he is in front of the Ad Board (the Harvard body responsible for disciplining students) looks amusing -- I want to see how close it matches reality in full.

Thursday, July 15, 2010

Arxiv Paper: Heapable Sequences and Subsequences

I haven't been blogging much;  consider me on extended vacation.  It's helping me prepare for giving up the blog soonish. 

But I thought I'd mention our newly posted arxiv paper Heapable Sequences and Subsequences.  I mentioned some time back I was looking at an Aldous/Diaconis survey on Longest Increasing Subsequences.  That was for this paper, where we look at a variation on this theme.  Say that a sequence is heapable if you can sequentially place the items into a (binary, increasing) heap, so each new item is the child of some item already in the heap.  So, for example, 1 4 2 3 5 is heapable, but 1 5 3 4 2 is not.  Then, just as you might consider increasing sequences and subsequences, you can consider heapable sequences and subsequences.  We (as far as I know) introduce this problem and provide some first results, as well as a lot of interesting open problems.  We give a simple (greedy) algorithm for determining if a sequence is heapable, but show that determining if a sequence is perfectly heapable (so the heap is a perfect binary tree) is NP-hard.  We show that the longest heapable subsequence of random permutation of [1,n] has length (1-o(1))n.  And we show some other thing as well.

The paper ends up being essentially combinatorial in nature, but we were actually thinking of a more economic motivation when we started the problem.  It's meant to be a variation of in the space of hiring problems (like the secretary problem);  here you are hiring for an organization with an org chart -- say in the shape of a perfect binary tree -- and your hiring restriction is that each node is the boss of its children nodes, and therefore must have a higher ranking than those nodes.  (One would, naturally, argue whether in practice a boss is always higher-ranked than the employees directly reporting to them, but one can always consider noisy variations in a future paper.)  Now this is like a multi-hire secretary problem, but with restrictions. 

Given the large number of papers related to longest increasing subsequences, it seems like there might be a lot of interesting things to find in looking at this variation on the them.

Sunday, July 04, 2010

(n)On Vacation

Despite appearance, the blog hasn't yet disappeared.  I was on family vacation somewhere tropical.  

Even though I was on vacation, it didn't always feel like vacation.  With a consulting engagement that required a full-day phone call, several papers to deal with, some unfortunate emergency administrative stuff that came up for a later trip this summer, and various other administrivia, work was with me more on vacation than I would have liked.  I'm not the sort that minds bringing some work along on vacation (often to my wife's annoyance), but this time it felt, even to me,  like much more of an intrusion.**

The same seemed to be true for my brothers and sister-in-law, however.  They're all in business (of various sorts) and having to deal with mail, sit in on conference calls, and otherwise work while nominally on vacation seemed like the norm.  There's been plenty written in mainstream media about this (a little random searching yields examples like this), so it's not exactly new, either for me or my family.  But the magnitude of it -- and the expectations of others about one's accessibility -- definitely seems to have increased substantially since our last similar family vacation.

On a day-to-day basis, I don't mind the "blurring" of work and non-work time;  I appreciate that my job lets me walk my kids to school in return for having to put in some hours after they go to bed.  But as a society the level of this blurring seems very dramatic, and I think we'll be figuring out the consequences for years to come.

**I'd exclude my co-authors from this statement.  They've actually been quite lenient and understanding;  I admit, however, to some guilt in being busy the week before the SODA deadline.