My Biased Coin: April 2008

Wednesday, April 30, 2008

STOC 2009 : "Impending Doom"

Another surprise announcement: I was asked to be the PC chair for STOC 2009, and I accepted.

I have to admit, I was surprised to be asked, as I am, after all, somewhat vocal in my opinions, which are not always popular. (I was going to label myself a crank, with examples, here, here, here, and here, before someone else did, but I prefer "vocal".) Once asked, I found it hard to decline, despite the warnings that the job is a lot of work (and arguably little reward). It is an honor to be asked, I do believe in service for the community, and, most importantly, I felt it might lead to interesting fodder for the blog. (Obviously nothing confidential will be discussed, but the challenges of the process -- an inside view -- might be interesting.) It was something I was hopefully going to do once in my life. And (as Cynthia Dwork nicely pointed out to me, when I asked her about the job), I'm only getting older, and will have less energy.

Some people might be worried, given my noted worldview, that I might set out to change things drastically. I was thinking it would make a great joke to take "competitive analysis" off the list of topics of the call for papers, only to find it wasn't actually there. Rest assured that things will probably change incrementally; I respect the traditions, and I view the main part of this job to be selecting and empowering a solid, talented PC to do their best job.

The one change I'd really like to implement, so much so that I have to let people object already, is to introduce the rating system that conferences like SIGCOMM use:

5: top 5%
4: top 10%, not top 5%
3: top 25%, not top 10%
2: top 50%, not top 25
1: bottom 50%

I like this approach much better than trying to guess what everyone thinks a 7 means, or differentiating between a 6.3 and a 6.6. (Though, depending on the projected acceptance ratio, I could see changing the numbers a bit -- to top 10%, 10-20%, 20-33%.) I think this approach makes it easier to concentrate attention on controversial papers that need attention.

Tuesday, April 29, 2008

Barbara Grosz now Dean of Radcliffe

Barbara Grosz was officially appointed dean of the Radcliffe Institute for Advanced Study. Barbara had previously been the dean of science for Radcliffe and took over as Interim Dean of the Institute when Drew Faust was made President of Harvard.

For those who don't know Barbara, she's known for her work in AI, and has been a professor at Harvard since 1986. I took her class and TA'ed for her as an undergraduate. I'm excited this has been made official, because I was getting tired of using the word "interim" whenever discussing Barbara's status.

Harvard computer science continues its tradition of being quite "outgoing". Harry Lewis was Dean of Harvard College, Mike Smith is currently Dean of the Faculty of Arts and Sciences, and now Barbara Grosz is Dean of Radcliffe. Per capita, we're well above average in Dean-ness. The joke around the department is once you get tenure you have to watch your back, or someone might make you Dean of something....

Friday, April 25, 2008

Book Management

Talking about a "group-book" is a lot of fun. Actually putting one together, maybe less fun.

I like to think this project will be self-promoting, if it just gets started. A few people write their chapters. I put them online, blog about them, maybe push others to blog about them, a few more people think that's a good idea and look people are actually doing it, a few more chapters come in, and so on. Soon we have 20 or so chapters, and we think an actual book. (Maybe soon after that it catches on so well we get 40 chapters, and we make an omnibus.) To me, though, the hard part is getting started.

We need some rules of management, and if I'm currently the editor assigned to your paper for say SICOMP, you might now that management is not always my strong suit. (In self-defense, I'll say it depends on the situation. Corralling referees is not my best skill. And probably not corralling authors, leading to the failure the first time around.) So here, I'm happy to take advice. Send advice! Indeed, I've had a prod or two from people who'd be interested in co-managing/editing this project, so it won't be just me-- which is great!

I'll eventually work it out with co-editor(s), but here's my rough thoughts -- totally open to change.

1. We'll send annoying e-mail to people who we might like to write chapters. (Those who were on the old list, you're not off the list.)
2. People who want to write chapters can pick (and send me) a topic, so we don't get many people writing on the same topic.
3. Authors can make their participation public, so their name and a working title goes on the Web site, or private, so they can back out.
4. There has to be some notion of deadlines -- if you say you're going to write a chapter, after time x, you have to back out (and someone else might take the topic), or show significant progress.
5. I'd hope that several people might take time over the summer to write chapters, and then maybe some others over the fall, and so on. So if you think you can spare cycles over the summer to put a draft together, let me know!

Thursday, April 24, 2008

Writing a Chapter, and Answering Some Questions

Yesterday, I posted the chapter I had drafted on codes for the hoped for book on CS aimed at high school sophomores. (You'll also note I now have a link to Bernard Chazelle's marvelous essay, The Algorithm: Idiom of Modern Science. Bernard agreed to "donate" this essay to this book projects two years ago, and has again given permission for its use.) I just wanted to clarify, the chapter did not take long to write. There was some background time in thinking about what I wanted to present, and time for looking a few things up. The actual writing, though, took something like 3 half-days during the summer, and that includes pictures. While your mileage may vary, I don't think this project should be a time sink for anyone. I'd like to think that if you made it a priority, it could be done fairly quickly.

The chapter was designed to cover what I think are important features:

A simple introduction, based on (hopefully) easy-to-follow examples.
An explanation of why this is important in "the real world".
Important past results, that are known and understood. (Here, Reed-Solomon codes and LDPC codes.)
A goal for the future. (In this case, network codes, and figuring out how they might actually be used in networks.)
Links/pointers to relevant references.

I don't think the chapter is perfect. (Feel free to send me suggestions.) But I do think this chapter is good. And that is a key point that answers many of the questions I receive about this project, about why I choose to do things one way or another. Right now, there really isn't a book/resource like what I think I have in mind. I'd rather have something that's good up in a year, rather than aim for something perfect that will take forever. "The perfect is the enemy of the good" applies here in my mind.

Other questions:

1) Why not do it in a wiki?

Answer: I've explained some in comments-- I don't think a wiki is appropriate for a "creative writing" exercise. Also, I'd like some actual commitment from authors, and I'd like them to get credit for their work. (If this all works out, you can put this on your next NSF grant application as your broad educational service.)

2) Why focus on "theory"?

Answer: It's that perfect/good thing. I'd like something coherent, that I'm qualified to manage, that can be done in a reasonable time frame. That means some focus, and I'm choosing to focus on theory. (And theory, in my mind, needs the PR.)

If this works out, we can expand to other volumes. Heck, I'll manage/edit volume 2, communications and networking. Then I'll hand it off to someone else.

3) Who do you want to volunteer?

Answer: I'll be honest -- I have a very strong preference for Ph.D.'s (over graduate students), and I'd generally prefer "name" professors/research scientists -- we need at least some of those, so we can start developing some of that cult of personality in our field, and so the book will carry some weight with publishers and other outsiders. But I'd be open to any and all good ideas.

4) Timeline?

Answer: Well, I'll talk about "management" more next post. But I'd hope that a dozen or so people could be convinced to take some time over the summer -- maybe a week or so -- and write up a chapter draft. But since last time that sort of schedule seemed too optimistic, I'd be thrilled to get chapters by the end of the year....
Next post: ideas on management?

Wednesday, April 23, 2008

The TheoryCS Book

I'm glad to see interest of various kinds in the Book on Computer Science (I really need to come up with a catchy name) I talked about last post. And yes, as many of you correctly interpreted, I'm looking for volunteers!

To flesh the idea out for you, here is (with some small updating) the original proposal I sent out for what I had in mind. For those who don't like clicking, here are what I wrote as the main rules:

The intended audience you should have in mind when writing a chapter is smart high school sophomores. I see this as the most important rule. The goal is a book (or, more concretely, a collection of chapters) the general populace can pick up and read, to learn about computer science and why it is interesting. We want to hear people say: I decided to try computer science because of this book.
This rule has some corollaries. First, fewer equations, more pictures. Second, lots of examples. Third, simple language. I do not mean to suggest that we talk down to the audience, or assume that they are mathematically ignorant, but that we assume as little as possible, and try to reach the broadest audience.
Each chapter should explain something already known and understood, and why it is significant.
Each chapter should explain a specific, but big, open problem that one hopes might be solved in the next 10-20 years, but probably not before that.
Each chapter should be roughly 15 book pages. If we can have 20 chapters, that will be about 300 pages, which is plenty. That's a guideline; I expect some variance.
Each chapter should include at least 3 easily available references for more information.

I've had many people tell me they think this type of writing is hard -- that is, writing for a general, non-research audience -- and I understand that. This task won't be for everyone. But if, as a field, we can't articulate to a general audience what we do and why it's interesting -- and in particular, if we can't provide some idea of where we hope we'll be in specific subareas a decade or so from now -- well, then I think we're in big trouble in the big landscape of science. I'm sure we can do it. I also think we haven't really tried hard enough to do it in the past, which explains things like our current NSF budget.

If you're wondering what such a chapter might look like, I have my chapter from back in 2006 available. (It's on coding, naturally.) I'll talk more about this chapter, and how much work a chapter is, next post.

Tuesday, April 22, 2008

Failure, and a Second Try

About two years ago, I went through a huge failure on a project. In the back of my mind, I always wanted to use this blog as a jumping off point for a second chance for this project. Now is the time. Perhaps some of you can help me fix this failure.

Back in early 2006, while I was on what's now known as the SIGACT Committee for the Advancement of Theoretical Computer Science, I was thinking about ways to promote theoretical computer science (TCS). One idea that struck me is that most people entering college from high school know nothing about TCS. If they've had exposure to computer science, it has been programming. Compared to the excitement of biology (we're figuring out the building blocks of life!) or physics (we're figuring out the building blocks of the universe!), computer science must seem to the average high school student to be, scientifcally speaking, a wasteland.

Now, we know this isn't true. Those of us in TCS wouldn't be in it unless we thought there were fantastic, mind-bending, potentially world-changing problems in the field. And I think history is on our side; computer science, and in particular TCS, has changed the world, fundamentally, and continues to do so. I don't think, though, we do a good job of promotion, showing the world, and in particular high school students who we might hope to attract to our field, the excitement of TCS. Somehow, our field still attracts talent, but it's an uphill battle. If we could change that, I thought, we could change the perception of our field, attracting top students earlier, and long-term creating a greater appreciation of TCS in computer science and across the sciences.

So in 2006, I dreamed up a project. I wanted a book that could be handed to high-school sophomores, to give them an idea of what TCS is all about. I wanted a book I could give to friends' kids -- heck, to my own kids, when they got to the right age (and wanted to really know what Dad did with his time). I wanted a book that would be interesting, engaging, and cover the wide swathe of work we do.

So, naturally, I knew I couldn't write this myself.

Instead, I wanted a group-book, with experts writing chapters in their areas of expertise. I reached out to about 40 or so people in the TCS community, asking them to write a chapter. I hoped first drafts could be done over summer 2006. And that by the end of the year, we might have a book ready for a publisher. About 1/2 the people agreed, and I sent occasional e-mails reminding and encouraging people.

And nothing happened. I think lots of people started, but never finished, their chapters. I heard a lot about how it was difficult to write chapters for the target audience, high school sophomores. It seemed for everyone that other projects, other papers, other things took priority. Disheartened, and busy myself, I let it go.

I still thought the project was a good idea. I still think so. So now, over the next couple of weeks, I'm going to revive the project here on the blog. I hope to change the nature of the project a bit, so my initial management failure won't necessarily repeat itself. It's something I've wanted to do since I started the blog, but I've kept putting it off.

I think it's time to try again. More to follow.

Monday, April 21, 2008

Books from my Shelf, Part 3

Today, let's look at some more practical books from the shelf.

Network Algorithmics, George Varghese: Even before he wrote this book, George Varghese had essentially written the book on how to get good performance on your network devices (e.g., routers) by smartly using algorithms combined with systems thinking. George is my role model in this area, even moreso now that I've worked with him on a few projects. If you want to see how network people are thinking about and using algorithms for packet classification, prefix-lookups, switching, scheduling, security, and so on, pick up this book.

Mining the Web, Soumen Chakrabarti: Sigh. Has it really been over ten years (no need to get more specific than that!) since Soumen and I were graduate students at Berkeley together? Soumen's one of those people who could have been a great theoretician, but was always lured by that "real world" thing, and instead became an expert on information retrieval and the web, which lets him still do theory now and again but with an eye toward problems of more immediate importance. A great book for everything you might want to know to start building your own search engine. Web crawling, information retrieval, clustering and classification, etc. And lots of algorithms and algorithmic insights.

TCP/IP Illustrated, Volumes 1 + 2, Stevens and Wright: At some point, I was looking at TCP for research purposes, and this was a great book for understanding the fine details. It may take a little looking, but the details are in there. Volume 1 also provides background on a lot of basics that I think are useful for networking research. Finally, I also think it would be useful for theorists to look over this book at some point is to get some insight into real-world systems-thinking.

Sunday, April 20, 2008

DF Splash

I haven't had any official connection to Digital Fountain for some time, but I do still (I think) own an epsilon fraction of stock, so I check in the website every now and again to see what they're up to.

They have a web site for a beta/demo for their new "TV-Quality Streaming Video CDN" product, DF Splash. If any one or ones would care to review it -- in the comments, or as a guest post -- let me know. I'm curious what people think. I'd review it, but I'm swamped (CDI proposal due!), and I'd actually rather hear what other people think, if anyone is willing.

I watch far, far too much TV. I've been pretty unimpressed generally with the TV on the Internet experience -- I haven't gotten into YouTube -- but I have been enjoying www.hulu.com. I'm trying to use it to catch stray episodes I've missed on shows I watch (or watched) regularly -- instead of, say, staying up late hours to have my own mini-marathons of Bewitched.

Wednesday, April 16, 2008

Taxes and Algorithms

I'll continue with my walk through my bookshelf shortly, but as tax day just came and went, a thought or two on taxes, with an algorithmic bent...

Ideally, the computation of taxes is a (relatively simple) algorithm. Ostensibly, there should be a set of numbers one gives as inputs, and then an output-- the amount of tax you owe or are owed -- is produced. This is the basis for tax software programs -- you plug in the numbers, and off you go.

In real life, things are much worse. The required inputs are, at best, vaguely defined -- leading to such fun questions as when I make a charitable deduction by buying a ticket to a charity dinner, how much am I not allowed to deduct because of the value of the dinner -- and subject to interpretation. In computer-speak, garbage in, garbage out. I'm amused by reports like the one where Money magazine sends the same information to 50 tax professionals and ends up with 50 different tax returns (and a large spread in what is owed), or reports that in tests of IRS call-in helplines that the wrong answer is given over a third of the time. And when exceptions or odd cases occur, as they do often with the way the tax code is set up (AMT, anyone?), things rapidly become incomprehensible.

Naturally, I think there are lessons here. I actually find many tax forms are simple to understand, when I understand what the end goal and result is supposed to be. That is, it's much easier to understand an algorithm when one understands the goal of the algorithm, and there's some natural connection on how the steps of the algorithm are getting you to that goal. The problem with many tax forms is that they give the steps of the process without clarifying the goal, so I can't tell if my situation is a standard case or a special case that requires more time and attention. When theorists explain algorithms to outsiders, we ought to keep in mind that they might not understand special cases or detours in the argument that seem natural to us. We should keep the end goal clear throughout our explanation. And I admit I'd be thrilled if the US moved toward a tax system that was algorithmically speaking comprehensible (at least to me).

Another way of viewing this lesson is that there are (or should be) significant differences between algorithms designed for a computer to use, and algorithms designed for a human to use (with or without the help of computers). Humans, for instance, often like to understand the output produced, for their own satisfaction or so they can explain it to others. Computers don't care. In my past foray into human-guided search systems, this was one of the lessons I learned. In theory, we often think of an algorithm as a set of instructions transforming an input to an output. Sticking too closely to that worldview can be quite limiting; algorithms can be objects that humans -- or other agents -- interact with, leading to different design goals than we commonly consider. Even if you could just plug numbers into a computer and have your tax bill pop out, I think many people (myself included) would be unhappy, if we didn't have some understanding of how that final number was derived.

Tuesday, April 15, 2008

Books from my Shelf, Part 2

Continuing a list of some interesting books...

Reversibility and Stochastic Networks, Frank Kelly: This was the book that got me interested in queueing theory style mathematics -- well, really it was a course I took from Frank Kelly, based on the book, and Frank then gave me a copy. The book is out of print, but nicely available online from the link. It's all about Markov chains, queues, queueing networks, and various applications. While I'm sure there are more "modern" treatments, I think the book holds up quite nicely.

Queueing Systems (Volume 1, Theory), Leonard Kleinrock: Another classic with all the basics of queueing theory, including all the fundamental results of all the basic types of queues. (While Volume 2, on Computer Applications, is also interesting, I think Volume 1 is much more useful.)

A First Course in Stochastic Processes, Karlin and Taylor: OK, another classic probability book. (Yes, there's still more of them to come.) Markov chains, renewal theory, Brownian motion, branching processes, etc. (While the Second Course, with more advanced material, is also interesting and useful, I think the first book is much more important.)

Introduction to Probability Theory and Its Applications (vols 1 and 2), Feller. Again with the classics. The classics are apparently classics for a reason, at least in my mind...

Monday, April 14, 2008

Books from my Shelf, Part 1

Commenters have told me they want some book recommendations, so here I go... since there's a number of books, I'll be breaking this post up.

But first, a caveat, I'm not a big collector of academic books. I was spoiled by Berkeley's libraries as a graduate student, and am spoiled by Harvard and MIT's libraries now. But here are some of the books on my shelf that I find useful (besides, of course, the obvious Mitzenmacher and Upfal, which everyone must have on their shelf). Some are useful for theory, some for practice, some for... well, hopefully the descriptions will give some idea. In no particular order, here are the first 5, with more to come...

Handbook of Algorithms and Data Structures, Gonnet and Baeza-Yates: Apparently out of print, but well worth getting used. This is the state of the art in hashing, search algorithms, sorting algorithms, priority queues, text search, etc. as of 1990. And one of the best sources if you need to look up, say, the formulas for the expectation and variance for the number of accesses required under linear probing hashing to insert the (n+1)st element into a table, or pretty much anything else in that vein. Chock-full of formulas and references, this is my go-to book for looking up the basics. Someone should write an updated version of this classic.

Elements of Information Theory, Cover and Thomas: This book is the classic standard introductory text for information theory. And it's just a darn fine book, well written and covering all the basics so you could, for example, skim it over and then go to information theory conferences and act as though you sort of belong.

Modern Coding Theory, Richardson and Urbanke: OK, it's not yet on my shelf, I still have to get a copy. This book is for everything you've wanted to know about low-density parity-check codes by two of the leading experts in the field. I'm not sure I'd recommend it generally to everyone, but I would to everyone interested in coding theory in general (and LDPC's in particular).

Information Theory, Inference, and Learning Algorithms, David MacKay: This book, which I've enjoyed since it came out, now lies somewhat interestingly between the above two information theory books. It's a bit more basic and broad, covering fundamentals of compression, coding, and Bayesian inference, while also covering topics like LDPC codes and neural networks. The book is a bit idiosyncratic, and naturally, since I enjoy David's work, I enjoy the book.

Complex Graphs and Networks, Chung and Lu: I like having this book around so I can pull up Chapter 2 for reference -- all about tail bounds, with an emphasis on variants of martingale bounds (Azuma's inequality) that cover more obscure situations that arise and are hard to find easily elsewhere.

Thursday, April 10, 2008

Things I Can't Talk About

If you're a regular reader, you've probably noticed that I haven't been posting as much as usual lately. Unfortunately, that's because I've been spending too much time on things I can't talk about.

Specifically, for much of the last two weeks, I was at a trial in San Francisco as an expert witness. Luckily, one week of it was during Harvard's spring break, so I didn't miss too much class time. But the case naturally took almost all of my energy and work hours. Not only did I not have much time to blog, but I didn't even have much time to think about things to blog about. Hopefully, that will change, and I'll soon be back to my usual self.

A natural topic to blog about, of course, would be about what it was like being an expert witness. Such a topic fits within the scope of the blog. But the case is just still too close. Instead of talking in generalities, which I think is reasonable and still professional, I'd run the risk of discussing specifics, of either the case, the client, or the attorneys I worked with -- which I definitely think is not, especially in a blog context.

As I pondered this dilemma, it occurred to me that there are plenty of other things still within the realm of professional life that I can't (or, to be clear, I choose not to) talk about. Details of PC meetings is not suitable blog material, as are details of the meetings of our CS faculty search committee -- and pretty much the last few weeks, when I wasn't busy with the case, I was busy with the ICALP PC or the CS search! Nor are discussions of my work on the "Administrative Board", Harvard's rule-enforcing body, for which every case is meant to be entirely confidential. Heck, we aren't even supposed to talk about what NSF panel we serve on when we serve on NSF panels, which I find a bit extreme. (Who couldn't figure that out if they wanted to?)

It is frustrating, since part of the reason I started to blog was because I like to talk about things. But especially because a blog is a public, permanent record, there are interesting, worthwhile, and even entertaining topics... that I can't discuss.

Wednesday, April 09, 2008

The ICALP crossover game

For those who like this sort of trivia -- is there anyone who has had papers in ICALP tracks A,B, and C? (Different years are permissible.) Who has had papers in two of the three tracks?

Track B, being far out of my usual scope, seems hardest to get to me.

Tuesday, April 08, 2008

ICALP A list of papers up

The list of papers for ICALP (Track A) is now up.

This was a hard PC to be on -- lots of papers, and in fact lots of good papers, made it especially hard to draw the somewhat arbitrary dividing line. Congratulations to those who had papers accepted, and for the many that didn't, be aware that the decisions were very, very difficult to make.

Tuesday, April 01, 2008

Jennifer Rexford's take on "Practical Theory"

Jennifer Rexford, who I've mentioned before is a networking person who appreciates theory, gave me a rumor that she'll be giving a talk at STOC on "networking questions relevant to the theoretical CS community." I don't see the program up yet, but I hope it's true. She's a great speaker, with excellent perspective. Kudos to those who thought to invite her!!!

She also pointed me to an editorial she wrote on her ten favorite "practical theory" papers. Naturally, I enjoyed reading her perspective. Particularly since her perspective included my name (flattery, indeed, will you get you everywhere-- or at least on my blog).

I think, however, her editorial offers several lessons. Here's my top two. First, her notion of a theory paper is a little different than ours. Most of the papers are by networking people who think mathematically and theoretically (Varghese, Towsley) or more attuned to what I think of as EE theory. This first lesson is a reaffirmation of my long-standing notion that the FOCS/STOC view of theory is, in many respects, rather limited, especially to our peers outside of theory. I think a second related lesson goes to the importance of theory people getting out there and making their work more accessible to non-theorists. In my case, that has meant writing survey articles, giving talks for general audiences and not just theorists, and going to networking conferences to talk about my work. I'm sure there's plenty of other "mainstream theory" work that Jennifer and others from networking would greatly appreciate -- if only we as a community did a bit better job of letting them know about it.

My Biased Coin