Sunday, March 30, 2008
The SPAA 2008 list of accepted papers is up. (I was on the PC... standard rules apply: I take credit if you like the program, but pass the blame on to others if you don't.)
Still in the throes of the ICALP PC process...
Thursday, March 27, 2008
Security, and Class Projects
As I was reading Freedom to Tinker, catching up on the latest reports of flawed voting machines, I remembered my favorite class project of all time.
My first year teaching Algorithms at the End of the Wire, I included a subunit on cryptography/security. (This was before Salil Vadhan arrived, and before Michael Rabin started regularly teaching a crypto course.) One group, for their final class project, decided to explore the potential security flaws in the Crimson Cash system, the local system where students put money on their ID. They got a card reader and figured out how to intercept and spoof messages from the vending machines in the Maxwell-Dworkin lobby. It was a standard man-in-the-middle attack -- you intercept the message from the vending machine so your account doesn't get debited, and tell the vending machine the message went through. For their demo, they showed how they could get a free soda. They got to learn about security by breaking an actual system (which, by the way, in retrospect was something of a bad idea -- next time students try to break system security, I'll make sure they do it in a closed, lab-type setting).
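(For the curious, here is a purely hypothetical sketch of the shape of that attack. Nothing below reflects how Crimson Cash actually worked -- the toy protocol, the function names, and the messages are all invented for illustration; the point is simply that an unauthenticated acknowledgment is trivial to forge.)

```python
# Toy man-in-the-middle illustration on an invented, unauthenticated
# payment protocol. Entirely hypothetical -- no relation to any real system.

def account_server(request):
    """Pretend back end: debits the account and returns an acknowledgment."""
    if request["type"] == "debit":
        print(f"Server: debiting card {request['card_id']} by ${request['amount']}")
        return {"type": "ack", "status": "ok"}
    return {"type": "ack", "status": "error"}

def honest_relay(request):
    """A well-behaved channel simply forwards the request and the reply."""
    return account_server(request)

def mitm_relay(request):
    """The attack: drop the debit request and forge a successful reply,
    so the machine believes payment went through but no account is charged."""
    print("MITM: intercepted the debit request; not forwarding it")
    return {"type": "ack", "status": "ok"}  # forged acknowledgment

def vending_machine(relay):
    """The machine trusts whatever acknowledgment comes back over the wire."""
    reply = relay({"type": "debit", "card_id": "12345", "amount": 1.25})
    if reply["status"] == "ok":
        print("Vending machine: payment accepted, dispensing soda")

vending_machine(honest_relay)  # account debited, soda dispensed
vending_machine(mitm_relay)    # no debit, soda dispensed anyway
```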
It was great stuff. All CS majors should do some sort of a-little-bit-out-there, hands-on project like that. And then, maybe, we'd have better voting machines.
Labels: graduate students, teaching, undergraduate students
Wednesday, March 26, 2008
Knowing Harry Lewis Can Get You Hired
For that epsilon-fraction of the world that regularly reads my blog but not the complexity blog, here's a great post about the awe-inspiring power of just knowing Professor Harry Lewis. Check out the comments (I think Harry is teasing me again). And for anyone skeptical that Harvard is a high-powered CS institution, just check out Harry's list of his past Teaching Assistants (TFs, in Harvard-speak), and see how many names you recognize...
Friday, March 21, 2008
CS 124 and xkcd
A few of my students in CS 124 (Algorithms and Data Structures) pointed me to this xkcd cartoon -- amused, no doubt, that we had covered the O(2^n n^2) algorithms in class about a week ago. And who says that you don't learn anything important in my class!
David Eppstein was inspired to write a longer blog post on the algorithm, well worth reading (especially if you're in my class!).
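For anyone curious where the O(2^n n^2) comes from, here is a minimal sketch of the standard Held-Karp dynamic program for the traveling salesman problem -- presumably the algorithm the cartoon (and David's post) have in mind. The table indexed by (subset of cities, last city) gives the 2^n factor, and the two inner loops over cities give the n^2. (The distance matrix at the bottom is just a made-up example.)

```python
from itertools import combinations

def held_karp(dist):
    """Held-Karp dynamic program for TSP: O(2^n * n^2) time, O(2^n * n) space.
    dist[i][j] is the cost of going from city i to city j; the tour starts
    and ends at city 0."""
    n = len(dist)
    # dp[(S, j)]: cheapest way to start at 0, visit every city in S, end at j.
    dp = {(frozenset([j]), j): dist[0][j] for j in range(1, n)}
    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            S = frozenset(subset)
            for j in S:
                dp[(S, j)] = min(dp[(S - {j}, k)] + dist[k][j] for k in S if k != j)
    full = frozenset(range(1, n))
    return min(dp[(full, j)] + dist[j][0] for j in range(1, n))

# A small made-up instance; the optimal tour 0 -> 2 -> 3 -> 1 -> 0 costs 21.
dist = [[0, 2, 9, 10],
        [1, 0, 6, 4],
        [15, 7, 0, 8],
        [6, 3, 12, 0]]
print(held_karp(dist))  # 21
```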
Saturday, March 15, 2008
Error-Correction Mechanisms for Publications
I very much like the conference-based publication system of computer science. But an obvious problem with the system -- which mathematicians sometimes throw back in the face of CS theory researchers -- is that this system leads to buggy results getting published and accepted by the community. (In my last post, I talked about the headaches this issue can cause PC members.)
This problem could be ameliorated if as a community we had some standard ways of reporting or dealing with such errors. But I don't think we really do. Occasionally an author will self-report an error or fix something for a journal version, but I imagine errors slip through more often than we'd care to think about. Perhaps it isn't really a problem; for big, important papers, bugs will be found and knowledge of them disseminated. But for smaller papers (which, let's face it, is most of what actually gets written -- even in the major conferences), there doesn't seem to be a process -- in fact, even trying to suggest that there's a bug in someone's work can get your own paper killed.
Yes, I'm unhappy to report, this happened to me. Once, on a paper, a student found a bug in some previous related work, and thought it important to mention it in the conference submission to deal with possible questions about how our work related to this other paper. [Since he's job-hunting, I feel I should say this was NOT Adam.] After going back and forth, I agreed that we could mention in a footnote that there appeared to be an error that we were discussing with the author. (The author took a while to admit there was an error, and in fact the student helped suggest a fix.) The PC sent back nasty reviews, with one even suggesting that our action was unprofessional. I, obviously, disagree. This was a submission, ostensibly confidential, not for publication (the PC could have asked us to remove the footnote if they objected). We were in contact with the author and trying to clarify and fix the bug we found. How the heck else were we supposed to let the committee know what was going on, if they felt it important? If they felt it wasn't important, it was just a footnote they were welcome to skip.
This attitude, I think, stems from the fact that, on the whole, we're a very pleasant, non-confrontational area of science. Fights in CS theory are rare; most people get along (professionally) quite well. From what I've seen, with rare exception, we're much less confrontational than other sciences. So somehow mentioning out loud that someone might have made a mistake is not considered good form. Again, I may be wrong, but other sciences seem less sanguine about such things.
Of course, the underlying problem in this incident, and others I've seen as a PC member and in other contexts, is that we don't have an error-correction mechanism that allows one to report bugs or suspected bugs in published work. Perhaps we're better off without it -- maybe it would just take up time and provide little value. But I've never heard it even come up as an issue to be thought about and discussed by the community. Perhaps it should.
Thursday, March 13, 2008
Conferences and Correctness
A post by Mihai Patrascu brought up the thorny issue of how a program committee should handle papers that may or may not be correct. This is clearly a hard question, and it can cause real headaches, since determining correctness is often not a simple task.
One approach, when an issue arises, is to inform the author(s) and ask them to clear it up. I recommend this, but it doesn't always work; there may be disagreements between the author and the reviewers, and there is usually only a limited amount of time to settle the problem.
Also, this seems to place an onus on the author that may be unfair: you must convince us (beyond what you've already written) that your proofs are correct. Now, that might not sound like an unfair onus, but for a difficult proof it may be very challenging, particularly since initially all that's been asked for (at most conferences) is a 10-page abstract and not the "final" version. Moreover, it's unfair because it's not something you're really asking of all the other authors. Sure, nominally you are, but papers with bugs get through often enough. Sometimes we make mistakes, and arguably there is some unfairness in a policy that insists that suspicious paper X be defended until all reviewers are fully satisfied while papers that didn't raise suspicions pass right on through.
As I've mentioned, I think the main job of the PC is to prioritize what papers get in a conference, and a secondary job is to give feedback to the authors. As part of that job, naturally, we want to throw out papers that are wrong and let the authors know about mistakes. But my argument is that conferences (and PCs) are not designed to accurately tell if papers are completely correct. If they were, I wouldn't have 45 papers to consider in a bit over a month, and I wouldn't be given papers on topics like quantum computing to judge. Ensuring correctness is nominally what journals are for, and that's why (in general) journal articles can take more time to review, and why one seeks experts for reviewing.
I'm not sure what a good solution is, and I'm not advocating blindly giving benefit of the doubt to papers that appear wrong. But there should be a high-level understanding that the conference procedure is inherently noisy. Mistaken proofs might be published. Perhaps the problem we should turn our attention to is how, as a community, we handle these errors, including arranging for such errors to be corrected or noted (by the authors or otherwise) in an appropriate fashion.
Thursday, March 06, 2008
What is the Purpose of Program Committees?
The subject of Program Committees and conferences seems to be a timely one; besides my recent posts, here's an interesting post on the theme by Mihai Patrascu and some counter-thoughts by Daniel Lemire. Here are some more thoughts.
It's actually important that, as a community, we have a good and relatively consistent story about what conferences are for, for many reasons. Funding of conferences, certainly. So students know the rules of the game coming in. So we all know how publications in conferences should, or should not, affect hiring decisions.
As a practical matter, it is also useful to have a reasonably consistent story for specific conferences about what their goals are, so that the Program Committee can perform its function appropriately. A reasonable question is why we have PCs at all; many other fields don't.
When I'm on a PC, I think my primary job is to prioritize the papers to help determine which ones make it in. In a way, it feels somewhat depressing that this is (in my mind) the main job of the PC. I do believe quality control and helping guide the direction of the community are important jobs, and that this is a powerful method for both. But in the end there is always a non-trivial bit of arbitrariness around the boundary, whether you have 60 good papers for 40 slots or 20 good papers for 40 slots. [Joan Feigenbaum has suggested to me that we should be much more explicit about this as a community; otherwise (and these are my thoughts, not Joan's), we foster the false notion that conferences are perfectly fair and essentially correct in their decisions, a standard which is impossible to reach and which leads to time-wasting measures like endless PC discussions and, shudder, author rebuttals.]
I also think my secondary job is to offer what feedback I can to the authors. But really, there isn't sufficient time for detailed criticism, given the way theory PCs are set up. I once told an AI person I was working with that I was on a PC and had 50 papers to read, and he couldn't believe it. Apparently for AI PCs something like 10-20 papers is the norm, and 20 would be considered high. If we're going to make feedback a higher priority in the role of the PC, we're going to have to increase PC sizes dramatically and restructure how they work. The way they're set up now, there's hardly time to read all the papers, never mind read them in sufficient detail to offer significant constructive suggestions. (That's what peer-reviewed journals are supposed to be for.)
With this in mind, I'll also throw out two wacky ideas that I'd like to see conferences try.
1) Instead of numerical scores, each PC member just gives a ranking of the papers they've read. Then use some ranking algorithm to give a first cut of where papers fall (instead of the numerical averages PCs use now). I think this would reduce arbitrariness, since the variance in how people assign numerical scores would disappear, but it would take an experiment to tell. (A toy sketch of one such aggregation appears at the end of this post.)
2) Rather than assign each paper to three people for a detailed review, initially assign each paper to five (or more) people for a quick Yes/Maybe/No vote, and chop off the bottom 50% (or whatever the right percentage is). My intuition is that, statistically speaking, a larger number of less accurate votes is as accurate as (or more accurate than) a small number of more accurate votes -- accurate enough that we can filter out the bottom half or more up front and then spend more time on the quality papers. (A second sketch at the end of this post plays with this intuition.) The negative is that the bottom half would necessarily get even less feedback than they do now. (I think I heard something like this idea was used in a networking conference; in my limited experience, networking PCs are much more ruthless than theory conferences about quickly finding and putting aside the bottom half or more of the papers to focus on the good ones.)
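To make idea 1 a bit more concrete, here is a minimal sketch of one possible aggregation, assuming each PC member hands in an ordered list of just the papers they read; it uses a normalized Borda-style score, though any rank-aggregation method could be swapped in. The paper IDs and the whole setup are invented for illustration.

```python
from collections import defaultdict

def aggregate_rankings(rankings):
    """Combine partial rankings into a first-cut ordering of the papers.
    Each element of `rankings` is one PC member's list of paper IDs,
    best first, covering only the papers that member read. Every paper
    gets a normalized Borda score in [0, 1] from each ranking that
    mentions it, and papers are sorted by their average score."""
    scores = defaultdict(list)
    for ranking in rankings:
        m = len(ranking)
        for position, paper in enumerate(ranking):
            # Best of m papers scores 1.0, worst scores 0.0.
            scores[paper].append((m - 1 - position) / (m - 1) if m > 1 else 1.0)
    return sorted(scores, key=lambda p: -sum(scores[p]) / len(scores[p]))

# Three reviewers with overlapping but different bundles of papers.
print(aggregate_rankings([
    ["A", "C", "B", "D"],
    ["C", "A", "E"],
    ["B", "E", "D"],
]))  # ['C', 'A', 'B', 'E', 'D']
```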
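Idea 2 rests on a statistical intuition that is easy to poke at in simulation. Below is a quick Monte Carlo sketch under entirely made-up assumptions: each paper has a true quality, a careful review observes it with small Gaussian noise, a quick vote with larger noise, and we ask how often each scheme puts a paper on the correct side of the median. Whether five quick votes actually match three careful reviews depends on the noise levels you plug in -- which is exactly why it would take an experiment to tell.

```python
import random

def triage_accuracy(n_papers=200, n_trials=100,
                    quick_votes=5, quick_noise=1.0,
                    careful_reviews=3, careful_noise=0.6):
    """Fraction of papers placed on the correct side of the median
    by averaging quick noisy votes versus averaging careful reviews."""
    quick_correct = careful_correct = total = 0
    for _ in range(n_trials):
        quality = [random.gauss(0, 1) for _ in range(n_papers)]
        median = sorted(quality)[n_papers // 2]
        for q in quality:
            truth = q >= median
            quick = sum(random.gauss(q, quick_noise)
                        for _ in range(quick_votes)) / quick_votes
            careful = sum(random.gauss(q, careful_noise)
                          for _ in range(careful_reviews)) / careful_reviews
            quick_correct += (quick >= median) == truth
            careful_correct += (careful >= median) == truth
            total += 1
    return quick_correct / total, careful_correct / total

# Compare the two accuracies; the answer depends entirely on the assumed noise.
print(triage_accuracy())
```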
Tuesday, March 04, 2008
Missing Class
[Note: coincidence! I wrote this over the weekend, but have found myself "scooped" by a post on the same theme at the Complexity blog...]
It seems, more and more, I'm expected to miss class.
Obviously, now and again I have to miss class for a conference or workshop to go give a talk. Somewhere I'm sure there are rules and regulations (that generally go ignored) about this sort of thing, but I'm sure my employer and I share the understanding that I'm supposed to go present my research, and it's fine for me to miss a few lectures a semester for that purpose (while trying to get another faculty member or a graduate student to cover, of course).
I'm a little less clear on some of the other options that come my way. NSF panels. NSF workshops. (The FIND grant comes with 3 workshops per year.) DARPA ISAT meetings. PC meetings. Meetings to talk about Visions and Funding and such. Other invited talks. And so on. The underlying problem, of course, is that these things often require travel, which means missing class, which (along with family issues) is one of the reasons I don't understand why people don't think harder about how to avoid travel for these things.
How many classes is it OK to miss in a semester? Offhand, I'm thinking four is a reasonable maximum (for my standard Tu-Th teaching), perhaps because it looks like that's how many I'll miss this semester (cancelling one class and having three lectures covered by graduate students), and that's probably the most I've ever missed in a semester. (There was that one time I missed the first day of class for the semester, because my wife inconveniently chose that day to give birth to our second child... but I digress.)
This semester, of course, I got myself in a bind by going to New Zealand, eating up several of my days. But I've definitely noticed more pressure the last few years for events that would cause me to miss class, which, ostensibly, is supposed to be a high priority for my job.
Saturday, March 01, 2008
Another Rant from Conference Reviewing: Algorithms and Evaluation
After my first conference rant on competitive analysis (based on my current stints on the SPAA and ICALP committees), I feel it's only fair to spread the love and rant a bit on my fellow algorithmists.
I simply claim the following: if you are presenting an algorithm with the claim that it might be useful in practice, you should aim to include at least a short experimental section showing that you've implemented the algorithm and how it behaves.
Here's why:
1) If you're suggesting that your algorithm is potentially useful in practice, it's incumbent on you to provide some evidence for that claim. An implementation is the best evidence. I don't expect pages of simulation results examining corner cases (although, if there's space, that's certainly nice); but a couple of paragraphs explaining that you implemented it, tested it on basic data, and that the program actually finished goes a long way.
2) It's not that I don't trust your math. It's just that -- well, no, it is just that I don't trust your math. Maybe you've proven that the algorithm is O(n^2). I'd like to know if in practice it seems to be O(n log n) [even if it's O(n^2) in the worst case -- now you've given an average-case or special-case open problem!]. I'd like to know if there's a hidden constant of 10,000 there that makes it really not practical in its current form. I'd like to see that you didn't make an error and that it doesn't look like Theta(n^3) when you run it. (A small timing sketch after this list shows the kind of quick check I have in mind.) [I've got 46 papers to look over in a month. Make my job easier, and your paper's more likely to get in.]
3) Maybe, just maybe, someone who might actually want to implement your algorithm will actually read your paper. A non-theorist. Don't you think they want to see that it seems to actually work before implementing it better themselves? Won't some experiments make talking about your work to non-theorists easier? Help make that connection...
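Here, concretely, is the kind of quick check I have in mind for point 2 -- a minimal, hypothetical sketch that times an algorithm on doubling input sizes and reports the growth ratio (roughly 2x per doubling suggests near-linear or n log n behavior, 4x suggests quadratic, 8x cubic), with Python's built-in sort standing in for whatever algorithm is being evaluated:

```python
import random
import time

def measure_scaling(algorithm, sizes):
    """Time `algorithm` on random inputs of the given sizes and report the
    ratio of successive running times as a crude empirical growth rate."""
    previous = None
    for n in sizes:
        data = [random.random() for _ in range(n)]
        start = time.perf_counter()
        algorithm(data)
        elapsed = time.perf_counter() - start
        growth = f"{elapsed / previous:.1f}x" if previous else "--"
        print(f"n = {n:>8}  time = {elapsed:8.4f}s  growth = {growth}")
        previous = elapsed

# Doubling sizes; for sorted() the growth ratio should hover just above 2x.
measure_scaling(sorted, [10**5, 2 * 10**5, 4 * 10**5, 8 * 10**5])
```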
I'm not saying every algorithms paper needs an implementation section. In many cases, "algorithmic" results are really "complexity" results -- we're just showing that something can be done, we don't actually expect anyone to do it, and in that case there's no need for simulations or experimental results. (Of course, in such cases, I expect the authors to limit their claims of interest or utility in practice.) In some cases, space won't permit a reasonable evaluation of the algorithm -- but do plan on it for the journal version. And in some cases, the coding and evaluation of the algorithm are so interesting that they merit an entirely separate paper!
But I'm amazed at how few algorithms papers provide any actual experimental results (unless they're appearing in a networking/database/other systems conference, where it's better understood that such results are expected). I've actually submitted theory papers with experimental sections to theory conferences and had reviewers urge that they be taken out, which I find mystifying (and which I ignore).
And yes, if I'm reviewing your paper, and you don't have any numbers where I think you should, I'm probably mentally (and numerically) docking your paper score....
Labels: algorithms, conferences, papers, reviews