
Monday, November 03, 2008

Bugs

In a recent review of a journal paper, I got the following comment:
In explaining why you are presenting simulation results, you say, "First we wish to check our theoretical analysis..." I don't understand this motivation. Your theoretical analysis is substantiated by mathematical proofs. What more evidence do you need of its validity?
Please keep this statement in mind.

I've stated frequently that theorists should actually implement their algorithms. I have some further recent anecdotal evidence to back up this claim.

I have students working on a variety of projects, and recently, I've had a strange convergence: in several of the projects, the students have found what appear to be non-trivial errors in recent theory papers (2 in CS theory, 1 in EE theory). It's not clear that any of the results actually break -- indeed, I don't think any of them do. I don't want to exaggerate. In one of them, a constant factor seems to be messed up -- not disastrous, but it is an important constant factor in context. And perhaps in one of the papers we could chalk it up to really just a sequence of confusing typos rather than an actual error.

Now I'm not naive enough to expect conference papers (or even journal papers) without errors. But the key here is that these errors were either found or demonstrated to me by the students implementing the algorithms described in the papers. Once the implementations were running, the errors practically jumped out.

Implementing your own algorithm is a good way of checking your work. If you aren't implementing your algorithm, arguably you're skipping a key step in checking your results. Why should a reviewer go to the trouble of checking your work carefully if you're not?

Moreover, imagine not some student but an actual real systems-builder who decides your algorithm is worth implementing for their system. Imagine how they feel when they find things don't quite work as you've stated. That doesn't exactly inspire confidence, and is the sort of thing that discourages someone from using your algorithm. More globally, it gives systems people the feeling that theory (and theorists) aren't important. Why should they be debugging your proofs? You should give them evidence your algorithm can be implemented and works as claimed by actually implementing it if possible or reasonable to do so.

I'm not stating that you personally have to implement your algorithm. Hire a student to do it! It's good experience for them. You can apply for an REU, or other student research funding.

So, to answer the anonymous reviewer, I think there's always plenty of good reason to check our mathematical proofs by experiment and implementation, and it's appropriate to put those results in the journal paper as an aid to the reader (and reviewer!).

Friday, August 29, 2008

A Survey on Hash-Based Packet-Processing Algorithms

Sometime way back, Graham Cormode invited me to a DIMACS workshop on Algorithms for Next Generation Networks, where I talked the usual talk about hash tables, Bloom filters, and things. The organizers later had the wherewithal to undertake putting together a book of collected chapters on the topic, and asked us (us being, of course, me and my student Adam Kirsch) for a chapter related to the talk. Adam was just finishing his thesis, and was willing to take on modifying his thesis to form the basis for a survey chapter. I wrote up a section or two, and for good measure, we got George Varghese, who is the world expert -- and author of the book Network Algorithmics -- to join in as well. (In particular, we really wanted George's much more practical perspective on what works and what doesn't!) The result here is a hopefully pleasant light read on current goings-on in the area of hash-based network algorithms, attempting to blend and show the connections between theory and practice.

There's still time for feedback for the final version; please mail me if you have any constructive suggestions.

Sunday, August 17, 2008

SIGCOMM 2008, Part 3

Here are a few more papers from SIGCOMM which should be of particular interest to a more theoretical audience. (Generally, SIGCOMM papers are interesting -- but again, I'm focusing here on papers that I think might be of special interest to theory people. It strikes me that I should, at some point, similarly summarize papers from a major theory conference -- like STOC 2009 -- that would be of special interest to networking people. Of course, SIGCOMM makes that easier, posting abstracts and all the papers online...)

There's a paper on analyzing BitTorrent in the game-theoretic, incentive-style analysis sense. It will require a more careful reading from me, as I'm not a full-fledged game-theory/CS type researcher, but it sure looks interesting on a first perusal. I'm naturally biased toward the idea that if all the current effort on game theory in computer science (and particularly in theory) is to have a payoff, real-world protocols must be considered and analyzed. So in that sense, this should be a really interesting paper.

While it doesn't appear particularly theoretical (it looks like what I like to joke is a standard networking paper -- lots of pictures and tables, no equations...), this paper on spamming botnets from Microsoft includes Rina Panigrahy (well known for his work in both theory and practice) as one of the co-authors. (I figure Rina had something to do with the part where I saw the words "entropy reduction", but that's just a guess...)

Tuesday, July 22, 2008

A Reviewing Story

I'm reminded by some current goings-on about "unusual" reviews, especially one of my worst reviewing experiences ever. I'm sure most everyone has stories of some really painful, inexplicable reviews -- it's like our version of "bad beat" stories in poker -- so here's one of mine.

I had been part of a project that was looking at human-guided search techniques, and specifically we had done a lot of work on 2-D strip-packing, a common problem in the OR literature and one that occasionally appears in the CS literature. Basically, our paper introduced what we would later generalize to Bubblesearch for this problem, and demonstrated how user interaction could lead to even better performance.
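
For readers who want a feel for the kind of approach involved, here is a rough sketch in Python -- my own generic illustration, not the heuristics or the Bubblesearch variant from the paper; the shelf-packing rule, the geometric perturbation, and all the parameter choices below are just for demonstration.

```python
import random

def shelf_pack(rects, strip_width=1.0):
    """Pack (width, height) rectangles, in the given order, onto shelves:
    each rectangle goes on the first shelf with enough remaining width,
    otherwise a new shelf is opened. Returns the total height used."""
    shelves = []  # per shelf: [remaining width, shelf height]
    for w, h in rects:
        for shelf in shelves:
            if shelf[0] >= w:
                shelf[0] -= w
                shelf[1] = max(shelf[1], h)
                break
        else:
            shelves.append([strip_width - w, h])
    return sum(height for _, height in shelves)

def perturbed_order(base, p, rng):
    """Pick each next rectangle from the remaining ones with geometrically
    decaying probability, so orders stay close to the base (greedy) order."""
    remaining = list(base)
    order = []
    while remaining:
        i = 0
        while i + 1 < len(remaining) and rng.random() > p:
            i += 1
        order.append(remaining.pop(i))
    return order

def best_of_perturbations(rects, tries=500, p=0.5, seed=0):
    """Greedy tallest-first order plus random perturbations; keep the best packing."""
    rng = random.Random(seed)
    base = sorted(rects, key=lambda r: -r[1])
    best = shelf_pack(base)
    for _ in range(tries):
        best = min(best, shelf_pack(perturbed_order(base, p, rng)))
    return best

if __name__ == "__main__":
    rng = random.Random(1)
    rects = [(rng.uniform(0.05, 0.4), rng.uniform(0.05, 0.4)) for _ in range(50)]
    print(shelf_pack(sorted(rects, key=lambda r: -r[1])), best_of_perturbations(rects))
```

The human-guided part of the project, of course, is exactly what a snippet like this leaves out.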

We submitted it to a journal-that-will-remain-nameless that claimed it was at the intersection of OR and CS. This seemed a fit to me. This is a standard OR problem; heuristic approaches for it have certainly appeared regularly in other OR journals. We had a very CS-oriented approach, using greedy-based heuristics, and fairly nascent techniques from the interface of AI and user interfaces. We wanted it in front of the OR audience, where human-interaction optimization systems would have been a fairly novel and potentially useful idea.

The reviewers didn't go for it (even after we revised it to answer their complaints). Clearly the human-interaction stuff was a bit beyond what they were able to cope with; if that had really been the main stated objection -- "this is really too AI/UI-ish for us to cope with" -- then I could have been disappointed by their lack of desire to expand their worldview and moved on. But one reviewer clearly seemed to think we didn't properly cite and compare results with what I imagine was his own work (which included a paper that was at best tangentially related, and a paper that was apparently under review at another journal and was not publicly available in any format when we wrote ours). Another reviewer simply said that the readers of the journal wouldn't be interested. This is his summary of what we did:

"You look at a simple and natural modification of pretty much the first packing that comes to mind, an idea that could be described over the phone in two minutes, assuming no previous knowledge. Beyond that, you run a bunch of experiments and find out that you get improvements over some metaheuristic." [My note: that "some metaheuristic" was the one giving the best published results for the problem at the time.]

Yes, that's right, all we did was introduce a painfully simple heuristic -- one that hadn't appeared in the literature before, anywhere -- for a well-known, well-studied standard problem, and run experiments showing it beat the best known results on a variety of benchmark instances. I could see why that wouldn't be considered interesting to readers at the OR/CS intersection. Sigh. It's one thing when a reviewer doesn't get your work. It's another when a reviewer gets your work, seems to admit that it's quite good -- at least I view simplicity combined with performance that beats several papers' worth of more complicated previous work as a plus -- and just says, "But that's not interesting." How do you argue with that?**

I've shied away from the OR community since then. Being fed up at that point, we sent the paper to the Journal of Experimental Algorithmics, where I felt we'd have fewer problems, even if it wasn't exactly the target audience. If you want to read the paper and decide for yourself if the original reviews were warranted, you can find a copy here. (New Heuristic and Interactive Approaches to 2D Rectangular Strip Packing.)

**I admit, I have had to give reviews like that for conference papers -- where the review ends up being, "Sure, I think this is good, you should publish it somewhere, but I'm afraid it's just not in the top x% of the papers I'm seeing and we're space limited for the conference." I hate writing such reviews, but at least I don't make up reasons why the paper isn't good...

Sunday, July 20, 2008

Journal Policies -- IEEE/ACM Transactions on Networking

The Transactions on Networking, or ToN, is changing its policy somewhat on papers. Previously, the page limit was nominally 12 pages, but you could pay $200 per extra page for up to 2 extra pages. Now, the page limit will nominally be 10 pages, and you can pay $220 per extra page for up to 4 extra pages.

As I've mentioned before with Transactions on Information Theory (which is, I know, already changing its policy on correspondences), I don't understand page limits for journal articles. An article should take the space it needs for the authors to adequately express their ideas. Admittedly, for ToN, this is less of a problem than for ToIT; 14 pages is generally enough for most any networking paper. I suppose the page limit helps prevent papers with a seemingly endless series of graphs each presenting minimal information. (There are still plenty of networking papers like that, but page limits at least cut down the number of graphs that can appear.) Still, there must be some high-quality papers that authors either send elsewhere or artificially cut down to the page limit.

This change in policy, though, seems just to be a way for them to pull in some extra money. I know IEEE and its societies have had money problems in the past; maybe it's getting worse.

ToN is a high-quality journal, and has been a good outlet for some of my work. I haven't minded paying $400 in the past for the two extra pages when needed. At some point, though, there's a limit. Time for me to look closer at how Open Access journals like Theory of Computing are faring monetarily these days -- is there a networking equivalent yet?

Monday, June 30, 2008

Strange Transactions on Information Theory Policies

The journal where I've published the most now is the IEEE Transactions on Information Theory. (Probably because when I write any journal article in information theory, I send it there, while my other work is more scattered.) So I hope it's clear that, as a journal, I hold it in high regard. That being said, they have some strange policies that I just don't understand. (If you're not familiar with the journal and want to check what I'm saying, their submission guidelines are here.)

First and foremost, they have a distinction between "regular papers" and "correspondence". As they briefly make unclear, "a correspondence item will have one point to make briefly and without adornment, whereas a regular paper will be a more well-rounded treatment of a problem area." In the old days, as far as I can tell, this distinction didn't matter much, and roughly the distinction boiled down to length -- for me, long papers were regular papers, shorter papers were correspondence. Perhaps in the IT community there was more prestige for a regular paper, but I've never been aware of it -- at least in my mind citation count would give a better idea of a paper's importance than its nominal type -- and never cared.

This strangeness didn't become important until recently, when to deal with the apparent problem that the Transactions was becoming too big and thereby using too much paper [yes, I know this shouldn't be a problem, but they still send out lots of paper copies], they limited correspondence items to 5 journal pages.

This, to me, is just strange. First, I dislike the idea of page limits in general for journals. The journal is supposed to be the "permanent record" of a scientific result. If it takes 50 pages to present the result, then that's what it takes. A journal certainly has a right (indeed, perhaps a duty) to have authors properly edit a work so that there's not a gross amount of wasted verbiage, but to set a priori page limits, in my mind, goes against the purported purpose of journal publication.

Second, I don't see how this page limit jibes with this (already strange) regular paper/correspondence distinction. Sometimes, presenting a single point briefly takes more than 5 pages. (Arguably, it often takes this long, if one wants to do a suitable job presenting previous work, or if substantial mathematical setup must be done.)

As an example, I submitted a paper to ToIT. I thought it was clearly a correspondence, according to their definition, since it really covered a single thematic issue. Given the policy, I wasted at least a day cutting things from my version to make it a (just over) 5 page submission version. (I had tables and a picture -- it really was a hard exercise to cut that last page.)

The reviews came back, and most of the complaints were questions of the form, "Why don't you talk about Z?", where in fact my discussion of Z was something I took out to get things down to 5 pages. (Duh.) The editor asked the editor-in-chief -- no exceptions to the 5 page rule.

So the editor, smartly, decided that I had a 6-page regular paper. While I appreciated the solution, the whole thing just highlighted the silliness of the policy, and perhaps the strangeness in distinguishing "regular papers" and "correspondence" in the first place.

Another policy I object to is their decision not to publish already-published conference articles unless there is "substantially new" material. (This is not spelled out, but I have been told the rough figure is "30% of the article must be new", whatever that means.) Again, in my mind, this goes against the purported purpose of a journal as the final record of a research result. Conference papers are, in my mind, dramatically more likely to be wrong, and at the very least have generally had little feedback in the reviewing process. A journal is supposed to provide more in terms of reviewing; if a conference paper and journal paper ended up identical, I'd have no problem with it -- and I'd congratulate the author on writing such a solid conference version (assuming I didn't question the reviewing practices of the journal).

I can understand this policy in the context of copyright. An IEEE journal can't (re-)publish something another organization has the copyright to, and an author should understand this. But when the paper was in an IEEE conference -- what's the problem?

These two policies interact in a very strange way in the context of the IEEE International Symposium on Information Theory (ISIT) -- the big IT conference. ISIT papers are -- get this -- 5 pages. One might naturally hope that ISIT papers could be turned right over into Transactions correspondences -- and get the benefit of more complete reviewing and editing for that "final version". But not, it seems, under the current policy that the journal version must have something "new" over the conference version. But I can't add anything new to my 5 page conference version without going over that arbitrary 5 page correspondence limit. Am I the only one that sees something wrong with that picture? (And yes, I have ISIT papers that I won't bother submitting to ToIT, because they're really correspondences that should/would be 6-7 pages once I add that 30% new stuff in.) If they're going to not publish things from conferences, even IEEE conferences, then having a 5 page ISIT limit and a 5 page correspondence limit is, in my mind, just strange.

Monday, June 23, 2008

Yet Another Survey... Deletion Codes

Next week I'll be at SWAT for an invited talk. When I gave them a choice of topics, they went with: A Survey of Results for Deletion Channels and Related Synchronization Channels. I've actually given this talk a few other places, but there's been some new stuff, so it's been updated a bit. (I'll put the latest version of the slides up on my talks page after the talk...)

The invitation gave me an excuse to buckle down and finish something I started over a year ago -- a survey article on results for the capacity of the deletion channel (and related problems) to go with the talk. I must be out of practice writing, because it's taken me most of the month to get this thing done. (Well, OK, there was that whole "baby incident" that took up some time, but still, I've clearly been doing far too much administration this last year, and I need to shake off the rust.) It's probably still a bit rough so anyone interested enough to read it, please feel free to mail corrections/suggestions directly.

Apparently, based on some mild teasing from a colleague and friend, I'm known as someone who writes survey articles. (The teasing was that perhaps I'm known only as someone who writes survey articles...) I don't know why I end up writing them -- I can't really say it's a fun exercise to write a survey, though it is often educational -- but somehow I feel it's valuable. In this case, there's definitely the motivation that I've done a lot of work in this area, and I want to promote both the area and my work. The survey has a couple of dozen open problems spelled out right there, so let that be motivation for some of you to read it...

Also, I'm open to suggestions on where to publish it. My last few surveys have nicely been able to go to Internet Mathematics (see here for Bloom filters and here for power laws/lognormal distributions -- although the "uncensored" version of the power law survey from my publications page is more fun), so that they've had a worthwhile permanent home. This survey won't quite fit there, and I think it's good for such things to have a visible outlet (besides, say, arxiv). Ostensibly Transactions on Information Theory will take surveys, although I haven't seen that much in practice; and this survey doesn't really fit in the Foundations and Trends model. (I like the Foundations and Trends model, but it seems to me their surveys are like "mini-textbooks" -- and cost accordingly -- while my surveys are more like "here's some background, big ideas, and pointers to papers" -- and I'd like them to be essentially free.)

Friday, June 13, 2008

An Improved Analysis for "Why Simple Hash Functions Work"

I recently had a paper with Salil Vadhan (in SODA 2008) explaining why simple hash functions "work" for so many stream-related problems where data is hashed to be placed in a hash table or a Bloom filter or a similar structure. Here "work" means "behave essentially the way you would predict if the hash function were perfectly random." The basic reason, we suggested, was that if the data in the stream had enough randomness, it would combine with the randomness from the choice of hash function to produce values that looked independent and uniform.
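
As a toy illustration of the phenomenon (my own sketch, not the paper's analysis; the particular pairwise-independent family, the 48-bit data, and the table size below are just illustrative choices), you can hash data that has some entropy using a weak hash family and compare the maximum bucket load to a fully random assignment:

```python
import random

P = (1 << 61) - 1  # a Mersenne prime, for the standard ((a*x + b) mod P) mod m family

def pairwise_hash(a, b, m):
    """A weak (pairwise-independent) hash function h(x) = ((a*x + b) mod P) mod m."""
    return lambda x: ((a * x + b) % P) % m

def max_load(bucket_of, items, m):
    """Throw each item into its bucket and report the fullest bucket."""
    loads = [0] * m
    for x in items:
        loads[bucket_of(x)] += 1
    return max(loads)

if __name__ == "__main__":
    rng = random.Random(0)
    n = m = 1 << 16
    data = [rng.getrandbits(48) for _ in range(n)]  # a stream with plenty of entropy
    weak = pairwise_hash(rng.randrange(1, P), rng.randrange(P), m)
    print("pairwise-independent hash, max load:", max_load(weak, data, m))
    print("fully random assignment,   max load:", max_load(lambda x: rng.randrange(m), data, m))
```

On data like this, the two maximum loads come out essentially the same, which is the behavior the result explains.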

Our analysis of how much randomness the data needed to have seemed a little loose. Salil and his student, Kai-Min Chung, have recently improved the analysis; here's the arxiv link, and the paper will be appearing in RANDOM '08. There are some really nice and somewhat subtle arguments that I imagine will find some uses in other areas. While it "only" improves the constant factor (in terms of the number of bits of randomness needed), this is certainly a case where constant factors are important.

The result further reinforces my belief that this is the right high-order explanation for why, whenever anyone actually does an experiment involving hashing, you get the right results no matter what hash function you use.

Friday, May 30, 2008

More Robust Cuckoo Hashing (ESA paper)

I have a paper at ESA, with Adam Kirsch and Udi Wieder, entitled "More Robust Hashing: Cuckoo Hashing with a Stash." (Here's an extended preprint version.) The starting point of the paper is that cuckoo hashing is wonderful, and everybody should know about it and use it. (If you don't know about it or use it, see here or here. But basically, cuckoo hashing is multiple choice hashing with the added bonus of getting to move items around among their multiple choices as needed to balance load on-line.)

So, of course, we want to make cuckoo hashing even better. And one of the problems of cuckoo hashing is that it "fails" with non-trivial probability. For example, in standard 2-choice cuckoo hashing, putting n items into 2(1+eps)n cells with at most 1 item per cell can be done with probability 1 - Theta(1/n). That gives a Theta(1/n) "failure" rate, which doesn't really affect the "average" behavior, because a failure can always be handled by re-hashing everything with a new hash function -- so that's the end of the story in most theory papers.

In practice, that kind of failure rate is a bit high. Re-hashing an entire hash table could just be too expensive an operation to have occur that frequently for many applications. Can we improve the failure rate somehow?

Let's look at where that Theta(1/n) failure rate comes from. When there are 2 choices per item, the simplest type of failure one might imagine is that 3 items try to use the same two cells -- that is, 3 items have the same 2 choices of location. Indeed, such a problem occurs with probability Theta(1/n) [exercise LTR]. But when such a problem occurs, we can imagine taking one of those 3 items and stashing it somewhere -- in a stash, as suggested by the paper title. The stash should be a little space on the side that we may have to check whenever we look something up. If we implement a stash, how does the size of the stash affect the failure rate?

If failures were essentially independent, so that each item "fails" independently with probability proportional to 1/n^2, then we'd expect f failed items with probability proportional to n^{-f}. This turns out to be the right intuition; our analysis shows that a stash sized for s items (for constant s) reduces the failure rate to O(n^{-s-1}). The analysis is a bit trickier, since of course item failures aren't independent, but the result seems natural.

So a small, constant-sized stash -- easily implementable in software or hardware -- reduces the failure probability dramatically, allowing one to avoid rehashing in practice. What I found particularly interesting from the theory+practice perspective is the power in this setting of a constant-sized stash. It's a very natural addition for practitioners -- I don't think it would cause an engineer to blink -- but I think it really changes the potential for cuckoo hashing, making it a structure you could imagine employing on devices used by millions of customers, without having to worry about this otherwise far-too-common failure mode.
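
For concreteness, here is a minimal sketch of 2-choice cuckoo hashing with a stash -- my own toy version, not the paper's construction; the random-walk insertion, the salted use of Python's built-in hash(), and all the parameter choices are just for illustration.

```python
import random

class CuckooWithStash:
    """Toy 2-choice cuckoo hash table with a small stash (illustration only)."""

    def __init__(self, capacity, stash_size=4, max_evictions=100):
        self.table = [None] * capacity
        self.stash = []
        self.stash_size = stash_size
        self.max_evictions = max_evictions
        # Two "hash functions" built by mixing random salts into Python's hash().
        self.salts = (random.getrandbits(64), random.getrandbits(64))

    def _slots(self, key):
        return tuple(hash((key, s)) % len(self.table) for s in self.salts)

    def lookup(self, key):
        return any(self.table[i] == key for i in self._slots(key)) or key in self.stash

    def insert(self, key):
        cur = key
        for _ in range(self.max_evictions):
            i, j = self._slots(cur)
            if self.table[i] is None:
                self.table[i] = cur
                return True
            if self.table[j] is None:
                self.table[j] = cur
                return True
            # Both cells occupied: evict one occupant at random and try to re-place it.
            victim = random.choice((i, j))
            cur, self.table[victim] = self.table[victim], cur
        # The insertion walk ran too long: fall back to the stash.
        if len(self.stash) < self.stash_size:
            self.stash.append(cur)
            return True
        return False  # in practice this is where a full rehash would kick in

if __name__ == "__main__":
    n = 10000
    table = CuckooWithStash(capacity=int(2.2 * n))  # 2(1+eps)n cells
    inserted = sum(table.insert(("item", i)) for i in range(n))
    print("inserted:", inserted, "of", n, "| stash size:", len(table.stash))
    print("hit:", table.lookup(("item", 1234)), "| miss:", table.lookup(("item", n + 1)))
```

In runs like this the stash is nearly always empty or holds one or two items; per the result above, each extra stash slot buys roughly another factor of n in the failure probability.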

We show similar results for multiple cuckoo hashing variations. (The analysis, unfortunately, is different for all of the variations; it would be nice for someone to develop a cleaner, nicer, more unified analysis of all the varieties of cuckoo hashing.) The case of 2 choices is of non-trivial interest, however, since Udi, along with Moni Naor and Gil Segev, have recently shown that 2-choice cuckoo hashing can be used to develop a very efficient history-independent hash table (check out Udi's web page for the paper). The main problem with such a structure is, naturally, the non-trivial probability of rehashing, which can be avoided using a stash.

The stash idea also fits nicely with my previous paper with Adam on the Power of One Move for cuckoo hashing style schemes, although there the stashes (with suggested implementation as Content Addressable Memories in hardware) were larger than constant-sized. I'm beginning to think the idea of stashes (or CAMs) should be a standard section in practically oriented hashing related papers.

Tuesday, April 01, 2008

Jennifer Rexford's take on "Practical Theory"

Jennifer Rexford, who (as I've mentioned before) is a networking person who appreciates theory, passed along a rumor that she'll be giving a talk at STOC on "networking questions relevant to the theoretical CS community." I don't see the program up yet, but I hope it's true. She's a great speaker, with excellent perspective. Kudos to those who thought to invite her!!!

She also pointed me to an editorial she wrote on her ten favorite "practical theory" papers. Naturally, I enjoyed reading her perspective. Particularly since her perspective included my name (flattery, indeed, will get you everywhere -- or at least on my blog).

I think, however, her editorial offers several lessons. Here are my top two. First, her notion of a theory paper is a little different from ours. Most of the papers are by networking people who think mathematically and theoretically (Varghese, Towsley) or who are more attuned to what I think of as EE theory. This first lesson is a reaffirmation of my long-standing notion that the FOCS/STOC view of theory is, in many respects, rather limited, especially to our peers outside of theory. I think a second, related lesson concerns the importance of theory people getting out there and making their work more accessible to non-theorists. In my case, that has meant writing survey articles, giving talks for general audiences and not just theorists, and going to networking conferences to talk about my work. I'm sure there's plenty of other "mainstream theory" work that Jennifer and others from networking would greatly appreciate -- if only we as a community did a bit better job of letting them know about it.

Thursday, March 13, 2008

Conferences and Correctness

A post by Mihai Patrascu brought up the difficult issue of how to deal with papers that may or may not be correct when serving on a program committee. This is, clearly, a hard question, and it can lead to tremendous difficulties, as it is often not a simple task to determine correctness.

One approach, when an issue arises, is to inform the author(s) and ask them to clear up the issue. I recommend this, but it doesn't always work; there may be disagreements between the author and the reviewers, and there is usually a limited amount of time to settle the problem.

Also, this seems to set up an onus on the author that may be unfair: you must convince us (outside what you've already written) that your proofs are correct. Now, that might not sound like an unfair onus, but for a difficult proof it may be very challenging, particularly since initially all that's been asked for (in most conferences) is a 10 page abstract and not the "final" version. Moreover, it's unfair because it's not something you're really asking of all the other authors. Sure, nominally you are, but papers with bugs get through often enough. Sometimes we make mistakes, and arguably there is some unfairness in a policy that insists that suspicious paper X must now be proven until all reviewers are fully satisfied while papers that didn't raise suspicions pass right on through.

As I've mentioned, I think the main job of the PC is to prioritize what papers get in a conference, and a secondary job is to give feedback to the authors. As part of that job, naturally, we want to throw out papers that are wrong and let the authors know about mistakes. But my argument is that conferences (and PCs) are not designed to accurately tell if papers are completely correct. If they were, I wouldn't have 45 papers to consider in a bit over a month, and I wouldn't be given papers on topics like quantum computing to judge. Ensuring correctness is nominally what journals are for, and that's why (in general) journal articles can take more time to review, and why one seeks experts for reviewing.

I'm not sure what a good solution is, and I'm not advocating blindly giving benefit of the doubt to papers that appear wrong. But there should be a high-level understanding that the conference procedure is inherently noisy. Mistaken proofs might be published. Perhaps the problem we should turn our attention to is how, as a community, we handle these errors, including arranging for such errors to be corrected or noted (by the authors or otherwise) in an appropriate fashion.

Saturday, March 01, 2008

Another Rant from Conference Reviewing : Algorithms and Evaluation

After my first conference rant on competitive analysis (based on my current stints on the SPAA and ICALP committees), I feel it's only fair to spread the love and rant a bit on my fellow algorithmists.

I simply claim the following: if you are presenting an algorithm with the claim that it might be useful in practice, you should aim to include at least a short experimental section showing that you've implemented the algorithm and how it behaves.

Here's why:

1) If your suggestion is that it's potentially going to be useful in practice, and it's your algorithm, it's incumbent on you to provide some evidence for this statement. An implementation is the best evidence. I don't expect pages of simulation results examining corner cases (although, if there's space, that's certainly nice); but a couple of paragraphs explaining that you implemented it, tested it on basic data, and that the program actually finished go a long way.
2) It's not that I don't trust your math. It's just that -- well, no, it is just that I don't trust your math. Maybe you've proven that the algorithm is O(n^2). I'd like to know if in practice it seems to be O(n log n) [even if it's O(n^2) in the worst case -- now you've given an average-case or special-case open problem!]. I'd like to know if there's a hidden constant of 10,000 there that makes it really not practical in its current form. I'd like to see that you didn't make an error and that it doesn't look like Theta(n^3) when you run it. [I've got 46 papers to look over in a month. Make my job easier, and your paper's more likely to get in.] (A sketch of this kind of sanity check appears after this list.)
3) Maybe, just maybe, someone who might actually want to implement your algorithm will actually read your paper. A non-theorist. Don't you think they want to see that it seems to actually work before implementing it better themselves? Won't some experiments make talking about your work to non-theorists easier? Help make that connection...
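
Here is the kind of sanity check point 2 has in mind (a rough sketch; algorithm and make_input are placeholders that stand in for whatever you implemented): time the code at doubling input sizes and eyeball the ratios of successive times.

```python
import random
import time

def growth_check(algorithm, make_input, sizes):
    """Time the algorithm at increasing input sizes and print the ratio of
    successive running times. With doubling sizes, ratios near 2 suggest
    roughly linear behavior, near 4 roughly quadratic, near 8 roughly cubic."""
    prev = None
    for n in sizes:
        data = make_input(n)
        start = time.perf_counter()
        algorithm(data)
        elapsed = time.perf_counter() - start
        ratio = elapsed / prev if prev else float("nan")
        print(f"n={n:>9}  time={elapsed:8.4f}s  ratio={ratio:5.2f}")
        prev = elapsed

if __name__ == "__main__":
    # Example: built-in sorting on random inputs, sizes 2^16 through 2^20.
    growth_check(sorted, lambda n: [random.random() for _ in range(n)],
                 [1 << k for k in range(16, 21)])
```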

I'm not saying every algorithms paper needs an implementation section. In many cases, "algorithmic" results are really "complexity" results -- we're just showing that something can be done, we don't actually expect anyone to do it, and in that case there's no need for simulation or experimental results. (Of course, in such cases, I expect the authors to limit their claims of interest/utility in practice.) In some cases, space won't permit a reasonable evaluation of the algorithm -- but do plan on it for the journal version. In some cases, the coding and evaluation of the algorithm are so interesting they merit an entirely separate paper!

But I'm amazed at how few algorithms papers provide any actual experimental results (unless they're appearing in a networking/database/other systems conference, where it's more understood that results are also expected). I've actually submitted theory papers to theory conferences with experimental sections and had reviewers urge them to be taken out, which I find mystifying (and I ignore).

And yes, if I'm reviewing your paper, and you don't have any numbers where I think you should, I'm probably mentally (and numerically) docking your paper score....

Tuesday, January 15, 2008

Distributed Beam-forming: New Preprint

Besides NSF proposals, another item that took up time over my winter non-break was finishing a submission to ISIT 2008. To me, this paper was another example of how CS theory people and EE theory people are often tackling the same sort of problems, just in a different language.

The paper is titled Distributed Beamforming with Binary Signaling. Here's the English version of the problem. A bunch of weak transmitters are trying to transmit the same message to a receiver. Before sending the message, they are all out of phase, so their signals potentially cancel each other. They'd like to send so that they are all mostly in phase, so the signals reinforce each other and the message gets through. Initially, nobody knows their phase. The only communication possible is all the transmitters can send a signal, and the receiver can broadcast information back. How quickly can alignment occur?

To better understand the problem, we simplified it as follows. In each round, each transmitter can send a bit, a -1 or +1. Suppose the jth transmitter sends the bit b(i,j) in the ith round. The jth transmitter has, for all time, a fixed phase p(j), which is +1 or -1. The receiver's obtained signal in the ith round is |sum_j b(i,j)p(j)|. If there are n transmitters, we'd like to get the obtained signal to beta n for some constant 0 < beta < 1 as quickly as possible; this is what we'll mean by alignment. The system is fully distributed, so transmitters can't communicate directly, and the only feedback they get is 1 bit broadcast every round. In this simple model, we looked at lower bounds and algorithms. The lower and upper bounds are both linear in n, but trying to get those constant factors right is apparently pretty important.
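
To make the model concrete, here is a small simulation of one natural strategy that fits the 1-bit feedback constraint (a sketch of my own, not the algorithm from the paper; the flip probability and all other parameters are illustrative): each transmitter tentatively flips its bit with small probability, and the broadcast bit says whether the received magnitude improved, in which case the flips are kept.

```python
import random

def rounds_to_align(n, beta=0.5, flip_prob=None, seed=0):
    """Simulate the toy model: transmitters hold fixed phases p(j) in {-1,+1},
    send bits in {-1,+1} each round, and see only one feedback bit per round.
    Returns the number of rounds until the received signal reaches beta * n."""
    rng = random.Random(seed)
    flip_prob = flip_prob if flip_prob is not None else 1.0 / n
    phase = [rng.choice((-1, 1)) for _ in range(n)]  # fixed unknown phases p(j)
    bits = [rng.choice((-1, 1)) for _ in range(n)]   # the transmitters' current bits
    best = abs(sum(b * q for b, q in zip(bits, phase)))
    rounds = 0
    while best < beta * n:
        rounds += 1
        trial = [-b if rng.random() < flip_prob else b for b in bits]
        received = abs(sum(b * q for b, q in zip(trial, phase)))
        if received > best:  # the receiver broadcasts "1": keep the flips
            bits, best = trial, received
    return rounds

if __name__ == "__main__":
    for n in (100, 200, 400):
        print(n, rounds_to_align(n))
```

In runs like this the number of rounds grows roughly linearly in n, which is at least consistent with the linear bounds mentioned above.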

While this is a natural EE theory-type problem, it feels close to very similar problems in communication complexity, at least when simplified in this way. It was also interesting working on a lower bound, where we developed a reduction from a standard rate distortion problem. It seems to me EE people don't generally think in terms of reductions, at least not the way CS people do, although it's a terminology and framework that I think is increasing in use at conferences like ISIT. On the other hand, CS people don't always do reductions that give specific constant factors (in this case, related to entropy). So all in all it was an interesting reduction to develop here.

The "full paper" (ISIT papers are limited to 5 pages) will cover a lot more variations and have more details, though I expect there will still be many open problems. More generally, the theme of unifying the "communication complexity" picture between EE and CS, much as the coding theory picture has been greatly unified of late, seems like a great place to look for research problems.

Wednesday, November 21, 2007

Page Limits on Journal Papers

I know of at least two journals that I publish in that have page limits on the length of papers. I am completely puzzled by this. A journal paper should be as long as it needs to be to derive and explain the results the author intends to convey, since the paper should be the final record on that piece of research. Some papers might need 20 pages; others might be too long even at 10. A hard rule of say 12 pages maximum just doesn't make sense.

I can imagine two reasons for instituting page limits on journal papers: paper costs, and reviewer time. Neither seems particularly compelling to me. Are there other reasons I'm missing? (Notice that both of these are more compelling in the case of conferences, where I would agree page limits make more sense. And even in that setting, paper costs are rapidly disappearing as a reason, and at some point the question of page limits should probably be reconsidered.)

I do admit that there are many papers where the author should be forced to edit things down further, and most authors have a tendency to write too much rather than too little. But it seems the right answer to this is to have good reviewers and a good editor firmly tell an author that the paper is not yet ready and needs further editing. A page limit doesn't adequately solve this problem and raises new, more unpleasant ones.

Anyone want to defend this practice? Or at least suggest reasons for it I haven't thought of?

Tuesday, October 09, 2007

The simplest insertion/deletion channel

The simplest binary insertion/deletion channel that I can think of is the following: with probability p, each bit independently results in two copies of itself. This is a special case of a subclass of channels that I have dubbed sticky channels, which are like sticky keyboards: each symbol can result in a random number of copies of that symbol.

Sticky channels have the nice property that contiguous blocks of 0s (resp. 1s) at the input correspond to contiguous blocks of 0s (resp. 1s) at the output. This property makes sticky channels easier to handle than more general insertion/deletion channels.
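
A quick simulation makes the block structure easy to see (a minimal sketch of the channel exactly as defined above; the run-length helper is just for illustration):

```python
import random

def sticky_channel(bits, p, rng):
    """The simplest sticky channel: each bit independently results in two
    copies of itself with probability p, and one copy otherwise."""
    out = []
    for b in bits:
        out.append(b)
        if rng.random() < p:
            out.append(b)
    return out

def run_lengths(bits):
    """Collapse a bit string into (symbol, run length) pairs."""
    runs = []
    for b in bits:
        if runs and runs[-1][0] == b:
            runs[-1][1] += 1
        else:
            runs.append([b, 1])
    return runs

if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.randint(0, 1) for _ in range(20)]
    y = sticky_channel(x, p=0.3, rng=rng)
    same_pattern = [s for s, _ in run_lengths(x)] == [s for s, _ in run_lengths(y)]
    print(same_pattern)  # always True: only the run lengths change
```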

I've just had a paper on sticky channels accepted to Transactions on Information Theory; here's a link to a preprint. The main result is that for that simplest channel above, I can numerically obtain very tight bounds on the channel capacity. But of course I'd still like to know -- is there a simple formula that gives the capacity as a function of p? And is there a simple and efficient coding scheme that nearly reaches the capacity?

Monday, September 10, 2007

The Hiring Problem and Lake Wobegon Strategies

Continuing the "and now a word from our sponsors" riff, I'll describe a paper (to appear at SODA) that's joint with several people at Yahoo! Research, and started on a visit there.

We're looking at a situation similar to the well-known "secretary problem". For those unaware of the secretary problem, it goes like this: you need to hire a new secretary. There are n applicants, and you will interview them in a random order. The result of each interview is the relative rank of the applicant against those you have already interviewed. ("This applicant was the 3rd best so far.") After each interview, you have to decide whether to hire that applicant before the next interview; once you pass on an applicant, they're gone. Your goal is to maximize the probability that you hire the best of the n applicants. What's your optimal strategy? (I won't give away the answer for those who haven't seen the problem; it's easily found online.)

In our setting, we are considering growth strategies for a small startup company. Each applicant has a quality score, which we take to be uniform on [0,1], that we learn in the course of the interview. Unlike the secretary problem setting, we want more than one employee, although there is not a fixed number of employees we are aiming for. We simply want to grow, balancing the tradeoff between the speed of growth and the quality of new employees.

In the paper, we consider the strategies where we choose to hire an applicant only if their quality score is above the average of the current employees. (We study both the mean average and the median average.) We call these strategies Lake Wobegon strategies, after the fictional town of Lake Wobegon, where

all the women are strong, all the men are good looking, and all the children are above average.
Here, everyone we hire is above average (at least when we hire them). Google claims to use this strategy.

I should hasten to emphasize that (despite the Google claim) we feel that the hiring problem is about hiring employees just as much as the secretary problem is about hiring a secretary: very little. Rather, it's an abstraction, leading to problems about random processes and decision-making in the context of incomplete information. I've given the talk about this work a few times, and some people get hung up on setting it up as an optimization problem and determining the optimal strategy. You could, I suppose, but you'd have to be precise about the tradeoff between growth speed and quality, and the abstraction is already removed quite a bit from reality. I personally think it's simply interesting to consider how growth processes like this work under Lake Wobegon strategies; does anyone know of any related examples in nature?

If you're a puzzle person, then you might consider the following: suppose we start with a single person, of quality 1/2, and then interview until we hire n people. Which strategy will lead to higher average quality -- hiring above the median, or hiring above the mean? (Or are they asymptotically the same?)
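
If you would rather experiment than guess, here is a toy simulation of that setup (a sketch under the model described above; using the upper median for even-sized groups is just one reasonable convention, and the interview budget is arbitrary):

```python
import bisect
import random

def lake_wobegon(num_interviews, strategy, seed=0):
    """Toy simulation: applicants have i.i.d. Uniform[0,1] quality, we start
    with a single employee of quality 1/2, and we hire an applicant iff their
    quality exceeds the current mean or (upper) median of the employees.
    Returns the number hired and the average quality after the interviews."""
    rng = random.Random(seed)
    employees = [0.5]  # kept sorted
    total = 0.5
    for _ in range(num_interviews):
        q = rng.random()
        if strategy == "mean":
            threshold = total / len(employees)
        else:  # "median"
            threshold = employees[len(employees) // 2]
        if q > threshold:
            bisect.insort(employees, q)
            total += q
    return len(employees) - 1, total / len(employees)

if __name__ == "__main__":
    for strategy in ("mean", "median"):
        hires, avg_quality = lake_wobegon(100000, strategy)
        print(f"{strategy:>6}: hired {hires}, average quality {avg_quality:.4f}")
```

Running it for both strategies gives a quick feel for just how different the two can be.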

The hiring problem leads to a great deal of curious mathematical fun: lognormal distributions, the Berry-Esseen inequality, the difference between hiring above the median and hiring above the mean (yes, they're very different!), and potentially lots of open problems and further variations to look at. Hopefully, the problem will become at least a fraction as popular as the secretary problem has.

I'm not sure why people at Yahoo were interested in this. In this case, I think they too got caught up in the mathematics of the problem. I'm glad that in the research labs, people still just follow their curiosity from time to time, even if the immediate application isn't obvious.

Friday, August 31, 2007

New Results in Trace Reconstruction

I've already talked a lot in this blog about deletion channels. Trace reconstruction involves a similar problem. We start with an original binary string X = X1,X2,...,Xn. A trace consists of a string Y1,Y2,...,Ym obtained from the original string by passing it through a deletion channel, where each bit is independently deleted with probability p. The trace reconstruction problem basically asks how many independent traces you need to see to reconstruct the original string X with high probability. Unlike the coding setting, where X might be chosen from a codebook of our own design, in this setting two natural models to study are when X is uniform over binary strings (so the high probability is over the choice of X and the traces), and the worst case (where the high probability is just over the traces). Variations of the problem include operations other than deletions (including, say, insertions and errors). As an example application, a set of sensors might be monitoring a sequence of events. Each individual sensor is weak and might miss a given event, in which case the question is how many sensors are needed to reconstruct the event sequence perfectly, with high probability.

Trace reconstruction has some history in the information theory community, and the first CS-style paper I saw on it was by Batu, Kannan, Khanna, and McGregor in SODA 2004. The main result of this paper dealt with random input X and considered p values that were O(1/log n). It seems to me much more natural for p to be constant, and it has remained an open problem to determine an efficient algorithm for constant p.

I mentioned this problem last time I visited Microsoft, and it seemed to resonate with some of the people there. Thomas Holenstein, Rina Panigrahy, Udi Wieder and I have a submission with several results, including an algorithm that for random X and sufficiently small constant probability p requires only a polynomial number of traces and polynomial time (with high probability).

The SODA 2004 paper uses a majority voting technique -- the bits are determined sequentially, with each trace voting on the next bit. A key idea in our new algorithm is a "smart voting" technique. We only let traces vote if there is good reason (based on the already determined bits) to think that the trace has a good prediction for the subsequent bit. That is, only well-informed traces are allowed to vote. Feel free to make your own political analogies. My intuition is that this smart voting technique is a closer analogue to the full belief propagation (or Bayesian analysis) that we want to do than just majority voting. Because of this, I hope this "smart voting" technique is a general approach that will find other applications.
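
To make the flavor concrete, here is a toy version of the pipeline (my own rough sketch, not the algorithm from either paper): generate traces through a deletion channel, then reconstruct bit by bit with a simple pointer-based majority vote. It is only sensible for small deletion probabilities, but it shows the kind of sequential voting being discussed.

```python
import random

def trace(x, p, rng):
    """Pass x through a deletion channel: each bit is independently deleted
    with probability p."""
    return [b for b in x if rng.random() >= p]

def majority_reconstruct(traces, n):
    """Each trace keeps a pointer into itself; the next bit of X is taken to
    be the majority of the bits currently pointed at, and a trace advances
    its pointer only if it agreed with the majority (on the guess that its
    current bit was not a deletion of the reconstructed position)."""
    pointers = [0] * len(traces)
    out = []
    for _ in range(n):
        votes = [t[ptr] for t, ptr in zip(traces, pointers) if ptr < len(t)]
        if not votes:
            break
        bit = 1 if 2 * sum(votes) >= len(votes) else 0
        out.append(bit)
        pointers = [ptr + (1 if ptr < len(t) and t[ptr] == bit else 0)
                    for t, ptr in zip(traces, pointers)]
    return out

if __name__ == "__main__":
    rng = random.Random(0)
    n, p, num_traces = 500, 0.05, 100
    x = [rng.randint(0, 1) for _ in range(n)]
    traces = [trace(x, p, rng) for _ in range(num_traces)]
    guess = majority_reconstruct(traces, n)
    print("bits recovered correctly:", sum(a == b for a, b in zip(guess, x)), "of", n)
```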

I don't yet have an analysis of a belief-propagation-based algorithm. Also, currently we can't analyze a maximum-likelihood algorithm, that finds the most likely original string X. I also don't know how to implement maximum-likelihood efficiently in this setting. So there's still plenty of open questions in this area.

Wednesday, August 15, 2007

Some Theory-Related SIGCOMM Papers

SIGCOMM (the major networking conference) will be taking place shortly. Although I won't be attending, I thought I'd mention some of the papers that involve theory in a non-trivial way. This list is not meant to be exhaustive, so I apologize for (and welcome comments on) anything I might have missed.

The networking community, I've found, is very amenable to theory. I think there's a real understanding in the community that obtaining systematic improvements in something as wild and large as modern networks requires a solid, fundamental grasp of what's going on. This means good models, and good algorithms and data structures. This doesn't mean that SIGCOMM is full of theory papers; they generally want to see some sort of implementation to see that whatever idea is presented actually works. But many papers have a non-trivial theoretical component, and the SIGCOMM conference is certainly a good place to look for important problems where theoretical insight would be welcome. (The same is true for the larger networking conference, INFOCOM, but that conference is so large it's harder to get a handle on. Also, SIGCOMM aims to be a bit more visionary, or "out there", which I think means there's more opportunity for theory to get involved.)

DTN Routing as a Resource Allocation Problem: DTN's are Disruption Tolerant Networks -- in this paper, focusing on mobile nodes that are only connected intermittently. In one section, the paper sets up an appropriate graph model for the problem, and proves lower bounds for various situations. For example, any algorithm that doesn't know the schedule of node meetings is Omega(n)-competitive compared to an offline adversary when n packets are to be delivered. The paper then moves on to heuristic approaches and an experimental evaluation.

Orbis: Rescaling Degree Correlations to Generate Annotated Internet Topologies: The paper considers the problem of how to generate realistic Internet graphs, where here the relevant graph is a router graph with nodes labeled to give the corresponding AS (Autonomous System) network.

An Axiomatic Basis for Communication: Pretty much like it sounds. Trying to set up a basic logic for network protocols, covering issues like naming, binding, forwarding, etc.

Embracing Wireless Interference: Analog Network Coding: Normally, we think of networking coding as XORing packets, or taking random combinations of packet data over some finite field. In wireless, you may be sending signals, not bits. What do you do then? This paper looks at the problem of effective algorithms for this problem.

Again, these examples aren't exhaustive; many other papers have Lemmas and Theorems in there. In fact, I'd wager that having words like "Lemma" and "Theorem" in the paper is positively correlated with acceptance to SIGCOMM. Maybe someone on the PC will confirm or straighten me out. In any case, I'll be giving all the SIGCOMM papers a closer look....

Tuesday, July 31, 2007

The Reviewers' Pledges

The state of journal reviewing in CS and EE journals is generally pretty dismal. It often takes an unforgivably long time to get reviews back, and often the reviews offer less in the way of feedback and constructive criticism than one might hope.

Here I'll focus on aspects related to timing. (I'm unconvinced that reviews that take longer to get back are any better in terms of feedback. Often, just the opposite -- the reviewer realizes the paper has been on their desk too long and just writes a quick paragraph to be done with it.) Let me suggest a set of basic pledges -- rules that everyone should follow. Feel free to suggest your own additions....
  1. When asked to review a paper, I will respond as quickly as possible, even if the answer is no.
  2. I will review at least 1 journal paper for every journal paper I submit. (This does not mean "I will give my graduate students 1 paper to review for every journal paper I submit." Though graduate students should also do reviewing...)
  3. I will skim through a paper I plan to review within one month of receiving it, so that if it is a clear reject for any reason, or there is an important question that needs to be raised, I can promptly write a short note to the editor (and thereby hopefully avoid a more detailed review).
  4. I pledge to review papers within six months of receiving them. I acknowledge that six months is an upper bound, not a target deadline.
  5. I accept that there is reviewing karma, and I should not be surprised if editors pass my papers on to slow reviewers if I am a slow reviewer.