Tuesday, August 19, 2014

Reviewing Scales

I'm just about finished reviewing for CoNEXT (Conference on Emerging Networking Experiments and Technologies), and am starting reviewing for ITCS (Innovations in Theoretical Computer Science).  One notable variation in the process is the choice of the score scale.  For CoNEXT, the program chairs chose a 2-value scale: accept or reject.  For ITCS, the program chair chose a 9-point scale.  Scoring from 1-9 or 1-10 is not uncommon for theory conferences.

I dislike both approaches, but, in the end, believe that it makes minimal difference, so who am I to complain?

The accept-or-reject choice is a bit too stark.  It hides whether you generously thought this paper should possibly get in if there's room, or whether you really are a champion for the paper.  A not-too-unusual situation is a paper gets (at least initially) a majority of accept votes -- but nobody really likes the paper, or has confronted its various flaws. (Or, of course, something similar the other way around, although I believe the first case is more common, as it feels better to accept a close call than to reject one.)  Fortunately, I think the chairs have been doing an excellent job (at least on the papers I reviewed) encouraging discussion on such papers as needed to get us to the right place.  (Apparently, the chairs aren't just looking at the scores, but reading the reviews!)  As long as there's actual discussion, I think the problems of the 2-score solution can be mitigated.

The 9 point scale is a bit too diffuse.  This is pretty clear.  On the description of score semantics we were given, I see:

"1-3 : Strong rejects".

I'm not sure why we need 3 different numbers to represent a strong reject (strong reject, really strong reject, really really strong reject), but there you have it.  The boundaries between "weak reject", "a borderline case" and "weak accept" (scores 4-6) also seem vague, and could easily lead to different people using different interpretations.  Still, we'll see how it goes.  As long as there's good discussion, I think it will all work out here as well.

I prefer the Goldilocks scale of 5 values.  I further think "non-linear" scoring is more informative:  something like top 5%, top 10%, top 25%, top 50%, bottom 50%, but even scores corresponding to strong accept/weak accept/neutral/weak reject/strong reject seem more useful when trying to make decisions.

Finally, as I have to say whenever I'm reviewing, HotCRP is still the best conference management software (at least for me as a reviewer).

Monday, August 18, 2014

Back to Work

Harvard classes start up in a few weeks, and officially, my sabbatical is over.  I'm back in my office, trying to get back into a Harvard routine.

I notice that I've been very light in posting over my sabbatical.  After my term as chair, I was enjoying being in the background, hidden away a bit.  I'm not sure if I'll get back into blogging -- it seems already to be a technology of the past -- but I figure I'll start again and see what happens.

So some short notes.  On my to-do list is to go cover to cover through Ryan O'Donell's new book Analysis of Boolean Functions; Cambridge University Press was nice enough to send me a free copy, which they do with books from time to time.  For those who have been following he's been releasing the book in chapters online, and you already know it's good.  He's made the book is available online also, but it's nice to have a copy for my bookshelf.  It's a beautiful book, both in content and in how it's all put together.  My one thought (so far) as I've started my way through is that it stylistically, to me, reads like a "math book" more than a "CS book", whatever that means.  That's not meant to be a complaint, just an observation.

Boaz Barak accidentally made me laugh on his Updates from ICM 2014 post, well worth reading, when he writes:
"Candes’s talk was an amazing exposition of the power and importance of algorithms. He showed how efficient algorithms can actually make the difference in treating kids with cancer!....

Hearing Candes’s talk I couldn’t help thinking that some of those advances could perhaps have been made sooner if the TCS community had closer ties to the applied math community, and realized the relevance of concepts such as property testing and tool such as the Geomans-Williamson to these kind of questions. Such missed opportunities are unfortunate for our community and (given the applications) also to society at large, which is another reason you should always try to go to talks in other areas."

I think the larger issue the slow but (over long periods) not really subtle shift of the TCS community away from algorithmic work and practical applications.  I'm all for going to talks in other areas, but I think the issue is a larger scale problem.

I'm working on a new class this semester, which if I write I'm sure I'll write more about, but one thing I'd forgotten is how hard and time-consuming it is to construct a lecture.  Maybe some of it is a function of me getting slower, but going through all the possible pieces, picking what you think are the right ones, making sure you've got all the details, and then (for me) writing a "script" of what you plan to go over -- it takes time.  (Good thing I'll likely be on this new class for a few years.)

Plenty more to talk about -- reviewing, some new/old papers, some summer travel.  So we'll see if I can get back into a blogging state of mind.

Friday, June 20, 2014

See You in Prague

For those of you going to SPAA this coming week, I'll see you there.  I'll be giving the last two talks at the conference, to what I expect (based on the timing) will be a nearly empty room.  That just means there will be no pressure.

If you want to hear more about the papers, you can go to Abstract Talk, where I talk about the papers.  Here is the link for the paper Balanced Allocations and Double Hashing, and the link for the paper Parallel Peeling Algorithms.  I haven't done podcasts for Abstract Talk before, so be forgiving if you go to listen.  It seems like a cool idea;  what do people think of it in practice? 

For those who expect to be sightseeing in Prague during the final session (or who just aren't going to SPAA), here's the brief overview. 

For Balanced Allocations and Double Hashing:
In the well-known balanced allocations paradigm, balls are hashed sequentially into bins, where each ball gets d random choices from the hash functions, and is then placed in the least loaded.  With double hashing, we replace the d random choices with d choices of the form a, a+b, a+2b, a+3b,... a+(d-1)b, where a and b are random values (determined by hashing).  That is, we build the d choices from 2 random numbers instead of using d random numbers.  (The numbers are taken mod the size of the hash table, and b should be relatively prime to the hash table size... let's stop worrying about details.)  We find empirically that this makes no difference, in a very strong sense;  the fraction of bins with load j appears the same for every value of j for both systems, so you can't really tell them apart.  We provide a theoretical explanation, based on fluid limit models, for why this happens.

For Parallel Peeling Algorithms:
The analysis of several algorithms and data structures can be framed as a peeling process on a random hypergraph: vertices with degree less than k are removed until there are no vertices of degree less than k left. The remaining hypergraph is known as the k-core.  We analyze parallel peeling processes, where in each round, all vertices of degree less than k are removed. It is known that, below a specific edge density threshold, the k-core is empty
with high probability. We show that, with high probability, below this threshold, only O(log log n) rounds of peeling are needed to obtain the empty k-core for r-uniform hypergraphs. Interestingly, above this threshold, Ω(log n) rounds of peeling are required to find the non-empty k-core. Since most algorithms and data structures aim to peel to an empty k-core  this asymmetry appears fortunate;  nature is on our side. We verify the theoretical results both with simulation and with a parallel implementation using graphics processing units (GPUs). Our implementation provides insights into how to structure parallel peeling algorithms for efficiency in practice.

The Parallel Peeling Algorithms paper was, to my surprise, awarded Best Paper.  Maybe I'll write up more about the surprise at some point, but I'd certainly like to thank the committee for the honor, and pass the credit to where it is due, with my co-authors Jiayang Jiang and Justin Thaler. 

NSF Thanks for the Year

It's that time of year where, as a background process, I have to do my annual reviews for the NSF.  It's generally not a very exciting task, and their online forms remain, I think, unpleasantly designed.  (I think they fixed a problem I had last year, so at least now they point out to you what you haven't filled out more clearly.)

That being said, this year, even more than others, I find myself gratefully filling them out.  The NSF provides the money for my (and my students') research.  And research is fun.  In my few years playing administrator, I was trying to keep up with my research, but inevitably my available time and corresponding output started to decline.  This year, I've been able to "get back into it", and it made me realize how much I enjoy it.  Sure it's often frustrating.  And writing it down (and dealing with conferences, reviews, etc.) can also be frustrating.  But overall the creation process of research, including all the frustrating parts, is the most enjoyable part of the job*, and I'm glad that the NSF is there to support it.  

Thanks NSF.  I'll try to have those reports in well before the nominal deadline....

[* Yes, I like teaching and interacting with students too -- otherwise I'd look for work in one of the many great research labs -- that's the other most enjoyable part of the job.]  

Monday, June 16, 2014

Sad News: Berthold Vöcking

I have just seen the news that Berthold Vöcking passed away.  For those who didn't know him, Berthold was an oustanding researcher in algorithms.  We had several shared interests and were of a similar age;  I always enjoyed his work, and very much enjoyed the few times that I worked with him.  His paper How Asymmetry Helps Load Balancing still amazes me, and is one of the most wonderful papers that I wish I had written.  When I first saw it I simply didn't believe the main result*;  I stopped my other work, wrote up a simulation, and found it matched his analysis.  I then had to spend the next few days understanding it.  Once I understood it, I wondered how he had seen it, and how I had missed it.  It is a truly inspired paper.  

Of course he has other great papers, including the well known Tight Bounds for Worst-Case Equilibria, and maybe the less well known but very interesting (particularly to me) work on Balanced Allocations:  The Heavily Loaded Case.  He had just won a best paper award for ICALP for work on Online Independent Sets.  He was one of the editors of Algorithms Unplugged, a wonderful book project. 

Berthold was regularly on my list of people to invite to workshops, because he always had very interesting work to present and was interesting to talk to.  I can't believe I won't have another chance to talk with him.  His passing is a loss to our community.  

* For those who care, his result was that when hashing using the "power of two choices", suppose that instead of having each new item make two random choices and then placing the item in the least loaded, you split the table into two equal halves (call them left and right), have each item make a random choice from each half, and then place the item in the least loaded, except that you always break ties to the "left half".  There wouldn't seem to be any difference between the two schemes, but there is;  the split-and-break-ties approach works significantly better.    

Saturday, May 31, 2014

Child Geniuses and Other Articles

Stuck on a flight home, I passed time reading a hotel copy of the Wall Street Journal, to find more interesting things than I would have thought.

What first caught my eye is an interesting piece by Jordan Ellenberg, a math professor who also writes a lot about mathematics for more popular consumption, about The Wrong Way to Treat Child Geniuses.  Especially for any readers who did any of those science/math programs as a child (notably John Hopkins/SMPY) -- but not exclusively -- it's definitely worth a read.

Then there were also interesting articles about Ray Kurzweil and an amusing article about entitled Ticket to Dine: the Restaurant Reservation Revoultion.  The last article described a business model that is simultaneously disturbing and why-didn't-I-think-of-it -- using bots to snap up reservations on OpenTable to popular restaurants, and then selling/scalping the reservations.  Evil, or genius?  You decide....

Wednesday, May 28, 2014

ITCS 2015

I was asked to announce the call for ITCS.  Here is a link to the call for papers, with the most important info:
ITCS (previously known as ICS) seeks to promote research that carries a strong conceptual message (e.g., introducing a new concept or model, opening a new line of inquiry within traditional or cross-interdisciplinary areas, or introducing new techniques or new applications of known techniques). ITCS welcomes all submissions, whether aligned with current theory of computation research directions or deviating from them.
Important Dates
Paper Submission Deadline: Friday,
August 8, 2014, 5PM PDT
Apparently in some fit of weakness I agreed to be on the PC.  
I think an ongoing question about ITCS is how well it lives up to its "mission statement".  Part of the question is whether ITCS is necessary -- do FOCS/STOC/SODA not do a sufficiently good job of accepting "conceptual" papers -- and sufficient(ly doing a good job) -- is it really focusing on accepting conceptual papers, or is it just the same-old.  So I guess I'll be able to look under the hood to see how it's going.
My current/upcoming PCs have been SIGCOMM and coNEXT (2014), and ITCS and ICALP part C (2015).  I'm feeling happily diverse in what I get to read.  

Wednesday, May 21, 2014

What's Important in Algoirthms

I saw this interesting article up on 20 questions with Don Knuth, worth reading just for the fun of it.  But the following question on work in algorithm design and analysis particularly interested me, and I nodded especially hard resonating with the statement:

Thus I think the present state of research in algorithm design misunderstands the true nature of efficiency. The literature exhibits a dangerous trend in contemporary views of what deserves to be published.
But here's the whole question/answer for context.

15. Robert Tarjan, Princeton: What do you see as the most promising directions for future work in algorithm design and analysis? What interesting and important open problems do you see?
Don Knuth: My current draft about satisfiability already mentions 25 research problems, most of which are not yet well known to the theory community. Hence many of them might well be answered before Volume 4B is ready. Open problems pop up everywhere and often. But your question is, of course, really intended to be much more general.
In general I'm looking for more focus on algorithms that work fast with respect to problems whose size, n, is feasible. Most of today's literature is devoted to algorithms that are asymptotically great, but they are helpful only when n exceeds the size of the universe.
In one sense such literature makes my life easier, because I don't have to discuss those methods in TAOCP. I'm emphatically not against pure research, which significantly sharpens our abilities to deal with practical problems and which is interesting in its own right. So I sometimes play asymptotic games. But I sure wouldn't mind seeing a lot more algorithms that I could also use.
For instance, I've been reading about algorithms that decide whether or not a given graph G belongs to a certain class. Is G, say, chordal? You and others discovered some great algorithms for the chordality and minimum fillin problems, early on, and an enormous number of extremely ingenious procedures have subsequently been developed for characterizing the graphs of other classes. But I've been surprised to discover that very few of these newer algorithms have actually been implemented. They exist only on paper, and often with details only sketched.
Two years ago I needed an algorithm to decide whether G is a so-called comparability graph, and was disappointed by what had been published. I believe that all of the supposedly "most efficient" algorithms for that problem are too complicated to be trustworthy, even if I had a year to implement one of them.
Thus I think the present state of research in algorithm design misunderstands the true nature of efficiency. The literature exhibits a dangerous trend in contemporary views of what deserves to be published.
Another issue, when we come down to earth, is the efficiency of algorithms on real computers. As part of the Stanford GraphBase project I implemented four algorithms to compute minimum spanning trees of graphs, one of which was the very pretty method that you developed with Cheriton and Karp. Although I was expecting your method to be the winner, because it examines much of the data only half as often as the others, it actually came out two to three times worse than Kruskal's venerable method. Part of the reason was poor cache interaction, but the main cause was a large constant factor hidden by O notation.

Friday, May 02, 2014

Reviewing Question: What's Important

Nick Feamster on Google+ recently shared (and gave me permission to blog about) the following review comment:
This review comment is super enlightening:

"I think there is a general problem with the overall goal of this study and area of research. This seems to be making data usage a critical thing people should be paying attention to. People have  real issues and I am not at all sure that this one deserves attention. They have real concerns like, are they going to lose their job, where should their children go to college, should they encourage their elderly parents to move into a retirement center."
It's so cringeworthy, it's funny.  You could substitute "data usage" with pretty much any research topic you feel like, and you have a review that's almost certainly accurate and justifies rejection.  Which is what makes it such a wonderful, terrible review.  

Without wanting in any way to excuse this reviewer, I do want to say that this review highlights an ever-increasing problem:  I believe review decisions are more and more becoming dominated by subjective decisions about what topics are "important".  I realize some may say it has ever been thus, and I acknowledge that the importance of the underlying problem has always been a factor in judging a paper.  I think the subjective judgment has become more significant in both in systems and in theory over the years for multiple reasons.  As the field has expanded there's less of an underlying agreement and common understanding of what's important.  Many times the reviewer may not know the area well enough to judge the importance, and there is every-growing potential for area bias:  the problems in my area are (more) important.  Further, there are far too many papers for the available slots so reasons to reject have to be found.  As the above comment suggests, one can always call into question the importance of the problem the paper aims to solve.   

But finally, I think, it fits in with an issue that keeps coming up for me:  reviewers are too arrogant.  If they don't see why the problem is important, then the issue must be with the research or the writing;  it couldn't be with their reading or understanding of the problem space.  Reviewers will have opinions regarding the importance of the works they read -- no getting around that -- and they should where possible give advice to authors on how to best present their results.  But they could often be a bit more judicious in recognizing that they are expressing their opinion and offering advice;  they are not, in the end, the final arbiter of a paper's eventual, actual importance.

I don't see subjective decisions in "importance" going away.  But I think they could be given a bit more care, both in how they are used in the final decision-making, and in how reviewers express their opinions on "importance" to authors.

If you'll excuse me now, I don't have time to blog further, I have to go plan where my children should go to college and where I'll eventually ship off my aging parents.  (Thank goodness I have a tenured position.)           

Thursday, April 03, 2014

Postdocs in Copenhagen

I apologize for the short notice, but my occasional co-author Rasmus Pagh is looking for postdocs for a big data project he recently had funded, with an application deadline on April 14th. 

For more information, you can see this page, which starts with:

The Scalable Similarity Search (SSS) project led by Professor Rasmus Pagh is seeking 3 post-docs with a strong background in algorithms theory, combinatorics, or statistics. The project is funded by the European Research Council (ERC), runs in the years 2014-19, and will include a total of 3 PhD and 3 post-doc positions. The aim of the project is to improve theory and practice of algorithms for high-dimensional similarity search on big data, and to extend similarity search algorithms to work in settings where data is distributed (using a communication complexity perspective) or uncertain (using a statistical perspective). A post-doc position may include a long-term visit to a project partner (at Berkeley, Harvard, MIT, Stanford, or Tsinghua) if all parties find the visit beneficial.

Or you can see this nice video Rasmus recently put together.   

And yes, I'm self-interested in this matter, in that as someone who works with Rasmus, the potential "long-term visit" to Harvard described above would involve me if it worked out.  Also, Copenhagen is a wonderful place.