My Biased Coin: April 2013

Sunday, April 28, 2013

Some Recent Books

If you've been reading the other computer science blogs, you know that there are some new books out.

First is the The Golden Ticket: P, NP, and the Search for the Impossible, by Lance Fortnow, about the P vs. NP problem. I got sent a review copy; I haven't read it yet, but I'm testing it out by having my oldest daughter read it and tell me what she thinks, and I'll read it along with her.

Then there's Quantum Computing since Democritus, which covers complexity theory, quantum computing, cryptography, and a bunch of other stuff.

But there's more! Thomas Cormen, of the famous CLRS Introduction to Algorithms book, has a new book out called Algorithms Unlocked, which seems to be a friendly introduction to the basics of algorithms.

On the popular books front, Eric Schmidt (yes that one) and Jared Cohen have a book out on The New Digital Age: Reshaping the Future of People, Nations and Business.

Also, the book Nine Algorithms That Changed the Future: The Ingenious Ideas That Drive Today's Computers, from some time ago, is out in paperback.

On the textbook side, last year Steven Gortler of Harvard introduced a new textbook for computer graphics that looks really good. And of course, there's always that Mitzenmacher/Upfal Probability and Computing: Randomized Algorithms and Probabilistic Analysis to buy if it's not already on your shelf...

Thursday, April 25, 2013

Thanks for the Memories....

I'm off to a meeting next week, so today was my last day of class for Computer Science 124 (Algorithms and Data Structures) for the semester. For two reasons, today feels important.

First, because I'll be on sabbatical next year, I won't have to teach another lecture for what seems to be something like 18 months. I haven't had that much time off from teaching since I've started. I'm looking forward to it, especially as this year the administrative job has slowed research. I'm hopeful the sabbatical time can be put to use productively on the research side.

Second, more nostalgically, I've taught this class for the last 15 years. To me that feels like a record of some sort. When I come back from sabbatical, the plan is that someone else will be teaching the course, and I'll have to find something new and entertaining (for me, and maybe for the students) to teach. I admit, my preference would be to teach this class pretty much forever. After 15 years, I finally think I'm starting to understand all the material and how it fits together. I'll miss the class. While my head knows that it is probably good for both me and the class to separate, my heart hurts a little, giving it up and moving on.

Given the excitement of once again being done with teaching for the academic year, I'm sure I'll get by. (As I explain to the students, the only ones happier than they are that classes are ending are the faculty...) Things change -- obviously, for me, sometimes very slowly -- and I'm optimistic that whatever I decide to teach next, I can make it a course I'll enjoy teaching for say the next 15 years.

Monday, April 22, 2013

Guest Post by Mark Bun and Justin Thaler

Michael asked me and Mark to write a post on our paper "Dual Lower Bounds for Approximate Degree and Markov Bernstein Inequalities", and we're happy to do so. As the title suggests, our paper focuses on understanding a certain measure of the complexity of a Boolean function, known as approximate degree. Given an n-variate Boolean function f, the approximate degree of f is the least degree of a real polynomial that pointwise approximates f to accuracy 1/3 (the choice of the constant 1/3 is by convention -- the particular constant is inconsequential).

Aside from being a natural complexity measure, approximate degree has a surprisingly wide range of applications throughout theoretical computer science. A very partial list includes:

1) Lower bounds on approximate degree yield lower bounds on quantum query complexity.

2) Lower bounds on approximate degree have recently been used to resolve long-standing open questions in communication complexity (see here for a survey from 2008, and some more recent works here and here).

3) Upper bounds on approximate degree underly the best known agnostic learning algorithms.

Despite the range and importance of these applications, large gaps remain in our understanding of approximate degree. The approximate degree of any *symmetric* Boolean function has been understood since Paturi's 1992 paper, but once we move beyond symmetric functions, few general results are known. One function that has served as a symbol of our lack of understanding is the two-level AND-OR tree, the function computed by a CNF (i.e. an AND of ORs) with all gates having fan-in sqrt(n). Over the last couple of decades, there has been a line of work proving incrementally larger lower bounds on the approximate degree of the AND-OR tree, with the best previous lower bound being Omega(n^{3/8}) due to Sherstov. Our main result in the paper resolves the question completely by establishing an Omega(sqrt(n)) lower bound, matching an O(sqrt(n)) upper bound due to H\oyer, Mosca, and de Wolf. (Sherstov recently independently obtained the same Omega(sqrt(n)) lower bound using related techniques). Our second result is to give a new proof of Paturi's lower bound for symmetric Boolean functions.

Let us say a bit about how our proofs of these two results work. Traditionally, approximate degree lower bounds have been proven by a technique known as symmetrization, which transforms questions about multivariate polynomials into well-understood univariate ones. In Step 1, one supposes there is a n-variate polynomial p that pointwise approximates f to error 1/3, and turns it into a univariate polynomial q in such a way that deg(p) >= deg(q). In Step 2, one shows that q must have high degree, which means p must have high degree as well. Unfortunately, the symmetrization step of moving from p to q often 'throws away' information about p (after all, p is an {n choose d}-dimensional object, and q is only a d-dimensional object), so one sometimes runs into problems in completing Step 2.

Recently, there has been progress in moving beyond 'lossy' lower bound techniques like symmetrization. Our paper uses the following approach, which is completely lossless. You can straightforwardly formulate the approximate degree of a function f as a linear program: the approximate degree of f is at least d if the value of the following LP is larger than 1/3:

min \epsilon

s.t. |f(x) - \sum_{|S| <= d} c_S chi_S(x)| <= epsilon, for all x in {-1, 1}^n

c_S in \mathbb{R}

Here, chi_S(x) is the parity function over variables in the set S. This program is just asking you to construct a degree d polynomial that minimizes the distance to f under the L_{\infty} norm.

If you take the dual of this program, what you'll find is that the dual is asking you to construct a function p that has *zero* correlation with every degree d polynomial, but has large correlation with f. We call such a polynomial a "dual polynomial" for f, as one can think of p as a polynomial with no monomials of degree d or less. A dual polynomial for f is a 'witness' to the fact that f has large approximate degree. By strong LP duality, any approximate degree lower bound entails the existence of a good dual polynomial; it might just take a lot of work to find it! Both of our results -- our Omega(sqrt(n)) lower bound on the approximate degree of AND-OR, and our new proof of Paturi's result -- work by constructing explicit dual polynomials.

In the case of the AND-OR tree, we take a dual polynomial for AND, and a dual polynomial for OR, and we combine them in a certain way to get a dual polynomial for AND-OR. Our proof is a refinement of a technique of Sherstov -- Sherstov had already showed in a very general context the 'right' way to put together dual polynomials for simpler functions F, G to get a dual polynomial p for their 'block-composition' F(G, G, ..., G), but his earlier analysis could not handle the case that F=AND and G=OR. Prior work broke down because it was not known how to argue that p had high correlation with AND(OR, OR ..., OR). We manage to show this by exploiting some special properties of the AND function and the OR function (specifically, the key properties are that AND has low block sensitivity at all inputs except the All-True input, and dual polynomials for OR have 'one-sided error'. It's difficult to give much more detail than that in a blog post, so see the paper for the full discussion.)

We construct our dual polynomial for symmetric functions as follows. \v{S}palek , building on work of Szegedy, had already constructed a dual polynomial for the OR function i.e. a symmetric function with a 'jump' from +1 to -1 at the bottom Hamming layer of the hypercube. We give a relatively simple construction of a dual polynomial for the Majority function i.e. a symmetric function with a 'jump' from +1 to -1 near the middle Hamming layer of the hypercube. We then show how one can interpolate between these two 'extreme' constructions to give a dual polynomial for an arbitrary symmetric function f. Our final dual polynomial closely 'tracks' (in a sense that can be made precise by complementary slackness) an optimal approximating polynomial for symmetric f's, and we are planning on adding a section to the arxiv paper explaining this intuition.

Finally, we use LP duality to reprove some classical Markov-Bernstein type inequalities from approximation theory (not to be confused with Markov's Inequality or Bernstein's Inequality from elementary probability theory). Markov-Bernstein inequalities bound the derivative of a univariate polynomial q on real inputs in the interval [-1, 1] in terms of deg(q) and the maximum value q takes on this interval. Typically, these are the inequalities used to complete Step 2 in the outline of symmetrization arguments given above. We show how to formulate the problem of maximizing the derivative of a bounded polynomial as an LP, and give clean solutions to the dual LP that 'witness' various asymptotic versions of the Markov-Bernstein inequality. The connection between Markov-Bernstein inequalities and LPs has certainly been known before, but to our knowledge no one has proved these inequalities by analytically constructing dual solutions to these LPs, and we think our dual witnesses shed some new light on these well-known results.

Looking ahead, we're very optimistic than more open problems in the analysis of Boolean functions can be cracked using methods based on LP duality. Approximate degree is not the only measure of the complexity of a Boolean function f that can be characterized by a linear program: there's also polynomial threshold function (PTF) degree (i.e. the minimum degree of a polynomial that sign-represents f), PTF weight-degree tradeoffs (i.e. tradeoffs between the degree of a PTF for f vs. the size of its coefficients), and weight-degree tradeoffs for polynomials that approximate f pointwise to accuracy 1/3. All of these complexity measures have important applications in learning theory, communication complexity, and beyond, but remain poorly understood even in the context of simple function classes like AC^0.

This work was not done in a vacuum, and we are extremely grateful to Karthekeyan Chandrasekaran, Troy Lee, Sasha Sherstov, Robert \v{S}palek, Jon Ullman, Andrew Wan, and the anonymous ICALP reviewers for valuable comments and conversations! As well as to Ryan O'Donnell and Li-Yang Tan for suggesting the problem of constructing explicit dual witnesses for the LPs capturing Markov-type inequalities!

Thursday, April 18, 2013

Congratulations to Mark Bun and Justin Thaler

Luca Aceto reported the paper awards for ICALP 2013. But I find it necessary to give a special shout-out to Harvard Ph.D. students Mark Bun and Justin Thaler, who got the Track A Best Paper Award for their work:
Dual Lower Bounds for Approximate Degree and Markov-Bernstein Inequalities.

I've asked Justin and he has agreed that he and Mark will write up a post about their paper that will appear in a few days. But since it was announced I wanted to congratulate them now!

This brings up an interesting question: they won the Best Paper award, but not the Best Student Paper award. Which makes some sense in one way -- if the "Best Paper award" dominates the "Best Student Paper award" (by definition), then there's not need to give the same paper two awards; it has the nominally higher honor. On the other hand, you might say that if they won the Best Paper award, by definition they should also have the Best Student Paper, so they should win both. How do you think awards in that case should be distributed?

Tuesday, April 09, 2013

Guest Post by Mikkel Thorup : Results vs. Techniques

[Mikkel Thorup graciously has offered a guest blog post, discussing what he looks for in conference papers, and possible implications for refereeing.]

Results versus Techniques (Mikkel Thorup)
-------------------------

When I go to STOC/FOCS, I hope to see some great new results/theorems and some great new techniques/proofs. I want both, and in some wonderful cases, I even get both in the same paper. However, many of the greatest results may be in papers with less interesting techniques, or vice versa, and the worst thing is if they then get out-competed by papers achieving semi-strong results using semi-interesting techniques.

I am saying this because many refereeing forms have fields like "Submission's strengths" and "Submission's weakness" suggesting a balanced evaluation with pros and cons. A common case I see is that strong results get criticized for being obtained with too specialized techniques. However, if somebody needs to cite a theorem/result, then, typically, it doesn't matter if the underlying proof/technique is specialized.

I am arguing that if we want an interesting program, then we should not worry about weakness unless it renders a paper unacceptable (e.g., a bug). I am proposing a max-evaluation rather than a sum. What matters is if a paper has an exiting contribution.

I am myself mostly in the problem solving business where we believe that there are important computational problems. Sometimes general techniques work, but other cases require a deep understanding of the problem at hand leading to very problem specific techniques. Sometimes a tiny but subtle change in existing techniques can make a huge difference. Other times, as, e.g., for the 4-color theorem, it seems that what is needed is a huge mess. The point here is that if we want to succeed, then we have to be open to use whatever techniques it takes. Later there may be very interesting technical work simplifying and generalizing the techniques.

I am not saying that techniques are not important. I love papers introducing great new techniques, and I am more than happy to accept them on a technical basis. What I am saying is that if the main contribution of a paper is a result, then the techniques may play a more supporting role whose only obvious merit is that they prove something interesting.

It often turns out that the techniques developed to solve a central problem have other applications, but this may not be easy to guess up front based on a quick review. My experience is that referees are worse than random when it comers to guessing if a techniques will later prove useful in solving other problems.

My suggestion is positive refereeing where the overall score is based on the major contribution of a paper: the thing that would make people come to the talk, and use and cite the paper in the future. If a paper has something interesting to say, then it doesn't matter if other aspects of it are less interesting.

Thursday, April 04, 2013

Upcoming Events

I've been asked to post for some upcoming events.

The Simons Institute will have a multi-day symposium on Visions of the Theory of Computing May 29-31, just before STOC. Looks like a lot of big-name speaker. More info here.

Ely Porat wants a lot of submission for SPIRE's 20th conference; paper dealing is May 2. Call for papers. http://u.cs.biu.ac.il/~porately/spire2013/

Wednesday, April 03, 2013

Harvard E-Mail, Again

Yesterday was the long-awaited faculty meeting after the news that Harvard had examined the e-mail (metadata) of the Resident Deans looking for a leak of what was labelled confidential information.

It was actually quite informative. The most detailed account seems to appear (unsurprisingly) in Harvard Magazine. To me, here are the highlights:

1) Deans Smith and Hammonds gave unambiguous, complete apologies.
2) There was a bit more e-mail searching than originally described; in particular, there were some follow-on searches on the e-mail accounts of a Resident Dean that definitely did not go through the full process (specifically, Dean Smith doesn't appear to have been informed of them).

I think the clear and unambiguous apologies were needed. While I think I have been understanding throughout about why the administration felt the leak issue was important enough to merit e-mail searches, the fact was that Harvard's written policies were not followed. That alone merits the apology. Given that the previous apology offered by the administration was described by many as "half-hearted", it was important to have a full apology given at the meeting. (Let's leave aside that it seemed a bit slow in coming.)

It's a bit more disturbing that the institution doesn't seem to have a complete handle on process for searching electronic communications, but then again, it's a fairly new topic. It's clear that this will be address on an ongoing basis, with an outside attorney investigating the past actions, and a Harvard committee being set up to examine the institutional policies regarding privacy of electronic communications.

Anyhow, seems like progress.

My Biased Coin