Monday, April 22, 2013

Guest Post by Mark Bun and Justin Thaler



Michael asked me and Mark to write a post on our paper "Dual Lower Bounds for Approximate Degree and Markov Bernstein Inequalities", and we're happy to do so. As the title suggests, our paper focuses on understanding a certain measure of the complexity of a Boolean function, known as approximate degree. Given an n-variate Boolean function f, the approximate degree of f is the least degree of a real polynomial that pointwise approximates f to accuracy 1/3 (the choice of the constant 1/3 is by convention -- the particular constant is inconsequential).

Aside from being a natural complexity measure, approximate degree has a surprisingly wide range of applications throughout theoretical computer science. A very partial list includes:

1) Lower bounds on approximate degree yield lower bounds on quantum query complexity.
2) Lower bounds on approximate degree have recently been used to resolve long-standing open questions in communication complexity (see here for a survey from 2008, and some more recent works here and here).
3) Upper bounds on approximate degree underly the best known agnostic learning algorithms.

Despite the range and importance of these applications, large gaps remain in our understanding of approximate degree. The approximate degree of any *symmetric* Boolean function has been understood since Paturi's 1992 paper, but once we move beyond symmetric functions, few general results are known. One function that has served as a symbol of our lack of understanding is the two-level AND-OR tree, the function computed by a CNF (i.e. an AND of ORs) with all gates having fan-in sqrt(n). Over the last couple of decades, there has been a line of work proving incrementally larger lower bounds on the approximate degree of the AND-OR tree, with the best previous lower bound being Omega(n^{3/8}) due to Sherstov. Our main result in the paper resolves the question completely by establishing an Omega(sqrt(n)) lower bound, matching an O(sqrt(n)) upper bound due to H\oyer, Mosca, and de Wolf. (Sherstov recently independently obtained the same Omega(sqrt(n)) lower bound using related techniques). Our second result is to give a new proof of Paturi's lower bound for symmetric Boolean functions.

Let us say a bit about how our proofs of these two results work. Traditionally, approximate degree lower bounds have been proven by a technique known as symmetrization, which transforms questions about multivariate polynomials into well-understood univariate ones. In Step 1, one supposes there is a n-variate polynomial p that pointwise approximates f to error 1/3, and turns it into a univariate polynomial q in such a way that deg(p) >= deg(q). In Step 2, one shows that q must have high degree, which means p must have high degree as well. Unfortunately, the symmetrization step of moving from p to q often 'throws away' information about p (after all, p is an {n choose d}-dimensional object, and q is only a d-dimensional object), so one sometimes runs into problems in completing Step 2.

Recently, there has been progress in moving beyond 'lossy' lower bound techniques like symmetrization. Our paper uses the following approach, which is completely lossless. You can straightforwardly formulate the approximate degree of a function f as a linear program: the approximate degree of f is at least d if the value of the following LP is larger than 1/3:

min \epsilon
s.t. |f(x) - \sum_{|S| <= d} c_S chi_S(x)| <= epsilon, for all x in {-1, 1}^n
c_S in \mathbb{R}

Here, chi_S(x) is the parity function over variables in the set S. This program is just asking you to construct a degree d polynomial that minimizes the distance to f under the L_{\infty} norm.

If you take the dual of this program, what you'll find is that the dual is asking you to construct a function p that has *zero* correlation with every degree d polynomial, but has large correlation with f. We call such a polynomial a "dual polynomial" for f, as one can think of p as a polynomial with no monomials of degree d or less. A dual polynomial for f is a 'witness' to the fact that f has large approximate degree. By strong LP duality, any approximate degree lower bound entails the existence of a good dual polynomial; it might just take a lot of work to find it! Both of our results -- our Omega(sqrt(n)) lower bound on the approximate degree of AND-OR, and our new proof of Paturi's result -- work by constructing explicit dual polynomials.

In the case of the AND-OR tree, we take a dual polynomial for AND, and a dual polynomial for OR, and we combine them in a certain way to get a dual polynomial for AND-OR. Our proof is a refinement of a technique of Sherstov -- Sherstov had already showed in a very general context the 'right' way to put together dual polynomials for simpler functions F, G to get a dual polynomial p for their 'block-composition' F(G, G, ..., G), but his earlier analysis could not handle the case that F=AND and G=OR. Prior work broke down because it was not known how to argue that p had high correlation with AND(OR, OR ..., OR). We manage to show this by exploiting some special properties of the AND function and the OR function (specifically, the key properties are that AND has low block sensitivity at all inputs except the All-True input, and dual polynomials for OR have 'one-sided error'. It's difficult to give much more detail than that in a blog post, so see the paper for the full discussion.)

We construct our dual polynomial for symmetric functions as follows. \v{S}palek , building on work of Szegedy, had already constructed a dual polynomial for the OR function i.e. a symmetric function with a 'jump' from +1 to -1 at the bottom Hamming layer of the hypercube. We give a relatively simple construction of a dual polynomial for the Majority function i.e. a symmetric function with a 'jump' from +1 to -1 near the middle Hamming layer of the hypercube. We then show how one can interpolate between these two 'extreme' constructions to give a dual polynomial for an arbitrary symmetric function f. Our final dual polynomial closely 'tracks' (in a sense that can be made precise by complementary slackness) an optimal approximating polynomial for symmetric f's, and we are planning on adding a section to the arxiv paper explaining this intuition.

Finally, we use LP duality to reprove some classical Markov-Bernstein type inequalities from approximation theory (not to be confused with Markov's Inequality or Bernstein's Inequality from elementary probability theory). Markov-Bernstein inequalities bound the derivative of a univariate polynomial q on real inputs in the interval [-1, 1] in terms of deg(q) and the maximum value q takes on this interval. Typically, these are the inequalities used to complete Step 2 in the outline of symmetrization arguments given above. We show how to formulate the problem of maximizing the derivative of a bounded polynomial as an LP, and give clean solutions to the dual LP that 'witness' various asymptotic versions of the Markov-Bernstein inequality. The connection between Markov-Bernstein inequalities and LPs has certainly been known before, but to our knowledge no one has proved these inequalities by analytically constructing dual solutions to these LPs, and we think our dual witnesses shed some new light on these well-known results.

Looking ahead, we're very optimistic than more open problems in the analysis of Boolean functions can be cracked using methods based on LP duality. Approximate degree is not the only measure of the complexity of a Boolean function f that can be characterized by a linear program: there's also polynomial threshold function (PTF) degree (i.e. the minimum degree of a polynomial that sign-represents f), PTF weight-degree tradeoffs (i.e. tradeoffs between the degree of a PTF for f vs. the size of its coefficients), and weight-degree tradeoffs for polynomials that approximate f pointwise to accuracy 1/3. All of these complexity measures have important applications in learning theory, communication complexity, and beyond, but remain poorly understood even in the context of simple function classes like AC^0.

This work was not done in a vacuum, and we are extremely grateful to Karthekeyan Chandrasekaran, Troy Lee, Sasha Sherstov, Robert \v{S}palek, Jon Ullman, Andrew Wan, and the anonymous ICALP reviewers for valuable comments and conversations! As well as to Ryan O'Donnell and Li-Yang Tan for suggesting the problem of constructing explicit dual witnesses for the LPs capturing Markov-type inequalities!

No comments: