Thursday, November 29, 2012

STOC Goings-On

Joan has permitted/encouraged me to post on the STOC PC process this year.  So far, all seems to be running smoothly.

Despite pre-conference suggestions by some naysayers that the formatting rules would be a problem, it doesn't seem to have been, unsurprisingly confirming that STOC submitters can in fact (with high probability) follow basic directions.

The hardest part, I think, has been the assignment process.  This always seems to be the hardest part on the PC end.  Having a large committee both helps and hurts -- more flexibility, but more individuals to deal with.  In particular, some people didn't seem to enter their paper preferences in a timely fashion, causing issues downstream.   Joan has been handling assignment issues as they arise, and I don't expect any significant problems in the end.

As usual, I'm thrilled to be using HotCRP over the alternatives.  I'm by no means an expert in HotCRP usage - I don't go around throwing down searches using Boolean expressions, or that check what the Overall Merit score fields are.  Still, nice to know that they're there.  I'm just thrilled that I can easily get the key overall view on my assigned papers, at a glance seeing how many reviews are in and the scores for everything.  Also, being able to tag papers in arbitrary ways just seems like key functionality.

First-round reviews are trickling in.  To all of you on the PC, thank you - and please get those reviews in!  It's actually helpful to us if you submit reviews as you do them as opposed to waiting to put them all in the system in bulk at the deadline, so please post them as you can.

Sunday, November 25, 2012

Assessing Computer Scientists

My colleague and frequent co-author John Byers knows that I can spend up to several hours a day actively worrying about my ranking as a computer scientist.  While the automated daily Google Scholar updates (any new citations today?) are helpful, it's not always clear how I should interpret them. So John was happy to direct me to a new paper, Assessing Computer Scientists Using Citation Data, to help me and other ranking-obsessed individuals find their place in the world.

The methodology itself is actually quite interesting.  The first question is what version of the h-index do you want to use?  In particular, what external information do you use to judge which version of the h-index appears most accurate?  The method used here is to assume that department rankings accurately represent the quality of the faculty within the departments, and use a regression between the reputation of departments and the mean citation score of the faculty to help determine which versions of the h-index appear most accurate for assessment purposes.  There are several other factors accounted for in the methodology, such as a prediction model for the probability a computer scientist works at a department depending on their quality "mismatch", and how to take into account the thing like the field and years-since-PhD of individual researchers.  (Theory papers as a group obtain much fewer citations on average;  security and cryptography papers, much more.)  The latter allows one to come up with variations of the h-index score that are field- and age-adjusted to individual researchers.  That is, the paper provides a systematic approach that attempts to correct for some of the known weaknesses of h-index scores.  This sort of analysis is common in econometric papers, and is the same general type of analysis we did in our Groupon paper a while back.

I'm well aware that many people object to this type of ranking of individuals, some based on arguments of principle (this isn't how scientific research should be judged) and some based on technical arguments (these approaches are fundamentally flawed).  This work doesn't really try to address the first type of argument, but arguably it goes a fair way toward addressing various technical concerns by showing a suitable application of econometric techniques.  How well does it do?  You'll have to look at the tables in the paper and decide for yourself.    

I generally find these types of measurements useful, with the understanding that they're imperfect.  (When asked to write a promotion letter for someone, for instance, I do examine citation counts.)  To the extent that they become "more perfect", the implication seems to me that they will become the standard first-order approximation of quality.  I don't see how such an outcome could reasonably be avoided.  One argument is that it shouldn't, because it's a useful guide to performance;  if certain people don't perform well in relative terms under that metric, but should be thought of as exceptions to the first-order rule, then exceptional arguments (in letters and such) can be made when needed.  But then not everyone can or will be an exception.

Tuesday, November 20, 2012

Posts from FOCS (part 5)

[Again, edited by Justin Thaler, who organized all this.  The following was written by Michael Forbes of MIT.]

Title: Higher Cell Probe Lower Bounds for Evaluating Polynomials
Author: Kasper Green Larsen

Larsen gave a very understandable talk about data structure lower bounds, where he managed to survey the two relevant prior techniques and introduce his new technique. The lower bounds he discusses are in Yao's cell-probe model. On a basic level, this model is to data structures as communication complexity is to algorithmic complexity. That is, the cell-probe model (and communication complexity) are really only concerned with the ability of a data structure (or algorithm) to move information around. These models ignore computational factors. This makes their results (as lower bounds) hold generally (even when we do care about computation), and also allows us to actually derive results (since getting unconditional lower bounds on computation is hard, but lower bounds for information are possible).

Specifically, the cell-probe model thinks of a data structure as a random-access array of cells, each with a word of some fixed size. Each word can store some sum, or some sort of pointer to another cell. To allow pointers to be meaningful, we allow the word size to be at least log(n), and typically O(log(n)) is the word size considered, as this models real-life usage of numbers. A data structure then has two parts: (1) a preprocessing step for storing information into the cells, and (2) a method for probing these cells to support queries on the original information. In this model, we will only charge for (possibly adaptive) probes to cells. Once we have a cell's information, we can perform arbitrary computation on it to determine the next probe.

The cost of any conventional data structure (as designed, say, in the word RAM model) can only decrease in the cell-probe model, as we consider computation as free. Thus, any lower bounds in the cell-probe model will necessarily apply to any reasonable model for data structures one could consider.

The first technique (by Miltersen-Nisan-Safra-Wigderson) used to prove cell-probe lower bounds was based in communication complexity. As mentioned above, the data structure can be seen as divided: the cells storing the information, and the probes used to answer queries on this information. Given this division, it seems natural to divide these two parts between two players in a communication game, albeit one that is fairly asymmetric in the inputs the players have. That is, Alice has the query that we are trying to answer, and Bob has original information. They are tasked with outputting the answer to Alice's query on Bob's data. Notice that there are two trivial protocols: Alice could send her query to Bob, or Bob could send all of the original information to Alice. These are asymmetric, as typically Alice's query will have a very small description, while Bob has many bits of information.

A data structure can then be converted into a communication protocol as follows: Bob will alone construct the cells of the data structure, and Alice will send indices of the cells she wishes to probe. Bob will return the values of those cells, and Alice will generate the next probe, and so on, until the query can be answered. With this transformation, we can now apply the techniques of communication complexity to get the needed lower bounds.

While the above idea is a good one, it is not good enough for the regime that Larsen is considering. This regime is that of data structures that only have a poly(n) number of possible queries. A good example of this is where the original information is a set of n points in the two-dimensional grid [n]x[n], where each point has a weight at most n. The data structure seeks to return the sum of the weights of the points contained in a given geometric rectangle SxT, where S,T are intervals in [n]. There are n^4 possible rectangles and thus n^4 many queries. To compare, encoding the points takes O(n log(n)) bits. Clearly, in this regime a data structure using n^4 cells can answer queries in O(1) probes. The more interesting question is the time/space trade-off: how many probes are needed when we allow the data structure to only use as many bits as needed to describe the original information?

It turns out that in this regime, the above lower bound technique cannot give lower bounds better than a constant number of probes. This is provable, in the sense that we cannot prove good communication lower bounds because there are good upper bounds: the above trivial protocols are too efficient in this setting.

A second technique (by Patrascu-Thorup), more geared to this regime, uses again the above communication framework, but now gives Alice a harder problem. Alice is given d queries to answer on the single instance Bob holds. The reason this becomes interesting is because given a data structure for this problem, we can now develop relatively better communication protocols: Alice can run all d queries in parallel. Doing this naively will not gain much, but Patrascu-Thorup observed that in one round of Alice sending the indices of the cells she wishes to probe, the order of these cells is irrelevant, as Bob will simply return their contents anyways. Thus, Alice can then send fewer bits by encoding her messages appropriately. This approach turns out to yield lower bounds of the form lg(n)/lg(lg(n)), and cannot do better for the same reason as with the first technique: the trivial protocols are too efficient.

Finally, the technique Larsen introduces diverges from the above themes. Panigrahy-Talwar-Wieder introduces a cell-sampling technique for proving lower bounds, and Larsen takes this further. So far, he can only apply the idea to the polynomial evaluation problem. In this problem, we are given a degree n polynomial over a finite field of size n^2. We will support queries to the evaluation of this polynomial to any of the n^2 points of the field. As before, we can do this in O(1) probes if we have n^2 space, and rather we wish to study the question in the O(n * polylog(n)) space regime. Larsen shows a lower bound of log(n) probes in this regime, which notably is better than any of the previous techniques could hope to prove.

To prove this lower bound, consider any small probe data structure. If we subsample the cells of the data structure (delete any given cell independently and at random) then because any query can be answered by few probes, there will be many queries that can still be answered from the sub-sampled cells. As degree n polynomials can be recovered from any n+1 evaluations, if the polynomial evaluation data structure can still answer n+1 queries with the sub-sampled cells then this means that the sub-sampled cells contain at least as much information as the space of degree n polynomials. However, if the number of probes is small enough, then we can subsample so few cells that this will reach a contradiction, as the number of sub-sampled cells will be too small to recover an arbitrary degree n polynomial. Setting the parameters correctly then yields the lower bound.

Monday, November 19, 2012

Computer Science Rhodes Scholar

A question we tend to get at Harvard is why students should come here to study computer science.  There are lots of reasons, of course, but one thing we like to emphasize it that it's possible to study computer science here but be a more "well-rounded" person than perhaps at some other places.

So I'm happy to provide the example of Aidan C. de B. Daly, a computer science concentrator who just won a Rhodes fellowship.  Congrats (to him and all the winners -- see their bios here.)

Wednesday, November 14, 2012

Harvard Financials

Harry Lewis has had several interesting posts this month over at his blog.  In particular, he discusses Harvard Magazine's recent analysis of Harvard's Annual Financial Report, which offers what I find to be a somewhat bleak picture of the last decade, followed by maybe some optimism for the current campaign.  Recommended reading for those who care about that sort of thing.  Overall, it paints a familiar picture of too many people being far too optimistic about economic conditions up through 2008, allowing and encouraging a grab by the central administration (in the form of a "strategic infrastructure fund") for big-ticket projects that have mostly fallen by the wayside.  Other failures include huge increases in debt and the still silly-seeming interest-rate swaps.  The end result has been a slow multi-year recovery, which (from my standpoint) seems competently managed (under the new administration), but unsurprisingly difficult -- we've had to give up a lot of opportunities to dig ourselves out of this hole, and outside economic conditions don't make it any easier.

Harvard continues to innovate, grow, and expand, even in this weakened state.  I hope that these new financial constraints will mean future plans will be more carefully and reasonably thought out than in the heyday of the early 2000's, even when improved financials offer us a little more breathing room.

Tuesday, November 13, 2012

Posts from FOCS (Part 4)

[Editor: In Part 4 of what now looks to be a 5- or 6-part series of reports from FOCS, second-year graduate student Ludwig Schmidt of MIT contributes a summary of the worksop on Randomized Numerical Linear Algebra from Saturday October 20.]

The workshop "Randomized Numerical Linear Algebra (RandNLA): Theory and Practice" was organized by Haim Avron, Christos Boutsidis and Petros Drineas. The website of the workshop contains the slides, abstracts and links to the speakers' websites: . Michael Mahoney has written a survey of the field: .

After a short introduction by Petros Drineas, Michael Mahoney gave a one-hour tutorial talk. The goal of RandNLA is to design better algorithms for numerical linear algebra by using randomization. Potential benefits include a better runtime (both in theory and in practice) and simpler algorithms that are easier to analyze and more amenable to parallel architectures. The tutorial talk focused on two core problems in numerical linear algebra that highlight the main ideas: low-rank approximation of matrices and least squares approximation for overdetermined systems.

Many algorithms in RandNLA follow a similar pattern: first, parts of the input matrix (rows, columns or entries) are sampled (or otherwise "sketched") in order to construct a smaller problem. This smaller problem is then solved with a traditional algorithm. If the sketch behaves similarly to the original matrix, the solution of the subproblem is close to the solution of the original problem. Hence an important issue is finding a good sampling or sketching scheme. Currently RandNLA algorithms use one of two approaches in order to achieve relative-error approximations: they either construct suitable importance sampling probabilities or use random projections followed by uniform sampling.

A key concept in the analysis of RandNLA algorithms are the statistical leverage scores. For an n x d matrix A (n >= d), the leverage scores a defined as follows: Let U be an n x d orthogonal basis for the column space of A and let U_i be the i-th row of U. Then the statistical leverage scores are || U_i ||^2  ( ||  || denotes the l_2 norm). The concept of statistical leverage scores comes from regression analysis where they are used to detect outliers.

For least-squares approximation, traditional algorithms (Cholesky, QR or SV decomposition) take O(nd^2) time. A basic randomized algorithm first samples O(d log (d) / eps^2) rows of the input matrix A and the input vector b, using the leverage scores of A as importance sampling probabilities. It then solves the smaller least-squares problem to get an approximate solution x'.  The length of the resulting residual vector is a (1 + eps) approximation of the optimal residual || A x_opt - b ||. There are also bounds on the difference between x' and x_opt.

While it takes O(nd^2) time to calculate the leverage scores exactly, an approximation of the leverage scores can be computed faster. One alternative approach uses a fast JL-transform on A, after which the leverage scores are approximately uniform and hence importance sampling on the transformed matrix corresponds to sampling rows of the transformed matrix uniformly. The resulting algorithms run in roughly O(n d log(d / eps) + d^3 log^2 n / eps) time.

Similar ideas are also used for low-rank approximation. In addition to theoretical results, some RandNLA algorithms have already been implemented and evaluated in different scenarios with an emphasis on massive data sets (e.g human genomics and astronomy). There has also been work on implementing RandNLA algorithms in distributed environments.

After the tutorial, several speakers gave 30 minute talks on various topics related to RandNLA. Nick Harvey gave a survey on generalization of Chernoff bounds to matrices: given a set of random, square, symmetric matrices, we want to show that their sum is close to the expected value of the sum. Here, closeness is expressed by bounds on the smallest and largest eigenvalues. Nick introduced Rudelson's Sampling Lemma, the Ahlswede-Winter inequality and Tropp's user-friendly tail bounds, which work in increasingly general settings.

Next, Nikhil Srivastava talked about graph sparsification with a focus on spectral sparsification. A graph H on the same vertices as graph G is a spectral sparsifier for G if the Laplacian of H gives a good approximation of the Laplacian quadratic form of G. Spectral sparsifiers can be constructed with a sampling scheme that uses effective resistances. For a Laplacian L = B^T W B (B is the incidence matrix and W contains the edge weights), these effective resistances are the leverage scores of W^1/2 B. In the case of effective resistances, the sampling probabilities can be computed quickly using Laplacian solvers.

The following two talks discussed practical implementations of RandNLA algorithms. Haim Avron described joint work with Petar Maymounkov and Sivan Toledo. When NLA algorithms are used as subroutines of larger applications, deterministic error bounds and a poly-log dependence on the accuracy are desirable. This can be achieved by using the randomization ideas described above to construct a preconditioner for an iterative algorithm (e.g. CG, LSQR, etc.). The right sampling strategy then leads to a well-behaved matrix and the problem can be solved with a small number of iterations. Empirical results show that a corresponding implementation can be faster than LAPACK (the state of the art) on dense matrices.

Ilse Ipsen spoke about aspects of the practical implementation outlined above (joint work with Thomas Wentworth). Given a matrix Q with orthonormal columns, we want to sample rows from Q so that the resulting matrix has a good condition number. They compare different row sampling schemes and give bounds for the condition number of the resulting matrix based on the coherence of Q. They also conducted numerical experiments and give tighter bounds on the condition number based on the leverage scores of Q.

Yiannis Koutis made another connection to Laplacian linear systems and presented joint work with Gary Miller, Richard Peng and Alex Levin. The theoretically best solver relies on spectral sparsifiers as preconditioners for solving Laplacian / SDD (symmetric diagonally dominant) systems. In contrast to the full sparsifiers described by Nikhil, the algorithm relies on incremental sparsifiers, i.e. a sequence of preconditioners, which are constructed with low-stretch spanning trees. While this algorithm has not been implemented yet, similar ideas have been used in CMG, which is a solver combining the multigrid method with cut-based graph sparsification.

Anastasios Zouzias talked about solving Laplacian systems in the Gossip model (joint work with Nikolaos Freris). The Gossip model is a distributed model of computation where each node can only communicate with its direct neighbors. Moreover, the input is split so that each node has only a single coordinate of the input vector. At the end of the computation, each node has a single output coordinate. Their solution is based on the (randomized) Kaczmarz method, an iterative algorithm for solving linear systems that uses a single row of the Laplacian matrix in each iteration.

Next, Christos Boutsidis spoke about column-based matrix reconstruction (joint work with Petros Drineas and Malik Magdon-Ismail). The overall problem is to approximate a given matrix A with another matrix X that has rank at most k. Christos focused on the problem where we want to approximate A with either r columns of A or r linear combinations of columns of A. One motivation for this restriction is that approximations with actual data columns can be easier to interpret. Since good theoretical results for the r = k case are already known, the talk focused on the r > k case. The authors give deterministic algorithms for approximation in both the Frobenius and the spectral norm and match the lower bound in the spectral norm case.

David Woodruff returned to least-squares regression and discussed his recent results on algorithms running in input sparsity time (joint work with Ken Clarkson). While previous work relied on the fast JL transform for subspace embeddings, their algorithm is based on embeddings that make use of the subspace structure. In particular, it is sufficient to preserve the coordinates with large leverage scores. The subspace embedding is the same as the CountSketch matrix in data streaming. The resulting algorithm runs in time roughly O(nnz(A) + d^3 / eps^2), where nnz(A) is the number of nonzero elements in the matrix A (using iterative methods, this can be improved to a logarithmic dependence on 1 / eps). Similar techniques also work for low-rank approximation.

In the final talk of the workshop, Ben Recht gave a survey of matrix completion. In the matrix completion problem, we are given a low-rank matrix M with missing entries and have to fill the missing data. This problem is also known as the Netflix problem. Since minimizing the rank is NP-hard, we instead use the convex relaxation of the rank function, which is the nuclear norm (the sum of the singular values). This convex relaxation approach is similar to l_1-minimization in compressive sensing. For an m x n matrix A (m <= n) with rank r and coherence bounded by mu, the nuclear norm heuristic recovers A with high probability if we have at least C mu^2 n r log^6 n entries (C is a constant). The dependence on the coherence is necessary because each entry has to provide a similar amount of information about A.

Sunday, November 11, 2012

More from Simons Institute

Alistair Sinclair asked me to post the following.

For #2; note that those with existing postdoc positions are eligible (the Simons people hope they'll be able to arrange a one-semester leave); positions are valid up to 6 years from PhD, so junior faculty are also OK.

The newly created Simons Institute for the Theory of Computing at UC Berkeley has recently posted the following announcements:

1. Details of programs for academic year 2013-14:

2. Research Fellow positions (application deadline Jan 15, 2013): [Opportunities for outstanding junior scientists to spend one or two semesters at the Institute in academic year 2013-14, in connection with one of its programs.]

3. Call for program proposals (deadline Dec 15, 2012): [An invitation for groups of researchers to submit proposals for programs in the broad area of theory of computing and its relationship to other fields - typically one semester in length - to run at the Institute during Fall 2014, Spring 2015 or Fall 2015.] Please follow the links above for details.

Friday, November 09, 2012

Posts from FOCS (Part 3)

Continuing posts from FOCS, with several guest-writers and guest-editor Justin Thaler.

[Editor: This is Part 3 in our series of posts covering FOCS 2012. Part 4 will be coming in the next couple of days, and there might or might not be a Part 5. We start with Harvard post-doc Karthekeyan (Karthik) Chandrasekaran , who contributes a summary of Nisheeth Vishnoi's talk on A Permanent Approach to the Traveling Salesman Problem from Session 2A on Sunday, October 21.]

Title: A Permanent Approach to the Traveling Salesman Problem.
Author: Nisheeth Vishnoi

Nisheeth gives a randomized algorithm to find a TSP tour of length at most n(1+O(1/sqrt(logk)) for simple k-regular graphs on n vertices. Both the algorithm as well as the analysis are simple and easy to state. The algorithm: pick a cycle cover, contract all cycles, find a minimum spanning tree in the resulting graph and output the tour consisting of the cycles and two copies of the edges in the tree. The cycles contribute to n edges and what we lose over n is bounded by the number of cycles. So, the goal is to pick a good cycle cover, one in which the number of cycles is small.

To pick a cycle cover, map the given graph to a bipartite graph by making two copies u_L and u_R for each vertex u, and adding edges (u_L, v_R) and (v_L, u_R) for each edge (u,v). Then find a perfect matching in the bipartite graph. Such a perfect matching can be mapped to a cycle cover in the original graph (such cycle covers could contain vertex-disjoint edges). Nisheeth shows that in the case of k-regular graphs, picking a uniformly random perfect matching in the bipartite graph leads to a good cycle cover with good probability. This is because:

- The number of perfect matchings in k-regular bipartite graphs (approx. equal to the number of cycle covers in k-regular graphs) is
the permanent of the adjacency matrix which is known to be large (per(A)=Omega(k^n) due to Egorychev, Falikman)


- The number of bad cycle covers (with large number of small cycles) in k-regular graphs is much smaller (by a counting argument).
Setting the parameters appropriately, this leads to at most n/sqrt(k) cycles with good probability.

To obtain a poly-time randomized algorithm, obtain a near-uniform perfect matching in the bipartite graph using Jerrum-Sinclair-Vigoda -- this increases the probability of picking a bad cycle cover only by 1/poly(n).

[Editor: Fourth-year grad student Michael Forbes of MIT contributes a summary of Mark Zhandry's talk on constructing quantum random functions from  Session 13A on October 23. Michael does a great job making this summary accessible even to non-quantum folks.]

Title: How to Construct Quantum Random Functions
Author: Mark Zhandry

Zhandry considers cryptography in the post-quantum world, where we want to have good cryptographic primitives despite the potential for quantum adversaries. There are two sides to this question.  The first part is determining which concrete problems seem hard for quantum computers to solve, and using them to construct concrete instantiations of cryptographic primitives.  As an example, it is well known that some concrete cryptographic schemes based on the hardness of factoring are broken because of Shor's algorithm, which can factor integers efficiently on a quantum computer. As such, post-quantum research in concrete cryptography has focused on primitives based on latticed-based problems.

Another side to this question, and this is what Zhandry considers, is identifying the relative structure of cryptographic primitives, such as determining which primitives can be constructed given other primitives.  As an example, it is classically known that one-way functions imply the existence of pseudo-random generators (PRG), which in turn imply the existence of pseudo-random functions (PRF). Zhandy answers the question (amongst others): does the implication "PRG=>PRF" still hold, assuming quantum adversaries?

We should note that there are two variants of the above question.  One variant allows the adversary to use a quantum computer, but only to interact with the cryptographic primitive in a classical way.  The more interesting variant that Zhandry considers is where the interaction can be quantum, and because of the nature of the quantum model,any query by the adversary can "involve" all of the PRF.  In the query model, quantum algorithms can be [provably] exponentially better than classical algorithms, so the adversary Zhandry considers is indeed powerful.

The "PRG=>PRF" result is commonly known as the GGM construction, based on the authors Goldreich-Goldwasser-Micali. They show how to take a PRG, stretching n bits to 2n bits, and from that produce a (seeded) family of functions from n bits to n bits, such that this family of functions is indistinguishable from random by a classical adversary.  The construction of this family is based on a large tree iterating the PRG on itself, where we split the PRG into the left n-bits and right n-bits, and plug one of these n-bits back into the PRG.  The input to the function family dictates the left/right pattern in which we apply the PRG. The security of this construction follows the standard hybrid argument approach.

The main difficulty in applying the above construction in the face of quantum adversaries, is that the hybrid argument in the classical case exploits that classical algorithms can only "look at" as many parts of the PRF as the runtime of the algorithm.  However, as mentioned above, quantum computers can "look at" the entire PRF even in a single query.  Thus, if one wanted to perform a hybrid argument, one might need exponentially many hybrids, which would result (by the averaging used in the hybrid argument) in an exponential (and thus unacceptable) loss in the parameters.

To get around this, Zhandry replaces the distribution of interest with another distribution (quantumly indistinguishable to small-query algorithms) such that this new distribution is created by intrinsically few parameters, so that any hybrid argument on this new distribution only needs to be over few hybrids, allowing the rest of the GGM argument to go through.  Specifically, consider a distribution D, and any r. If we choose a function g via D^[r] (that is, r random samples from D, indexed 1 through r) and f from [r]^X (that is, a random map from the space X to the numbers 1 through r), then g composed with f is a function giving each element in X a sample from D, while only involving r true samples from D.  When r goes to infinity, this distribution will converge to each element in X getting a uniformly random sample from D.  Further, Zhandry shows that for a fixed r, a quantum algorithm with q queries cannot distinguish this process from truly random samples with better than poly(q)/r statistical distance.  Thus, as the above random process only depends on r truly random samples, the hybrid argument will only involve r terms, and, once the choice of r is made, this allows the GGM analysis to be completed in this quantum setting.

Wednesday, November 07, 2012

NYCE (New York Computer Science and Economics Day) Announcement

I was asked to post the following:

We are pleased to announce that NYCE 2012, the fifth annual New York
Computer Science and Economics Day, will be held in New York City on
Monday, December 3rd.

** General Information
The goal of NYCE is to bring together researchers from the larger New
York metropolitan area interested in computer science, economics,
marketing, and business. This year, our organizing theme is ``The
Economics of Big Data, Information, and Privacy.''

NYCE features both invited talks and shorter contributed talks and
posters (see below for details). This year, our invited speakers are
Alessandro Acquisti (CMU), Moshe Babaioff (Microsoft Research), Tim
Roughgarden (Stanford), Michael Schwarz (Google), and Catherine Tucker

NYCE 2012 will be held at the Simons Foundation on December 3rd, from
9:00 a.m. to 5:00 p.m.

For more information, please go to our website:

** Contributed Talks and Posters
Students are invited to present their work in the form of posters or
short (10 minute) talks on any subject of interest to the NYCE
community; these include (but are not limited to) theoretical,
modeling, algorithmic, and empirical work on advertising and marketing
based on search, user-generated content, social networks, or other
means of monetizing the Internet. If you are interested in presenting
your work at NYCE, simply submit an abstract by Thursday, November 8th
(3 days after the STOC deadline).

** Registration
Advance registration (free, thanks to our generous sponsors) is now
open. You can register at until November
20th. After that date, on-site registration is available for a fee of

-- Dirk Bergemann, Sham Kakade, Nitish Korula, and Aaron Roth.

Saturday, November 03, 2012

Posts from FOCS (Part 2)

Continuing posts from FOCS, with several guest-writers and guest-editor Justin Thaler.

[Editor: First-year grad student Mark Bun of Harvard contributes a summary of two talks on submodular maximization from Session 13A on Tuesday, Oct. 23]

We had not just one, but two talks on new tight algorithms for submodular maximization. A submodular function is a set function whose value offers diminishing returns to its input. That is, adding some element to a set causes a smaller increase in the value of the function than does adding that element to a subset. Many combinatorial optimization problems can be expressed as the minimization or maximization of a submodular function, including min- and max-cut, coverage problems, and welfare maximization in algorithmic game theory. Unsurprisingly, submodular maximization tends to be NP-hard for most natural choices of constraints, so we look for approximation algorithms. Both results regard the input submodular function as a value oracle, which evaluates the function on a set in one time step.

Paper Title: A Tight Linear Time (1/2)-Approximation for Unconstrained Submodular Maximization

Roy Schwartz spoke about work with Niv Buchbinder, Moran Feldman and Seffi Naor on a randomized linear-time 1/2-approximation for unconstrained submodular maximization. This matches a lower bound due to Feige, Mirrokni and Vondrak, who showed that achieving any approximation factor larger than 1/2 requires exponentially many queries to the value oracle. That paper also gave a 2/5-approximation based on local search, which was later improved to 0.41 by Gharan and Vondrak by simulated annealing, and then to 0.42 by Feldman, Naor and Schwartz by taking advantage of structural similarity. Given this line of work, the result here is remarkable for being both tight and incredibly simple. It is essentially a greedy algorithm that exploits both randomness and the symmetry of the optimization problem.

Paper Title: A Tight Combinatorial Algorithm for Submodular Maximization Subject to a Matroid Constraint.

Justin Ward then presented a new tight algorithm for monotone submodular maximization subject to a matroid constraint, in joint work with Yuval Filmus. Calinescu et al. gave a tight approximation where they relaxed the problem to continuous space, performed gradient ascent, and then rounded. This algorithm achieves the same approximation, but is purely combinatorial. It works by applying the standard greedy algorithm to an auxiliary potential function that has lower curvature than the input function. One of the neat parts of this talk was a discussion of how the authors came up with this auxiliary function. It was quite the tale of experimenting with matroids of low-rank, solving an LP, and guessing a necessary recurrence.
[Editor: Fourth-year grad student Michael Forbes contributes a summary of Ketan D. Mulmuley's talk on Geometric Complexity Theory from Session 12B on Tuesday, October 23.]

Paper Title: Geometric Complexity Theory V: Equivalence between blackbox derandomization of polynomial identity testing and derandomization of Noether's normalization lemma

Ketan starts with the following problem. Consider the space of t-tuples of n by n complex matrices. We impose an equivalence relation on this space via simultaneous conjugation by a fixed matrix. For example, when t=2, we consider (A,B) and (PAP^{-1},PBP^{-1}) equivalent, for any invertible matrix P. The question is to classify the equivalence classes under this action, by identifying a suitably nice normal form. In this context, "nice" means that t-tuples in normal form are the solutions-to/parameterized-by some polynomial equations.

For t=1, this problem is solved by the Jordan normal form. For t>1, this problem is considered "wild". That is, this problem is considered quite difficult in representation theory, and any other problem that embeds this classification problem is called "wild". This problem is similar to hard problems in complexity theory, as it can be shown that the t=2 case embeds the t>2 case, so there is some sort of "completeness" result going on.

It is classically known that Noether's normalization lemma is at the heart of the above classification problem. This lemma states that there exist some set S of generators, such that a certain ring is integral over the ring generated by S. Further, a random choice of S will work. In his paper, Ketan argues that explicit derandomization of the polynomial identity testing (PIT) problem will imply explicit sets S of generators as needed for Noether's lemma, which would imply explicit classification of the "wild" problems mentioned above. That is, the (black-box) PIT problem is to construct a small set H of evaluation points, such that for any small algebraic circuit C computing a non-zero polynomial, C will evaluate to non-zero on some point in H.

The PIT problem has seen a lot of interesting work in the past decade, much of it focused on constructing such sets H (called hitting sets) for restricted classes of circuits, such as depth-3 circuits with bounded fan-in. Ketan suggests that because the general PIT question embeds these wild problems from representation theory, there is a tension between the consensus in the two fields: representation theory sees these classification problems as notoriously difficult, while complexity theory might suggest that these problems (via their connection to PIT) are possible to solve (but perhaps still difficult).
[Editor: Third-year grad student Thomas Steinke of Harvard contributes a summary of two talks on pseudorandomness from Session 2B on Sunday, October 21.]

FOCS 2012 had very many interesting talks. I'll discuss two papers on
unconditional pseudorandom generators, namely Pseudorandomness from Shrinkage by Russell Impagliazzo, Raghu Meka, and David Zuckerman (speaker) and Better Pseudorandom Generators from Milder Pseudorandom Restrictions by Parikshit Gopalan, Raghu Meka (speaker), Omer Reingold, Luca Trevisan, and Salil Vadhan.

Both of these papers show a connection between pseudorandom generators and (pseudo)random restrictions.

Random restrictions are often used to prove lower bounds for circuits, branching programs, or other models of computation. Typically one shows that a circuit `simplifies' or `shrinks' when a random subset of its inputs are set to random values. For example, Hastad's switching lemma is used to show that an AC0 circuit reduces in depth each time a random restriction is applied. However, the parity function does not simplify when restricted. Thus proving that parity is not in AC0.

A pseudorandom generator for a function family immediately gives a lower bound for such functions. Meka et al. give a partial converse to this by converting a lower bound analysis using random restrictions into a pseudorandom generator. They show that, if a pseudorandom restriction sufficiently shrinks circuits in a given family, then a pseudorandom generator can be constructed that essentially matches the lower bound given by the analysis. Thus this paper adds to the well-known connection between hardness and randomness. Their work yields pseudorandom generators with polynomial stretch for several models, including near-quadratic stretch for branching programs that can read their input in an arbitrary fixed order, while the previous best result gives linear stretch.

Gopalan et al. also analyse pseudorandom restrictions, but with very different techniques and results. They analyse mild restrictions (i.e. only a constant fraction of input bits are restricted) that do not yield the strong shrinkage required by Meka et al.. They are able to show that these restrictions, when `averaged out,' leave a function that is fooled by a small bias distribution. Recursively applying this analysis yields a pseudorandom generator with near-optimal almost-logarithmic seed length for combinatorial rectangles and read-once CNFs, as well as a hitting set generator for width-3 branching programs. Previous work was only able to achieve these results with large error; when the error of the pseudorandom generator is required to be polynomially small, their results are a polynomial improvement in terms of seed length.
[Editor: Fourth-year grad student Justin Thaler of Harvard contributes a summary of two unrelated talks.]

Paper Title: The Cutting Plane Method is Polynomial for Perfect Matchings.

Harvard's own Karthekeyan Chandrasekaran talked about joint work with Laszlo A. Vegh and Santosh S. Vempala on cutting plane algorithms for matching problems. The cutting plane method is a popular algorithm for solving integer programs (IPs), used in commercial solvers. It works by starting with an LP relaxation of the given IP to obtain basic optimal solution x_0, and then iteratively adding constraints that are valid for integer solutions but violated by the basic optimum. It continues until the basic optimum is integral. The goal of this paper is to take a step toward explaining the practical efficiency of cutting plane methods, by giving an efficient cutting-plane algorithm for min-cost perfect matching (MWPM) --MWPM is known to be in P, but it was open (apparently for 30 years) whether there was a polynomial-time cutting-plane algorithm for this problem.

A brief summary of how they achieve this is as follows. They start with a natural, well-known LP relaxation of the MWPM problem, called the bipartite relaxation. This relaxation has the nice property that all basic optima x are half-integral, and the support of x is a disjoint union of edges and odd cycles. This makes it easy to find cuts (the cuts correspond to what are called blossom inequalities, see the paper for details). A major challenge, though, is that naively adding cuts will not preserve the half-integrality of intermediate LPs, so at each iteration they throw away some of the old cuts that were added earlier in the execution. They need to take considerable care in choosing which cuts to keep in order to guarantee half-integrality of intermediate LPs (and to ensure that their algorithm makes progress at a sufficiently high rate).

Paper Title: Lower bounds on information complexity via zero-communication protocols and applications.

I unfortunately didn't catch the name of the presenter, but this work was by Iordanis Kerenidis, Sophie Laplante, Virginie Lerays, Jeremie Roland and David Xiao. As the title states, this paper was about information complexity. The idea underlying this concept is as follows. In the standard two-party communication model, Alice and Bob have inputs x and y respectively, and they communicate with each other in order to compute a function f(x, y), with the goal of minimizing the total number of bits transmitted; this is the communication cost CC. Even if Alice and Bob exchange a lot of bits, we can still ask how much information about Alice's input was revealed to Bob, and vice versa; this is the information cost IC. In principle, this could be much smaller than the total communication, as a message can sometimes reveal less information than its length.

As far as I can tell, people started studying IC precisely because it provided a way to reason about and lower bound CC (it is easy to see that the amount of information revealed about the inputs is never larger than the total communication), but IC is a natural complexity measure in its own right, and determining the precise relationship between IC and CC (i.e. whether it is possible to ``compress communication down to the information complexity'') is still a major open question. This is the context underlying the paper, and finally here is a summary of the results.

The authors given a new method, called the relaxed partition bound (RPB), for lower bounding the information complexity of any function. The proof that the RPB indeed lower bounds IC goes through a new connection between "zero-communication protocols" and information complexity-- see the paper for details. Remarkably, the RPB method is stronger than almost all known methods for lower bounding the *communication* complexity f (the only exception is the (unrelaxed) partition bounded itself). Essentially this means that any existing CC lower bound proved via a technique other than the partition bound is actually something stronger: it is an IC lower bound. They are therefore able to resolve the information complexity of a number of important problems, resolving three open questions from a STOC 2012 paper by Mark Braverman.

Thursday, November 01, 2012

Posts from FOCS (Part 1)

Justin Thaler went to FOCS, and managed to arrange for various people to write up their thoughts on various parts of the conference.  So Justin is the famed Editor below.  We haven't seen much posted about FOCS, so hopefully these summaries will be of interest.  I expect we'll have a few posts in this series.

[Editor: Third-year grad student Matt Weinberg of MIT contributes a summary of many of the EconCS talks from FOCS. Matt starts with the Workshop on Bayesian Mechanism Design on Saturday, October 20, followed by some talks from the main program.]

Missing: Jason Hartline, Introduction to Bayesian Mechanism Design, and Tim Roughgarden, Simple/Prior Independent mechanisms part A.

1) Tim Roughgarden, Simple/Prior Independent mechanisms part B: For some settings, such as selling a single item, the revenue-optimal auction is well understood but difficult to implement in practice. On the other hand, the second price auction (item goes to the highest bidder, she pays the second highest price), is often used in practice but has absolutely no revenue guarantee. A famous theorem in Economics of Bulow and Klemperer helps explain this: While the second price auction itself has no revenue guarantee, the revenue of the second price auction after recruiting one extra bidder is at least as good as the optimal auction (without recruitment). In other words, a seller's efforts are better spent advertising than analyzing the market (in these scenarios).

In addition to presenting this theorem, Tim discussed several extensions of this theorem. In a paper with Jason Hartline, they showed that in more complex settings (called "single-dimensional"), the spirit of Bulow-Klemperer holds: simple auctions yield approximately optimal revenue. In a paper with Peerapong Dhangwatnotai and Qiqi Yan, they show further that one can obtain approximately optimal revenue via a simple auction with very limited information about the market. With Inbal Talgram-Cohen and Qiqi Yan, they showed that such results are also possible in certain multi-dimensional settings, such as matching markets.

2) Costis Daskalakis, Revenue Optimization in Multi-Item Auctions: As mentioned in Tim's talk, revenue-optimization in single-item auctions is very well understood. Not only does this mean that revenue can be optimized exactly with enough work, but it enables the study of simple, approximately optimal auctions such as those presented by Tim. Unfortunately, as soon as a second item enters the picture, all bets are off and all of a sudden we know absolutely nothing. Not only do we not know what the optimal auction looks like, but we don't have any grasp on a good benchmark for designing approximations in general settings (in certain restricted settings, Shuchi Chawla, Jason Hartline, David Malec and Balasubramanian Sivan were able to approximate a meaningful benchmark via a clever multi- to single-dimensional reduction). In contrast, welfare maximization is extremely well understood, and has been for 40 years, 10 years before we knew how to maximize revenue with even a single item.

Costis presented work with Yang Cai and Matt Weinberg showing that in very general multi-dimensional settings, revenue maximization can be computationally efficiently reduced to welfare maximization. Specifically, the mechanism design problem of maximizing revenue is reduced to the algorithm design problem of maximizing welfare (without worrying about incentives).

3) Nicole Immorlica, Black Box Reductions from Mechanism to Algorithm Design: One could argue that the overarching goal in mechanism design is to reduce problems with incentives (such as maximizing an objective subject to truthfulness constraints) to problems without (such as designing an algorithm to optimize some objective on all inputs). Several famous economic results, such as the second-price auction, VCG (to maximize welfare), and Myerson's (to maximize revenue selling a single item) follow this paradigm. In fact, so does the main result presented in Costis' previous talk. An obvious important question is simply "when can such reductions exist?" and more specifically, "when are they compatible with approximation algorithms?"

Nicole presented work by Jason Hartline and Brendan Lucier, which shows that in all Bayesian, single-dimensional settings an alpha-approximation algorithm for welfare implies a Bayesian truthful alpha-approximation mechanism for welfare. Follow-up work done independently by Jason Hartline, Robert Kleinberg, and Azarakhsh Malekian and Xiaohui Bei and Zhiyi Huang showed that the same result holds in multi-dimensional settings as well. In a more recent paper, Nicole with Shuchi Chawla and Brendan Lucier provided some negative results, essentially showing that these techniques will not apply far beyond these settings. They show first that it is not possible to replace Bayesian truthfulness with deterministic truthfulness, even in single-dimensional settings. They also show that it is not possible to replace welfare with a non-linear objective (such as makespan).

4) Shuchi Chawla, Non-Linear Objectives in Mechanism Design: Non-linear objectives, such as minimizing makespan, have proven much more difficult to handle in mechanism design than linear ones, such as welfare. Specifically, consider the problem where there are n jobs that need to be run on one of m machines. Each machine i reports a processing time t(i,j), and the goal is to allocate the jobs to machines in a way that minimizes the makespan (maximum processing time among the machines). In the single-dimensional version of this problem, Aaron Archer and Eva Tardos showed how to obtain a 3-approximation, and Peerapong Dhangwatnotai, Shahar Dobzinski, Shaddin Dughmi, and Tim Roughgarden later gave a PTAS. For the general problem, Noam Nisan and Amir Ronen showed that VCG only obtains an m-approximation and Itai Ashlagi, Shahar Dobzinski, and Ron Lavi later proved that no natural, truthful mechanism can do better (for a very reasonable definition of natural mechanisms).

Shuchi presented joint work with Jason Hartline, David Malec, and Balusabramanian Sivan that beats this hardness result using Bayesian truthfulness, obtaining a constant factor approximation whenever n = O(m). In some sense, this is the "hard" case of the problem. In light of the hardness result from Nicole's previous talk, the mechanism cannot simply be a black-box reduction to algorithm design, but instead uses novel ideas.

5) Eva Tardos, Price of Anarchy of Practical Auctions: The price of anarchy studies the effect of selfish behavior in games. In order to minimize congestion in a network, a central designer might try to enforce that certain users use certain paths. As this is quite a daunting task, one might instead hope that simply letting users choose their own paths will result in good (but perhaps not optimal) congestion. In a 2001 paper, Tim Roughgarden and Eva Tardos showed exactly this: that as long as users choose paths in a Nash Equilibrium, the congestion is only a constant-factor worse than optimal. Since then, several other games were shown to have a good price of anarchy, including several auction scenarios. 

Until recently, all of these results were proved using different techniques, and every new game required a new key insight to prove a PoA bound. In addition, once interest grew to other solution concepts beyond Nash Equilibrium, each setting had to be revisited and reproved. In 2009, Tim Roughgarden introduced a "smoothness" technique which essentially reduced proving PoA bounds for several solution concepts to proving that a game (or auction) was "smooth." Eva presented recent work with Vasilis Syrgkanis and Renato Paes Leme where they extend the smoothness framework to accommodate auctions and combinations of auctions. They show first that several practical auctions, including first price auctions and second price auctions (under some mild assumptions) are "smooth." Second, they show that any sequential or parallel combination of smooth auctions is itself smooth (also under some natural assumptions), and therefore has good price of anarchy in several solution concepts. This nicely models lots of practical settings (such as participating in multiple auctions over ebay) and shows that simplicity doesn't cost us much.


[Editor: Matt also contributes a summary of the AGT Session (3A) from Sunday October 21st].

Session 3A:

1) Yang Cai, Optimal Multi-Dimensional Mechanism Design: Reducing Revenue to Welfare Maximization (joint work with Costis Daskalakis and Matt Weinberg): Designing a revenue-maximizing auction for multiple items is one of the central open problems in mathematical economics (see summary of Costis' workshop talk for more background on the problem). An ideal solution, especially for computer scientists, would be one that reduces the problem of designing the auction (a problem that must respect bidder's incentives) to the problem of designing an algorithm (a problem with no incentive constraints). This paper does exactly that in very general settings.

The main idea is that the revenue-maximizing auction can be described as a solution to an LP with polynomially many variables but exponentially many constraints. Remarkably, a separation oracle for the feasible region can be constructed using only black-box access to an algorithm that maximizes welfare (without concern for incentives). See Tim Roughgarden's summary of this talk by Costis here:

2) Zhiyi Huang, The Exponential Mechanism for Social Welfare: Private, Truthful, and Nearly Optimal (joint work with Sampath Kannan): Traditional mechanism design usually focuses on maximizing (or nearly maximizing) an objective subject to truthfulness constraints. Traditional differential privacy focuses on maximizing an objective subject to privacy constraints. This paper explores the possibility of obtaining all three goals simultaneously. In this paper, the authors show that the Exponential mechanism, a well-studied mechanism from differential privacy, is actually truthful as well. In other words, truthfulness is compatible with differential privacy,at least for maximizing welfare.

The main idea is that the Exponential mechanism is a maximal-in-distributional-range mechanism, offset by the Shannon entropy. MIDR mechanisms are known to be truthful. They also show that in several settings, the mechanism can be implemented efficiently.

3) Laszlo Vegh, Concave Generalized Flows with Applications to Market Equilibria: Concave generalized flow networks build upon standard flow networks. In a traditional flow network, when x flow leaves node v on a path to w, x flow arrives at w. In a Concave generalized flow network, every edge has a concave, monotone gain function, so that when x flow leaves node v on a path to w, g_e(x) flow arrives at w instead. The goal is still to maximize the flow arriving at the sink. This problem (and the special case where every gain function is linear) has been extensively studied, and can be solved via convex programming. However, until this result, no combinatorial algorithm was known (that applied in all settings).

The paper provides a combinatorial algorithm with quite good running time (not much worse than just solving a regular network flow), and also provides new applications to finding market equilibria. Specifically, the paper shows that equilibrium for Linear Fisher markets (and also other market equilibrium solutions) can be found using concave generalized flows.