Friday, September 06, 2013

Guest Post by Justin Thaler: A "Mini-Survey" of Practical Verification Algorithms

Justin Thaler, after presenting his most recent work on verifiable computation at Crypto, decided to write up a mini-survey on recent work in the area, and he offered to make it a blog post.  It's a somewhat longer-than-average blog post,  but well worthwhile.  So without further ado, here's Justin.


In the last few years there has been an enormous amount of work devoted to the development of "practical" protocols for verifiable computation, with multiple approaches being pursued across several research groups. It has recently become clear to me that the sheer volume of work and the sometimes subtle relationships among the various approaches has made the area difficult to follow. The goal of this post is to provide a reasonably concise description of the various approaches and the pros and cons of each, with the hope that this will serve as a handy reference for those interested in the area, at least until a more thorough and formal survey can take its place.


Some Quick Background:

The field of verifiable computation seeks to develop protocols that allow a computationally weak verifier (such as a cloud computing user) to send its data to a powerful but untrusted prover (such as a cloud computing service). The verifier can then ask the prover to perform some computation on the data, and the protocol should enable the prover to return the output of the computation along with a guarantee that the output is correct. Concretely, the ultimate goal is to start with a program written in a high-level language like C and automatically ensure that the prover faithfully executed the program.

The only background required to understand this post is having encountered interactive proof systems and argument systems . To review, in an interactive proof, the prover tells the verifier the output of a computation, and then they have a conversation during which the prover tries to convince the verifier that the output is correct. Any interactive proof must satisfy two properties: the first, called completeness, says that an honest prover will convince the verifier to accept, and the second, called soundness, says that a dishonest prover will be caught with high probability even if the dishonest prover is computationally unbounded. An argument system is an interactive proof system that is sound only against polynomially time provers.


Sources of Overhead:

I will focus on projects that are reasonably general-purpose, require only one prover, and have attempted to refine theory toward implementation. In the general case, all approaches satisfying these criteria work by first turning the high-level computer program into an arithmetic circuit or a set of "constraints", and then using complexity-theoretic or cryptographic machinery (or both) to check that the prover correctly evaluated the circuit or satisfied the set of constraints. I will refer to the "high-level program ==>; circuit/constraints" transformation in any system as the 'front-end', and the actual protocol used to check that the prover correctly evaluated the circuit as the 'back-end'.

Thus, there are two primary sources of overhead in existing systems: the overhead in the front-end (as some computer programs that run in T machine steps may only be computed by circuits with far more than T gates), and the overhead in the back-end (i.e., the extra work required for the prover to evaluate the circuit with a guarantee of correctness, relative to evaluating the circuit with no guarantee). The existence of two sources of overhead complicates the comparison among many of the papers that I will discuss, as work on front-ends and back-ends has been interleaved, some back-ends work with more general circuits than other back-ends (and hence can interface with more efficient front-ends), and there has been cross-pollination among systems. I will attempt to clearly delineate these issues below by focusing on contributions to the design of back-ends and front-ends separately, beginning with the former.


Summary of Back-Ends:

Existing back-ends fall into three broad categories: interactive proofs, argument systems with pre-processing, and argument systems without preprocessing. I will describe each of the three approaches below, starting with the most efficient but least general approach and moving toward more general but less efficient approaches.

1) Interactive Proofs: In 2008, Goldwasser, Kalai, and Rothblum (GKR) gave a powerful interactive proof protocol for circuit evaluation. The costs to the verifier in this protocol grow linearly with the circuit depth, so the protocol only achieves an efficient verifier if the circuit has small depth (equivalently, if the function computed by the circuit has an efficient parallel algorithm). Fortunately, many of the computations actually being outsourced do exhibit large amounts of parallelism. In 2012, Cormode, Mitzenmacher, and I implemented the GKR protocol plus refinements -- most notably, we brought the runtime of the prover down from Omega(S^3) in a naive implementation to O(S log S), where S is the number of gates in the circuit. Our implementation demonstrated that the concrete costs to the verifier are very low, but there were two downsides to the implementation: First, despite our refinements, the prover runtime remained a bottleneck (the prover took roughly three orders of magnitude more time than that required to evaluate the circuit with no guarantee of correctness). Second, unless the circuit satisfied a certain 'regularity' condition on its wiring pattern, the verifier required an expensive pre-processing phase to "extract" information about the wiring of the circuit.

Very recently, I addressed these bottlenecks by showing the following. First, for 'regular' circuits, the runtime of the prover can be reduced from O(S log S) to O(S). Concretely, for these circuits, the prover now runs about 10 times slower than what is required to evaluate the circuit gate-by-gate with no guarantee of correctness. Informally, a circuit has a 'regular' wiring pattern if there is a way to label each gate g with a binary string such that the labels of the in-neighbors of g are very simple functions of the label of g (this condition is satisfied by natural circuits computing naive matrix multiplication, pattern matching, FFTs, and several other problems).

Second, for any data parallel computation (i.e., any setting where an arbitrary sub-computation C is applied independently to many pieces of data, before possibly aggregating the results), the cost of the pre-processing stage for the verifier and the overhead for the prover can be made to depend only on the size of the sub-computation C, and not at all on the number of pieces of data to which C is applied. Essentially, this holds because the data-parallel computation is 'regular' at a higher level of abstraction than in the first result -- while the sub-computation C may have very irregular wiring, the various invocations of C do not interact at all.

Third, I gave a simple protocol for matrix multiplication that avoids the circuit representation entirely, allowing the prover to compute the correct answer using an arbitrary algorithm, and then spend O(n^2) extra work proving the answer is correct.

[Aside on Matrix Multiplication: It is worth comparing this new matrix multiplication protocol to classical approaches like Freivalds' algorithm. The latter requires no interaction and *no* extra work for the prover. The advantage of the interactive approach is most evident for computations that invoke matrix multiplication as a means to an end rather than as an end in itself. For example, the best-known algorithms for computing the diameter of a graph work by repeatedly squaring the adjacency matrix. Running Freivalds' algorithm to verify these algorithms would require the prover to send the intermediate matrices to the verifier, which would be terabytes of data even for graphs on 1 million nodes. The interactive approach allows the prover to send only the right answer (which is just a single number), and then spend a few KBs of extra communication and a low-order amount of extra work proving the answer is correct.]

2) Argument Systems with Pre-Processing: In 2007, Ishai, Kushilevitz, and Ostrovsky gave an argument that required an expensive pre-processing phase for the verifier, but avoided the use of short PCPs (short PCPs are complicated complexity-theoretic protocols upon which most argument systems were traditionally based -- see the section on Argument Systems without Pre-processing below). Systems with pre-processing are primarily of interest in data-parallel settings, because the verifier can only save work when the set-up costs are amortized over many instances of the same computation (i.e., when the same computation is applied independently to many different inputs).

In 2012, Setty, McPherson, Blumberg, and Walfish began refining IKO's protocol. They called their first system Pepper, and subsequent refinements were described in a follow-up system Ginger. Further follow-up work incorporated an ingenious encoding of computations due to Genarro, Gentry, Parno, and Raykova (GGPR), and resulted in a system called Zaatar that (mostly) supercedes Pepper and Ginger.

Parno, Gentry, Howell, and Raykova introduced another system called Pinocchio that is also based on the theoretical innovations of GGPR. Also, very recent work by Ben-Sasson, Chiesa, Genkin, Tromer, and Virza (BCGTV) implements a back-end that is closely related to Pinocchio.

The high-level comparison between the Zaatar back-end and the Pinocchio/BCGTV back-end is the following. Zaatar is somewhat more efficient because it uses less cryptographic machinery (roughly, the verifier in Zaatar poses queries in the clear, while Pinocchio/BCGTV require the prover to compute over encrypted queries.) However, Pinocchio/BCGTV provide functionality that Zaatar does not. Specifically, the former supports public verifiability and zero-knowledge. Additionally, it consists of only two messages, one from the verifier to the prover and one from the prover to the verifier. Finally, it allows set-up costs to be amortized over an indefinite number of instances of the same computation -- in contrast, the pre-processing costs of Zaatar can be amortized over many instances of a computation only if all instances are verified at the same time. It is worth mentioning that the Pinocchio/BCGTV back-ends are based on non-standard cryptographic assumptions, while Zaatar is based on standard ones.

3) Argument Systems without Pre-Processing: Ben-Sasson, Chiesa, Genkin, and Tromer (BCGT) have been pursuing argument systems that avoid a pre-processing phase for the verifier. These argument systems are based on short PCPs, and while existing work on this topic is still theoretical in nature (focusing on reducing the concrete costs of existing PCPs), implementation efforts are reportedly underway.


Comparison of Back-Ends:

Comparison of Argument Systems without Pre-Processing to other approaches: The advantage of the short-PCPs approach relative to interactive proofs and argument systems with pre-processing is that the former *never* requires a pre-processing phase for the verifier. The disadvantage is that the overheads to the prover are likely to be substantially higher than they are in argument systems with pre-processing. For example, the short PCPs described by BCGT require sequential repetition of the protocol many times in order to drive down soundness error. Like Pinocchio and Zaatar, they also require performing FFT operations on objects as large as a computation transcript, but unlike Pinocchio and Zaatar, these short PCPs also require working over fields of small characteristic (and hence can only be combined with front-ends that generate circuits over these fields), require the use of non-trivial finite field algorithms, and require circuits that satisfy certain algebraic notions of regularity. Precisely how large these overheads are in practice remains to be seen.

Comparison of Interactive Proofs to Argument Systems with Pre-Processing: The advantages of interactive proofs are three-fold. First, they are secure even against computationally unbounded provers, while argument systems are only secure against polynomial-time provers. Second, interactive proofs can avoid pre-processing entirely for structured computation (e.g. 'regular' circuits) and minimize pre-processing for data-parallel computation, while argument systems with pre-processing (discussed below) inherently require an expensive pre-processing stage for the verifier. Third, interactive proofs achieve unmatched prover efficiency when they are applicable, and in fact can avoid the circuit representation entirely for fundamental primitives like matrix multiplication. Concretely, Zaatar's prover is roughly 3 orders of magnitude slower than evaluating the circuit with no correctness guarantee (see e.g. these slides by Mike Walfish), which is roughly commensurate with the prover overhead in my original GKR implementation with Cormode and Mitzenmacher, but is a couple orders of magnitude slower than what we can now achieve with interactive proofs for "regular" circuits (see above).

The disadvantages of interactive proofs are the following. First, they are only applicable to small-depth circuits (i.e., parallelizable computation). Second, they do not support "non-deterministic" circuits, which can be extremely useful for turning high-level programs into small circuits as described in the section on front-ends below (there is evidence that lack of support for non-determinism in inherent in the use of interactive proofs). This means that certain kinds of computations such as those involving many random accesses to memory or sorting/comparison operations are problematic for interactive proofs. Third, interactive proofs cannot support properties like zero-knowledge and public verifiability that are achieved by Pinocchio/BCGTV. Fourth, interactive proofs require logarithmically many rounds of interaction between prover and verifier, while argument systems typically require just one or two rounds.

(I maintain, however, that the round complexity of interactive proofs isn't a big deal. First, interaction can be generically removed via the Fiat-Shamir heuristic, which is secure in the random oracle model and may be acceptable in practice. Second, any system allowing the verifier to request a specific computation in the first place is likely to involve a remote procedure call and hence incur round-trip delays anyway. Third, browsers already send a separate request for every image and script on a web page, and typically a browser cannot substantially parallelize these requests.)


Summary of and Comparison of Front-Ends:

Ginger, Zaatar, and Pinocchio all contain compilers that turn high-level programs into (non-deterministic) circuits (Pinocchio was the first to handle a subset of C, while the others initially handled a different high-level language but have since moved to C). Vu, Setty, Blumberg, and Walfish (VSBW) built a front-end for an implementation of the GKR protocol (this front-end must generate small-depth circuits that are either deterministic or contain a small number of non-deterministic inputs, since the non-deterministic inputs must be sent explicitly to the verifier. It should also be clarified that their compiler does not produce regular circuits, and hence the GKR protocol requires a pre-processing phase when applied to these circuits, and amortizes these costs over many instances of the computation exactly as in Zaatar/Ginger/Pepper). Given a high-level program, the VSBW front-end also automatically determines whether Zaatar or GKR would be most efficient to verify the computation and runs the better of the two protocols, and their system as a whole is called Allspice.

However, none of the front-ends above efficiently supports programs that utilize random access to memory. Two recent front-ends change this. One is the front-end developed by BCGTV mentioned above, which implemented earlier theoretical work by the first four authors. Given a high-level program, their approach to circuit generation roughly works as follows. They first compile the C program into actual machine code for a simple Random Access Machine (RAM). They then generate a circuit which takes an entire transcript (sorted by time) of the RAM computation as a non-deterministic input, and checks that the transcript is valid. This requires checking the transcript for both *time consistency* (i.e., that the claimed state of the machine at time i correctly follows from the machine's claimed state at time i-1) and *memory consistency* (i.e., that every time a value is read from memory location x, the value that is returned is equal to the last value written to that location).

Their circuit checks time-consistency by representing the transition function of the RAM as a small circuit (they have managed to represent the entire transition function as a circuit with ~1000 gates, at least for short RAM computations of < 1 million steps. While this is still a lot of gates, it is a substantial engineering achievement). It then applies this circuit to each entry i of the transcript and checks that the output is equal to entry i+1 of the transcript. Their circuit checks memory consistency by using routing techniques to re-sort the transcript based on memory location (with ties broken by time), at which point it is straightforward to check that every memory read from a given location returns the last value written to that location.

The second front-end that supports random access to memory is called Pantry, developed by Braun, Feldman, Ren, Setty, Blumberg, and Walfish. They augment Zaatar and Pinocchio's front-end to support random access to memory by using Merkle-hashing techniques -- this offers an alternative approach to the routing-based technique for ensuring memory consistency pursued by BCGTV. (Interestingly, the Merkle-hashing approach was explicitly considered in BCGT's theoretical work, but has not yet been incorporated into the BCGTV system).

Merkle trees allow one to outsource memory maintenance by building a binary tree whose leaves correspond to memory locations. Each leaf stores the value contained in the corresponding memory location, and each internal node stores an evaluation of a collision-resistant hash function applied to its children. As long as the verifier knows the value of the root, every time a memory location (leaf) is accessed, the prover can 'prove' that the returned value is correct by revealing all values along the leaf-to-root 'witness path'. One can show that the only way the prover can lie is by finding a collision in the hash function.

Pantry identifies a collision-resistant hash function whose evaluation function can be (relatively) efficiently represented as a circuit or set of constraints. This allows evaluation of the hash function (and hence checking of 'witness paths', and hence support for random access to memory) to be folded into Zaatar's and Pinocchio's verification machinery.

Pantry's use of collision-resistant hash functions is also useful for some applications (such as MapReduce) that do not necessarily require random access to memory -- for these applications, Pantry avoids the use of a full Merkle-hash tree, reducing costs relative to what is required to support general RAMs.

Comparison of Pantry and BCGTV:

Generality: BCGTV can handle arbitrary C programs. Pantry comes close to achieving this but still cannot handle data-dependent loops. BCGTV also has the advantage that it can be modified to generate circuits that satisfy the algebraic regularity conditions required by back-ends based on short PCPs (though implementations of these back-ends remain in progress). One important aspect of Pantry is that it enables the outsourcing of computations that are run over "remote inputs", i.e., inputs that the verifier never sees in full, but is merely provided a digest (typically a hash, or a Merkle-hash) of the input. This is important in several applications, such as verifying MapReduce computations. In principle, this feature could be incorporated into the BCGTV front-end as well, but doing so efficiently would likely require redoing many of the optimizations in Pantry.

Efficiency: BCGTV's primary overhead stems from the fact the ~1000-gate transition function sub-circuit must be repeated for *every* step of the RAM. Thus, their circuits are at least 1000 times larger than the runtime of the RAM. Pantry on the other hand requires expensive evaluations of a collision-resistant hash function (when supporting general random access to memory, logarithmically many evaluations of the hash function are required per memory access). However, hash function evaluations in Pantry are only required when memory is accessed, not for every step of the RAM. Thus, which of the two approaches (BCGTV vs. Pantry) generates smaller circuits may depend on how memory-intensive the computation is.



Ideally, this post makes clear that each of the approaches to verifiable computation being pursued thus far achieves a different tradeoff between efficiency, expressiveness, and support for features such as public verifiability and zero knowledge properties. This diversity can only be a good thing as users will be able to choose the approach that best suits their needs.

In more detail, despite substantial progress, the fully general-purpose front-end still generates circuits that are at least three orders of magnitude larger than the runtime of original RAM computation. Moreover, existing back-ends based on argument systems with pre-processing impose an additional three orders of magnitude overhead for the prover. Arguments based on short PCPs will avoid pre-processing for the verifier, but impose additional overheads on the prover. Interactive proofs can avoid these large overheads for sufficiently structured computation, but require a front-end that generates small-depth deterministic circuits, and does not provide cryptographic properties like public verifiability and support for zero-knowledge achieved by some of the argument systems.

Two attractive directions for future work present themselves. One is to develop protocols and build systems that can verifiably execute general computations, but that automatically leverage structure within computations for efficiency gains. Alternatively, it may be better to develop a restricted programming framework analogous to MapReduce that still allows for the expression of a powerful class of computations and automatically "extracts" the structure necessary to verify the computation efficiently. If this approach is pursued, it will be critical to determine the right balance between the level of generality to support and the amount of structure to force upon computations for efficiency gains.

Acknowledgements: I am indebted to Mike Walfish and Andrew J. Blumberg for extensive conversations over several months that substantially influenced my understanding of the area and this post in particular, and to Mike Walfish and Michael Mitzenmacher for valuable feedback on an early draft of this post. Any errors are entirely my own.

Further resources: Bryan Parno hosted a session on verifiable computation at the Microsoft Research Faculty Summit, where he, Michael Mitzenmacher, and Mike Walfish spoke about several of the approaches discussed in this post. Video of the session is available, and Mike's slides contain a nice comparison of the various approaches.


Ilya Mironov said...

Thank you for putting this post together - it's an excellent summary that I'll share with others.

Personally, I'm more excited about the zero-knowledge and SNARK potential of this line of work. There's always an alternative to farming out a computation - i.e., doing it locally. For outsourcing to make economic sense, numbers have to work out. On the other hand, there are sometimes no good alternatives to NIZKs or SNARKs. Existence of practical NIZKs constructions opens a whole new range of possibilities for designers of cryptographic protocols.

Justin Thaler said...

Thanks Ilya, that's an excellent point.