In this post, we'll look at the natural problem of how to communicate an estimate of a real value in [0,1], using just 1 bit. The post is based on this paper (by Ran Ben-Basat of UCL and Shay Vargaftik of VMware Research and myself -- they helped also with the post) that appeared in ICALP last year.

This question is motivated by various aggregation problems; multiple sending devices may measure a value, and wish to send the value to an aggregator who will compute something from the received values, such as the average. In our problem, the senders have a real value x in [0,1] to send, but are constrained to send just a single bit. Variations of this problem have come up in recent work on distributed/federated learning, where clients compute a gradient vector and send it to a centralized parameter server to update a learning model; we may want to compress the vector to a small number of bits, or even 1 bit, per coordinate. (We'll have more to say on the federated learning problem in a future post.) Of course, it's also just an interesting randomized algorithm problem that seems interesting in its own right.

A natural way to look at the problem is as a variation on rounding. Given a value x in [0,1], one natural approach if limited to one bit is to deterministically round it to X. But what should the receiver do when they receive the (rounded) bit value X? It depends on what one's optimization goal is, but to minimize the maximum possible error, the receiver should have their estimate x' take on the value 3/4 when X is 1, 1/4 otherwise. Note though that deterministic rounding this way is biased -- the expectation E[x'] does not equal x. Randomized rounding, where the sender sends 1 with probability x and 0 otherwise, and the receiver uses the received bit X as the estimate x', has the property that E[x'] = x. Unbiased estimators are arguably more natural for many estimation problems. Here the measure of performance would be the maximum variance for the estimate over all inputs x, so for randomized rounding the cost is 1/4 (when x = 1/2).

Can one do better than these schemes? It turns out that you can, if you have available shared randomness. An approach that has been known in the engineering world (where it has been used in signal processing) is subtractive dithering:

We assume that the sender and receiver have access to shared randomness ℎ∼𝑈[−1/2,1/2]. Given a value x, the sender sends 1 if x+h≥1/2, 0 otherwise. The receiver estimates x' = X - h. We leave as an exercise that this is unbiased, which can be shown by deriving the stronger fact that x' is distributed as 𝑈[𝑥−1/2,𝑥+1/2] , and thus Var[𝑥']=1/12.

Subtractive dithering ignores that generating a shared real number may be more costly or problematic than generating a finite number of shared bits. So one of the results of our paper is developing a "finite shared random bits" unbiased estimator, that corresponds to randomized rounding with no shared bits and converges to subtractive dithering as the number of shared random bits goes to infinity. (The approach does allow for generating a private random real value.)

Also in our paper, we study biased schemes, aiming to minimize the worst-case expected mean-squared error (where the expectation is over randomness used in the algorithm). For example, it's very odd in the setting of subtractive dithering that one can obtain estimates smaller than 0 or greater than 1, when the input is restricted to [0,1], but that's a price we pay for having an unbiased estimator. For a biased estimator, you might naturally truncate the result from subtractive dithering; by truncating to [z,1-z] for an appropriate z > 0, you can indeed slightly improve over the worst-case mean-squared error of 1/16 for deterministic rounding.

We then studied various algorithmic improvements to obtain better biased schemes. We were able to progress by looking at limited shared randomness, namely finding the best algorithm with s shared bits. For example, consider the case of just 1 shared random bit h in {0,1}. The receiver receives 1 bit X from the sender, and thus can have four possible estimates x' depending on X and h. If the 4 possible estimate values are v0, v1, v2, v3 (all between 0 and 1), then it is possible to show the largest possible expected squared error occurs at one of the five inputs 0, 1, (v0+v1)/2, (v1+v2)/2, (v2+v3)/2. We can then write a quadratic program to find the values that minimizes the worst-case expected squared error. The end result is the following rounding algorithm: given 1 shared random bit h in {0,1} and the value x, let X = 1 if x ≥ 0.4 + 0.2h, and 0 otherwise; then let the estimate x' = 0.1 + 0.2h + 0.6X. This has a worst-case expected mean-squared error of 1/20, beating deterministic rounding and truncated subtractive dithering. Using some additional arguments we can handle more shared random bits; at 8 bits we improve the worst-case expected squared error to about 0.04599, which is quite close to our lower bound of about 0.0459, and this is better than anything we could come up with analytically. The optimal solution is still not known (an open question for future work!).

Overall there are many variants of the rounding problem, and few tight bounds currently. So even for simple-seeming problems like rounding, there are still interesting things to do.