My Biased Coin: 2021

Monday, November 08, 2021

Postdoc call for FODSI

As a member of FODSI (Foundations of Data Science Institute -- an NSF funded institute with the aim of advancing theoretical foundations for data science), I'm self-interestedly posting the call for postdocs for this year. Two of the areas are a) Sketching, Sampling, and Sublinear-Time Algorithms and b) Machine Learning for Algorithms (which includes what I call "Algorithms with Predictions.") I'd be happy to see postdoc applications in those areas from people who want to spend some time at Harvard, for example.... but of course there are lots of other exciting things going on with FODSI too and you should take a look.

The call is at https://academicjobsonline.org/ajo/jobs/20132

Call text below:

The Foundations of Data Science Institute (FODSI), funded by the National Science Foundation TRIPODS program, is announcing a competitive postdoctoral fellowship. FODSI is a collaboration between UC Berkeley and MIT, partnering with Boston University, Northeastern University, Harvard University, Howard University and Bryn Mawr College. It provides a structured environment for exploring interdisciplinary research in foundations of Data Science spanning Mathematics, Statistics, Theoretical Computer Science and other fields.

We are looking for multiple postdoctoral team members who will collaborate with FODSI researchers at one or more of the participating institutions. These positions emphasize strong mentorship, flexibility, and breadth of collaboration opportunities with other team members -- senior and junior faculty, postdocs, and graduate students at various nodes around the country. Furthermore, postdoctoral fellows will be able to participate in workshops and other activities organized by FODSI.

The fellowship is a one-year full-time appointment, with the possibility of renewal for a second year (based upon mutual agreement) either at the same or at another FODSI institution. The start date is flexible, although most appointments are expected to start in summer 2022. Candidates are encouraged to apply to work with more than one faculty mentor at one or more participating institutions (in-person mentoring is preferred, but remote options will be also considered). The applicants should have an excellent theoretical background and a doctorate in a related field, including Mathematics, Statistics, Computer Science, Electrical Engineering or Economics. We particularly encourage applications from women and minority candidates.

The review process will start on November 15, 2021 and will continue until positions are filled.

Thursday, November 04, 2021

HotNets Presentation : Zero-CPU Collection with Direct Telemetry Access

HotNets has asked that we let people know that the 2021 presentations are available here. I'm using that an excuse to highlight our paper on Zero-CPU Collection with Direct Telemetry Access (arxiv version here), but really I want to highlight the talk by graduate student Jonatan Langlet (Queen Mary University of London) who, as is the nature of graduate students, did all of the real work, and who really did a great job on the talk (direct link). If you guessed from my involvement this involves hashing in some way, your maximum likelihood estimate turns out to be correct.

I think our work fits the HotNets call, which asks for new approaches and preliminary work. Specifically, the call for the HotNets workshop says this:

We invite researchers and practitioners to submit short position papers. We encourage papers that identify fundamental open questions, advocate a new approach, offer a constructive critique of the state of networking research, re-frame or debunk existing work, report unexpected early results from a deployment, report on promising but unproven ideas, or propose new evaluation methods. Novel ideas need not be supported by full evaluations; well-reasoned arguments or preliminary evaluations can support the possibility of the paper’s claims.
We seek early-stage work, where the authors can benefit from community feedback. An ideal submission has the potential to open a line of inquiry for the community that results in multiple conference papers in related venues (SIGCOMM, NSDI, CoNEXT, SOSP, OSDI, MobiCom, MobiSys, etc.), rather than a single follow-on conference paper. The program committee will explicitly favor early work and papers likely to stimulate reflection and discussion over “conference papers in miniature”.

There are similar other "Hot" workshops in other areas, and it was about 14 years ago that I asked whether CS theory should have a HotTheory workshop. There's been a proliferation of new conferences and workshops in theory since then, but none of them really seem to have this flavor. So maybe it's worth asking again whether a HotTheory workshop would make sense? Or do existing theory events meet the theory community needs?

Monday, October 25, 2021

How to send a real number using a single bit (and some shared randomness)

In this post, we'll look at the natural problem of how to communicate an estimate of a real value in [0,1], using just 1 bit. The post is based on this paper (by Ran Ben-Basat of UCL and Shay Vargaftik of VMware Research and myself -- they helped also with the post) that appeared in ICALP last year.

This question is motivated by various aggregation problems; multiple sending devices may measure a value, and wish to send the value to an aggregator who will compute something from the received values, such as the average. In our problem, the senders have a real value x in [0,1] to send, but are constrained to send just a single bit. Variations of this problem have come up in recent work on distributed/federated learning, where clients compute a gradient vector and send it to a centralized parameter server to update a learning model; we may want to compress the vector to a small number of bits, or even 1 bit, per coordinate. (We'll have more to say on the federated learning problem in a future post.) Of course, it's also just an interesting randomized algorithm problem that seems interesting in its own right.

A natural way to look at the problem is as a variation on rounding. Given a value x in [0,1], one natural approach if limited to one bit is to deterministically round it to X. But what should the receiver do when they receive the (rounded) bit value X? It depends on what one's optimization goal is, but to minimize the maximum possible error, the receiver should have their estimate x' take on the value 3/4 when X is 1, 1/4 otherwise. Note though that deterministic rounding this way is biased -- the expectation E[x'] does not equal x. Randomized rounding, where the sender sends 1 with probability x and 0 otherwise, and the receiver uses the received bit X as the estimate x', has the property that E[x'] = x. Unbiased estimators are arguably more natural for many estimation problems. Here the measure of performance would be the maximum variance for the estimate over all inputs x, so for randomized rounding the cost is 1/4 (when x = 1/2).

Can one do better than these schemes? It turns out that you can, if you have available shared randomness. An approach that has been known in the engineering world (where it has been used in signal processing) is subtractive dithering:

We assume that the sender and receiver have access to shared randomness ℎ∼𝑈[−1/2,1/2]. Given a value x, the sender sends 1 if x+h≥1/2, 0 otherwise. The receiver estimates x' = X - h. We leave as an exercise that this is unbiased, which can be shown by deriving the stronger fact that x' is distributed as 𝑈[𝑥−1/2,𝑥+1/2] , and thus Var[𝑥']=1/12.

Subtractive dithering ignores that generating a shared real number may be more costly or problematic than generating a finite number of shared bits. So one of the results of our paper is developing a "finite shared random bits" unbiased estimator, that corresponds to randomized rounding with no shared bits and converges to subtractive dithering as the number of shared random bits goes to infinity. (The approach does allow for generating a private random real value.)

Also in our paper, we study biased schemes, aiming to minimize the worst-case expected mean-squared error (where the expectation is over randomness used in the algorithm). For example, it's very odd in the setting of subtractive dithering that one can obtain estimates smaller than 0 or greater than 1, when the input is restricted to [0,1], but that's a price we pay for having an unbiased estimator. For a biased estimator, you might naturally truncate the result from subtractive dithering; by truncating to [z,1-z] for an appropriate z > 0, you can indeed slightly improve over the worst-case mean-squared error of 1/16 for deterministic rounding.

We then studied various algorithmic improvements to obtain better biased schemes. We were able to progress by looking at limited shared randomness, namely finding the best algorithm with s shared bits. For example, consider the case of just 1 shared random bit h in {0,1}. The receiver receives 1 bit X from the sender, and thus can have four possible estimates x' depending on X and h. If the 4 possible estimate values are v0, v1, v2, v3 (all between 0 and 1), then it is possible to show the largest possible expected squared error occurs at one of the five inputs 0, 1, (v0+v1)/2, (v1+v2)/2, (v2+v3)/2. We can then write a quadratic program to find the values that minimizes the worst-case expected squared error. The end result is the following rounding algorithm: given 1 shared random bit h in {0,1} and the value x, let X = 1 if x ≥ 0.4 + 0.2h, and 0 otherwise; then let the estimate x' = 0.1 + 0.2h + 0.6X. This has a worst-case expected mean-squared error of 1/20, beating deterministic rounding and truncated subtractive dithering. Using some additional arguments we can handle more shared random bits; at 8 bits we improve the worst-case expected squared error to about 0.04599, which is quite close to our lower bound of about 0.0459, and this is better than anything we could come up with analytically. The optimal solution is still not known (an open question for future work!).

Overall there are many variants of the rounding problem, and few tight bounds currently. So even for simple-seeming problems like rounding, there are still interesting things to do.

Wednesday, October 20, 2021

Networking+Theory Postdoc at Harvard

The last couple of years one aspect of research I've greatly enjoyed is getting back into networking, which is really due to my excellent (and patient) collaborators Ran Ben Basat (now at UCL, was a postdoc at Harvard) and Minlan Yu (at Harvard). Minlan and I are working to establish a larger-scale Networking+Theory (hopefully broadening to an even larger Systems+Theory) group at Harvard, working on algorithmic problems in the context of real (or at least very real-ish) systems. We have funding, and are looking for a postdoc, the basic description is below. Ideally we're looking for people comfortable with the theory side and the systems side. The website link for applying is https://academicpositions.harvard.edu/postings/10748 . We have preliminary website for the group at https://projects.iq.harvard.edu/theosys (it's just a start, Minlan and I are both on sabbatical, but you can see some of our publications). We look forward to finding another member of the team!

Networking + Theory Postdoctoral Position

The John A. Paulson School of Engineering and Applied Sciences at Harvard University (SEAS) seeks applicants for a postdoctoral position in networking and theory. The postdoc is intended for one year but there will be funding to potentially extend it to a second year. The postdoc will receive a generous salary as well as an allocation for research and travel expenses.

We are looking for junior scientists who are especially interested in working at the intersection of networking and algorithmic theory, in areas such as programmable network architectures, data center network management, cloud computing, and algorithms for the Internet. Example topics of interest include but are not limited to the design and analysis of sketches and filters for use in real systems, network security, network compression methods, and optimizing network performance for machine learning applications. The ideal candidate will be interested in both building real systems and either developing algorithms and data structures or using existing, underutilized results from the theoretical literature in system design.

The postdoc is intended to work with closely with Minlan Yu and Michael Mitzenmacher, and others involved in the group focused on Systems + Theory work that they are developing, as well as possibly other Harvard faculty.

Candidates should have backgrounds in networking and/or theoretical computer science. Candidates should demonstrate experience in working at the intersection of these areas, or otherwise demonstrate how they will be able to contribute at the intersection. The candidate will be expected to publish scholarly papers, attend internal, domestic, and international conferences and meetings, and take on a mentorship role for undergraduate and graduate students.

Harvard SEAS is dedicated to building a diverse community that is welcoming for everyone, regardless of disability, gender identity and expression, physical appearance, race, religion, or sexual orientation. We strongly encourage applications from members of underrepresented groups.

Saturday, September 04, 2021

Harvard Shopping Period, Here We Go Again

I was looking at today's Harvard Crimson, and noted that Harvard's shopping period looks ready to be vanished again. Shopping period is that wonderful Harvard tradition where students don't preregister for classes, but instead they choose classes after the first week, after having a chance to go to a lecture or two and see how they like it. I encourage students -- and faculty -- to push back against efforts that restrict student flexibility in choosing their classes. While administrators hate it, I still think it's better for students to avoid strong forms of preregistration.

In fact, I realized I've been fighting this fight for quite a while -- here's something I wrote about when the administration under Larry Summers tried to get rid of it, from this blog back in 2008, and here's a Crimson article from 2002, where I spoke out against the plan for moving to preregistration that the blog post refers to.

More recently, in 2018-2019, I found myself on the "Course Registration Committee", a committee that initially seemed similarly designed to find a way to move Harvard from shopping period to preregistration. But a few of us on the committee made convincing arguments that shopping period had many benefits, and the disappearance of shopping period seemed at least somewhat further delayed, while a better solution was found.

Then the pandemic. Shopping period seemed impossible in the chaos, and things were more ad hoc (although there was, last year, a push for "class previews" of some sort before the beginning of classes). This seems to give an opening to remove shopping period again.

I'm not saying there aren't ways to change the system to make it better. I'm not blind to the issues of shopping period -- not having a determined class size before classes begin is problematic for some classes. I believe the committee people who have continued to look at the issue are trying to make things better going forward. But the push always seems to be to make a system which is easier for administrators, and somehow correspondingly that is worse for the students. Students should have the flexibility to see classes and teachers before committing, which to me means either a shopping period, or a structure that allows them to easily change classes for the first couple of weeks with negligible overhead. I suppose one could design an add/drop system with the flexibility I'd have in mind, but it never seems to work that way in practice -- students end up needing signatures and approvals of various flavors, because (I think) it's in the best interest of administrators to make it hard for students to change classes once classes begin. (Having 60 people sign up for a class but then having 30 people leave in the first week is possibly worse than the shopping period approach of not having sign-ups before the class at all, but it's a lot easier to disincentivize students from switching out (or in) with a preregistration system, so that problem disappears, at the cost of student flexibility.)

As an example of a "non-shopping" issue I've seen this semester, first-year students at Harvard first semester are "limited" to four classes. (There may be a way to get an exception to this, but I understand it would be rare exception.) So this semester, with no shopping period, first years have to choose their 4 classes -- without seeing them -- and then manage a confusing add/drop process if they don't like them. The older students generally know how to play the game -- they sign up for 5 or even 6 classes (if they can) and drop the ones they don't like, because dropping is generally easier than adding. But first years don't have that flexibility because the 4-course rule is enforced at signup. (I advise some first year students, and this problem came up.)

I'm sure many non-Harvard people reading this think the shopping period system sounds crazy, and maybe a few think Harvard students are bizarrely spoiled. Perhaps they're right. But I found as a student it was wonderful, and shaped my future in powerful ways by encouraging me to explore. You walk into a class you thought you should take, and find the professor puts you to sleep; or you get dragged to a class by a friend, and find an inspiring subject you had no idea you would like. I believe my college experience would have been lessened significantly without shopping period.

As a faculty member, shopping period is clearly a pain, but I think a manageable one. Back in 2002-2003, the faculty pushed back against preregistration (see this old Crimson article), but it seems opinions have shifted over time; many faculty seemed to have moved to thinking it's not worth it, which is probably in their own self-interest. Having been on both sides, I'm still strongly in favor of shopping period. I suppose if I ever get into administration I may see things differently.

I hope there are (younger) defenders out there, in the students and faculty, to push to make sure any changes still favor student choice over administrative convenience, and lead to the best outcomes for students.

Monday, August 09, 2021

Queues with Small Advice

I have had papers rejected, with comments of the form that the results seem too easy, and are at the level of a homework assignment. Generally, I think these reviewers miss the point. The fact that the results seem easy may be because the point isn't the derivation but the conception and framing of the problem. I actually think that generally it's an interesting subclass of good papers that can be and are turned into homework assignments.

A new-ish paper of mine, Queues with Small Advice, was recently accepted to the very new SIAM Conference on Applied and Computational Discrete Algorithms (ACDA21), which took place July 19-21. This conference focuses on algorithms with a close tie to applications. Some people unfamiliar with theory conferences might think that algorithms work would naturally be tied to applications, but I've generally found that algorithmic work tied to applications is more negatively reviewed in theory conferences. Indeed, that type of work is much more likely to receive comments of the form that the results seem too easy, and are at the level of a homework assignment. So perhaps this new conference will fill an important role and hole in the current slate of theory conferences.

In any case, I actually do think this paper is in some ways easy (in that the analysis is readily approachable with standard tools), and parts of it would, I believe, make a great homework assignment. The goal was to show the potential power of using even very simple advice, such as from machine-learning algorithms, in queueing systems. This seems to me to be a very understudied topic, and fits into the recently growing theme of Algorithms with Predictions. (The paper was rejected previously from a conference, where the most negative review said "Very accessible and well written paper, which certainly provides motivation to consider problems of this type." but also said "The mathematical analysis in this paper is fairly standard, and in that sense not novel... the paper is interesting, but not advancing sufficiently the state of the art.")

The paper focuses on the case of 1 bit of advice -- essentially, is the job "short" or "long". I think this type is advice is a good approach to look at for queueing -- it corresponds naturally to putting a job at the front of the queue, or the back. And it may be easier for machine-learning algorithms to generate accurately. Simple is often good in practice.

Rather than describe the paper further, I'll go ahead and turn it directly into a collection of homework problems. Feel free to use them or variations you come up with; hopefully, the students won't find the paper for answers. I personally would be thrilled if one outcome of this paper was that prediction-based problems of this form made their way into problem sets. (Although, serious question: who still teaches queueing theory any more?)

Section 1: One-Bit Advice (Single Queue)

a) Consider the standard M/M/1 queue, with Poisson arrivals at rate λ, and exponentially distributed service times of mean 1; the expected time a job spends in the queue in equilibrium is 1/(1-λ). Now suppose each job comes with one bit advice; if the job has service time greater than T, the bit is 1, and if it is smaller than T, the bit is 0. A "big" job goes to the end of the queue, a "small" job goes to the front. (Assume the queue is non-preemptive.) Find the expected time for a job in this queue in equilibrium, as a function of T and λ.

b) What is the optimal value for T (as a function of λ)?

c) Repeat parts a and b, but this time with a preemptive queue. Does preemption help or hurt performance?

Harder variation: The above questions, but with an M/G/1 queue (that is, for a general, given service distribution); derive a formula for the expected time in the system, where the formula may involve terms based on the service distribution.

Easier variation: Write a simulation, experimentally determine the best threshold, and the improvements from one bit of advice. Different service time distributions can be tried.

Section 2: One-Bit Advice with Predictions (Single Queue)

Where would possibly get a bit of advice in real life? Perhaps from a machine learning predictor. But in that case, the advice might turn out to be wrong. What if our bit of advice is just right most of the time?

a) Consider the (non-preemptive) M/M/1 queue variation from Section 1 part a above, but now the advice is correct with some probability p. Find the expected time for a job in this queue in equilibrium, as a function of p, T, and λ.

b) Repeat part a with a preemptive queue.

Harder variations: The above questions, but have the probability the advice is correct depend on the size of the job. A particularly fun example is when the "predicted service time" for a job with true time x is exponentially distributed with mean x, and the prediction bit is 1 if the predicted time is larger than T, and 0 otherwise. Also, one can again consider general service times.

Easier variation: Again, write a simulation and derive experimental results/insights.

Section 3: One-Bit Advice with Prediction (Power of 2 Choices) [harder, grad student level; needs to know fluid limit models; I'd stick with sections 1 and 2!]

a) Derive fluid limit equations for a collection of N queues, where there are two types of jobs: "large" jobs arrive as a Poisson stream of rate λ₁N and have exponentially distributed service times with mean μ₁ and "small" jobs arrive as a Poisson stream of rate λ₂N and have exponentially distributed service times of mean μ₂. Each job comes with a bit of advice determining whether it is large or small, but large jobs are mislabelled with probability p₁ and small jobs are mislabelled with probability p₂. An incoming job selects a queue using "the power of two choices" -- it is up to you to describe how a job determines what is the better of the two choices (there are multiple possibilities) and how jobs are processed within a queue (non-preemptive is suggested).

[Hint: the queue state can be represented by the number of jobs that are labelled short that are waiting, the number of jobs that are labelled long that are waiting, and the type of the job currently being serviced.]

b) Compare fluid limit results to simulations for 1000 queues to see if your equations seem accurate.

Tuesday, June 08, 2021

Machine Learning for Algorithms Workshop (July 13-14)

We're having an online workshop on "Machine Learning for Algorithms" on July 13-14, with a great group of speakers. Announcement below, link at https://fodsi.us/ml4a.html, free registration (but please register in advance)!

In recent years there has been increasing interest in using machine learning to improve the performance of classical algorithms in computer science, by fine-tuning their behavior to adapt to the properties of the input distribution. This "data-driven" or "learning-based" approach to algorithm design has the potential to significantly improve the efficiency of some of the most widely used algorithms. For example, they have been used to design better data structures, online algorithms, streaming and sketching algorithms, market mechanisms and algorithms for combinatorial optimization, similarity search and inverse problems. This virtual workshop will feature talks from experts at the forefront of this exciting area.

The workshop is organized by Foundations of Data Science Institute (FODSI), a project supported by the NSF TRIPODS program (see fodsi.us). To attend, please register at

https://fodsi.us/ml4a.html

My Biased Coin