The Erdős Discrepancy Problem

by Derrick Stolee

Computational combinatorics made a big splash last spring when Konev and Lisista announced their SAT-solver-based proof of the Erdős Discrepancy Problem for constant C=2. Recently, Le Bras, Gomes, and Selman announced another SAT-based solution for C=3, but when restricting the sequences to have a special property.

I was working on a draft blog post to talk about these results, but then Terry Tao went and solved the full conjecture. I tweeted about it and got many responses, my favorite being simply the following image:

kel

To help those who have difficulty with Tao’s paper (and I am not qualified to read any part of his solution), I made a short video describing the problem, or read below.

The Erdős Discrepancy Problem

Discrepancy is a measure of inbalance. In a 2-colored hypergraph H, consider an edge and count the number of vertices in that edge of each color. The discrepancy of the edge is the absolute difference of those counts. The discrepancy of the 2-coloring is the maximum discrepancy over all edges. Finally, the discrepancy of the hypergraph is the minimum discrepancy of a 2-coloring.

If instead of colors (such as “red” and “blue”) we colored the vertices with +1 and -1, then the discrepancy of an edge is the absolute value of the sum on the vertices. This setting is used more frequently, especially in this problem.

Consider an infinite sequence x_1,x_2,x_3,\dots where each x_i receives a value in \{+1, -1\} (we will call such sequences \pm1-sequences). How “unbalanced” can this sequence be? To measure this, we must consider which sets form the edges of our hypergraph.

If we consider our edges to be intervals [a,b], then the alternating sequence +1, -1, +1, -1, \dots has discrepancy 1, a very uninteresting value.

If we consider our edge to be all subsets of the natural numbers, then we can simply select elements labeled +1 to trivially find an arbitrarily high discrepancy. In fact, even if we restrict our edges to somewhat “complicated” sets such as arithmetic progressions, by van der Waerden’s theorem we will have arbitrarily long arithmetic progressions with the same label, and therefore arbitrarily high discrepancy.

Erdős loved numbers, and especially constructions within the natural numbers. Arithmetic progressions were nice, but not interesting in this situation. Instead, he decided to look at what I will call “rooted” arithmetic progressions. Given integers m and d, the rooted arithmetic progression A_{m,d} is the progression d, 2d, 3d, \dots, md. These rooted arithmetic progressions are the sets we will consider to measure the discrepancy of an arbitrary \pm 1-sequence.

Erdős Discrepancy Conjecture. Let x_1,x_2,x_3,\dots be an arbitrary \pm 1-sequence. For every constant C, there exist integers m,d such that \left|\sum_{i=1}^m x_{i\cdot d}\right| > C.

This conjecture is trivial for C = 0. For C = 1, the case is more difficult, but Mathias proves it quickly. (While I was writing this post I thought this case was near-trivial, but I made a very silly mistake! This problem can be strange.)

The conjecture suddenly becomes very difficult when considering C = 2. In order for you to quickly understand the difficulty of this problem, I will use the non-proof method of “appeal to authority” (similar to “proof by intimidation”): This case was specifically the target of PolyMath 5 and they were not able to prove the conjecture (but did extend a lower bound in the finite case).

What is this finite case? Specifically, we can reduce the conjecture to finite lists of values in \{+1,-1\} and ask for the discrepancy of the finite list. However, for short lists we will have finite discrepancy, and in fact can have rather small discrepancy.

Finite Discrepancy Conjecture. For every nonnegative integer C there exists a minimum integer \tau(C) such that for all n \geq \tau(C), every \pm1-list of length n has discrepancy strictly larger than C. Call \tau(C) the Tao number (see what I did there?). This follows in the tradition of Ramsey numbers, van der Waerden numbers, and what some have called “Szemerédi Numbers” (let \mathsf{sz}(n;k) be the size of the largest k-AP-free subset of \{1,\dots,n\}).

Observe that determining \tau(C) suffices to demonstrate that there exists a \pm1-list of length \tau(C)-1 of discrepancy at most C, and that every \pm1-list of length \tau(C) has discrepancy at least C. By a compactness argument, these conjectures are equivalent. The number \tau(C) is defined this way (using the minimum value where no counterexample exists, versus the maximum value where a counterexample exists) to mimic the values of Ramsey numbers and van der Waerden numbers.

Here is a list of known values and bounds on \tau(C) (if such an integer exists).

C 1 2 3
\tau(C) 11 1,161 \geq 127,646

Note that the value \tau(2) = 1,161 is the main theorem of Konev and Lisista (and is the first proof that \tau(2) is finite) and the bound \tau(3) \geq 127,646 is a major part of the paper by Le Bras, Gomes, and Selman. The reason they found such a specific number, but only a bound, is that they computed a different number that comes from a restricted family of \pm1-lists.

A \pm 1-list x_1,x_2,x_3,\dots is multiplicative if x_{i\cdot j} = x_i \cdot x_j for all i,j \geq 1. This is a very restrictive family. This restriction is not trivial, nor is it out of the blue. This is one of the restrictions considered by the PolyMath group (including a host of other restrictions). This leads to a special case of the conjecture.

Multiplicative Discrepancy Conjecture. For every nonnegative integer C, there exists a minimum integer N_C' such that for all n \geq \tau'(C), every multiplicative \pm1-list of length n has discrepancy strictly larger than C.

While verifying this conjecture for C \leq 3, the following values were determined.

C 1 2 3
\tau(C) 12 1,161 \geq 127,646
\tau'(C) 10 247 127,646

Aside: While I have not looked too far, what other restrictions could these methods find finite bounds on the corresponding “\tau^*(C)” value?

Tao’s Results

As I am very unfamiliar with Tao’s methods, I can only speculate right now on what numerical values his proof implies. From first look, it seems that his methods give an upper bound of \tau(C) \leq 2^{O(C^2)}. This is remarkably close to the obvious lower bound of \tau(C) \geq 2^{O(\sqrt{C})}, given by taking a random list and using Chernoff bounds to estimate the probability of high discrepancy.

See Also

\pm 1 Sequence Wikipedia Entry

Gowers’ blog post on the recent results.

Lipton and Regan discuss the results.

The Extremal Example for Discrepancy 2

I leave you with the example of a \pm 1-sequence of length 1160 with discrepancy 2, as displayed at the end of the paper:

- + + - + - - + + - + + - + - - + - - + + - + - - + - - +
+ - + - - + + - + + - + - + + - - + + - + - - - + - + + -
+ - - + - - + + + + - - + - - + + - + - - + + - + + - - -
- + + - + + - + - + + - - + + - + - + - - - + + - + - - +
+ - + + - + - - + + - + - - + - - - + - + + - + - - + + -
+ + - + - - + - - + + - + + - + - - + + - + - - + + + - +
- + - - - - + + + - + - - + - - + + + - - - + + - + + - +
- - + - - + + + - - + - + - + - - + - + + + - + + - + - -
+ - - + + - + - - + + - + + - + - - + - - + + - - + + + -
- - + + + - + - - - + + - + - - + + - - + - + - - + - + +
+ - + - - + + - + + - + - - + + - + - - + - - + + - + - -
+ + - - + - + + - + - + - - + - + - + + - + - - + + - + -
- + - - + + - + - + - + + - + - + - + + - - - + - + - - +
+ + + - - + - - - + + - + - + + - + - - + + - + - - + - -
+ + - + - - + + + + - - + - - - + - + + + + - - + - - + +
- + + - + - - + + - + - - + - - + + - + - - + + - + + - +
- - + + - + - - + - - + + - + + - + - - - - + + + - + - -
+ + - - + + + - - - + - + + - + - - + - + + - - - + - + +
- + + - + - - + - - + + - - + + + + - + - - + - - + - - +
+ + + - - + - - + + + - - - + + - + + - + - - + + - - + -
+ - - + - - + + - + + - + - - + - - + - + + + - + + - + -
- + - - + + - - + - + + - + + - + - - + - - + - - + + - +
+ - + - - + + + - - - + + - + - - + + - + + - - - + + + -
- - + + - + + - - - - + + + - - + - + + - + - - + - - + +
- + - - + + - + + - + - + + - - + + - - + + - - - - + + +
- + + - - + + - - - - + + - + + + - - + + - - - + + + - -
- - + - + - + + - + + - + + - + - + - - - - + + + - - + +
- + - - + + - + + - + - - + - - + - - + + - + - - + + - +
+ - + - - + + - - + - + - - + - + - + - + + + + - - - + -
+ - + + - - + - - + - + - + - + + - + - + + + - - + - + -
- + - - + - + + + - - + - + + + - - - + + - + - - + - - +
+ - + + - - + + - - - + + - + - + + - - + + - + - - - + -
+ + - + - - + - + + - - + + - + - - + + - + - - + - + + +
- + - - + + - - + - + - + + + - - + - + - - + + - + + - +
- - + - - + - + + - - - + - + + - + - + + - - + + - + - -
+ + + - + - - - - + + - - + - + + - + - + + - - + + - + -
- + + - + - + + - - + + - + - - - + - + + - + - - + + + -
- - - + - + - + + - - + + - + - - + + - + + - + + - + - -
+ - - + - - + + + + - - - + + - - - + - + - + + - + - + +
+ - - + - + + - - + - + - - + - + - + + - - - + + + - + +