Bourgain's Embedding Theorem

Speaker: Siddharth Mulherkar
Date of Talk: April 22, 2026

1. Setting and Statement

Suppose we have a metric space \((X, d)\) of \(n\) points (think something like a graph with the graph distance), and we wish to find an embedding \(X\hookrightarrow \mathcal{H}\) into a Hilbert space. Ideally this is \(\mathbb{R}^d\) with the standard Euclidean structure. An isometric embedding is in general impossible (the \(4\)-cycle is already a counterexample: each of the two vertices adjacent to a diametrically opposite pair would have to map to the midpoint of the segment joining their images, yet those two vertices sit at distance \(2\) from each other), so we'll allow for some error.

Definition 1.

\((X, d)\) embeds into \(\mathcal{H}\) with distortion \(\alpha \) if there is a function \(f : X\to \mathcal{H}\) such that

\[\frac{1}{\alpha } d(x, y) \leq \left\lVert f(x) - f(y) \right\rVert \leq d(x, y)\]

for every \(x, y\in X\).

In particular, \(f\) needs to be \(1\)-Lipschitz.

Theorem 2. (Bourgain, 1986)

For any metric space \((X, d)\) on \(n\) points, there exist some \(f\) and \(\mathcal{H}\) so that \(X\) embeds into \(\mathcal{H}\) with distortion \(O(\log n)\).

2. Bourgain’s Insight on Scale

One can recognise that the embedding realising Bourgain's theorem must be a Lipschitz function whose Lipschitz bound is “close to sharp” on every pair of points. In abstract metric spaces, the “canonical” Lipschitz function is simply the distance function, say to a point or to a subset, i.e. \(\operatorname{dist}(\cdot, S) : X\to \mathbb{R}\) for \(S\subseteq X\). But the lower bound in the distortion inequality can fail for such a function in many ways, and one must take further considerations into account.

Whether \(\operatorname{dist}(\cdot, S)\) is sharp on a given pair of points depends on how \(S\) sits relative to them, which suggests aggregating over all subsets at once: define \(f: X\to \mathbb{R} ^{2^n}\) via \(x\mapsto 2 ^{-\frac{n}{2}} \left( \operatorname{dist}(x, A) \right) _{A\subseteq X}\). This says to sample a uniformly random subset of \(X\) and record the distance to it. This \(f\) is certainly \(1\)-Lipschitz (each coordinate is, and the factor \(2^{-n/2}\) normalises the \(2^n\) coordinates), but it unfortunately has distortion \(\sim n\) when \((X, d)\) is the cycle graph on \(n\) points. The problem is that for diametrically opposed \(x\) and \(y\), a uniformly random \(A\) is dense, so \(\operatorname{dist}(x, A) \approx \operatorname{dist}(y, A)\) with high probability!
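
To see the failure concretely, here is a minimal Python sketch (not from the talk; the helper names are mine, and the empty set is skipped since \(\operatorname{dist}(x, \emptyset)\) is undefined) that computes the all-subsets embedding on small cycles and prints the worst-case distortion:

```python
import itertools
import math

def cycle_dist(i, j, n):
    """Graph distance between vertices i and j on the n-cycle."""
    return min(abs(i - j), n - abs(i - j))

def naive_embedding(n):
    """x -> 2^{-n/2} * (dist(x, A))_A over nonempty subsets A of the n-cycle."""
    scale = 2.0 ** (-n / 2)
    subsets = [A for r in range(1, n + 1)
               for A in itertools.combinations(range(n), r)]
    return [[scale * min(cycle_dist(x, a, n) for a in A) for A in subsets]
            for x in range(n)]

def distortion(n):
    """f is 1-Lipschitz, so the distortion is max d(x, y) / ||f(x) - f(y)||."""
    f = naive_embedding(n)
    return max(cycle_dist(x, y, n) / math.dist(f[x], f[y])
               for x in range(n) for y in range(x + 1, n))

for n in (4, 6, 8, 10, 12):
    print(n, round(distortion(n), 3))  # the ratio grows roughly linearly in n
```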

The problem is that large subsets of \(X\) fail to distinguish between many distant points whereas small subsets of \(X\) fail to distinguish between nearby points! One therefore needs to find a sampling regime that gives weight to all the different scales in the metric space.

The embedding is given by specifying a probability distribution \(\mu \) on \(\mathcal{P}(X)\) as follows (a numerical sketch follows the construction):

  1. Pick \(j\in \left\lbrace 1,\ldots,\lceil \log_2 n\rceil \right\rbrace\) uniformly at random.
  2. Define \(A\subseteq X\) by including each \(v\in X\) in \(A\) independently with probability \(2 ^{-j}\).
  3. Set \(f : X\to \mathbb{R} ^{2^n}\) by taking \[f(x) = \left( \sqrt{ \mu (A) } \operatorname{dist}(x, A) \right) _{A\subseteq X}.\]
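
Enumerating all \(2^n\) coordinates is only for the proof. As a sanity check, the sketch below (my own Monte Carlo surrogate: the sample count \(m\) and the \(1/\sqrt{m}\) normalisation stand in for the \(\sqrt{\mu(A)}\) weights and are not part of the statement) instead draws a few hundred sets from \(\mu\) on a cycle:

```python
import math
import random

random.seed(0)

def cycle_dist(i, j, n):
    """Graph distance on the n-cycle."""
    return min(abs(i - j), n - abs(i - j))

def sample_set(n):
    """One draw from mu: pick a scale j uniformly, keep each vertex w.p. 2^{-j}."""
    j = random.randint(1, max(1, math.ceil(math.log2(n))))
    return [v for v in range(n) if random.random() < 2.0 ** (-j)]

def bourgain_embedding(n, m=400):
    """Coordinates dist(x, A_i)/sqrt(m) for m independent samples A_i ~ mu."""
    sets = [A for A in (sample_set(n) for _ in range(m)) if A]  # skip empty draws
    return [[min(cycle_dist(x, a, n) for a in A) / math.sqrt(m) for A in sets]
            for x in range(n)]

n = 64
f = bourgain_embedding(n)
worst = max(cycle_dist(x, y, n) / math.dist(f[x], f[y])
            for x in range(n) for y in range(x + 1, n))
print(f"empirical distortion on the {n}-cycle: {worst:.2f} vs log2(n) = {math.log2(n):.0f}")
```

The multi-scale sampling is what changed compared to the naive embedding: each pair of points is now separated by sets sampled at the scale matching its distance, so the same cycle that forced distortion growing with \(n\) now lands within a constant multiple of \(\log n\).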

3. Sharpness?

There are two very natural questions here.

Question 3.

Is \(O \left( \log n \right)\) sharp?

The answer is, surprisingly, yes: so-called “expander graphs” saturate the inequality, requiring distortion \(\Omega(\log n)\) for any embedding into Hilbert space.

Question 4.

Is the dimension of the codomain \(\mathbb{R} ^{2^n}\) sharp?

This is far from sharp! You can bring the dimension down to \(O \left( \log^2n \right)\) via the Johnson–Lindenstrauss lemma: project onto a random subspace of the appropriate dimension and you'll be happy, since with high probability all pairwise distances survive up to a constant factor.
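
For intuition, here is what “project onto a random subspace” looks like in NumPy (a sketch with illustrative dimensions, not tuned to any theorem):

```python
import numpy as np

rng = np.random.default_rng(0)

def jl_project(points, k):
    """Multiply by a Gaussian matrix scaled by 1/sqrt(k); for
    k ~ eps^{-2} log n this preserves pairwise distances to 1 +/- eps w.h.p."""
    d = points.shape[1]
    return points @ (rng.normal(size=(d, k)) / np.sqrt(k))

# Toy check: 100 points in R^4096 squeezed into 50 dimensions.
X = rng.normal(size=(100, 4096))
Y = jl_project(X, 50)
print(np.linalg.norm(X[0] - X[1]), np.linalg.norm(Y[0] - Y[1]))
```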

There are other natural open questions related to improvements of Bourgain’s theorem when one has additional information on the geometry of the metric space, e.g. when \((X, d)\) is a planar graph.

There are also questions about derandomising the construction and about how close general metric spaces are to Banach spaces. For the latter, see the Ribe program and Assaf Naor's 2018 ICM talk.