Duality and \(c\)-cyclic Monotonicity
Speaker: Robert HaleDate of Talk: April 13, 2026
1. Kantorovich Duality
This is a pretty famous statement in optimal transport. It states
\[\inf \left\lbrace \int _{X\times Y} c(x, y) d \gamma : \gamma \in \Gamma (\mu , \nu ) \right\rbrace = \sup \left\lbrace \int_X \phi (x) d \mu + \int _Y \psi (y) d \nu : (\phi , \psi ) \in \Phi _c \right\rbrace.\]Here, \(\Gamma (\mu , \nu )\) is the set of transport plans between two probability measures \(\mu \) and \(\nu \) on Polish spaces \(X\) and \(Y\), respectively. \(\Phi _c\) is the set of pairs in \(L^1(\mu ) \times L^1(\nu )\) such that \(\phi (x) + \psi (y) \leq c(x, y)\) for \(\mu \)-a.e. \(x\) and \(\nu \)-a.e. \(y\).
There is some intuition to go with this: one can imagine having a “hub” or “middle man” where all the supplies distributed according to \(\mu \) are first gathered in the hub, with the cost of gathering \(x\) equal to \(\phi (x)\), before being shipped off with distribution \(\nu \), with the shipping cost at \(y\) equal to \(\psi (y)\). \(\phi (x) + \psi (y) \leq c(x, y)\) just says this needs to be consistent with the reported cost of transporting from \(x\) to \(y\). Thus Kantorovich duality says that the most cost-effective way to transport \(\mu \) to \(\nu \) in a potentially decentralised fashion is exactly the most expensive way to collect and redistribute goods in a centralised manner.
There are several proofs of this but the one that’s most common is based on e.g. Sion’s minimax theorem. Omitted since I’ve seen it before.
2. \(c\)-Transforms
Definition 1.
If \(u : X\to \mathbb{R} \cup \left\lbrace \infty \right\rbrace\), its \(c\)-transform \(u^c: Y\to \mathbb{R}\cup \left\lbrace -\infty \right\rbrace\) is defined by
\[u^c(y) = \inf _{x\in X} \left( c(x, y) - u(x) \right).\]Note this is a generalisation of the Legendre transform, dispelling a lot of the mysticism around this definition and the properties it entails.
Proposition 2.
If \(y\in Y\) such that \(y\in \partial^c \phi(x)\), then \(\phi (x) + \phi ^c(y) =c(x, y)\).
The proof is just chasing definitions, more or less. There may be some finiteness problems that one has to be careful with.
3. \(c\)-cyclic Monotonicity is Sufficient for Optimality
A week ago, \(c\)-cyclic monotonicity was proven to be a necessary condition on the support of an optimal map. In fact, one can prove the converse is true in the following way:
Theorem 3.
If \(\gamma \in \Gamma (\mu , \nu )\), \(\operatorname{supp}(\gamma )\) is \(c\)-cyclically monotone, and \(\int c(x, y)\ d \gamma < \infty\), then in fact \(\gamma \) is optimal.
Proof: There is a \(c\)-concave function \(\phi : X\to \mathbb{R}\cup \left\lbrace \infty \right\rbrace\) such that \(y\in \partial^c \phi (x)\) \(\gamma \)-a.e. Thus \(\phi (x) + \phi ^c(y) = c(x, y)\) \(\gamma \)-a.e. Rewriting the cost and applying Kantorovich duality immediately proves the claim. \(\square\)
So slick!!