VARIANTS OF THE OPTIMAL TRANSPORT PROBLEM AND THEIR DUALITY By Chamila Malagoda Gamage A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Mathematics—Doctor of Philosophy 2023 ABSTRACT The classical Optimal Transport (OT) problem studies how to transport one distribution to another in the most efficient way. In the past few decades it has emerged as a very powerful tool in various fields, such as optimization theory, probability theory, partial differential equations, machine learning and data analysis. In this thesis, we will discuss some existing variants of the classical optimal transport problem, such as the capacity constrained OT problem, multi-marginal OT problem, entropy-regularized OT problem and barycenters, and we will introduce a couple of new variants by combining the existing versions. We will also discuss their duality results and some characterizations. To my loving daughter Mishini Olivia... iii ACKNOWLEDGEMENTS The journey of my doctoral studies has been an arduous yet immensely rewarding experience. I have faced numerous challenges, encountered countless obstacles, and navigated uncharted territories. Today, I stand at the culmination of my efforts, filled with a deep sense of satisfaction and accomplishment. None of this would have been possible without the amazing people I have had around me. First and foremost, I am extremely grateful to my advisor, Dr. Jun Kitagawa, for his unwavering support, guidance, and encouragement throughout this challenging process. His knowledge, insights, and belief in my abilities have been invaluable, and I feel incredibly fortunate to have had the opportunity to learn from him. I would also like to express the deepest of gratitude towards the members of my dissertation committee: Dr. Baisheng Yan, Dr. Zhengfang Zhou, and Dr. Russell Schwab, for their guidance through different stages of my Ph.D. career. 
I also take this opportunity to thank the National Science Foundation (NSF) as I was partially supported through my advisor’s NSF grants DMS-1700094 and DMS-2000128 many times in the form of Research Assistantships, which were very helpful for me. I am deeply indebted to Dr. Dejan Slepčev, at Carnegie Mellon University, whose guid- ance, expertise, and invaluable insights have shaped my research and made this thesis pos- sible. Your time, patience, and fruitful discussions have not only expanded my knowledge but also helped me think deeper and approach problems from new perspectives. Your men- torship has been invaluable, and I am truly grateful for the profound impact you have had on my academic growth. Furthermore, I would like to acknowledge the contributions of the professors who have taught and mentored me throughout my academic journey. Special thanks to Dr. Archil Gulisashvili and Dr. Sergiu Aizicovici, at Ohio University: Your dedication to teaching and commitment to excellence have provided me with a solid foundation in Analysis and PDEs. I am grateful for the knowledge and skills you imparted, which have been crucial to my iv research. I would also like to extend a special thanks to my teaching mentors, Andrew Krause, and Tsveta Sendova. Your guidance and support in developing my teaching skills have been instrumental in my growth as an educator. The knowledge and experiences you shared have not only enriched my teaching abilities but have also enhanced my research journey. To my dear friends, thank you for your unwavering support during the ups and downs of this Ph.D. journey. While I cannot mention everyone individually, I would like to express my special thanks to Dimitris Vardakis, Estefania Garcia, Mihalis Paparizos, Ana-Maria Raicu, Arman Travakoli, Joe Melby, Craig Gross, Keshav Sutrave, Rachel Domagalski, Franciska Domokos, Kai Huang, Rui Wang, Danika Van Niel, Samara Chamoun, Reshma Menon, Hitesh Gakhar, Leonardo Abbrescia and Kasun Fernando. 
Your friendship and support have been a constant source of motivation and encouragement. Your willingness to listen, provide feedback, and engage in intellectual discussions has enriched my research and per- sonal growth. I am grateful for the countless hours spent together, both academically and socially, and for the memories we have created. I could not have undertaken this journey without the immense support of my family, who have been my rock throughout this journey. First of all, I would like to thank my loving husband, Lubashan Pathirana, for his continuous support throughout this journey. His love, encouragement, and understanding have been my anchor during the most difficult moments. He has believed in me even when I doubted myself, and his constant presence has given me the strength to persevere. I would like to say a special thank you to my daughter, Mishini Olivia. She came into my life during this doctoral journey, and her presence has given me a reason to try harder, to push through the most difficult moments, and to keep going even when things seemed overwhelming. I would also like to thank my parents, my parents-in- law, and my sister, for their love, encouragement, and constant belief in my abilities. Your support has been the cornerstone of my achievements, and I am forever grateful for your sacrifices and understanding. v Finally, I would like to express my sincere appreciation to all the individuals who have contributed to this thesis, directly or indirectly. Your support, advice, and encouragement have played a significant role in shaping my research and shaping me as a researcher. To all those who have believed in me and supported me throughout this endeavor, I offer my deepest gratitude. Your contributions have been invaluable, and I am humbled by your presence in my life. vi TABLE OF CONTENTS INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 CHAPTER 1 PRELIMINARIES . . . . . . . . . . . . . . . . . . . . 
. . . . . . 3 CHAPTER 2 THE CLASSICAL OPTIMAL TRANSPORT PROBLEM . . . . . 7 2.1 The Primal Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 2.2 Duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 2.3 Properties of the optimizers . . . . . . . . . . . . . . . . . . . . . . . . . 11 CHAPTER 3 CAPACITY CONSTRAINED OT PROBLEM . . . . . . . . . . . 14 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 3.2 The Primal Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 3.3 Duality - Version I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 3.4 Duality - Version II . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 3.5 A further characterization on the optimizers . . . . . . . . . . . . . . . . 19 CHAPTER 4 MULTI-MARGINAL OT PROBLEM . . . . . . . . . . . . . . . . 23 4.1 The Multi-Marginal OT (MMOT) Problem . . . . . . . . . . . . . . . . . 23 4.2 Capacity Constrained Multi-Marginal OT Problem . . . . . . . . . . . . 26 CHAPTER 5 BARYCENTERS . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 5.1 Barycenters in the Wasserstein Space . . . . . . . . . . . . . . . . . . . . 50 5.2 Capacity Constrained Barycenter Problem . . . . . . . . . . . . . . . . . 54 CHAPTER 6 ENTROPY REGULARIZATION . . . . . . . . . . . . . . . . . . 66 6.1 Entropy Regularized Optimal Transport (EROT) Problem . . . . . . . . 66 6.2 Entropy Regularized Barycenter Problem . . . . . . . . . . . . . . . . . . 71 BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86 APPENDIX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
91

INTRODUCTION

The origin of the Optimal Transport (OT) problem goes back to 1781, when the French mathematician Gaspard Monge introduced a problem in his classic paper Memoire sur la theorie des deblais et des remblais [41]: finding the most efficient way of moving dirt from one place to another, a question motivated by military and economic purposes. However, this problem remained essentially unsolved for almost two centuries, until the Russian mathematician and economist Leonid Kantorovich made progress on it ([30]) with the invention of Linear Programming. To get some insight into this problem, consider an example in the discrete setting. Suppose there is a large number of iron mines, and the iron has to be transported to refining factories. The problem is to find where each unit of iron should be transported so that the total transportation cost is minimized. Such an assignment from initial positions to final positions is known as an "optimal transport plan". Over the past few decades, the theory of OT has gained a lot of attention and has been applied in various fields such as optimization theory, probability theory, partial differential equations, machine learning, etc. In 1987, Yann Brenier showed in [10] that, under certain conditions, there exists a unique transport plan minimizing the cost associated with the squared Euclidean distance. In 1995, Wilfrid Gangbo and Robert McCann generalized this result to cost functions which are strictly convex or concave ([22, 23]). In [6], Benamou and Brenier present a dynamical formulation of the OT problem, which connects OT theory to many other fields such as fluid mechanics ([11]), image processing ([42]), and data analysis ([36]). The field of Computational OT is another rapidly growing area, as OT serves as a powerful tool to compare probability distributions.
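For intuition, the discrete mines-and-factories problem above can be solved directly: with one unit of iron at each of three mines, a transport plan is a permutation of the factories, and for such a tiny instance brute force suffices. The cost matrix below is invented purely for illustration.

```python
# Brute-force solution of a toy mines-and-factories transport instance.
from itertools import permutations

cost = [
    [7.0, 3.0, 9.0],  # cost[i][j] = cost of shipping from mine i to factory j
    [4.0, 8.0, 5.0],
    [6.0, 1.0, 2.0],
]

def optimal_assignment(cost):
    n = len(cost)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):  # all n! one-to-one assignments
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_cost:
            best_perm, best_cost = perm, total
    return best_perm, best_cost

plan, value = optimal_assignment(cost)  # plan[i] = factory receiving mine i's iron
```

In practice one would use the Hungarian algorithm or linear programming rather than enumeration, since the number of permutations grows factorially.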
Object recognition ([25]), label classification ([55]), and generative modelling ([50]) are a few of the many subfields of machine learning that widely apply OT tools. Viewing the OT problem as a linear programming problem enables us to construct a duality theory for the OT problem. Duality plays a significant role in understanding and solving the OT problem. Furthermore, it helps to characterize optimal solutions, which is often challenging without duality theory. For instance, in computational OT, duality theory enables the use of efficient computational algorithms, such as the Sinkhorn algorithm ([45]), to approximate solutions to the OT problem. In this thesis, we will discuss a few variants of the classical OT problem, namely the capacity constrained OT problem, the multi-marginal OT problem, the entropy-regularized OT problem, and barycenters, together with their duality theory. We will also discuss a couple of new variants obtained by combining already existing versions, namely the capacity constrained multi-marginal OT (CCMMOT) problem and capacity constrained barycenters. By combining the two notions of the capacity constrained OT problem and the barycenter problem, we will introduce capacity constrained barycenters in Wasserstein space. Under certain assumptions, we will prove that the problem attains a minimizer and present some duality results. The notion of the CCMMOT problem already exists in the literature ([18]); however, a dual formulation for this problem does not. We will present a dual formulation for this problem and prove the strong duality result and the existence of dual maximizers. The entropy-regularized version of Wasserstein barycenters and their dual formulation also exist in the literature ([38]), where the authors prove strong duality and obtain the existence of a minimizer for the primal problem via the duality result.
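As a concrete glimpse of the computational side mentioned above, here is a minimal sketch of the Sinkhorn iteration for entropy-regularized discrete OT ([45]); the marginals, cost matrix, and regularization strength `eps` are illustrative choices, not data from this thesis.

```python
# Minimal Sinkhorn iteration: alternately rescale the Gibbs kernel so that the
# plan gamma_ij = u_i * K_ij * v_j matches the prescribed marginals mu and nu.
import math

def sinkhorn(cost, mu, nu, eps=0.1, n_iter=500):
    n, m = len(mu), len(nu)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        u = [mu[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [nu[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

cost = [[0.0, 1.0], [1.0, 0.0]]
gamma = sinkhorn(cost, [0.5, 0.5], [0.5, 0.5])
```

As `eps` decreases, the entropic plan approaches an optimal plan of the unregularized problem; this is a standard fact from the entropy-regularization literature, and the entropic problem itself is taken up in Chapter 6.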
In this thesis, we will provide a direct proof of the existence of a minimizer for the primal problem and of the existence of dual maximizers.

CHAPTER 1 PRELIMINARIES

Some standard symbols, definitions, and theorems used in this thesis are given below.

Notation

Let $X$ be a Polish space (see Definition 1.0.1).
• $\mathcal{B}(X)$: the sigma-algebra of Borel sets of $X$.
• $\mathcal{P}(X)$: the space of Borel probability measures on $X$.
• $\mathcal{P}_2(X)$: the space of Borel probability measures with finite second moment.
• $\mathcal{M}(X)$: the space of finite Borel measures.
• $\mathcal{M}^+(X)$: the space of positive, finite Borel measures.
• $C(X)$: continuous functions on $X$.
• $C_b(X)$: bounded, continuous functions on $X$.
• $L^1(\mathbb{R}^d)$: functions integrable w.r.t. Lebesgue measure on $\mathbb{R}^d$.
• $L^1(X, d\mu)$: functions integrable w.r.t. the measure $\mu$ on $X$.
• $L^0(X, d\mu)$: measurable functions on the space $(X, \mu)$.
• $[f]_+$: positive part of the function $f$ (see Definition 1.0.16).
• $[f]_-$: negative part of the function $f$ (see Definition 1.0.17).

Definitions

Definition 1.0.1 (Polish space). A Polish space is a separable, completely metrizable topological space.

Definition 1.0.2 (Push-forward of a measure). Given two Polish spaces $X, Y$, a Borel map $T : X \to Y$, and a probability measure $\mu \in \mathcal{P}(X)$, we define the push-forward of $\mu$ through $T$, denoted by $T_\#\mu \in \mathcal{P}(Y)$, as $T_\#\mu(E) = \mu(T^{-1}(E))$ for all Borel $E \subset Y$.

Definition 1.0.3 (Convex set). A subset $C$ of a vector space $V$ is convex if $(1-\lambda)x + \lambda y \in C$ whenever $x, y \in C$ and $0 \le \lambda \le 1$.

Definition 1.0.4 (Weak convergence). A sequence $\{\mu_n\}_{n\in\mathbb{N}} \subseteq \mathcal{P}(X)$ converges weakly to $\mu \in \mathcal{P}(X)$ if, for all $f \in C_b(X)$,
\[ \lim_{n\to\infty} \int_X f \, d\mu_n = \int_X f \, d\mu. \]

Definition 1.0.5 (Tightness). A set $A \subseteq \mathcal{P}(X)$ is tight if, for every $\varepsilon > 0$, there exists a compact set $K_\varepsilon \subset X$ such that $\mu(X \setminus K_\varepsilon) < \varepsilon$ for all $\mu \in A$.

Definition 1.0.6 (Lower semi-continuity). Let $(X, d)$ be a metric space. A function $f : X \to \mathbb{R} \cup \{+\infty\}$ is lower semi-continuous if, for every sequence $x_n \to x$, we have $f(x) \le \liminf_{n\to\infty} f(x_n)$.

Definition 1.0.7.
(Support of a measure) Let $X$ be a separable metric space. We define the support of a measure $\gamma$, denoted by $\mathrm{spt}(\gamma)$, as the smallest closed set on which $\gamma$ is concentrated:
\[ \mathrm{spt}(\gamma) := \bigcap_{\{E \,:\, E \text{ is closed and } \gamma(X \setminus E) = 0\}} E. \]

Definition 1.0.8 (Finite $p$th moment). A measure $\mu \in \mathcal{P}(\mathbb{R}^d)$ has finite $p$th moment if
\[ \int_{\mathbb{R}^d} |x|^p \, d\mu(x) < +\infty. \]

Definition 1.0.9 (Vanishing on small sets). A probability measure $\mu \in \mathcal{P}(\mathbb{R}^d)$ is said to vanish on small sets if and only if $\mu(E) = 0$ for every $E \in \mathcal{B}(\mathbb{R}^d)$ having Hausdorff dimension $d-1$ or less.

Definition 1.0.10 ($\omega$-continuity). A function $f : X \to \mathbb{R}$ is said to be $\omega$-continuous if there exists a function $\omega : [0,\infty] \to [0,\infty]$ such that $\lim_{t\to 0}\omega(t) = \omega(0) = 0$ and $|f(x) - f(y)| \le \omega(|x-y|)$ for all $x, y \in X$.

Definition 1.0.11 ($K$-convexity along curves). Given a metric space $(X,d)$, a functional $\varphi : X \to (-\infty,\infty]$ is called $K$-convex on a curve $\gamma : t \in [0,1] \mapsto \gamma_t \in X$, for some $K \in \mathbb{R}$, if
\[ \varphi(\gamma_t) \le (1-t)\,\varphi(\gamma_0) + t\,\varphi(\gamma_1) - \tfrac{1}{2} K t(1-t)\, d^2(\gamma_0, \gamma_1), \quad \forall t \in [0,1]. \]

Definition 1.0.12 (Proper convex function). A convex function $f : X \to [-\infty,\infty]$ is called proper if $f(x) < \infty$ for at least one $x \in X$ and $f(x) > -\infty$ for all $x \in X$.

Definition 1.0.13 (Infimal convolution). Given two proper convex functions $f, g$ on $\mathbb{R}^d$, we define their infimal convolution, denoted by $f \,\square\, g$, as
\[ (f \,\square\, g)(x) = \inf_y \{ f(x-y) + g(y) \}, \quad \forall x \in \mathbb{R}^d. \]

Definition 1.0.14 ($L$-Lipschitzness). Given two metric spaces $(X, d_X)$ and $(Y, d_Y)$, a function $f : X \to Y$ is said to be $L$-Lipschitz if there is a real constant $L \ge 0$ such that, for all $x_1, x_2 \in X$, $d_Y(f(x_1), f(x_2)) \le L\, d_X(x_1, x_2)$.

Definition 1.0.15 (Legendre-Fenchel transform). Let $E$ be a normed vector space, and $\varphi$ a convex function on $E$ with values in $\mathbb{R} \cup \{\infty\}$. Then the Legendre-Fenchel transform of $\varphi$ is the function $\varphi^*$, defined on the dual space $E^*$ by the formula
\[ \varphi^*(z^*) = \sup_{z \in E} \{ z^* \cdot z - \varphi(z) \}. \]

Definition 1.0.16.
(Positive part) Given a function $f : \mathbb{R} \to \mathbb{R}$, we define its positive part, denoted by $[f]_+$, as $[f(x)]_+ = \max\{f(x), 0\}$.

Definition 1.0.17 (Negative part). Given a function $f : \mathbb{R} \to \mathbb{R}$, we define its negative part, denoted by $[f]_-$, as $[f(x)]_- = -\min\{f(x), 0\}$.

Theorems

Theorem 1.0.18 (Prokhorov). Let $(X,d)$ be a Polish space. Then a family $A \subset \mathcal{P}(X)$ is relatively compact w.r.t. the weak topology if and only if it is tight.

Theorem 1.0.19 (Fatou's lemma). Let $f_n : X \to [0,\infty]$ be measurable for each $n \in \mathbb{N}$. Then
\[ \int_X \liminf_{n\to\infty} f_n \, d\mu \le \liminf_{n\to\infty} \int_X f_n \, d\mu. \]

Theorem 1.0.20 (Monotone convergence theorem). Let $\{f_n\}$ be a sequence of measurable functions on $X$ such that (i) $0 \le f_k(x) \le f_{k+1}(x) \le \infty$ for all $k \in \mathbb{N}$ and all $x \in X$, and (ii) $\lim_{n\to\infty} f_n(x) = f(x)$ for all $x \in X$. Then
\[ \lim_{n\to\infty} \int_X f_n \, d\mu = \int_X f \, d\mu. \]

CHAPTER 2 THE CLASSICAL OPTIMAL TRANSPORT PROBLEM

2.1 The Primal Problem

Let $X$ and $Y$ be two Polish spaces, $\mu \in \mathcal{P}(X)$ and $\nu \in \mathcal{P}(Y)$ be two Borel probability measures, and $c : X \times Y \to \mathbb{R} \cup \{\infty\}$ be a Borel measurable cost function. The Monge problem is the following:

Problem 2.1.1. Find a Borel map $T : X \to Y$ that minimizes the cost
\[ M(S) := \int_X c(x, S(x)) \, d\mu(x) \tag{2.1.1} \]
among all Borel maps $S : X \to Y$ such that $S_\#\mu = \nu$. Such maps are called transport maps from $\mu$ to $\nu$. Maps that minimize the cost $M(S)$ are called optimal transport maps.

The push-forward condition $S_\#\mu = \nu$ can be characterized by
\[ \int_Y f(y) \, d\nu(y) = \int_Y f(y) \, dS_\#\mu(y) = \int_X (f \circ S)(x) \, d\mu(x), \quad \forall f \in L^1(Y, d\nu). \tag{2.1.2} \]

There are a few major drawbacks in the Monge formulation. For example:
• The constraint set could be empty. E.g., for $\mu = \delta_0$ and $\nu = \frac{1}{2}\delta_1 + \frac{1}{2}\delta_{-1}$, the condition (2.1.2) cannot hold for any $S : X \to Y$ that is $\mu$-a.e. single-valued.
• The cost $M(S)$ could be non-linear in $S$ (depending on $c$), hence difficult to minimize.
• The constraint $S_\#\mu = \nu$ may not be closed under weak convergence in general. E.g. ([2], Chapter 1), let $\mu = \mathcal{L}|_{[0,1]}$ and $\nu = \frac{1}{2}\delta_1 + \frac{1}{2}\delta_{-1}$.
Consider the sequence of functions given by $S_n(x) := S(nx)$, where $S : \mathbb{R} \to \mathbb{R}$ is the 1-periodic function defined on $[0,1)$ by
\[ S(x) = \begin{cases} 1 & \text{on } [0, 1/2), \\ -1 & \text{on } [1/2, 1). \end{cases} \tag{2.1.3} \]
Then $(S_n)_\#\mu = \nu$ for all $n \in \mathbb{N}$, but $(S_n)$ converges weakly to $S = 0$ in $L^p$ for all $1 \le p < \infty$, so that $S_\#\mu = \delta_0 \ne \nu$.

Due to these issues, we consider a relaxation of the Monge problem, known as the Kantorovich problem.

Definition 2.1.2. Given two probability measures $\mu \in \mathcal{P}(X)$ and $\nu \in \mathcal{P}(Y)$, we define the set of all transport plans from $\mu$ to $\nu$ by
\[ \Pi(\mu,\nu) := \{ \gamma \in \mathcal{P}(X \times Y) : (\mathrm{Proj}_X)_\#\gamma = \mu, \ (\mathrm{Proj}_Y)_\#\gamma = \nu \}. \tag{2.1.4} \]
The conditions on $\gamma$ above are known as the marginal conditions, and they can also be written as $\gamma(A \times Y) = \mu(A)$ for all $A \in \mathcal{B}(X)$, and $\gamma(X \times B) = \nu(B)$ for all $B \in \mathcal{B}(Y)$.

Then the Kantorovich problem is defined as below:

Problem 2.1.3. Find a $\gamma_0 \in \Pi(\mu,\nu)$ that minimizes the cost
\[ K(\gamma) = \int_{X \times Y} c(x,y) \, d\gamma(x,y) \tag{2.1.5} \]
among all transport plans $\gamma \in \Pi(\mu,\nu)$. Transport plans that minimize the cost $K(\gamma)$ are called optimal transport plans.

Compared to the Monge formulation, the Kantorovich formulation has many advantages, such as:
• The set $\Pi(\mu,\nu)$ is always non-empty, as it contains $\mu \otimes \nu$.
• The cost $K(\gamma)$ is linear in $\gamma$ (regardless of $c$), hence much easier to minimize.
• The set $\Pi(\mu,\nu)$ is convex: we can easily see that for any $\gamma_1, \gamma_2 \in \Pi(\mu,\nu)$ and $0 \le \lambda \le 1$, $\lambda\gamma_1 + (1-\lambda)\gamma_2 \in \Pi(\mu,\nu)$.
• If $T_\#\mu = \nu$, then $\gamma := (\mathrm{Id} \times T)_\#\mu \in \Pi(\mu,\nu)$; hence the set of transport plans contains all transport maps.

Now, we will discuss the existence of a minimizer for the Kantorovich problem.

Theorem 2.1.4 ([51], Theorem 1.7). Let $X, Y$ be two Polish spaces, $\mu \in \mathcal{P}(X)$ and $\nu \in \mathcal{P}(Y)$. If $c : X \times Y \to [0,\infty]$ is lower semi-continuous, then the Kantorovich problem (Problem 2.1.3) has a minimizer.

The proof is based on the tightness of the set $\Pi(\mu,\nu)$ and the Prokhorov theorem.
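To see concretely how the relaxation fixes the first Monge obstruction (µ = δ₀, ν = ½δ₁ + ½δ₋₁): a plan may split mass. Below, a discrete plan is represented as a dictionary of point masses, and the marginal conditions of Definition 2.1.2 are checked directly; the representation is an illustrative sketch, not notation from the text.

```python
# A plan splitting the mass at x = 0 between y = 1 and y = -1; no map can do this.
gamma = {(0.0, 1.0): 0.5, (0.0, -1.0): 0.5}

def marginals(gamma):
    # Accumulate the first and second marginals of a discrete plan.
    mu, nu = {}, {}
    for (x, y), mass in gamma.items():
        mu[x] = mu.get(x, 0.0) + mass
        nu[y] = nu.get(y, 0.0) + mass
    return mu, nu

mu, nu = marginals(gamma)
# Kantorovich cost of this plan for c(x, y) = (x - y)^2
cost = sum(mass * (x - y) ** 2 for (x, y), mass in gamma.items())
```

The marginals recover µ = δ₀ and ν = ½δ₁ + ½δ₋₁ exactly, even though no single-valued map pushes µ to ν.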
From here onwards, we will call the Kantorovich problem the classical Optimal Transport (OT) problem, and we will denote it by
\[ \mathrm{OT}_c := \inf_{\gamma \in \Pi(\mu,\nu)} \left\{ \int_{X \times Y} c(x,y) \, d\gamma(x,y) \right\}. \tag{2.1.6} \]

2.2 Duality

Let $X, Y$ be two compact Polish spaces, $\mu \in \mathcal{P}(X)$, $\nu \in \mathcal{P}(Y)$, and let $c : X \times Y \to [0,\infty)$ be continuous. We will define the dual formulation of the OT problem as the following maximization problem:
\[ \widehat{\mathrm{OT}}{}^*_c := \sup_{(\phi,\psi) \in \Phi_c} \left\{ \int_X \phi(x) \, d\mu(x) + \int_Y \psi(y) \, d\nu(y) \right\} \tag{2.2.1} \]
where
\[ \Phi_c := \{ (\phi,\psi) \in C_b(X) \times C_b(Y) : \phi(x) + \psi(y) \le c(x,y) \}. \tag{2.2.2} \]
Due to the lack of compactness of the above class of admissible functions, we will consider an alternative dual formulation.

Definition 2.2.1. Given a function $f : X \to \mathbb{R} \cup \{\pm\infty\}$, we define its $c$-transform $f^c : Y \to \mathbb{R} \cup \{\pm\infty\}$ by
\[ f^c(y) := \inf_{x \in X} \{ c(x,y) - f(x) \}. \tag{2.2.3} \]
Similarly, given a function $g : Y \to \mathbb{R} \cup \{\pm\infty\}$, we define its $c^*$-transform $g^{c^*} : X \to \mathbb{R} \cup \{\pm\infty\}$ by
\[ g^{c^*}(x) := \inf_{y \in Y} \{ c(x,y) - g(y) \}. \tag{2.2.4} \]

Definition 2.2.2. A function $f : X \to \mathbb{R} \cup \{\pm\infty\}$ is called $c$-concave if there exists a function $g : Y \to \mathbb{R} \cup \{\pm\infty\}$ such that $f = g^{c^*}$. A function $g : Y \to \mathbb{R} \cup \{\pm\infty\}$ is called $c^*$-concave if there exists a function $f : X \to \mathbb{R} \cup \{\pm\infty\}$ such that $g = f^c$. We will denote the set of $c$-concave functions on $X$ by $c\text{-conc}(X)$ and the set of $c^*$-concave functions on $Y$ by $c^*\text{-conc}(Y)$.

Observe that, given an admissible pair $(\phi,\psi)$ in $\widehat{\mathrm{OT}}{}^*_c$, if we replace $(\phi,\psi)$ by $(\phi, \phi^c)$ and then again by $(\phi^{cc^*}, \phi^c)$, the value can only increase while the constraints remain satisfied ([51], Definition 1.10). Hence, we consider the following dual formulation:
\[ \mathrm{OT}^*_c := \sup_{\phi \in c\text{-conc}(X)} \left\{ \int_X \phi(x) \, d\mu(x) + \int_Y \phi^c(y) \, d\nu(y) \right\}. \tag{2.2.5} \]
Functions that maximize $\mathrm{OT}^*_c$ are called Kantorovich potentials. Now, we will present the existence of dual maximizers and the strong duality result.

Theorem 2.2.3 ([51], Proposition 1.11). Let $X, Y$ be compact subsets of $\mathbb{R}^d$ and $c$ be a continuous function.
Then OT∗c has a solution (ϕ, ψ) such that ϕ ∈ c-conc(X), ψ ∈ c∗ -conc(Y ) and ψ = ϕc . In the proof, one starts with a maximizing sequence (ϕn , ψn ) and take the c-transforms so that it improves the dual formulation. This transformation will make (ϕn , ψn ) equi- continuous and equi-bounded, so that one can apply the Arzelà-Ascoli theorem to get the existence result. 10 Theorem 2.2.4. ([51], Theorem 1.39) Let X, Y be two Polish spaces and suppose that c : X × Y 7→ R is uniformly continuous and bounded. Then, OTc = OT∗c . The proof uses the concept of c-cyclical monotonicity, which we will discuss next. 2.3 Properties of the optimizers Definition 2.3.1. Given a function c : X × Y 7→ R ∪ {∞}, we say a set Γ ⊂ X × Y is c-cyclically monotone, if for any positive integer p, any permutation σ of {1, . . . , p}, and any finite family of points (x1 , y1 ), . . . , (xp , yp ) ∈ Γ, we have p p X X c(xi , yi ) ≤ c(xi , yσ(i) ). i=1 i=1 Theorem 2.3.2. ([51], Theorem 1.38) Let γ be an optimal transport plan for OTc and let c be a continuous function. Then, spt(γ) is a c-cyclically monotone set. Now, we will present a more general theorem. Theorem 2.3.3. ([53], Theorem 5.9) Let X, Y be two Polish spaces and µ ∈ P(X) and ν ∈ P(Y ). Suppose that c : X × Y 7→ R ∪ {∞} is a lower semi-continuous function, such that there exist some real-valued, upper semi-continuous functions u ∈ L1 (X, dµ), v ∈ L1 (Y, dν) satisfying c(x, y) ≥ u(x) + v(y), ∀(x, y) ∈ X × Y. Then, 1. Duality holds: (Z Z ) OTc = sup ϕ(x) dµ(x) + ψ(y) dν(y) (ϕ,ψ)∈Cb (X)×Cb (Y ) X Y ϕ+ψ≤c (Z Z ) = sup ϕ(x) dµ(x) + ψ(y) dν(y) (ϕ,ψ)∈L1 (X,dµ)×L1 (Y,dν) X Y ϕ+ψ≤c (Z Z ) = sup ϕ(x) dµ(x) + ϕc (y) dν(y) ϕ∈L1 (X,dµ) X Y 11 (Z Z ) c∗ = sup ψ (x) dµ(x) + ψ(y) dν(y) . ψ∈L1 (Y,dν) X Y Note that, in the above suprema, we may take ϕ ∈ c-conc(X) and ψ ∈ c∗ -conc(Y ). 2. Suppose c is real-valued and the cost OTc is finite. 
Then there is a measurable c- cyclically monotone set Γ ⊂ X × Y , such that for any γ ∈ Π(µ, ν), the following statements are equivalent. a) γ is optimal; b) γ is c-cyclically monotone; c) There exists a c-concave function ϕ : X 7→ R ∪ {−∞} such that ϕ(x) + ϕc (y) = c(x, y), γ-a.e.; d) There exist functions ϕ : X 7→ R ∪ {∞} and ψ : Y 7→ R ∪ {∞}, such that ϕ(x) + ϕc (y) ≤ c(x, y), ∀(x, y), with equality γ-a.e.; e) γ is concentrated on Γ. Now, we will present a uniqueness result for the optimal transport plans. This is known as the Brenier-McCann’s Theorem. Theorem 2.3.4. ([54], Theorem 2.12) Let c(x, y) = |x − y|2 and µ ∈ P2 (X), ν ∈ P2 (Y ). Suppose that µ vanishes on small sets. Then, 1. There exists a unique optimal transport plan, given by γ = (Id ×∇u)# µ, where ∇u is uniquely determined µ-a.e. such that u is convex and ∇u# µ = ν. Furthermore, spt(ν) = ∇u(spt(µ)). 2. ∇u is the unique solution to the Monge problem given by (2.1.1). 12 3. If ν also vanishes on smalls sets, then ∇u∗ is a ν-a.e. unique solution to the Monge problem from ν to µ, such that u∗ is convex and ∇u(∇u∗ (y)) = y ν-a.e. y and ∇u∗ (∇u(x)) = x µ-a.e. x. Finally, we will briefly discuss the notion of the Wasserstein distance. Definition 2.3.5. Let (X, d) be a Polish metric space and µ, ν ∈ P(X). For a given p ∈ [1, ∞), we define the Wasserstein distance of order p between µ and ν by  Z 1/p p Wp (µ, ν) := inf d(x, y) dγ(x, y) . γ∈Π(µ,ν) X×X Proposition 2.3.6. ([3], Chapter 7.1) Wp defines a distance on Pp (X). Proposition 2.3.7. ([53], Corollary 6.9) Wp is lower semi-continuous w.r.t. weak conver- gence of measures. 13 CHAPTER 3 CAPACITY CONSTRAINED OT PROBLEM 3.1 Introduction In the capacity constrained OT problem, we impose capacity constraints which limit the amount transported between any given source and corresponding target. 
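For empirical measures on the real line, the Wasserstein distance of Definition 2.3.5 can be computed by matching sorted samples; this is a classical one-dimensional fact used here only as a numerical illustration, not something proved in this thesis.

```python
# W_p between two empirical measures on R with equally many atoms of equal
# mass: the optimal plan pairs the sorted samples.
def wasserstein_1d(xs, ys, p=2):
    assert len(xs) == len(ys)
    xs, ys = sorted(xs), sorted(ys)
    return (sum(abs(x - y) ** p for x, y in zip(xs, ys)) / len(xs)) ** (1 / p)

w = wasserstein_1d([0.0, 1.0], [1.0, 2.0], p=2)  # every atom moves by exactly 1
```

Since each atom travels distance 1, the distance is 1 for every p, consistent with the definition.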
As an example in the discrete case, we can consider a large number of coal mines from which coal has to be transported to refining factories, and the problem is to find where each unit of coal should go at minimum transportation cost. In the unconstrained OT problem, we assume that any amount of coal can be transported, whereas in the capacity constrained case, there is a limit on the amount of coal that can be transported from a given mine to a given factory. Formally, given two probability measures $\mu, \nu \in \mathcal{P}(\mathbb{R}^d)$ representing the source and target distributions, respectively, and a finite Borel measure $\tilde\gamma$ on $\mathbb{R}^d \times \mathbb{R}^d$ representing the capacity constraint on the transport plans, we minimize the cost:
\[ \inf_{\gamma \in \Pi_{\tilde\gamma}(\mu,\nu)} \left\{ \int_{\mathbb{R}^d \times \mathbb{R}^d} c(x,y) \, d\gamma(x,y) \right\}. \tag{3.1.1} \]
Here, the set $\Pi_{\tilde\gamma}(\mu,\nu)$ is the set of transport plans from $\mu$ to $\nu$ bounded by $\tilde\gamma$. In [46], Rachev and Rüschendorf introduced this problem on compact spaces, where they studied cost functions that are Borel measurable, bounded below, and lower semi-continuous, and obtained a dual formulation of the minimization problem. Recently, in a series of papers ([33, 35, 34]), Korman, McCann and Seis considered this problem for continuous, bounded cost functions on $\mathbb{R}^d \times \mathbb{R}^d$ and for finite, bounded capacity constraints. There, they established the equivalence between the primal problem and a dual problem, together with the existence of minimizers of the capacity constrained OT problem and the existence of dual maximizers. However, unlike in the classical case, further information about the dual maximizers, such as their regularity or the properties they inherit from the cost function, is still unknown. In this chapter, we will present the existing results regarding this capacity constrained OT problem and provide some characterization of the optimizers of the primal and dual problems.

3.2 The Primal Problem

Similar to the work in [33], we will use functions to represent mass densities.
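The effect of a capacity bound can already be seen in a 2x2 toy version of the coal example above: with uniform marginals f = g = (1/2, 1/2), the admissible plans form a segment h(t) = [[t, 1/2 - t], [1/2 - t, t]], and a capacity cap < 1/2 on each entry forces the optimum away from the unconstrained one. All numbers here are invented for illustration.

```python
# Minimize a linear transport cost over the one-parameter family of 2x2 plans
# with uniform marginals, subject to the entrywise bound 0 <= h(t) <= cap.
def constrained_min(cost, cap, steps=1000):
    best_t, best_val = None, float("inf")
    t_max = min(cap, 0.5)  # the capacity truncates the feasible segment
    for k in range(steps + 1):
        t = t_max * k / steps
        h = [[t, 0.5 - t], [0.5 - t, t]]
        val = sum(cost[i][j] * h[i][j] for i in range(2) for j in range(2))
        if val < best_val:
            best_t, best_val = t, val
    return best_t, best_val

cost = [[0.0, 1.0], [1.0, 0.0]]      # quadratic cost on the two points {0, 1}
t_star, val = constrained_min(cost, cap=0.3)   # capacity binds: t* = 0.3
t_free, val_free = constrained_min(cost, cap=1.0)  # unconstrained: t* = 0.5, cost 0
```

At the constrained optimum the diagonal entries saturate the capacity, in the spirit of the geometric extremality of minimizers discussed later in this section.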
Let f and g be two non-negative, compactly supported density functions in L1 (Rd ) with equal total masses, i.e. Rd f (x) dx = Rd g(y) dy, representing the source and the target R R densities. Let c be a Borel measurable function on Rd ×Rd representing the cost function. Let h̃ ∈ L∞ (Rd × Rd ) be a compactly supported function representing the capacity constraint. Denote by Πh̃ (f, g), the set of all joint densities h ∈ L1 (Rd × Rd ) with marginals f and g, and bounded by h̃, i.e. Z Z f (x) = h(x, y) dy, g(y) = h(x, y) dx, and 0 ≤ h ≤ h̃. Rd Rd We define the two-marginal Capacity Constrained Optimal Transport (CCOT) problem be- tween f and g under the capacity h̃ as the following minimization problem: OTCC := inf Ic (h) (3.2.1) h∈Πh̃ (f,g) where Z Ic (h) := c(x, y)h(x, y) dxdy. Rd ×Rd Unlike in the unconstrained problem, it is not always guaranteed that the set Πh̃ (f, g) is non-empty. The necessary and sufficient conditions for Πh̃ (f, g) to be non-empty are as follows: Proposition 3.2.1. ([46], Corollary 4.6.15) The set Πh̃ (f, g) ̸= ∅ if and only if Z Z Z f (x) dx + g(y) dy ≤ 1 + h̃(x, y) dxdy, A B A×B for any Borel measurable sets A, B ⊂ Rd . The following theorem states that OTCC has a minimizer. 15 Theorem 3.2.2. ([33], Theorem 3.1) Let c be a bounded, continuous function on Rd × Rd and 0 ≤ h̃ ∈ L∞ (Rd × Rd ) be compactly supported. Take 0 ≤ f, g ∈ L1 (Rd ) compactly supported functions such that the set Πh̃ (f, g) is non-empty. Then, OTCC has a minimizer in Πh̃ (f, g). In order to get uniqueness of the minimizers, we require the cost c to satisfy three con- ditions. (C1) c(x, y) is bounded, (C2) there is a Lebesgue negligible closed set Z ⊂ Rd ×Rd such that c(x, y) ∈ C 2 (Rd ×Rd \Z) and, (C3) c(x, y) is non-degenerate: i.e. det∇2xy c(x, y) ̸= 0 for all (x, y) ∈ Rd × Rd \ Z. Theorem 3.2.3. ([33], Theorem 8.1) Suppose that the cost c(x, y) satisfies the conditions (C1), (C2), and (C3). Let 0 ≤ h̃ ∈ L∞ (Rd × Rd ) be compactly supported. 
Take 0 ≤ f, g ∈ L1 (Rd ) compactly supported functions such that the set Πh̃ (f, g) is non-empty. Then, OTCC has a unique minimizer. Now, we will give a characterization of the minimizers of OTCC . Definition 3.2.4. Let h̃ ∈ L∞ (Rd × Rd ). A density h ∈ Πh̃ (f, g) is called geometrically extreme, if h(x, y) = 1W (x, y)h̃(x, y) for almost all (x, y) ∈ Rd × Rd , for some Lebesgue measurable set W ⊂ Rd × Rd . Theorem 3.2.5. ([33], Theorem 7.2) Suppose that the cost c(x, y) satisfies the conditions (C1), (C2), and (C3). Let 0 ≤ h̃ ∈ L∞ (Rd × Rd ) be compactly supported. Take 0 ≤ f, g ∈ L1 (Rd ) compactly supported functions such that the set Πh̃ (f, g) is non-empty. If h ∈ Πh̃ (f, g) minimizes OTCC , then h is geometrically extreme. 16 3.3 Duality - Version I For given f, g ∈ L1 (Rd ) and 0 ≤ h̃ ∈ L∞ (Rd ×Rd ), we consider the following maximization problem: OT∗CC := sup J(u, v, w) (3.3.1) (u,v,w)∈Lipc,h̃ where Z Z Z J(u, v, w) := u(x)f (x) dx + v(y)g(y) dy + w(x, y)h̃(x, y) dxdy, (3.3.2) Rd Rd Rd ×Rd and ( Lipc,h̃ := (u, v, w) : u ∈ L1 (Rd , f dx), v ∈ L1 (Rd , gdy), w ∈ L1 (Rd × Rd , h̃dxdy), ) (3.3.3) u(x) + v(y) + w(x, y) ≤ c(x, y), and w(x, y) ≤ 0 . Now, we will present the strong duality result for the CCOT problem. Theorem 3.3.1. ([35], Theorem 1) Let f, g ∈ L1 (Rd ) be two probability densities such that Πh̃ (f, g) is non-empty and 0 ≤ h̃ ∈ L∞ (Rd ×Rd ) be compactly supported. Let c ∈ L1 (Rd ×Rd ). Then, OTCC = OT∗CC . Remark 3.3.2. In [35], the authors use an infinite dimensional linear programming duality with a quadratic penalization to get this result. In [34], the same authors prove the existence of dual maximizers for OT∗CC . Let X and Y be two compact subsets of Rd with unit volumes, f and g be probability densities on X and Y , respectively, and 0 ≤ h̃ ∈ L∞ (X × Y ). Instead of the dual functional (3.3.2), we consider the functional Z Z Z ′ J (u, v) := uf dx + vg dy − [−c + u + v]+ h̃ dxdy. (3.3.4) X Y X×Y and define ′ OT∗CC := sup J ′ (u, v). 
(3.3.5) u∈L1 (X,f dx),v∈L1 (Y,gdy) 17 ′ Note that OT∗CC = OT∗CC (see Appendix 1). The existence result of dual maximizers is given below: Theorem 3.3.3. ([34], Theorem 4.2) Let f, g and h̃ be continuous and strictly positive on their compact supports X, Y , and X × Y , respectively. Let c ∈ L1 (X × Y ). Fix an η > 1 and assume that Πh̃/η (f, g) is non-empty. Then, there exist functions (u, v) ∈ L1 (X, f dx) × L1 (Y, gdy), such that ′ OT∗CC = J ′ (u, v). The authors also provide a characterization of the optimizers of the primal and the dual problems as follows: Corollary 3.3.4. ([34], Corollary 1.1) Under the assumptions of Theorem 3.3.3, any h ∈ Πh̃ (f, g) is optimal if and only if there exist functions (u, v) ∈ L1 (X, f dx) × L1 (Y, gdy), such that  ≥ 0 where h = 0,        c−u−v = 0 where 0 < h < h̃, (3.3.6)      ≤ 0 where h = h̃.   3.4 Duality - Version II In [46], the authors consider the CCOT problem in a different setting. Let X, Y be two compact subsets of Rd and let c : X ×Y 7→ R∪{∞} be Borel measurable and bounded below. Let µ ∈ P(X), ν ∈ P(Y ) and γ̃ be a finite Borel measure on X × Y . Then, the CCOT is defined as the following minimization problem: (Z ) OTCC := inf c(x, y) dγ(x, y) , (3.4.1) γ∈Πγ̃ (µ,ν) X×Y where Πγ̃ (µ, ν) := {γ ∈ Π(µ, ν) : γ(A × B) ≤ γ̃(A × B), ∀(A, B) ∈ B(X) × B(Y )}. (3.4.2) 18 Its dual formulation is given by the following maximization problem: (Z Z Z ) OT∗CC := sup u(x) dµ(x) + v(y) dν(y) + w(x, y) dγ̃(x, y) , (3.4.3) B X Y X×Y where the supremum is taken over the set B of real-valued functions u, v, w satisfying u ∈ Cb (X), v ∈ Cb (Y ), w ∈ Cb (X ×Y ) and w ≤ 0 with u(x)+v(y)+w(x, y) ≤ c(x, y) everywhere. Now, we will present the duality theorem for this version. Theorem 3.4.1. ([46], Theorem 4.6.14) Let c : X × Y 7→ R ∪ {∞} be Borel measurable and bounded below. Then, the following statements are equivalent: (a) c is lower semi-continuous on X × Y . 
(b) The duality holds for all µ ∈ P(X), ν ∈ P(Y) and γ̃ ∈ M+(X × Y), i.e., OTCC = OT∗CC.

The proof is based on the abstract duality theorem (see [46], Theorem 4.6.1).

3.5 A further characterization on the optimizers

Even though the CCOT problem is as natural an idea as the classical OT problem, little is known about its optimizers compared to other variants of the OT problem. In this section, we will present some characterizations of the optimizers of the primal and the dual problems for the CCOT problem.

Let X, Y be compact subsets of Rd and let c be a non-negative, bounded, continuous function on X × Y. Let µ ∈ P(X), ν ∈ P(Y) be probability measures which are absolutely continuous w.r.t. Lebesgue measure with densities f ∈ L1(X) and g ∈ L1(Y), and let γ̃ be a compactly supported finite measure on X × Y that is absolutely continuous w.r.t. Lebesgue measure with a bounded density h̃ ∈ L∞(X × Y). We will redefine the primal and the dual problems as follows:

    Icap := inf_{γ ∈ Πγ̃(µ,ν)} ∫_{X×Y} c(x, y) dγ(x, y),    (3.5.1)

    I∗cap := sup_{(u,v,w) ∈ Lipc,γ̃} J(u, v, w),    (3.5.2)

where

    J(u, v, w) = ∫_X u(x) dµ(x) + ∫_Y v(y) dν(y) + ∫_{X×Y} w(x, y) dγ̃(x, y)

and

    Lipc,γ̃ := { (u, v, w) : u ∈ L1(X, dµ), v ∈ L1(Y, dν), w ∈ L1(X × Y, dγ̃),
                u(x) + v(y) + w(x, y) ≤ c(x, y), and w(x, y) ≤ 0 }.    (3.5.3)

Let γ ∈ Πγ̃(µ, ν) be a minimizer for Icap and (u, v, w) ∈ Lipc,γ̃ be a maximizer for I∗cap. Let E ⊆ spt(γ̃) be the compact support of γ. Then, by Theorem 3.2.5, we have that

    γ = γ̃ on E ⊂ X × Y,  and  γ = 0 elsewhere.    (3.5.4)

By duality (Theorem 3.3.1), we have that

    ∫_{X×Y} c dγ = ∫_X u dµ + ∫_Y v dν + ∫_{X×Y} w dγ̃
                 = ∫_X u dµ + ∫_Y v dν + ∫_{(X×Y)∩E} w dγ̃ + ∫_{(X×Y)∩E^c} w dγ̃.

By (3.5.4) and w ≤ 0 on (X × Y) ∩ E^c, we have that

    ∫_{X×Y} c dγ ≤ ∫_X u dµ + ∫_Y v dν + ∫_{X×Y} w dγ.    (3.5.5)

On the other hand, since (u, v, w) ∈ Lipc,γ̃, we have that

    u(x) + v(y) + w(x, y) ≤ c(x, y), ∀(x, y) ∈ X × Y.
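In a discrete setting, the primal problem (3.5.1) becomes a finite linear program: minimize ⟨c, γ⟩ over non-negative matrices with prescribed marginals and the entrywise capacity bound γ ≤ γ̃. The following is a minimal numerical sketch; the grids, the quadratic cost, and the capacity κ µ ⊗ ν are illustrative choices, not taken from the text.

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative discretization: n-point uniform grids on X = Y = [0, 1].
n = 4
x = np.linspace(0.0, 1.0, n)
y = np.linspace(0.0, 1.0, n)
f = np.full(n, 1.0 / n)              # marginal density of mu
g = np.full(n, 1.0 / n)              # marginal density of nu
c = (x[:, None] - y[None, :]) ** 2   # quadratic cost c(x, y) = |x - y|^2

kappa = 2.0                          # illustrative capacity gamma~ = kappa * mu (x) nu
cap = kappa * np.outer(f, g)

# Equality constraints: row sums of gamma equal f, column sums equal g.
A_eq = np.zeros((2 * n, n * n))
for i in range(n):
    A_eq[i, i * n:(i + 1) * n] = 1.0     # sum_j gamma_ij = f_i
    A_eq[n + i, i::n] = 1.0              # sum_i gamma_ij = g_j
b_eq = np.concatenate([f, g])

# The capacity constraint enters as an upper bound on each variable.
res = linprog(c.ravel(), A_eq=A_eq, b_eq=b_eq,
              bounds=list(zip(np.zeros(n * n), cap.ravel())))
gamma = res.x.reshape(n, n)
print(res.status, res.fun)
```

For capacities of the form κ µ ⊗ ν with moderate κ, the computed optimizer typically saturates the bounds, with most entries equal to 0 or to the capacity, in line with the geometric extremality of Theorem 3.2.5.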
By integrating both sides with respect to γ, we get Z Z Z Z u dµ + v dν + w dγ ≤ c dγ. (3.5.6) X Y X×Y X×Y 20 By combining (3.5.5) and (3.5.6), we get Z Z Z Z c dγ = u dµ + v dν + w dγ. (3.5.7) X×Y X Y X×Y Thus, by using the marginal conditions, we can write Z (c − u − v − w) dγ = 0. (3.5.8) X×Y Since we have the inequality c − u − v − w ≥ 0, we can conclude that c(x, y) − u(x) − v(y) − w(x, y) = 0 γ-a.e. (3.5.9) Now, let n ∈ N, σ be any permutation of {1, . . . , n} and pick an arbitrary collection of points (x1 , y1 ), . . . , (xn , yn ) ∈ spt(γ). Then, X n X n X n c(xi , yi ) − w(xi , yi ) = u(xi ) + v(yi ) i=1 i=1 i=1 X n = u(xi ) + v(yσ(i) ) i=1 X n ≤ c(xi , yσ(i) ) − w(xi , yσ(i) ). i=1 This shows that spt(γ) is (c − w)-cyclical monotone (see Definition 2.3.1). From here onwards, we will assume that the capacity γ̃ = κµ ⊗ ν for some κ ≥ 1. Let γ ∗ ∈ Πγ̃ (µ, ν) be a minimizer for Icap and (u, v, w) ∈ Lipc,γ̃ be a maximizer for I∗cap . Thus, by (3.5.7), we have Z Z Z Z ∗ c dγ = u dµ + v dν + w dγ ∗ . (3.5.10) X×Y X Y X×Y Again, consider the inequality u(x) + v(y) + w(x, y) ≤ c(x, y). By integrating both sides with respect to an arbitrary γ ∈ Π(µ, ν), we get Z Z Z Z u dµ + v dν + w dγ ≤ c dγ. X Y X×Y X×Y 21 i.e. Z Z Z u dµ + v dν ≤ (c − w) dγ. X Y X×Y Now, taking the infimum over γ ∈ Π(µ, ν) from both sides, we get Z Z Z u dµ + v dν ≤ inf (c − w) dγ. X Y γ∈Π(µ,ν) X×Y Observe that, since Πγ̃ (µ, ν) ⊆ Π(µ, ν), we have Z Z inf (c − w) dγ ≤ inf (c − w) dγ γ∈Π(µ,ν) X×Y γ∈Πγ̃ (µ,ν) X×Y Z ≤ (c − w) dγ ∗ (3.5.11) ZX×Y Z = u dµ + v dν, (by (3.5.10)). X Y Recall that the strong duality result holds for the classical OT problem with Borel mea- surable and µ ⊗ ν-a.e. finite costs (see [5], Theorem 2). Since c − w ≥ 0 is Borel measurable and µ ⊗ ν-a.e. finite, we have that Z Z Z inf (c − w) dγ = sup u dµ + v dν. 
(3.5.12) γ∈Π(µ,ν) X×Y u∈L1 (X),v∈L1 (Y ) X Y u+v≤c Thus, by (3.5.11) and (3.5.12), we can conclude that (u, v) maximizes the dual of the classical OT problem with cost (c − w). Finally, we will list the above characterization in the following proposition. Proposition 3.5.1. Let γ ∈ Πγ̃ (µ, ν) be a minimizer for Icap and (u, v, w) ∈ Lipc,γ̃ be a maximizer for I∗cap . Then, 1. c(x, y) − u(x) − v(y) − w(x, y) = 0 γ-a.e.. 2. spt(γ) is (c − w)-cyclical monotone. 3. If γ̃ = κµ ⊗ ν for some κ ≥ 1, then (u, v) maximizes the dual of the classical OT problem with cost (c − w). 22 CHAPTER 4 MULTI-MARGINAL OT PROBLEM 4.1 The Multi-Marginal OT (MMOT) Problem 4.1.1 Introduction As opposed to the classical OTP, which is a two-marginal problem, multi-marginal OTP (also known as the multi-dimensional Monge-Kantorovich Problem) studies transporting mass from a single source to multiple targets. More generally, given a p-tuple of probability measures (ν1 , . . . , νp ) in Rd , the problem consists of finding an optimal way of successively rearranging ν1 onto νi against a certain cost on Rpd where ν1 represents the mass distribution at the source and (νj )j̸=1 represent the mass distributions at the targets. MMOT is a versatile framework that can be applied in various fields, including image processing, computer vision, economics, machine learning, natural language processing, and medical imaging, among others. For example, MMOT is used in economics to model and solve problems involving the distribution of resources, such as the allocation of goods among multiple buyers and sellers [13]. Also, MMOT is used in machine learning tasks such as clustering, where it is used to group similar data points together based on their feature similarities. This was first discussed by Gangbo and Święch in [24], for a specific cost function. 
More characterizations of the optimal solutions and some applications have been discussed in [43, 44], and a few variants, such as the multi-marginal partial OTP [31], have been introduced later on.

4.1.2 The Primal Problem (MMOT Problem)

Let p be a positive integer. For a given p-tuple of probability measures (ν1, . . . , νp) in P(Rd) and a lower semi-continuous cost function c(x1, . . . , xp) : Rpd → R, we consider the minimization problem:

    OTMM(ν1, . . . , νp) := inf_{γ ∈ Π(ν1,...,νp)} ∫_{Rpd} c(x1, . . . , xp) dγ(x1, . . . , xp),    (4.1.1)

where

    Π(ν1, . . . , νp) := {γ ∈ P(Rpd) : (Projxi)#γ = νi, ∀1 ≤ i ≤ p}.    (4.1.2)

When p = 2, (4.1.1) becomes the classical OT problem.

Remark 4.1.1. In [24], the authors consider the cost

    c(x1, . . . , xp) = Σ_{j≠k} |xj − xk|²    (4.1.3)

and the minimization problem

    OT′MM(T) := inf_K Σ_{j≠k} ∫_{Rd} (|Tj(x) − Tk(x)|² / 2) dν1(x),    (4.1.4)

where K is the set of all p-tuples of maps T = (T1, . . . , Tp) such that Ti : Rd → Rd (i = 1, . . . , p) are Borel measurable and satisfy νi = Ti#ν1.

Theorem 4.1.2. Given νi ∈ P(Rd) for each i ∈ {1, . . . , p} and a lower semi-continuous cost c : Rpd → R, OTMM has a solution.

The proof of the existence of a minimizer for (4.1.1) is quite standard. As in the proof of existence of solutions for the classical OT problem: since the set Π(ν1, . . . , νp) is non-empty, convex, and compact with respect to the weak topology, and the functional γ ↦ ∫ c dγ is linear in γ, we can guarantee the existence of a minimizer for (4.1.1).

Remark 4.1.3. In [24], the authors have proven that if the measures ν1, . . . , νp vanish on (d − 1)-rectifiable sets and have finite second moments, then the MMOT problem with cost Σ_{j≠k} |xj − xk|² has a unique solution (see [24], Corollary 2.2).
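For intuition, (4.1.1) can be solved exactly on a small grid, since the discrete problem is again a linear program over the coupling tensor. Below is a minimal sketch for p = 3 uniform marginals on the line with the pairwise cost (4.1.3); the point set and problem sizes are illustrative choices.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

# Illustrative finite MMOT instance: p = 3 uniform marginals on n points.
n, p = 3, 3
pts = np.array([0.0, 0.5, 1.0])
nu = np.full(n, 1.0 / n)

# Gangbo-Swiech-type cost: sum over ordered pairs j != k of |x_j - x_k|^2.
cost = np.zeros((n,) * p)
for idx in itertools.product(range(n), repeat=p):
    xs = pts[list(idx)]
    cost[idx] = sum((xs[j] - xs[k]) ** 2
                    for j in range(p) for k in range(p) if j != k)

# Marginal constraints: the i-th marginal of gamma equals nu, for each axis.
N = n ** p
A_eq, b_eq = [], []
for axis in range(p):
    for v in range(n):
        row = np.zeros(cost.shape)
        row[tuple(v if a == axis else slice(None) for a in range(p))] = 1.0
        A_eq.append(row.ravel())
        b_eq.append(nu[v])

res = linprog(cost.ravel(), A_eq=np.array(A_eq), b_eq=np.array(b_eq),
              bounds=[(0, None)] * N)
gamma = res.x.reshape(cost.shape)
print(res.fun)  # the diagonal coupling x1 = x2 = x3 is feasible with cost 0
```

Here the diagonal coupling x1 = x2 = x3 is feasible and has zero cost, so the solver recovers the optimal value 0. For general marginals the same linear program applies, at the price of n^p unknowns, which is the computational obstacle specific to the multi-marginal setting.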
4.1.3 Duality

Similar to the dual formulation of the two-marginal OT problem, we define the dual problem of the MMOT problem as follows. Given probability measures (ν1, . . . , νp) in P(Rd), we consider the maximization problem

    OT∗MM(ν1, . . . , νp) := sup_A { Σ_{i=1}^p ∫_{Rd} ui(xi) dνi(xi) }.    (4.1.5)

Here, A is the set of all p-tuples of functions (u1, . . . , up) such that each ui ∈ L1(Rd, dνi) is upper semi-continuous, and Σ_{i=1}^p ui(xi) ≤ c(x1, . . . , xp), ∀(x1, . . . , xp) ∈ Rpd.

The duality results between (4.1.1) and (4.1.5) have been discussed in the literature. However, in [24], the authors provide duality results with extensive characterization. Therefore, we will present the duality results given in [24] for the cost given by (4.1.3).

Theorem 4.1.4 ([24], Theorem 2.1). Assume that ν1, . . . , νp are non-negative Borel probability measures vanishing on (d − 1)-rectifiable sets and having finite second moments. Set Xi := spt(νi) for i = 1, . . . , p. Then:

(i) OT∗MM admits a maximizer u = (u1, . . . , up) ∈ A.

(ii) There is a minimizer S = (S1, . . . , Sp) for (4.1.4) satisfying S1(x) = x (x ∈ Rd). The Si are one-to-one νi-a.e., are uniquely determined, and have the form Si(x) = ∇fi∗(∇f1(x)) (x ∈ Rd), where

    fi(x) = |x|²/2 + ϕi(x),

the ϕi are convex functions, and fi∗ ∈ C1(Rd), where fi∗ denotes the Legendre transform of fi (see [47], Section 26).

(iii) Duality holds: the optimal values in (4.1.4) and (4.1.5) coincide.

(iv) If ū = (ū1, . . . , ūp) ∈ A is another maximizer for (4.1.5), we can modify the ūi's on sets of zero νi measure to obtain a maximizer, still denoted ū, such that ūi is differentiable νi-a.e. Furthermore, ∇ui = ∇ūi νi-a.e.

4.2 Capacity Constrained Multi-Marginal OT Problem

The notion of the capacity constrained multi-marginal OT problem has already been introduced in [18].
The Capacity Constrained Multi-Marginal Optimal Transport (CCM- MOT) problem introduces capacities that limit the amount transported between the source and the targets. 4.2.1 The Primal Problem Suppose we are given a positive integer p, and p probability measures ν1 , . . . , νp ∈ P(Rd ). Let γ̃ be a compactly supported finite measure on Rpd such that γ̃ ≪ ν1 ⊗ . . . ⊗ νp . Define Πγ̃ (ν1 , . . . , νp ) := {γ ∈ Π(ν1 , . . . , νp ) : γ(A1 × . . . × Ap ) ≤ γ̃(A1 × . . . × Ap ), ∀Ai ∈ B(Rd ), ∀i}. (4.2.1) We assume that the set Πγ̃ (ν1 , . . . , νp ) is non-empty. Then, the CCMMOT problem is to minimize the cost: Z OTCCMM := inf c(x1 , . . . , xp ) dγ(x1 , . . . , xp ). (4.2.2) γ∈Πγ̃ (ν1 ,...,νp ) Rpd In [18], it has been shown that the CCMMOT problem has a solution and some char- acterization of the optimal solution is given. We will present some of the results from [18] below. Theorem 4.2.1. ([18], Theorem 3.1) If c ∈ L∞ (Rpd ), then OTCCMM has a solution. Next, we will provide a characterization of the optimal solutions of (4.2.2). Definition 4.2.2. A measure γ ∈ Πγ̃ (ν1 , . . . , νp ) is called an extreme point of the convex set Πγ̃ (ν1 , . . . , νp ) if γ is not the midpoint of a non-trivial line segment in Πγ̃ (ν1 , . . . , νp ). Theorem 4.2.3. ([18], Theorem 4.1) Suppose ν1 , . . . , νp are non-atomic Borel probability measures on Rd . Then, γ ∈ Πγ̃ (ν1 , . . . , νp ) is an extreme point of the set Πγ̃ (ν1 , . . . , νp ) if and only if γ = 1W γ̃ for a γ-measurable set W ⊂ Rpd . 26 To get the uniqueness of optimal solutions, we make the following two assumptions on the cost function. 1. Assume that the function c ∈ L∞ (Rpd ) is such that the mixed partial derivative of order p ∂ pc ∂i1 ,...,ip c = i ∂xi11 , . . . , xpp i exists for each set of variables (xi11 , . . . , xpp ), where 1 ≤ ik ≤ d for any k ≤ p. 2. 
Assume that for some fixed number N ∈ N ∪ {∞} there exist at most countably many disjoint open sets {Gk }N k=1 , Gk ⊆ R , such that the following conditions are satisfied: d (C1) each set Gk has positive Lebesgue measure; (C2) the union of all sets in {Gk }N k=1 has full Lebesgue measure; k (C3) for every k ≤ N there exist a set of variables (xk11 , . . . , xpp ) such that the functions ∂k1 ,...,kp c is either strictly positive or strictly negative on Gk . Theorem 4.2.4. ([18], Theorem 6.1) Suppose we are given a cost function c on Rd satisfying the conditions (C1), (C2), and (C3). Then, any γ ∈ P(Rpd ) that is an optimal plan is an extreme point of the set Πγ̃ (ν1 , . . . , νp ). Corollary 4.2.5. ([18], Corollary 6.1.1) If c on Rd satisfies the conditions (C1), (C2), and (C3), then the optimal plan is unique. 4.2.2 Duality To the best of our knowledge, a dual formulation for the CCMMOT problem does not exist in the literature. Therefore, in this section we present a dual formulation for the CCMMOT problem and prove the strong duality result and the existence of dual maximizers. We will generalize the techniques used in [34, 35] for the two-marginal capacity con- strained OT problem to get our results. 27 Let νi be a probability measure in P(Rd ) which is absolutely continuous w.r.t. Lebesgue measure with density fi ∈ L1 (Rd ) for each 1 ≤ i ≤ p and γ̃ be a compactly supported finite measure that is absolutely continuous w.r.t. Lebesgue measure with a bounded density h̃. We denote by Πh̃ (f1 , . . . , fp ) the set of all non-negative measurable joint densities on Rpd that are bounded by h̃. i.e. for each 1 ≤ i ≤ p, fi (xi ) = h(x1 , . . . , xp ) dx1 . . . dxi−1 dxi+1 . . . dxp , R and 0 ≤ h ≤ h̃. As the dual formulation of the CCMMOT problem, we consider the following maximization problem: OT∗CCMM := sup J(u1 , . . . , up , w) (4.2.3) (u1 ,...,up ,w)∈Liph̃ c where p Z Z X J(u1 , . . . , up , w) := ui (xi )fi (xi ) dxi + w(x1 , . . . , xp )h̃(x1 , . . . , xp ) dx1 . 
. . dxp i=1 Rd Rpd (4.2.4) and ( Liph̃c := (u1 , . . . , up , w) : ui ∈ L1 (Rd ), w ∈ L1 (Rpd ), p ) (4.2.5) X ui (xi ) + w(x1 , . . . , xp ) ≤ c(x1 , . . . , xp ), and w(x1 , . . . , xp ) ≤ 0 . i=1 First, we will prove that the strong duality holds for the CCMMOT problem. Theorem 4.2.6. Let νi be a probability measure in P(Rd ) which is absolutely continuous w.r.t. Lebesgue measure with density fi ∈ L1 (Rd ) for each 1 ≤ i ≤ p, γ̃ be a compactly supported finite measure that is absolutely continuous w.r.t. Lebesgue measure on Rpd with a bounded density h̃, c ∈ L1 (Rpd ) and assume Πh̃ (f1 , . . . , fp ) ̸= ∅. Then, OTCCMM = OT∗CCMM . Here, we will redefine OTCCMM as Z OTCCMM = inf Ic (h) := ch dx1 . . . dxp . (4.2.6) h∈Πh̃ (f1 ,...,fp ) Rpd 28 Proof. The proof of Theorem 4.2.6 is a consequence of the two propositions that we will be proving below. Proposition 4.2.7. Under the hypotheses of Theorem 4.2.6, we have OTCCMM ≥ OT∗CCMM (4.2.7) Proposition 4.2.8. Under the hypotheses of Theorem 4.2.6, there exists a sequence {(uε1 , . . . , uεp , wε )}ε↓0 in Liph̃c such that Ic (h0 ) = lim J(uε1 , . . . , uεp , wε ), (4.2.8) ε↓0 where h0 is a minimizer of (4.2.6) of the form h0 = 1W h̃, for a Lebesgue measurable set W ⊂ Rpd . Proof of Proposition 4.2.7: Let h ∈ Πh̃ (f1 , . . . , fp ) with Ic (h) finite and let (u1 , . . . , up , w) ∈ Liph̃c . Then we have Z Ic (h) = ch dx1 . . . dxp Rpd p Z Z Z p X X = ui fi dxi + wh̃ dx1 . . . dxp + (c − ui − w)h dx1 . . . dxp i=1 Rd Rpd Rpd i=1 Z + w(h − h̃) dx1 . . . dxp Rpd ≥ J(u1 , . . . , up , w). Note that, in the second line, we use the marginal conditions on h and in the last line, we used the properties of the set Liph̃c along with the fact that h ≤ h̃. By taking the infimum over h ∈ Πh̃ (f1 , . . . , fp ) and supremum over (u1 , . . . , up , w) ∈ Liph̃c from both sides, we get the inequality (4.2.7). Proof of Proposition 4.2.8: Similar to [35], we introduce a relaxed version of the MMOT problem with capacity constraints. 
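The mechanism of the relaxation can be previewed numerically: the hard marginal constraints are replaced by a quadratic penalty of strength 1/(2ε), while the capacity bound 0 ≤ h ≤ h̃ is kept as a box constraint, so projected gradient descent applies directly. Below is a minimal sketch for two marginals for readability; ε, the grid, the cost, and the capacity are all illustrative choices, not taken from the text.

```python
import numpy as np

# Discrete penalized problem: minimize <c, h> + (1/2eps)(||h1 - f||^2 + ||h^T 1 - g||^2)
# over the box 0 <= h <= h~, via projected gradient descent.
n = 5
x = np.linspace(0.0, 1.0, n)
c = (x[:, None] - x[None, :]) ** 2       # illustrative quadratic cost
f = np.full(n, 1.0 / n)
g = np.full(n, 1.0 / n)
h_cap = np.full((n, n), 2.0 / n ** 2)    # illustrative capacity density h~
eps = 1e-3

h = np.minimum(h_cap, np.outer(f, g))    # feasible starting point
step = eps / (2 * n)                     # step <= 1/L for the penalty's Lipschitz gradient
for _ in range(20000):
    # Gradient: cost plus the penalty pushing each marginal back toward f and g.
    grad = c + ((h.sum(1) - f)[:, None] + (h.sum(0) - g)[None, :]) / eps
    h = np.clip(h - step * grad, 0.0, h_cap)   # projection onto the box [0, h~]

print(np.abs(h.sum(1) - f).max(), np.abs(h.sum(0) - g).max())
```

As ε ↓ 0 the penalty forces the marginals of the minimizer back to f and g; this is the numerical counterpart of the convergence statement proved in Lemma 4.2.13 below.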
Let ε > 0 be a small number. Define Z p 1 X Icε (h) := ch dx1 . . . dxp + || ⟨h⟩xi − fi ||22 (4.2.9) Rpd 2ε i=1 29 where ⟨g⟩xi denotes the ith marginal of g, for a given function g = g(x1 , . . . , xp ) defined on Rpd , i.e. for each 1 ≤ i ≤ p, ⟨g⟩xi = g(x1 , . . . , xp ) dx1 . . . dxi−1 dxi+1 . . . dxp . R For (u1 , . . . , up , w) ∈ Liph̃c such that each ui ∈ L2 (Rd ), we define p Z Z p X εX ε J (u1 , . . . , up , w) := ui fi dxi + wh̃ dx1 . . . dxp − ||ui ||22 . (4.2.10) i=1 Rd Rpd 2 i=1 Note that, if ui ∈ / L2 (Rd ) for some i ∈ {1, . . . , p}, we can extend J ε to a functional on the entire set Liph̃c by setting J ε (u1 , . . . , up , w) := −∞. First, we will prove that the statement of Proposition 4.2.7 holds for the relaxed version. Lemma 4.2.9. Let ε > 0 be given. Under the hypotheses of Theorem 4.2.6, we have inf Icε (h) ≥ sup J ε (u1 , . . . , up , w). (4.2.11) 0≤h≤h̃ (u1 ,...,up ,w)∈Liph̃ c Proof. Let 0 ≤ h ≤ h̃ and (u1 , . . . , up , w) ∈ Liph̃c be such that Icε (h) and J ε (u1 , . . . , up , w) are both finite. Then, observe that, Z p 1 X Icε (h) = ch dx1 . . . dxp + || ⟨h⟩xi − fi ||22 Rpd 2ε i=1 p Z Z p X εX = ui fi dxi + wh̃ dx1 . . . dxp − ||ui ||22 i=1 R d Rpd 2 i=1 Z p Z X + (c − ui − w)h dx1 . . . dxp + w(h − h̃) dx1 . . . dxp Rpd i=1 Rpd p 1 X + || ⟨h⟩xi − fi + εui ||22 2ε i=1 ≥ J ε (u1 , . . . , up , w). The last line holds true by the definition of the set Liph̃c , the fact that 0 ≤ h ≤ h̃ and the non-negativity of the L2 norms. Finally, by taking the infimum over h ∈ Πh̃ (f1 , . . . , fp ) and supremum over (u1 , . . . , up , w) ∈ Liph̃c from both sides, we get the inequality (4.2.11). Next, we will prove that the relaxed problem (4.2.9) has a minimizer. 30 Lemma 4.2.10. Let ε > 0 be given. Under the hypotheses of Theorem 4.2.6, there exists a minimizer hε of Icε , and hε can be chosen of the form hε = 1Wε h̃ for some Lebesgue measurable set Wε ⊆ Rpd . 
Furthermore, if ĥε is another minimizer of Icε , then for each D E 1 ≤ i ≤ p, ⟨hε ⟩xi = ĥε . xi Proof. Let h0 be a minimizer of (4.2.6). Since it is admissible for Icε , and h0 has marginals f1 , . . . , fp , we have that Icε (h0 ) = Ic (h0 ). Thus, we have that, −||h̃||∞ ||c||L1 (W̃ ) ≤ inf h Icε (h) ≤ Icε (h0 ) = Ic (h0 ) < ∞, where W f is the support of h̃. So, we pick a minimizing sequence {hn }n∈N of Icε . Since we have 0 ≤ hn ≤ h̃, for each n, we can find a subsequence (without relabeling) {hn }n∈N that converges weak-* in (L1 (Rpd ))∗ = L∞ (Rpd ) (see [48], Chapter 19) to some L∞ function hε that satisfies 0 ≤ hε ≤ h̃. Also note that for each 1 ≤ i ≤ p, Z ⟨hn ⟩xi − fi = (hn − h0 ) dx1 . . . dxi−1 dxi+1 . . . dxp . R(p−1)d Hence, Z Z Z !2 | ⟨hn ⟩xi − fi |2 dxi ≤ |hn − h0 | dx1 . . . dxi−1 dxi+1 . . . dxp dxi . Rd Rd R(p−1)d Since hn , h0 ≤ h̃ ∈ L∞ (Rpd ) with compact support, the sequence {⟨hn ⟩xi − fi }n∈N is bounded in L2 . Hence, we can find a further subsequence (without relabeling) {hn }n∈N such that the sequences {⟨hn ⟩xi − fi }n∈N weakly converge to ⟨hε ⟩xi − fi in L2 for each i.  Now, fix an i ∈ {1, . . . , p} and let ξ = ξ(xi ) be an arbitrary smooth and compactly supported test function. Then, Z Z    [ ⟨hn ⟩xi − fi − ⟨hε ⟩xi − fi ]ξ dxi = ⟨hn ⟩xi − ⟨hε ⟩xi ξ dxi Rd d ZR = (hn − hε )ξ dx1 . . . dxp . Rpd Since hn converges to hε weak-*, we get that ⟨hn ⟩xi − fi converges to ⟨hε ⟩xi − fi . Now, by the lower semi-continuity of the L2 norm with respect to weak L2 convergence, for each i, we have that || ⟨hε ⟩xi − fi ||L2 (Rd ) ≤ lim inf || ⟨hn ⟩xi − fi ||L2 (Rd ) . n→∞ 31 Also, since c ∈ L1 (Rpd ), and hε , hn are supported in W f , by the weak-* convergence on L∞ (W f ), we get Z Z chε dx1 . . . dxp = lim chn dx1 . . . dxp . Rpd n→∞ Rpd Thus, we have that inf Icε (h) = lim inf Icε (hn ) h n→∞ p (Z ) 1 X = lim inf chn dx1 . . . dxp + || ⟨hn ⟩xi − fi ||22 n→∞ Rpd 2ε i=1 Z p 1 X ≥ lim inf chn dx1 . . . 
dxp + lim inf || ⟨hn ⟩xi − fi ||22 n→∞ Rpd 2ε i=1 n→∞ Z p 1 X ≥ chε dx1 . . . dxp + || ⟨hε ⟩xi − fi ||22 R pd 2ε i=1 = Icε (hε ). This concludes that hε is a minimizer for Icε . Since the relaxed problem is strictly convex (see Appendix 2), it is clear that hε has unique marginals. Also, since hε is a minimizer for Ic in the class Πh̃ (⟨hε ⟩x1 , . . . , ⟨hε ⟩xp ), we can choose hε such that hε = 1Wε h̃ for some Lebesgue measurable set Wε ⊂ W f (see Theorem 4.2.3 and Theorem 4.2.4). For the rest of the proof, we define the following functions. 1 uεi := − (⟨hε ⟩xi − fi ) ∀1 ≤ i ≤ p, (4.2.12) ε and p ( ) X wε := min c − uεi , 0 . (4.2.13) i=1 Note that, by the definition of wε , we get that wε ≤ 0 and uεi + wε ≤ c. Thus, Pp i=1 (uε1 , . . . , uεp , wε ) ∈ Liph̃c defined in (4.2.5). Also note that for each i, uεi are determined inde- pendently of the choice of hε in Πh̃ (f1 , . . . , fp ). We will claim that (uε1 , . . . , uεp , wε ) maximizes J ε in Liph̃c . 32 Lemma 4.2.11. Let hε = 1Wε h̃ be a minimizer for Icε . Then   ≤ 0 a.e. in Wε , p   X c− uiε (4.2.14)  ≥ 0 a.e. in W  i=1  f \ Wε . Proof. Let ξ ≥ 0 be an arbitrary smooth test function. Define, for any σ ∈ R hσε := hε + σξ(h̃ − hε ). Since hε = 1Wε h̃, we have  a.e. in Wε ,  hε  hσε = (4.2.15) σξ h̃ a.e. in W   f \ Wε . Observe that, h0ε = hε and for 0 ≤ σ ≤ ||ξ||−1 ∞ , 0 ≤ hε ≤ h̃. Since hε minimizes Ic , we σ ε have Icε (hε ) = Icε (h0ε ) ≤ Icε (hσε ). Now observe that, Z p 1 X Icε (hσε ) = chσε dx1 . . . dxp + || ⟨hσε ⟩xi − fi ||22 Rpd 2ε i=1 Z p 1 X D E = c(hε + σξ(h̃ − hε )) dx1 . . . dxp + || (hε + σξ(h̃ − hε )) − fi ||22 Rpd 2ε i=1 x i Z Z p 1 X = chε dx1 . . . dxp + cσξ(h̃ − hε ) dx1 . . . dxp + || ⟨hε ⟩xi − fi ||22 Rpd Rpd 2ε i=1 p Z E 2 1 X D + σξ (h̃ − hε ) dxi 2ε i=1 Rd xi p Z  D E  1X  + σξ (h̃ − hε ) · ⟨hε ⟩xi − fi dxi . ε i=1 Rd xi Then, the right derivative of Icε at σ = 0 is given by Icε (σ) − Icε (0) ∂+ Icε (0) = lim+ σ→0 (Z σ Z 1 = lim+ chε dx1 . . . dxp + cσξ(h̃ − hε ) dx1 . . 
. dxp σ→0 σ Rpd Rpd p p Z E 2 1 X 2 1 X D + || ⟨hε ⟩xi − fi ||2 + σξ (h̃ − hε ) dxi 2ε i=1 2ε i=1 Rd xi 33 p Z  D E  1X  + σξ (h̃ − hε ) · ⟨hε ⟩xi − fi dxi ε i=1 Rd xi p Z ) 1 X − chε dx1 . . . dxp − || ⟨hε ⟩xi − fi ||22 Rpd 2ε i=1 p Z (Z σ X D E 2 = lim+ cξ(h̃ − hε ) dx1 . . . dxp + ξ (h̃ − hε ) dxi σ→0 Rpd 2ε i=1 Rd xi p Z  D ) 1X E   + ξ (h̃ − hε ) · ⟨hε ⟩xi − fi dxi ε i=1 Rd xi Z p Z  D E  1X  = cξ(h̃ − hε ) dx1 . . . dxp + ξ (h̃ − hε ) · ⟨hε ⟩xi − fi dxi Rpd ε i=1 Rd xi Z p Z 1X  = cξ(h̃ − hε ) dx1 . . . dxp + ⟨hε ⟩xi − fi ξ(h̃ − hε ) dx1 . . . dxp Rpd ε i=1 Rpd p Z ! X = c− uεi ξ(h̃ − hε ) dx1 . . . dxp . Rpd i=1 Since h0ε = hε minimizes Icε , we have p Z ! X 0 ≤ ∂+ Icε (0) = c− uεi ξ(h̃ − hε ) dx1 . . . dxp . Rpd i=1 As this holds for any arbitrary test function ξ ≥ 0, we have that (c − Pp i=1 uεi ) (h̃−hε ) ≥ 0 a.e.. Since h̃ − hε ≥ 0 a.e. and h̃ − hε > 0 a.e in W f \ Wε , we get the second part of the inequality in (4.2.14). With a similar argument for hσε := hε − σξhε , we can prove the first inequality in (4.2.14). Next, we will prove a duality result for the relaxed version. Lemma 4.2.12. Let hε = 1Wε h̃ be a minimizer for Icε and uεi , wε be defined by (4.2.12) and (4.2.13). Then Icε (hε ) = J ε (uε1 , . . . , uεp , wε ). (4.2.16) In particular, (uε1 , . . . , uεp , wε ) is a maximizer for J ε (u1 , . . . , up , w) in Liph̃c . 34 Proof. Observe that, p Z Z p X ε εX ε 2 J ε (uε1 , . . . , uεp , wε ) = uεi fi dxi + w h̃ dx1 . . . dxp − ||u || Rd Rpd 2 i=1 i 2 Zi=1 Z Z = chε dx1 . . . dxp − chε dx1 . . . dxp + wε hε dx1 . . . dxp Rpd Rpd Rpd p p Z Z XZ X − ε w hε dx1 . . . dxp + uεi ⟨hε ⟩xi dxi − uεi ⟨hε ⟩xi dxi Rpd i=1 Rd i=1 Rd p p XZ Z εX ε 2 + uεi fi dxi + wε h̃ dx1 . . . dxp − ||u || i=1 Rd Rpd 2 i=1 i 2 p Z Z ! X ε = chε dx1 . . . dxp − c− uεi −w hε dx1 . . . dxp Rpd Rpd i=1 Z p XZ ε + w (h̃ − hε ) dx1 . . . dxp + uεi (fi − ⟨hε ⟩xi ) dxi Rpd i=1 Rd p εX ε 2 − ||u || 2 i=1 i 2 Z p 1 X = chε dx1 . . . 
dxp + || ⟨hε ⟩xi − fi ||22 R pd 2ε i=1 Z + wε (h̃ − hε ) dx1 . . . dxp R pd p Z ! X − c− uεi − wε hε dx1 . . . dxp Rpd i=1 Z = Icε (hε ) + wε (h̃ − hε ) dx1 . . . dxp Rpd p Z ! X − c− uεi − wε hε dx1 . . . dxp . Rpd i=1 (4.2.17) From (4.2.14) and (4.2.13), we observe that, on Wε , hε = h̃, hence wε (h̃ − hε ) = 0 and since c − pi=1 uεi ≤ 0, wε = c − pi=1 uεi , we obtain c − pi=1 uεi − wε = 0. On the other hand, P P P on W f \ Wε , hε = 0, hence (c − Pp uεi − wε )hε = 0, and since c − Pp uεi ≥ 0, wε = 0, we i=1 i=1 obtain wε (h̃ − hε ) = 0. Thus, (4.2.17) becomes J ε (uε1 , . . . , uεp , wε ) ≥ Icε (hε ). 35 By (4.2.11) and the fact that (uε1 , . . . , uεp , wε ) ∈ Liph̃c , we can conclude that (uε1 , . . . , uεp , wε ) is a maximizer for J ε (uε1 , . . . , uεp , wε ) in Liph̃c . Now, we will prove that the minimizer of the relaxed problem (4.2.9) approximates the original problem (4.2.6). Lemma 4.2.13. Let {hε }ε↓0 be a sequence of minimizers for Icε . Then, it is precompact in the L∞ -weak-* topology and every limit point h0 is a minimizer of Ic . Furthermore, lim Ic (hε ) = Ic (h0 ), (4.2.18) ε↓0 lim ε||uεi ||22 = 0, ∀1 ≤ i ≤ p. (4.2.19) ε↓0 Proof. Since we have 0 ≤ hε ≤ h̃, there exists a subsequence {hεn }n∈N that converges weakly- * in L∞ (W f ) to some function 0 ≤ h̄ ≤ h̃. Let h̄0 be a minimizer of Ic . Note that, h̄0 is admissible for the relaxed problem and since hεn is a minimizer for the relaxed problem, we have Icεn (hεn ) ≤ Icεn (h̄0 ) = Ic (h̄0 ). (4.2.20) By the weak-* convergence of {hεn } in L∞ (W f ), we have that Z Z ch̄ dx1 . . . dxp = lim chεn dx1 . . . dxp . Rpd εn ↓0 Rpd i.e. Ic (h̄) = lim Ic (hεn ). (4.2.21) εn ↓0 Since Ic (hεn ) ≤ Icεn (hεn ), we have lim inf Ic (hεn ) ≤ lim inf Icεn (hεn ). (4.2.22) εn ↓0 εn ↓0 By taking the lim inf in (4.2.20), we have lim inf Icεn (hεn ) ≤ Ic (h̄0 ). (4.2.23) εn ↓0 36 Thus, by combining (4.2.21), (4.2.22), and (4.2.23) Ic (h̄) ≤ Ic (h̄0 ). (4.2.24) Now, we will claim that h̄ has the marginals f1 , . . . 
, fp . From (4.2.20), we have Z p Z 1 X 2 chεn dx1 . . . dxp + || ⟨hεn ⟩xi − fi ||2 ≤ ch̄0 dx1 . . . dxp . Rpd 2εn i=1 Rpd So, p Z X || ⟨hεn ⟩xi − fi ||22 ≤ 2εn c(h̄0 − hεn ) dx1 . . . dxp . i=1 Rpd Note that, since h̄0 , hεn ∈ L∞ (Rpd ) with 0 ≤ hεn ≤ h̃, ∀εn , and c ∈ L1 (Rpd ), we have c(h̄0 − hεn ) dx1 . . . dxp ≤ 2||h̃||L∞ (Rpd ) ||c||L1 (Rpd ) . R Rpd Hence, supn Rpd c(h̄0 − hεn ) dx1 . . . dxp < ∞. R Therefore, as εn ↓ 0, for each i, ⟨hεn ⟩xi → fi in L2 . (4.2.25) Now, fix an i ∈ {1, . . . , p} and let ξ = ξ(xi ) be an arbitrary smooth and compactly supported test function. Then, Z   Z Z    h̄ xi − fi ξ dxi = ⟨hεn ⟩xi − fi ξ dxi + h̄ xi − ⟨hεn ⟩xi ξ dxi Rd Rd Rd Z Z  = ⟨hεn ⟩xi − fi ξ dxi + (h̄ − hεn )ξ dx1 . . . dxp d Rpd ZR Z  ≤ | ⟨hεn ⟩xi − fi ξ| dxi + (h̄ − hεn )ξ dx1 . . . dxp Rd Rpd Z ≤ || ⟨hεn ⟩xi − fi ||L2 (Rd ) ||ξ||L2 (Rd ) + (h̄ − hεn )ξ dx1 . . . dxp . Rpd In the last line, the first integral on the right converges to zero by (4.2.25) and the second integral converges to zero by the weak-* convergence of hεn . Since ξ is arbitrary, we get that for each i, h̄ xi = fi . Thus, we have h̄ ∈ Πh̃ (f1 , . . . , fp ). Hence, by (4.2.24), we have that Ic (h̄) = Ic (h̄0 ) = min Ic (h). (4.2.26) h∈Πh̃ (f1 ,...,fp ) 37 Thus, we have min Ic (h) = Ic (h̄) = lim inf Ic (hεn ) ≤ lim inf Icεn (hεn ) ≤ Ic (h̄0 ) = min Ic (h). h∈Πh̃ (f1 ,...,fp ) εn ↓0 εn ↓0 h∈Πh̃ (f1 ,...,fp ) So, we can find a subsequence {hεn }n∈N (not relabeled), such that min Ic (h) = lim Ic (hεn ) = lim Icεn (hεn ). (4.2.27) h∈Πh̃ (f1 ,...,fp ) εn ↓0 εn ↓0 Therefore, p (Z ) 1 X Ic (h̄) = lim chεn dx1 . . . dxp + || ⟨hεn ⟩xi − fi ||22 εn ↓0 Rpd 2ε n i=1 p (Z ) ( ) 1 X = lim chεn dx1 . . . dxp + lim || ⟨hεn ⟩xi − fi ||22 εn ↓0 Rpd ε n ↓0 2ε n ( p ) i=1 1 X = Ic (h̄) + lim || ⟨hεn ⟩xi − fi ||22 . εn ↓0 2εn i=1 This gives us that p ( ) 1 X lim || ⟨hεn ⟩xi − fi ||22 = 0. εn ↓0 εn i=1 Using the definition of uεi n in (4.2.12), we get p X lim εn ||uεi n ||22 = 0. 
εn ↓0 i=1 Hence, for each 1 ≤ i ≤ p, lim εn ||uεi n ||22 = 0. εn ↓0 This completes the proof of Lemma 4.2.13. Finally, rewriting (4.2.16) using J(uε1n , . . . , uεpn , wεn ) and Ic (hεn ), we get that p X J(uε1n , . . . , uεpn , wεn ) = Ic (hεn ) + εn ||uεi n ||22 . i=1 Letting εn ↓ 0, we get lim J(uε1n , . . . , uεpn , wεn ) = lim Ic (hεn ) = Ic (h0 ). εn ↓0 εn ↓0 This completes the proof of Proposition 4.2.8, hence the proof of Theorem 4.2.6. 38 Remark 4.2.14. Note that the assumption of h̃ being compactly supported, automatically gives us that the densities fi ’s are compactly supported. Next, we will prove the existence of dual maximizers for the CCMMOT problem. Recall that the CCMMOT problem is given by Z OTCCMM := inf c(x1 , . . . , xp ) dγ(x1 , . . . , xp ) γ∈Πγ̃ (ν1 ,...,νp ) Rpd and the dual problem is given by OT∗CCMM := sup J(u1 , . . . , up , w), (u1 ,...,up ,w)∈Liph̃ c where p Z Z X J(u1 , . . . , up , w) := ui (xi )fi (xi ) dxi + w(x1 , . . . , xp )h̃(x1 , . . . , xp ) dx1 . . . dxp . i=1 Rd Rpd Let h̃ ∈ L∞ (X1 × . . . × Xp ), and the probability densities fi ∈ L1 (Xi ), for each 1 ≤ i ≤ p are compactly supported. We will assume that the sets Xi = spt(fi ) have unit Lebesgue measure. In order to get the existence result, we will consider the following dual formulation: ′ OT∗CCMM := sup J ′ (u1 , . . . , up ), (4.2.28) ui ∈L1 (Xi ),∀1≤i≤p where p Z p Z " # X X ′ J (u1 , . . . , up ) := ui fi dxi − −c + ui h̃ dx1 . . . dxp . (4.2.29) i=1 Xi X1 ×...×Xp i=1 + Note that, (see Appendix 1) ′ OT∗CCMM = OT∗CCMM . Now, we will show that (4.2.28) has a solution. Theorem 4.2.15. Let fi and h̃ be continuous and strictly positive on their compact supports Xi ⊆ Rd and X1 × . . . × Xp respectively. Let c ∈ L1 (X1 × . . . × Xp ). Fix an η > 1 and 39 assume that Πh̃/η (f1 , . . . , fp ) is non-empty. Then, there exist functions (u1 , . . . , up ) where ui ∈ L1 (Xi ) ∀i, such that ′ OT∗CCMM = J ′ (u1 , . . . , up ). 
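The passage from J to the reduced functional J′ rests on the pointwise identity min{c − Σ ui, 0} = −[−c + Σ ui]+: for fixed (u1, . . . , up), the best admissible choice of w is w = min{c − Σ ui, 0}, and substituting it into J recovers J′. A quick discrete sanity check of this identity (p = 2; all data randomly generated for illustration):

```python
import numpy as np

# Check: with w := min(c - u1 - u2, 0), the three-potential functional J
# equals the reduced functional J', since w = -[-c + u1 + u2]_+.
rng = np.random.default_rng(1)
n = 6
f1 = np.full(n, 1.0 / n)
f2 = np.full(n, 1.0 / n)
h_cap = rng.uniform(0.5, 1.5, (n, n))    # illustrative capacity density h~
c = rng.uniform(0.0, 1.0, (n, n))        # illustrative cost
u1, u2 = rng.normal(size=n), rng.normal(size=n)

s = u1[:, None] + u2[None, :]
w = np.minimum(c - s, 0.0)               # feasible: w <= 0 and u1 + u2 + w <= c

J = u1 @ f1 + u2 @ f2 + (w * h_cap).sum()
J_prime = u1 @ f1 + u2 @ f2 - (np.maximum(-c + s, 0.0) * h_cap).sum()
print(np.isclose(J, J_prime))  # True
```

The same substitution is what allows the existence question for dual maximizers to be posed in terms of (u1, . . . , up) alone.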
As in [34], our main goal will be to get a coercivity estimate of maximizing sequences which will guarantee L1 -boundedness. For each 1 ≤ i ≤ p, let Z ui fi := ui (xi )fi (xi ) dxi . Rd Note that for a p-tuple of constants (k1 , . . . , kp ) with ki = 0, we have Pp i=1 J ′ (u1 , . . . , up ) = J ′ (u1 + k1 , . . . , up + kp ). Hence, we can find a p-tuple of constants (k1 , . . . , kp ) with ki = 0, so that Pp i=1 (ui + ki )fi = (uj + kj )fj , ∀i ̸= j. First, we will find a bound on the means ui fi . Lemma 4.2.16. Fix ui ∈ L1 (Xi ), c ∈ L1 (X1 × . . . × Xp ) and a probability density h ∈ L∞ (X1 × . . . × Xp ) with marginals (f1 , . . . , fp ). Suppose there is some η > 1 such that h ≤ h̃η . Then, p X ||ηhc||L1 (X1 ×...×Xp ) − J ′ (u1 , . . . , up ) ′ J (u1 , . . . , up ) ≤ ui fi ≤ . (4.2.30) i=1 η−1 Proof. From definition (4.2.29), we have p Z p Z " # X X ′ J (u1 , . . . , up ) = ui fi dxi − −c + ui h̃ dx1 . . . dxp . i=1 Xi X1 ×...×Xp i=1 + By the non-positivity of the second integral, we get one direction of the inequality: p X ′ J (u1 , . . . , up ) ≤ ui fi . (4.2.31) i=1 On the other hand, since 0 ≤ h ≤ and the fact that (−c + ui ]+ , h̃ Pp Pp η i=1 ui ) ≤ [−c + i=1 we have p p Z " # X X J ′ (u1 , . . . , up ) = ui fi − −c + ui h̃ dx1 . . . dxp i=1 X1 ×...×Xp i=1 + 40 p p Z ! X X ≤ ui fi − η −c + ui h dx1 . . . dxp i=1 X1 ×...×Xp i=1 p Z p X X = ui fi + η ch dx1 . . . dxp − η ui fi i=1 X1 ×...×Xp i=1 p X ≤ −(η − 1) ui fi + ||ηch||L1 (X1 ×...×Xp ) . i=1 Thus, we have p X ||ηch||L1 (X1 ×...×Xp ) − J ′ (u1 , . . . , up ) ui fi ≤ . (4.2.32) i=1 η−1 By combining (4.2.31) and (4.2.32), we get the inequality (4.2.30) we desired. Remark 4.2.17. Note that, if we assume that (ui + ki )fi = (uj + kj )fj , ∀i ̸= j for some constants (k1 , . . . , kp ), the bounds in (4.2.30) imply that ui fi is also bounded ,∀1 ≤ i ≤ p. Next, we will obtain a bound on the oscillation of ui fi around its mean for each i. Lemma 4.2.18. 
Let ui , fi ∈ L1 (Xi ) for each 1 ≤ i ≤ p, c ∈ L1 (X1 × . . . × Xp ) and h̃ ∈ L∞ (X1 × . . . × Xp ). Suppose that there is some 0 < ε ≤ 1 such that εf1 . . . fp ≤ h̃ for all (x1 , . . . , xp ) ∈ X1 × . . . × Xp . Then, for each 1 ≤ i ≤ p we have p ε X ||ui fi − ui fi ||L1 (X1 ×...×Xp ) ≤ −J ′ (u1 , . . . , up ) + ||cf1 . . . fp ||L1 (X1 ×...×Xp ) + |ui fi |. (4.2.33) 6 i=1 Proof. Fix an i ∈ {1, . . . , p}. Define the oscillation around the mean as Z σi := |ui fi − ui fi | dxi . Xi Let ( ) σi Xi± = xi : ±(ui (xi )fi (xi ) − ui fi ) > 3 and m± i = |Xi | ± Also, let Z A± i =± (ui (xi )fi (xi ) − ui fi ) dxi . Xi± 41 Note that by definition, A± i ≥ 0. Now, observe that Z Z (ui fi − ui (xi )fi (xi )) dxi = (ui fi − ui (xi )fi (xi )) dxi σi {|ui fi −ui fi |≤ 3 } Xi Z − (ui fi − ui (xi )fi (xi )) dxi σi {ui fi −ui fi <− 3 } Z − (ui fi − ui (xi )fi (xi )) dxi σi {ui fi −ui fi > 3 } Z = ui fi |Xi | − ui fi − (ui fi − ui (xi )fi (xi )) dxi Xi− Z − (ui fi − ui (xi )fi (xi )) dxi Xi+ Z = (ui (xi )fi (xi ) − ui fi ) dxi Xi+ Z + (ui (xi )fi (xi ) − ui fi ) dxi Xi− − = A+ i − Ai . (4.2.34) In the second line, first term, we used the fact that the sets Xi have unit Lebesgue measure. Then, by the definition of σi , Z σi = |ui fi − ui fi | dxi Xi Z Z = |ui fi − ui fi | dxi + |ui fi − ui fi | dxi σi σi {|ui fi −ui fi |≤ 3 } {ui fi −ui fi <− 3 } Z + |ui fi − ui fi | dxi σi {ui fi −ui fi > 3 } Z Z σi n σi o ≤ |ui fi − ui fi | ≤ + |ui fi − ui fi | dxi + |ui fi − ui fi | dxi 3 3 Xi− Xi+ σi − − 1 − m+ + A+  ≤ i − m i i + Ai . 3 Thus, 2 m+ m−   2 A+ i + A−i ≥ + i + i σi ≥ σi . (4.2.35) 3 3 3 3 42 On the other hand, by (4.2.34) we have Z + − Ai − Ai = (ui fi − ui (xi )fi (xi )) dxi σi {|ui fi −ui fi |≤ 3 } σi n σi σi o ≥− − ≤ ui fi − ui fi ≤ 3 3 3 (4.2.36) σi − 1 − m+  =− i − mi 3 σi ≥− . 3 Thus by (4.2.35) and (4.2.36), we get σi A+ i ≥ . (4.2.37) 6 Therefore, first we will find a bound on A+ i , so that we get a bound on σi . 
Now, observe that
\begin{align*}
A_i^+ &= \int_{X_i^+} \big( u_i f_i - \overline{u_i f_i} \big)\, dx_i = -\overline{u_i f_i}\, m_i^+ + \int_{X_i^+} u_i f_i\, dx_i \\
&= -\overline{u_i f_i}\, m_i^+ + \int_{X_1 \times \dots \times X_{i-1} \times X_i^+ \times X_{i+1} \times \dots \times X_p} u_i\, f_1 \cdots f_p\, dx_1 \dots dx_p \\
&= -\overline{u_i f_i}\, m_i^+ + \int_{X_1 \times \dots \times X_i^+ \times \dots \times X_p} \Big( -c + \sum_{k=1}^p u_k \Big) f_1 \cdots f_p\, dx_1 \dots dx_p \\
&\qquad + \int_{X_1 \times \dots \times X_i^+ \times \dots \times X_p} c\, f_1 \cdots f_p\, dx_1 \dots dx_p - \sum_{j \ne i} \overline{u_j f_j} \int_{X_i^+} f_i\, dx_i \\
&\le \frac{1}{\varepsilon} \int_{X_1 \times \dots \times X_p} \Big[ -c + \sum_{k=1}^p u_k \Big]_+ \tilde h\, dx_1 \dots dx_p - \frac{1}{\varepsilon} \sum_{k=1}^p \overline{u_k f_k} + \frac{1}{\varepsilon} \sum_{k=1}^p \overline{u_k f_k} \\
&\qquad + \int_{X_1 \times \dots \times X_p} |c|\, f_1 \cdots f_p\, dx_1 \dots dx_p - \sum_{j \ne i} \overline{u_j f_j} \int_{X_i^+} f_i\, dx_i - \overline{u_i f_i}\, m_i^+ \\
&\le \frac{1}{\varepsilon} \Big( -J'(u_1, \dots, u_p) \Big) + \|c f_1 \cdots f_p\|_{L^1(X_1 \times \dots \times X_p)} + \Big( \frac{1}{\varepsilon} - m_i^+ \Big) \overline{u_i f_i} + \Big( \frac{1}{\varepsilon} - \int_{X_i^+} f_i\, dx_i \Big) \sum_{j \ne i} \overline{u_j f_j}. \tag{4.2.38}
\end{align*}
Note that in the penultimate line, we used the fact that $-c + \sum_k u_k \le \big[ -c + \sum_k u_k \big]_+$ and that $\varepsilon f_1 \cdots f_p \le \tilde h$.

So, from (4.2.38), we have
\[ \varepsilon A_i^+ \le -J'(u_1, \dots, u_p) + \varepsilon \|c f_1 \cdots f_p\|_{L^1(X_1 \times \dots \times X_p)} + \big( 1 - \varepsilon m_i^+ \big) \overline{u_i f_i} + \Big( 1 - \varepsilon \int_{X_i^+} f_i\, dx_i \Big) \sum_{j \ne i} \overline{u_j f_j}. \]
Since each $f_i$ is a probability density and $m_i^+, \varepsilon \le 1$, we have $0 \le 1 - \varepsilon m_i^+ \le 1$ and $0 \le 1 - \varepsilon \int_{X_i^+} f_i\, dx_i \le 1$. So, we get that
\[ \varepsilon A_i^+ \le -J'(u_1, \dots, u_p) + \|c f_1 \cdots f_p\|_{L^1(X_1 \times \dots \times X_p)} + \sum_{k=1}^p |\overline{u_k f_k}|. \tag{4.2.39} \]
Since we have $A_i^+ \ge \sigma_i / 6$ by (4.2.37), we get
\[ \frac{\varepsilon}{6}\sigma_i \le -J'(u_1, \dots, u_p) + \|c f_1 \cdots f_p\|_{L^1(X_1 \times \dots \times X_p)} + \sum_{k=1}^p |\overline{u_k f_k}|. \]
This is the desired inequality (4.2.33). □

Now, we will obtain $L^1$-bounds on the $u_i$'s.

Proposition 4.2.19. Let $c \in L^1(X_1 \times \dots \times X_p)$ and $\tilde h \in L^\infty(X_1 \times \dots \times X_p)$. Let $h$ be a probability density with marginals $f_1, \dots, f_p$ such that $h \le \tilde h / \eta$ for some $\eta > 1$. Suppose that there is some $\varepsilon > 0$ such that $\varepsilon f_1(x_1) \cdots f_p(x_p) \le \tilde h(x_1, \dots, x_p)$ and $\varepsilon \le \min\{f_1(x_1), \dots, f_p(x_p)\}$ for almost all $(x_1, \dots, x_p) \in X_1 \times \dots \times X_p$, and suppose that there exist functions $u_i \in L^1(X_i)$, $i \in \{1, \dots, p\}$, such that $\overline{u_i f_i} = \overline{u_j f_j}$ for all $i \ne j$. Then $\mathrm{OT}^{*\prime}_{\mathrm{CCMM}} - 1 \le J'(u_1, \dots, u_p)$ implies that $\|u_i\|_{L^1(X_i)}$ is bounded for each $1 \le i \le p$.

Remark 4.2.20.
Note that the above bound depends only on $\mathrm{OT}^{*\prime}_{\mathrm{CCMM}}$, $\varepsilon$, $\eta$, $\|c\|_{L^1}$ and $\|\tilde h\|_{L^\infty}$.

Proof. Fix an $i \in \{1, \dots, p\}$. Observe that
\[ \|u_i\|_{L^1(X_i)} \le \Big\| \frac{1}{f_i} \Big\|_{L^\infty(X_i)} \|u_i f_i\|_{L^1(X_i)} \le \frac{1}{\varepsilon} \|u_i f_i\|_{L^1(X_i)} \le \frac{1}{\varepsilon}\, |\overline{u_i f_i}| + \frac{1}{\varepsilon} \|u_i f_i - \overline{u_i f_i}\|_{L^1(X_i)}. \]
Using Lemma 4.2.16, Lemma 4.2.18 and the fact that $\mathrm{OT}^{*\prime}_{\mathrm{CCMM}} - 1 \le J'(u_1, \dots, u_p)$, we get a bound for $\|u_i\|_{L^1(X_i)}$ for each $i$. □

Now, we will discuss the proof of the existence of maximizers. By Proposition 4.2.19, we can pick a maximizing sequence $(u_1, \dots, u_p)$ for $\mathrm{OT}^{*\prime}_{\mathrm{CCMM}}$ with uniform $L^1$-bounds. However, since the $L^1$-unit ball is not compact, an $L^1$-bound is not sufficient to get the convergence of a subsequence. Therefore, in order to get better compactness properties, we extend the definition of $J'(u_1, \dots, u_p)$ from $L^1$ to the space of signed measures with finite total variation. Let $\mathcal M(X)$ denote the space of signed Radon measures on $X$. Let $C \in \mathcal M(X_1 \times \dots \times X_p)$ be such that $dC(x_1, \dots, x_p) = c(x_1, \dots, x_p)\, dx_1 \dots dx_p$ and $U_i \in \mathcal M(X_i)$ for each $1 \le i \le p$. Define
\[ \tilde J'(U_1, \dots, U_p) := \sum_{i=1}^p \int_{X_i} f_i\, dU_i - \int_{X_1 \times \dots \times X_p} \tilde h\, d\Big[ -C + \sum_{i=1}^p H_1^d \otimes \dots \otimes U_i \otimes \dots \otimes H_p^d \Big]_+. \tag{4.2.40} \]
Here, $H_i^d$ denotes Lebesgue measure on $\mathbb{R}^d$, $dH_i^d(x_i) = dx_i$.

First, we will prove that the functional $\tilde J'$ is upper semi-continuous with respect to weak-* convergence in $\mathcal M(X_1) \times \dots \times \mathcal M(X_p)$.

Lemma 4.2.21. Let $c \in L^1(X_1 \times \dots \times X_p)$ be such that $dC = c\, dH^{pd}$. Let $f_1, \dots, f_p$ and $\tilde h$ be continuous, non-negative functions on the compact sets $X_1, \dots, X_p$ and $X_1 \times \dots \times X_p$ respectively. Then, the functional $\tilde J'$ is upper semi-continuous with respect to weak-* convergence in $\mathcal M(X_1) \times \dots \times \mathcal M(X_p)$.

Proof. Let $(U_1^n, \dots, U_p^n)_{n \in \mathbb{N}}$ be a bounded sequence in $\mathcal M(X_1) \times \dots \times \mathcal M(X_p)$ such that $(U_1^n, \dots, U_p^n)$ converges to $(U_1, \dots, U_p)$ when tested against functions in $C(X_1) \times \dots \times C(X_p)$. We need to show that
\[ \limsup_{n \to \infty} \tilde J'(U_1^n, \dots, U_p^n) \le \tilde J'(U_1, \dots, U_p). \]
Observe that
\begin{align*}
\limsup_{n \to \infty} \Big\{ \sum_{i=1}^p \int_{X_i} f_i\, dU_i^n - \int \tilde h\, d\Big[ -C + \sum_{i=1}^p H_1^d \otimes \dots \otimes U_i^n \otimes \dots \otimes H_p^d \Big]_+ \Big\}
&\le \limsup_{n \to \infty} \sum_{i=1}^p \int_{X_i} f_i\, dU_i^n + \limsup_{n \to \infty} \Big( - \int \tilde h\, d\Big[ -C + \sum_{i=1}^p H_1^d \otimes \dots \otimes U_i^n \otimes \dots \otimes H_p^d \Big]_+ \Big) \\
&= \limsup_{n \to \infty} \sum_{i=1}^p \int_{X_i} f_i\, dU_i^n - \liminf_{n \to \infty} \int \tilde h\, d\Big[ -C + \sum_{i=1}^p H_1^d \otimes \dots \otimes U_i^n \otimes \dots \otimes H_p^d \Big]_+.
\end{align*}
We already have that, for each $1 \le i \le p$,
\[ \int_{X_i} f_i\, dU_i = \lim_{n \to \infty} \int_{X_i} f_i\, dU_i^n. \tag{4.2.41} \]
So, it remains to show that
\[ \int \tilde h\, d\Big[ -C + \sum_{i=1}^p H_1^d \otimes \dots \otimes U_i \otimes \dots \otimes H_p^d \Big]_+ \le \liminf_{n \to \infty} \int \tilde h\, d\Big[ -C + \sum_{i=1}^p H_1^d \otimes \dots \otimes U_i^n \otimes \dots \otimes H_p^d \Big]_+. \tag{4.2.42} \]
Let
\[ d\mu^n := \tilde h\, d\Big( -C + \sum_{i=1}^p H_1^d \otimes \dots \otimes U_i^n \otimes \dots \otimes H_p^d \Big), \qquad d\mu := \tilde h\, d\Big( -C + \sum_{i=1}^p H_1^d \otimes \dots \otimes U_i \otimes \dots \otimes H_p^d \Big). \]
Then $\mu^n$ converges weakly-* to $\mu$: given a function $\phi \in C(X_1 \times \dots \times X_p)$, we have
\begin{align*}
\int \phi\, d\mu^n &= -\int c \tilde h \phi\, dx_1 \dots dx_p + \sum_{i=1}^p \int_{X_i} \Big( \int_{X_1 \times \dots \times X_{i-1} \times X_{i+1} \times \dots \times X_p} \tilde h \phi\, dx_1 \dots dx_{i-1} dx_{i+1} \dots dx_p \Big)\, dU_i^n \\
&\xrightarrow{n \to \infty} -\int c \tilde h \phi\, dx_1 \dots dx_p + \sum_{i=1}^p \int_{X_i} \Big( \int \tilde h \phi\, dx_1 \dots dx_{i-1} dx_{i+1} \dots dx_p \Big)\, dU_i = \int \phi\, d\mu. \tag{4.2.43}
\end{align*}
Now, decompose the measures $\mu^n$ and $\mu$ into their positive and negative parts as $\mu^n = \mu^{n+} - \mu^{n-}$ and $\mu = \mu^+ - \mu^-$. Since $\mu^n$ converges to $\mu$ weakly-*, plugging $\phi \equiv 1$ into (4.2.43) gives
\[ \mu^{n+}(X_1 \times \dots \times X_p) - \mu^{n-}(X_1 \times \dots \times X_p) \xrightarrow{n \to \infty} \mu^+(X_1 \times \dots \times X_p) - \mu^-(X_1 \times \dots \times X_p). \tag{4.2.44} \]
Let $\tilde X := X_1 \times \dots \times X_p$. So, we have
\[ \mu^+(\tilde X) - \mu^-(\tilde X) \le \liminf_{n \to \infty} \{ \mu^{n+}(\tilde X) - \mu^{n-}(\tilde X) \}. \tag{4.2.45} \]
On the other hand, since the total variation norm $\|\mu\|_{TV}$, given by $|\mu| = \mu^+ + \mu^-$, is lower semi-continuous w.r.t. weak-* convergence, we have
\[ \mu^+(\tilde X) + \mu^-(\tilde X) \le \liminf_{n \to \infty} \|\mu^n\|_{TV} = \liminf_{n \to \infty} \{ \mu^{n+}(\tilde X) + \mu^{n-}(\tilde X) \}, \tag{4.2.46} \]
i.e.
\[ \mu^+(\tilde X) + \mu^-(\tilde X) \le \liminf_{n \to \infty} \{ \mu^{n+}(\tilde X) + \mu^{n-}(\tilde X) \}. \tag{4.2.47} \]
Now, by combining (4.2.45) and (4.2.47), we get
\[ \mu^+(\tilde X) \le \liminf_{n \to \infty} \mu^{n+}(\tilde X). \]
Note that, since only the positive parts of the measures appear in (4.2.42), it is enough to obtain lower semi-continuity of $\mu^{n+}(\tilde X)$. This completes the proof of Lemma 4.2.21. □

Finally, we will present the main result for existence of dual maximizers.

Theorem 4.2.22. Let $f_i$ and $\tilde h$ be continuous and strictly positive on their compact supports $X_i \subseteq \mathbb{R}^d$ and $X_1 \times \dots \times X_p$ respectively. Let $c \in L^1(X_1 \times \dots \times X_p)$. Fix an $\eta > 1$ and assume that $\Pi_{\tilde h/\eta}(f_1, \dots, f_p)$ is non-empty. Then, there exist functions $(u_1, \dots, u_p)$, where $u_i \in L^1(X_i)$ for all $i$, such that
\[ \mathrm{OT}^{*\prime}_{\mathrm{CCMM}} = J'(u_1, \dots, u_p). \]

Proof. Let $(u_1^n, \dots, u_p^n)_{n \in \mathbb{N}}$ be a maximizing sequence for $\mathrm{OT}^{*\prime}_{\mathrm{CCMM}}$, i.e. for each $i \in \{1, \dots, p\}$, $u_i^n \in L^1(X_i)$ and
\[ \mathrm{OT}^{*\prime}_{\mathrm{CCMM}} = \lim_{n \to \infty} J'(u_1^n, \dots, u_p^n). \]
Fix $\varepsilon \le \min_{(x_1, \dots, x_p) \in \tilde X} \{ f_1(x_1), \dots, f_p(x_p) \}$. Since $\tilde h$ is bounded away from zero, we may pick $\varepsilon$ sufficiently small so that $\varepsilon f_1(x_1) \cdots f_p(x_p) \le \tilde h$ for all $(x_1, \dots, x_p) \in X_1 \times \dots \times X_p$. We can find a $p$-tuple of constants $(k_1^n, \dots, k_p^n)$ with $\sum_{i=1}^p k_i^n = 0$ such that adding $k_i^n$ to each $u_i^n$ ensures that $\overline{u_i^n f_i} = \overline{u_j^n f_j}$ for each $i \ne j$. Now, by Proposition 4.2.19, for each $i$ we get a bound for $\|u_i^n\|_{L^1(X_i)}$ independent of $n$.

For each $i$, let $U_i^n \in \mathcal M(X_i)$ be such that $dU_i^n(x_i) = u_i^n(x_i)\, dx_i$. Then $\|U_i^n\|_{TV} = \|u_i^n\|_{L^1(X_i)}$ is bounded. Thus, by Alaoglu's theorem, we can extract a subsequence (without relabelling) $(U_1^n, \dots, U_p^n)$ that converges weakly-* to some $(U_1, \dots, U_p) \in \mathcal M(X_1) \times \dots \times \mathcal M(X_p)$.

So, we have that $\tilde J'(U_1^n, \dots, U_p^n) = J'(u_1^n, \dots, u_p^n) \xrightarrow{n \to \infty} \mathrm{OT}^{*\prime}_{\mathrm{CCMM}}$. By Lemma 4.2.21, we have that $\mathrm{OT}^{*\prime}_{\mathrm{CCMM}} \le \tilde J'(U_1, \dots, U_p)$.

Next, we will prove that $\mathrm{OT}^{*\prime}_{\mathrm{CCMM}} \ge \tilde J'(U_1, \dots, U_p)$. Let $C = c\, H_1^d \otimes \dots \otimes H_p^d$.
By Lebesgue's decomposition theorem, for each $i$ the measure $U_i$ can be written as $U_i = U_i^{ac} + U_i^s$, where $U_i^{ac} \ll H^d$ and $U_i^s \perp H^d$. On the other hand, by the Hahn–Jordan decomposition theorem, we can write $U_i = U_i^+ - U_i^-$. Since the Hahn–Jordan decomposition and the Lebesgue decomposition commute, we have
\[ U_i^+ = [U_i^{ac}]^+ + [U_i^s]^+, \qquad U_i^- = [U_i^{ac}]^- + [U_i^s]^-. \]
Now, observe that
\[ \Big[ -C + \sum_{i=1}^p H_1^d \otimes \dots \otimes U_i \otimes \dots \otimes H_p^d \Big]_+ = \Big[ -C + \sum_{i=1}^p H_1^d \otimes \dots \otimes U_i^{ac} \otimes \dots \otimes H_p^d \Big]_+ + \Big[ \sum_{i=1}^p H_1^d \otimes \dots \otimes U_i^s \otimes \dots \otimes H_p^d \Big]_+. \]
Thus,
\begin{align*}
\tilde J'(U_1, \dots, U_p) &= \sum_{i=1}^p \int_{X_i} f_i\, dU_i - \int_{X_1 \times \dots \times X_p} \tilde h\, d\Big[ -C + \sum_{i=1}^p H_1^d \otimes \dots \otimes U_i \otimes \dots \otimes H_p^d \Big]_+ \\
&= \sum_{i=1}^p \int_{X_i} f_i\, dU_i^{ac} + \sum_{i=1}^p \int_{X_i} f_i\, dU_i^s - \int \tilde h\, d\Big[ -C + \sum_{i=1}^p H_1^d \otimes \dots \otimes U_i^{ac} \otimes \dots \otimes H_p^d \Big]_+ - \int \tilde h\, d\Big[ \sum_{i=1}^p H_1^d \otimes \dots \otimes U_i^s \otimes \dots \otimes H_p^d \Big]_+ \\
&= \tilde J'(U_1^{ac}, \dots, U_p^{ac}) + \sum_{i=1}^p \int_{X_i} f_i\, dU_i^s - \int \tilde h\, d\Big[ \sum_{i=1}^p H_1^d \otimes \dots \otimes U_i^s \otimes \dots \otimes H_p^d \Big]_+ \\
&\le J'(u_1, \dots, u_p) + \sum_{i=1}^p \int_{X_i} f_i\, dU_i^s - \int h\, d\Big( \sum_{i=1}^p H_1^d \otimes \dots \otimes U_i^s \otimes \dots \otimes H_p^d \Big) \\
&= J'(u_1, \dots, u_p) \\
&\le \mathrm{OT}^{*\prime}_{\mathrm{CCMM}}.
\end{align*}
Here, for each $i$, $u_i$ denotes the Radon–Nikodym derivative of $U_i^{ac}$, so that $u_i\, dx_i = dU_i^{ac}$, and $h \in \Pi_{\tilde h/\eta}(f_1, \dots, f_p)$. (In the last equality we used that $h$ has marginals $(f_1, \dots, f_p)$, so $\int h\, d(H_1^d \otimes \dots \otimes U_i^s \otimes \dots \otimes H_p^d) = \int_{X_i} f_i\, dU_i^s$, and the last two terms cancel.) So, we have proven that
\[ \mathrm{OT}^{*\prime}_{\mathrm{CCMM}} \le \tilde J'(U_1, \dots, U_p) \le J'(u_1, \dots, u_p) \le \mathrm{OT}^{*\prime}_{\mathrm{CCMM}}. \]
Thus, $(u_1, \dots, u_p)$ is the desired maximizer. □

CHAPTER 5

BARYCENTERS

5.1 Barycenters in the Wasserstein Space

5.1.1 Introduction

The problem of finding a barycenter in the Wasserstein space is a nonlinear interpolation between several probability measures. As an example in the discrete case, consider several coal mines whose extracted coal has to be sent to a centrally located factory. The problem is to find the location at which to construct the factory so that the total transportation cost is minimized.
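The discrete facility-location picture can be checked numerically. The sketch below (plain NumPy; the mine coordinates and weights are made-up illustrative data, not taken from the text) confirms that, for quadratic transport cost, the optimal factory site is the weighted Euclidean average of the mine locations — the Euclidean barycenter characterization recalled in Remark 5.1.10.

```python
import numpy as np

# Hypothetical mine locations in R^2 and output weights lambda_i
# (illustrative data; the lambda_i are positive and sum to 1).
mines = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 8.0]])
lam = np.array([0.5, 0.25, 0.25])

def total_cost(y):
    """Total transport cost sum_i lambda_i |x_i - y|^2 for a factory at y."""
    return float(np.sum(lam * np.sum((mines - y) ** 2, axis=1)))

# The minimizer of a weighted sum of squared distances is the weighted
# Euclidean average of the points (the Euclidean barycenter).
bary = lam @ mines  # array([1., 2.])

# Sanity check: perturbing the site in any direction can only raise the cost.
for d in [np.array([0.1, 0.0]), np.array([0.0, -0.1]), np.array([-0.1, 0.1])]:
    assert total_cost(bary) <= total_cost(bary + d)
```

The Wasserstein barycenter studied below replaces the points $x_i$ by probability measures $\nu_i$ and the squared distance by $W_2^2$.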
This notion of barycenters in the Wasserstein space was first introduced by Agueh and Carlier in [1], who provided existence, uniqueness, characterizations of the minimizer and regularity of the barycenter, and related it to the multi-marginal OT problem considered by Gangbo and Święch in [24]. This problem has a wide range of applications, including economics [13] and data science [4, 8]. To elaborate on a few applications of Wasserstein barycenters: in image processing, Wasserstein barycenters have been used to generate "averaged" images from a set of input images, a technique particularly useful for denoising and image reconstruction; "image morphing" or "image interpolation" is such an application ([52, 56]). In machine learning, they can be used to generate representative data points from a set of input data, which can then be used to train machine learning models; this can be particularly useful when the input data is noisy or incomplete. Overall, Wasserstein barycenters are a powerful mathematical tool that has found many applications in different fields; their properties and computational tractability make them a popular choice for solving optimization problems involving probability measures.

To provide some background on Wasserstein barycenters, we will present some existing results from [1] in the next few sections.

5.1.2 The Primal Problem

Recall that we define the squared 2-Wasserstein distance between two probability measures $\mu, \nu \in \mathcal P_2(\mathbb{R}^d)$ by
\[ W_2^2(\mu, \nu) := \inf_{\gamma \in \Pi(\mu, \nu)} \int_{\mathbb{R}^d \times \mathbb{R}^d} \frac{1}{2}|x - y|^2\, d\gamma(x, y). \]
Let $p \ge 2$ be an integer. Given a $p$-tuple of probability measures $(\nu_1, \dots, \nu_p)$ with each $\nu_i \in \mathcal P_2(\mathbb{R}^d)$ and a $p$-tuple of positive real numbers $(\lambda_1, \dots, \lambda_p)$ with $\sum_{i=1}^p \lambda_i = 1$, we define the following minimization problem:
\[ \mathrm{OT}_{BC} := \inf_{\nu \in \mathcal P_2(\mathbb{R}^d)} \Big\{ \sum_{i=1}^p \lambda_i W_2^2(\nu_i, \nu) \Big\}. \tag{5.1.1} \]
A solution of (5.1.1) is called the barycenter of the probabilities $\nu_i$ with weights $\lambda_i$.

Remark 5.1.1. For $p = 2$ with $\lambda_1 = \lambda_2 = \frac{1}{2}$, this problem amounts to finding the midpoint between the two measures $\nu_1$ and $\nu_2$, and such an interpolation is already known as McCann's interpolation [40].

Theorem 5.1.2. ([1], Proposition 2.3) Given an integer $p \ge 2$, a $p$-tuple of probability measures $(\nu_1, \dots, \nu_p)$ with each $\nu_i \in \mathcal P_2(\mathbb{R}^d)$ and a $p$-tuple of positive real numbers $(\lambda_1, \dots, \lambda_p)$ with $\sum_{i=1}^p \lambda_i = 1$, the barycenter problem given by (5.1.1) has a solution.

The proof is given in [1] and uses the direct method of the Calculus of Variations. To study the uniqueness and other properties of the barycenters, we require a dual formulation.

5.1.3 Duality

Define the space of continuous functions with at most quadratic growth,
\[ Y := (1 + |\cdot|^2)\, C_b(\mathbb{R}^d) = \Big\{ f \in C(\mathbb{R}^d) : \frac{f}{1 + |\cdot|^2} \text{ is bounded} \Big\}, \]
equipped with the norm
\[ \|f\|_Y := \sup_{x \in \mathbb{R}^d} \frac{|f(x)|}{1 + |x|^2}. \tag{5.1.2} \]
We define the dual of (5.1.1) as
\[ \mathrm{OT}^*_{BC} := \sup_{f_i \in Y,\ \sum_{i=1}^p f_i = 0}\ \sum_{i=1}^p \int_{\mathbb{R}^d} \inf_{y \in \mathbb{R}^d} \Big\{ \frac{\lambda_i}{2}|x - y|^2 - f_i(y) \Big\}\, d\nu_i(x). \tag{5.1.3} \]
In [1], the authors have proven that strong duality holds for the barycenter problem and that dual maximizers for (5.1.3) exist.

Theorem 5.1.3. ([1], Proposition 2.2 & Proposition 2.3) $\mathrm{OT}_{BC} = \mathrm{OT}^*_{BC}$, and $\mathrm{OT}^*_{BC}$ given by (5.1.3) has a solution.

With the duality results, we can characterize the barycenters in several ways. Given below are a few results from [1].

Proposition 5.1.4. ([1], Proposition 3.5) Assume that there is an index $i \in \{1, \dots, p\}$ such that $\nu_i$ vanishes on small sets. Then $\mathrm{OT}_{BC}$ admits a unique solution $\nu$, which is given by $\nu = \nabla\phi_i{}_\#\nu_i$, where, given a solution $(f_1, \dots, f_p)$ of $\mathrm{OT}^*_{BC}$, $\phi_i$ is the convex potential defined by
\[ \phi_i(x) := \frac{1}{\lambda_i} \Big( \frac{\lambda_i}{2}|x|^2 - \inf_{y \in \mathbb{R}^d} \Big\{ \frac{\lambda_i}{2}|x - y|^2 - f_i(y) \Big\} \Big). \tag{5.1.4} \]

Proposition 5.1.5. ([1], Proposition 3.8) Assume that $\nu_i$ vanishes on small sets for every $i \in \{1, \dots, p\}$, and let $\nu \in \mathcal P_2(\mathbb{R}^d)$. Then the following conditions are equivalent:

1. $\nu$ solves $\mathrm{OT}_{BC}$.

2. $\nu = \nabla\phi_i{}_\#\nu_i$ for every $i$, where $\phi_i$ is defined by (5.1.4).

3. There exist convex potentials $\psi_i$ such that $\nabla\psi_i$ is the Brenier map transporting $\nu_i$ to $\nu$, and a constant $C$ such that
\[ \sum_{i=1}^p \lambda_i \psi_i^*(y) \le C + \frac{|y|^2}{2}, \quad \forall y \in \mathbb{R}^d, \text{ with equality } \nu\text{-a.e.} \tag{5.1.5} \]

Remark 5.1.6. If $\nu$ and the potentials $\psi_i$ satisfy the third statement of Proposition 5.1.5, then the support of $\nu$ is included in the contact set where the convex function $\phi := \sum_{i=1}^p \lambda_i \psi_i^*$ agrees with its quadratic majorant $C + \frac{|\cdot|^2}{2}$. Note that, at such a point $x$, we have
\[ \sum_{i=1}^p \lambda_i\, \partial\psi_i^*(x) \subset \partial\phi(x) \subset \{x\}, \]
so that each potential $\psi_i^*$ is differentiable at $x$. The potentials $\psi_i^*$ are therefore differentiable on the support of $\nu$ and satisfy the relation
\[ \sum_{i=1}^p \lambda_i \nabla\psi_i^* = \mathrm{id}. \tag{5.1.6} \]

Remark 5.1.7. If (5.1.6) holds everywhere for the Brenier maps $\nabla\psi_i$ transporting $\nu_i$ to $\nu$, then $\nu$ is optimal for $\mathrm{OT}_{BC}$.

In [1], the authors have also shown that $\mathrm{OT}_{BC}$ is equivalent to a linear programming problem of multi-marginal optimal transport type, similar to the problem solved by Gangbo and Święch in [24]. We now list two main theorems which describe this relation.

Theorem 5.1.8. ([1], Theorem 4.1) Assume that $\nu_i$ vanishes on small sets for $i = 1, \dots, p$. Then the multi-marginal problem given by
\[ \sup\Big\{ \int_{(\mathbb{R}^d)^p} \sum_{1 \le i < j \le p} \lambda_i \lambda_j\, x_i \cdot x_j\, d\gamma(x_1, \dots, x_p) : \gamma \in \Pi(\nu_1, \dots, \nu_p) \Big\} \tag{5.1.7} \]
admits a unique solution $\gamma \in \Pi(\nu_1, \dots, \nu_p)$. Moreover, $\gamma$ is of the form $\gamma = (T_1^1, \dots, T_p^1)_\#\nu_1$ with $T_i^1 = \nabla u_i^* \circ \nabla u_1$ for $i = 1, \dots, p$, where the $u_i$ are the strictly convex potentials defined by
\[ u_i(x) := \frac{\lambda_i}{2}|x|^2 + \frac{g_i(x)}{\lambda_i}, \quad \forall x \in \mathbb{R}^d, \tag{5.1.8} \]
and $(g_1, \dots, g_p)$ are the convex potentials that solve the dual of (5.1.7), given by
\[ \inf\Big\{ \sum_{i=1}^p \int_{\mathbb{R}^d} g_i\, d\nu_i : \sum_{i=1}^p g_i(x_i) \ge \sum_{1 \le i < j \le p} \lambda_i \lambda_j\, x_i \cdot x_j, \ \forall x \in (\mathbb{R}^d)^p \Big\}. \tag{5.1.9} \]

Proposition 5.1.9. ([1], Proposition 4.2) Assume that $\nu_i$ vanishes on small sets for $i = 1, \dots, p$. Then the solution of $\mathrm{OT}_{BC}$ given by (5.1.1) is $\nu = T_\#\gamma$, where $\gamma$ is the solution of (5.1.7) and $T$ is defined by
\[ T(x) := \sum_{i=1}^p \lambda_i x_i, \quad \forall x := (x_1, \dots, x_p) \in (\mathbb{R}^d)^p. \tag{5.1.10} \]

Remark 5.1.10. Note that the Euclidean barycenter $T(x)$ defined in (5.1.10) is characterized by the property
\[ \sum_{i=1}^p \lambda_i |x_i - T(x)|^2 = \inf_{y \in \mathbb{R}^d} \Big\{ \sum_{i=1}^p \lambda_i |x_i - y|^2 \Big\}. \tag{5.1.11} \]

5.2 Capacity Constrained Barycenter Problem

In this section, we introduce the notion of capacity constrained barycenters in the Wasserstein space, which generalizes the barycenter problem (5.1.1). As the name suggests, the capacity constrained barycenter problem introduces a capacity to each of the two-marginal problems involved. Under certain assumptions on the capacities, we prove that the problem attains a minimizer and obtain duality results.

5.2.1 The Primal Problem

Given an integer $p \ge 2$, a $p$-tuple of probability measures $(\nu_1, \dots, \nu_p)$ each in $\mathcal P_2(\mathbb{R}^d)$ and a $p$-tuple of positive real numbers $(\lambda_1, \dots, \lambda_p)$ with $\sum_{i=1}^p \lambda_i = 1$, we define the following minimization problem:
\[ \mathrm{OT}_{CCBC} := \inf_{\nu \in \mathcal P'} J(\nu) := \sum_{i=1}^p \lambda_i \widetilde W_2^2(\nu_i, \nu), \tag{5.2.1} \]
where
\[ \mathcal P' = \mathcal P_2(\mathbb{R}^d) \cap \big\{ \nu' : \Pi_{\tilde\gamma_i}(\nu_i, \nu') \ne \emptyset,\ \forall i \in \{1, 2, \dots, p\} \big\}, \tag{5.2.2} \]
and
\[ \widetilde W_2^2(\nu_i, \nu) = \inf_{\gamma_i \in \Pi_{\tilde\gamma_i}(\nu_i, \nu)} \Big\{ \int_{\mathbb{R}^d \times \mathbb{R}^d} \frac{1}{2}|x - y|^2\, d\gamma_i(x, y) \Big\}. \tag{5.2.3} \]
Here, $\{\tilde\gamma_i\}_{i=1}^p \subseteq \mathcal M_+(\mathbb{R}^d \times \mathbb{R}^d)$ is the set of capacities of the two-marginal problems, and $\Pi_{\tilde\gamma_i}(\nu_i, \nu)$ is the set of transport plans from $\nu_i$ to $\nu$ bounded by $\tilde\gamma_i$ (see Definition 3.4.2). We call the problem (5.2.1) the Capacity Constrained Barycenter (CCBC) problem between the measures $\nu_1, \dots, \nu_p$. We will consider this minimization problem under two assumptions.

• Assumption 1: We assume that $\{\tilde\gamma_i\}_{i=1}^p$ are compactly supported finite measures that are absolutely continuous w.r.t. Lebesgue measure with bounded densities, for all $i \in \{1, 2, \dots, p\}$.

• Assumption 2: We assume that the set $\mathcal P'$ is non-empty.

Recall that the non-emptiness condition for the set $\Pi_{\tilde\gamma_i}(\nu_i, \nu)$ is given by the following theorem.

Theorem 5.2.1.
([46], Corollary 4.6.15) Let $X, Y$ be compact sets. For given Borel probability measures $\mu \in \mathcal P(X)$, $\nu \in \mathcal P(Y)$ and a finite Borel measure $\tilde\gamma$ on $X \times Y$, we have $\Pi_{\tilde\gamma}(\mu, \nu) \ne \emptyset$ if and only if
\[ \mu(A) + \nu(B) \le \tilde\gamma(A \times B) + 1, \quad \forall A \in \mathcal B(X) \text{ and } \forall B \in \mathcal B(Y). \]

Remark 5.2.2. Note that Assumption 2 is not a very restrictive assumption. For instance, we could pick the capacities $\tilde\gamma_i$ such that $\tilde\gamma_i = \nu_i \otimes \xi$ for some probability measure $\xi \in \mathcal P_2(\mathbb{R}^d)$, so that the set $\mathcal P'$ becomes non-empty.

Lemma 5.2.3. $\widetilde W_2^2$ is weakly lower semi-continuous.

Proof. Let $\{\mu_n\}_{n \in \mathbb{N}}$ and $\{\tilde\nu_n\}_{n \in \mathbb{N}}$ be two sequences of probability measures in $\mathcal P'$ such that $\mu_n \rightharpoonup \mu^*$ and $\tilde\nu_n \rightharpoonup \nu^*$. Since $\{\mu_n\}$ and $\{\tilde\nu_n\}$ are tight, $\bigcup_n \Pi(\mu_n, \tilde\nu_n)$ is also tight (see [53], Lemma 4.4). Now, let $\gamma_n^* \in \Pi(\mu_n, \tilde\nu_n)$ be optimal such that $\gamma_n^* \le \tilde\gamma$. Since $\{\gamma_n^*\}$ is also tight, there exists a $\gamma^* \in \mathcal P(\mathbb{R}^d \times \mathbb{R}^d)$ such that $\gamma_n^* \rightharpoonup \gamma^*$ and $\gamma^* \in \Pi(\mu^*, \nu^*)$. We also get that $\gamma^* \le \tilde\gamma$ (see the proof of (5.2.8) in Theorem 5.2.4 below). Now,
\begin{align*}
\widetilde W_2^2(\mu^*, \nu^*) &= \inf_{\gamma \in \Pi_{\tilde\gamma}(\mu^*, \nu^*)} \Big\{ \int_{\mathbb{R}^d \times \mathbb{R}^d} \frac{1}{2}|x - y|^2\, d\gamma(x, y) \Big\} \le \int_{\mathbb{R}^d \times \mathbb{R}^d} \frac{1}{2}|x - y|^2\, d\gamma^*(x, y) \\
&\le \liminf_n \int_{\mathbb{R}^d \times \mathbb{R}^d} \frac{1}{2}|x - y|^2\, d\gamma_n^*(x, y) \quad \text{(see [53], Lemma 4.3)} \\
&= \liminf_n \inf_{\gamma_n \in \Pi_{\tilde\gamma}(\mu_n, \tilde\nu_n)} \Big\{ \int_{\mathbb{R}^d \times \mathbb{R}^d} \frac{1}{2}|x - y|^2\, d\gamma_n(x, y) \Big\} = \liminf_n \widetilde W_2^2(\mu_n, \tilde\nu_n).
\end{align*}
This proves that $\widetilde W_2^2$ is weakly lower semi-continuous. □

Now, we will show that the CCBC problem has a solution.

Theorem 5.2.4. Under Assumptions 1 and 2, the CCBC problem given by (5.2.1) has a solution.

Proof. Let $\{\tilde\nu_n\}_{n \in \mathbb{N}} \subseteq \mathcal P'$ be a minimizing sequence for $\mathrm{OT}_{CCBC}$, i.e. $\lim_{n \to \infty} J(\tilde\nu_n) = \inf_\nu J(\nu)$. Then, we can find an $M > 0$ such that $J(\tilde\nu_n) \le M$ for all $n$. Thus, for each $1 \le i \le p$ and for each $n \in \mathbb{N}$, $\lambda_i \widetilde W_2^2(\nu_i, \tilde\nu_n) \le M$. Using the duality and the assumption that the $\nu_i$'s have finite second moments, we can show that (see Appendix 3)
\[ \sup_n \int |x|^2\, d\tilde\nu_n \le C, \quad \text{for some constant } C. \]
Hence, $\{\tilde\nu_n\}$ is tight (see Appendix 4).
Then, by Prokhorov's theorem, there exists a subsequence $\{\tilde\nu_n\}$ (not relabeled) that converges weakly to some $\nu^* \in \mathcal P(\mathbb{R}^d)$. Since $\int |x|^2\, d\nu^* \le \liminf_n \int |x|^2\, d\tilde\nu_n \le C$ (see [53], Lemma 4.3), we have that $\nu^* \in \mathcal P_2(\mathbb{R}^d)$.

Now, we will prove that for all $1 \le i \le p$ there exists a $\gamma_i \in \Pi_{\tilde\gamma_i}(\nu_i, \nu^*)$, so that $\nu^* \in \mathcal P'$. Fix an $i \in \{1, \dots, p\}$ and $n \in \mathbb{N}$. By Assumption 2, there exists some $\gamma_{i,n} \in \Pi(\nu_i, \tilde\nu_n)$ such that $\gamma_{i,n} \le \tilde\gamma_i$. Since $\{\gamma_{i,n}\}$ is tight, there exists a subsequence $\{\gamma_{i,n}\}$ (not relabeled) that converges weakly to some $\gamma_i^* \in \mathcal P(\mathbb{R}^d \times \mathbb{R}^d)$.

Let $E \subset \mathbb{R}^d$ be an open set and $\{\phi_l\}_{l=1}^\infty \subset C_b(\mathbb{R}^d)$ a sequence of functions such that $0 \le \phi_l \le \mathbf 1_E$ and $\phi_l \nearrow \mathbf 1_E$ pointwise. Then,
\[ \int_{\mathbb{R}^d} \phi_l\, d\tilde\nu_n = \int_{\mathbb{R}^d} \phi_l\, d\big( \mathrm{Proj}_y{}_\# \gamma_{i,n} \big) = \int \phi_l \circ \mathrm{Proj}_y\, d\gamma_{i,n}. \]
Letting $n \to \infty$, we get
\[ \int_{\mathbb{R}^d} \phi_l\, d\nu^* = \int \phi_l \circ \mathrm{Proj}_y\, d\gamma_i^*. \tag{5.2.4} \]
On the left-hand side of (5.2.4) we used the fact that $\tilde\nu_n \rightharpoonup \nu^*$, and on the right-hand side we used the fact that $\gamma_{i,n} \rightharpoonup \gamma_i^*$. Now, letting $l \to \infty$ in (5.2.4), the monotone convergence theorem gives
\[ \int \mathbf 1_E\, d\nu^* = \int \mathbf 1_E \circ \mathrm{Proj}_y\, d\gamma_i^*, \]
i.e.
\[ \nu^*(E) = \mathrm{Proj}_y{}_\# \gamma_i^*(E). \tag{5.2.5} \]
Now, let $B \subset \mathbb{R}^d$ be any Borel set. Since Borel measures are outer regular, we have that
\[ \nu^*(B) = \mathrm{Proj}_y{}_\# \gamma_i^*(B). \tag{5.2.6} \]
Similarly, we can prove that, for any Borel set $A \subset \mathbb{R}^d$,
\[ \nu_i(A) = \mathrm{Proj}_x{}_\# \gamma_i^*(A). \tag{5.2.7} \]
By (5.2.6) and (5.2.7), we can conclude that $\gamma_i^* \in \Pi(\nu_i, \nu^*)$.

Now, let $E, F \subset \mathbb{R}^d$ be two open sets and $\{\phi_l\}_{l=1}^\infty \subset C_b(\mathbb{R}^d \times \mathbb{R}^d)$ a sequence of functions such that $0 \le \phi_l \le \mathbf 1_{E \times F}$ and $\phi_l \nearrow \mathbf 1_{E \times F}$ pointwise. Since $\gamma_{i,n} \le \tilde\gamma_i$, we have
\[ \int \phi_l\, d\gamma_{i,n} \le \int \phi_l\, d\tilde\gamma_i. \]
Letting $n \to \infty$ gives
\[ \int \phi_l\, d\gamma_i^* \le \int \phi_l\, d\tilde\gamma_i, \]
and letting $l \to \infty$, the monotone convergence theorem gives
\[ \int \mathbf 1_{E \times F}\, d\gamma_i^* \le \int \mathbf 1_{E \times F}\, d\tilde\gamma_i, \]
i.e. $\gamma_i^*(E \times F) \le \tilde\gamma_i(E \times F)$. Hence, for any two Borel sets $A, B \subset \mathbb{R}^d$, we have that
\[ \gamma_i^*(A \times B) \le \tilde\gamma_i(A \times B). \tag{5.2.8} \]
Thus, $\gamma_i^* \in \Pi_{\tilde\gamma_i}(\nu_i, \nu^*)$, i.e. $\nu^* \in \mathcal P'$.
Now, we claim that $\nu^*$ is the desired minimizer. Observe that
\begin{align*}
J(\nu^*) &= \sum_{i=1}^p \lambda_i \widetilde W_2^2(\nu_i, \nu^*) \\
&\le \sum_{i=1}^p \lambda_i \liminf_n \widetilde W_2^2(\nu_i, \tilde\nu_n) \quad \text{(by Lemma 5.2.3)} \\
&\le \liminf_n \sum_{i=1}^p \lambda_i \widetilde W_2^2(\nu_i, \tilde\nu_n) = \liminf_n J(\tilde\nu_n) = \mathrm{OT}_{CCBC}.
\end{align*}
This concludes that $\nu^*$ is a minimizer for the CCBC problem. □

5.2.2 Duality

For the dual formulation, we will adopt the duality results for the two-marginal capacity constrained OT problem by Rachev and Rüschendorf in [46]. Let $K \times K$ be a compact subset of $\mathbb{R}^d \times \mathbb{R}^d$ containing the supports of the capacities $\tilde\gamma_i$ for all $i$. For an integer $p \ge 2$, consider the dual problem
\[ \mathrm{OT}^*_{CCBC} = \sup_{\mathcal A} \Big\{ \sum_{i=1}^p \int_K \inf_{z \in K} \Big\{ \frac{\lambda_i}{2}|x - z|^2 - f_i(z) - w_i(x, z) \Big\}\, d\nu_i(x) + \sum_{i=1}^p \int_{K \times K} w_i(x, y)\, d\tilde\gamma_i(x, y) \Big\}. \tag{5.2.9} \]
Here, the supremum is taken over the set $\mathcal A$ of functions $f_i$ and $w_i$ satisfying
\[ f_i \in C_b(K), \quad w_i \in C_b(K \times K), \quad w_i \le 0, \ \forall i \in \{1, 2, \dots, p\}, \quad \text{and} \quad \sum_{i=1}^p f_i = 0. \]
In the theorem below, we will show that strong duality holds for the CCBC problem, using arguments similar to those of [1], Proposition 2.2.

Theorem 5.2.5. Under Assumptions 1 and 2, strong duality holds, i.e. $\mathrm{OT}_{CCBC} = \mathrm{OT}^*_{CCBC}$.

Proof. Let $\nu \in \mathcal P' \cap \mathcal P(K)$, $\gamma_i \in \Pi(\nu_i, \nu)$ such that $\gamma_i \le \tilde\gamma_i$, and $(f_1, \dots, f_p, w_1, \dots, w_p)$ such that $\sum_{i=1}^p f_i = 0$ and $w_i \le 0$. Then, for each $(x, y) \in K \times K$, we have
\[ \inf_{z \in K} \Big\{ \frac{\lambda_i}{2}|x - z|^2 - f_i(z) - w_i(x, z) \Big\} \le \frac{\lambda_i}{2}|x - y|^2 - f_i(y) - w_i(x, y). \]
Hence,
\[ \inf_{z \in K} \Big\{ \frac{\lambda_i}{2}|x - z|^2 - f_i(z) - w_i(x, z) \Big\} + f_i(y) + w_i(x, y) \le \frac{\lambda_i}{2}|x - y|^2. \]
Now, by integrating w.r.t. $\gamma_i$ over $K \times K$, we get
\[ \int_K \inf_{z \in K} \Big\{ \frac{\lambda_i}{2}|x - z|^2 - f_i(z) - w_i(x, z) \Big\}\, d\nu_i(x) + \int_K f_i(y)\, d\nu(y) + \int_{K \times K} w_i(x, y)\, d\gamma_i(x, y) \le \int_{K \times K} \frac{\lambda_i}{2}|x - y|^2\, d\gamma_i(x, y). \tag{5.2.10} \]
Since $\gamma_i \le \tilde\gamma_i$ and $w_i \le 0$, we have
\[ \int_{K \times K} w_i(x, y)\, d\tilde\gamma_i(x, y) \le \int_{K \times K} w_i(x, y)\, d\gamma_i(x, y). \tag{5.2.11} \]
By combining (5.2.10) and (5.2.11), we get
\[ \int_K \inf_{z \in K} \Big\{ \frac{\lambda_i}{2}|x - z|^2 - f_i(z) - w_i(x, z) \Big\}\, d\nu_i(x) + \int_K f_i\, d\nu + \int_{K \times K} w_i\, d\tilde\gamma_i \le \int_{K \times K} \frac{\lambda_i}{2}|x - y|^2\, d\gamma_i(x, y). \tag{5.2.12} \]
Now, we take the summation over $i$ to get
\[ \sum_{i=1}^p \int_K \inf_{z \in K} \Big\{ \frac{\lambda_i}{2}|x - z|^2 - f_i(z) - w_i(x, z) \Big\}\, d\nu_i(x) + \sum_{i=1}^p \int_{K \times K} w_i\, d\tilde\gamma_i \le \sum_{i=1}^p \int_{K \times K} \frac{\lambda_i}{2}|x - y|^2\, d\gamma_i(x, y). \tag{5.2.13} \]
Note that in (5.2.13) we used the fact that $\sum_{i=1}^p f_i = 0$. Now, by taking the infimum over $\gamma_i \in \Pi_{\tilde\gamma_i}(\nu_i, \nu)$, the infimum over $\nu \in \mathcal P'$ and the supremum over $(f_i, w_i) \in \mathcal A$ on both sides, we get
\[ \mathrm{OT}^*_{CCBC} \le \mathrm{OT}_{CCBC}. \tag{5.2.14} \]
Next, we will prove that $\mathrm{OT}^*_{CCBC} \ge \mathrm{OT}_{CCBC}$. For $i \in \{1, \dots, p\}$, define $H_i : C_b(K) \times C_b(K \times K) \to \mathbb{R} \cup \{\infty\}$ by
\[ H_i(f_i, w_i) = \begin{cases} -\displaystyle\int_K \inf_{z \in K} \Big\{ \frac{\lambda_i}{2}|x - z|^2 - f_i(z) - w_i(x, z) \Big\}\, d\nu_i(x) - \int_{K \times K} w_i(x, y)\, d\tilde\gamma_i(x, y) & \text{if } w_i \le 0, \\ \infty & \text{otherwise.} \end{cases} \tag{5.2.15} \]
Now, we take the Legendre transform of $H_i(f_i, w_i)$ in both variables to get
\[ H_i^*(\nu, \xi) = \sup_{f_i \in C_b(K),\, w_i \le 0} \Big\{ \int_K f_i\, d\nu + \int_{K \times K} w_i\, d\xi - H_i(f_i, w_i) \Big\}. \tag{5.2.16} \]
Here, $\nu \in \mathcal M(K)$ and $\xi \in \mathcal M(K \times K)$. Letting $\xi = 0$ in (5.2.16), we get
\[ H_i^*(\nu, 0) = \sup_{f_i \in C_b(K),\, w_i \le 0} \Big\{ \int_K \inf_{z \in K} \Big\{ \frac{\lambda_i}{2}|x - z|^2 - f_i(z) - w_i(x, z) \Big\}\, d\nu_i(x) + \int_K f_i(y)\, d\nu(y) + \int_{K \times K} w_i(x, y)\, d\tilde\gamma_i(x, y) \Big\}. \tag{5.2.17} \]
First, we will show that if $\nu \in \mathcal M(K) \setminus \mathcal P(K)$, then $H_i^*(\nu, 0) = \infty$.

Case 1: Suppose $\exists A \in \mathcal B(K)$ such that $\nu(A) < 0$. Then, there exists a function $f \in C_b(K)$ such that $f \le 0$ and $\int_K f\, d\nu > 0$. Let $t \ge 0$ be arbitrary and choose $f_i = tf$ and $w_i = 0$. Then,
\[ H_i^*(\nu, 0) \ge \int_K \inf_{y \in K} \Big\{ \frac{\lambda_i}{2}|x - y|^2 - tf(y) \Big\}\, d\nu_i + t \int_K f\, d\nu \ge t \int_K f\, d\nu, \quad \forall t \ge 0. \]
Hence, $H_i^*(\nu, 0) = \infty$.

Case 2: Suppose $\nu(K) < 1$. Let $t > 0$ be arbitrary and choose $f_i = -t$ and $w_i = 0$. Then,
\[ H_i^*(\nu, 0) \ge \int_K \inf_{y \in K} \Big\{ \frac{\lambda_i}{2}|x - y|^2 + t \Big\}\, d\nu_i - t \int_K d\nu \ge t \Big( \int_K d\nu_i - \int_K d\nu \Big) = (1 - \nu(K))\, t, \quad \forall t \ge 0. \]
Hence, $H_i^*(\nu, 0) = \infty$.

Case 3: Suppose $\nu(K) > 1$. Let $t > 0$ be arbitrary and choose $f_i = t$ and $w_i = 0$. Then,
\[ H_i^*(\nu, 0) \ge \int_K \inf_{y \in K} \Big\{ \frac{\lambda_i}{2}|x - y|^2 - t \Big\}\, d\nu_i + t \int_K d\nu \ge t \Big( -\int_K d\nu_i + \int_K d\nu \Big) = (\nu(K) - 1)\, t, \quad \forall t \ge 0. \]
Hence, $H_i^*(\nu, 0) = \infty$. Therefore, whenever $\nu \in \mathcal M(K) \setminus \mathcal P(K)$, we have $H_i^*(\nu, 0) = \infty$.
Now, let $\nu \in \mathcal P(K)$. By the duality of the two-marginal CCOT problem (Theorem 3.4.1), we get
\[ H_i^*(\nu, 0) = \lambda_i \widetilde W_2^2(\nu_i, \nu). \tag{5.2.18} \]
Hence, we have
\[ \mathrm{OT}_{CCBC} = \inf_\nu \sum_{i=1}^p \lambda_i \widetilde W_2^2(\nu_i, \nu) = \inf_\nu \sum_{i=1}^p H_i^*(\nu, 0). \tag{5.2.19} \]
Now, consider the Legendre transform of $\sum_{i=1}^p H_i^*(\cdot, 0)$, restricted to $C_b(K)$ viewed as a subspace of $C_b(K)^{**}$:
\[ \Big( \sum_{i=1}^p H_i^*(\cdot, 0) \Big)^*(f) = \sup_\nu \Big\{ \int_K f\, d\nu - \sum_{i=1}^p H_i^*(\nu, 0) \Big\}, \]
where $f \in C_b(K)$ and $\nu \in \mathcal M(K)$. Letting $f = 0$, we get
\[ \Big( \sum_{i=1}^p H_i^*(\cdot, 0) \Big)^*(0) = \sup_\nu \Big\{ -\sum_{i=1}^p H_i^*(\nu, 0) \Big\}. \]
Hence,
\[ -\Big( \sum_{i=1}^p H_i^*(\cdot, 0) \Big)^*(0) = \inf_\nu \Big\{ \sum_{i=1}^p H_i^*(\nu, 0) \Big\}. \tag{5.2.20} \]
By combining (5.2.19) and (5.2.20), we get
\[ \mathrm{OT}_{CCBC} = -\Big( \sum_{i=1}^p H_i^*(\cdot, 0) \Big)^*(0). \tag{5.2.21} \]
Now, we define the infimal convolution of the $H_i$'s as
\[ H(f) = \inf\Big\{ \sum_{i=1}^p H_i(f_i, w_i) : \sum_{i=1}^p f_i = f,\ w_i \le 0 \Big\}. \tag{5.2.22} \]
Then,
\begin{align*}
-H(0) &= \sup\Big\{ -\sum_{i=1}^p H_i(f_i, w_i) : \sum_{i=1}^p f_i = 0,\ w_i \le 0 \Big\} \\
&= \sup_{\sum_{i=1}^p f_i = 0,\ w_i \le 0} \Big\{ \sum_{i=1}^p \int_K \inf_{z \in K} \Big\{ \frac{\lambda_i}{2}|x - z|^2 - f_i(z) - w_i(x, z) \Big\}\, d\nu_i(x) + \sum_{i=1}^p \int_{K \times K} w_i(x, y)\, d\tilde\gamma_i(x, y) \Big\}.
\end{align*}
Thus, we have
\[ -H(0) = \mathrm{OT}^*_{CCBC}. \tag{5.2.23} \]
Also note that
\begin{align*}
H^*(\nu) &= \sup_f \Big\{ \int_K f\, d\nu - H(f) \Big\} \\
&= \sup_f \Big\{ \int_K f\, d\nu + \sup\Big\{ -\sum_{i=1}^p H_i(f_i, w_i) : \sum_{i=1}^p f_i = f,\ w_i \le 0 \Big\} \Big\} \\
&= \sup_{f_i \in C_b(K),\, w_i \le 0} \Big\{ \sum_{i=1}^p \int_K f_i\, d\nu - \sum_{i=1}^p H_i(f_i, w_i) \Big\} \\
&= \sum_{i=1}^p \sup_{f_i \in C_b(K),\, w_i \le 0} \Big\{ \int_K f_i\, d\nu - H_i(f_i, w_i) \Big\} = \sum_{i=1}^p H_i^*(\nu, 0).
\end{align*}
Hence, we have
\[ H^*(\nu) = \sum_{i=1}^p H_i^*(\nu, 0). \tag{5.2.24} \]
Now, by combining (5.2.21) and (5.2.24), we get
\[ \mathrm{OT}_{CCBC} = -H^{**}(0). \tag{5.2.25} \]
Since we have (5.2.23), it only remains to show that
\[ H^{**}(0) = H(0). \tag{5.2.26} \]
Since $H$ is convex, it is sufficient (see [20]) to show that $H$ is continuous at $0$ for the norm topology given by (5.1.2).
To prove this, we rewrite $H_i$ defined in (5.2.15) as
\[ H_i(f_i, w_i) = \int_K \sup_{z \in K} \Big\{ f_i(z) + w_i(x, z) - \frac{\lambda_i}{2}|x - z|^2 \Big\}\, d\nu_i(x) - \int_{K \times K} w_i(x, y)\, d\tilde\gamma_i(x, y). \]
Plugging in $z = 0$, we get
\[ H_i(f_i, w_i) \ge f_i(0) + \int_K w_i(x, 0)\, d\nu_i(x) - \frac{\lambda_i}{2} \int_K |x|^2\, d\nu_i(x) - \int_{K \times K} w_i(x, y)\, d\tilde\gamma_i(x, y). \tag{5.2.27} \]
Since $w_i \in C_b(K \times K)$ for all $1 \le i \le p$, there exist some negative real numbers $m_i$ such that $m_i \le w_i(x, 0) \le 0$ for all $x \in K$. Note also that in (5.2.27) the last term $-\int w_i\, d\tilde\gamma_i$ is non-negative, since $w_i \le 0$. Thus, (5.2.27) becomes
\[ H_i(f_i, w_i) \ge f_i(0) + m_i - \frac{\lambda_i}{2} \int_K |x|^2\, d\nu_i(x). \tag{5.2.28} \]
Now, taking the summation over $i$, for all $f \in C_b(K)$ such that $\sum_{i=1}^p f_i = f$ we have
\[ H(f) \ge f(0) + \sum_{i=1}^p m_i - \sum_{i=1}^p \frac{\lambda_i}{2} \int_K |x|^2\, d\nu_i(x). \tag{5.2.29} \]
By the finite second moment condition on each of the $\nu_i$'s, we get that
\[ H(f) > -\infty. \tag{5.2.30} \]
Now, let $f$ be such that $\|f\|_Y \le \frac{p}{4}\min\{\lambda_1, \dots, \lambda_p\}$. Choosing $f_i = \frac{f}{p}$ and $w_i = 0$ in $H_i(f_i, w_i)$, we have
\begin{align*}
H(f) &\le \sum_{i=1}^p H_i\Big( \frac{f}{p}, 0 \Big) \le \sum_{i=1}^p \int_K \sup_{y \in K} \Big\{ \frac{f(y)}{p} - \frac{\lambda_i}{2}|x - y|^2 \Big\}\, d\nu_i \\
&\le \sum_{i=1}^p \int_K \sup_{y \in K} \Big\{ \frac{\lambda_i}{4}\big( 1 + |y|^2 \big) - \frac{\lambda_i}{2}|x - y|^2 \Big\}\, d\nu_i \\
&\le \sum_{i=1}^p \int_K \Big( \frac{\lambda_i}{4} + \frac{\lambda_i}{2}|x|^2 \Big)\, d\nu_i = \frac{1}{4} + \sum_{i=1}^p \frac{\lambda_i}{2} \int_K |x|^2\, d\nu_i.
\end{align*}
Again, by the finite second moment condition on each of the $\nu_i$'s, we get that
\[ H(f) < \infty. \tag{5.2.31} \]
Thus, by (5.2.30) and (5.2.31), we have shown that the convex function $H$ never takes the value $-\infty$ and is bounded from above in a neighborhood of $0$. Thus, by a standard convex analysis result (see [20], Proposition 2.5), $H$ is continuous at $0$. Hence, $H^{**}(0) = H(0)$, and by combining (5.2.23) and (5.2.25) we get
\[ \mathrm{OT}_{CCBC} = \mathrm{OT}^*_{CCBC}. \tag{5.2.32} \] □

CHAPTER 6

ENTROPY REGULARIZATION

6.1 Entropy Regularized Optimal Transport (EROT) Problem

6.1.1 Introduction

Optimal transport offers an elegant solution to many theoretical and practical problems at the interface between probability, partial differential equations, and optimization.
This success, however, comes at a high computational price, and the computation becomes prohibitive whenever the dimension exceeds a few hundred, since it requires the solution of a linear program over distributions on a product space. Entropic regularization provides an approximation of optimal transport with lower computational complexity and easy implementation. It operates by adding an entropic regularization penalty to the original problem, making it strictly convex and hence guaranteeing a unique minimizer. In this section, we explore the EROT problem and its duality.

The idea of smoothing the classical OT problem with an entropic regularization term was first introduced by Cuturi in [16], where it was shown that the resulting optimum can be computed through Sinkhorn's matrix scaling algorithm at a speed several orders of magnitude faster than that of transport solvers. Duality results and a characterization of the dual maximizers have been presented by Marino and Gerolin in [39, 17]. Some of the main results, definitions and remarks mentioned in this section are taken from [39, 45].

6.1.2 The Primal Problem

Let $(X, d_X)$ and $(Y, d_Y)$ be Polish spaces, $c : X \times Y \to \mathbb{R}$ a cost function, $\mu \in \mathcal P(X)$ and $\nu \in \mathcal P(Y)$ probability measures, and $\varepsilon > 0$. We consider the following minimization problem:
\[ \mathrm{OT}_\varepsilon = \inf_{\gamma \in \Pi(\mu, \nu)} \int_{X \times Y} c(x, y)\, d\gamma(x, y) + \varepsilon\, \mathrm{KL}(\gamma|\mu \otimes \nu). \tag{6.1.1} \]
The function $\mathrm{KL}(\gamma|\xi)$ in (6.1.1) is the Kullback–Leibler divergence between the probability measures $\gamma$ and $\xi$ in $\mathcal P(X \times Y)$, defined as
\[ \mathrm{KL}(\gamma|\xi) = \begin{cases} \displaystyle\int_{X \times Y} \frac{d\gamma}{d\xi} \log\Big( \frac{d\gamma}{d\xi} \Big)\, d\xi & \text{if } \gamma \ll \xi, \\ +\infty & \text{otherwise,} \end{cases} \]
where $\frac{d\gamma}{d\xi}$ denotes the Radon–Nikodym derivative of $\gamma$ with respect to the measure $\xi$. The problem (6.1.1) can also be represented as
\[ \mathrm{OT}_\varepsilon = \inf_{\gamma \in \Pi(\mu, \nu)} \varepsilon\, \mathrm{KL}(\gamma|K), \tag{6.1.2} \]
where $K$ represents the Gibbs kernel for the cost $c$, given by
\[ K = k(x, y)\, \mu \otimes \nu = e^{-\frac{c(x, y)}{\varepsilon}}\, \mu \otimes \nu. \tag{6.1.3} \]
The existence of a minimizer in (6.1.1) and its characterizations have been discussed by several authors under different settings ([9], [15], [29], [49]). Under the definition (6.1.3) of $K$ ([37], Proposition 2.3), a necessary and sufficient condition for a minimizer $\gamma_\varepsilon$ of (6.1.1) to be unique is given by:
\[ \gamma_\varepsilon = f_\varepsilon(x)\, g_\varepsilon(y)\, K, \quad \text{where } f_\varepsilon, g_\varepsilon \text{ solve} \quad \begin{cases} f_\varepsilon(x) \displaystyle\int_Y g_\varepsilon(y)\, k(x, y)\, d\nu(y) = 1, \\ g_\varepsilon(y) \displaystyle\int_X f_\varepsilon(x)\, k(x, y)\, d\mu(x) = 1. \end{cases} \tag{6.1.4} \]
Here, the functions $f_\varepsilon(x)$ and $g_\varepsilon(y)$ are known as the entropic potentials, and they are unique up to the trivial transformation $f \mapsto f/\lambda$, $g \mapsto \lambda g$ for some $\lambda > 0$. The system solved by the entropic potentials is called the Schrödinger system.

The following theorem states that if we assume that $\mu$ and $\nu$ are positive everywhere and their entropy w.r.t. $K$ is finite, then the minimizer of (6.1.1) takes a special form.

Theorem 6.1.1. ([9], Corollary 3.9) Let $(X, d_X)$ and $(Y, d_Y)$ be two Polish spaces, and $c : X \times Y \to [0, \infty)$ a bounded cost function. Suppose that $\mu \in \mathcal P(X)$, $\nu \in \mathcal P(Y)$ are two probability measures that are absolutely continuous w.r.t. Lebesgue measure, such that $\mu(x), \nu(y) > 0$ for all $x \in X$, $y \in Y$, and $\mathrm{KL}(\mu \otimes \nu|K) < +\infty$ for $K = e^{-c(x, y)/\varepsilon}\, \mu \otimes \nu$. Then $\mathrm{OT}_\varepsilon$ has a unique minimizer of the form
\[ \gamma_\varepsilon(x, y) = f_\varepsilon(x)\, g_\varepsilon(y)\, K(x, y). \]

6.1.3 Duality

We define a dual functional for (6.1.1) as follows:
\[ D_\varepsilon(u, v) = \int_X u(x)\, d\mu(x) + \int_Y v(y)\, d\nu(y) - \varepsilon \int_{X \times Y} e^{\frac{u(x) + v(y) - c(x, y)}{\varepsilon}}\, d(\mu \otimes \nu). \]
Then, the dual formulation of (6.1.1) is the following maximization problem:
\[ \mathrm{OT}^*_\varepsilon = \sup_{u \in C_b(X),\, v \in C_b(Y)} D_\varepsilon(u, v). \tag{6.1.5} \]
Now, we will present the duality result for the EROT problem. This has been discussed in several articles, such as [14, 21, 28, 27, 39].

Theorem 6.1.2. ([39], Proposition 2.11) Let $(X, d_X)$, $(Y, d_Y)$ be two Polish metric spaces, $c : X \times Y \to \mathbb{R}$ a bounded function, and $\varepsilon > 0$. Suppose that $\mu \in \mathcal P(X)$ and $\nu \in \mathcal P(Y)$ are two probability measures that are absolutely continuous w.r.t. Lebesgue measure. Then
\[ \mathrm{OT}_\varepsilon = \mathrm{OT}^*_\varepsilon + \varepsilon. \]
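On finite state spaces, the Schrödinger system (6.1.4) can be solved by the alternating scaling iteration of Sinkhorn's algorithm mentioned above. The following minimal NumPy sketch (the marginals and cost matrix are made-up illustrative data, not code from the thesis) iterates the two equations of (6.1.4) and checks that the resulting plan $\gamma_\varepsilon = f_\varepsilon g_\varepsilon K$ has the prescribed marginals.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, eps = 6, 7, 0.1

# Illustrative discrete marginals mu, nu and a bounded cost c (assumed data).
mu = rng.random(n); mu /= mu.sum()
nu = rng.random(m); nu /= nu.sum()
c = rng.random((n, m))

# Gibbs kernel k(x, y) = exp(-c/eps); the reference measure is mu ⊗ nu.
k = np.exp(-c / eps)

# Sinkhorn iteration: alternately enforce the two Schrödinger equations
#   f(x) * ∫ g(y) k(x, y) dnu(y) = 1   and   g(y) * ∫ f(x) k(x, y) dmu(x) = 1.
f, g = np.ones(n), np.ones(m)
for _ in range(2000):
    f = 1.0 / (k @ (g * nu))
    g = 1.0 / (k.T @ (f * mu))

# Entropic plan gamma_eps(x, y) = f(x) g(y) k(x, y) mu(x) nu(y).
gamma = f[:, None] * g[None, :] * k * mu[:, None] * nu[None, :]

# At convergence its marginals recover mu and nu.
assert np.allclose(gamma.sum(axis=1), mu, atol=1e-8)
assert np.allclose(gamma.sum(axis=0), nu, atol=1e-8)
```

Dual potentials for (6.1.5) can then be read off as $u = \varepsilon \log f_\varepsilon$, $v = \varepsilon \log g_\varepsilon$, since $\gamma_\varepsilon = e^{(u + v - c)/\varepsilon}\, \mu \otimes \nu$.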
In [39], the authors also provide a direct proof of existence of maximizers in (6.1.5), following the direct method of the Calculus of Variations. The main idea in the proof is to define a generalized version of the c-transform (see Definition 2.2.1), called the (c, ε)-transform.

6.1.3.1 Entropy-Transform and its properties

In this section, we will consider functions c : X × Y → R which are bounded. First, we will define the space L^exp_ε, in which the functions admissible for the dual problem will lie.

Definition 6.1.3. Given an ε > 0, a Polish space (X, d_X), and a probability measure µ ∈ P(X), we define the set L^exp_ε(X, dµ) by

    L^exp_ε(X, dµ) := { u : X → [−∞, ∞) : u is a measurable function on (X, dµ) and 0 < ∫_X e^{u/ε} dµ < ∞ }.

For u ∈ L^exp_ε(X, dµ) we also define σ_u := ε log( ∫_X e^{u/ε} dµ ).

Observe that u ∈ L^exp_ε(X, dµ) may take the value −∞ on a set of positive measure, but not µ-a.e., since we have the condition ∫_X e^{u/ε} dµ > 0.

Definition 6.1.4. Given an ε > 0, two Polish spaces (X, d_X), (Y, d_Y), two probability measures µ ∈ P(X), ν ∈ P(Y), and a bounded function c : X × Y → R, we define the (c, ε)-transform F^(c,ε) : L^exp_ε(X, dµ) → L^0(Y, dν) by

    F^(c,ε)(u)(y) := −ε log( ∫_X e^{(u(x)−c(x,y))/ε} dµ(x) ).    (6.1.6)

Similarly, we define the (c*, ε)-transform F^(c*,ε) : L^exp_ε(Y, dν) → L^0(X, dµ) by

    F^(c*,ε)(v)(x) := −ε log( ∫_Y e^{(v(y)−c(x,y))/ε} dν(y) ).    (6.1.7)

For simplicity, we will use the notation u^(c,ε) = F^(c,ε)(u).

Remark 6.1.5. Note that the definition of the (c, ε)-transform is consistent with the definition of the c-transform in the sense that as ε → 0, u^(c,ε) → u^c ([26], Lemma 4.2).

Now we will list, without proof, some results about the (c, ε)-transform mentioned in [39].

Lemma 6.1.6. ([39], Lemma 2.3) Let (X, d_X), (Y, d_Y) be Polish spaces, u ∈ L^exp_ε(X, dµ), v ∈ L^exp_ε(Y, dν) and ε > 0. Then,

(i) u^(c,ε) ∈ L^∞(Y, dν) and v^(c*,ε) ∈ L^∞(X, dµ), satisfying the inequality

    −∥c∥_∞ − ε log( ∫_X e^{u(x)/ε} dµ ) ≤ u^(c,ε)(y) ≤ ∥c∥_∞ − ε log( ∫_X e^{u(x)/ε} dµ ).
(ii) u^(c,ε) ∈ L^exp_ε(Y, dν) and v^(c*,ε) ∈ L^exp_ε(X, dµ). Furthermore, |σ_{u^(c,ε)} + σ_u| ≤ ∥c∥_∞.

Given below are some more properties of the (c, ε)-transform.

Proposition 6.1.7. ([39], Prop 2.4) Let (X, d_X) and (Y, d_Y) be Polish metric spaces, c : X × Y → [0, ∞] be a bounded function, µ ∈ P(X), ν ∈ P(Y) be probability measures, u ∈ L^exp_ε(X, dµ) and ε > 0. Then,

(i) if c is L-Lipschitz, then u^(c,ε) is L-Lipschitz;
(ii) if c is ω-continuous, then u^(c,ε) is ω-continuous;
(iii) if |c| ≤ M, then |u^(c,ε) + σ_u| ≤ M;
(iv) if |c| ≤ M, then F^(c,ε) : L^∞(X, dµ) → L^p(Y, dν) is a 1-Lipschitz compact operator;
(v) if c is K-concave with respect to y, then u^(c,ε) is K-concave.

6.1.3.2 Dual Problem

Recall that the dual functional of the EROT problem is given by

    Dε(u, v) = ∫_X u(x) dµ(x) + ∫_Y v(y) dν(y) − ε ∫_{X×Y} e^{(u(x)+v(y)−c(x,y))/ε} d(µ ⊗ ν).    (6.1.8)

Now, we will state some results used to get the existence of dual maximizers.

Lemma 6.1.8. ([39], Lemma 2.6) Consider the functional Dε : L^exp_ε(X, dµ) × L^exp_ε(Y, dν) → R defined by (6.1.8). Then,

    Dε(u, u^(c,ε)) ≥ Dε(u, v)  for all v ∈ L^exp_ε(Y, dν),    (6.1.9)
    Dε(u, u^(c,ε)) = Dε(u, v)  if and only if v = u^(c,ε).    (6.1.10)

In particular, u^(c,ε) ∈ argmax{Dε(u, v) : v ∈ L^exp_ε(Y, dν)}.

Lemma 6.1.9. ([39], Lemma 2.7) Given u ∈ L^exp_ε(X, dµ) and v ∈ L^exp_ε(Y, dν), there exist u* ∈ L^exp_ε(X, dµ) and v* ∈ L^exp_ε(Y, dν) such that

• Dε(u, v) ≤ Dε(u*, v*),
• ∥v*∥_∞ ≤ 3∥c∥_∞/2,
• ∥u*∥_∞ ≤ 3∥c∥_∞/2.

Moreover, we can choose a ∈ R such that u* = (v + a)^(c,ε) and v* = (u*)^(c,ε).

The theorem given below states that the dual functional (6.1.8) attains a maximizer.

Theorem 6.1.10. ([39], Theorem 2.8) Let (X, d_X), (Y, d_Y) be Polish spaces, c : X × Y → R be a bounded function, µ ∈ P(X), ν ∈ P(Y) be probability measures and ε > 0.
Then, the dual problem

    sup {Dε(u, v) : u ∈ L^exp_ε(X, dµ), v ∈ L^exp_ε(Y, dν)}    (6.1.11)

attains its supremum at a unique couple (u0, v0) (up to the trivial transformation (u, v) ↦ (u + a, v − a)). In particular, u0 ∈ L^∞(X, dµ) and v0 ∈ L^∞(Y, dν); moreover, we can choose the maximizers such that ∥u0∥_∞, ∥v0∥_∞ ≤ (3/2)∥c∥_∞.

The proof mainly uses Lemma 6.1.8 and Lemma 6.1.9 along with the Banach-Alaoglu theorem.

6.2 Entropy Regularized Barycenter Problem

6.2.1 Introduction

As with finding optimizers of the classical OT problem, finding the Wasserstein barycenter is not an easy task. Introducing an entropic regularization into the classical Wasserstein barycenter problem makes it more tractable and flexible.

Two different notions of regularization exist in the literature. In [7, 12], the authors introduce a penalization term which is a function of the barycenter. On the other hand, in [38], the authors add a penalty term which is a function of the transport plans between the barycenter and the target measures. The main difference between these two approaches is the reference point that is used to regularize the resulting measure. The choice of regularization approach depends on the specific application and the desired properties of the resulting measure; however, the similarities and differences of these approaches have not been widely discussed in the literature. In this section, as in [38], we will consider the latter approach.

In [38], the authors introduce the regularized Wasserstein barycenter problem and its dual formulation. They prove that strong duality holds and obtain the existence of a minimizer for the primal problem via the duality result. However, they do not discuss the existence of maximizers of the dual functional. In this section, we will provide a direct proof of the existence of a minimizer for the primal problem and of the existence of dual maximizers.
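Before turning to the analysis, here is a brute-force numerical sketch of the regularized barycenter objective studied in this section (assumptions of mine, not the thesis's method: discrete measures on a shared one-dimensional support, quadratic cost, made-up data). It evaluates the objective I(ν) = Σ_i λ_i OTε(ν_i, ν) over a one-parameter family of candidate measures ν, computing each OTε(ν_i, ν) with a few Sinkhorn iterations.

```python
import numpy as np

eps = 0.1
x = np.linspace(0.0, 1.0, 6)                 # common support in R
C = 0.5 * (x[:, None] - x[None, :]) ** 2     # c(x, y) = |x - y|^2 / 2
nus = [np.exp(-10 * (x - 0.2) ** 2), np.exp(-10 * (x - 0.8) ** 2)]
nus = [m / m.sum() for m in nus]             # the target measures nu_1, nu_2
lam = [0.5, 0.5]

def ot_eps(a, b):
    # entropic OT cost between discrete measures a, b via Sinkhorn scaling
    k = np.exp(-C / eps)
    f = np.ones_like(a)
    for _ in range(500):
        g = 1.0 / (k.T @ (f * a))
        f = 1.0 / (k @ (g * b))
    gam = f[:, None] * g[None, :] * k * np.outer(a, b)
    kl = np.sum(gam * np.log(gam / np.outer(a, b)))
    return np.sum(C * gam) + eps * kl

def I(nu):
    # regularized barycenter objective: sum_i lambda_i OT_eps(nu_i, nu)
    return sum(l * ot_eps(a, nu) for l, a in zip(lam, nus))

# candidates: linear interpolations of the two inputs (a crude search, for
# illustration only; the true minimizer need not lie in this family)
ts = np.linspace(0.0, 1.0, 11)
vals = [I((1 - t) * nus[0] + t * nus[1]) for t in ts]
assert all(np.isfinite(vals))
assert min(vals) <= min(vals[0], vals[-1])   # some candidate does no worse than the endpoints
```

This is only a sanity check on the objective: it illustrates that I(ν) is finite and can be compared across candidates, while the existence of an actual minimizer over all of P(Y) is what Theorem 6.2.2 below establishes.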
6.2.2 The Primal Problem

Throughout this section, we will work with compact subsets of R^d and symmetric, bounded cost functions.

Recall that the entropy regularized OT problem is defined as

    OTε(µ, ν) = inf_{γ∈Π(µ,ν)} ∫_{X×Y} c(x, y) dγ(x, y) + ε KL(γ|µ ⊗ ν).    (6.2.1)

By the duality result, we have

    OTε(µ, ν) = sup_{u∈C_b(X), v∈C_b(Y)} ∫_X u(x) dµ(x) + ∫_Y v(y) dν(y) − ε ∫_{X×Y} e^{(u(x)+v(y)−c(x,y))/ε} d(µ ⊗ ν).

Also recall that given an integer p ≥ 2, compact subsets X_1, ..., X_p, Y of R^d, a p-tuple of probability measures (ν_1, ..., ν_p) with ν_i ∈ P(X_i), a p-tuple of positive real numbers (λ_1, ..., λ_p) with Σ_{i=1}^p λ_i = 1, and cost functions c_i : X_i × Y → R, we define the Wasserstein barycenter problem as

    OT_BC(ν_1, ..., ν_p) = inf_{ν∈P(Y)} Σ_{i=1}^p λ_i OT_{c_i}(ν_i, ν)    (6.2.2)

where OT_{c_i}(ν_i, ν) is the optimal transport cost between ν_i and ν with cost c_i given in (2.1.6). Proposition 2.2 in [1] gives us the following duality result for the case c_i(x, y) = (1/2)|x − y|²:

    OT_BC(ν_1, ..., ν_p) = sup_{f_i∈C_b(Y), Σ_{i=1}^p f_i=0} Σ_{i=1}^p ∫_{X_i} inf_{y∈R^d} { λ_i c_i(x_i, y) − f_i(y) } dν_i(x_i).    (6.2.3)

Now we will introduce the Entropy Regularized Barycenter (ERBC) problem. Given an integer p ≥ 2, compact subsets X_1, ..., X_p, Y of R^d, a p-tuple of probability measures (ν_1, ..., ν_p) with ν_i ∈ P(X_i), a p-tuple of positive real numbers (λ_1, ..., λ_p) with Σ_{i=1}^p λ_i = 1, and ε > 0, we define the ERBC problem as

    OTε_BC(ν_1, ..., ν_p) = inf_{ν∈P(Y)} Σ_{i=1}^p λ_i OTε(ν_i, ν).    (6.2.4)

Now we will prove that the minimization problem (6.2.4) has a minimizer. For simplicity, we will assume that c_i(x, y) = (1/2)|x − y|² for each 1 ≤ i ≤ p in this proof, but the result holds for more general costs. Also note that in [38], the authors provide a proof of the existence of a minimizer via duality; here we give a direct proof without assuming the duality result.

Lemma 6.2.1. OTε is weakly lower semi-continuous.

Proof.
Let {µn}_{n∈N} and {νn}_{n∈N} be two sequences such that µn ⇀ µ* and νn ⇀ ν*. We can pick subsequences (not relabeled) {µn}_{n∈N} and {νn}_{n∈N} such that

    lim_{n→∞} OTε(µn, νn) = lim inf_{n→∞} OTε(µn, νn).

Since {µn}_{n∈N} and {νn}_{n∈N} are tight, ∪_{n∈N} Π(µn, νn) is also tight (see [53], Lemma 4.4). Now let {γn}_{n∈N} be a sequence with marginals µn and νn which is "close" to the optimal value of OTε(µn, νn), i.e. given δ > 0,

    ∫ |x − y|² dγn + ε KL(γn|µn ⊗ νn) ≤ OTε(µn, νn) + δ.

Since {γn}_{n∈N} is also tight, we can pick a subsequence (not relabeled) {γn}_{n∈N} such that γn ⇀ γ* and γ* ∈ Π(µ*, ν*). Now, observe that

    OTε(µ*, ν*) ≤ ∫ |x − y|² dγ* + ε KL(γ*|µ* ⊗ ν*)
               ≤ lim inf_n ∫ |x − y|² dγn + ε lim inf_n KL(γn|µn ⊗ νn)
               ≤ lim inf_n { ∫ |x − y|² dγn + ε KL(γn|µn ⊗ νn) }
               ≤ lim inf_n OTε(µn, νn) + δ
               = lim_n OTε(µn, νn) + δ.

In the second inequality, the first term is controlled by the weak lower semi-continuity of the total cost (see [53], Lemma 4.3), and the second term by the lower semi-continuity of the relative entropy, which is a well-known result (see [19], Lemma 1.4.3). Finally, letting δ → 0, we get the lower semi-continuity result.

Theorem 6.2.2. Let X_1, ..., X_p, Y be compact subsets of R^d. Given an integer p > 0, a p-tuple of probability measures (ν_1, ..., ν_p) with ν_i ∈ P_2(X_i), a p-tuple of positive real numbers (λ_1, ..., λ_p) with Σ_{i=1}^p λ_i = 1, and ε > 0, there exists a minimizer for the ERBC problem given by (6.2.4).

Proof. Let

    I(ν) := Σ_{i=1}^p λ_i OTε(ν_i, ν) = Σ_{i=1}^p λ_i inf_{γ_i∈Π(ν_i,ν)} ∫_{X_i×Y} (1/2)|x − y|² dγ_i(x, y) + ε KL(γ_i|ν_i ⊗ ν).

Also, we will denote the squared 2-Wasserstein distance by

    W_2²(ν_i, ν) := inf_{γ_i∈Π(ν_i,ν)} ∫_{X_i×Y} (1/2)|x − y|² dγ_i(x, y).

Then, (6.2.4) becomes

    OTε_BC(ν_1, ..., ν_p) = inf_{ν∈P(Y)} I(ν).

Now, let {νn}_{n∈N} be a minimizing sequence of OTε_BC, i.e. lim_{n→∞} I(νn) = inf_ν I(ν). Then, we can find an M > 0 such that I(νn) ≤ M for all n.
Since ε KL(γ_i|ν_i ⊗ ν) ≥ 0, we have for each n,

    Σ_{i=1}^p λ_i W_2²(ν_i, νn) ≤ I(νn) ≤ M.

Thus, for each 1 ≤ i ≤ p and each n ∈ N, λ_i W_2²(ν_i, νn) ≤ M. Using the duality of the Kantorovich problem and the assumption that the ν_i have finite second moments, we can show that (see Appendix 3)

    sup_n ∫ |x|² dνn ≤ C, for some constant C.

Hence, {νn} is tight (see Appendix 4). Then, by Prokhorov's theorem, there exists a subsequence {νn} (not relabeled) that converges weakly to some ν* ∈ P(Y). Since ∫ |x|² dν* ≤ lim inf_n ∫ |x|² dνn ≤ C, we have that ν* ∈ P_2(Y). Here, the first inequality is again due to the weak convergence of measures against lower semi-continuous costs bounded from below ([19], Theorem A.3.12). Note that ∫_X |x|² dν = 2 W_2²(ν, δ_0) for any ν ∈ P_2(Y).

Now, we will prove that ν* is the desired minimizer. Observe that

    inf_ν I(ν) = lim_n I(νn)
              = lim inf_n Σ_{i=1}^p λ_i OTε(ν_i, νn)
              ≥ Σ_{i=1}^p λ_i OTε(ν_i, ν*)   (by Lemma 6.2.1)
              = I(ν*).

This proves that (6.2.4) has a minimizer.

Remark 6.2.3. Note that we assumed that X_1, ..., X_p, Y are compact only to be consistent with the assumptions on the ERBC problem in the original paper [38]. However, we do not require compactness of the spaces in the proof; hence Theorem 6.2.2 also holds for X_1 = ... = X_p = Y = R^d.

6.2.3 Duality

We define the dual functional of the ERBC problem as

    Jε_BC(ϕ_1, ..., ϕ_p, ψ_1, ..., ψ_p) = Σ_{i=1}^p λ_i ( ∫_{X_i} ϕ_i dν_i − ε ∫_{X_i×Y} e^{(ϕ_i(x_i)+ψ_i(y)−c_i(x_i,y))/ε} d(ν_i ⊗ ν*)(x_i, y) ).    (6.2.5)

Since we already know the existence of a barycenter for the minimization problem (6.2.4), say ν*, we will use ν* in the following duality results. Now we will present the duality result for the ERBC problem discussed in [38].

Theorem 6.2.4. ([38], Theorem 3.1) The dual formulation of (6.2.4) is
    sup_{{(ϕ_i∈C_b(X_i), ψ_i∈C_b(Y))}_{i=1}^p, Σ_{i=1}^p λ_iψ_i=0} Σ_{i=1}^p λ_i ( ∫_{X_i} ϕ_i(x_i) dν_i(x_i) − ε ∫_{X_i×Y} e^{(ϕ_i(x_i)+ψ_i(y)−c_i(x_i,y))/ε} d(ν_i ⊗ ν*)(x_i, y) ).    (6.2.6)

Moreover, strong duality holds in the sense that the infimum of (6.2.4) equals the supremum of (6.2.6), and a solution to (6.2.4) exists. If {(ϕ_i, ψ_i)}_{i=1}^p solves (6.2.6), then each (ϕ_i, ψ_i) is a solution to the dual formulation (6.1.5) of OTε(ν_i, ν*).

The proof relies on the convex duality theory of locally convex topological spaces. Now, we will prove that dual maximizers for (6.2.6) exist.

Theorem 6.2.5. Given an integer p ≥ 2, let X_1, ..., X_p, Y be compact subsets of R^d, c_i : X_i × Y → R_+ be symmetric, bounded cost functions such that for each 1 ≤ i ≤ p, c_i is L_i-Lipschitz, (ν_1, ..., ν_p) be a p-tuple of probability measures with ν_i ∈ P(X_i), (λ_1, ..., λ_p) be a p-tuple of positive real numbers with Σ_{i=1}^p λ_i = 1, and ε > 0. Then, there exist functions {(ϕ_i ∈ L¹(X_i, ν_i), ψ_i ∈ L¹(Y, ν*))}_{i=1}^p that solve (6.2.6).

Proof. We will rewrite the dual formulation as

    J*_BC := sup_{{(ϕ_i∈C_b(X_i), ψ_i∈C_b(Y))}_{i=1}^p, Σ_{i=1}^p λ_iψ_i=0} Σ_{i=1}^p λ_i D(ϕ_i, ψ_i),    (6.2.7)

where

    D(ϕ_i, ψ_i) := ∫_{X_i} ϕ_i dν_i + ∫_Y ψ_i dν* − ε ∫_{X_i×Y} e^{(ϕ_i+ψ_i−c_i)/ε} d(ν_i ⊗ ν*).

Let (ϕ_{1,n}, ..., ϕ_{p,n}, ψ_{1,n}, ..., ψ_{p,n})_{n∈N} be a maximizing sequence, i.e. for each i = 1, ..., p, ϕ_{i,n} ∈ C_b(X_i), ψ_{i,n} ∈ C_b(Y) with Σ_{i=1}^p λ_i ψ_{i,n} = 0 and lim_{n→∞} Σ_{i=1}^p λ_i D(ϕ_{i,n}, ψ_{i,n}) = J*_BC.

For each 1 ≤ i ≤ p define ϕ̃_{i,n} = ψ_{i,n}^{(c_i,ε)}. By Lemma 6.1.8, we have that

    D(ϕ_{i,n}, ψ_{i,n}) ≤ D(ϕ̃_{i,n}, ψ_{i,n}).    (6.2.8)

Now for each 1 ≤ i ≤ p − 1 define k_{i,n} = −σ_{ψ_{i,n}}, and set k_{p,n} = −Σ_{i=1}^{p−1} λ_i k_{i,n} / λ_p (the definition of the soft-max operator σ_{ψ_{i,n}} is given in Definition 6.1.3). Note that Σ_{i=1}^p λ_i k_{i,n} = 0.

By Proposition 6.1.7 part (iii), we have that |ψ_{i,n}^{(c_i,ε)} + σ_{ψ_{i,n}}| ≤ ∥c_i∥_∞. A simple calculation then gives |ψ_{i,n}^{(c_i,ε)} − k_{i,n}| ≤ ∥c_i∥_∞ for each 1 ≤ i ≤ p − 1. Now, for each 1 ≤ i ≤ p, we define ϕ*_{i,n} = ϕ̃_{i,n} − k_{i,n} and ψ*_{i,n} = ψ_{i,n} + k_{i,n}.
Then for each 1 ≤ i ≤ p − 1, we have ∥ϕ*_{i,n}∥_∞ ≤ ∥c_i∥_∞, i.e.

    sup_n ∥ϕ*_{i,n}∥_∞ < ∞.    (6.2.9)

Observe that

    Σ_{i=1}^p λ_i ψ*_{i,n} = Σ_{i=1}^p λ_i (ψ_{i,n} + k_{i,n}) = 0.

Since for each 1 ≤ i ≤ p, c_i is L_i-Lipschitz, by Proposition 6.1.7 part (i), for each 1 ≤ i ≤ p and n ∈ N, ψ_{i,n}^{(c_i,ε)} is L_i-Lipschitz. Hence for each 1 ≤ i ≤ p − 1, ϕ*_{i,n} = ψ_{i,n}^{(c_i,ε)} − k_{i,n} is Lipschitz continuous with Lipschitz constant at most max_i L_i.

Now, we will prove that for each 1 ≤ i ≤ p, sup_n ∥ψ*_{i,n}∥_{L¹(Y,dν*)} < ∞. Let 1 ≤ i ≤ p − 1 be arbitrary. Then,

    ∫_Y e^{ψ*_{i,n}/ε} dν* = ∫_Y e^{(ψ_{i,n}+k_{i,n})/ε} dν*
                           = ∫_Y e^{(ψ_{i,n}−σ_{ψ_{i,n}})/ε} dν*
                           = ∫_Y e^{ψ_{i,n}/ε} e^{−log(∫_Y e^{ψ_{i,n}/ε} dν*)} dν*
                           = ( ∫_Y e^{ψ_{i,n}/ε} dν* )^{−1} ∫_Y e^{ψ_{i,n}/ε} dν*
                           = 1.

Since for a given 1 ≤ q < ∞ there is a constant c such that c t^q ≤ e^t for each t > 0, we have for each 1 ≤ q < ∞,

    ∫_Y ( (ψ*_{i,n})_+ / ε )^q dν* ≤ (1/c) ∫_Y e^{ψ*_{i,n}/ε} dν*.

Thus,

    ∫_Y ( (ψ*_{i,n})_+ )^q dν* ≤ ε^q / c, for some constant c.

So sup_n ∥(ψ*_{i,n})_+∥_{L^q(Y,dν*)} < ∞ for each 1 ≤ q < ∞ and each 1 ≤ i ≤ p − 1. In particular,

    sup_n ∥(ψ*_{i,n})_+∥_{L¹(Y,dν*)} < ∞, for all 1 ≤ i ≤ p − 1.    (6.2.10)

Now, since (ϕ*_{i,n}, ψ*_{i,n})_{1≤i≤p, n∈N} is a maximizing sequence, there is some constant c₁ such that

    −c₁ ≤ Σ_{i=1}^p λ_i D(ϕ*_{i,n}, ψ*_{i,n})
        = Σ_{i=1}^p λ_i ∫_{X_i} ϕ*_{i,n} dν_i − ε Σ_{i=1}^p λ_i ∫_{X_i×Y} e^{(ϕ*_{i,n}+ψ*_{i,n}−c_i)/ε} d(ν_i ⊗ ν*)
        = Σ_{i=1}^{p−1} λ_i ∫_{X_i} ϕ*_{i,n} dν_i + λ_p ∫_{X_p} ϕ*_{p,n} dν_p − ε Σ_{i=1}^{p−1} λ_i ∫_{X_i×Y} e^{(ϕ*_{i,n}+ψ*_{i,n}−c_i)/ε} d(ν_i ⊗ ν*) − ελ_p ∫_{X_p×Y} e^{(ϕ*_{p,n}+ψ*_{p,n}−c_p)/ε} d(ν_p ⊗ ν*).    (6.2.11)

(In the first equality the terms Σ_i λ_i ∫_Y ψ*_{i,n} dν* vanish, since Σ_i λ_i ψ*_{i,n} = 0.) Recall that, for each 1 ≤ i ≤ p, ϕ*_{i,n} = ψ_{i,n}^{(c_i,ε)} − k_{i,n} and ψ*_{i,n} = ψ_{i,n} + k_{i,n}. Hence,

    ∫_{X_i×Y} e^{(ϕ*_{i,n}+ψ*_{i,n}−c_i)/ε} d(ν_i ⊗ ν*)
        = ∫_{X_i×Y} e^{(ψ_{i,n}^{(c_i,ε)}−k_{i,n}+ψ_{i,n}+k_{i,n}−c_i)/ε} d(ν_i ⊗ ν*)
        = ∫_{X_i×Y} e^{(ψ_{i,n}^{(c_i,ε)}+ψ_{i,n}−c_i)/ε} d(ν_i ⊗ ν*)
        = ∫_{X_i} e^{ψ_{i,n}^{(c_i,ε)}/ε} ( ∫_Y e^{(ψ_{i,n}−c_i)/ε} dν* ) dν_i
        = ∫_{X_i} e^{ψ_{i,n}^{(c_i,ε)}/ε} · e^{−ψ_{i,n}^{(c_i,ε)}/ε} dν_i
        = 1.    (6.2.12)
The penultimate equality holds since

    ψ_{i,n}^{(c_i,ε)}(x) = −ε log( ∫_Y e^{(ψ_{i,n}−c_i)/ε} dν* )  ⟹  e^{−ψ_{i,n}^{(c_i,ε)}(x)/ε} = ∫_Y e^{(ψ_{i,n}−c_i)/ε} dν*.

Also, since ∥ϕ*_{i,n}∥_∞ ≤ ∥c_i∥_∞ for each 1 ≤ i ≤ p − 1, we have Σ_{i=1}^{p−1} λ_i ∫_{X_i} ϕ*_{i,n} dν_i ≤ max_i ∥c_i∥_∞ (1 − λ_p). Now, the inequality (6.2.11) becomes

    −c₁ ≤ max_i ∥c_i∥_∞ (1 − λ_p) + λ_p ∫_{X_p} ϕ*_{p,n} dν_p − ε(1 − λ_p) − ελ_p ∫_{X_p×Y} e^{(ϕ*_{p,n}+ψ*_{p,n}−c_p)/ε} d(ν_p ⊗ ν*).

Hence, there exists a constant c₂ such that

    c₂/λ_p ≤ ∫_{X_p} ϕ*_{p,n} dν_p − ε ∫_{X_p×Y} e^{(ϕ*_{p,n}+ψ*_{p,n}−c_p)/ε} d(ν_p ⊗ ν*).    (6.2.13)

Now, consider the right-hand side of (6.2.13). Since f(t) = t − εe^{(t−a)/ε} is concave and attains its maximum at t = a,

    ∫_{X_p} ϕ*_{p,n} dν_p − ε ∫_{X_p×Y} e^{(ϕ*_{p,n}+ψ*_{p,n}−c_p)/ε} d(ν_p ⊗ ν*)
        = ∫_{X_p×Y} ( ϕ*_{p,n} − ε e^{(ϕ*_{p,n}−(c_p−ψ*_{p,n}))/ε} ) d(ν_p ⊗ ν*)
        ≤ ∫_{X_p×Y} ( c_p − ψ*_{p,n} − ε ) d(ν_p ⊗ ν*)
        ≤ −∫_Y ψ*_{p,n} dν* + ∥c_p∥_∞ − ε.

Thus, by (6.2.13), there exists a constant c₃ such that

    c₃ ≤ −∫_Y ψ*_{p,n} dν*.    (6.2.14)

Since Σ_{i=1}^p λ_i ψ*_{i,n} = 0, we get that

    c₃ ≤ (1/λ_p) Σ_{i=1}^{p−1} λ_i ∫_Y ψ*_{i,n} dν*.

Hence,

    λ_p c₃ ≤ Σ_{i=1}^{p−1} λ_i ∫_Y (ψ*_{i,n})_+ dν* − Σ_{i=1}^{p−1} λ_i ∫_Y (ψ*_{i,n})_− dν*.

Thus, by (6.2.10), there is some constant c₄ such that

    Σ_{i=1}^{p−1} λ_i ∫_Y (ψ*_{i,n})_− dν* ≤ c₄.

Since λ_i ∫_Y (ψ*_{i,n})_− dν* ≥ 0, we have ∫_Y (ψ*_{i,n})_− dν* ≤ c₄/λ_i for each 1 ≤ i ≤ p − 1. This gives us that sup_n ∥(ψ*_{i,n})_−∥_{L¹(Y,dν*)} < ∞ for each 1 ≤ i ≤ p − 1. Hence, for each 1 ≤ i ≤ p − 1,

    ∫_Y |ψ*_{i,n}| dν* = ∫_Y (ψ*_{i,n})_+ dν* + ∫_Y (ψ*_{i,n})_− dν* < ∞,

uniformly in n. Thus we have, for each 1 ≤ i ≤ p − 1, sup_n ∥ψ*_{i,n}∥_{L¹(Y,dν*)} < ∞. Now observe that

    ∥ψ*_{p,n}∥_{L¹(Y,dν*)} = ∥ (1/λ_p) Σ_{i=1}^{p−1} λ_i ψ*_{i,n} ∥_{L¹(Y,dν*)}
        ≤ (1/λ_p) Σ_{i=1}^{p−1} λ_i ∥ψ*_{i,n}∥_{L¹(Y,dν*)}
        ≤ ((1 − λ_p)/λ_p) max_i ∥ψ*_{i,n}∥_{L¹(Y,dν*)}
        < ∞.

Hence,

    sup_n ∥ψ*_{i,n}∥_{L¹(Y,dν*)} < ∞, for all 1 ≤ i ≤ p.    (6.2.15)

Now we will prove that sup_n ∥ϕ*_{p,n}∥_{L¹(X_p,dν_p)} < ∞.
Observe that, for a given 1 ≤ i ≤ p, ϕ*_{i,n} = (ψ*_{i,n})^{(c_i,ε)}, so

    ϕ*_{i,n}(x) = −ε log( ∫_Y e^{(ψ*_{i,n}−c_i)/ε} dν* )
        ≤ −ε ∫_Y (ψ*_{i,n} − c_i)/ε dν*   (by Jensen's inequality)
        = ∫_Y (c_i − ψ*_{i,n}) dν*
        ≤ max_{1≤i≤p} ∥c_i∥_∞ − ∫_Y ψ*_{i,n} dν*
        ≤ c₅, for some constant c₅.

The last inequality above holds by (6.2.15). Hence, for all 1 ≤ i ≤ p,

    ϕ*_{i,n}(x) ≤ c₅, for all x ∈ X_i.    (6.2.16)

In particular,

    ∫_{X_p} ϕ*_{p,n} dν_p ≤ c₅,    (6.2.17)

and

    ∫_{X_p} (ϕ*_{p,n})_+ dν_p ≤ max{0, c₅}.    (6.2.18)

In (6.2.13), since −ε ∫_{X_p×Y} e^{(ϕ*_{p,n}+ψ*_{p,n}−c_p)/ε} d(ν_p ⊗ ν*) ≤ 0, we have that

    c₂/λ_p ≤ ∫_{X_p} ϕ*_{p,n} dν_p.    (6.2.19)

Combining (6.2.17) and (6.2.19), we get

    c₂/λ_p ≤ ∫_{X_p} ϕ*_{p,n} dν_p ≤ c₅,

i.e.

    c₂/λ_p ≤ ∫_{X_p} (ϕ*_{p,n})_+ dν_p − ∫_{X_p} (ϕ*_{p,n})_− dν_p ≤ c₅.

Thus, by (6.2.18), there is some constant c₆ such that

    ∫_{X_p} (ϕ*_{p,n})_− dν_p ≤ c₆.    (6.2.20)

Thus, we have that

    ∫_{X_p} |ϕ*_{p,n}| dν_p = ∫_{X_p} (ϕ*_{p,n})_+ dν_p + ∫_{X_p} (ϕ*_{p,n})_− dν_p ≤ max{0, c₅} + c₆.

Hence, sup_n ∥ϕ*_{p,n}∥_{L¹(X_p,dν_p)} < ∞. For simplicity, we will only use the uniform L¹-boundedness of the ϕ*_{i,n} for 1 ≤ i ≤ p − 1; the argument could later be sharpened using their uniform boundedness (6.2.9).

Now, since for each 1 ≤ i ≤ p, (ϕ*_{i,n})_{n∈N} is a sequence with sup_n ∥ϕ*_{i,n}∥_{L¹(X_i,dν_i)} < ∞, by Komlós' theorem ([32]), there is a subsequence (ϕ*_{i,n_m})_{m∈N} such that the averages (1/N) Σ_{m=1}^N ϕ*_{i,n_m} converge pointwise ν_i-a.e. to some ϕ*_i as N → ∞, and the same is true for any further subsequence of (ϕ*_{i,n_m}). Similarly, since for each 1 ≤ i ≤ p, sup_n ∥ψ*_{i,n}∥_{L¹(Y,dν*)} < ∞, for each 1 ≤ i ≤ p we can find a subsequence (ψ*_{i,n_m})_{m∈N} such that (1/N) Σ_{m=1}^N ψ*_{i,n_m} converges pointwise ν*-a.e. to some ψ*_i as N → ∞, and the same is true for any further subsequence of (ψ*_{i,n_m}).

Recall that we have

    sup_{ϕ_i,ψ_i} Σ_{i=1}^p λ_i D(ϕ_i, ψ_i) = lim_{n→∞} Σ_{i=1}^p λ_i D(ϕ_{i,n}, ψ_{i,n}) = lim_{m→∞} Σ_{i=1}^p λ_i D(ϕ_{i,n_m}, ψ_{i,n_m}).

Now, fix δ > 0.
Then, there exists an integer N₀ such that for each m > N₀,

    sup_{ϕ_i,ψ_i} Σ_{i=1}^p λ_i D(ϕ_i, ψ_i) − δ ≤ Σ_{i=1}^p λ_i D(ϕ_{i,n_m}, ψ_{i,n_m})
        ≤ Σ_{i=1}^p λ_i D(ϕ̃_{i,n_m}, ψ_{i,n_m})   (by (6.2.8))
        = Σ_{i=1}^p λ_i D(ϕ*_{i,n_m}, ψ*_{i,n_m}).

Now, consider the subsequences {ϕ*_{i,n_{N₀+m}}}_{m∈N} and {ψ*_{i,n_{N₀+m}}}_{m∈N}. Note that the averages {(1/N) Σ_{m=1}^N ϕ*_{i,n_{N₀+m}}}_{N∈N} and {(1/N) Σ_{m=1}^N ψ*_{i,n_{N₀+m}}}_{N∈N} also converge to ϕ*_i, ν_i-a.e. and to ψ*_i, ν*-a.e., respectively. Then, for any N ≥ 1,

    sup_{ϕ_i,ψ_i} Σ_{i=1}^p λ_i D(ϕ_i, ψ_i) − δ ≤ Σ_{i=1}^p λ_i (1/N) Σ_{m=1}^N D(ϕ*_{i,n_{N₀+m}}, ψ*_{i,n_{N₀+m}})
        ≤ Σ_{i=1}^p λ_i D( (1/N) Σ_{m=1}^N ϕ*_{i,n_{N₀+m}}, (1/N) Σ_{m=1}^N ψ*_{i,n_{N₀+m}} )
        = Σ_{i=1}^p λ_i ∫_{X_i} (1/N) Σ_{m=1}^N ϕ*_{i,n_{N₀+m}} dν_i
          − ε Σ_{i=1}^p λ_i ∫_{X_i×Y} e^{((1/N)Σ_{m=1}^N ϕ*_{i,n_{N₀+m}} + (1/N)Σ_{m=1}^N ψ*_{i,n_{N₀+m}} − c_i)/ε} d(ν_i ⊗ ν*).    (6.2.21)

Note that on the first line above we take the average over m from 1 to N on both sides, and on the second line we use the concavity of the functional D (see Appendix 5). Then,

    sup_{ϕ_i,ψ_i} Σ_{i=1}^p λ_i D(ϕ_i, ψ_i) − δ
        ≤ lim sup_{N→∞} { Σ_{i=1}^p λ_i ∫_{X_i} (1/N) Σ_{m=1}^N ϕ*_{i,n_{N₀+m}} dν_i − ε Σ_{i=1}^p λ_i ∫_{X_i×Y} e^{((1/N)Σ_{m=1}^N ϕ*_{i,n_{N₀+m}} + (1/N)Σ_{m=1}^N ψ*_{i,n_{N₀+m}} − c_i)/ε} d(ν_i ⊗ ν*) }
        ≤ Σ_{i=1}^p λ_i lim sup_{N→∞} ∫_{X_i} (1/N) Σ_{m=1}^N ϕ*_{i,n_{N₀+m}} dν_i − ε Σ_{i=1}^p λ_i lim inf_{N→∞} ∫_{X_i×Y} e^{((1/N)Σ_{m=1}^N ϕ*_{i,n_{N₀+m}} + (1/N)Σ_{m=1}^N ψ*_{i,n_{N₀+m}} − c_i)/ε} d(ν_i ⊗ ν*).    (6.2.22)

Now, we will consider each of the limits in (6.2.22). Note that by (6.2.16), we have sup_m sup_{x∈X_i} ϕ*_{i,n_{N₀+m}}(x) < ∞. Hence, by Fatou's Lemma (lim sup version),

    lim sup_{N→∞} ∫_{X_i} (1/N) Σ_{m=1}^N ϕ*_{i,n_{N₀+m}} dν_i ≤ ∫_{X_i} lim sup_{N→∞} (1/N) Σ_{m=1}^N ϕ*_{i,n_{N₀+m}} dν_i = ∫_{X_i} ϕ*_i dν_i.    (6.2.23)

Since for all 1 ≤ i ≤ p, ( (1/N) Σ_{m=1}^N ϕ*_{i,n_{N₀+m}} + (1/N) Σ_{m=1}^N ψ*_{i,n_{N₀+m}} − c_i ) converges pointwise ν_i ⊗ ν*-a.e. to (ϕ*_i + ψ*_i − c_i), by Fatou's Lemma,

    ∫_{X_i×Y} e^{(ϕ*_i+ψ*_i−c_i)/ε} d(ν_i ⊗ ν*) ≤ lim inf_{N→∞} ∫_{X_i×Y} e^{((1/N)Σ_{m=1}^N ϕ*_{i,n_{N₀+m}} + (1/N)Σ_{m=1}^N ψ*_{i,n_{N₀+m}} − c_i)/ε} d(ν_i ⊗ ν*).
(6.2.24)

Thus, by combining (6.2.22), (6.2.23) and (6.2.24), we get

    sup_{ϕ_i,ψ_i} Σ_{i=1}^p λ_i D(ϕ_i, ψ_i) − δ ≤ Σ_{i=1}^p λ_i ∫_{X_i} ϕ*_i dν_i − ε Σ_{i=1}^p λ_i ∫_{X_i×Y} e^{(ϕ*_i+ψ*_i−c_i)/ε} d(ν_i ⊗ ν*) = Σ_{i=1}^p λ_i D(ϕ*_i, ψ*_i).    (6.2.25)

Since δ > 0 is arbitrary, letting δ → 0, we get that

    sup_{ϕ_i,ψ_i} Σ_{i=1}^p λ_i D(ϕ_i, ψ_i) ≤ Σ_{i=1}^p λ_i D(ϕ*_i, ψ*_i).

Also, since Σ_{i=1}^p λ_i ψ*_{i,n_m} = 0, we have that

    lim_{N→∞} Σ_{i=1}^p λ_i (1/N) Σ_{m=1}^N ψ*_{i,n_m} = Σ_{i=1}^p λ_i ψ*_i = 0, ν*-a.e.

Thus we can conclude that {ϕ*_i, ψ*_i}_{i=1}^p is a maximizer for (6.2.6).

Remark 6.2.6. Note that, since N₀ depends on δ, for all 1 ≤ i ≤ p, the sets

    A_{ϕ_i} = { x_i ∈ X_i : {ϕ*_{i,n_{N₀+m}}} does not converge to ϕ*_i }

and

    A_{ψ_i} = { y ∈ Y : {ψ*_{i,n_{N₀+m}}} does not converge to ψ*_i }

depend on the choice of δ. However, since we only consider integrals against the ν_i and ν*, these null sets do not affect our calculations.

Remark 6.2.7. Also observe that, even though we have the strong duality result for {(ϕ_i ∈ C_b(X_i), ψ_i ∈ C_b(Y))}_{i=1}^p, we obtain existence for {(ϕ_i ∈ L¹(X_i, ν_i), ψ_i ∈ L¹(Y, ν*))}_{i=1}^p. We may get better regularity for ϕ*_i, 1 ≤ i ≤ p − 1, due to the uniform boundedness of the ϕ*_{i,n}, 1 ≤ i ≤ p − 1 (see (6.2.9)).

BIBLIOGRAPHY

[1] M. Agueh and G. Carlier. “Barycenters in the Wasserstein space”. In: SIAM Journal on Mathematical Analysis 43.2 (2011), pp. 904–924.

[2] L. Ambrosio, A. Bressan, D. Helbing, A. Klar, E. Zuazua, L. Ambrosio, and N. Gigli. “A user’s guide to optimal transport”. In: Modelling and Optimisation of Flows on Networks: Cetraro, Italy 2009, Editors: Benedetto Piccoli, Michel Rascle (2013), pp. 1–155.

[3] L. Ambrosio, N. Gigli, and G. Savaré. Gradient flows: in metric spaces and in the space of probability measures. Springer Science & Business Media, 2005.

[4] E. Anderes, S. Borgwardt, and J. Miller. “Discrete Wasserstein barycenters: Optimal transport for discrete data”. In: Mathematical Methods of Operations Research 84.2 (2016), pp. 389–409.

[5] M. Beiglböck and W.
Schachermayer. “Duality for Borel measurable cost functions”. In: Transactions of the American Mathematical Society 363.8 (2011), pp. 4203–4224.

[6] J.-D. Benamou and Y. Brenier. “A computational fluid mechanics solution to the Monge-Kantorovich mass transfer problem”. In: Numerische Mathematik 84.3 (2000), pp. 375–393.

[7] J. Bigot, E. Cazelles, and N. Papadakis. “Penalization of barycenters in the Wasserstein space”. In: SIAM Journal on Mathematical Analysis 51.3 (2019), pp. 2261–2285.

[8] S. Borgwardt and S. Patterson. “Improved linear programs for discrete barycenters”. In: Informs Journal on Optimization 2.1 (2020), pp. 14–33.

[9] J. M. Borwein, A. S. Lewis, and R. D. Nussbaum. “Entropy minimization, DAD problems, and doubly stochastic kernels”. In: Journal of Functional Analysis 123.2 (1994), pp. 264–307.

[10] Y. Brenier. “Décomposition polaire et réarrangement monotone des champs de vecteurs”. In: CR Acad. Sci. Paris Sér. I Math. 305 (1987), pp. 805–808.

[11] Y. Brenier. “Optimal transportation of particles, fluids and currents”. In: Variational methods for evolving objects. Vol. 67. Mathematical Society of Japan, 2015, pp. 59–86.

[12] G. Carlier, K. Eichinger, and A. Kroshnin. “Entropic-Wasserstein barycenters: PDE characterization, regularity, and CLT”. In: SIAM Journal on Mathematical Analysis 53.5 (2021), pp. 5880–5914.

[13] G. Carlier and I. Ekeland. “Matching for teams”. In: Economic Theory 42.2 (2010), pp. 397–418.

[14] G. Carlier and M. Laborde. “A differential approach to the multi-marginal Schrödinger system”. In: SIAM Journal on Mathematical Analysis 52.1 (2020), pp. 709–717.

[15] I. Csiszár. “I-divergence geometry of probability distributions and minimization problems”. In: The Annals of Probability (1975), pp. 146–158.

[16] M. Cuturi. “Sinkhorn distances: Lightspeed computation of optimal transport”. In: Advances in Neural Information Processing Systems 26 (2013).

[17] S. Di Marino and A. Gerolin.
“Optimal transport losses and Sinkhorn algorithm with general convex regularization”. In: arXiv preprint arXiv:2007.00976 (2020).

[18] A. N. Doledenok. “On Kantorovich multimarginal optimal transportation problems with density constraints”. 2018.

[19] P. Dupuis and R. S. Ellis. A weak convergence approach to the theory of large deviations. John Wiley & Sons, 2011.

[20] I. Ekeland and R. Temam. Convex analysis and variational problems. SIAM, 1999.

[21] J. Feydy, T. Séjourné, F.-X. Vialard, S.-i. Amari, A. Trouvé, and G. Peyré. “Interpolating between optimal transport and MMD using Sinkhorn divergences”. In: The 22nd International Conference on Artificial Intelligence and Statistics. PMLR. 2019, pp. 2681–2690.

[22] W. Gangbo and R. J. McCann. “Optimal maps in Monge’s mass transport problem”. In: Comptes Rendus de l’Académie des Sciences - Série I - Mathématique 321.12 (1995), p. 1653.

[23] W. Gangbo and R. J. McCann. “The geometry of optimal transportation”. In: Acta Math. 177.2 (1996), pp. 113–161.

[24] W. Gangbo and A. Święch. “Optimal maps for the multidimensional Monge-Kantorovich problem”. In: Communications on Pure and Applied Mathematics 51.1 (1998), pp. 23–45.

[25] Z. Ge, S. Liu, Z. Li, O. Yoshie, and J. Sun. “OTA: Optimal transport assignment for object detection”. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021, pp. 303–312.

[26] T. Geiger, D. Wachsmuth, and G. Wachsmuth. “Optimal control of ODEs with state suprema”. In: Math. Control Relat. Fields 11.3 (2021), pp. 555–578.

[27] A. Genevay, M. Cuturi, G. Peyré, and F. Bach. “Stochastic optimization for large-scale optimal transport”. In: Advances in Neural Information Processing Systems 29 (2016).

[28] A. Gerolin, A. Kausamo, and T. Rajala. “Multi-marginal entropy-transport with repulsive cost”. In: Calculus of Variations and Partial Differential Equations 59.3 (2020), p. 90.

[29] N.
Gigli and L. Tamanini. “Second order differentiation formula on RCD(K, N) spaces”. In: Rendiconti Lincei 29.2 (2018), pp. 377–386.

[30] L. V. Kantorovich. “On the translocation of masses”. In: Dokl. Akad. Nauk. USSR (NS). Vol. 37. 1942, pp. 199–201.

[31] J. Kitagawa and B. Pass. “The multi-marginal optimal partial transport problem”. In: Forum of Mathematics, Sigma. Vol. 3. Cambridge University Press. 2015.

[32] J. Komlós. “A generalization of a problem of Steinhaus”. In: Acta Mathematica Academiae Scientiarum Hungaricae 18.1-2 (1967), pp. 217–229.

[33] J. Korman and R. McCann. “Optimal transportation with capacity constraints”. In: Transactions of the American Mathematical Society 367.3 (2015), pp. 1501–1521.

[34] J. Korman, R. J. McCann, and C. Seis. “Dual potentials for capacity constrained optimal transport”. In: Calculus of Variations and Partial Differential Equations 54.1 (2015), pp. 573–584.

[35] J. Korman, R. J. McCann, and C. Seis. “An elementary approach to linear programming duality with application to capacity constrained transport”. In: J. Convex Anal. 22.3 (2015), pp. 797–808.

[36] H. Lavenant, S. Claici, E. Chien, and J. Solomon. “Dynamical optimal transport on discrete surfaces”. In: ACM Transactions on Graphics (TOG) 37.6 (2018), pp. 1–16.

[37] C. Léonard. “A survey of the Schrödinger problem and some of its connections with optimal transport”. In: Discrete Contin. Dyn. Syst. 34.4 (2014), pp. 1533–1574.

[38] L. Li, A. Genevay, M. Yurochkin, and J. M. Solomon. “Continuous regularized Wasserstein barycenters”. In: Advances in Neural Information Processing Systems 33 (2020), pp. 17755–17765.

[39] S. Di Marino and A. Gerolin. “An optimal transport approach for the Schrödinger bridge problem and convergence of Sinkhorn algorithm”. In: Journal of Scientific Computing 85.2 (2020), pp. 1–28.

[40] R. J. McCann. “A convexity principle for interacting gases”. In: Advances in Mathematics 128.1 (1997), pp. 153–179.

[41] G. Monge.
“Mémoire sur la théorie des déblais et des remblais”. In: Mém. Math. Phys. Acad. Royale Sci. (1781), pp. 666–704.

[42] N. Papadakis. “Transport Optimal pour le Traitement d’Images”. 2015.

[43] B. Pass. “Multi-marginal optimal transport: theory and applications”. In: ESAIM: Mathematical Modelling and Numerical Analysis 49.6 (2015), pp. 1771–1790.

[44] B. Pass. “On the local structure of optimal measures in the multi-marginal optimal transportation problem”. In: Calculus of Variations and Partial Differential Equations 43.3 (2012), pp. 529–536.

[45] G. Peyré, M. Cuturi, et al. “Computational optimal transport: With applications to data science”. In: Foundations and Trends® in Machine Learning 11.5-6 (2019), pp. 355–607.

[46] S. T. Rachev and L. Rüschendorf. Mass Transportation Problems: Volume I: Theory. Vol. 1. Springer Science & Business Media, 1998.

[47] R. T. Rockafellar. Convex Analysis. 2nd printing ed. 1972.

[48] H. L. Royden and P. Fitzpatrick. Real Analysis. Vol. 32. Macmillan New York, 1988.

[49] L. Rüschendorf. “Convergence of the iterative proportional fitting procedure”. In: The Annals of Statistics (1995), pp. 1160–1174.

[50] T. Salimans, H. Zhang, A. Radford, and D. Metaxas. “Improving GANs using optimal transport”. In: arXiv preprint arXiv:1803.05573 (2018).

[51] F. Santambrogio. Optimal Transport for Applied Mathematicians. Birkhäuser, NY, 2015.

[52] D. Simon and A. Aberdam. “Barycenters of natural images - constrained Wasserstein barycenters for image morphing”. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020, pp. 7910–7919.

[53] C. Villani. Optimal Transport: Old and New. Vol. 338. Springer, 2009.

[54] C. Villani. Topics in Optimal Transportation. Vol. 58. American Mathematical Soc., 2021.

[55] P. Zhao and Z.-H. Zhou. “Label distribution learning by optimal transport”. In: Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 32. 1. 2018.

[56] L. Zhu, Y. Yang, S.
Haker, and A. Tannenbaum. “An image morphing technique based on optimal mass preserving mapping”. In: IEEE Transactions on Image Processing 16.6 (2007), pp. 1481–1495.

APPENDIX

1 Alternative dual functional for the CCOT Problem:

Let X, Y ⊆ R^d. For given f ∈ L¹(X), g ∈ L¹(Y) and 0 ≤ h̃ ∈ L^∞(X × Y), consider

    OT*_CC := sup_{(u,v,w)∈Lip_{c,h̃}} J(u, v, w)    (1.1)

where

    J(u, v, w) := ∫_X u(x) f(x) dx + ∫_Y v(y) g(y) dy + ∫_{X×Y} w(x, y) h̃(x, y) dxdy,    (1.2)

    Lip_{c,h̃} := { (u, v, w) : u ∈ L¹(X, f dx), v ∈ L¹(Y, g dy), w ∈ L¹(X × Y, h̃ dxdy), u(x) + v(y) + w(x, y) ≤ c(x, y), and w(x, y) ≤ 0 },    (1.3)

and

    OT*′_CC := sup_{u∈L¹(X), v∈L¹(Y)} J′(u, v),    (1.4)

where

    J′(u, v) := ∫_X u f dx + ∫_Y v g dy − ∫_{X×Y} [−c + u + v]_+ h̃ dxdy.    (1.5)

Proposition 1.1. OT*_CC = OT*′_CC.

Proof. Define A = { (u, v, w) : u ∈ L¹(X, f dx), v ∈ L¹(Y, g dy), and w = −[−c + u + v]_+ }. Then,

    OT*′_CC = sup_{(u,v,w)∈A} J(u, v, w).

First, let (u, v, w) ∈ A be arbitrary. Then w ≤ 0, and w = −[−c + u + v]_+ ≤ −[−c + u + v], which gives u + v + w ≤ c, and w ∈ L¹(X × Y, h̃ dxdy). Hence, (u, v, w) ∈ Lip_{c,h̃}. Thus, A ⊆ Lip_{c,h̃}, so we have

    sup_{(u,v,w)∈A} J(u, v, w) ≤ sup_{(u,v,w)∈Lip_{c,h̃}} J(u, v, w),    (1.6)

i.e.

    OT*′_CC ≤ OT*_CC.    (1.7)

Now observe that

    OT*_CC = sup_{(u,v,w)∈Lip_{c,h̃}} { ∫_X u f dx + ∫_Y v g dy + ∫_{X×Y} w h̃ dxdy }
        = sup_{(u,v,w)∈Lip_{c,h̃}} { ∫_X u f dx + ∫_Y v g dy + ∫_{{(x,y):−c(x,y)+u(x)+v(y)≥0}} w h̃ dxdy + ∫_{{(x,y):−c(x,y)+u(x)+v(y)<0}} w h̃ dxdy }
        ≤ sup_{(u,v)∈L¹(X)×L¹(Y)} { ∫_X u f dx + ∫_Y v g dy + ∫_{{(x,y):−c(x,y)+u(x)+v(y)≥0}} [c(x, y) − u(x) − v(y)] h̃(x, y) dxdy }.    (1.8)

In the last line above, we used the facts that w ≤ c − u − v and w ≤ 0.
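A quick numerical sketch of the mechanism behind Proposition 1.1 (discrete grid and made-up data, names my own): any feasible (u, v, w) in Lip_{c,h̃} satisfies w ≤ −[−c + u + v]_+, so J(u, v, w) ≤ J′(u, v), with equality at the extremal choice w = −[−c + u + v]_+.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
f = rng.random(n); g = rng.random(n)          # densities on X and Y
h = rng.random((n, n))                        # capacity h >= 0
c = rng.random((n, n))                        # cost c(x, y)
u = rng.normal(size=n); v = rng.normal(size=n)

# a feasible w: below both 0 and c - u - v, with some extra slack
w = np.minimum(0.0, c - u[:, None] - v[None, :]) - rng.random((n, n))

J = f @ u + g @ v + np.sum(w * h)
Jp = f @ u + g @ v - np.sum(np.maximum(0.0, -c + u[:, None] + v[None, :]) * h)
assert J <= Jp + 1e-12                         # any feasible w does no better

# equality at the extremal choice w* = -[-c + u + v]_+
w_star = -np.maximum(0.0, -c + u[:, None] + v[None, :])
assert abs(f @ u + g @ v + np.sum(w_star * h) - Jp) < 1e-12
```

Since min(0, c − u − v) = −[−c + u + v]_+ pointwise, the extremal w is always feasible, which is exactly why eliminating w from the supremum loses nothing.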
On the other hand,
\begin{align*}
\mathrm{OT}^{*\prime}_{CC} &= \sup_{(u,v) \in L^1(X) \times L^1(Y)} \Big\{ \int_X u f\, dx + \int_Y v g\, dy - \int_{X \times Y} [-c + u + v]_+\, \tilde{h}\, dx\, dy \Big\} \\
&= \sup_{(u,v)} \Big\{ \int_X u f\, dx + \int_Y v g\, dy - \int_{\{-c+u+v \ge 0\}} [-c + u + v]_+\, \tilde{h}\, dx\, dy - \int_{\{-c+u+v < 0\}} [-c + u + v]_+\, \tilde{h}\, dx\, dy \Big\} \\
&= \sup_{(u,v)} \Big\{ \int_X u f\, dx + \int_Y v g\, dy - \int_{\{-c+u+v \ge 0\}} [-c + u + v]\, \tilde{h}\, dx\, dy \Big\} \\
&= \sup_{(u,v)} \Big\{ \int_X u f\, dx + \int_Y v g\, dy + \int_{\{-c+u+v \ge 0\}} [c - u - v]\, \tilde{h}\, dx\, dy \Big\}, \tag{1.9}
\end{align*}
where the integral over $\{-c+u+v < 0\}$ vanishes because $[-c+u+v]_+ = 0$ there. Combining (1.8) and (1.9), we get
\[
\mathrm{OT}^*_{CC} \le \mathrm{OT}^{*\prime}_{CC}. \tag{1.10}
\]
The inequalities (1.7) and (1.10) together give $\mathrm{OT}^*_{CC} = \mathrm{OT}^{*\prime}_{CC}$. □

2 The relaxed CCOT problem is strictly convex

Given an integer $p > 2$, a lower semicontinuous function $c : \mathbb{R}^{pd} \to \mathbb{R}$, functions $f_1, \dots, f_p \in L^1(\mathbb{R}^d)$ and $h \in L^1(\mathbb{R}^{pd})$, consider the functional
\[
I_c^\varepsilon(h) := \int_{\mathbb{R}^{pd}} c\, h\, d\hat{x} + \frac{1}{2\varepsilon} \sum_{i=1}^p \big\| \langle h \rangle_{x_i} - f_i \big\|_2^2, \tag{2.1}
\]
where $d\hat{x} = dx_1 \cdots dx_p$ and $\langle h \rangle_{x_i}$ denotes the $i$-th marginal of $h$.

Proposition 2.1. $I_c^\varepsilon$ is strictly convex.

Proof. Let $0 \le h_1, h_2 \le \tilde{h}$ be such that $h_1 \ne h_2$, and let $0 < \lambda < 1$. Then observe that
\begin{align*}
I_c^\varepsilon(\lambda h_1 + (1-\lambda) h_2)
&= \int_{\mathbb{R}^{pd}} c\,(\lambda h_1 + (1-\lambda) h_2)\, d\hat{x} + \frac{1}{2\varepsilon} \sum_{i=1}^p \big\| \langle \lambda h_1 + (1-\lambda) h_2 \rangle_{x_i} - f_i \big\|_2^2 \\
&= \lambda \int_{\mathbb{R}^{pd}} c\, h_1\, d\hat{x} + (1-\lambda) \int_{\mathbb{R}^{pd}} c\, h_2\, d\hat{x} + \frac{1}{2\varepsilon} \sum_{i=1}^p \big\| \langle \lambda h_1 + (1-\lambda) h_2 \rangle_{x_i} - (\lambda + (1-\lambda)) f_i \big\|_2^2 \\
&= \lambda \int_{\mathbb{R}^{pd}} c\, h_1\, d\hat{x} + (1-\lambda) \int_{\mathbb{R}^{pd}} c\, h_2\, d\hat{x} + \frac{1}{2\varepsilon} \sum_{i=1}^p \big\| \lambda (\langle h_1 \rangle_{x_i} - f_i) + (1-\lambda)(\langle h_2 \rangle_{x_i} - f_i) \big\|_2^2 \\
&< \lambda \int_{\mathbb{R}^{pd}} c\, h_1\, d\hat{x} + (1-\lambda) \int_{\mathbb{R}^{pd}} c\, h_2\, d\hat{x} + \frac{\lambda}{2\varepsilon} \sum_{i=1}^p \big\| \langle h_1 \rangle_{x_i} - f_i \big\|_2^2 + \frac{1-\lambda}{2\varepsilon} \sum_{i=1}^p \big\| \langle h_2 \rangle_{x_i} - f_i \big\|_2^2 \\
&= \lambda I_c^\varepsilon(h_1) + (1-\lambda) I_c^\varepsilon(h_2).
\end{align*}
The strict inequality in the penultimate line holds by the strict convexity of the squared $L^2$ norm. □

3 Uniform boundedness for probability measures with finite second moment

Proposition 3.1. Given capacities $\{\tilde{\gamma}_i\}_{i=1}^p \subseteq \mathcal{M}_+(\mathbb{R}^d \times \mathbb{R}^d)$, $p$ probability measures $\nu_1, \dots, \nu_p \in \mathcal{P}_2(\mathbb{R}^d)$, and a sequence of probability measures $\{\tilde{\nu}_n\}_{n \in \mathbb{N}} \subseteq \mathcal{P}_2(\mathbb{R}^d)$ such that $\Pi_{\tilde{\gamma}_i}(\nu_i, \tilde{\nu}_n) \ne \emptyset$ for all $1 \le i \le p$: if there is an $M > 0$ such that $\widetilde{W}_2^2(\nu_i, \tilde{\nu}_n) \le M$ for all $i$ and $n$, then
\[
\sup_n \int |x|^2\, d\tilde{\nu}_n \le C \quad \text{for some constant } C.
\]

Proof. Fix $i \in \{1, \dots, p\}$ and $n \in \mathbb{N}$, and consider the two-marginal capacity constrained OT problem between the measures $\nu_i$ and $\tilde{\nu}_n$ with capacity $\tilde{\gamma}_i$, given by (5.2.3). By the duality,
\[
\widetilde{W}_2^2(\nu_i, \tilde{\nu}_n) = \sup_{B_{i,n}} \Big\{ \int_{\mathbb{R}^d} u(x)\, d\nu_i(x) + \int_{\mathbb{R}^d} v(y)\, d\tilde{\nu}_n(y) + \int_{\mathbb{R}^d \times \mathbb{R}^d} w(x,y)\, d\tilde{\gamma}_i(x,y) \Big\}, \tag{3.1}
\]
where the supremum is taken over the set $B_{i,n}$ of real-valued functions $(u, v, w)$ satisfying $u \in L^1(\nu_i)$, $v \in L^1(\tilde{\nu}_n)$, $w \in L^1(\tilde{\gamma}_i)$, $w \le 0$, and $u(x) + v(y) + w(x,y) \le \frac{1}{2}|x - y|^2$.

Choose $u(x) = \inf_{y \in \mathbb{R}^d} \{\frac{1}{2}|x|^2 + \frac{1}{4}|y|^2 - x \cdot y\}$, $v(y) = \frac{1}{4}|y|^2$, and $w \equiv 0$. Then
\[
u(x) + v(y) + w(x,y) \le \tfrac{1}{2}|x|^2 + \tfrac{1}{4}|y|^2 - x \cdot y + \tfrac{1}{4}|y|^2 = \tfrac{1}{2}|x - y|^2
\]
for all $(x,y) \in \mathbb{R}^d \times \mathbb{R}^d$, so this triple belongs to $B_{i,n}$. Plugging this choice of $u, v, w$ into (3.1), we get
\[
\int_{\mathbb{R}^d} \inf_{y \in \mathbb{R}^d} \Big\{ \tfrac{1}{2}|x|^2 + \tfrac{1}{4}|y|^2 - x \cdot y \Big\}\, d\nu_i(x) + \int_{\mathbb{R}^d} \tfrac{1}{4}|y|^2\, d\tilde{\nu}_n(y) \le \widetilde{W}_2^2(\nu_i, \tilde{\nu}_n). \tag{3.2}
\]
Since $\nabla_y \{\frac{1}{2}|x|^2 + \frac{1}{4}|y|^2 - x \cdot y\} = \frac{1}{2}y - x$, the infimum is attained at $y = 2x$, so $\inf_{y \in \mathbb{R}^d} \{\frac{1}{2}|x|^2 + \frac{1}{4}|y|^2 - x \cdot y\} = -\frac{1}{2}|x|^2$. Thus, (3.2) gives us
\[
\int_{\mathbb{R}^d} \tfrac{1}{4}|y|^2\, d\tilde{\nu}_n(y) \le \widetilde{W}_2^2(\nu_i, \tilde{\nu}_n) + \tfrac{1}{2} \int_{\mathbb{R}^d} |x|^2\, d\nu_i(x).
\]
Since $\widetilde{W}_2^2(\nu_i, \tilde{\nu}_n) \le M$ and $\nu_i \in \mathcal{P}_2(\mathbb{R}^d)$, there is a constant $C > 0$, independent of $n$, such that
\[
\int_{\mathbb{R}^d} \tfrac{1}{4}|y|^2\, d\tilde{\nu}_n(y) \le C. \qquad \Box
\]

4 An integral condition for tightness

Proposition 4.1 ([3], Remark 5.1.5). Let $X \subseteq \mathbb{R}^d$ and let $\{\nu_n\}_{n \in \mathbb{N}} \subseteq \mathcal{P}_2(X)$ be a sequence of probability measures. If $\sup_n \int |x|^2\, d\nu_n < +\infty$, then $\{\nu_n\}_{n \in \mathbb{N}}$ is tight.

Proof. Let $\delta > 0$ be arbitrary and fix $n \in \mathbb{N}$. Suppose $\sup_n \int |x|^2\, d\nu_n < M$, and let $R_\delta$ be a real number such that $R_\delta > \sqrt{M/\delta}$. Define the compact set $K_\delta = \{x \in \mathbb{R}^d : |x| \le R_\delta\}$. Then
\[
\nu_n(\{x \in \mathbb{R}^d : |x| > R_\delta\}) \le \frac{1}{R_\delta^2} \int_{\mathbb{R}^d} |x|^2\, d\nu_n \le \frac{M}{R_\delta^2} < \delta, \tag{4.1}
\]
where the first inequality follows from Chebyshev's inequality. This proves that $\{\nu_n\}_{n \in \mathbb{N}}$ is tight. □
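The Chebyshev step in (4.1) holds for every probability measure with finite second moment. As a purely illustrative check (the Gaussian sample below is an arbitrary test case, not data from the thesis), one can compare the empirical tail mass $\nu_n(\{|x| > R\})$ against $\frac{1}{R^2} \int |x|^2\, d\nu_n$:

```python
# Illustrative check (not from the thesis) of the Chebyshev bound in (4.1):
# nu({|x| > R}) <= (1/R^2) * \int |x|^2 d(nu), applied to the empirical
# measure of a Gaussian sample in R^2 (an arbitrary test case).
import random

random.seed(0)
d, n = 2, 20000
pts = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
norms_sq = [sum(x * x for x in p) for p in pts]

second_moment = sum(norms_sq) / n                 # empirical second moment
bounds_hold = True
for R in (1.0, 2.0, 3.0):
    tail = sum(s > R * R for s in norms_sq) / n   # nu_n({|x| > R})
    bounds_hold = bounds_hold and tail <= second_moment / R ** 2
    print(R, tail, second_moment / R ** 2)
```

For the empirical measure the bound is exact (it is Markov's inequality applied to $|x|^2$), so it holds for every $R$, not only the three tested.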
5 The entropic dual functional is strictly concave

Given two Polish spaces $X$ and $Y$, a bounded cost function $c : X \times Y \to \mathbb{R}$, two probability measures $\mu \in \mathcal{P}(X)$ and $\nu \in \mathcal{P}(Y)$, and two functions $\phi \in L^{\exp}_\varepsilon(X, d\mu)$, $\psi \in L^{\exp}_\varepsilon(Y, d\nu)$, consider the functional
\[
D(\phi, \psi) := \int_X \phi\, d\mu + \int_Y \psi\, d\nu - \varepsilon \int_{X \times Y} e^{(\phi + \psi - c)/\varepsilon}\, d\mu \otimes \nu. \tag{5.1}
\]

Proposition 5.1. The functional $D$ is strictly concave.

Proof. Let $\phi_1, \phi_2 \in L^{\exp}_\varepsilon(X, d\mu)$, $\psi_1, \psi_2 \in L^{\exp}_\varepsilon(Y, d\nu)$, and let $0 < \lambda < 1$ be a real number. We need to show that
\[
D(\lambda \phi_1 + (1-\lambda)\phi_2,\ \lambda \psi_1 + (1-\lambda)\psi_2) > \lambda D(\phi_1, \psi_1) + (1-\lambda) D(\phi_2, \psi_2).
\]
Since the first two integrals in (5.1) are linear in $(\phi, \psi)$, it is sufficient to show that
\[
\int_{X \times Y} e^{(\lambda \phi_1 + (1-\lambda)\phi_2 + \lambda \psi_1 + (1-\lambda)\psi_2 - c)/\varepsilon}\, d\mu \otimes \nu < \lambda \int_{X \times Y} e^{(\phi_1 + \psi_1 - c)/\varepsilon}\, d\mu \otimes \nu + (1-\lambda) \int_{X \times Y} e^{(\phi_2 + \psi_2 - c)/\varepsilon}\, d\mu \otimes \nu. \tag{5.2}
\]
First, we will prove the pointwise inequality
\[
e^{(\lambda \phi_1 + (1-\lambda)\phi_2 + \lambda \psi_1 + (1-\lambda)\psi_2 - c)/\varepsilon} < \lambda e^{(\phi_1 + \psi_1 - c)/\varepsilon} + (1-\lambda) e^{(\phi_2 + \psi_2 - c)/\varepsilon}. \tag{5.3}
\]
Writing $c = \lambda c + (1-\lambda)c$, observe that
\[
e^{(\lambda \phi_1 + (1-\lambda)\phi_2 + \lambda \psi_1 + (1-\lambda)\psi_2 - \lambda c - (1-\lambda)c)/\varepsilon} = e^{\lambda \frac{\phi_1 + \psi_1 - c}{\varepsilon} + (1-\lambda) \frac{\phi_2 + \psi_2 - c}{\varepsilon}} < \lambda e^{(\phi_1 + \psi_1 - c)/\varepsilon} + (1-\lambda) e^{(\phi_2 + \psi_2 - c)/\varepsilon},
\]
where the last step holds due to the strict convexity of the exponential function. Now, by integrating both sides of (5.3) with respect to $d\mu \otimes \nu$ on $X \times Y$, we get the inequality (5.2). This completes the proof. □
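The concavity inequality for $D$ can also be observed on a small finite instance. In the sketch below, the spaces, cost, regularization parameter, and potential pairs are arbitrary choices made only for illustration:

```python
# Illustrative check (arbitrary finite test case) of Proposition 5.1:
# D evaluated at a convex combination of potential pairs strictly
# exceeds the corresponding combination of D-values.
import math

eps, lam = 0.5, 0.3
X, Y = range(3), range(4)
mu = [1.0 / 3] * 3                      # uniform probability weights
nu = [1.0 / 4] * 4
c = lambda x, y: (x - y) ** 2 / 10.0    # bounded cost (arbitrary choice)

def D(phi, psi):
    lin = sum(phi[x] * mu[x] for x in X) + sum(psi[y] * nu[y] for y in Y)
    pen = sum(math.exp((phi[x] + psi[y] - c(x, y)) / eps) * mu[x] * nu[y]
              for x in X for y in Y)
    return lin - eps * pen

phi1, psi1 = [0.1, -0.2, 0.3], [0.0, 0.5, -0.1, 0.2]
phi2, psi2 = [-0.3, 0.4, 0.0], [0.2, -0.2, 0.1, 0.0]

phi_lam = [lam * a + (1 - lam) * b for a, b in zip(phi1, phi2)]
psi_lam = [lam * a + (1 - lam) * b for a, b in zip(psi1, psi2)]

lhs = D(phi_lam, psi_lam)
rhs = lam * D(phi1, psi1) + (1 - lam) * D(phi2, psi2)
print(lhs > rhs)
```

The linear terms of $D$ contribute equally to both sides, so the strict gap comes entirely from the exponential penalty, exactly as in the pointwise inequality (5.3).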