VARIANTS OF THE OPTIMAL TRANSPORT PROBLEM AND THEIR DUALITY By Chamila Malagoda Gamage A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Mathematics—Doctor of Philosophy 2023 ABSTRACT The classical Optimal Transport (OT) problem studies how to transport one distribution to another in the most efficient way. In the past few decades it has emerged as a very powerful tool in various fields, such as optimization theory, probability theory, partial differential equations, machine learning and data analysis. In this thesis, we will discuss some existing variants of the classical optimal transport problem, such as the capacity constrained OT problem, multi-marginal OT problem, entropy-regularized OT problem and barycenters, and we will introduce a couple of new variants by combining the existing versions. We will also discuss their duality results and some characterizations. To my loving daughter Mishini Olivia... iii ACKNOWLEDGEMENTS The journey of my doctoral studies has been an arduous yet immensely rewarding experience. I have faced numerous challenges, encountered countless obstacles, and navigated uncharted territories. Today, I stand at the culmination of my efforts, filled with a deep sense of satisfaction and accomplishment. None of this would have been possible without the amazing people I have had around me. First and foremost, I am extremely grateful to my advisor, Dr. Jun Kitagawa, for his unwavering support, guidance, and encouragement throughout this challenging process. His knowledge, insights, and belief in my abilities have been invaluable, and I feel incredibly fortunate to have had the opportunity to learn from him. I would also like to express the deepest of gratitude towards the members of my dissertation committee: Dr. Baisheng Yan, Dr. Zhengfang Zhou, and Dr. Russell Schwab, for their guidance through different stages of my Ph.D. career. 
I also take this opportunity to thank the National Science Foundation (NSF) as I was partially supported through my advisor’s NSF grants DMS-1700094 and DMS-2000128 many times in the form of Research Assistantships, which were very helpful for me. I am deeply indebted to Dr. Dejan Slepčev, at Carnegie Mellon University, whose guid- ance, expertise, and invaluable insights have shaped my research and made this thesis pos- sible. Your time, patience, and fruitful discussions have not only expanded my knowledge but also helped me think deeper and approach problems from new perspectives. Your men- torship has been invaluable, and I am truly grateful for the profound impact you have had on my academic growth. Furthermore, I would like to acknowledge the contributions of the professors who have taught and mentored me throughout my academic journey. Special thanks to Dr. Archil Gulisashvili and Dr. Sergiu Aizicovici, at Ohio University: Your dedication to teaching and commitment to excellence have provided me with a solid foundation in Analysis and PDEs. I am grateful for the knowledge and skills you imparted, which have been crucial to my iv research. I would also like to extend a special thanks to my teaching mentors, Andrew Krause, and Tsveta Sendova. Your guidance and support in developing my teaching skills have been instrumental in my growth as an educator. The knowledge and experiences you shared have not only enriched my teaching abilities but have also enhanced my research journey. To my dear friends, thank you for your unwavering support during the ups and downs of this Ph.D. journey. While I cannot mention everyone individually, I would like to express my special thanks to Dimitris Vardakis, Estefania Garcia, Mihalis Paparizos, Ana-Maria Raicu, Arman Travakoli, Joe Melby, Craig Gross, Keshav Sutrave, Rachel Domagalski, Franciska Domokos, Kai Huang, Rui Wang, Danika Van Niel, Samara Chamoun, Reshma Menon, Hitesh Gakhar, Leonardo Abbrescia and Kasun Fernando. 
Your friendship and support have been a constant source of motivation and encouragement. Your willingness to listen, provide feedback, and engage in intellectual discussions has enriched my research and per- sonal growth. I am grateful for the countless hours spent together, both academically and socially, and for the memories we have created. I could not have undertaken this journey without the immense support of my family, who have been my rock throughout this journey. First of all, I would like to thank my loving husband, Lubashan Pathirana, for his continuous support throughout this journey. His love, encouragement, and understanding have been my anchor during the most difficult moments. He has believed in me even when I doubted myself, and his constant presence has given me the strength to persevere. I would like to say a special thank you to my daughter, Mishini Olivia. She came into my life during this doctoral journey, and her presence has given me a reason to try harder, to push through the most difficult moments, and to keep going even when things seemed overwhelming. I would also like to thank my parents, my parents-in- law, and my sister, for their love, encouragement, and constant belief in my abilities. Your support has been the cornerstone of my achievements, and I am forever grateful for your sacrifices and understanding. v Finally, I would like to express my sincere appreciation to all the individuals who have contributed to this thesis, directly or indirectly. Your support, advice, and encouragement have played a significant role in shaping my research and shaping me as a researcher. To all those who have believed in me and supported me throughout this endeavor, I offer my deepest gratitude. Your contributions have been invaluable, and I am humbled by your presence in my life. vi TABLE OF CONTENTS INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 CHAPTER 1 PRELIMINARIES . . . . . . . . . . . . . . . . . . . . 
. . . . . . 3 CHAPTER 2 THE CLASSICAL OPTIMAL TRANSPORT PROBLEM . . . . . 7 2.1 The Primal Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 2.2 Duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 2.3 Properties of the optimizers . . . . . . . . . . . . . . . . . . . . . . . . . 11 CHAPTER 3 CAPACITY CONSTRAINED OT PROBLEM . . . . . . . . . . . 14 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 3.2 The Primal Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 3.3 Duality - Version I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 3.4 Duality - Version II . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 3.5 A further characterization on the optimizers . . . . . . . . . . . . . . . . 19 CHAPTER 4 MULTI-MARGINAL OT PROBLEM . . . . . . . . . . . . . . . . 23 4.1 The Multi-Marginal OT (MMOT) Problem . . . . . . . . . . . . . . . . . 23 4.2 Capacity Constrained Multi-Marginal OT Problem . . . . . . . . . . . . 26 CHAPTER 5 BARYCENTERS . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 5.1 Barycenters in the Wasserstein Space . . . . . . . . . . . . . . . . . . . . 50 5.2 Capacity Constrained Barycenter Problem . . . . . . . . . . . . . . . . . 54 CHAPTER 6 ENTROPY REGULARIZATION . . . . . . . . . . . . . . . . . . 66 6.1 Entropy Regularized Optimal Transport (EROT) Problem . . . . . . . . 66 6.2 Entropy Regularized Barycenter Problem . . . . . . . . . . . . . . . . . . 71 BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86 APPENDIX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
91

INTRODUCTION

The origin of the Optimal Transport (OT) problem goes back to 1781, when the French mathematician Gaspard Monge introduced a problem in his classic paper Memoire sur la theorie des deblais et des remblais [41]: finding the most efficient way of moving dirt from one place to another, a question motivated by military and economic purposes. However, this problem remained essentially unsolved for almost two centuries, until the Russian mathematician and economist Leonid Kantorovich made progress on it ([30]) with the invention of Linear Programming. To get some insight into this problem, consider an example in the discrete setting. Suppose there is a large number of iron mines, and the iron has to be transported to refining factories. The problem is to find where each unit of iron should be transported so that the total transportation cost is minimized. Such an assignment from initial positions to final positions is known as an "optimal transport plan". Over the past few decades, the theory of OT has gained a lot of attention and has been applied in various fields such as optimization theory, probability theory, partial differential equations, machine learning, etc. In 1987, Yann Brenier showed in [10] that, under certain conditions, there exists a unique transport plan minimizing the cost associated with the squared Euclidean distance. In 1995, Wilfrid Gangbo and Robert McCann generalized this result to cost functions which are strictly convex or concave ([22, 23]). In [6], Benamou and Brenier present a dynamical formulation of the OT problem, which connects OT theory to many other fields such as fluid mechanics ([11]), image processing ([42]), and data analysis ([36]). The field of Computational OT is another rapidly growing area, as OT serves as a powerful tool to compare probability distributions.
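For intuition, the discrete mines-and-factories problem above can be solved directly: with one unit of iron at each of three mines, a transport plan is a permutation of the factories, and for such a tiny instance brute force suffices. The cost matrix below is invented purely for illustration.

```python
# Brute-force solution of a toy mines-and-factories transport instance.
from itertools import permutations

cost = [
    [7.0, 3.0, 9.0],  # cost[i][j] = cost of shipping from mine i to factory j
    [4.0, 8.0, 5.0],
    [6.0, 1.0, 2.0],
]

def optimal_assignment(cost):
    n = len(cost)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):  # all n! one-to-one assignments
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_cost:
            best_perm, best_cost = perm, total
    return best_perm, best_cost

plan, value = optimal_assignment(cost)  # plan[i] = factory receiving mine i's iron
```

In practice one would use the Hungarian algorithm or linear programming rather than enumeration, since the number of permutations grows factorially.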
Object recognition ([25]), label classification ([55]), and generative modelling ([50]) are a few of the many subfields of machine learning that widely apply OT tools. Viewing the OT problem as a linear programming problem enables us to construct a duality theory for the OT problem. Duality plays a significant role in understanding and solving the OT problem. Furthermore, it helps to characterize optimal solutions, which is often challenging without duality theory. For instance, in computational OT, duality theory enables the use of efficient computational algorithms, such as the Sinkhorn algorithm ([45]), to approximate solutions to the OT problem. In this thesis, we will discuss a few variants of the classical OT problem, namely the capacity constrained OT problem, the multi-marginal OT problem, the entropy-regularized OT problem, and barycenters, together with their duality theory. We will also discuss a couple of new variants obtained by combining already existing versions, namely the capacity constrained multi-marginal OT (CCMMOT) problem and capacity constrained barycenters. By combining the two notions of the capacity constrained OT problem and the barycenter problem, we will introduce capacity constrained barycenters in Wasserstein space. Under certain assumptions, we will prove that the problem attains a minimizer and present some duality results. The notion of the CCMMOT problem already exists in the literature ([18]); however, a dual formulation for this problem does not. We will present a dual formulation for this problem and prove the strong duality result and the existence of dual maximizers. The entropy-regularized version of Wasserstein barycenters and their dual formulation also exist in the literature ([38]), where the authors prove strong duality and obtain the existence of a minimizer for the primal problem via the duality result.
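As a concrete glimpse of the computational side mentioned above, here is a minimal sketch of the Sinkhorn iteration for entropy-regularized discrete OT ([45]); the marginals, cost matrix, and regularization strength `eps` are illustrative choices, not data from this thesis.

```python
# Minimal Sinkhorn iteration: alternately rescale the Gibbs kernel so that the
# plan gamma_ij = u_i * K_ij * v_j matches the prescribed marginals mu and nu.
import math

def sinkhorn(cost, mu, nu, eps=0.1, n_iter=500):
    n, m = len(mu), len(nu)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        u = [mu[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [nu[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

cost = [[0.0, 1.0], [1.0, 0.0]]
gamma = sinkhorn(cost, [0.5, 0.5], [0.5, 0.5])
```

As `eps` decreases, the entropic plan approaches an optimal plan of the unregularized problem; this is a standard fact from the entropy-regularization literature, and the entropic problem itself is taken up in Chapter 6.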
In this thesis, we will provide a direct proof of the existence of a minimizer for the primal problem and of the existence of dual maximizers.

CHAPTER 1 PRELIMINARIES

Some standard symbols, definitions, and theorems used in this thesis are given below.

Notation

Let $X$ be a Polish space (see Definition 1.0.1).
• $\mathcal{B}(X)$: the sigma-algebra of Borel sets of $X$.
• $\mathcal{P}(X)$: the space of Borel probability measures on $X$.
• $\mathcal{P}_2(X)$: the space of Borel probability measures with finite second moment.
• $\mathcal{M}(X)$: the space of finite Borel measures.
• $\mathcal{M}^+(X)$: the space of positive, finite Borel measures.
• $C(X)$: continuous functions on $X$.
• $C_b(X)$: bounded, continuous functions on $X$.
• $L^1(\mathbb{R}^d)$: functions integrable w.r.t. Lebesgue measure on $\mathbb{R}^d$.
• $L^1(X, d\mu)$: functions integrable w.r.t. the measure $\mu$ on $X$.
• $L^0(X, d\mu)$: measurable functions on the space $(X, \mu)$.
• $[f]_+$: positive part of the function $f$ (see Definition 1.0.16).
• $[f]_-$: negative part of the function $f$ (see Definition 1.0.17).

Definitions

Definition 1.0.1 (Polish space). A Polish space is a separable, completely metrizable topological space.

Definition 1.0.2 (Push-forward of a measure). Given two Polish spaces $X, Y$, a Borel map $T : X \to Y$, and a probability measure $\mu \in \mathcal{P}(X)$, we define the push-forward of $\mu$ through $T$, denoted by $T_\#\mu \in \mathcal{P}(Y)$, as $T_\#\mu(E) = \mu(T^{-1}(E))$ for all Borel $E \subset Y$.

Definition 1.0.3 (Convex set). A subset $C$ of a vector space $V$ is convex if $(1-\lambda)x + \lambda y \in C$ whenever $x, y \in C$ and $0 \le \lambda \le 1$.

Definition 1.0.4 (Weak convergence). A sequence $\{\mu_n\}_{n\in\mathbb{N}} \subseteq \mathcal{P}(X)$ converges weakly to $\mu \in \mathcal{P}(X)$ if, for all $f \in C_b(X)$,
\[ \lim_{n\to\infty} \int_X f \, d\mu_n = \int_X f \, d\mu. \]

Definition 1.0.5 (Tightness). A set $A \subseteq \mathcal{P}(X)$ is tight if, for every $\varepsilon > 0$, there exists a compact set $K_\varepsilon \subset X$ such that $\mu(X \setminus K_\varepsilon) < \varepsilon$ for all $\mu \in A$.

Definition 1.0.6 (Lower semi-continuity). Let $(X, d)$ be a metric space. A function $f : X \to \mathbb{R} \cup \{+\infty\}$ is lower semi-continuous if, for every sequence $x_n \to x$, we have $f(x) \le \liminf_{n\to\infty} f(x_n)$.

Definition 1.0.7.
(Support of a measure) Let $X$ be a separable metric space. We define the support of a measure $\gamma$, denoted by $\mathrm{spt}(\gamma)$, as the smallest closed set on which $\gamma$ is concentrated:
\[ \mathrm{spt}(\gamma) := \bigcap_{\{E \,:\, E \text{ is closed and } \gamma(X \setminus E) = 0\}} E. \]

Definition 1.0.8 (Finite $p$th moment). A measure $\mu \in \mathcal{P}(\mathbb{R}^d)$ has finite $p$th moment if
\[ \int_{\mathbb{R}^d} |x|^p \, d\mu(x) < +\infty. \]

Definition 1.0.9 (Vanishing on small sets). A probability measure $\mu \in \mathcal{P}(\mathbb{R}^d)$ is said to vanish on small sets if and only if $\mu(E) = 0$ for every $E \in \mathcal{B}(\mathbb{R}^d)$ having Hausdorff dimension $d-1$ or less.

Definition 1.0.10 ($\omega$-continuity). A function $f : X \to \mathbb{R}$ is said to be $\omega$-continuous if there exists a function $\omega : [0,\infty] \to [0,\infty]$ such that $\lim_{t\to 0}\omega(t) = \omega(0) = 0$ and $|f(x) - f(y)| \le \omega(|x-y|)$ for all $x, y \in X$.

Definition 1.0.11 ($K$-convexity along curves). Given a metric space $(X,d)$, a functional $\varphi : X \to (-\infty,\infty]$ is called $K$-convex on a curve $\gamma : t \in [0,1] \mapsto \gamma_t \in X$, for some $K \in \mathbb{R}$, if
\[ \varphi(\gamma_t) \le (1-t)\,\varphi(\gamma_0) + t\,\varphi(\gamma_1) - \tfrac{1}{2} K t(1-t)\, d^2(\gamma_0, \gamma_1), \quad \forall t \in [0,1]. \]

Definition 1.0.12 (Proper convex function). A convex function $f : X \to [-\infty,\infty]$ is called proper if $f(x) < \infty$ for at least one $x \in X$ and $f(x) > -\infty$ for all $x \in X$.

Definition 1.0.13 (Infimal convolution). Given two proper convex functions $f, g$ on $\mathbb{R}^d$, we define their infimal convolution, denoted by $f \,\square\, g$, as
\[ (f \,\square\, g)(x) = \inf_y \{ f(x-y) + g(y) \}, \quad \forall x \in \mathbb{R}^d. \]

Definition 1.0.14 ($L$-Lipschitzness). Given two metric spaces $(X, d_X)$ and $(Y, d_Y)$, a function $f : X \to Y$ is said to be $L$-Lipschitz if there is a real constant $L \ge 0$ such that, for all $x_1, x_2 \in X$, $d_Y(f(x_1), f(x_2)) \le L\, d_X(x_1, x_2)$.

Definition 1.0.15 (Legendre-Fenchel transform). Let $E$ be a normed vector space, and $\varphi$ a convex function on $E$ with values in $\mathbb{R} \cup \{\infty\}$. Then the Legendre-Fenchel transform of $\varphi$ is the function $\varphi^*$, defined on the dual space $E^*$ by the formula
\[ \varphi^*(z^*) = \sup_{z \in E} \{ z^* \cdot z - \varphi(z) \}. \]

Definition 1.0.16.
(Positive part) Given a function $f : \mathbb{R} \to \mathbb{R}$, we define its positive part, denoted by $[f]_+$, as $[f(x)]_+ = \max\{f(x), 0\}$.

Definition 1.0.17 (Negative part). Given a function $f : \mathbb{R} \to \mathbb{R}$, we define its negative part, denoted by $[f]_-$, as $[f(x)]_- = -\min\{f(x), 0\}$.

Theorems

Theorem 1.0.18 (Prokhorov). Let $(X,d)$ be a Polish space. Then a family $A \subset \mathcal{P}(X)$ is relatively compact w.r.t. the weak topology if and only if it is tight.

Theorem 1.0.19 (Fatou's lemma). Let $f_n : X \to [0,\infty]$ be measurable for each $n \in \mathbb{N}$. Then
\[ \int_X \liminf_{n\to\infty} f_n \, d\mu \le \liminf_{n\to\infty} \int_X f_n \, d\mu. \]

Theorem 1.0.20 (Monotone convergence theorem). Let $\{f_n\}$ be a sequence of measurable functions on $X$ such that (i) $0 \le f_k(x) \le f_{k+1}(x) \le \infty$ for all $k \in \mathbb{N}$ and all $x \in X$, and (ii) $\lim_{n\to\infty} f_n(x) = f(x)$ for all $x \in X$. Then
\[ \lim_{n\to\infty} \int_X f_n \, d\mu = \int_X f \, d\mu. \]

CHAPTER 2 THE CLASSICAL OPTIMAL TRANSPORT PROBLEM

2.1 The Primal Problem

Let $X$ and $Y$ be two Polish spaces, $\mu \in \mathcal{P}(X)$ and $\nu \in \mathcal{P}(Y)$ be two Borel probability measures, and $c : X \times Y \to \mathbb{R} \cup \{\infty\}$ be a Borel measurable cost function. The Monge problem is the following:

Problem 2.1.1. Find a Borel map $T : X \to Y$ that minimizes the cost
\[ M(S) := \int_X c(x, S(x)) \, d\mu(x) \tag{2.1.1} \]
among all Borel maps $S : X \to Y$ such that $S_\#\mu = \nu$. Such maps are called transport maps from $\mu$ to $\nu$. Maps that minimize the cost $M(S)$ are called optimal transport maps.

The push-forward condition $S_\#\mu = \nu$ can be characterized by
\[ \int_Y f(y) \, d\nu(y) = \int_Y f(y) \, dS_\#\mu(y) = \int_X (f \circ S)(x) \, d\mu(x), \quad \forall f \in L^1(Y, d\nu). \tag{2.1.2} \]

There are a few major drawbacks in the Monge formulation. For example:
• The constraint set could be empty. E.g., for $\mu = \delta_0$ and $\nu = \frac{1}{2}\delta_1 + \frac{1}{2}\delta_{-1}$, the condition (2.1.2) cannot hold for any $S : X \to Y$ that is $\mu$-a.e. single-valued.
• The cost $M(S)$ could be non-linear in $S$ (depending on $c$), hence difficult to minimize.
• The constraint $S_\#\mu = \nu$ may not be closed under weak convergence in general. E.g. ([2], Chapter 1), let $\mu = \mathcal{L}|_{[0,1]}$ and $\nu = \frac{1}{2}\delta_1 + \frac{1}{2}\delta_{-1}$.
Consider the sequence of functions given by $S_n(x) := S(nx)$, where $S : \mathbb{R} \to \mathbb{R}$ is the 1-periodic function defined on $[0,1)$ by
\[ S(x) = \begin{cases} 1 & \text{on } [0, 1/2), \\ -1 & \text{on } [1/2, 1). \end{cases} \tag{2.1.3} \]
Then $(S_n)_\#\mu = \nu$ for all $n \in \mathbb{N}$, but $(S_n)$ converges weakly to $S = 0$ in $L^p$ for all $1 \le p < \infty$, so that $S_\#\mu = \delta_0 \ne \nu$.

Due to these issues, we consider a relaxation of the Monge problem, known as the Kantorovich problem.

Definition 2.1.2. Given two probability measures $\mu \in \mathcal{P}(X)$ and $\nu \in \mathcal{P}(Y)$, we define the set of all transport plans from $\mu$ to $\nu$ by
\[ \Pi(\mu,\nu) := \{ \gamma \in \mathcal{P}(X \times Y) : (\mathrm{Proj}_X)_\#\gamma = \mu, \ (\mathrm{Proj}_Y)_\#\gamma = \nu \}. \tag{2.1.4} \]
The conditions on $\gamma$ above are known as the marginal conditions, and they can also be written as $\gamma(A \times Y) = \mu(A)$ for all $A \in \mathcal{B}(X)$, and $\gamma(X \times B) = \nu(B)$ for all $B \in \mathcal{B}(Y)$.

Then the Kantorovich problem is defined as below:

Problem 2.1.3. Find a $\gamma_0 \in \Pi(\mu,\nu)$ that minimizes the cost
\[ K(\gamma) = \int_{X \times Y} c(x,y) \, d\gamma(x,y) \tag{2.1.5} \]
among all transport plans $\gamma \in \Pi(\mu,\nu)$. Transport plans that minimize the cost $K(\gamma)$ are called optimal transport plans.

Compared to the Monge formulation, the Kantorovich formulation has many advantages, such as:
• The set $\Pi(\mu,\nu)$ is always non-empty, as it contains $\mu \otimes \nu$.
• The cost $K(\gamma)$ is linear in $\gamma$ (regardless of $c$), hence much easier to minimize.
• The set $\Pi(\mu,\nu)$ is convex: we can easily see that for any $\gamma_1, \gamma_2 \in \Pi(\mu,\nu)$ and $0 \le \lambda \le 1$, $\lambda\gamma_1 + (1-\lambda)\gamma_2 \in \Pi(\mu,\nu)$.
• If $T_\#\mu = \nu$, then $\gamma := (\mathrm{Id} \times T)_\#\mu \in \Pi(\mu,\nu)$; hence the set of transport plans contains all transport maps.

Now, we will discuss the existence of a minimizer for the Kantorovich problem.

Theorem 2.1.4 ([51], Theorem 1.7). Let $X, Y$ be two Polish spaces, $\mu \in \mathcal{P}(X)$ and $\nu \in \mathcal{P}(Y)$. If $c : X \times Y \to [0,\infty]$ is lower semi-continuous, then the Kantorovich problem (Problem 2.1.3) has a minimizer.

The proof is based on the tightness of the set $\Pi(\mu,\nu)$ and the Prokhorov theorem.
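To see concretely how the relaxation fixes the first Monge obstruction (µ = δ₀, ν = ½δ₁ + ½δ₋₁): a plan may split mass. Below, a discrete plan is represented as a dictionary of point masses, and the marginal conditions of Definition 2.1.2 are checked directly; the representation is an illustrative sketch, not notation from the text.

```python
# A plan splitting the mass at x = 0 between y = 1 and y = -1; no map can do this.
gamma = {(0.0, 1.0): 0.5, (0.0, -1.0): 0.5}

def marginals(gamma):
    # Accumulate the first and second marginals of a discrete plan.
    mu, nu = {}, {}
    for (x, y), mass in gamma.items():
        mu[x] = mu.get(x, 0.0) + mass
        nu[y] = nu.get(y, 0.0) + mass
    return mu, nu

mu, nu = marginals(gamma)
# Kantorovich cost of this plan for c(x, y) = (x - y)^2
cost = sum(mass * (x - y) ** 2 for (x, y), mass in gamma.items())
```

The marginals recover µ = δ₀ and ν = ½δ₁ + ½δ₋₁ exactly, even though no single-valued map pushes µ to ν.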
From here onwards, we will call the Kantorovich problem the classical Optimal Transport (OT) problem, and we will denote it by
\[ \mathrm{OT}_c := \inf_{\gamma \in \Pi(\mu,\nu)} \left\{ \int_{X \times Y} c(x,y) \, d\gamma(x,y) \right\}. \tag{2.1.6} \]

2.2 Duality

Let $X, Y$ be two compact Polish spaces, $\mu \in \mathcal{P}(X)$, $\nu \in \mathcal{P}(Y)$, and let $c : X \times Y \to [0,\infty)$ be continuous. We will define the dual formulation of the OT problem as the following maximization problem:
\[ \widehat{\mathrm{OT}}{}^*_c := \sup_{(\phi,\psi) \in \Phi_c} \left\{ \int_X \phi(x) \, d\mu(x) + \int_Y \psi(y) \, d\nu(y) \right\} \tag{2.2.1} \]
where
\[ \Phi_c := \{ (\phi,\psi) \in C_b(X) \times C_b(Y) : \phi(x) + \psi(y) \le c(x,y) \}. \tag{2.2.2} \]
Due to the lack of compactness of the above class of admissible functions, we will consider an alternative dual formulation.

Definition 2.2.1. Given a function $f : X \to \mathbb{R} \cup \{\pm\infty\}$, we define its $c$-transform $f^c : Y \to \mathbb{R} \cup \{\pm\infty\}$ by
\[ f^c(y) := \inf_{x \in X} \{ c(x,y) - f(x) \}. \tag{2.2.3} \]
Similarly, given a function $g : Y \to \mathbb{R} \cup \{\pm\infty\}$, we define its $c^*$-transform $g^{c^*} : X \to \mathbb{R} \cup \{\pm\infty\}$ by
\[ g^{c^*}(x) := \inf_{y \in Y} \{ c(x,y) - g(y) \}. \tag{2.2.4} \]

Definition 2.2.2. A function $f : X \to \mathbb{R} \cup \{\pm\infty\}$ is called $c$-concave if there exists a function $g : Y \to \mathbb{R} \cup \{\pm\infty\}$ such that $f = g^{c^*}$. A function $g : Y \to \mathbb{R} \cup \{\pm\infty\}$ is called $c^*$-concave if there exists a function $f : X \to \mathbb{R} \cup \{\pm\infty\}$ such that $g = f^c$. We will denote the set of $c$-concave functions on $X$ by $c\text{-conc}(X)$ and the set of $c^*$-concave functions on $Y$ by $c^*\text{-conc}(Y)$.

Observe that, given an admissible pair $(\phi,\psi)$ in $\widehat{\mathrm{OT}}{}^*_c$, if we replace $(\phi,\psi)$ by $(\phi, \phi^c)$ and then again by $(\phi^{cc^*}, \phi^c)$, the value can only increase while the constraints remain satisfied ([51], Definition 1.10). Hence, we consider the following dual formulation:
\[ \mathrm{OT}^*_c := \sup_{\phi \in c\text{-conc}(X)} \left\{ \int_X \phi(x) \, d\mu(x) + \int_Y \phi^c(y) \, d\nu(y) \right\}. \tag{2.2.5} \]
Functions that maximize $\mathrm{OT}^*_c$ are called Kantorovich potentials. Now, we will present the existence of dual maximizers and the strong duality result.

Theorem 2.2.3 ([51], Proposition 1.11). Let $X, Y$ be compact subsets of $\mathbb{R}^d$ and $c$ be a continuous function.
Then OT∗c has a solution (ϕ, ψ) such that ϕ ∈ c-conc(X), ψ ∈ c∗ -conc(Y ) and ψ = ϕc . In the proof, one starts with a maximizing sequence (ϕn , ψn ) and take the c-transforms so that it improves the dual formulation. This transformation will make (ϕn , ψn ) equi- continuous and equi-bounded, so that one can apply the Arzelà-Ascoli theorem to get the existence result. 10 Theorem 2.2.4. ([51], Theorem 1.39) Let X, Y be two Polish spaces and suppose that c : X × Y 7→ R is uniformly continuous and bounded. Then, OTc = OT∗c . The proof uses the concept of c-cyclical monotonicity, which we will discuss next. 2.3 Properties of the optimizers Definition 2.3.1. Given a function c : X × Y 7→ R ∪ {∞}, we say a set Γ ⊂ X × Y is c-cyclically monotone, if for any positive integer p, any permutation σ of {1, . . . , p}, and any finite family of points (x1 , y1 ), . . . , (xp , yp ) ∈ Γ, we have p p X X c(xi , yi ) ≤ c(xi , yσ(i) ). i=1 i=1 Theorem 2.3.2. ([51], Theorem 1.38) Let γ be an optimal transport plan for OTc and let c be a continuous function. Then, spt(γ) is a c-cyclically monotone set. Now, we will present a more general theorem. Theorem 2.3.3. ([53], Theorem 5.9) Let X, Y be two Polish spaces and µ ∈ P(X) and ν ∈ P(Y ). Suppose that c : X × Y 7→ R ∪ {∞} is a lower semi-continuous function, such that there exist some real-valued, upper semi-continuous functions u ∈ L1 (X, dµ), v ∈ L1 (Y, dν) satisfying c(x, y) ≥ u(x) + v(y), ∀(x, y) ∈ X × Y. Then, 1. Duality holds: (Z Z ) OTc = sup ϕ(x) dµ(x) + ψ(y) dν(y) (ϕ,ψ)∈Cb (X)×Cb (Y ) X Y ϕ+ψ≤c (Z Z ) = sup ϕ(x) dµ(x) + ψ(y) dν(y) (ϕ,ψ)∈L1 (X,dµ)×L1 (Y,dν) X Y ϕ+ψ≤c (Z Z ) = sup ϕ(x) dµ(x) + ϕc (y) dν(y) ϕ∈L1 (X,dµ) X Y 11 (Z Z ) c∗ = sup ψ (x) dµ(x) + ψ(y) dν(y) . ψ∈L1 (Y,dν) X Y Note that, in the above suprema, we may take ϕ ∈ c-conc(X) and ψ ∈ c∗ -conc(Y ). 2. Suppose c is real-valued and the cost OTc is finite. 
Then there is a measurable c- cyclically monotone set Γ ⊂ X × Y , such that for any γ ∈ Π(µ, ν), the following statements are equivalent. a) γ is optimal; b) γ is c-cyclically monotone; c) There exists a c-concave function ϕ : X 7→ R ∪ {−∞} such that ϕ(x) + ϕc (y) = c(x, y), γ-a.e.; d) There exist functions ϕ : X 7→ R ∪ {∞} and ψ : Y 7→ R ∪ {∞}, such that ϕ(x) + ϕc (y) ≤ c(x, y), ∀(x, y), with equality γ-a.e.; e) γ is concentrated on Γ. Now, we will present a uniqueness result for the optimal transport plans. This is known as the Brenier-McCann’s Theorem. Theorem 2.3.4. ([54], Theorem 2.12) Let c(x, y) = |x − y|2 and µ ∈ P2 (X), ν ∈ P2 (Y ). Suppose that µ vanishes on small sets. Then, 1. There exists a unique optimal transport plan, given by γ = (Id ×∇u)# µ, where ∇u is uniquely determined µ-a.e. such that u is convex and ∇u# µ = ν. Furthermore, spt(ν) = ∇u(spt(µ)). 2. ∇u is the unique solution to the Monge problem given by (2.1.1). 12 3. If ν also vanishes on smalls sets, then ∇u∗ is a ν-a.e. unique solution to the Monge problem from ν to µ, such that u∗ is convex and ∇u(∇u∗ (y)) = y ν-a.e. y and ∇u∗ (∇u(x)) = x µ-a.e. x. Finally, we will briefly discuss the notion of the Wasserstein distance. Definition 2.3.5. Let (X, d) be a Polish metric space and µ, ν ∈ P(X). For a given p ∈ [1, ∞), we define the Wasserstein distance of order p between µ and ν by  Z 1/p p Wp (µ, ν) := inf d(x, y) dγ(x, y) . γ∈Π(µ,ν) X×X Proposition 2.3.6. ([3], Chapter 7.1) Wp defines a distance on Pp (X). Proposition 2.3.7. ([53], Corollary 6.9) Wp is lower semi-continuous w.r.t. weak conver- gence of measures. 13 CHAPTER 3 CAPACITY CONSTRAINED OT PROBLEM 3.1 Introduction In the capacity constrained OT problem, we impose capacity constraints which limit the amount transported between any given source and corresponding target. 
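For empirical measures on the real line, the Wasserstein distance of Definition 2.3.5 can be computed by matching sorted samples; this is a classical one-dimensional fact used here only as a numerical illustration, not something proved in this thesis.

```python
# W_p between two empirical measures on R with equally many atoms of equal
# mass: the optimal plan pairs the sorted samples.
def wasserstein_1d(xs, ys, p=2):
    assert len(xs) == len(ys)
    xs, ys = sorted(xs), sorted(ys)
    return (sum(abs(x - y) ** p for x, y in zip(xs, ys)) / len(xs)) ** (1 / p)

w = wasserstein_1d([0.0, 1.0], [1.0, 2.0], p=2)  # every atom moves by exactly 1
```

Since each atom travels distance 1, the distance is 1 for every p, consistent with the definition.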
As an example in the discrete case, we can consider a large number of coal mines from which coal has to be transported to refining factories, and the problem is to find where each unit of coal should go at minimum transportation cost. In the unconstrained OT problem, we assume that any amount of coal can be transported, whereas in the capacity constrained case, there is a limit on the amount of coal that can be transported from a given mine to a given factory. Formally, given two probability measures $\mu, \nu \in \mathcal{P}(\mathbb{R}^d)$ representing the source and target distributions, respectively, and a finite Borel measure $\tilde\gamma$ on $\mathbb{R}^d \times \mathbb{R}^d$ representing the capacity constraint on the transport plans, we minimize the cost:
\[ \inf_{\gamma \in \Pi_{\tilde\gamma}(\mu,\nu)} \left\{ \int_{\mathbb{R}^d \times \mathbb{R}^d} c(x,y) \, d\gamma(x,y) \right\}. \tag{3.1.1} \]
Here, the set $\Pi_{\tilde\gamma}(\mu,\nu)$ is the set of transport plans from $\mu$ to $\nu$ bounded by $\tilde\gamma$. In [46], Rachev and Rüschendorf introduced this problem on compact spaces, where they studied cost functions that are Borel measurable, bounded below, and lower semi-continuous, and obtained a dual formulation of the minimization problem. Recently, in a series of papers ([33, 35, 34]), Korman, McCann and Seis considered this problem for continuous, bounded cost functions on $\mathbb{R}^d \times \mathbb{R}^d$ and for finite, bounded capacity constraints. There, they established the equivalence between the primal problem and a dual problem, together with the existence of minimizers of the capacity constrained OT problem and the existence of dual maximizers. However, unlike in the classical case, further information about the dual maximizers, such as their regularity or the properties they inherit from the cost function, is still unknown. In this chapter, we will present the existing results regarding this capacity constrained OT problem and provide some characterization of the optimizers of the primal and dual problems.

3.2 The Primal Problem

Similar to the work in [33], we will use functions to represent mass densities.
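The effect of a capacity bound can already be seen in a 2x2 toy version of the coal example above: with uniform marginals f = g = (1/2, 1/2), the admissible plans form a segment h(t) = [[t, 1/2 - t], [1/2 - t, t]], and a capacity cap < 1/2 on each entry forces the optimum away from the unconstrained one. All numbers here are invented for illustration.

```python
# Minimize a linear transport cost over the one-parameter family of 2x2 plans
# with uniform marginals, subject to the entrywise bound 0 <= h(t) <= cap.
def constrained_min(cost, cap, steps=1000):
    best_t, best_val = None, float("inf")
    t_max = min(cap, 0.5)  # the capacity truncates the feasible segment
    for k in range(steps + 1):
        t = t_max * k / steps
        h = [[t, 0.5 - t], [0.5 - t, t]]
        val = sum(cost[i][j] * h[i][j] for i in range(2) for j in range(2))
        if val < best_val:
            best_t, best_val = t, val
    return best_t, best_val

cost = [[0.0, 1.0], [1.0, 0.0]]      # quadratic cost on the two points {0, 1}
t_star, val = constrained_min(cost, cap=0.3)   # capacity binds: t* = 0.3
t_free, val_free = constrained_min(cost, cap=1.0)  # unconstrained: t* = 0.5, cost 0
```

At the constrained optimum the diagonal entries saturate the capacity, in the spirit of the geometric extremality of minimizers discussed later in this section.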
Let f and g be two non-negative, compactly supported density functions in L1 (Rd ) with equal total masses, i.e. Rd f (x) dx = Rd g(y) dy, representing the source and the target R R densities. Let c be a Borel measurable function on Rd ×Rd representing the cost function. Let h̃ ∈ L∞ (Rd × Rd ) be a compactly supported function representing the capacity constraint. Denote by Πh̃ (f, g), the set of all joint densities h ∈ L1 (Rd × Rd ) with marginals f and g, and bounded by h̃, i.e. Z Z f (x) = h(x, y) dy, g(y) = h(x, y) dx, and 0 ≤ h ≤ h̃. Rd Rd We define the two-marginal Capacity Constrained Optimal Transport (CCOT) problem be- tween f and g under the capacity h̃ as the following minimization problem: OTCC := inf Ic (h) (3.2.1) h∈Πh̃ (f,g) where Z Ic (h) := c(x, y)h(x, y) dxdy. Rd ×Rd Unlike in the unconstrained problem, it is not always guaranteed that the set Πh̃ (f, g) is non-empty. The necessary and sufficient conditions for Πh̃ (f, g) to be non-empty are as follows: Proposition 3.2.1. ([46], Corollary 4.6.15) The set Πh̃ (f, g) ̸= ∅ if and only if Z Z Z f (x) dx + g(y) dy ≤ 1 + h̃(x, y) dxdy, A B A×B for any Borel measurable sets A, B ⊂ Rd . The following theorem states that OTCC has a minimizer. 15 Theorem 3.2.2. ([33], Theorem 3.1) Let c be a bounded, continuous function on Rd × Rd and 0 ≤ h̃ ∈ L∞ (Rd × Rd ) be compactly supported. Take 0 ≤ f, g ∈ L1 (Rd ) compactly supported functions such that the set Πh̃ (f, g) is non-empty. Then, OTCC has a minimizer in Πh̃ (f, g). In order to get uniqueness of the minimizers, we require the cost c to satisfy three con- ditions. (C1) c(x, y) is bounded, (C2) there is a Lebesgue negligible closed set Z ⊂ Rd ×Rd such that c(x, y) ∈ C 2 (Rd ×Rd \Z) and, (C3) c(x, y) is non-degenerate: i.e. det∇2xy c(x, y) ̸= 0 for all (x, y) ∈ Rd × Rd \ Z. Theorem 3.2.3. ([33], Theorem 8.1) Suppose that the cost c(x, y) satisfies the conditions (C1), (C2), and (C3). Let 0 ≤ h̃ ∈ L∞ (Rd × Rd ) be compactly supported. 
Take 0 ≤ f, g ∈ L1 (Rd ) compactly supported functions such that the set Πh̃ (f, g) is non-empty. Then, OTCC has a unique minimizer. Now, we will give a characterization of the minimizers of OTCC . Definition 3.2.4. Let h̃ ∈ L∞ (Rd × Rd ). A density h ∈ Πh̃ (f, g) is called geometrically extreme, if h(x, y) = 1W (x, y)h̃(x, y) for almost all (x, y) ∈ Rd × Rd , for some Lebesgue measurable set W ⊂ Rd × Rd . Theorem 3.2.5. ([33], Theorem 7.2) Suppose that the cost c(x, y) satisfies the conditions (C1), (C2), and (C3). Let 0 ≤ h̃ ∈ L∞ (Rd × Rd ) be compactly supported. Take 0 ≤ f, g ∈ L1 (Rd ) compactly supported functions such that the set Πh̃ (f, g) is non-empty. If h ∈ Πh̃ (f, g) minimizes OTCC , then h is geometrically extreme. 16 3.3 Duality - Version I For given f, g ∈ L1 (Rd ) and 0 ≤ h̃ ∈ L∞ (Rd ×Rd ), we consider the following maximization problem: OT∗CC := sup J(u, v, w) (3.3.1) (u,v,w)∈Lipc,h̃ where Z Z Z J(u, v, w) := u(x)f (x) dx + v(y)g(y) dy + w(x, y)h̃(x, y) dxdy, (3.3.2) Rd Rd Rd ×Rd and ( Lipc,h̃ := (u, v, w) : u ∈ L1 (Rd , f dx), v ∈ L1 (Rd , gdy), w ∈ L1 (Rd × Rd , h̃dxdy), ) (3.3.3) u(x) + v(y) + w(x, y) ≤ c(x, y), and w(x, y) ≤ 0 . Now, we will present the strong duality result for the CCOT problem. Theorem 3.3.1. ([35], Theorem 1) Let f, g ∈ L1 (Rd ) be two probability densities such that Πh̃ (f, g) is non-empty and 0 ≤ h̃ ∈ L∞ (Rd ×Rd ) be compactly supported. Let c ∈ L1 (Rd ×Rd ). Then, OTCC = OT∗CC . Remark 3.3.2. In [35], the authors use an infinite dimensional linear programming duality with a quadratic penalization to get this result. In [34], the same authors prove the existence of dual maximizers for OT∗CC . Let X and Y be two compact subsets of Rd with unit volumes, f and g be probability densities on X and Y , respectively, and 0 ≤ h̃ ∈ L∞ (X × Y ). Instead of the dual functional (3.3.2), we consider the functional Z Z Z ′ J (u, v) := uf dx + vg dy − [−c + u + v]+ h̃ dxdy. (3.3.4) X Y X×Y and define ′ OT∗CC := sup J ′ (u, v). 
(3.3.5) u∈L1 (X,f dx),v∈L1 (Y,gdy) 17 ′ Note that OT∗CC = OT∗CC (see Appendix 1). The existence result of dual maximizers is given below: Theorem 3.3.3. ([34], Theorem 4.2) Let f, g and h̃ be continuous and strictly positive on their compact supports X, Y , and X × Y , respectively. Let c ∈ L1 (X × Y ). Fix an η > 1 and assume that Πh̃/η (f, g) is non-empty. Then, there exist functions (u, v) ∈ L1 (X, f dx) × L1 (Y, gdy), such that ′ OT∗CC = J ′ (u, v). The authors also provide a characterization of the optimizers of the primal and the dual problems as follows: Corollary 3.3.4. ([34], Corollary 1.1) Under the assumptions of Theorem 3.3.3, any h ∈ Πh̃ (f, g) is optimal if and only if there exist functions (u, v) ∈ L1 (X, f dx) × L1 (Y, gdy), such that  ≥ 0 where h = 0,        c−u−v = 0 where 0 < h < h̃, (3.3.6)      ≤ 0 where h = h̃.   3.4 Duality - Version II In [46], the authors consider the CCOT problem in a different setting. Let X, Y be two compact subsets of Rd and let c : X ×Y 7→ R∪{∞} be Borel measurable and bounded below. Let µ ∈ P(X), ν ∈ P(Y ) and γ̃ be a finite Borel measure on X × Y . Then, the CCOT is defined as the following minimization problem: (Z ) OTCC := inf c(x, y) dγ(x, y) , (3.4.1) γ∈Πγ̃ (µ,ν) X×Y where Πγ̃ (µ, ν) := {γ ∈ Π(µ, ν) : γ(A × B) ≤ γ̃(A × B), ∀(A, B) ∈ B(X) × B(Y )}. (3.4.2) 18 Its dual formulation is given by the following maximization problem: (Z Z Z ) OT∗CC := sup u(x) dµ(x) + v(y) dν(y) + w(x, y) dγ̃(x, y) , (3.4.3) B X Y X×Y where the supremum is taken over the set B of real-valued functions u, v, w satisfying u ∈ Cb (X), v ∈ Cb (Y ), w ∈ Cb (X ×Y ) and w ≤ 0 with u(x)+v(y)+w(x, y) ≤ c(x, y) everywhere. Now, we will present the duality theorem for this version. Theorem 3.4.1. ([46], Theorem 4.6.14) Let c : X × Y 7→ R ∪ {∞} be Borel measurable and bounded below. Then, the following statements are equivalent: (a) c is lower semi-continuous on X × Y . 
(b) The duality holds for all µ ∈ P(X), ν ∈ P(Y) and γ̃ ∈ M+(X × Y), i.e., OTCC = OT∗CC.

The proof is based on the abstract duality theorem (see [46], Theorem 4.6.1).

3.5 A further characterization on the optimizers

Even though the CCOT problem is as natural an idea as the classical OT problem, little is known about its optimizers compared to other variants of the OT problem. In this section, we will present some characterizations of the optimizers of the primal and the dual problems for the CCOT problem.

Let X, Y be compact subsets of Rd and let c be a non-negative, bounded, continuous function on X × Y. Let µ ∈ P(X), ν ∈ P(Y) be probability measures which are absolutely continuous w.r.t. Lebesgue measure with densities f ∈ L1(X) and g ∈ L1(Y), and let γ̃ be a compactly supported finite measure on X × Y that is absolutely continuous w.r.t. Lebesgue measure with a bounded density h̃ ∈ L∞(X × Y). We will redefine the primal and the dual problems as follows:

    Icap := inf_{γ ∈ Πγ̃(µ,ν)} ∫_{X×Y} c(x, y) dγ(x, y),    (3.5.1)

    I∗cap := sup_{(u,v,w) ∈ Lipc,γ̃} J(u, v, w),    (3.5.2)

where

    J(u, v, w) = ∫_X u(x) dµ(x) + ∫_Y v(y) dν(y) + ∫_{X×Y} w(x, y) dγ̃(x, y)

and

    Lipc,γ̃ := { (u, v, w) : u ∈ L1(X, dµ), v ∈ L1(Y, dν), w ∈ L1(X × Y, dγ̃),
                u(x) + v(y) + w(x, y) ≤ c(x, y), and w(x, y) ≤ 0 }.    (3.5.3)

Let γ ∈ Πγ̃(µ, ν) be a minimizer for Icap and (u, v, w) ∈ Lipc,γ̃ be a maximizer for I∗cap. Let E ⊆ spt(γ̃) be the compact support of γ. Then, by Theorem 3.2.5, we have that

    γ = γ̃ on E ⊂ X × Y,  and  γ = 0 elsewhere.    (3.5.4)

By duality (Theorem 3.3.1), we have that

    ∫_{X×Y} c dγ = ∫_X u dµ + ∫_Y v dν + ∫_{X×Y} w dγ̃
                 = ∫_X u dµ + ∫_Y v dν + ∫_{(X×Y)∩E} w dγ̃ + ∫_{(X×Y)∩E^c} w dγ̃.

By (3.5.4) and w ≤ 0 on (X × Y) ∩ E^c, we have that

    ∫_{X×Y} c dγ ≤ ∫_X u dµ + ∫_Y v dν + ∫_{X×Y} w dγ.    (3.5.5)

On the other hand, since (u, v, w) ∈ Lipc,γ̃, we have that

    u(x) + v(y) + w(x, y) ≤ c(x, y), ∀(x, y) ∈ X × Y.
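In a discrete setting, the primal problem (3.5.1) becomes a finite linear program: minimize ⟨c, γ⟩ over non-negative matrices with prescribed marginals and the entrywise capacity bound γ ≤ γ̃. The following is a minimal numerical sketch; the grids, the quadratic cost, and the capacity κ µ ⊗ ν are illustrative choices, not taken from the text.

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative discretization: n-point uniform grids on X = Y = [0, 1].
n = 4
x = np.linspace(0.0, 1.0, n)
y = np.linspace(0.0, 1.0, n)
f = np.full(n, 1.0 / n)              # marginal density of mu
g = np.full(n, 1.0 / n)              # marginal density of nu
c = (x[:, None] - y[None, :]) ** 2   # quadratic cost c(x, y) = |x - y|^2

kappa = 2.0                          # illustrative capacity gamma~ = kappa * mu (x) nu
cap = kappa * np.outer(f, g)

# Equality constraints: row sums of gamma equal f, column sums equal g.
A_eq = np.zeros((2 * n, n * n))
for i in range(n):
    A_eq[i, i * n:(i + 1) * n] = 1.0     # sum_j gamma_ij = f_i
    A_eq[n + i, i::n] = 1.0              # sum_i gamma_ij = g_j
b_eq = np.concatenate([f, g])

# The capacity constraint enters as an upper bound on each variable.
res = linprog(c.ravel(), A_eq=A_eq, b_eq=b_eq,
              bounds=list(zip(np.zeros(n * n), cap.ravel())))
gamma = res.x.reshape(n, n)
print(res.status, res.fun)
```

For capacities of the form κ µ ⊗ ν with moderate κ, the computed optimizer typically saturates the bounds, with most entries equal to 0 or to the capacity, in line with the geometric extremality of Theorem 3.2.5.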
By integrating both sides with respect to γ, we get Z Z Z Z u dµ + v dν + w dγ ≤ c dγ. (3.5.6) X Y X×Y X×Y 20 By combining (3.5.5) and (3.5.6), we get Z Z Z Z c dγ = u dµ + v dν + w dγ. (3.5.7) X×Y X Y X×Y Thus, by using the marginal conditions, we can write Z (c − u − v − w) dγ = 0. (3.5.8) X×Y Since we have the inequality c − u − v − w ≥ 0, we can conclude that c(x, y) − u(x) − v(y) − w(x, y) = 0 γ-a.e. (3.5.9) Now, let n ∈ N, σ be any permutation of {1, . . . , n} and pick an arbitrary collection of points (x1 , y1 ), . . . , (xn , yn ) ∈ spt(γ). Then, X n X n X n c(xi , yi ) − w(xi , yi ) = u(xi ) + v(yi ) i=1 i=1 i=1 X n = u(xi ) + v(yσ(i) ) i=1 X n ≤ c(xi , yσ(i) ) − w(xi , yσ(i) ). i=1 This shows that spt(γ) is (c − w)-cyclical monotone (see Definition 2.3.1). From here onwards, we will assume that the capacity γ̃ = κµ ⊗ ν for some κ ≥ 1. Let γ ∗ ∈ Πγ̃ (µ, ν) be a minimizer for Icap and (u, v, w) ∈ Lipc,γ̃ be a maximizer for I∗cap . Thus, by (3.5.7), we have Z Z Z Z ∗ c dγ = u dµ + v dν + w dγ ∗ . (3.5.10) X×Y X Y X×Y Again, consider the inequality u(x) + v(y) + w(x, y) ≤ c(x, y). By integrating both sides with respect to an arbitrary γ ∈ Π(µ, ν), we get Z Z Z Z u dµ + v dν + w dγ ≤ c dγ. X Y X×Y X×Y 21 i.e. Z Z Z u dµ + v dν ≤ (c − w) dγ. X Y X×Y Now, taking the infimum over γ ∈ Π(µ, ν) from both sides, we get Z Z Z u dµ + v dν ≤ inf (c − w) dγ. X Y γ∈Π(µ,ν) X×Y Observe that, since Πγ̃ (µ, ν) ⊆ Π(µ, ν), we have Z Z inf (c − w) dγ ≤ inf (c − w) dγ γ∈Π(µ,ν) X×Y γ∈Πγ̃ (µ,ν) X×Y Z ≤ (c − w) dγ ∗ (3.5.11) ZX×Y Z = u dµ + v dν, (by (3.5.10)). X Y Recall that the strong duality result holds for the classical OT problem with Borel mea- surable and µ ⊗ ν-a.e. finite costs (see [5], Theorem 2). Since c − w ≥ 0 is Borel measurable and µ ⊗ ν-a.e. finite, we have that Z Z Z inf (c − w) dγ = sup u dµ + v dν. 
(3.5.12) γ∈Π(µ,ν) X×Y u∈L1 (X),v∈L1 (Y ) X Y u+v≤c Thus, by (3.5.11) and (3.5.12), we can conclude that (u, v) maximizes the dual of the classical OT problem with cost (c − w). Finally, we will list the above characterization in the following proposition. Proposition 3.5.1. Let γ ∈ Πγ̃ (µ, ν) be a minimizer for Icap and (u, v, w) ∈ Lipc,γ̃ be a maximizer for I∗cap . Then, 1. c(x, y) − u(x) − v(y) − w(x, y) = 0 γ-a.e.. 2. spt(γ) is (c − w)-cyclical monotone. 3. If γ̃ = κµ ⊗ ν for some κ ≥ 1, then (u, v) maximizes the dual of the classical OT problem with cost (c − w). 22 CHAPTER 4 MULTI-MARGINAL OT PROBLEM 4.1 The Multi-Marginal OT (MMOT) Problem 4.1.1 Introduction As opposed to the classical OTP, which is a two-marginal problem, multi-marginal OTP (also known as the multi-dimensional Monge-Kantorovich Problem) studies transporting mass from a single source to multiple targets. More generally, given a p-tuple of probability measures (ν1 , . . . , νp ) in Rd , the problem consists of finding an optimal way of successively rearranging ν1 onto νi against a certain cost on Rpd where ν1 represents the mass distribution at the source and (νj )j̸=1 represent the mass distributions at the targets. MMOT is a versatile framework that can be applied in various fields, including image processing, computer vision, economics, machine learning, natural language processing, and medical imaging, among others. For example, MMOT is used in economics to model and solve problems involving the distribution of resources, such as the allocation of goods among multiple buyers and sellers [13]. Also, MMOT is used in machine learning tasks such as clustering, where it is used to group similar data points together based on their feature similarities. This was first discussed by Gangbo and Święch in [24], for a specific cost function. 
More characterizations of the optimal solutions and some applications have been discussed in [43, 44], and a few variants, such as the multi-marginal partial OTP [31], have been introduced later on.

4.1.2 The Primal Problem (MMOT Problem)

Let p be a positive integer. For a given p-tuple of probability measures (ν1, . . . , νp) in P(Rd) and a lower semi-continuous cost function c(x1, . . . , xp) : Rpd → R, we consider the minimization problem:

    OTMM(ν1, . . . , νp) := inf_{γ ∈ Π(ν1,...,νp)} ∫_{Rpd} c(x1, . . . , xp) dγ(x1, . . . , xp),    (4.1.1)

where

    Π(ν1, . . . , νp) := {γ ∈ P(Rpd) : (Projxi)#γ = νi, ∀1 ≤ i ≤ p}.    (4.1.2)

When p = 2, (4.1.1) becomes the classical OT problem.

Remark 4.1.1. In [24], the authors consider the cost

    c(x1, . . . , xp) = Σ_{j≠k} |xj − xk|²    (4.1.3)

and the minimization problem

    OT′MM(T) := inf_K Σ_{j≠k} ∫_{Rd} (|Tj(x) − Tk(x)|² / 2) dν1(x),    (4.1.4)

where K is the set of all p-tuples of maps T = (T1, . . . , Tp) such that Ti : Rd → Rd (i = 1, . . . , p) are Borel measurable and satisfy νi = Ti#ν1.

Theorem 4.1.2. Given νi ∈ P(Rd) for each i ∈ {1, . . . , p} and a lower semi-continuous cost c : Rpd → R, OTMM has a solution.

The proof of the existence of a minimizer for (4.1.1) is quite standard. As in the proof of existence of solutions for the classical OT problem: since the set Π(ν1, . . . , νp) is non-empty, convex, and compact with respect to the weak topology, and the functional γ ↦ ∫ c dγ is linear in γ, we can guarantee the existence of a minimizer for (4.1.1).

Remark 4.1.3. In [24], the authors have proven that if the measures ν1, . . . , νp vanish on (d − 1)-rectifiable sets and have finite second moments, then the MMOT problem with cost Σ_{j≠k} |xj − xk|² has a unique solution (see [24], Corollary 2.2).
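For intuition, (4.1.1) can be solved exactly on a small grid, since the discrete problem is again a linear program over the coupling tensor. Below is a minimal sketch for p = 3 uniform marginals on the line with the pairwise cost (4.1.3); the point set and problem sizes are illustrative choices.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

# Illustrative finite MMOT instance: p = 3 uniform marginals on n points.
n, p = 3, 3
pts = np.array([0.0, 0.5, 1.0])
nu = np.full(n, 1.0 / n)

# Gangbo-Swiech-type cost: sum over ordered pairs j != k of |x_j - x_k|^2.
cost = np.zeros((n,) * p)
for idx in itertools.product(range(n), repeat=p):
    xs = pts[list(idx)]
    cost[idx] = sum((xs[j] - xs[k]) ** 2
                    for j in range(p) for k in range(p) if j != k)

# Marginal constraints: the i-th marginal of gamma equals nu, for each axis.
N = n ** p
A_eq, b_eq = [], []
for axis in range(p):
    for v in range(n):
        row = np.zeros(cost.shape)
        row[tuple(v if a == axis else slice(None) for a in range(p))] = 1.0
        A_eq.append(row.ravel())
        b_eq.append(nu[v])

res = linprog(cost.ravel(), A_eq=np.array(A_eq), b_eq=np.array(b_eq),
              bounds=[(0, None)] * N)
gamma = res.x.reshape(cost.shape)
print(res.fun)  # the diagonal coupling x1 = x2 = x3 is feasible with cost 0
```

Here the diagonal coupling x1 = x2 = x3 is feasible and has zero cost, so the solver recovers the optimal value 0. For general marginals the same linear program applies, at the price of n^p unknowns, which is the computational obstacle specific to the multi-marginal setting.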
4.1.3 Duality

Similar to the dual formulation of the two-marginal OT problem, we define the dual problem of the MMOT problem as follows. Given probability measures (ν1, . . . , νp) in P(Rd), we consider the maximization problem

    OT∗MM(ν1, . . . , νp) := sup_A { Σ_{i=1}^p ∫_{Rd} ui(xi) dνi(xi) }.    (4.1.5)

Here, A is the set of all p-tuples of functions (u1, . . . , up) such that each ui ∈ L1(Rd, dνi) is upper semi-continuous, and Σ_{i=1}^p ui(xi) ≤ c(x1, . . . , xp), ∀(x1, . . . , xp) ∈ Rpd.

The duality results between (4.1.1) and (4.1.5) have been discussed in the literature. However, in [24], the authors provide duality results with extensive characterization. Therefore, we will present the duality results given in [24] for the cost given by (4.1.3).

Theorem 4.1.4 ([24], Theorem 2.1). Assume that ν1, . . . , νp are non-negative Borel probability measures vanishing on (d − 1)-rectifiable sets and having finite second moments. Set Xi := spt(νi) for i = 1, . . . , p. Then:

(i) OT∗MM admits a maximizer u = (u1, . . . , up) ∈ A.

(ii) There is a minimizer S = (S1, . . . , Sp) for (4.1.4) satisfying S1(x) = x (x ∈ Rd). The Si are one-to-one νi-a.e., are uniquely determined, and have the form Si(x) = ∇fi∗(∇f1(x)) (x ∈ Rd), where

    fi(x) = |x|²/2 + ϕi(x),

the ϕi are convex functions, and fi∗ ∈ C1(Rd), where fi∗ denotes the Legendre transform of fi (see [47], Section 26).

(iii) Duality holds: the optimal values in (4.1.4) and (4.1.5) coincide.

(iv) If ū = (ū1, . . . , ūp) ∈ A is another maximizer for (4.1.5), we can modify the ūi's on sets of zero νi measure to obtain a maximizer, still denoted ū, such that ūi is differentiable νi-a.e. Furthermore, ∇ui = ∇ūi νi-a.e.

4.2 Capacity Constrained Multi-Marginal OT Problem

The notion of the capacity constrained multi-marginal OT problem has already been introduced in [18].
The Capacity Constrained Multi-Marginal Optimal Transport (CCM- MOT) problem introduces capacities that limit the amount transported between the source and the targets. 4.2.1 The Primal Problem Suppose we are given a positive integer p, and p probability measures ν1 , . . . , νp ∈ P(Rd ). Let γ̃ be a compactly supported finite measure on Rpd such that γ̃ ≪ ν1 ⊗ . . . ⊗ νp . Define Πγ̃ (ν1 , . . . , νp ) := {γ ∈ Π(ν1 , . . . , νp ) : γ(A1 × . . . × Ap ) ≤ γ̃(A1 × . . . × Ap ), ∀Ai ∈ B(Rd ), ∀i}. (4.2.1) We assume that the set Πγ̃ (ν1 , . . . , νp ) is non-empty. Then, the CCMMOT problem is to minimize the cost: Z OTCCMM := inf c(x1 , . . . , xp ) dγ(x1 , . . . , xp ). (4.2.2) γ∈Πγ̃ (ν1 ,...,νp ) Rpd In [18], it has been shown that the CCMMOT problem has a solution and some char- acterization of the optimal solution is given. We will present some of the results from [18] below. Theorem 4.2.1. ([18], Theorem 3.1) If c ∈ L∞ (Rpd ), then OTCCMM has a solution. Next, we will provide a characterization of the optimal solutions of (4.2.2). Definition 4.2.2. A measure γ ∈ Πγ̃ (ν1 , . . . , νp ) is called an extreme point of the convex set Πγ̃ (ν1 , . . . , νp ) if γ is not the midpoint of a non-trivial line segment in Πγ̃ (ν1 , . . . , νp ). Theorem 4.2.3. ([18], Theorem 4.1) Suppose ν1 , . . . , νp are non-atomic Borel probability measures on Rd . Then, γ ∈ Πγ̃ (ν1 , . . . , νp ) is an extreme point of the set Πγ̃ (ν1 , . . . , νp ) if and only if γ = 1W γ̃ for a γ-measurable set W ⊂ Rpd . 26 To get the uniqueness of optimal solutions, we make the following two assumptions on the cost function. 1. Assume that the function c ∈ L∞ (Rpd ) is such that the mixed partial derivative of order p ∂ pc ∂i1 ,...,ip c = i ∂xi11 , . . . , xpp i exists for each set of variables (xi11 , . . . , xpp ), where 1 ≤ ik ≤ d for any k ≤ p. 2. 
Assume that for some fixed number N ∈ N ∪ {∞} there exist at most countably many disjoint open sets {Gk }N k=1 , Gk ⊆ R , such that the following conditions are satisfied: d (C1) each set Gk has positive Lebesgue measure; (C2) the union of all sets in {Gk }N k=1 has full Lebesgue measure; k (C3) for every k ≤ N there exist a set of variables (xk11 , . . . , xpp ) such that the functions ∂k1 ,...,kp c is either strictly positive or strictly negative on Gk . Theorem 4.2.4. ([18], Theorem 6.1) Suppose we are given a cost function c on Rd satisfying the conditions (C1), (C2), and (C3). Then, any γ ∈ P(Rpd ) that is an optimal plan is an extreme point of the set Πγ̃ (ν1 , . . . , νp ). Corollary 4.2.5. ([18], Corollary 6.1.1) If c on Rd satisfies the conditions (C1), (C2), and (C3), then the optimal plan is unique. 4.2.2 Duality To the best of our knowledge, a dual formulation for the CCMMOT problem does not exist in the literature. Therefore, in this section we present a dual formulation for the CCMMOT problem and prove the strong duality result and the existence of dual maximizers. We will generalize the techniques used in [34, 35] for the two-marginal capacity con- strained OT problem to get our results. 27 Let νi be a probability measure in P(Rd ) which is absolutely continuous w.r.t. Lebesgue measure with density fi ∈ L1 (Rd ) for each 1 ≤ i ≤ p and γ̃ be a compactly supported finite measure that is absolutely continuous w.r.t. Lebesgue measure with a bounded density h̃. We denote by Πh̃ (f1 , . . . , fp ) the set of all non-negative measurable joint densities on Rpd that are bounded by h̃. i.e. for each 1 ≤ i ≤ p, fi (xi ) = h(x1 , . . . , xp ) dx1 . . . dxi−1 dxi+1 . . . dxp , R and 0 ≤ h ≤ h̃. As the dual formulation of the CCMMOT problem, we consider the following maximization problem: OT∗CCMM := sup J(u1 , . . . , up , w) (4.2.3) (u1 ,...,up ,w)∈Liph̃ c where p Z Z X J(u1 , . . . , up , w) := ui (xi )fi (xi ) dxi + w(x1 , . . . , xp )h̃(x1 , . . . , xp ) dx1 . 
. . dxp i=1 Rd Rpd (4.2.4) and ( Liph̃c := (u1 , . . . , up , w) : ui ∈ L1 (Rd ), w ∈ L1 (Rpd ), p ) (4.2.5) X ui (xi ) + w(x1 , . . . , xp ) ≤ c(x1 , . . . , xp ), and w(x1 , . . . , xp ) ≤ 0 . i=1 First, we will prove that the strong duality holds for the CCMMOT problem. Theorem 4.2.6. Let νi be a probability measure in P(Rd ) which is absolutely continuous w.r.t. Lebesgue measure with density fi ∈ L1 (Rd ) for each 1 ≤ i ≤ p, γ̃ be a compactly supported finite measure that is absolutely continuous w.r.t. Lebesgue measure on Rpd with a bounded density h̃, c ∈ L1 (Rpd ) and assume Πh̃ (f1 , . . . , fp ) ̸= ∅. Then, OTCCMM = OT∗CCMM . Here, we will redefine OTCCMM as Z OTCCMM = inf Ic (h) := ch dx1 . . . dxp . (4.2.6) h∈Πh̃ (f1 ,...,fp ) Rpd 28 Proof. The proof of Theorem 4.2.6 is a consequence of the two propositions that we will be proving below. Proposition 4.2.7. Under the hypotheses of Theorem 4.2.6, we have OTCCMM ≥ OT∗CCMM (4.2.7) Proposition 4.2.8. Under the hypotheses of Theorem 4.2.6, there exists a sequence {(uε1 , . . . , uεp , wε )}ε↓0 in Liph̃c such that Ic (h0 ) = lim J(uε1 , . . . , uεp , wε ), (4.2.8) ε↓0 where h0 is a minimizer of (4.2.6) of the form h0 = 1W h̃, for a Lebesgue measurable set W ⊂ Rpd . Proof of Proposition 4.2.7: Let h ∈ Πh̃ (f1 , . . . , fp ) with Ic (h) finite and let (u1 , . . . , up , w) ∈ Liph̃c . Then we have Z Ic (h) = ch dx1 . . . dxp Rpd p Z Z Z p X X = ui fi dxi + wh̃ dx1 . . . dxp + (c − ui − w)h dx1 . . . dxp i=1 Rd Rpd Rpd i=1 Z + w(h − h̃) dx1 . . . dxp Rpd ≥ J(u1 , . . . , up , w). Note that, in the second line, we use the marginal conditions on h and in the last line, we used the properties of the set Liph̃c along with the fact that h ≤ h̃. By taking the infimum over h ∈ Πh̃ (f1 , . . . , fp ) and supremum over (u1 , . . . , up , w) ∈ Liph̃c from both sides, we get the inequality (4.2.7). Proof of Proposition 4.2.8: Similar to [35], we introduce a relaxed version of the MMOT problem with capacity constraints. 
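The mechanism of the relaxation can be previewed numerically: the hard marginal constraints are replaced by a quadratic penalty of strength 1/(2ε), while the capacity bound 0 ≤ h ≤ h̃ is kept as a box constraint, so projected gradient descent applies directly. Below is a minimal sketch for two marginals for readability; ε, the grid, the cost, and the capacity are all illustrative choices, not taken from the text.

```python
import numpy as np

# Discrete penalized problem: minimize <c, h> + (1/2eps)(||h1 - f||^2 + ||h^T 1 - g||^2)
# over the box 0 <= h <= h~, via projected gradient descent.
n = 5
x = np.linspace(0.0, 1.0, n)
c = (x[:, None] - x[None, :]) ** 2       # illustrative quadratic cost
f = np.full(n, 1.0 / n)
g = np.full(n, 1.0 / n)
h_cap = np.full((n, n), 2.0 / n ** 2)    # illustrative capacity density h~
eps = 1e-3

h = np.minimum(h_cap, np.outer(f, g))    # feasible starting point
step = eps / (2 * n)                     # step <= 1/L for the penalty's Lipschitz gradient
for _ in range(20000):
    # Gradient: cost plus the penalty pushing each marginal back toward f and g.
    grad = c + ((h.sum(1) - f)[:, None] + (h.sum(0) - g)[None, :]) / eps
    h = np.clip(h - step * grad, 0.0, h_cap)   # projection onto the box [0, h~]

print(np.abs(h.sum(1) - f).max(), np.abs(h.sum(0) - g).max())
```

As ε ↓ 0 the penalty forces the marginals of the minimizer back to f and g; this is the numerical counterpart of the convergence statement proved in Lemma 4.2.13 below.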
Let ε > 0 be a small number. Define Z p 1 X Icε (h) := ch dx1 . . . dxp + || ⟨h⟩xi − fi ||22 (4.2.9) Rpd 2ε i=1 29 where ⟨g⟩xi denotes the ith marginal of g, for a given function g = g(x1 , . . . , xp ) defined on Rpd , i.e. for each 1 ≤ i ≤ p, ⟨g⟩xi = g(x1 , . . . , xp ) dx1 . . . dxi−1 dxi+1 . . . dxp . R For (u1 , . . . , up , w) ∈ Liph̃c such that each ui ∈ L2 (Rd ), we define p Z Z p X εX ε J (u1 , . . . , up , w) := ui fi dxi + wh̃ dx1 . . . dxp − ||ui ||22 . (4.2.10) i=1 Rd Rpd 2 i=1 Note that, if ui ∈ / L2 (Rd ) for some i ∈ {1, . . . , p}, we can extend J ε to a functional on the entire set Liph̃c by setting J ε (u1 , . . . , up , w) := −∞. First, we will prove that the statement of Proposition 4.2.7 holds for the relaxed version. Lemma 4.2.9. Let ε > 0 be given. Under the hypotheses of Theorem 4.2.6, we have inf Icε (h) ≥ sup J ε (u1 , . . . , up , w). (4.2.11) 0≤h≤h̃ (u1 ,...,up ,w)∈Liph̃ c Proof. Let 0 ≤ h ≤ h̃ and (u1 , . . . , up , w) ∈ Liph̃c be such that Icε (h) and J ε (u1 , . . . , up , w) are both finite. Then, observe that, Z p 1 X Icε (h) = ch dx1 . . . dxp + || ⟨h⟩xi − fi ||22 Rpd 2ε i=1 p Z Z p X εX = ui fi dxi + wh̃ dx1 . . . dxp − ||ui ||22 i=1 R d Rpd 2 i=1 Z p Z X + (c − ui − w)h dx1 . . . dxp + w(h − h̃) dx1 . . . dxp Rpd i=1 Rpd p 1 X + || ⟨h⟩xi − fi + εui ||22 2ε i=1 ≥ J ε (u1 , . . . , up , w). The last line holds true by the definition of the set Liph̃c , the fact that 0 ≤ h ≤ h̃ and the non-negativity of the L2 norms. Finally, by taking the infimum over h ∈ Πh̃ (f1 , . . . , fp ) and supremum over (u1 , . . . , up , w) ∈ Liph̃c from both sides, we get the inequality (4.2.11). Next, we will prove that the relaxed problem (4.2.9) has a minimizer. 30 Lemma 4.2.10. Let ε > 0 be given. Under the hypotheses of Theorem 4.2.6, there exists a minimizer hε of Icε , and hε can be chosen of the form hε = 1Wε h̃ for some Lebesgue measurable set Wε ⊆ Rpd . 
Furthermore, if ĥε is another minimizer of Icε , then for each D E 1 ≤ i ≤ p, ⟨hε ⟩xi = ĥε . xi Proof. Let h0 be a minimizer of (4.2.6). Since it is admissible for Icε , and h0 has marginals f1 , . . . , fp , we have that Icε (h0 ) = Ic (h0 ). Thus, we have that, −||h̃||∞ ||c||L1 (W̃ ) ≤ inf h Icε (h) ≤ Icε (h0 ) = Ic (h0 ) < ∞, where W f is the support of h̃. So, we pick a minimizing sequence {hn }n∈N of Icε . Since we have 0 ≤ hn ≤ h̃, for each n, we can find a subsequence (without relabeling) {hn }n∈N that converges weak-* in (L1 (Rpd ))∗ = L∞ (Rpd ) (see [48], Chapter 19) to some L∞ function hε that satisfies 0 ≤ hε ≤ h̃. Also note that for each 1 ≤ i ≤ p, Z ⟨hn ⟩xi − fi = (hn − h0 ) dx1 . . . dxi−1 dxi+1 . . . dxp . R(p−1)d Hence, Z Z Z !2 | ⟨hn ⟩xi − fi |2 dxi ≤ |hn − h0 | dx1 . . . dxi−1 dxi+1 . . . dxp dxi . Rd Rd R(p−1)d Since hn , h0 ≤ h̃ ∈ L∞ (Rpd ) with compact support, the sequence {⟨hn ⟩xi − fi }n∈N is bounded in L2 . Hence, we can find a further subsequence (without relabeling) {hn }n∈N such that the sequences {⟨hn ⟩xi − fi }n∈N weakly converge to ⟨hε ⟩xi − fi in L2 for each i.  Now, fix an i ∈ {1, . . . , p} and let ξ = ξ(xi ) be an arbitrary smooth and compactly supported test function. Then, Z Z    [ ⟨hn ⟩xi − fi − ⟨hε ⟩xi − fi ]ξ dxi = ⟨hn ⟩xi − ⟨hε ⟩xi ξ dxi Rd d ZR = (hn − hε )ξ dx1 . . . dxp . Rpd Since hn converges to hε weak-*, we get that ⟨hn ⟩xi − fi converges to ⟨hε ⟩xi − fi . Now, by the lower semi-continuity of the L2 norm with respect to weak L2 convergence, for each i, we have that || ⟨hε ⟩xi − fi ||L2 (Rd ) ≤ lim inf || ⟨hn ⟩xi − fi ||L2 (Rd ) . n→∞ 31 Also, since c ∈ L1 (Rpd ), and hε , hn are supported in W f , by the weak-* convergence on L∞ (W f ), we get Z Z chε dx1 . . . dxp = lim chn dx1 . . . dxp . Rpd n→∞ Rpd Thus, we have that inf Icε (h) = lim inf Icε (hn ) h n→∞ p (Z ) 1 X = lim inf chn dx1 . . . dxp + || ⟨hn ⟩xi − fi ||22 n→∞ Rpd 2ε i=1 Z p 1 X ≥ lim inf chn dx1 . . . 
dxp + lim inf || ⟨hn ⟩xi − fi ||22 n→∞ Rpd 2ε i=1 n→∞ Z p 1 X ≥ chε dx1 . . . dxp + || ⟨hε ⟩xi − fi ||22 R pd 2ε i=1 = Icε (hε ). This concludes that hε is a minimizer for Icε . Since the relaxed problem is strictly convex (see Appendix 2), it is clear that hε has unique marginals. Also, since hε is a minimizer for Ic in the class Πh̃ (⟨hε ⟩x1 , . . . , ⟨hε ⟩xp ), we can choose hε such that hε = 1Wε h̃ for some Lebesgue measurable set Wε ⊂ W f (see Theorem 4.2.3 and Theorem 4.2.4). For the rest of the proof, we define the following functions. 1 uεi := − (⟨hε ⟩xi − fi ) ∀1 ≤ i ≤ p, (4.2.12) ε and p ( ) X wε := min c − uεi , 0 . (4.2.13) i=1 Note that, by the definition of wε , we get that wε ≤ 0 and uεi + wε ≤ c. Thus, Pp i=1 (uε1 , . . . , uεp , wε ) ∈ Liph̃c defined in (4.2.5). Also note that for each i, uεi are determined inde- pendently of the choice of hε in Πh̃ (f1 , . . . , fp ). We will claim that (uε1 , . . . , uεp , wε ) maximizes J ε in Liph̃c . 32 Lemma 4.2.11. Let hε = 1Wε h̃ be a minimizer for Icε . Then   ≤ 0 a.e. in Wε , p   X c− uiε (4.2.14)  ≥ 0 a.e. in W  i=1  f \ Wε . Proof. Let ξ ≥ 0 be an arbitrary smooth test function. Define, for any σ ∈ R hσε := hε + σξ(h̃ − hε ). Since hε = 1Wε h̃, we have  a.e. in Wε ,  hε  hσε = (4.2.15) σξ h̃ a.e. in W   f \ Wε . Observe that, h0ε = hε and for 0 ≤ σ ≤ ||ξ||−1 ∞ , 0 ≤ hε ≤ h̃. Since hε minimizes Ic , we σ ε have Icε (hε ) = Icε (h0ε ) ≤ Icε (hσε ). Now observe that, Z p 1 X Icε (hσε ) = chσε dx1 . . . dxp + || ⟨hσε ⟩xi − fi ||22 Rpd 2ε i=1 Z p 1 X D E = c(hε + σξ(h̃ − hε )) dx1 . . . dxp + || (hε + σξ(h̃ − hε )) − fi ||22 Rpd 2ε i=1 x i Z Z p 1 X = chε dx1 . . . dxp + cσξ(h̃ − hε ) dx1 . . . dxp + || ⟨hε ⟩xi − fi ||22 Rpd Rpd 2ε i=1 p Z E 2 1 X D + σξ (h̃ − hε ) dxi 2ε i=1 Rd xi p Z  D E  1X  + σξ (h̃ − hε ) · ⟨hε ⟩xi − fi dxi . ε i=1 Rd xi Then, the right derivative of Icε at σ = 0 is given by Icε (σ) − Icε (0) ∂+ Icε (0) = lim+ σ→0 (Z σ Z 1 = lim+ chε dx1 . . . dxp + cσξ(h̃ − hε ) dx1 . . 
. dxp σ→0 σ Rpd Rpd p p Z E 2 1 X 2 1 X D + || ⟨hε ⟩xi − fi ||2 + σξ (h̃ − hε ) dxi 2ε i=1 2ε i=1 Rd xi 33 p Z  D E  1X  + σξ (h̃ − hε ) · ⟨hε ⟩xi − fi dxi ε i=1 Rd xi p Z ) 1 X − chε dx1 . . . dxp − || ⟨hε ⟩xi − fi ||22 Rpd 2ε i=1 p Z (Z σ X D E 2 = lim+ cξ(h̃ − hε ) dx1 . . . dxp + ξ (h̃ − hε ) dxi σ→0 Rpd 2ε i=1 Rd xi p Z  D ) 1X E   + ξ (h̃ − hε ) · ⟨hε ⟩xi − fi dxi ε i=1 Rd xi Z p Z  D E  1X  = cξ(h̃ − hε ) dx1 . . . dxp + ξ (h̃ − hε ) · ⟨hε ⟩xi − fi dxi Rpd ε i=1 Rd xi Z p Z 1X  = cξ(h̃ − hε ) dx1 . . . dxp + ⟨hε ⟩xi − fi ξ(h̃ − hε ) dx1 . . . dxp Rpd ε i=1 Rpd p Z ! X = c− uεi ξ(h̃ − hε ) dx1 . . . dxp . Rpd i=1 Since h0ε = hε minimizes Icε , we have p Z ! X 0 ≤ ∂+ Icε (0) = c− uεi ξ(h̃ − hε ) dx1 . . . dxp . Rpd i=1 As this holds for any arbitrary test function ξ ≥ 0, we have that (c − Pp i=1 uεi ) (h̃−hε ) ≥ 0 a.e.. Since h̃ − hε ≥ 0 a.e. and h̃ − hε > 0 a.e in W f \ Wε , we get the second part of the inequality in (4.2.14). With a similar argument for hσε := hε − σξhε , we can prove the first inequality in (4.2.14). Next, we will prove a duality result for the relaxed version. Lemma 4.2.12. Let hε = 1Wε h̃ be a minimizer for Icε and uεi , wε be defined by (4.2.12) and (4.2.13). Then Icε (hε ) = J ε (uε1 , . . . , uεp , wε ). (4.2.16) In particular, (uε1 , . . . , uεp , wε ) is a maximizer for J ε (u1 , . . . , up , w) in Liph̃c . 34 Proof. Observe that, p Z Z p X ε εX ε 2 J ε (uε1 , . . . , uεp , wε ) = uεi fi dxi + w h̃ dx1 . . . dxp − ||u || Rd Rpd 2 i=1 i 2 Zi=1 Z Z = chε dx1 . . . dxp − chε dx1 . . . dxp + wε hε dx1 . . . dxp Rpd Rpd Rpd p p Z Z XZ X − ε w hε dx1 . . . dxp + uεi ⟨hε ⟩xi dxi − uεi ⟨hε ⟩xi dxi Rpd i=1 Rd i=1 Rd p p XZ Z εX ε 2 + uεi fi dxi + wε h̃ dx1 . . . dxp − ||u || i=1 Rd Rpd 2 i=1 i 2 p Z Z ! X ε = chε dx1 . . . dxp − c− uεi −w hε dx1 . . . dxp Rpd Rpd i=1 Z p XZ ε + w (h̃ − hε ) dx1 . . . dxp + uεi (fi − ⟨hε ⟩xi ) dxi Rpd i=1 Rd p εX ε 2 − ||u || 2 i=1 i 2 Z p 1 X = chε dx1 . . . 
dxp + || ⟨hε ⟩xi − fi ||22 R pd 2ε i=1 Z + wε (h̃ − hε ) dx1 . . . dxp R pd p Z ! X − c− uεi − wε hε dx1 . . . dxp Rpd i=1 Z = Icε (hε ) + wε (h̃ − hε ) dx1 . . . dxp Rpd p Z ! X − c− uεi − wε hε dx1 . . . dxp . Rpd i=1 (4.2.17) From (4.2.14) and (4.2.13), we observe that, on Wε , hε = h̃, hence wε (h̃ − hε ) = 0 and since c − pi=1 uεi ≤ 0, wε = c − pi=1 uεi , we obtain c − pi=1 uεi − wε = 0. On the other hand, P P P on W f \ Wε , hε = 0, hence (c − Pp uεi − wε )hε = 0, and since c − Pp uεi ≥ 0, wε = 0, we i=1 i=1 obtain wε (h̃ − hε ) = 0. Thus, (4.2.17) becomes J ε (uε1 , . . . , uεp , wε ) ≥ Icε (hε ). 35 By (4.2.11) and the fact that (uε1 , . . . , uεp , wε ) ∈ Liph̃c , we can conclude that (uε1 , . . . , uεp , wε ) is a maximizer for J ε (uε1 , . . . , uεp , wε ) in Liph̃c . Now, we will prove that the minimizer of the relaxed problem (4.2.9) approximates the original problem (4.2.6). Lemma 4.2.13. Let {hε }ε↓0 be a sequence of minimizers for Icε . Then, it is precompact in the L∞ -weak-* topology and every limit point h0 is a minimizer of Ic . Furthermore, lim Ic (hε ) = Ic (h0 ), (4.2.18) ε↓0 lim ε||uεi ||22 = 0, ∀1 ≤ i ≤ p. (4.2.19) ε↓0 Proof. Since we have 0 ≤ hε ≤ h̃, there exists a subsequence {hεn }n∈N that converges weakly- * in L∞ (W f ) to some function 0 ≤ h̄ ≤ h̃. Let h̄0 be a minimizer of Ic . Note that, h̄0 is admissible for the relaxed problem and since hεn is a minimizer for the relaxed problem, we have Icεn (hεn ) ≤ Icεn (h̄0 ) = Ic (h̄0 ). (4.2.20) By the weak-* convergence of {hεn } in L∞ (W f ), we have that Z Z ch̄ dx1 . . . dxp = lim chεn dx1 . . . dxp . Rpd εn ↓0 Rpd i.e. Ic (h̄) = lim Ic (hεn ). (4.2.21) εn ↓0 Since Ic (hεn ) ≤ Icεn (hεn ), we have lim inf Ic (hεn ) ≤ lim inf Icεn (hεn ). (4.2.22) εn ↓0 εn ↓0 By taking the lim inf in (4.2.20), we have lim inf Icεn (hεn ) ≤ Ic (h̄0 ). (4.2.23) εn ↓0 36 Thus, by combining (4.2.21), (4.2.22), and (4.2.23) Ic (h̄) ≤ Ic (h̄0 ). (4.2.24) Now, we will claim that h̄ has the marginals f1 , . . . 
, fp . From (4.2.20), we have Z p Z 1 X 2 chεn dx1 . . . dxp + || ⟨hεn ⟩xi − fi ||2 ≤ ch̄0 dx1 . . . dxp . Rpd 2εn i=1 Rpd So, p Z X || ⟨hεn ⟩xi − fi ||22 ≤ 2εn c(h̄0 − hεn ) dx1 . . . dxp . i=1 Rpd Note that, since h̄0 , hεn ∈ L∞ (Rpd ) with 0 ≤ hεn ≤ h̃, ∀εn , and c ∈ L1 (Rpd ), we have c(h̄0 − hεn ) dx1 . . . dxp ≤ 2||h̃||L∞ (Rpd ) ||c||L1 (Rpd ) . R Rpd Hence, supn Rpd c(h̄0 − hεn ) dx1 . . . dxp < ∞. R Therefore, as εn ↓ 0, for each i, ⟨hεn ⟩xi → fi in L2 . (4.2.25) Now, fix an i ∈ {1, . . . , p} and let ξ = ξ(xi ) be an arbitrary smooth and compactly supported test function. Then, Z   Z Z    h̄ xi − fi ξ dxi = ⟨hεn ⟩xi − fi ξ dxi + h̄ xi − ⟨hεn ⟩xi ξ dxi Rd Rd Rd Z Z  = ⟨hεn ⟩xi − fi ξ dxi + (h̄ − hεn )ξ dx1 . . . dxp d Rpd ZR Z  ≤ | ⟨hεn ⟩xi − fi ξ| dxi + (h̄ − hεn )ξ dx1 . . . dxp Rd Rpd Z ≤ || ⟨hεn ⟩xi − fi ||L2 (Rd ) ||ξ||L2 (Rd ) + (h̄ − hεn )ξ dx1 . . . dxp . Rpd In the last line, the first integral on the right converges to zero by (4.2.25) and the second integral converges to zero by the weak-* convergence of hεn . Since ξ is arbitrary, we get that for each i, h̄ xi = fi . Thus, we have h̄ ∈ Πh̃ (f1 , . . . , fp ). Hence, by (4.2.24), we have that Ic (h̄) = Ic (h̄0 ) = min Ic (h). (4.2.26) h∈Πh̃ (f1 ,...,fp ) 37 Thus, we have min Ic (h) = Ic (h̄) = lim inf Ic (hεn ) ≤ lim inf Icεn (hεn ) ≤ Ic (h̄0 ) = min Ic (h). h∈Πh̃ (f1 ,...,fp ) εn ↓0 εn ↓0 h∈Πh̃ (f1 ,...,fp ) So, we can find a subsequence {hεn }n∈N (not relabeled), such that min Ic (h) = lim Ic (hεn ) = lim Icεn (hεn ). (4.2.27) h∈Πh̃ (f1 ,...,fp ) εn ↓0 εn ↓0 Therefore, p (Z ) 1 X Ic (h̄) = lim chεn dx1 . . . dxp + || ⟨hεn ⟩xi − fi ||22 εn ↓0 Rpd 2ε n i=1 p (Z ) ( ) 1 X = lim chεn dx1 . . . dxp + lim || ⟨hεn ⟩xi − fi ||22 εn ↓0 Rpd ε n ↓0 2ε n ( p ) i=1 1 X = Ic (h̄) + lim || ⟨hεn ⟩xi − fi ||22 . εn ↓0 2εn i=1 This gives us that p ( ) 1 X lim || ⟨hεn ⟩xi − fi ||22 = 0. εn ↓0 εn i=1 Using the definition of uεi n in (4.2.12), we get p X lim εn ||uεi n ||22 = 0. 
εn ↓0 i=1 Hence, for each 1 ≤ i ≤ p, lim εn ||uεi n ||22 = 0. εn ↓0 This completes the proof of Lemma 4.2.13. Finally, rewriting (4.2.16) using J(uε1n , . . . , uεpn , wεn ) and Ic (hεn ), we get that p X J(uε1n , . . . , uεpn , wεn ) = Ic (hεn ) + εn ||uεi n ||22 . i=1 Letting εn ↓ 0, we get lim J(uε1n , . . . , uεpn , wεn ) = lim Ic (hεn ) = Ic (h0 ). εn ↓0 εn ↓0 This completes the proof of Proposition 4.2.8, hence the proof of Theorem 4.2.6. 38 Remark 4.2.14. Note that the assumption of h̃ being compactly supported, automatically gives us that the densities fi ’s are compactly supported. Next, we will prove the existence of dual maximizers for the CCMMOT problem. Recall that the CCMMOT problem is given by Z OTCCMM := inf c(x1 , . . . , xp ) dγ(x1 , . . . , xp ) γ∈Πγ̃ (ν1 ,...,νp ) Rpd and the dual problem is given by OT∗CCMM := sup J(u1 , . . . , up , w), (u1 ,...,up ,w)∈Liph̃ c where p Z Z X J(u1 , . . . , up , w) := ui (xi )fi (xi ) dxi + w(x1 , . . . , xp )h̃(x1 , . . . , xp ) dx1 . . . dxp . i=1 Rd Rpd Let h̃ ∈ L∞ (X1 × . . . × Xp ), and the probability densities fi ∈ L1 (Xi ), for each 1 ≤ i ≤ p are compactly supported. We will assume that the sets Xi = spt(fi ) have unit Lebesgue measure. In order to get the existence result, we will consider the following dual formulation: ′ OT∗CCMM := sup J ′ (u1 , . . . , up ), (4.2.28) ui ∈L1 (Xi ),∀1≤i≤p where p Z p Z " # X X ′ J (u1 , . . . , up ) := ui fi dxi − −c + ui h̃ dx1 . . . dxp . (4.2.29) i=1 Xi X1 ×...×Xp i=1 + Note that, (see Appendix 1) ′ OT∗CCMM = OT∗CCMM . Now, we will show that (4.2.28) has a solution. Theorem 4.2.15. Let fi and h̃ be continuous and strictly positive on their compact supports Xi ⊆ Rd and X1 × . . . × Xp respectively. Let c ∈ L1 (X1 × . . . × Xp ). Fix an η > 1 and 39 assume that Πh̃/η (f1 , . . . , fp ) is non-empty. Then, there exist functions (u1 , . . . , up ) where ui ∈ L1 (Xi ) ∀i, such that ′ OT∗CCMM = J ′ (u1 , . . . , up ). 
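The passage from J to the reduced functional J′ rests on the pointwise identity min{c − Σ ui, 0} = −[−c + Σ ui]+: for fixed (u1, . . . , up), the best admissible choice of w is w = min{c − Σ ui, 0}, and substituting it into J recovers J′. A quick discrete sanity check of this identity (p = 2; all data randomly generated for illustration):

```python
import numpy as np

# Check: with w := min(c - u1 - u2, 0), the three-potential functional J
# equals the reduced functional J', since w = -[-c + u1 + u2]_+.
rng = np.random.default_rng(1)
n = 6
f1 = np.full(n, 1.0 / n)
f2 = np.full(n, 1.0 / n)
h_cap = rng.uniform(0.5, 1.5, (n, n))    # illustrative capacity density h~
c = rng.uniform(0.0, 1.0, (n, n))        # illustrative cost
u1, u2 = rng.normal(size=n), rng.normal(size=n)

s = u1[:, None] + u2[None, :]
w = np.minimum(c - s, 0.0)               # feasible: w <= 0 and u1 + u2 + w <= c

J = u1 @ f1 + u2 @ f2 + (w * h_cap).sum()
J_prime = u1 @ f1 + u2 @ f2 - (np.maximum(-c + s, 0.0) * h_cap).sum()
print(np.isclose(J, J_prime))  # True
```

The same substitution is what allows the existence question for dual maximizers to be posed in terms of (u1, . . . , up) alone.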
As in [34], our main goal will be to get a coercivity estimate of maximizing sequences which will guarantee L1 -boundedness. For each 1 ≤ i ≤ p, let Z ui fi := ui (xi )fi (xi ) dxi . Rd Note that for a p-tuple of constants (k1 , . . . , kp ) with ki = 0, we have Pp i=1 J ′ (u1 , . . . , up ) = J ′ (u1 + k1 , . . . , up + kp ). Hence, we can find a p-tuple of constants (k1 , . . . , kp ) with ki = 0, so that Pp i=1 (ui + ki )fi = (uj + kj )fj , ∀i ̸= j. First, we will find a bound on the means ui fi . Lemma 4.2.16. Fix ui ∈ L1 (Xi ), c ∈ L1 (X1 × . . . × Xp ) and a probability density h ∈ L∞ (X1 × . . . × Xp ) with marginals (f1 , . . . , fp ). Suppose there is some η > 1 such that h ≤ h̃η . Then, p X ||ηhc||L1 (X1 ×...×Xp ) − J ′ (u1 , . . . , up ) ′ J (u1 , . . . , up ) ≤ ui fi ≤ . (4.2.30) i=1 η−1 Proof. From definition (4.2.29), we have p Z p Z " # X X ′ J (u1 , . . . , up ) = ui fi dxi − −c + ui h̃ dx1 . . . dxp . i=1 Xi X1 ×...×Xp i=1 + By the non-positivity of the second integral, we get one direction of the inequality: p X ′ J (u1 , . . . , up ) ≤ ui fi . (4.2.31) i=1 On the other hand, since 0 ≤ h ≤ and the fact that (−c + ui ]+ , h̃ Pp Pp η i=1 ui ) ≤ [−c + i=1 we have p p Z " # X X J ′ (u1 , . . . , up ) = ui fi − −c + ui h̃ dx1 . . . dxp i=1 X1 ×...×Xp i=1 + 40 p p Z ! X X ≤ ui fi − η −c + ui h dx1 . . . dxp i=1 X1 ×...×Xp i=1 p Z p X X = ui fi + η ch dx1 . . . dxp − η ui fi i=1 X1 ×...×Xp i=1 p X ≤ −(η − 1) ui fi + ||ηch||L1 (X1 ×...×Xp ) . i=1 Thus, we have p X ||ηch||L1 (X1 ×...×Xp ) − J ′ (u1 , . . . , up ) ui fi ≤ . (4.2.32) i=1 η−1 By combining (4.2.31) and (4.2.32), we get the inequality (4.2.30) we desired. Remark 4.2.17. Note that, if we assume that (ui + ki )fi = (uj + kj )fj , ∀i ̸= j for some constants (k1 , . . . , kp ), the bounds in (4.2.30) imply that ui fi is also bounded ,∀1 ≤ i ≤ p. Next, we will obtain a bound on the oscillation of ui fi around its mean for each i. Lemma 4.2.18. 
Let ui , fi ∈ L1 (Xi ) for each 1 ≤ i ≤ p, c ∈ L1 (X1 × . . . × Xp ) and h̃ ∈ L∞ (X1 × . . . × Xp ). Suppose that there is some 0 < ε ≤ 1 such that εf1 . . . fp ≤ h̃ for all (x1 , . . . , xp ) ∈ X1 × . . . × Xp . Then, for each 1 ≤ i ≤ p we have p ε X ||ui fi − ui fi ||L1 (X1 ×...×Xp ) ≤ −J ′ (u1 , . . . , up ) + ||cf1 . . . fp ||L1 (X1 ×...×Xp ) + |ui fi |. (4.2.33) 6 i=1 Proof. Fix an i ∈ {1, . . . , p}. Define the oscillation around the mean as Z σi := |ui fi − ui fi | dxi . Xi Let ( ) σi Xi± = xi : ±(ui (xi )fi (xi ) − ui fi ) > 3 and m± i = |Xi | ± Also, let Z A± i =± (ui (xi )fi (xi ) − ui fi ) dxi . Xi± 41 Note that by definition, A± i ≥ 0. Now, observe that Z Z (ui fi − ui (xi )fi (xi )) dxi = (ui fi − ui (xi )fi (xi )) dxi σi {|ui fi −ui fi |≤ 3 } Xi Z − (ui fi − ui (xi )fi (xi )) dxi σi {ui fi −ui fi <− 3 } Z − (ui fi − ui (xi )fi (xi )) dxi σi {ui fi −ui fi > 3 } Z = ui fi |Xi | − ui fi − (ui fi − ui (xi )fi (xi )) dxi Xi− Z − (ui fi − ui (xi )fi (xi )) dxi Xi+ Z = (ui (xi )fi (xi ) − ui fi ) dxi Xi+ Z + (ui (xi )fi (xi ) − ui fi ) dxi Xi− − = A+ i − Ai . (4.2.34) In the second line, first term, we used the fact that the sets Xi have unit Lebesgue measure. Then, by the definition of σi , Z σi = |ui fi − ui fi | dxi Xi Z Z = |ui fi − ui fi | dxi + |ui fi − ui fi | dxi σi σi {|ui fi −ui fi |≤ 3 } {ui fi −ui fi <− 3 } Z + |ui fi − ui fi | dxi σi {ui fi −ui fi > 3 } Z Z σi n σi o ≤ |ui fi − ui fi | ≤ + |ui fi − ui fi | dxi + |ui fi − ui fi | dxi 3 3 Xi− Xi+ σi − − 1 − m+ + A+  ≤ i − m i i + Ai . 3 Thus, 2 m+ m−   2 A+ i + A−i ≥ + i + i σi ≥ σi . (4.2.35) 3 3 3 3 42 On the other hand, by (4.2.34) we have Z + − Ai − Ai = (ui fi − ui (xi )fi (xi )) dxi σi {|ui fi −ui fi |≤ 3 } σi n σi σi o ≥− − ≤ ui fi − ui fi ≤ 3 3 3 (4.2.36) σi − 1 − m+  =− i − mi 3 σi ≥− . 3 Thus by (4.2.35) and (4.2.36), we get σi A+ i ≥ . (4.2.37) 6 Therefore, first we will find a bound on A+ i , so that we get a bound on σi . 
Now, observe that
\begin{align*}
A_i^+ &= \int_{X_i^+} \big( u_i f_i - \overline{u_i f_i} \big)\, dx_i = -\overline{u_i f_i}\, m_i^+ + \int_{X_i^+} u_i f_i\, dx_i \\
&= -\overline{u_i f_i}\, m_i^+ + \int_{X_1 \times \dots \times X_{i-1} \times X_i^+ \times X_{i+1} \times \dots \times X_p} u_i\, f_1 \cdots f_p\, dx_1 \dots dx_p \\
&= -\overline{u_i f_i}\, m_i^+ + \int_{X_1 \times \dots \times X_i^+ \times \dots \times X_p} \Big( -c + \sum_{k=1}^p u_k \Big) f_1 \cdots f_p\, dx_1 \dots dx_p \\
&\qquad + \int_{X_1 \times \dots \times X_i^+ \times \dots \times X_p} c\, f_1 \cdots f_p\, dx_1 \dots dx_p - \sum_{j \ne i} \overline{u_j f_j} \int_{X_i^+} f_i\, dx_i \\
&\le \frac{1}{\varepsilon} \int_{X_1 \times \dots \times X_p} \Big[ -c + \sum_{k=1}^p u_k \Big]_+ \tilde h\, dx_1 \dots dx_p - \frac{1}{\varepsilon} \sum_{k=1}^p \overline{u_k f_k} + \frac{1}{\varepsilon} \sum_{k=1}^p \overline{u_k f_k} \\
&\qquad + \int_{X_1 \times \dots \times X_p} |c|\, f_1 \cdots f_p\, dx_1 \dots dx_p - \sum_{j \ne i} \overline{u_j f_j} \int_{X_i^+} f_i\, dx_i - \overline{u_i f_i}\, m_i^+ \\
&\le \frac{1}{\varepsilon} \Big( -J'(u_1, \dots, u_p) \Big) + \|c f_1 \cdots f_p\|_{L^1(X_1 \times \dots \times X_p)} + \Big( \frac{1}{\varepsilon} - m_i^+ \Big) \overline{u_i f_i} + \Big( \frac{1}{\varepsilon} - \int_{X_i^+} f_i\, dx_i \Big) \sum_{j \ne i} \overline{u_j f_j}. \tag{4.2.38}
\end{align*}
Note that in the penultimate line, we used the fact that $-c + \sum_k u_k \le \big[ -c + \sum_k u_k \big]_+$ and that $\varepsilon f_1 \cdots f_p \le \tilde h$.

So, from (4.2.38), we have
\[ \varepsilon A_i^+ \le -J'(u_1, \dots, u_p) + \varepsilon \|c f_1 \cdots f_p\|_{L^1(X_1 \times \dots \times X_p)} + \big( 1 - \varepsilon m_i^+ \big) \overline{u_i f_i} + \Big( 1 - \varepsilon \int_{X_i^+} f_i\, dx_i \Big) \sum_{j \ne i} \overline{u_j f_j}. \]
Since each $f_i$ is a probability density and $m_i^+, \varepsilon \le 1$, we have $0 \le 1 - \varepsilon m_i^+ \le 1$ and $0 \le 1 - \varepsilon \int_{X_i^+} f_i\, dx_i \le 1$. So, we get that
\[ \varepsilon A_i^+ \le -J'(u_1, \dots, u_p) + \|c f_1 \cdots f_p\|_{L^1(X_1 \times \dots \times X_p)} + \sum_{k=1}^p |\overline{u_k f_k}|. \tag{4.2.39} \]
Since we have $A_i^+ \ge \sigma_i / 6$ by (4.2.37), we get
\[ \frac{\varepsilon}{6}\sigma_i \le -J'(u_1, \dots, u_p) + \|c f_1 \cdots f_p\|_{L^1(X_1 \times \dots \times X_p)} + \sum_{k=1}^p |\overline{u_k f_k}|. \]
This is the desired inequality (4.2.33). □

Now, we will obtain $L^1$-bounds on the $u_i$'s.

Proposition 4.2.19. Let $c \in L^1(X_1 \times \dots \times X_p)$ and $\tilde h \in L^\infty(X_1 \times \dots \times X_p)$. Let $h$ be a probability density with marginals $f_1, \dots, f_p$ such that $h \le \tilde h / \eta$ for some $\eta > 1$. Suppose that there is some $\varepsilon > 0$ such that $\varepsilon f_1(x_1) \cdots f_p(x_p) \le \tilde h(x_1, \dots, x_p)$ and $\varepsilon \le \min\{f_1(x_1), \dots, f_p(x_p)\}$ for almost all $(x_1, \dots, x_p) \in X_1 \times \dots \times X_p$, and suppose that there exist functions $u_i \in L^1(X_i)$, $i \in \{1, \dots, p\}$, such that $\overline{u_i f_i} = \overline{u_j f_j}$ for all $i \ne j$. Then $\mathrm{OT}^{*\prime}_{\mathrm{CCMM}} - 1 \le J'(u_1, \dots, u_p)$ implies that $\|u_i\|_{L^1(X_i)}$ is bounded for each $1 \le i \le p$.

Remark 4.2.20.
Note that the above bound depends only on $\mathrm{OT}^{*\prime}_{\mathrm{CCMM}}$, $\varepsilon$, $\eta$, $\|c\|_{L^1}$ and $\|\tilde h\|_{L^\infty}$.

Proof. Fix an $i \in \{1, \dots, p\}$. Observe that
\[ \|u_i\|_{L^1(X_i)} \le \Big\| \frac{1}{f_i} \Big\|_{L^\infty(X_i)} \|u_i f_i\|_{L^1(X_i)} \le \frac{1}{\varepsilon} \|u_i f_i\|_{L^1(X_i)} \le \frac{1}{\varepsilon}\, |\overline{u_i f_i}| + \frac{1}{\varepsilon} \|u_i f_i - \overline{u_i f_i}\|_{L^1(X_i)}. \]
Using Lemma 4.2.16, Lemma 4.2.18 and the fact that $\mathrm{OT}^{*\prime}_{\mathrm{CCMM}} - 1 \le J'(u_1, \dots, u_p)$, we get a bound for $\|u_i\|_{L^1(X_i)}$ for each $i$. □

Now, we will discuss the proof of the existence of maximizers. By Proposition 4.2.19, we can pick a maximizing sequence $(u_1, \dots, u_p)$ for $\mathrm{OT}^{*\prime}_{\mathrm{CCMM}}$ with uniform $L^1$-bounds. However, since the $L^1$-unit ball is not compact, an $L^1$-bound is not sufficient to get the convergence of a subsequence. Therefore, in order to get better compactness properties, we extend the definition of $J'(u_1, \dots, u_p)$ from $L^1$ to the space of signed measures with finite total variation. Let $\mathcal M(X)$ denote the space of signed Radon measures on $X$. Let $C \in \mathcal M(X_1 \times \dots \times X_p)$ be such that $dC(x_1, \dots, x_p) = c(x_1, \dots, x_p)\, dx_1 \dots dx_p$ and $U_i \in \mathcal M(X_i)$ for each $1 \le i \le p$. Define
\[ \tilde J'(U_1, \dots, U_p) := \sum_{i=1}^p \int_{X_i} f_i\, dU_i - \int_{X_1 \times \dots \times X_p} \tilde h\, d\Big[ -C + \sum_{i=1}^p H_1^d \otimes \dots \otimes U_i \otimes \dots \otimes H_p^d \Big]_+. \tag{4.2.40} \]
Here, $H_i^d$ denotes Lebesgue measure on $\mathbb{R}^d$, $dH_i^d(x_i) = dx_i$.

First, we will prove that the functional $\tilde J'$ is upper semi-continuous with respect to weak-* convergence in $\mathcal M(X_1) \times \dots \times \mathcal M(X_p)$.

Lemma 4.2.21. Let $c \in L^1(X_1 \times \dots \times X_p)$ be such that $dC = c\, dH^{pd}$. Let $f_1, \dots, f_p$ and $\tilde h$ be continuous, non-negative functions on the compact sets $X_1, \dots, X_p$ and $X_1 \times \dots \times X_p$ respectively. Then, the functional $\tilde J'$ is upper semi-continuous with respect to weak-* convergence in $\mathcal M(X_1) \times \dots \times \mathcal M(X_p)$.

Proof. Let $(U_1^n, \dots, U_p^n)_{n \in \mathbb{N}}$ be a bounded sequence in $\mathcal M(X_1) \times \dots \times \mathcal M(X_p)$ such that $(U_1^n, \dots, U_p^n)$ converges to $(U_1, \dots, U_p)$ when tested against functions in $C(X_1) \times \dots \times C(X_p)$. We need to show that
\[ \limsup_{n \to \infty} \tilde J'(U_1^n, \dots, U_p^n) \le \tilde J'(U_1, \dots, U_p). \]
Observe that
\begin{align*}
\limsup_{n \to \infty} \Big\{ \sum_{i=1}^p \int_{X_i} f_i\, dU_i^n - \int \tilde h\, d\Big[ -C + \sum_{i=1}^p H_1^d \otimes \dots \otimes U_i^n \otimes \dots \otimes H_p^d \Big]_+ \Big\}
&\le \limsup_{n \to \infty} \sum_{i=1}^p \int_{X_i} f_i\, dU_i^n + \limsup_{n \to \infty} \Big( - \int \tilde h\, d\Big[ -C + \sum_{i=1}^p H_1^d \otimes \dots \otimes U_i^n \otimes \dots \otimes H_p^d \Big]_+ \Big) \\
&= \limsup_{n \to \infty} \sum_{i=1}^p \int_{X_i} f_i\, dU_i^n - \liminf_{n \to \infty} \int \tilde h\, d\Big[ -C + \sum_{i=1}^p H_1^d \otimes \dots \otimes U_i^n \otimes \dots \otimes H_p^d \Big]_+.
\end{align*}
We already have that, for each $1 \le i \le p$,
\[ \int_{X_i} f_i\, dU_i = \lim_{n \to \infty} \int_{X_i} f_i\, dU_i^n. \tag{4.2.41} \]
So, it remains to show that
\[ \int \tilde h\, d\Big[ -C + \sum_{i=1}^p H_1^d \otimes \dots \otimes U_i \otimes \dots \otimes H_p^d \Big]_+ \le \liminf_{n \to \infty} \int \tilde h\, d\Big[ -C + \sum_{i=1}^p H_1^d \otimes \dots \otimes U_i^n \otimes \dots \otimes H_p^d \Big]_+. \tag{4.2.42} \]
Let
\[ d\mu^n := \tilde h\, d\Big( -C + \sum_{i=1}^p H_1^d \otimes \dots \otimes U_i^n \otimes \dots \otimes H_p^d \Big), \qquad d\mu := \tilde h\, d\Big( -C + \sum_{i=1}^p H_1^d \otimes \dots \otimes U_i \otimes \dots \otimes H_p^d \Big). \]
Then $\mu^n$ converges weakly-* to $\mu$: given a function $\phi \in C(X_1 \times \dots \times X_p)$, we have
\begin{align*}
\int \phi\, d\mu^n &= -\int c \tilde h \phi\, dx_1 \dots dx_p + \sum_{i=1}^p \int_{X_i} \Big( \int_{X_1 \times \dots \times X_{i-1} \times X_{i+1} \times \dots \times X_p} \tilde h \phi\, dx_1 \dots dx_{i-1} dx_{i+1} \dots dx_p \Big)\, dU_i^n \\
&\xrightarrow{n \to \infty} -\int c \tilde h \phi\, dx_1 \dots dx_p + \sum_{i=1}^p \int_{X_i} \Big( \int \tilde h \phi\, dx_1 \dots dx_{i-1} dx_{i+1} \dots dx_p \Big)\, dU_i = \int \phi\, d\mu. \tag{4.2.43}
\end{align*}
Now, decompose the measures $\mu^n$ and $\mu$ into their positive and negative parts as $\mu^n = \mu^{n+} - \mu^{n-}$ and $\mu = \mu^+ - \mu^-$. Since $\mu^n$ converges to $\mu$ weakly-*, plugging $\phi \equiv 1$ into (4.2.43) gives
\[ \mu^{n+}(X_1 \times \dots \times X_p) - \mu^{n-}(X_1 \times \dots \times X_p) \xrightarrow{n \to \infty} \mu^+(X_1 \times \dots \times X_p) - \mu^-(X_1 \times \dots \times X_p). \tag{4.2.44} \]
Let $\tilde X := X_1 \times \dots \times X_p$. So, we have
\[ \mu^+(\tilde X) - \mu^-(\tilde X) \le \liminf_{n \to \infty} \{ \mu^{n+}(\tilde X) - \mu^{n-}(\tilde X) \}. \tag{4.2.45} \]
On the other hand, since the total variation norm $\|\mu\|_{TV}$, given by $|\mu| = \mu^+ + \mu^-$, is lower semi-continuous w.r.t. weak-* convergence, we have
\[ \mu^+(\tilde X) + \mu^-(\tilde X) \le \liminf_{n \to \infty} \|\mu^n\|_{TV} = \liminf_{n \to \infty} \{ \mu^{n+}(\tilde X) + \mu^{n-}(\tilde X) \}, \tag{4.2.46} \]
i.e.
\[ \mu^+(\tilde X) + \mu^-(\tilde X) \le \liminf_{n \to \infty} \{ \mu^{n+}(\tilde X) + \mu^{n-}(\tilde X) \}. \tag{4.2.47} \]
Now, by combining (4.2.45) and (4.2.47), we get
\[ \mu^+(\tilde X) \le \liminf_{n \to \infty} \mu^{n+}(\tilde X). \]
Note that, since only the positive parts of the measures appear in (4.2.42), it is enough to obtain lower semi-continuity of $\mu^{n+}(\tilde X)$. This completes the proof of Lemma 4.2.21. □

Finally, we will present the main result for existence of dual maximizers.

Theorem 4.2.22. Let $f_i$ and $\tilde h$ be continuous and strictly positive on their compact supports $X_i \subseteq \mathbb{R}^d$ and $X_1 \times \dots \times X_p$ respectively. Let $c \in L^1(X_1 \times \dots \times X_p)$. Fix an $\eta > 1$ and assume that $\Pi_{\tilde h/\eta}(f_1, \dots, f_p)$ is non-empty. Then, there exist functions $(u_1, \dots, u_p)$, where $u_i \in L^1(X_i)$ for all $i$, such that
\[ \mathrm{OT}^{*\prime}_{\mathrm{CCMM}} = J'(u_1, \dots, u_p). \]

Proof. Let $(u_1^n, \dots, u_p^n)_{n \in \mathbb{N}}$ be a maximizing sequence for $\mathrm{OT}^{*\prime}_{\mathrm{CCMM}}$, i.e. for each $i \in \{1, \dots, p\}$, $u_i^n \in L^1(X_i)$ and
\[ \mathrm{OT}^{*\prime}_{\mathrm{CCMM}} = \lim_{n \to \infty} J'(u_1^n, \dots, u_p^n). \]
Fix $\varepsilon \le \min_{(x_1, \dots, x_p) \in \tilde X} \{ f_1(x_1), \dots, f_p(x_p) \}$. Since $\tilde h$ is bounded away from zero, we may pick $\varepsilon$ sufficiently small so that $\varepsilon f_1(x_1) \cdots f_p(x_p) \le \tilde h$ for all $(x_1, \dots, x_p) \in X_1 \times \dots \times X_p$. We can find a $p$-tuple of constants $(k_1^n, \dots, k_p^n)$ with $\sum_{i=1}^p k_i^n = 0$ such that adding $k_i^n$ to each $u_i^n$ ensures that $\overline{u_i^n f_i} = \overline{u_j^n f_j}$ for each $i \ne j$. Now, by Proposition 4.2.19, for each $i$ we get a bound for $\|u_i^n\|_{L^1(X_i)}$ independent of $n$.

For each $i$, let $U_i^n \in \mathcal M(X_i)$ be such that $dU_i^n(x_i) = u_i^n(x_i)\, dx_i$. Then $\|U_i^n\|_{TV} = \|u_i^n\|_{L^1(X_i)}$ is bounded. Thus, by Alaoglu's theorem, we can extract a subsequence (without relabelling) $(U_1^n, \dots, U_p^n)$ that converges weakly-* to some $(U_1, \dots, U_p) \in \mathcal M(X_1) \times \dots \times \mathcal M(X_p)$.

So, we have that $\tilde J'(U_1^n, \dots, U_p^n) = J'(u_1^n, \dots, u_p^n) \xrightarrow{n \to \infty} \mathrm{OT}^{*\prime}_{\mathrm{CCMM}}$. By Lemma 4.2.21, we have that $\mathrm{OT}^{*\prime}_{\mathrm{CCMM}} \le \tilde J'(U_1, \dots, U_p)$.

Next, we will prove that $\mathrm{OT}^{*\prime}_{\mathrm{CCMM}} \ge \tilde J'(U_1, \dots, U_p)$. Let $C = c\, H_1^d \otimes \dots \otimes H_p^d$.
By Lebesgue's decomposition theorem, for each $i$ the measure $U_i$ can be written as $U_i = U_i^{ac} + U_i^s$, where $U_i^{ac} \ll H^d$ and $U_i^s \perp H^d$. On the other hand, by the Hahn–Jordan decomposition theorem, we can write $U_i = U_i^+ - U_i^-$. Since the Hahn–Jordan decomposition and the Lebesgue decomposition commute, we have
\[ U_i^+ = [U_i^{ac}]^+ + [U_i^s]^+, \qquad U_i^- = [U_i^{ac}]^- + [U_i^s]^-. \]
Now, observe that
\[ \Big[ -C + \sum_{i=1}^p H_1^d \otimes \dots \otimes U_i \otimes \dots \otimes H_p^d \Big]_+ = \Big[ -C + \sum_{i=1}^p H_1^d \otimes \dots \otimes U_i^{ac} \otimes \dots \otimes H_p^d \Big]_+ + \Big[ \sum_{i=1}^p H_1^d \otimes \dots \otimes U_i^s \otimes \dots \otimes H_p^d \Big]_+. \]
Thus,
\begin{align*}
\tilde J'(U_1, \dots, U_p) &= \sum_{i=1}^p \int_{X_i} f_i\, dU_i - \int_{X_1 \times \dots \times X_p} \tilde h\, d\Big[ -C + \sum_{i=1}^p H_1^d \otimes \dots \otimes U_i \otimes \dots \otimes H_p^d \Big]_+ \\
&= \sum_{i=1}^p \int_{X_i} f_i\, dU_i^{ac} + \sum_{i=1}^p \int_{X_i} f_i\, dU_i^s - \int \tilde h\, d\Big[ -C + \sum_{i=1}^p H_1^d \otimes \dots \otimes U_i^{ac} \otimes \dots \otimes H_p^d \Big]_+ - \int \tilde h\, d\Big[ \sum_{i=1}^p H_1^d \otimes \dots \otimes U_i^s \otimes \dots \otimes H_p^d \Big]_+ \\
&= \tilde J'(U_1^{ac}, \dots, U_p^{ac}) + \sum_{i=1}^p \int_{X_i} f_i\, dU_i^s - \int \tilde h\, d\Big[ \sum_{i=1}^p H_1^d \otimes \dots \otimes U_i^s \otimes \dots \otimes H_p^d \Big]_+ \\
&\le J'(u_1, \dots, u_p) + \sum_{i=1}^p \int_{X_i} f_i\, dU_i^s - \int h\, d\Big( \sum_{i=1}^p H_1^d \otimes \dots \otimes U_i^s \otimes \dots \otimes H_p^d \Big) \\
&= J'(u_1, \dots, u_p) \\
&\le \mathrm{OT}^{*\prime}_{\mathrm{CCMM}}.
\end{align*}
Here, for each $i$, $u_i$ denotes the Radon–Nikodym derivative of $U_i^{ac}$, so that $u_i\, dx_i = dU_i^{ac}$, and $h \in \Pi_{\tilde h/\eta}(f_1, \dots, f_p)$. (In the last equality we used that $h$ has marginals $(f_1, \dots, f_p)$, so $\int h\, d(H_1^d \otimes \dots \otimes U_i^s \otimes \dots \otimes H_p^d) = \int_{X_i} f_i\, dU_i^s$, and the last two terms cancel.) So, we have proven that
\[ \mathrm{OT}^{*\prime}_{\mathrm{CCMM}} \le \tilde J'(U_1, \dots, U_p) \le J'(u_1, \dots, u_p) \le \mathrm{OT}^{*\prime}_{\mathrm{CCMM}}. \]
Thus, $(u_1, \dots, u_p)$ is the desired maximizer. □

CHAPTER 5

BARYCENTERS

5.1 Barycenters in the Wasserstein Space

5.1.1 Introduction

The problem of finding a barycenter in the Wasserstein space is a nonlinear interpolation between several probability measures. As an example in the discrete case, consider several coal mines whose extracted coal has to be sent to a centrally located factory. The problem is to find the location at which to construct the factory so that the total transportation cost is minimized.
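The discrete facility-location picture can be checked numerically. The sketch below (plain NumPy; the mine coordinates and weights are made-up illustrative data, not taken from the text) confirms that, for quadratic transport cost, the optimal factory site is the weighted Euclidean average of the mine locations — the Euclidean barycenter characterization recalled in Remark 5.1.10.

```python
import numpy as np

# Hypothetical mine locations in R^2 and output weights lambda_i
# (illustrative data; the lambda_i are positive and sum to 1).
mines = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 8.0]])
lam = np.array([0.5, 0.25, 0.25])

def total_cost(y):
    """Total transport cost sum_i lambda_i |x_i - y|^2 for a factory at y."""
    return float(np.sum(lam * np.sum((mines - y) ** 2, axis=1)))

# The minimizer of a weighted sum of squared distances is the weighted
# Euclidean average of the points (the Euclidean barycenter).
bary = lam @ mines  # array([1., 2.])

# Sanity check: perturbing the site in any direction can only raise the cost.
for d in [np.array([0.1, 0.0]), np.array([0.0, -0.1]), np.array([-0.1, 0.1])]:
    assert total_cost(bary) <= total_cost(bary + d)
```

The Wasserstein barycenter studied below replaces the points $x_i$ by probability measures $\nu_i$ and the squared distance by $W_2^2$.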
This notion of barycenters in the Wasserstein space was first introduced by Agueh and Carlier in [1], who provided existence, uniqueness, characterizations of the minimizer and regularity of the barycenter, and related it to the multi-marginal OT problem considered by Gangbo and Święch in [24]. This problem has a wide range of applications, including economics [13] and data science [4, 8]. To elaborate on a few applications of Wasserstein barycenters: in image processing, Wasserstein barycenters have been used to generate "averaged" images from a set of input images, a technique particularly useful for denoising and image reconstruction; "image morphing" or "image interpolation" is such an application ([52, 56]). In machine learning, they can be used to generate representative data points from a set of input data, which can then be used to train machine learning models; this can be particularly useful when the input data is noisy or incomplete. Overall, Wasserstein barycenters are a powerful mathematical tool that has found many applications in different fields; their properties and computational tractability make them a popular choice for solving optimization problems involving probability measures.

To provide some background on Wasserstein barycenters, we will present some existing results from [1] in the next few sections.

5.1.2 The Primal Problem

Recall that we define the squared 2-Wasserstein distance between two probability measures $\mu, \nu \in \mathcal P_2(\mathbb{R}^d)$ by
\[ W_2^2(\mu, \nu) := \inf_{\gamma \in \Pi(\mu, \nu)} \int_{\mathbb{R}^d \times \mathbb{R}^d} \frac{1}{2}|x - y|^2\, d\gamma(x, y). \]
Let $p \ge 2$ be an integer. Given a $p$-tuple of probability measures $(\nu_1, \dots, \nu_p)$ with each $\nu_i \in \mathcal P_2(\mathbb{R}^d)$ and a $p$-tuple of positive real numbers $(\lambda_1, \dots, \lambda_p)$ with $\sum_{i=1}^p \lambda_i = 1$, we define the following minimization problem:
\[ \mathrm{OT}_{BC} := \inf_{\nu \in \mathcal P_2(\mathbb{R}^d)} \Big\{ \sum_{i=1}^p \lambda_i W_2^2(\nu_i, \nu) \Big\}. \tag{5.1.1} \]
A solution of (5.1.1) is called the barycenter of the probabilities $\nu_i$ with weights $\lambda_i$.

Remark 5.1.1. For $p = 2$ with $\lambda_1 = \lambda_2 = \frac{1}{2}$, this problem amounts to finding the midpoint between the two measures $\nu_1$ and $\nu_2$, and such an interpolation is already known as McCann's interpolation [40].

Theorem 5.1.2. ([1], Proposition 2.3) Given an integer $p \ge 2$, a $p$-tuple of probability measures $(\nu_1, \dots, \nu_p)$ with each $\nu_i \in \mathcal P_2(\mathbb{R}^d)$ and a $p$-tuple of positive real numbers $(\lambda_1, \dots, \lambda_p)$ with $\sum_{i=1}^p \lambda_i = 1$, the barycenter problem given by (5.1.1) has a solution.

The proof is given in [1] and uses the direct method of the Calculus of Variations. To study the uniqueness and other properties of the barycenters, we require a dual formulation.

5.1.3 Duality

Define the space of continuous functions with at most quadratic growth,
\[ Y := (1 + |\cdot|^2)\, C_b(\mathbb{R}^d) = \Big\{ f \in C(\mathbb{R}^d) : \frac{f}{1 + |\cdot|^2} \text{ is bounded} \Big\}, \]
equipped with the norm
\[ \|f\|_Y := \sup_{x \in \mathbb{R}^d} \frac{|f(x)|}{1 + |x|^2}. \tag{5.1.2} \]
We define the dual of (5.1.1) as
\[ \mathrm{OT}^*_{BC} := \sup_{f_i \in Y,\ \sum_{i=1}^p f_i = 0}\ \sum_{i=1}^p \int_{\mathbb{R}^d} \inf_{y \in \mathbb{R}^d} \Big\{ \frac{\lambda_i}{2}|x - y|^2 - f_i(y) \Big\}\, d\nu_i(x). \tag{5.1.3} \]
In [1], the authors have proven that strong duality holds for the barycenter problem and that dual maximizers for (5.1.3) exist.

Theorem 5.1.3. ([1], Proposition 2.2 & Proposition 2.3) $\mathrm{OT}_{BC} = \mathrm{OT}^*_{BC}$, and $\mathrm{OT}^*_{BC}$ given by (5.1.3) has a solution.

With the duality results, we can characterize the barycenters in several ways. Given below are a few results from [1].

Proposition 5.1.4. ([1], Proposition 3.5) Assume that there is an index $i \in \{1, \dots, p\}$ such that $\nu_i$ vanishes on small sets. Then $\mathrm{OT}_{BC}$ admits a unique solution $\nu$, which is given by $\nu = \nabla\phi_i{}_\#\nu_i$, where, given a solution $(f_1, \dots, f_p)$ of $\mathrm{OT}^*_{BC}$, $\phi_i$ is the convex potential defined by
\[ \phi_i(x) := \frac{1}{\lambda_i} \Big( \frac{\lambda_i}{2}|x|^2 - \inf_{y \in \mathbb{R}^d} \Big\{ \frac{\lambda_i}{2}|x - y|^2 - f_i(y) \Big\} \Big). \tag{5.1.4} \]

Proposition 5.1.5. ([1], Proposition 3.8) Assume that $\nu_i$ vanishes on small sets for every $i \in \{1, \dots, p\}$, and let $\nu \in \mathcal P_2(\mathbb{R}^d)$. Then the following conditions are equivalent:

1. $\nu$ solves $\mathrm{OT}_{BC}$.

2. $\nu = \nabla\phi_i{}_\#\nu_i$ for every $i$, where $\phi_i$ is defined by (5.1.4).

3. There exist convex potentials $\psi_i$ such that $\nabla\psi_i$ is the Brenier map transporting $\nu_i$ to $\nu$, and a constant $C$ such that
\[ \sum_{i=1}^p \lambda_i \psi_i^*(y) \le C + \frac{|y|^2}{2}, \quad \forall y \in \mathbb{R}^d, \text{ with equality } \nu\text{-a.e.} \tag{5.1.5} \]

Remark 5.1.6. If $\nu$ and the potentials $\psi_i$ satisfy the third statement of Proposition 5.1.5, then the support of $\nu$ is included in the contact set where the convex function $\phi := \sum_{i=1}^p \lambda_i \psi_i^*$ agrees with its quadratic majorant $C + \frac{|\cdot|^2}{2}$. Note that, at such a point $x$, we have
\[ \sum_{i=1}^p \lambda_i\, \partial\psi_i^*(x) \subset \partial\phi(x) \subset \{x\}, \]
so that each potential $\psi_i^*$ is differentiable at $x$. The potentials $\psi_i^*$ are therefore differentiable on the support of $\nu$ and satisfy the relation
\[ \sum_{i=1}^p \lambda_i \nabla\psi_i^* = \mathrm{id}. \tag{5.1.6} \]

Remark 5.1.7. If (5.1.6) holds everywhere for the Brenier maps $\nabla\psi_i$ transporting $\nu_i$ to $\nu$, then $\nu$ is optimal for $\mathrm{OT}_{BC}$.

In [1], the authors have also shown that $\mathrm{OT}_{BC}$ is equivalent to a linear programming problem of multi-marginal optimal transport type, similar to the problem solved by Gangbo and Święch in [24]. We now list two main theorems which describe this relation.

Theorem 5.1.8. ([1], Theorem 4.1) Assume that $\nu_i$ vanishes on small sets for $i = 1, \dots, p$. Then the multi-marginal problem given by
\[ \sup\Big\{ \int_{(\mathbb{R}^d)^p} \sum_{1 \le i < j \le p} \lambda_i \lambda_j\, x_i \cdot x_j\, d\gamma(x_1, \dots, x_p) : \gamma \in \Pi(\nu_1, \dots, \nu_p) \Big\} \tag{5.1.7} \]
admits a unique solution $\gamma \in \Pi(\nu_1, \dots, \nu_p)$. Moreover, $\gamma$ is of the form $\gamma = (T_1^1, \dots, T_p^1)_\#\nu_1$ with $T_i^1 = \nabla u_i^* \circ \nabla u_1$ for $i = 1, \dots, p$, where the $u_i$ are the strictly convex potentials defined by
\[ u_i(x) := \frac{\lambda_i}{2}|x|^2 + \frac{g_i(x)}{\lambda_i}, \quad \forall x \in \mathbb{R}^d, \tag{5.1.8} \]
and $(g_1, \dots, g_p)$ are the convex potentials that solve the dual of (5.1.7), given by
\[ \inf\Big\{ \sum_{i=1}^p \int_{\mathbb{R}^d} g_i\, d\nu_i : \sum_{i=1}^p g_i(x_i) \ge \sum_{1 \le i < j \le p} \lambda_i \lambda_j\, x_i \cdot x_j, \ \forall x \in (\mathbb{R}^d)^p \Big\}. \tag{5.1.9} \]

Proposition 5.1.9. ([1], Proposition 4.2) Assume that $\nu_i$ vanishes on small sets for $i = 1, \dots, p$. Then the solution of $\mathrm{OT}_{BC}$ given by (5.1.1) is $\nu = T_\#\gamma$, where $\gamma$ is the solution of (5.1.7) and $T$ is defined by
\[ T(x) := \sum_{i=1}^p \lambda_i x_i, \quad \forall x := (x_1, \dots, x_p) \in (\mathbb{R}^d)^p. \tag{5.1.10} \]

Remark 5.1.10. Note that the Euclidean barycenter $T(x)$ defined in (5.1.10) is characterized by the property
\[ \sum_{i=1}^p \lambda_i |x_i - T(x)|^2 = \inf_{y \in \mathbb{R}^d} \Big\{ \sum_{i=1}^p \lambda_i |x_i - y|^2 \Big\}. \tag{5.1.11} \]

5.2 Capacity Constrained Barycenter Problem

In this section, we introduce the notion of capacity constrained barycenters in the Wasserstein space, which generalizes the barycenter problem (5.1.1). As the name suggests, the capacity constrained barycenter problem introduces a capacity to each of the two-marginal problems involved. Under certain assumptions on the capacities, we prove that the problem attains a minimizer and obtain duality results.

5.2.1 The Primal Problem

Given an integer $p \ge 2$, a $p$-tuple of probability measures $(\nu_1, \dots, \nu_p)$ each in $\mathcal P_2(\mathbb{R}^d)$ and a $p$-tuple of positive real numbers $(\lambda_1, \dots, \lambda_p)$ with $\sum_{i=1}^p \lambda_i = 1$, we define the following minimization problem:
\[ \mathrm{OT}_{CCBC} := \inf_{\nu \in \mathcal P'} J(\nu) := \sum_{i=1}^p \lambda_i \widetilde W_2^2(\nu_i, \nu), \tag{5.2.1} \]
where
\[ \mathcal P' = \mathcal P_2(\mathbb{R}^d) \cap \big\{ \nu' : \Pi_{\tilde\gamma_i}(\nu_i, \nu') \ne \emptyset,\ \forall i \in \{1, 2, \dots, p\} \big\}, \tag{5.2.2} \]
and
\[ \widetilde W_2^2(\nu_i, \nu) = \inf_{\gamma_i \in \Pi_{\tilde\gamma_i}(\nu_i, \nu)} \Big\{ \int_{\mathbb{R}^d \times \mathbb{R}^d} \frac{1}{2}|x - y|^2\, d\gamma_i(x, y) \Big\}. \tag{5.2.3} \]
Here, $\{\tilde\gamma_i\}_{i=1}^p \subseteq \mathcal M_+(\mathbb{R}^d \times \mathbb{R}^d)$ is the set of capacities of the two-marginal problems, and $\Pi_{\tilde\gamma_i}(\nu_i, \nu)$ is the set of transport plans from $\nu_i$ to $\nu$ bounded by $\tilde\gamma_i$ (see Definition 3.4.2). We call the problem (5.2.1) the Capacity Constrained Barycenter (CCBC) problem between the measures $\nu_1, \dots, \nu_p$. We will consider this minimization problem under two assumptions.

• Assumption 1: We assume that $\{\tilde\gamma_i\}_{i=1}^p$ are compactly supported finite measures that are absolutely continuous w.r.t. Lebesgue measure with bounded densities, for all $i \in \{1, 2, \dots, p\}$.

• Assumption 2: We assume that the set $\mathcal P'$ is non-empty.

Recall that the non-emptiness condition for the set $\Pi_{\tilde\gamma_i}(\nu_i, \nu)$ is given by the following theorem.

Theorem 5.2.1.
([46], Corollary 4.6.15) Let $X, Y$ be compact sets. For given Borel probability measures $\mu \in \mathcal P(X)$, $\nu \in \mathcal P(Y)$ and a finite Borel measure $\tilde\gamma$ on $X \times Y$, we have $\Pi_{\tilde\gamma}(\mu, \nu) \ne \emptyset$ if and only if
\[ \mu(A) + \nu(B) \le \tilde\gamma(A \times B) + 1, \quad \forall A \in \mathcal B(X) \text{ and } \forall B \in \mathcal B(Y). \]

Remark 5.2.2. Note that Assumption 2 is not a very restrictive assumption. For instance, we could pick the capacities $\tilde\gamma_i$ such that $\tilde\gamma_i = \nu_i \otimes \xi$ for some probability measure $\xi \in \mathcal P_2(\mathbb{R}^d)$, so that the set $\mathcal P'$ becomes non-empty.

Lemma 5.2.3. $\widetilde W_2^2$ is weakly lower semi-continuous.

Proof. Let $\{\mu_n\}_{n \in \mathbb{N}}$ and $\{\tilde\nu_n\}_{n \in \mathbb{N}}$ be two sequences of probability measures in $\mathcal P'$ such that $\mu_n \rightharpoonup \mu^*$ and $\tilde\nu_n \rightharpoonup \nu^*$. Since $\{\mu_n\}$ and $\{\tilde\nu_n\}$ are tight, $\bigcup_n \Pi(\mu_n, \tilde\nu_n)$ is also tight (see [53], Lemma 4.4). Now, let $\gamma_n^* \in \Pi(\mu_n, \tilde\nu_n)$ be optimal such that $\gamma_n^* \le \tilde\gamma$. Since $\{\gamma_n^*\}$ is also tight, there exists a $\gamma^* \in \mathcal P(\mathbb{R}^d \times \mathbb{R}^d)$ such that $\gamma_n^* \rightharpoonup \gamma^*$ and $\gamma^* \in \Pi(\mu^*, \nu^*)$. We also get that $\gamma^* \le \tilde\gamma$ (see the proof of (5.2.8) in Theorem 5.2.4 below). Now,
\begin{align*}
\widetilde W_2^2(\mu^*, \nu^*) &= \inf_{\gamma \in \Pi_{\tilde\gamma}(\mu^*, \nu^*)} \Big\{ \int_{\mathbb{R}^d \times \mathbb{R}^d} \frac{1}{2}|x - y|^2\, d\gamma(x, y) \Big\} \le \int_{\mathbb{R}^d \times \mathbb{R}^d} \frac{1}{2}|x - y|^2\, d\gamma^*(x, y) \\
&\le \liminf_n \int_{\mathbb{R}^d \times \mathbb{R}^d} \frac{1}{2}|x - y|^2\, d\gamma_n^*(x, y) \quad \text{(see [53], Lemma 4.3)} \\
&= \liminf_n \inf_{\gamma_n \in \Pi_{\tilde\gamma}(\mu_n, \tilde\nu_n)} \Big\{ \int_{\mathbb{R}^d \times \mathbb{R}^d} \frac{1}{2}|x - y|^2\, d\gamma_n(x, y) \Big\} = \liminf_n \widetilde W_2^2(\mu_n, \tilde\nu_n).
\end{align*}
This proves that $\widetilde W_2^2$ is weakly lower semi-continuous. □

Now, we will show that the CCBC problem has a solution.

Theorem 5.2.4. Under Assumptions 1 and 2, the CCBC problem given by (5.2.1) has a solution.

Proof. Let $\{\tilde\nu_n\}_{n \in \mathbb{N}} \subseteq \mathcal P'$ be a minimizing sequence for $\mathrm{OT}_{CCBC}$, i.e. $\lim_{n \to \infty} J(\tilde\nu_n) = \inf_\nu J(\nu)$. Then, we can find an $M > 0$ such that $J(\tilde\nu_n) \le M$ for all $n$. Thus, for each $1 \le i \le p$ and for each $n \in \mathbb{N}$, $\lambda_i \widetilde W_2^2(\nu_i, \tilde\nu_n) \le M$. Using the duality and the assumption that the $\nu_i$'s have finite second moments, we can show that (see Appendix 3)
\[ \sup_n \int |x|^2\, d\tilde\nu_n \le C, \quad \text{for some constant } C. \]
Hence, $\{\tilde\nu_n\}$ is tight (see Appendix 4).
Then, by Prokhorov's theorem, there exists a subsequence $\{\tilde\nu_n\}$ (not relabeled) that converges weakly to some $\nu^* \in \mathcal P(\mathbb{R}^d)$. Since $\int |x|^2\, d\nu^* \le \liminf_n \int |x|^2\, d\tilde\nu_n \le C$ (see [53], Lemma 4.3), we have that $\nu^* \in \mathcal P_2(\mathbb{R}^d)$.

Now, we will prove that for all $1 \le i \le p$ there exists a $\gamma_i \in \Pi_{\tilde\gamma_i}(\nu_i, \nu^*)$, so that $\nu^* \in \mathcal P'$. Fix an $i \in \{1, \dots, p\}$ and $n \in \mathbb{N}$. By Assumption 2, there exists some $\gamma_{i,n} \in \Pi(\nu_i, \tilde\nu_n)$ such that $\gamma_{i,n} \le \tilde\gamma_i$. Since $\{\gamma_{i,n}\}$ is tight, there exists a subsequence $\{\gamma_{i,n}\}$ (not relabeled) that converges weakly to some $\gamma_i^* \in \mathcal P(\mathbb{R}^d \times \mathbb{R}^d)$.

Let $E \subset \mathbb{R}^d$ be an open set and $\{\phi_l\}_{l=1}^\infty \subset C_b(\mathbb{R}^d)$ a sequence of functions such that $0 \le \phi_l \le \mathbf 1_E$ and $\phi_l \nearrow \mathbf 1_E$ pointwise. Then,
\[ \int_{\mathbb{R}^d} \phi_l\, d\tilde\nu_n = \int_{\mathbb{R}^d} \phi_l\, d\big( \mathrm{Proj}_y{}_\# \gamma_{i,n} \big) = \int \phi_l \circ \mathrm{Proj}_y\, d\gamma_{i,n}. \]
Letting $n \to \infty$, we get
\[ \int_{\mathbb{R}^d} \phi_l\, d\nu^* = \int \phi_l \circ \mathrm{Proj}_y\, d\gamma_i^*. \tag{5.2.4} \]
On the left-hand side of (5.2.4) we used the fact that $\tilde\nu_n \rightharpoonup \nu^*$, and on the right-hand side we used the fact that $\gamma_{i,n} \rightharpoonup \gamma_i^*$. Now, letting $l \to \infty$ in (5.2.4), the monotone convergence theorem gives
\[ \int \mathbf 1_E\, d\nu^* = \int \mathbf 1_E \circ \mathrm{Proj}_y\, d\gamma_i^*, \]
i.e.
\[ \nu^*(E) = \mathrm{Proj}_y{}_\# \gamma_i^*(E). \tag{5.2.5} \]
Now, let $B \subset \mathbb{R}^d$ be any Borel set. Since Borel measures are outer regular, we have that
\[ \nu^*(B) = \mathrm{Proj}_y{}_\# \gamma_i^*(B). \tag{5.2.6} \]
Similarly, we can prove that, for any Borel set $A \subset \mathbb{R}^d$,
\[ \nu_i(A) = \mathrm{Proj}_x{}_\# \gamma_i^*(A). \tag{5.2.7} \]
By (5.2.6) and (5.2.7), we can conclude that $\gamma_i^* \in \Pi(\nu_i, \nu^*)$.

Now, let $E, F \subset \mathbb{R}^d$ be two open sets and $\{\phi_l\}_{l=1}^\infty \subset C_b(\mathbb{R}^d \times \mathbb{R}^d)$ a sequence of functions such that $0 \le \phi_l \le \mathbf 1_{E \times F}$ and $\phi_l \nearrow \mathbf 1_{E \times F}$ pointwise. Since $\gamma_{i,n} \le \tilde\gamma_i$, we have
\[ \int \phi_l\, d\gamma_{i,n} \le \int \phi_l\, d\tilde\gamma_i. \]
Letting $n \to \infty$ gives
\[ \int \phi_l\, d\gamma_i^* \le \int \phi_l\, d\tilde\gamma_i, \]
and letting $l \to \infty$, the monotone convergence theorem gives
\[ \int \mathbf 1_{E \times F}\, d\gamma_i^* \le \int \mathbf 1_{E \times F}\, d\tilde\gamma_i, \]
i.e. $\gamma_i^*(E \times F) \le \tilde\gamma_i(E \times F)$. Hence, for any two Borel sets $A, B \subset \mathbb{R}^d$, we have that
\[ \gamma_i^*(A \times B) \le \tilde\gamma_i(A \times B). \tag{5.2.8} \]
Thus, $\gamma_i^* \in \Pi_{\tilde\gamma_i}(\nu_i, \nu^*)$, i.e. $\nu^* \in \mathcal P'$.
Now, we claim that $\nu^*$ is the desired minimizer. Observe that
\begin{align*}
J(\nu^*) &= \sum_{i=1}^p \lambda_i \widetilde W_2^2(\nu_i, \nu^*) \\
&\le \sum_{i=1}^p \lambda_i \liminf_n \widetilde W_2^2(\nu_i, \tilde\nu_n) \quad \text{(by Lemma 5.2.3)} \\
&\le \liminf_n \sum_{i=1}^p \lambda_i \widetilde W_2^2(\nu_i, \tilde\nu_n) = \liminf_n J(\tilde\nu_n) = \mathrm{OT}_{CCBC}.
\end{align*}
This concludes that $\nu^*$ is a minimizer for the CCBC problem. □

5.2.2 Duality

For the dual formulation, we will adopt the duality results for the two-marginal capacity constrained OT problem by Rachev and Rüschendorf in [46]. Let $K \times K$ be a compact subset of $\mathbb{R}^d \times \mathbb{R}^d$ containing the supports of the capacities $\tilde\gamma_i$ for all $i$. For an integer $p \ge 2$, consider the dual problem
\[ \mathrm{OT}^*_{CCBC} = \sup_{\mathcal A} \Big\{ \sum_{i=1}^p \int_K \inf_{z \in K} \Big\{ \frac{\lambda_i}{2}|x - z|^2 - f_i(z) - w_i(x, z) \Big\}\, d\nu_i(x) + \sum_{i=1}^p \int_{K \times K} w_i(x, y)\, d\tilde\gamma_i(x, y) \Big\}. \tag{5.2.9} \]
Here, the supremum is taken over the set $\mathcal A$ of functions $f_i$ and $w_i$ satisfying
\[ f_i \in C_b(K), \quad w_i \in C_b(K \times K), \quad w_i \le 0, \ \forall i \in \{1, 2, \dots, p\}, \quad \text{and} \quad \sum_{i=1}^p f_i = 0. \]
In the theorem below, we will show that strong duality holds for the CCBC problem, using arguments similar to those of [1], Proposition 2.2.

Theorem 5.2.5. Under Assumptions 1 and 2, strong duality holds, i.e. $\mathrm{OT}_{CCBC} = \mathrm{OT}^*_{CCBC}$.

Proof. Let $\nu \in \mathcal P' \cap \mathcal P(K)$, $\gamma_i \in \Pi(\nu_i, \nu)$ such that $\gamma_i \le \tilde\gamma_i$, and $(f_1, \dots, f_p, w_1, \dots, w_p)$ such that $\sum_{i=1}^p f_i = 0$ and $w_i \le 0$. Then, for each $(x, y) \in K \times K$, we have
\[ \inf_{z \in K} \Big\{ \frac{\lambda_i}{2}|x - z|^2 - f_i(z) - w_i(x, z) \Big\} \le \frac{\lambda_i}{2}|x - y|^2 - f_i(y) - w_i(x, y). \]
Hence,
\[ \inf_{z \in K} \Big\{ \frac{\lambda_i}{2}|x - z|^2 - f_i(z) - w_i(x, z) \Big\} + f_i(y) + w_i(x, y) \le \frac{\lambda_i}{2}|x - y|^2. \]
Now, by integrating w.r.t. $\gamma_i$ over $K \times K$, we get
\[ \int_K \inf_{z \in K} \Big\{ \frac{\lambda_i}{2}|x - z|^2 - f_i(z) - w_i(x, z) \Big\}\, d\nu_i(x) + \int_K f_i(y)\, d\nu(y) + \int_{K \times K} w_i(x, y)\, d\gamma_i(x, y) \le \int_{K \times K} \frac{\lambda_i}{2}|x - y|^2\, d\gamma_i(x, y). \tag{5.2.10} \]
Since $\gamma_i \le \tilde\gamma_i$ and $w_i \le 0$, we have
\[ \int_{K \times K} w_i(x, y)\, d\tilde\gamma_i(x, y) \le \int_{K \times K} w_i(x, y)\, d\gamma_i(x, y). \tag{5.2.11} \]
By combining (5.2.10) and (5.2.11), we get
\[ \int_K \inf_{z \in K} \Big\{ \frac{\lambda_i}{2}|x - z|^2 - f_i(z) - w_i(x, z) \Big\}\, d\nu_i(x) + \int_K f_i\, d\nu + \int_{K \times K} w_i\, d\tilde\gamma_i \le \int_{K \times K} \frac{\lambda_i}{2}|x - y|^2\, d\gamma_i(x, y). \tag{5.2.12} \]
Now, we take the summation over $i$ to get
\[ \sum_{i=1}^p \int_K \inf_{z \in K} \Big\{ \frac{\lambda_i}{2}|x - z|^2 - f_i(z) - w_i(x, z) \Big\}\, d\nu_i(x) + \sum_{i=1}^p \int_{K \times K} w_i\, d\tilde\gamma_i \le \sum_{i=1}^p \int_{K \times K} \frac{\lambda_i}{2}|x - y|^2\, d\gamma_i(x, y). \tag{5.2.13} \]
Note that in (5.2.13) we used the fact that $\sum_{i=1}^p f_i = 0$. Now, by taking the infimum over $\gamma_i \in \Pi_{\tilde\gamma_i}(\nu_i, \nu)$, the infimum over $\nu \in \mathcal P'$ and the supremum over $(f_i, w_i) \in \mathcal A$ on both sides, we get
\[ \mathrm{OT}^*_{CCBC} \le \mathrm{OT}_{CCBC}. \tag{5.2.14} \]
Next, we will prove that $\mathrm{OT}^*_{CCBC} \ge \mathrm{OT}_{CCBC}$. For $i \in \{1, \dots, p\}$, define $H_i : C_b(K) \times C_b(K \times K) \to \mathbb{R} \cup \{\infty\}$ by
\[ H_i(f_i, w_i) = \begin{cases} -\displaystyle\int_K \inf_{z \in K} \Big\{ \frac{\lambda_i}{2}|x - z|^2 - f_i(z) - w_i(x, z) \Big\}\, d\nu_i(x) - \int_{K \times K} w_i(x, y)\, d\tilde\gamma_i(x, y) & \text{if } w_i \le 0, \\ \infty & \text{otherwise.} \end{cases} \tag{5.2.15} \]
Now, we take the Legendre transform of $H_i(f_i, w_i)$ in both variables to get
\[ H_i^*(\nu, \xi) = \sup_{f_i \in C_b(K),\, w_i \le 0} \Big\{ \int_K f_i\, d\nu + \int_{K \times K} w_i\, d\xi - H_i(f_i, w_i) \Big\}. \tag{5.2.16} \]
Here, $\nu \in \mathcal M(K)$ and $\xi \in \mathcal M(K \times K)$. Letting $\xi = 0$ in (5.2.16), we get
\[ H_i^*(\nu, 0) = \sup_{f_i \in C_b(K),\, w_i \le 0} \Big\{ \int_K \inf_{z \in K} \Big\{ \frac{\lambda_i}{2}|x - z|^2 - f_i(z) - w_i(x, z) \Big\}\, d\nu_i(x) + \int_K f_i(y)\, d\nu(y) + \int_{K \times K} w_i(x, y)\, d\tilde\gamma_i(x, y) \Big\}. \tag{5.2.17} \]
First, we will show that if $\nu \in \mathcal M(K) \setminus \mathcal P(K)$, then $H_i^*(\nu, 0) = \infty$.

Case 1: Suppose $\exists A \in \mathcal B(K)$ such that $\nu(A) < 0$. Then, there exists a function $f \in C_b(K)$ such that $f \le 0$ and $\int_K f\, d\nu > 0$. Let $t \ge 0$ be arbitrary and choose $f_i = tf$ and $w_i = 0$. Then,
\[ H_i^*(\nu, 0) \ge \int_K \inf_{y \in K} \Big\{ \frac{\lambda_i}{2}|x - y|^2 - tf(y) \Big\}\, d\nu_i + t \int_K f\, d\nu \ge t \int_K f\, d\nu, \quad \forall t \ge 0. \]
Hence, $H_i^*(\nu, 0) = \infty$.

Case 2: Suppose $\nu(K) < 1$. Let $t > 0$ be arbitrary and choose $f_i = -t$ and $w_i = 0$. Then,
\[ H_i^*(\nu, 0) \ge \int_K \inf_{y \in K} \Big\{ \frac{\lambda_i}{2}|x - y|^2 + t \Big\}\, d\nu_i - t \int_K d\nu \ge t \Big( \int_K d\nu_i - \int_K d\nu \Big) = (1 - \nu(K))\, t, \quad \forall t \ge 0. \]
Hence, $H_i^*(\nu, 0) = \infty$.

Case 3: Suppose $\nu(K) > 1$. Let $t > 0$ be arbitrary and choose $f_i = t$ and $w_i = 0$. Then,
\[ H_i^*(\nu, 0) \ge \int_K \inf_{y \in K} \Big\{ \frac{\lambda_i}{2}|x - y|^2 - t \Big\}\, d\nu_i + t \int_K d\nu \ge t \Big( -\int_K d\nu_i + \int_K d\nu \Big) = (\nu(K) - 1)\, t, \quad \forall t \ge 0. \]
Hence, $H_i^*(\nu, 0) = \infty$. Therefore, whenever $\nu \in \mathcal M(K) \setminus \mathcal P(K)$, we have $H_i^*(\nu, 0) = \infty$.
Now, let $\nu \in \mathcal P(K)$. By the duality of the two-marginal CCOT problem (Theorem 3.4.1), we get
\[ H_i^*(\nu, 0) = \lambda_i \widetilde W_2^2(\nu_i, \nu). \tag{5.2.18} \]
Hence, we have
\[ \mathrm{OT}_{CCBC} = \inf_\nu \sum_{i=1}^p \lambda_i \widetilde W_2^2(\nu_i, \nu) = \inf_\nu \sum_{i=1}^p H_i^*(\nu, 0). \tag{5.2.19} \]
Now, consider the Legendre transform of $\sum_{i=1}^p H_i^*(\cdot, 0)$, restricted to $C_b(K)$ viewed as a subspace of $C_b(K)^{**}$:
\[ \Big( \sum_{i=1}^p H_i^*(\cdot, 0) \Big)^*(f) = \sup_\nu \Big\{ \int_K f\, d\nu - \sum_{i=1}^p H_i^*(\nu, 0) \Big\}, \]
where $f \in C_b(K)$ and $\nu \in \mathcal M(K)$. Letting $f = 0$, we get
\[ \Big( \sum_{i=1}^p H_i^*(\cdot, 0) \Big)^*(0) = \sup_\nu \Big\{ -\sum_{i=1}^p H_i^*(\nu, 0) \Big\}. \]
Hence,
\[ -\Big( \sum_{i=1}^p H_i^*(\cdot, 0) \Big)^*(0) = \inf_\nu \Big\{ \sum_{i=1}^p H_i^*(\nu, 0) \Big\}. \tag{5.2.20} \]
By combining (5.2.19) and (5.2.20), we get
\[ \mathrm{OT}_{CCBC} = -\Big( \sum_{i=1}^p H_i^*(\cdot, 0) \Big)^*(0). \tag{5.2.21} \]
Now, we define the infimal convolution of the $H_i$'s as
\[ H(f) = \inf\Big\{ \sum_{i=1}^p H_i(f_i, w_i) : \sum_{i=1}^p f_i = f,\ w_i \le 0 \Big\}. \tag{5.2.22} \]
Then,
\begin{align*}
-H(0) &= \sup\Big\{ -\sum_{i=1}^p H_i(f_i, w_i) : \sum_{i=1}^p f_i = 0,\ w_i \le 0 \Big\} \\
&= \sup_{\sum_{i=1}^p f_i = 0,\ w_i \le 0} \Big\{ \sum_{i=1}^p \int_K \inf_{z \in K} \Big\{ \frac{\lambda_i}{2}|x - z|^2 - f_i(z) - w_i(x, z) \Big\}\, d\nu_i(x) + \sum_{i=1}^p \int_{K \times K} w_i(x, y)\, d\tilde\gamma_i(x, y) \Big\}.
\end{align*}
Thus, we have
\[ -H(0) = \mathrm{OT}^*_{CCBC}. \tag{5.2.23} \]
Also note that
\begin{align*}
H^*(\nu) &= \sup_f \Big\{ \int_K f\, d\nu - H(f) \Big\} \\
&= \sup_f \Big\{ \int_K f\, d\nu + \sup\Big\{ -\sum_{i=1}^p H_i(f_i, w_i) : \sum_{i=1}^p f_i = f,\ w_i \le 0 \Big\} \Big\} \\
&= \sup_{f_i \in C_b(K),\, w_i \le 0} \Big\{ \sum_{i=1}^p \int_K f_i\, d\nu - \sum_{i=1}^p H_i(f_i, w_i) \Big\} \\
&= \sum_{i=1}^p \sup_{f_i \in C_b(K),\, w_i \le 0} \Big\{ \int_K f_i\, d\nu - H_i(f_i, w_i) \Big\} = \sum_{i=1}^p H_i^*(\nu, 0).
\end{align*}
Hence, we have
\[ H^*(\nu) = \sum_{i=1}^p H_i^*(\nu, 0). \tag{5.2.24} \]
Now, by combining (5.2.21) and (5.2.24), we get
\[ \mathrm{OT}_{CCBC} = -H^{**}(0). \tag{5.2.25} \]
Since we have (5.2.23), it only remains to show that
\[ H^{**}(0) = H(0). \tag{5.2.26} \]
Since $H$ is convex, it is sufficient (see [20]) to show that $H$ is continuous at $0$ for the norm topology given by (5.1.2).
To prove this, we rewrite $H_i$ defined in (5.2.15) as
\[ H_i(f_i, w_i) = \int_K \sup_{z \in K} \Big\{ f_i(z) + w_i(x, z) - \frac{\lambda_i}{2}|x - z|^2 \Big\}\, d\nu_i(x) - \int_{K \times K} w_i(x, y)\, d\tilde\gamma_i(x, y). \]
Plugging in $z = 0$, we get
\[ H_i(f_i, w_i) \ge f_i(0) + \int_K w_i(x, 0)\, d\nu_i(x) - \frac{\lambda_i}{2} \int_K |x|^2\, d\nu_i(x) - \int_{K \times K} w_i(x, y)\, d\tilde\gamma_i(x, y). \tag{5.2.27} \]
Since $w_i \in C_b(K \times K)$ for all $1 \le i \le p$, there exist some negative real numbers $m_i$ such that $m_i \le w_i(x, 0) \le 0$ for all $x \in K$. Note also that in (5.2.27) the last term $-\int w_i\, d\tilde\gamma_i$ is non-negative, since $w_i \le 0$. Thus, (5.2.27) becomes
\[ H_i(f_i, w_i) \ge f_i(0) + m_i - \frac{\lambda_i}{2} \int_K |x|^2\, d\nu_i(x). \tag{5.2.28} \]
Now, taking the summation over $i$, for all $f \in C_b(K)$ such that $\sum_{i=1}^p f_i = f$ we have
\[ H(f) \ge f(0) + \sum_{i=1}^p m_i - \sum_{i=1}^p \frac{\lambda_i}{2} \int_K |x|^2\, d\nu_i(x). \tag{5.2.29} \]
By the finite second moment condition on each of the $\nu_i$'s, we get that
\[ H(f) > -\infty. \tag{5.2.30} \]
Now, let $f$ be such that $\|f\|_Y \le \frac{p}{4}\min\{\lambda_1, \dots, \lambda_p\}$. Choosing $f_i = \frac{f}{p}$ and $w_i = 0$ in $H_i(f_i, w_i)$, we have
\begin{align*}
H(f) &\le \sum_{i=1}^p H_i\Big( \frac{f}{p}, 0 \Big) \le \sum_{i=1}^p \int_K \sup_{y \in K} \Big\{ \frac{f(y)}{p} - \frac{\lambda_i}{2}|x - y|^2 \Big\}\, d\nu_i \\
&\le \sum_{i=1}^p \int_K \sup_{y \in K} \Big\{ \frac{\lambda_i}{4}\big( 1 + |y|^2 \big) - \frac{\lambda_i}{2}|x - y|^2 \Big\}\, d\nu_i \\
&\le \sum_{i=1}^p \int_K \Big( \frac{\lambda_i}{4} + \frac{\lambda_i}{2}|x|^2 \Big)\, d\nu_i = \frac{1}{4} + \sum_{i=1}^p \frac{\lambda_i}{2} \int_K |x|^2\, d\nu_i.
\end{align*}
Again, by the finite second moment condition on each of the $\nu_i$'s, we get that
\[ H(f) < \infty. \tag{5.2.31} \]
Thus, by (5.2.30) and (5.2.31), we have shown that the convex function $H$ never takes the value $-\infty$ and is bounded from above in a neighborhood of $0$. Thus, by a standard convex analysis result (see [20], Proposition 2.5), $H$ is continuous at $0$. Hence, $H^{**}(0) = H(0)$, and by combining (5.2.23) and (5.2.25) we get
\[ \mathrm{OT}_{CCBC} = \mathrm{OT}^*_{CCBC}. \tag{5.2.32} \] □

CHAPTER 6

ENTROPY REGULARIZATION

6.1 Entropy Regularized Optimal Transport (EROT) Problem

6.1.1 Introduction

Optimal transport offers an elegant solution to many theoretical and practical problems at the interface between probability, partial differential equations, and optimization.
This success, however, comes at a high computational price, and the computation becomes prohibitive whenever the dimension exceeds a few hundred, since it requires the solution of a linear program over distributions on a product space. Entropic regularization provides an approximation of optimal transport with lower computational complexity and easy implementation. It operates by adding an entropic regularization penalty to the original problem, making it strictly convex and hence guaranteeing a unique minimizer. In this section, we explore the EROT problem and its duality.

The idea of smoothing the classical OT problem with an entropic regularization term was first introduced by Cuturi in [16], where it was shown that the resulting optimum can be computed through Sinkhorn's matrix scaling algorithm at a speed several orders of magnitude faster than that of transport solvers. Duality results and a characterization of the dual maximizers have been presented by Marino and Gerolin in [39, 17]. Some of the main results, definitions and remarks mentioned in this section are taken from [39, 45].

6.1.2 The Primal Problem

Let $(X, d_X)$ and $(Y, d_Y)$ be Polish spaces, $c : X \times Y \to \mathbb{R}$ a cost function, $\mu \in \mathcal P(X)$ and $\nu \in \mathcal P(Y)$ probability measures, and $\varepsilon > 0$. We consider the following minimization problem:
\[ \mathrm{OT}_\varepsilon = \inf_{\gamma \in \Pi(\mu, \nu)} \int_{X \times Y} c(x, y)\, d\gamma(x, y) + \varepsilon\, \mathrm{KL}(\gamma|\mu \otimes \nu). \tag{6.1.1} \]
The function $\mathrm{KL}(\gamma|\xi)$ in (6.1.1) is the Kullback–Leibler divergence between the probability measures $\gamma$ and $\xi$ in $\mathcal P(X \times Y)$, defined as
\[ \mathrm{KL}(\gamma|\xi) = \begin{cases} \displaystyle\int_{X \times Y} \frac{d\gamma}{d\xi} \log\Big( \frac{d\gamma}{d\xi} \Big)\, d\xi & \text{if } \gamma \ll \xi, \\ +\infty & \text{otherwise,} \end{cases} \]
where $\frac{d\gamma}{d\xi}$ denotes the Radon–Nikodym derivative of $\gamma$ with respect to the measure $\xi$. The problem (6.1.1) can also be represented as
\[ \mathrm{OT}_\varepsilon = \inf_{\gamma \in \Pi(\mu, \nu)} \varepsilon\, \mathrm{KL}(\gamma|K), \tag{6.1.2} \]
where $K$ represents the Gibbs kernel for the cost $c$, given by
\[ K = k(x, y)\, \mu \otimes \nu = e^{-\frac{c(x, y)}{\varepsilon}}\, \mu \otimes \nu. \tag{6.1.3} \]
The existence of a minimizer in (6.1.1) and its characterizations have been discussed by several authors under different settings ([9], [15], [29], [49]). Under the definition (6.1.3) of $K$ ([37], Proposition 2.3), a necessary and sufficient condition for a minimizer $\gamma_\varepsilon$ of (6.1.1) to be unique is given by:
\[ \gamma_\varepsilon = f_\varepsilon(x)\, g_\varepsilon(y)\, K, \quad \text{where } f_\varepsilon, g_\varepsilon \text{ solve} \quad \begin{cases} f_\varepsilon(x) \displaystyle\int_Y g_\varepsilon(y)\, k(x, y)\, d\nu(y) = 1, \\ g_\varepsilon(y) \displaystyle\int_X f_\varepsilon(x)\, k(x, y)\, d\mu(x) = 1. \end{cases} \tag{6.1.4} \]
Here, the functions $f_\varepsilon(x)$ and $g_\varepsilon(y)$ are known as the entropic potentials, and they are unique up to the trivial transformation $f \mapsto f/\lambda$, $g \mapsto \lambda g$ for some $\lambda > 0$. The system solved by the entropic potentials is called the Schrödinger system.

The following theorem states that if we assume that $\mu$ and $\nu$ are positive everywhere and their entropy w.r.t. $K$ is finite, then the minimizer of (6.1.1) takes a special form.

Theorem 6.1.1. ([9], Corollary 3.9) Let $(X, d_X)$ and $(Y, d_Y)$ be two Polish spaces, and $c : X \times Y \to [0, \infty)$ a bounded cost function. Suppose that $\mu \in \mathcal P(X)$, $\nu \in \mathcal P(Y)$ are two probability measures that are absolutely continuous w.r.t. Lebesgue measure, such that $\mu(x), \nu(y) > 0$ for all $x \in X$, $y \in Y$, and $\mathrm{KL}(\mu \otimes \nu|K) < +\infty$ for $K = e^{-c(x, y)/\varepsilon}\, \mu \otimes \nu$. Then $\mathrm{OT}_\varepsilon$ has a unique minimizer of the form
\[ \gamma_\varepsilon(x, y) = f_\varepsilon(x)\, g_\varepsilon(y)\, K(x, y). \]

6.1.3 Duality

We define a dual functional for (6.1.1) as follows:
\[ D_\varepsilon(u, v) = \int_X u(x)\, d\mu(x) + \int_Y v(y)\, d\nu(y) - \varepsilon \int_{X \times Y} e^{\frac{u(x) + v(y) - c(x, y)}{\varepsilon}}\, d(\mu \otimes \nu). \]
Then, the dual formulation of (6.1.1) is the following maximization problem:
\[ \mathrm{OT}^*_\varepsilon = \sup_{u \in C_b(X),\, v \in C_b(Y)} D_\varepsilon(u, v). \tag{6.1.5} \]
Now, we will present the duality result for the EROT problem. This has been discussed in several articles, such as [14, 21, 28, 27, 39].

Theorem 6.1.2. ([39], Proposition 2.11) Let $(X, d_X)$, $(Y, d_Y)$ be two Polish metric spaces, $c : X \times Y \to \mathbb{R}$ a bounded function, and $\varepsilon > 0$. Suppose that $\mu \in \mathcal P(X)$ and $\nu \in \mathcal P(Y)$ are two probability measures that are absolutely continuous w.r.t. Lebesgue measure. Then
\[ \mathrm{OT}_\varepsilon = \mathrm{OT}^*_\varepsilon + \varepsilon. \]
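On finite state spaces, the Schrödinger system (6.1.4) can be solved by the alternating scaling iteration of Sinkhorn's algorithm mentioned above. The following minimal NumPy sketch (the marginals and cost matrix are made-up illustrative data, not code from the thesis) iterates the two equations of (6.1.4) and checks that the resulting plan $\gamma_\varepsilon = f_\varepsilon g_\varepsilon K$ has the prescribed marginals.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, eps = 6, 7, 0.1

# Illustrative discrete marginals mu, nu and a bounded cost c (assumed data).
mu = rng.random(n); mu /= mu.sum()
nu = rng.random(m); nu /= nu.sum()
c = rng.random((n, m))

# Gibbs kernel k(x, y) = exp(-c/eps); the reference measure is mu ⊗ nu.
k = np.exp(-c / eps)

# Sinkhorn iteration: alternately enforce the two Schrödinger equations
#   f(x) * ∫ g(y) k(x, y) dnu(y) = 1   and   g(y) * ∫ f(x) k(x, y) dmu(x) = 1.
f, g = np.ones(n), np.ones(m)
for _ in range(2000):
    f = 1.0 / (k @ (g * nu))
    g = 1.0 / (k.T @ (f * mu))

# Entropic plan gamma_eps(x, y) = f(x) g(y) k(x, y) mu(x) nu(y).
gamma = f[:, None] * g[None, :] * k * mu[:, None] * nu[None, :]

# At convergence its marginals recover mu and nu.
assert np.allclose(gamma.sum(axis=1), mu, atol=1e-8)
assert np.allclose(gamma.sum(axis=0), nu, atol=1e-8)
```

Dual potentials for (6.1.5) can then be read off as $u = \varepsilon \log f_\varepsilon$, $v = \varepsilon \log g_\varepsilon$, since $\gamma_\varepsilon = e^{(u + v - c)/\varepsilon}\, \mu \otimes \nu$.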
In [39], the authors also provide a direct proof of existence of maximizers in (6.1.5), following the direct method of the Calculus of Variations. The main idea in the proof is to define a generalized version of the c-transform (see Definition 2.2.1), called the (c, ε)-transform.

6.1.3.1 Entropy-Transform and its properties

In this section, we will consider functions c : X × Y → R which are bounded. First, we will define the space L^exp_ε, in which the functions admissible for the dual problem will lie.

Definition 6.1.3. Given an ε > 0, a Polish space (X, d_X), and a probability measure µ ∈ P(X), we define the set L^exp_ε(X, dµ) by

    L^exp_ε(X, dµ) := { u : X → [−∞, ∞) : u is a measurable function on (X, dµ) and 0 < ∫_X e^{u/ε} dµ < ∞ }.

For u ∈ L^exp_ε(X, dµ) we also define σ_u := ε log( ∫_X e^{u/ε} dµ ).

Observe that u ∈ L^exp_ε(X, dµ) may take the value −∞ on a set of positive measure, but not µ-a.e., since we have the condition ∫_X e^{u/ε} dµ > 0.

Definition 6.1.4. Given an ε > 0, two Polish spaces (X, d_X), (Y, d_Y), two probability measures µ ∈ P(X), ν ∈ P(Y), and a bounded function c : X × Y → R, we define the (c, ε)-transform F^(c,ε) : L^exp_ε(X, dµ) → L^0(Y, dν) by

    F^(c,ε)(u)(y) := −ε log( ∫_X e^{(u(x)−c(x,y))/ε} dµ(x) ).    (6.1.6)

Similarly, we define the (c*, ε)-transform F^(c*,ε) : L^exp_ε(Y, dν) → L^0(X, dµ) by

    F^(c*,ε)(v)(x) := −ε log( ∫_Y e^{(v(y)−c(x,y))/ε} dν(y) ).    (6.1.7)

For simplicity, we will use the notation u^(c,ε) = F^(c,ε)(u).

Remark 6.1.5. Note that the definition of the (c, ε)-transform is consistent with the definition of the c-transform in the sense that as ε → 0, u^(c,ε) → u^c ([26], Lemma 4.2).

Now we will list, without proof, some results about the (c, ε)-transform mentioned in [39].

Lemma 6.1.6. ([39], Lemma 2.3) Let (X, d_X), (Y, d_Y) be Polish spaces, u ∈ L^exp_ε(X, dµ), v ∈ L^exp_ε(Y, dν) and ε > 0. Then,

(i) u^(c,ε) ∈ L^∞(Y, dν) and v^(c*,ε) ∈ L^∞(X, dµ), satisfying the inequality

    −∥c∥_∞ − ε log( ∫_X e^{u(x)/ε} dµ ) ≤ u^(c,ε)(y) ≤ ∥c∥_∞ − ε log( ∫_X e^{u(x)/ε} dµ ).
(ii) u^(c,ε) ∈ L^exp_ε(Y, dν) and v^(c*,ε) ∈ L^exp_ε(X, dµ). Furthermore, |σ_{u^(c,ε)} + σ_u| ≤ ∥c∥_∞.

Given below are some more properties of the (c, ε)-transform.

Proposition 6.1.7. ([39], Prop 2.4) Let (X, d_X) and (Y, d_Y) be Polish metric spaces, c : X × Y → [0, ∞] be a bounded function, µ ∈ P(X), ν ∈ P(Y) be probability measures, u ∈ L^exp_ε(X, dµ) and ε > 0. Then,

(i) if c is L-Lipschitz, then u^(c,ε) is L-Lipschitz;
(ii) if c is ω-continuous, then u^(c,ε) is ω-continuous;
(iii) if |c| ≤ M, then |u^(c,ε) + σ_u| ≤ M;
(iv) if |c| ≤ M, then F^(c,ε) : L^∞(X, dµ) → L^p(Y, dν) is a 1-Lipschitz compact operator;
(v) if c is K-concave with respect to y, then u^(c,ε) is K-concave.

6.1.3.2 Dual Problem

Recall that the dual functional of the EROT problem is given by

    Dε(u, v) = ∫_X u(x) dµ(x) + ∫_Y v(y) dν(y) − ε ∫_{X×Y} e^{(u(x)+v(y)−c(x,y))/ε} d(µ ⊗ ν).    (6.1.8)

Now, we will state some results used to get the existence of dual maximizers.

Lemma 6.1.8. ([39], Lemma 2.6) Consider the functional Dε : L^exp_ε(X, dµ) × L^exp_ε(Y, dν) → R defined by (6.1.8). Then,

    Dε(u, u^(c,ε)) ≥ Dε(u, v)  for all v ∈ L^exp_ε(Y, dν),    (6.1.9)
    Dε(u, u^(c,ε)) = Dε(u, v)  if and only if v = u^(c,ε).    (6.1.10)

In particular, u^(c,ε) ∈ argmax{Dε(u, v) : v ∈ L^exp_ε(Y, dν)}.

Lemma 6.1.9. ([39], Lemma 2.7) Given u ∈ L^exp_ε(X, dµ) and v ∈ L^exp_ε(Y, dν), there exist u* ∈ L^exp_ε(X, dµ) and v* ∈ L^exp_ε(Y, dν) such that

• Dε(u, v) ≤ Dε(u*, v*),
• ∥v*∥_∞ ≤ 3∥c∥_∞/2,
• ∥u*∥_∞ ≤ 3∥c∥_∞/2.

Moreover, we can choose a ∈ R such that u* = (v + a)^(c,ε) and v* = (u*)^(c,ε).

The theorem given below states that the dual functional (6.1.8) attains a maximizer.

Theorem 6.1.10. ([39], Theorem 2.8) Let (X, d_X), (Y, d_Y) be Polish spaces, c : X × Y → R be a bounded function, µ ∈ P(X), ν ∈ P(Y) be probability measures and ε > 0.
Then, the dual problem

    sup {Dε(u, v) : u ∈ L^exp_ε(X, dµ), v ∈ L^exp_ε(Y, dν)}    (6.1.11)

attains its supremum at a unique couple (u0, v0) (up to the trivial transformation (u, v) ↦ (u + a, v − a)). In particular, u0 ∈ L^∞(X, dµ) and v0 ∈ L^∞(Y, dν); moreover, we can choose the maximizers such that ∥u0∥_∞, ∥v0∥_∞ ≤ (3/2)∥c∥_∞.

The proof mainly uses Lemma 6.1.8 and Lemma 6.1.9 along with the Banach-Alaoglu theorem.

6.2 Entropy Regularized Barycenter Problem

6.2.1 Introduction

As with finding optimizers of the classical OT problem, finding the Wasserstein barycenter is not an easy task. Introducing an entropic regularization into the classical Wasserstein barycenter problem makes it more tractable and flexible.

Two different notions of regularization exist in the literature. In [7, 12], the authors introduce a penalization term which is a function of the barycenter. On the other hand, in [38], the authors add a penalty term which is a function of the transport plans between the barycenter and the target measures. The main difference between these two approaches is the reference point that is used to regularize the resulting measure. The choice of regularization approach depends on the specific application and the desired properties of the resulting measure; however, the similarities and differences of these approaches have not been widely discussed in the literature. In this section, as in [38], we will consider the latter approach.

In [38], the authors introduce the regularized Wasserstein barycenter problem and its dual formulation. They prove that strong duality holds and obtain the existence of a minimizer for the primal problem via the duality result. However, they do not discuss the existence of maximizers of the dual functional. In this section, we will provide a direct proof of the existence of a minimizer for the primal problem and of the existence of dual maximizers.
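Before turning to the analysis, here is a brute-force numerical sketch of the regularized barycenter objective studied in this section (assumptions of mine, not the thesis's method: discrete measures on a shared one-dimensional support, quadratic cost, made-up data). It evaluates the objective I(ν) = Σ_i λ_i OTε(ν_i, ν) over a one-parameter family of candidate measures ν, computing each OTε(ν_i, ν) with a few Sinkhorn iterations.

```python
import numpy as np

eps = 0.1
x = np.linspace(0.0, 1.0, 6)                 # common support in R
C = 0.5 * (x[:, None] - x[None, :]) ** 2     # c(x, y) = |x - y|^2 / 2
nus = [np.exp(-10 * (x - 0.2) ** 2), np.exp(-10 * (x - 0.8) ** 2)]
nus = [m / m.sum() for m in nus]             # the target measures nu_1, nu_2
lam = [0.5, 0.5]

def ot_eps(a, b):
    # entropic OT cost between discrete measures a, b via Sinkhorn scaling
    k = np.exp(-C / eps)
    f = np.ones_like(a)
    for _ in range(500):
        g = 1.0 / (k.T @ (f * a))
        f = 1.0 / (k @ (g * b))
    gam = f[:, None] * g[None, :] * k * np.outer(a, b)
    kl = np.sum(gam * np.log(gam / np.outer(a, b)))
    return np.sum(C * gam) + eps * kl

def I(nu):
    # regularized barycenter objective: sum_i lambda_i OT_eps(nu_i, nu)
    return sum(l * ot_eps(a, nu) for l, a in zip(lam, nus))

# candidates: linear interpolations of the two inputs (a crude search, for
# illustration only; the true minimizer need not lie in this family)
ts = np.linspace(0.0, 1.0, 11)
vals = [I((1 - t) * nus[0] + t * nus[1]) for t in ts]
assert all(np.isfinite(vals))
assert min(vals) <= min(vals[0], vals[-1])   # some candidate does no worse than the endpoints
```

This is only a sanity check on the objective: it illustrates that I(ν) is finite and can be compared across candidates, while the existence of an actual minimizer over all of P(Y) is what Theorem 6.2.2 below establishes.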
6.2.2 The Primal Problem

Throughout this section, we will work with compact subsets of R^d and symmetric, bounded cost functions.

Recall that the entropy regularized OT problem is defined as

    OTε(µ, ν) = inf_{γ∈Π(µ,ν)} ∫_{X×Y} c(x, y) dγ(x, y) + ε KL(γ|µ ⊗ ν).    (6.2.1)

By the duality result, we have

    OTε(µ, ν) = sup_{u∈C_b(X), v∈C_b(Y)} ∫_X u(x) dµ(x) + ∫_Y v(y) dν(y) − ε ∫_{X×Y} e^{(u(x)+v(y)−c(x,y))/ε} d(µ ⊗ ν).

Also recall that given an integer p ≥ 2, compact subsets X_1, ..., X_p, Y of R^d, a p-tuple of probability measures (ν_1, ..., ν_p) with ν_i ∈ P(X_i), a p-tuple of positive real numbers (λ_1, ..., λ_p) with Σ_{i=1}^p λ_i = 1, and cost functions c_i : X_i × Y → R, we define the Wasserstein barycenter problem as

    OT_BC(ν_1, ..., ν_p) = inf_{ν∈P(Y)} Σ_{i=1}^p λ_i OT_{c_i}(ν_i, ν)    (6.2.2)

where OT_{c_i}(ν_i, ν) is the optimal transport cost between ν_i and ν with cost c_i given in (2.1.6). Proposition 2.2 in [1] gives us the following duality result for the case c_i(x, y) = (1/2)|x − y|²:

    OT_BC(ν_1, ..., ν_p) = sup_{f_i∈C_b(Y), Σ_{i=1}^p f_i=0} Σ_{i=1}^p ∫_{X_i} inf_{y∈R^d} { λ_i c_i(x_i, y) − f_i(y) } dν_i(x_i).    (6.2.3)

Now we will introduce the Entropy Regularized Barycenter (ERBC) problem. Given an integer p ≥ 2, compact subsets X_1, ..., X_p, Y of R^d, a p-tuple of probability measures (ν_1, ..., ν_p) with ν_i ∈ P(X_i), a p-tuple of positive real numbers (λ_1, ..., λ_p) with Σ_{i=1}^p λ_i = 1, and ε > 0, we define the ERBC problem as

    OTε_BC(ν_1, ..., ν_p) = inf_{ν∈P(Y)} Σ_{i=1}^p λ_i OTε(ν_i, ν).    (6.2.4)

Now we will prove that the minimization problem (6.2.4) has a minimizer. For simplicity, we will assume that c_i(x, y) = (1/2)|x − y|² for each 1 ≤ i ≤ p in this proof, but the result holds for more general costs. Also note that in [38], the authors provide a proof of the existence of a minimizer via duality; here we give a direct proof without assuming the duality result.

Lemma 6.2.1. OTε is weakly lower semi-continuous.

Proof.
Let {µn}_{n∈N} and {νn}_{n∈N} be two sequences such that µn ⇀ µ* and νn ⇀ ν*. We can pick subsequences (not relabeled) {µn}_{n∈N} and {νn}_{n∈N} such that

    lim_{n→∞} OTε(µn, νn) = lim inf_{n→∞} OTε(µn, νn).

Since {µn}_{n∈N} and {νn}_{n∈N} are tight, ∪_{n∈N} Π(µn, νn) is also tight (see [53], Lemma 4.4). Now let {γn}_{n∈N} be a sequence with marginals µn and νn which is "close" to the optimal value of OTε(µn, νn), i.e. given δ > 0,

    ∫ |x − y|² dγn + ε KL(γn|µn ⊗ νn) ≤ OTε(µn, νn) + δ.

Since {γn}_{n∈N} is also tight, we can pick a subsequence (not relabeled) {γn}_{n∈N} such that γn ⇀ γ* and γ* ∈ Π(µ*, ν*). Now, observe that

    OTε(µ*, ν*) ≤ ∫ |x − y|² dγ* + ε KL(γ*|µ* ⊗ ν*)
               ≤ lim inf_n ∫ |x − y|² dγn + ε lim inf_n KL(γn|µn ⊗ νn)
               ≤ lim inf_n { ∫ |x − y|² dγn + ε KL(γn|µn ⊗ νn) }
               ≤ lim inf_n OTε(µn, νn) + δ
               = lim_n OTε(µn, νn) + δ.

In the second inequality, the first term is controlled by the weak lower semi-continuity of the total cost (see [53], Lemma 4.3), and the second term by the lower semi-continuity of the relative entropy, which is a well-known result (see [19], Lemma 1.4.3). Finally, letting δ → 0, we get the lower semi-continuity result.

Theorem 6.2.2. Let X_1, ..., X_p, Y be compact subsets of R^d. Given an integer p > 0, a p-tuple of probability measures (ν_1, ..., ν_p) with ν_i ∈ P_2(X_i), a p-tuple of positive real numbers (λ_1, ..., λ_p) with Σ_{i=1}^p λ_i = 1, and ε > 0, there exists a minimizer for the ERBC problem given by (6.2.4).

Proof. Let

    I(ν) := Σ_{i=1}^p λ_i OTε(ν_i, ν) = Σ_{i=1}^p λ_i inf_{γ_i∈Π(ν_i,ν)} ∫_{X_i×Y} (1/2)|x − y|² dγ_i(x, y) + ε KL(γ_i|ν_i ⊗ ν).

Also, we will denote the squared 2-Wasserstein distance by

    W_2²(ν_i, ν) := inf_{γ_i∈Π(ν_i,ν)} ∫_{X_i×Y} (1/2)|x − y|² dγ_i(x, y).

Then, (6.2.4) becomes

    OTε_BC(ν_1, ..., ν_p) = inf_{ν∈P(Y)} I(ν).

Now, let {νn}_{n∈N} be a minimizing sequence of OTε_BC, i.e. lim_{n→∞} I(νn) = inf_ν I(ν). Then, we can find an M > 0 such that I(νn) ≤ M for all n.
Since ε KL(γ_i|ν_i ⊗ ν) ≥ 0, we have for each n,

    Σ_{i=1}^p λ_i W_2²(ν_i, νn) ≤ I(νn) ≤ M.

Thus, for each 1 ≤ i ≤ p and each n ∈ N, λ_i W_2²(ν_i, νn) ≤ M. Using the duality of the Kantorovich problem and the assumption that the ν_i have finite second moments, we can show that (see Appendix 3)

    sup_n ∫ |x|² dνn ≤ C, for some constant C.

Hence, {νn} is tight (see Appendix 4). Then, by Prokhorov's theorem, there exists a subsequence {νn} (not relabeled) that converges weakly to some ν* ∈ P(Y). Since ∫ |x|² dν* ≤ lim inf_n ∫ |x|² dνn ≤ C, we have that ν* ∈ P_2(Y). Here, the first inequality is again due to the weak convergence of measures against lower semi-continuous costs bounded from below ([19], Theorem A.3.12). Note that ∫_X |x|² dν = 2 W_2²(ν, δ_0) for any ν ∈ P_2(Y).

Now, we will prove that ν* is the desired minimizer. Observe that

    inf_ν I(ν) = lim_n I(νn)
              = lim inf_n Σ_{i=1}^p λ_i OTε(ν_i, νn)
              ≥ Σ_{i=1}^p λ_i OTε(ν_i, ν*)   (by Lemma 6.2.1)
              = I(ν*).

This proves that (6.2.4) has a minimizer.

Remark 6.2.3. Note that we assumed that X_1, ..., X_p, Y are compact only to be consistent with the assumptions on the ERBC problem in the original paper [38]. However, we do not require compactness of the spaces in the proof; hence Theorem 6.2.2 also holds for X_1 = ... = X_p = Y = R^d.

6.2.3 Duality

We define the dual functional of the ERBC problem as

    Jε_BC(ϕ_1, ..., ϕ_p, ψ_1, ..., ψ_p) = Σ_{i=1}^p λ_i ( ∫_{X_i} ϕ_i dν_i − ε ∫_{X_i×Y} e^{(ϕ_i(x_i)+ψ_i(y)−c_i(x_i,y))/ε} d(ν_i ⊗ ν*)(x_i, y) ).    (6.2.5)

Since we already know the existence of a barycenter for the minimization problem (6.2.4), say ν*, we will use ν* in the following duality results. Now we will present the duality result for the ERBC problem discussed in [38].

Theorem 6.2.4. ([38], Theorem 3.1) The dual formulation of (6.2.4) is
    sup_{{(ϕ_i∈C_b(X_i), ψ_i∈C_b(Y))}_{i=1}^p, Σ_{i=1}^p λ_iψ_i=0} Σ_{i=1}^p λ_i ( ∫_{X_i} ϕ_i(x_i) dν_i(x_i) − ε ∫_{X_i×Y} e^{(ϕ_i(x_i)+ψ_i(y)−c_i(x_i,y))/ε} d(ν_i ⊗ ν*)(x_i, y) ).    (6.2.6)

Moreover, strong duality holds in the sense that the infimum of (6.2.4) equals the supremum of (6.2.6), and a solution to (6.2.4) exists. If {(ϕ_i, ψ_i)}_{i=1}^p solves (6.2.6), then each (ϕ_i, ψ_i) is a solution to the dual formulation (6.1.5) of OTε(ν_i, ν*).

The proof relies on the convex duality theory of locally convex topological spaces. Now, we will prove that dual maximizers for (6.2.6) exist.

Theorem 6.2.5. Given an integer p ≥ 2, let X_1, ..., X_p, Y be compact subsets of R^d, c_i : X_i × Y → R_+ be symmetric, bounded cost functions such that for each 1 ≤ i ≤ p, c_i is L_i-Lipschitz, (ν_1, ..., ν_p) be a p-tuple of probability measures with ν_i ∈ P(X_i), (λ_1, ..., λ_p) be a p-tuple of positive real numbers with Σ_{i=1}^p λ_i = 1, and ε > 0. Then, there exist functions {(ϕ_i ∈ L¹(X_i, ν_i), ψ_i ∈ L¹(Y, ν*))}_{i=1}^p that solve (6.2.6).

Proof. We will rewrite the dual formulation as

    J*_BC := sup_{{(ϕ_i∈C_b(X_i), ψ_i∈C_b(Y))}_{i=1}^p, Σ_{i=1}^p λ_iψ_i=0} Σ_{i=1}^p λ_i D(ϕ_i, ψ_i),    (6.2.7)

where

    D(ϕ_i, ψ_i) := ∫_{X_i} ϕ_i dν_i + ∫_Y ψ_i dν* − ε ∫_{X_i×Y} e^{(ϕ_i+ψ_i−c_i)/ε} d(ν_i ⊗ ν*).

Let (ϕ_{1,n}, ..., ϕ_{p,n}, ψ_{1,n}, ..., ψ_{p,n})_{n∈N} be a maximizing sequence, i.e. for each i = 1, ..., p, ϕ_{i,n} ∈ C_b(X_i), ψ_{i,n} ∈ C_b(Y) with Σ_{i=1}^p λ_i ψ_{i,n} = 0 and lim_{n→∞} Σ_{i=1}^p λ_i D(ϕ_{i,n}, ψ_{i,n}) = J*_BC.

For each 1 ≤ i ≤ p define ϕ̃_{i,n} = ψ_{i,n}^{(c_i,ε)}. By Lemma 6.1.8, we have that

    D(ϕ_{i,n}, ψ_{i,n}) ≤ D(ϕ̃_{i,n}, ψ_{i,n}).    (6.2.8)

Now for each 1 ≤ i ≤ p − 1 define k_{i,n} = −σ_{ψ_{i,n}}, and set k_{p,n} = −Σ_{i=1}^{p−1} λ_i k_{i,n} / λ_p (the definition of the soft-max operator σ_{ψ_{i,n}} is given in Definition 6.1.3). Note that Σ_{i=1}^p λ_i k_{i,n} = 0.

By Proposition 6.1.7 part (iii), we have that |ψ_{i,n}^{(c_i,ε)} + σ_{ψ_{i,n}}| ≤ ∥c_i∥_∞. A simple calculation then gives |ψ_{i,n}^{(c_i,ε)} − k_{i,n}| ≤ ∥c_i∥_∞ for each 1 ≤ i ≤ p − 1. Now, for each 1 ≤ i ≤ p, we define ϕ*_{i,n} = ϕ̃_{i,n} − k_{i,n} and ψ*_{i,n} = ψ_{i,n} + k_{i,n}.
Then for each 1 ≤ i ≤ p − 1, we have ∥ϕ*_{i,n}∥_∞ ≤ ∥c_i∥_∞, i.e.

    sup_n ∥ϕ*_{i,n}∥_∞ < ∞.    (6.2.9)

Observe that

    Σ_{i=1}^p λ_i ψ*_{i,n} = Σ_{i=1}^p λ_i (ψ_{i,n} + k_{i,n}) = 0.

Since for each 1 ≤ i ≤ p, c_i is L_i-Lipschitz, by Proposition 6.1.7 part (i), for each 1 ≤ i ≤ p and n ∈ N, ψ_{i,n}^{(c_i,ε)} is L_i-Lipschitz. Hence for each 1 ≤ i ≤ p − 1, ϕ*_{i,n} = ψ_{i,n}^{(c_i,ε)} − k_{i,n} is Lipschitz continuous with Lipschitz constant at most max_i L_i.

Now, we will prove that for each 1 ≤ i ≤ p, sup_n ∥ψ*_{i,n}∥_{L¹(Y,dν*)} < ∞. Let 1 ≤ i ≤ p − 1 be arbitrary. Then,

    ∫_Y e^{ψ*_{i,n}/ε} dν* = ∫_Y e^{(ψ_{i,n}+k_{i,n})/ε} dν*
                           = ∫_Y e^{(ψ_{i,n}−σ_{ψ_{i,n}})/ε} dν*
                           = ∫_Y e^{ψ_{i,n}/ε} e^{−log(∫_Y e^{ψ_{i,n}/ε} dν*)} dν*
                           = ( ∫_Y e^{ψ_{i,n}/ε} dν* )^{−1} ∫_Y e^{ψ_{i,n}/ε} dν*
                           = 1.

Since for a given 1 ≤ q < ∞ there is a constant c such that c t^q ≤ e^t for each t > 0, we have for each 1 ≤ q < ∞,

    ∫_Y ( (ψ*_{i,n})_+ / ε )^q dν* ≤ (1/c) ∫_Y e^{ψ*_{i,n}/ε} dν*.

Thus,

    ∫_Y ( (ψ*_{i,n})_+ )^q dν* ≤ ε^q / c, for some constant c.

So sup_n ∥(ψ*_{i,n})_+∥_{L^q(Y,dν*)} < ∞ for each 1 ≤ q < ∞ and each 1 ≤ i ≤ p − 1. In particular,

    sup_n ∥(ψ*_{i,n})_+∥_{L¹(Y,dν*)} < ∞, for all 1 ≤ i ≤ p − 1.    (6.2.10)

Now, since (ϕ*_{i,n}, ψ*_{i,n})_{1≤i≤p, n∈N} is a maximizing sequence, there is some constant c₁ such that

    −c₁ ≤ Σ_{i=1}^p λ_i D(ϕ*_{i,n}, ψ*_{i,n})
        = Σ_{i=1}^p λ_i ∫_{X_i} ϕ*_{i,n} dν_i − ε Σ_{i=1}^p λ_i ∫_{X_i×Y} e^{(ϕ*_{i,n}+ψ*_{i,n}−c_i)/ε} d(ν_i ⊗ ν*)
        = Σ_{i=1}^{p−1} λ_i ∫_{X_i} ϕ*_{i,n} dν_i + λ_p ∫_{X_p} ϕ*_{p,n} dν_p − ε Σ_{i=1}^{p−1} λ_i ∫_{X_i×Y} e^{(ϕ*_{i,n}+ψ*_{i,n}−c_i)/ε} d(ν_i ⊗ ν*) − ελ_p ∫_{X_p×Y} e^{(ϕ*_{p,n}+ψ*_{p,n}−c_p)/ε} d(ν_p ⊗ ν*).    (6.2.11)

(In the first equality the terms Σ_i λ_i ∫_Y ψ*_{i,n} dν* vanish, since Σ_i λ_i ψ*_{i,n} = 0.) Recall that, for each 1 ≤ i ≤ p, ϕ*_{i,n} = ψ_{i,n}^{(c_i,ε)} − k_{i,n} and ψ*_{i,n} = ψ_{i,n} + k_{i,n}. Hence,

    ∫_{X_i×Y} e^{(ϕ*_{i,n}+ψ*_{i,n}−c_i)/ε} d(ν_i ⊗ ν*)
        = ∫_{X_i×Y} e^{(ψ_{i,n}^{(c_i,ε)}−k_{i,n}+ψ_{i,n}+k_{i,n}−c_i)/ε} d(ν_i ⊗ ν*)
        = ∫_{X_i×Y} e^{(ψ_{i,n}^{(c_i,ε)}+ψ_{i,n}−c_i)/ε} d(ν_i ⊗ ν*)
        = ∫_{X_i} e^{ψ_{i,n}^{(c_i,ε)}/ε} ( ∫_Y e^{(ψ_{i,n}−c_i)/ε} dν* ) dν_i
        = ∫_{X_i} e^{ψ_{i,n}^{(c_i,ε)}/ε} · e^{−ψ_{i,n}^{(c_i,ε)}/ε} dν_i
        = 1.    (6.2.12)
The penultimate equality holds since

    ψ_{i,n}^{(c_i,ε)}(x) = −ε log( ∫_Y e^{(ψ_{i,n}−c_i)/ε} dν* )  ⟹  e^{−ψ_{i,n}^{(c_i,ε)}(x)/ε} = ∫_Y e^{(ψ_{i,n}−c_i)/ε} dν*.

Also, since ∥ϕ*_{i,n}∥_∞ ≤ ∥c_i∥_∞ for each 1 ≤ i ≤ p − 1, we have Σ_{i=1}^{p−1} λ_i ∫_{X_i} ϕ*_{i,n} dν_i ≤ max_i ∥c_i∥_∞ (1 − λ_p). Now, the inequality (6.2.11) becomes

    −c₁ ≤ max_i ∥c_i∥_∞ (1 − λ_p) + λ_p ∫_{X_p} ϕ*_{p,n} dν_p − ε(1 − λ_p) − ελ_p ∫_{X_p×Y} e^{(ϕ*_{p,n}+ψ*_{p,n}−c_p)/ε} d(ν_p ⊗ ν*).

Hence, there exists a constant c₂ such that

    c₂/λ_p ≤ ∫_{X_p} ϕ*_{p,n} dν_p − ε ∫_{X_p×Y} e^{(ϕ*_{p,n}+ψ*_{p,n}−c_p)/ε} d(ν_p ⊗ ν*).    (6.2.13)

Now, consider the right-hand side of (6.2.13). Since f(t) = t − εe^{(t−a)/ε} is concave and attains its maximum at t = a,

    ∫_{X_p} ϕ*_{p,n} dν_p − ε ∫_{X_p×Y} e^{(ϕ*_{p,n}+ψ*_{p,n}−c_p)/ε} d(ν_p ⊗ ν*)
        = ∫_{X_p×Y} ( ϕ*_{p,n} − ε e^{(ϕ*_{p,n}−(c_p−ψ*_{p,n}))/ε} ) d(ν_p ⊗ ν*)
        ≤ ∫_{X_p×Y} ( c_p − ψ*_{p,n} − ε ) d(ν_p ⊗ ν*)
        ≤ −∫_Y ψ*_{p,n} dν* + ∥c_p∥_∞ − ε.

Thus, by (6.2.13), there exists a constant c₃ such that

    c₃ ≤ −∫_Y ψ*_{p,n} dν*.    (6.2.14)

Since Σ_{i=1}^p λ_i ψ*_{i,n} = 0, we get that

    c₃ ≤ (1/λ_p) Σ_{i=1}^{p−1} λ_i ∫_Y ψ*_{i,n} dν*.

Hence,

    λ_p c₃ ≤ Σ_{i=1}^{p−1} λ_i ∫_Y (ψ*_{i,n})_+ dν* − Σ_{i=1}^{p−1} λ_i ∫_Y (ψ*_{i,n})_− dν*.

Thus, by (6.2.10), there is some constant c₄ such that

    Σ_{i=1}^{p−1} λ_i ∫_Y (ψ*_{i,n})_− dν* ≤ c₄.

Since λ_i ∫_Y (ψ*_{i,n})_− dν* ≥ 0, we have ∫_Y (ψ*_{i,n})_− dν* ≤ c₄/λ_i for each 1 ≤ i ≤ p − 1. This gives us that sup_n ∥(ψ*_{i,n})_−∥_{L¹(Y,dν*)} < ∞ for each 1 ≤ i ≤ p − 1. Hence, for each 1 ≤ i ≤ p − 1,

    ∫_Y |ψ*_{i,n}| dν* = ∫_Y (ψ*_{i,n})_+ dν* + ∫_Y (ψ*_{i,n})_− dν* < ∞,

uniformly in n. Thus we have, for each 1 ≤ i ≤ p − 1, sup_n ∥ψ*_{i,n}∥_{L¹(Y,dν*)} < ∞. Now observe that

    ∥ψ*_{p,n}∥_{L¹(Y,dν*)} = ∥ (1/λ_p) Σ_{i=1}^{p−1} λ_i ψ*_{i,n} ∥_{L¹(Y,dν*)}
        ≤ (1/λ_p) Σ_{i=1}^{p−1} λ_i ∥ψ*_{i,n}∥_{L¹(Y,dν*)}
        ≤ ((1 − λ_p)/λ_p) max_i ∥ψ*_{i,n}∥_{L¹(Y,dν*)}
        < ∞.

Hence,

    sup_n ∥ψ*_{i,n}∥_{L¹(Y,dν*)} < ∞, for all 1 ≤ i ≤ p.    (6.2.15)

Now we will prove that sup_n ∥ϕ*_{p,n}∥_{L¹(X_p,dν_p)} < ∞.
Observe that, for a given 1 ≤ i ≤ p, ϕ*_{i,n} = (ψ*_{i,n})^{(c_i,ε)}, so

    ϕ*_{i,n}(x) = −ε log( ∫_Y e^{(ψ*_{i,n}−c_i)/ε} dν* )
        ≤ −ε ∫_Y (ψ*_{i,n} − c_i)/ε dν*   (by Jensen's inequality)
        = ∫_Y (c_i − ψ*_{i,n}) dν*
        ≤ max_{1≤i≤p} ∥c_i∥_∞ − ∫_Y ψ*_{i,n} dν*
        ≤ c₅, for some constant c₅.

The last inequality above holds by (6.2.15). Hence, for all 1 ≤ i ≤ p,

    ϕ*_{i,n}(x) ≤ c₅, for all x ∈ X_i.    (6.2.16)

In particular,

    ∫_{X_p} ϕ*_{p,n} dν_p ≤ c₅,    (6.2.17)

and

    ∫_{X_p} (ϕ*_{p,n})_+ dν_p ≤ max{0, c₅}.    (6.2.18)

In (6.2.13), since −ε ∫_{X_p×Y} e^{(ϕ*_{p,n}+ψ*_{p,n}−c_p)/ε} d(ν_p ⊗ ν*) ≤ 0, we have that

    c₂/λ_p ≤ ∫_{X_p} ϕ*_{p,n} dν_p.    (6.2.19)

Combining (6.2.17) and (6.2.19), we get

    c₂/λ_p ≤ ∫_{X_p} ϕ*_{p,n} dν_p ≤ c₅,

i.e.

    c₂/λ_p ≤ ∫_{X_p} (ϕ*_{p,n})_+ dν_p − ∫_{X_p} (ϕ*_{p,n})_− dν_p ≤ c₅.

Thus, by (6.2.18), there is some constant c₆ such that

    ∫_{X_p} (ϕ*_{p,n})_− dν_p ≤ c₆.    (6.2.20)

Thus, we have that

    ∫_{X_p} |ϕ*_{p,n}| dν_p = ∫_{X_p} (ϕ*_{p,n})_+ dν_p + ∫_{X_p} (ϕ*_{p,n})_− dν_p ≤ max{0, c₅} + c₆.

Hence, sup_n ∥ϕ*_{p,n}∥_{L¹(X_p,dν_p)} < ∞. For simplicity, we will only use the uniform L¹-boundedness of the ϕ*_{i,n} for 1 ≤ i ≤ p − 1; the argument could later be sharpened using their uniform boundedness (6.2.9).

Now, since for each 1 ≤ i ≤ p, (ϕ*_{i,n})_{n∈N} is a sequence with sup_n ∥ϕ*_{i,n}∥_{L¹(X_i,dν_i)} < ∞, by Komlós' theorem ([32]), there is a subsequence (ϕ*_{i,n_m})_{m∈N} such that the averages (1/N) Σ_{m=1}^N ϕ*_{i,n_m} converge pointwise ν_i-a.e. to some ϕ*_i as N → ∞, and the same is true for any further subsequence of (ϕ*_{i,n_m}). Similarly, since for each 1 ≤ i ≤ p, sup_n ∥ψ*_{i,n}∥_{L¹(Y,dν*)} < ∞, for each 1 ≤ i ≤ p we can find a subsequence (ψ*_{i,n_m})_{m∈N} such that (1/N) Σ_{m=1}^N ψ*_{i,n_m} converges pointwise ν*-a.e. to some ψ*_i as N → ∞, and the same is true for any further subsequence of (ψ*_{i,n_m}).

Recall that we have

    sup_{ϕ_i,ψ_i} Σ_{i=1}^p λ_i D(ϕ_i, ψ_i) = lim_{n→∞} Σ_{i=1}^p λ_i D(ϕ_{i,n}, ψ_{i,n}) = lim_{m→∞} Σ_{i=1}^p λ_i D(ϕ_{i,n_m}, ψ_{i,n_m}).

Now, fix δ > 0.
Then, there exists an integer N₀ such that for each m > N₀,

    sup_{ϕ_i,ψ_i} Σ_{i=1}^p λ_i D(ϕ_i, ψ_i) − δ ≤ Σ_{i=1}^p λ_i D(ϕ_{i,n_m}, ψ_{i,n_m})
        ≤ Σ_{i=1}^p λ_i D(ϕ̃_{i,n_m}, ψ_{i,n_m})   (by (6.2.8))
        = Σ_{i=1}^p λ_i D(ϕ*_{i,n_m}, ψ*_{i,n_m}).

Now, consider the subsequences {ϕ*_{i,n_{N₀+m}}}_{m∈N} and {ψ*_{i,n_{N₀+m}}}_{m∈N}. Note that the averages {(1/N) Σ_{m=1}^N ϕ*_{i,n_{N₀+m}}}_{N∈N} and {(1/N) Σ_{m=1}^N ψ*_{i,n_{N₀+m}}}_{N∈N} also converge to ϕ*_i, ν_i-a.e. and to ψ*_i, ν*-a.e., respectively. Then, for any N ≥ 1,

    sup_{ϕ_i,ψ_i} Σ_{i=1}^p λ_i D(ϕ_i, ψ_i) − δ ≤ Σ_{i=1}^p λ_i (1/N) Σ_{m=1}^N D(ϕ*_{i,n_{N₀+m}}, ψ*_{i,n_{N₀+m}})
        ≤ Σ_{i=1}^p λ_i D( (1/N) Σ_{m=1}^N ϕ*_{i,n_{N₀+m}}, (1/N) Σ_{m=1}^N ψ*_{i,n_{N₀+m}} )
        = Σ_{i=1}^p λ_i ∫_{X_i} (1/N) Σ_{m=1}^N ϕ*_{i,n_{N₀+m}} dν_i
          − ε Σ_{i=1}^p λ_i ∫_{X_i×Y} e^{((1/N)Σ_{m=1}^N ϕ*_{i,n_{N₀+m}} + (1/N)Σ_{m=1}^N ψ*_{i,n_{N₀+m}} − c_i)/ε} d(ν_i ⊗ ν*).    (6.2.21)

Note that on the first line above we take the average over m from 1 to N on both sides, and on the second line we use the concavity of the functional D (see Appendix 5). Then,

    sup_{ϕ_i,ψ_i} Σ_{i=1}^p λ_i D(ϕ_i, ψ_i) − δ
        ≤ lim sup_{N→∞} { Σ_{i=1}^p λ_i ∫_{X_i} (1/N) Σ_{m=1}^N ϕ*_{i,n_{N₀+m}} dν_i − ε Σ_{i=1}^p λ_i ∫_{X_i×Y} e^{((1/N)Σ_{m=1}^N ϕ*_{i,n_{N₀+m}} + (1/N)Σ_{m=1}^N ψ*_{i,n_{N₀+m}} − c_i)/ε} d(ν_i ⊗ ν*) }
        ≤ Σ_{i=1}^p λ_i lim sup_{N→∞} ∫_{X_i} (1/N) Σ_{m=1}^N ϕ*_{i,n_{N₀+m}} dν_i − ε Σ_{i=1}^p λ_i lim inf_{N→∞} ∫_{X_i×Y} e^{((1/N)Σ_{m=1}^N ϕ*_{i,n_{N₀+m}} + (1/N)Σ_{m=1}^N ψ*_{i,n_{N₀+m}} − c_i)/ε} d(ν_i ⊗ ν*).    (6.2.22)

Now, we will consider each of the limits in (6.2.22). Note that by (6.2.16), we have sup_m sup_{x∈X_i} ϕ*_{i,n_{N₀+m}}(x) < ∞. Hence, by Fatou's Lemma (lim sup version),

    lim sup_{N→∞} ∫_{X_i} (1/N) Σ_{m=1}^N ϕ*_{i,n_{N₀+m}} dν_i ≤ ∫_{X_i} lim sup_{N→∞} (1/N) Σ_{m=1}^N ϕ*_{i,n_{N₀+m}} dν_i = ∫_{X_i} ϕ*_i dν_i.    (6.2.23)

Since for all 1 ≤ i ≤ p, ( (1/N) Σ_{m=1}^N ϕ*_{i,n_{N₀+m}} + (1/N) Σ_{m=1}^N ψ*_{i,n_{N₀+m}} − c_i ) converges pointwise ν_i ⊗ ν*-a.e. to (ϕ*_i + ψ*_i − c_i), by Fatou's Lemma,

    ∫_{X_i×Y} e^{(ϕ*_i+ψ*_i−c_i)/ε} d(ν_i ⊗ ν*) ≤ lim inf_{N→∞} ∫_{X_i×Y} e^{((1/N)Σ_{m=1}^N ϕ*_{i,n_{N₀+m}} + (1/N)Σ_{m=1}^N ψ*_{i,n_{N₀+m}} − c_i)/ε} d(ν_i ⊗ ν*).
(6.2.24)

Thus, by combining (6.2.22), (6.2.23) and (6.2.24), we get

    sup_{ϕ_i,ψ_i} Σ_{i=1}^p λ_i D(ϕ_i, ψ_i) − δ ≤ Σ_{i=1}^p λ_i ∫_{X_i} ϕ*_i dν_i − ε Σ_{i=1}^p λ_i ∫_{X_i×Y} e^{(ϕ*_i+ψ*_i−c_i)/ε} d(ν_i ⊗ ν*) = Σ_{i=1}^p λ_i D(ϕ*_i, ψ*_i).    (6.2.25)

Since δ > 0 is arbitrary, letting δ → 0, we get that

    sup_{ϕ_i,ψ_i} Σ_{i=1}^p λ_i D(ϕ_i, ψ_i) ≤ Σ_{i=1}^p λ_i D(ϕ*_i, ψ*_i).

Also, since Σ_{i=1}^p λ_i ψ*_{i,n_m} = 0, we have that

    lim_{N→∞} Σ_{i=1}^p λ_i (1/N) Σ_{m=1}^N ψ*_{i,n_m} = Σ_{i=1}^p λ_i ψ*_i = 0, ν*-a.e.

Thus we can conclude that {ϕ*_i, ψ*_i}_{i=1}^p is a maximizer for (6.2.6).

Remark 6.2.6. Note that, since N₀ depends on δ, for all 1 ≤ i ≤ p, the sets

    A_{ϕ_i} = { x_i ∈ X_i : {ϕ*_{i,n_{N₀+m}}} does not converge to ϕ*_i }

and

    A_{ψ_i} = { y ∈ Y : {ψ*_{i,n_{N₀+m}}} does not converge to ψ*_i }

depend on the choice of δ. However, since we only consider integrals against the ν_i and ν*, these null sets do not affect our calculations.

Remark 6.2.7. Also observe that, even though we have the strong duality result for {(ϕ_i ∈ C_b(X_i), ψ_i ∈ C_b(Y))}_{i=1}^p, we obtain existence for {(ϕ_i ∈ L¹(X_i, ν_i), ψ_i ∈ L¹(Y, ν*))}_{i=1}^p. We may get better regularity for ϕ*_i, 1 ≤ i ≤ p − 1, due to the uniform boundedness of the ϕ*_{i,n}, 1 ≤ i ≤ p − 1 (see (6.2.9)).

BIBLIOGRAPHY

[1] M. Agueh and G. Carlier. “Barycenters in the Wasserstein space”. In: SIAM Journal on Mathematical Analysis 43.2 (2011), pp. 904–924.

[2] L. Ambrosio, A. Bressan, D. Helbing, A. Klar, E. Zuazua, L. Ambrosio, and N. Gigli. “A user’s guide to optimal transport”. In: Modelling and Optimisation of Flows on Networks: Cetraro, Italy 2009, Editors: Benedetto Piccoli, Michel Rascle (2013), pp. 1–155.

[3] L. Ambrosio, N. Gigli, and G. Savaré. Gradient flows: in metric spaces and in the space of probability measures. Springer Science & Business Media, 2005.

[4] E. Anderes, S. Borgwardt, and J. Miller. “Discrete Wasserstein barycenters: Optimal transport for discrete data”. In: Mathematical Methods of Operations Research 84.2 (2016), pp. 389–409.

[5] M. Beiglböck and W.
Schachermayer. “Duality for Borel measurable cost functions”. In: Transactions of the American Mathematical Society 363.8 (2011), pp. 4203–4224.

[6] J.-D. Benamou and Y. Brenier. “A computational fluid mechanics solution to the Monge-Kantorovich mass transfer problem”. In: Numerische Mathematik 84.3 (2000), pp. 375–393.

[7] J. Bigot, E. Cazelles, and N. Papadakis. “Penalization of barycenters in the Wasserstein space”. In: SIAM Journal on Mathematical Analysis 51.3 (2019), pp. 2261–2285.

[8] S. Borgwardt and S. Patterson. “Improved linear programs for discrete barycenters”. In: Informs Journal on Optimization 2.1 (2020), pp. 14–33.

[9] J. M. Borwein, A. S. Lewis, and R. D. Nussbaum. “Entropy minimization, DAD problems, and doubly stochastic kernels”. In: Journal of Functional Analysis 123.2 (1994), pp. 264–307.

[10] Y. Brenier. “Décomposition polaire et réarrangement monotone des champs de vecteurs”. In: CR Acad. Sci. Paris Sér. I Math. 305 (1987), pp. 805–808.

[11] Y. Brenier. “Optimal transportation of particles, fluids and currents”. In: Variational methods for evolving objects. Vol. 67. Mathematical Society of Japan, 2015, pp. 59–86.

[12] G. Carlier, K. Eichinger, and A. Kroshnin. “Entropic-Wasserstein barycenters: PDE characterization, regularity, and CLT”. In: SIAM Journal on Mathematical Analysis 53.5 (2021), pp. 5880–5914.

[13] G. Carlier and I. Ekeland. “Matching for teams”. In: Economic Theory 42.2 (2010), pp. 397–418.

[14] G. Carlier and M. Laborde. “A differential approach to the multi-marginal Schrödinger system”. In: SIAM Journal on Mathematical Analysis 52.1 (2020), pp. 709–717.

[15] I. Csiszár. “I-divergence geometry of probability distributions and minimization problems”. In: The Annals of Probability (1975), pp. 146–158.

[16] M. Cuturi. “Sinkhorn distances: Lightspeed computation of optimal transport”. In: Advances in Neural Information Processing Systems 26 (2013).

[17] S. Di Marino and A. Gerolin.
“Optimal transport losses and Sinkhorn algorithm with general convex regularization”. In: arXiv preprint arXiv:2007.00976 (2020).

[18] A. N. Doledenok. “On Kantorovich multimarginal optimal transportation problems with density constraints”. 2018.

[19] P. Dupuis and R. S. Ellis. A weak convergence approach to the theory of large deviations. John Wiley & Sons, 2011.

[20] I. Ekeland and R. Temam. Convex analysis and variational problems. SIAM, 1999.

[21] J. Feydy, T. Séjourné, F.-X. Vialard, S.-i. Amari, A. Trouvé, and G. Peyré. “Interpolating between optimal transport and MMD using Sinkhorn divergences”. In: The 22nd International Conference on Artificial Intelligence and Statistics. PMLR. 2019, pp. 2681–2690.

[22] W. Gangbo and R. J. McCann. “Optimal maps in Monge’s mass transport problem”. In: Comptes Rendus de l’Académie des Sciences - Série I - Mathématique 321.12 (1995), p. 1653.

[23] W. Gangbo and R. J. McCann. “The geometry of optimal transportation”. In: Acta Math. 177.2 (1996), pp. 113–161.

[24] W. Gangbo and A. Święch. “Optimal maps for the multidimensional Monge-Kantorovich problem”. In: Communications on Pure and Applied Mathematics 51.1 (1998), pp. 23–45.

[25] Z. Ge, S. Liu, Z. Li, O. Yoshie, and J. Sun. “OTA: Optimal transport assignment for object detection”. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021, pp. 303–312.

[26] T. Geiger, D. Wachsmuth, and G. Wachsmuth. “Optimal control of ODEs with state suprema”. In: Math. Control Relat. Fields 11.3 (2021), pp. 555–578.

[27] A. Genevay, M. Cuturi, G. Peyré, and F. Bach. “Stochastic optimization for large-scale optimal transport”. In: Advances in Neural Information Processing Systems 29 (2016).

[28] A. Gerolin, A. Kausamo, and T. Rajala. “Multi-marginal entropy-transport with repulsive cost”. In: Calculus of Variations and Partial Differential Equations 59.3 (2020), p. 90.

[29] N.
Gigli and L. Tamanini. “Second order differentiation formula on RCD(K, N) spaces”. In: Rendiconti Lincei 29.2 (2018), pp. 377–386.

[30] L. V. Kantorovich. “On the translocation of masses”. In: Dokl. Akad. Nauk. USSR (NS). Vol. 37. 1942, pp. 199–201.

[31] J. Kitagawa and B. Pass. “The multi-marginal optimal partial transport problem”. In: Forum of Mathematics, Sigma. Vol. 3. Cambridge University Press. 2015.

[32] J. Komlós. “A generalization of a problem of Steinhaus”. In: Acta Mathematica Academiae Scientiarum Hungaricae 18.1-2 (1967), pp. 217–229.

[33] J. Korman and R. McCann. “Optimal transportation with capacity constraints”. In: Transactions of the American Mathematical Society 367.3 (2015), pp. 1501–1521.

[34] J. Korman, R. J. McCann, and C. Seis. “Dual potentials for capacity constrained optimal transport”. In: Calculus of Variations and Partial Differential Equations 54.1 (2015), pp. 573–584.

[35] J. Korman, R. J. McCann, and C. Seis. “An elementary approach to linear programming duality with application to capacity constrained transport”. In: J. Convex Anal. 22.3 (2015), pp. 797–808.

[36] H. Lavenant, S. Claici, E. Chien, and J. Solomon. “Dynamical optimal transport on discrete surfaces”. In: ACM Transactions on Graphics (TOG) 37.6 (2018), pp. 1–16.

[37] C. Léonard. “A survey of the Schrödinger problem and some of its connections with optimal transport”. In: Discrete Contin. Dyn. Syst. 34.4 (2014), pp. 1533–1574.

[38] L. Li, A. Genevay, M. Yurochkin, and J. M. Solomon. “Continuous regularized Wasserstein barycenters”. In: Advances in Neural Information Processing Systems 33 (2020), pp. 17755–17765.

[39] S. Di Marino and A. Gerolin. “An optimal transport approach for the Schrödinger bridge problem and convergence of Sinkhorn algorithm”. In: Journal of Scientific Computing 85.2 (2020), pp. 1–28.

[40] R. J. McCann. “A convexity principle for interacting gases”. In: Advances in Mathematics 128.1 (1997), pp. 153–179.

[41] G. Monge.
“Mémoire sur la théorie des déblais et des remblais”. In: Mém. Math. Phys. Acad. Royale Sci. (1781), pp. 666–704.

[42] N. Papadakis. “Transport Optimal pour le Traitement d’Images”. 2015.

[43] B. Pass. “Multi-marginal optimal transport: theory and applications”. In: ESAIM: Mathematical Modelling and Numerical Analysis 49.6 (2015), pp. 1771–1790.

[44] B. Pass. “On the local structure of optimal measures in the multi-marginal optimal transportation problem”. In: Calculus of Variations and Partial Differential Equations 43.3 (2012), pp. 529–536.

[45] G. Peyré, M. Cuturi, et al. “Computational optimal transport: With applications to data science”. In: Foundations and Trends® in Machine Learning 11.5-6 (2019), pp. 355–607.

[46] S. T. Rachev and L. Rüschendorf. Mass Transportation Problems: Volume I: Theory. Vol. 1. Springer Science & Business Media, 1998.

[47] R. T. Rockafellar. Convex Analysis. 2nd printing ed. 1972.

[48] H. L. Royden and P. Fitzpatrick. Real Analysis. Vol. 32. Macmillan New York, 1988.

[49] L. Rüschendorf. “Convergence of the iterative proportional fitting procedure”. In: The Annals of Statistics (1995), pp. 1160–1174.

[50] T. Salimans, H. Zhang, A. Radford, and D. Metaxas. “Improving GANs using optimal transport”. In: arXiv preprint arXiv:1803.05573 (2018).

[51] F. Santambrogio. Optimal Transport for Applied Mathematicians. Birkhäuser, NY, 2015.

[52] D. Simon and A. Aberdam. “Barycenters of natural images - constrained Wasserstein barycenters for image morphing”. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020, pp. 7910–7919.

[53] C. Villani. Optimal Transport: Old and New. Vol. 338. Springer, 2009.

[54] C. Villani. Topics in Optimal Transportation. Vol. 58. American Mathematical Soc., 2021.

[55] P. Zhao and Z.-H. Zhou. “Label distribution learning by optimal transport”. In: Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 32. 1. 2018.

[56] L. Zhu, Y. Yang, S.
Haker, and A. Tannenbaum. “An image morphing technique based on optimal mass preserving mapping”. In: IEEE Transactions on Image Processing 16.6 (2007), pp. 1481–1495.

APPENDIX

1 Alternative dual functional for the CCOT Problem:

Let X, Y ⊆ R^d. For given f ∈ L¹(X), g ∈ L¹(Y) and 0 ≤ h̃ ∈ L^∞(X × Y), consider

    OT*_CC := sup_{(u,v,w)∈Lip_{c,h̃}} J(u, v, w)    (1.1)

where

    J(u, v, w) := ∫_X u(x) f(x) dx + ∫_Y v(y) g(y) dy + ∫_{X×Y} w(x, y) h̃(x, y) dxdy,    (1.2)

    Lip_{c,h̃} := { (u, v, w) : u ∈ L¹(X, f dx), v ∈ L¹(Y, g dy), w ∈ L¹(X × Y, h̃ dxdy), u(x) + v(y) + w(x, y) ≤ c(x, y), and w(x, y) ≤ 0 },    (1.3)

and

    OT*′_CC := sup_{u∈L¹(X), v∈L¹(Y)} J′(u, v),    (1.4)

where

    J′(u, v) := ∫_X u f dx + ∫_Y v g dy − ∫_{X×Y} [−c + u + v]_+ h̃ dxdy.    (1.5)

Proposition 1.1. OT*_CC = OT*′_CC.

Proof. Define A = { (u, v, w) : u ∈ L¹(X, f dx), v ∈ L¹(Y, g dy), and w = −[−c + u + v]_+ }. Then,

    OT*′_CC = sup_{(u,v,w)∈A} J(u, v, w).

First, let (u, v, w) ∈ A be arbitrary. Then w ≤ 0, and w = −[−c + u + v]_+ ≤ −[−c + u + v], which gives u + v + w ≤ c, and w ∈ L¹(X × Y, h̃ dxdy). Hence, (u, v, w) ∈ Lip_{c,h̃}. Thus, A ⊆ Lip_{c,h̃}, so we have

    sup_{(u,v,w)∈A} J(u, v, w) ≤ sup_{(u,v,w)∈Lip_{c,h̃}} J(u, v, w),    (1.6)

i.e.

    OT*′_CC ≤ OT*_CC.    (1.7)

Now observe that

    OT*_CC = sup_{(u,v,w)∈Lip_{c,h̃}} { ∫_X u f dx + ∫_Y v g dy + ∫_{X×Y} w h̃ dxdy }
        = sup_{(u,v,w)∈Lip_{c,h̃}} { ∫_X u f dx + ∫_Y v g dy + ∫_{{(x,y):−c(x,y)+u(x)+v(y)≥0}} w h̃ dxdy + ∫_{{(x,y):−c(x,y)+u(x)+v(y)<0}} w h̃ dxdy }
        ≤ sup_{(u,v)∈L¹(X)×L¹(Y)} { ∫_X u f dx + ∫_Y v g dy + ∫_{{(x,y):−c(x,y)+u(x)+v(y)≥0}} [c(x, y) − u(x) − v(y)] h̃(x, y) dxdy }.    (1.8)

In the last line above, we used the facts that w ≤ c − u − v and w ≤ 0.
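A quick numerical sketch of the mechanism behind Proposition 1.1 (discrete grid and made-up data, names my own): any feasible (u, v, w) in Lip_{c,h̃} satisfies w ≤ −[−c + u + v]_+, so J(u, v, w) ≤ J′(u, v), with equality at the extremal choice w = −[−c + u + v]_+.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
f = rng.random(n); g = rng.random(n)          # densities on X and Y
h = rng.random((n, n))                        # capacity h >= 0
c = rng.random((n, n))                        # cost c(x, y)
u = rng.normal(size=n); v = rng.normal(size=n)

# a feasible w: below both 0 and c - u - v, with some extra slack
w = np.minimum(0.0, c - u[:, None] - v[None, :]) - rng.random((n, n))

J = f @ u + g @ v + np.sum(w * h)
Jp = f @ u + g @ v - np.sum(np.maximum(0.0, -c + u[:, None] + v[None, :]) * h)
assert J <= Jp + 1e-12                         # any feasible w does no better

# equality at the extremal choice w* = -[-c + u + v]_+
w_star = -np.maximum(0.0, -c + u[:, None] + v[None, :])
assert abs(f @ u + g @ v + np.sum(w_star * h) - Jp) < 1e-12
```

Since min(0, c − u − v) = −[−c + u + v]_+ pointwise, the extremal w is always feasible, which is exactly why eliminating w from the supremum loses nothing.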
On the other hand,
\begin{align*}
\mathrm{OT}^{*\prime}_{CC} &= \sup_{(u,v) \in L^1(X) \times L^1(Y)} \Big\{ \int_X u f\, dx + \int_Y v g\, dy - \int_{X \times Y} [-c + u + v]_+\, \tilde{h}\, dx\, dy \Big\} \\
&= \sup_{(u,v)} \Big\{ \int_X u f\, dx + \int_Y v g\, dy - \int_{\{-c+u+v \ge 0\}} [-c + u + v]_+\, \tilde{h}\, dx\, dy - \int_{\{-c+u+v < 0\}} [-c + u + v]_+\, \tilde{h}\, dx\, dy \Big\} \\
&= \sup_{(u,v)} \Big\{ \int_X u f\, dx + \int_Y v g\, dy - \int_{\{-c+u+v \ge 0\}} [-c + u + v]\, \tilde{h}\, dx\, dy \Big\} \\
&= \sup_{(u,v)} \Big\{ \int_X u f\, dx + \int_Y v g\, dy + \int_{\{-c+u+v \ge 0\}} [c - u - v]\, \tilde{h}\, dx\, dy \Big\}, \tag{1.9}
\end{align*}
where the integral over $\{-c+u+v < 0\}$ vanishes because $[-c+u+v]_+ = 0$ there. Combining (1.8) and (1.9), we get
\[
\mathrm{OT}^*_{CC} \le \mathrm{OT}^{*\prime}_{CC}. \tag{1.10}
\]
The inequalities (1.7) and (1.10) together give $\mathrm{OT}^*_{CC} = \mathrm{OT}^{*\prime}_{CC}$. □

2 The relaxed CCOT problem is strictly convex

Given an integer $p > 2$, a lower semicontinuous function $c : \mathbb{R}^{pd} \to \mathbb{R}$, functions $f_1, \dots, f_p \in L^1(\mathbb{R}^d)$ and $h \in L^1(\mathbb{R}^{pd})$, consider the functional
\[
I_c^\varepsilon(h) := \int_{\mathbb{R}^{pd}} c\, h\, d\hat{x} + \frac{1}{2\varepsilon} \sum_{i=1}^p \big\| \langle h \rangle_{x_i} - f_i \big\|_2^2, \tag{2.1}
\]
where $d\hat{x} = dx_1 \cdots dx_p$ and $\langle h \rangle_{x_i}$ denotes the $i$-th marginal of $h$.

Proposition 2.1. $I_c^\varepsilon$ is strictly convex.

Proof. Let $0 \le h_1, h_2 \le \tilde{h}$ be such that $h_1 \ne h_2$, and let $0 < \lambda < 1$. Then observe that
\begin{align*}
I_c^\varepsilon(\lambda h_1 + (1-\lambda) h_2)
&= \int_{\mathbb{R}^{pd}} c\,(\lambda h_1 + (1-\lambda) h_2)\, d\hat{x} + \frac{1}{2\varepsilon} \sum_{i=1}^p \big\| \langle \lambda h_1 + (1-\lambda) h_2 \rangle_{x_i} - f_i \big\|_2^2 \\
&= \lambda \int_{\mathbb{R}^{pd}} c\, h_1\, d\hat{x} + (1-\lambda) \int_{\mathbb{R}^{pd}} c\, h_2\, d\hat{x} + \frac{1}{2\varepsilon} \sum_{i=1}^p \big\| \langle \lambda h_1 + (1-\lambda) h_2 \rangle_{x_i} - (\lambda + (1-\lambda)) f_i \big\|_2^2 \\
&= \lambda \int_{\mathbb{R}^{pd}} c\, h_1\, d\hat{x} + (1-\lambda) \int_{\mathbb{R}^{pd}} c\, h_2\, d\hat{x} + \frac{1}{2\varepsilon} \sum_{i=1}^p \big\| \lambda (\langle h_1 \rangle_{x_i} - f_i) + (1-\lambda)(\langle h_2 \rangle_{x_i} - f_i) \big\|_2^2 \\
&< \lambda \int_{\mathbb{R}^{pd}} c\, h_1\, d\hat{x} + (1-\lambda) \int_{\mathbb{R}^{pd}} c\, h_2\, d\hat{x} + \frac{\lambda}{2\varepsilon} \sum_{i=1}^p \big\| \langle h_1 \rangle_{x_i} - f_i \big\|_2^2 + \frac{1-\lambda}{2\varepsilon} \sum_{i=1}^p \big\| \langle h_2 \rangle_{x_i} - f_i \big\|_2^2 \\
&= \lambda I_c^\varepsilon(h_1) + (1-\lambda) I_c^\varepsilon(h_2).
\end{align*}
The strict inequality in the penultimate line holds by the strict convexity of the squared $L^2$ norm. □

3 Uniform boundedness for probability measures with finite second moment

Proposition 3.1. Given capacities $\{\tilde{\gamma}_i\}_{i=1}^p \subseteq \mathcal{M}_+(\mathbb{R}^d \times \mathbb{R}^d)$, $p$ probability measures $\nu_1, \dots, \nu_p \in \mathcal{P}_2(\mathbb{R}^d)$, and a sequence of probability measures $\{\tilde{\nu}_n\}_{n \in \mathbb{N}} \subseteq \mathcal{P}_2(\mathbb{R}^d)$ such that $\Pi_{\tilde{\gamma}_i}(\nu_i, \tilde{\nu}_n) \ne \emptyset$ for all $1 \le i \le p$: if there is an $M > 0$ such that $\widetilde{W}_2^2(\nu_i, \tilde{\nu}_n) \le M$ for all $i$ and $n$, then
\[
\sup_n \int |x|^2\, d\tilde{\nu}_n \le C \quad \text{for some constant } C.
\]

Proof. Fix $i \in \{1, \dots, p\}$ and $n \in \mathbb{N}$, and consider the two-marginal capacity constrained OT problem between the measures $\nu_i$ and $\tilde{\nu}_n$ with capacity $\tilde{\gamma}_i$, given by (5.2.3). By the duality,
\[
\widetilde{W}_2^2(\nu_i, \tilde{\nu}_n) = \sup_{B_{i,n}} \Big\{ \int_{\mathbb{R}^d} u(x)\, d\nu_i(x) + \int_{\mathbb{R}^d} v(y)\, d\tilde{\nu}_n(y) + \int_{\mathbb{R}^d \times \mathbb{R}^d} w(x,y)\, d\tilde{\gamma}_i(x,y) \Big\}, \tag{3.1}
\]
where the supremum is taken over the set $B_{i,n}$ of real-valued functions $(u, v, w)$ satisfying $u \in L^1(\nu_i)$, $v \in L^1(\tilde{\nu}_n)$, $w \in L^1(\tilde{\gamma}_i)$, $w \le 0$, and $u(x) + v(y) + w(x,y) \le \frac{1}{2}|x - y|^2$.

Choose $u(x) = \inf_{y \in \mathbb{R}^d} \{\frac{1}{2}|x|^2 + \frac{1}{4}|y|^2 - x \cdot y\}$, $v(y) = \frac{1}{4}|y|^2$, and $w \equiv 0$. Then
\[
u(x) + v(y) + w(x,y) \le \tfrac{1}{2}|x|^2 + \tfrac{1}{4}|y|^2 - x \cdot y + \tfrac{1}{4}|y|^2 = \tfrac{1}{2}|x - y|^2
\]
for all $(x,y) \in \mathbb{R}^d \times \mathbb{R}^d$, so this triple belongs to $B_{i,n}$. Plugging this choice of $u, v, w$ into (3.1), we get
\[
\int_{\mathbb{R}^d} \inf_{y \in \mathbb{R}^d} \Big\{ \tfrac{1}{2}|x|^2 + \tfrac{1}{4}|y|^2 - x \cdot y \Big\}\, d\nu_i(x) + \int_{\mathbb{R}^d} \tfrac{1}{4}|y|^2\, d\tilde{\nu}_n(y) \le \widetilde{W}_2^2(\nu_i, \tilde{\nu}_n). \tag{3.2}
\]
Since $\nabla_y \{\frac{1}{2}|x|^2 + \frac{1}{4}|y|^2 - x \cdot y\} = \frac{1}{2}y - x$, the infimum is attained at $y = 2x$, so $\inf_{y \in \mathbb{R}^d} \{\frac{1}{2}|x|^2 + \frac{1}{4}|y|^2 - x \cdot y\} = -\frac{1}{2}|x|^2$. Thus, (3.2) gives us
\[
\int_{\mathbb{R}^d} \tfrac{1}{4}|y|^2\, d\tilde{\nu}_n(y) \le \widetilde{W}_2^2(\nu_i, \tilde{\nu}_n) + \tfrac{1}{2} \int_{\mathbb{R}^d} |x|^2\, d\nu_i(x).
\]
Since $\widetilde{W}_2^2(\nu_i, \tilde{\nu}_n) \le M$ and $\nu_i \in \mathcal{P}_2(\mathbb{R}^d)$, there is a constant $C > 0$, independent of $n$, such that
\[
\int_{\mathbb{R}^d} \tfrac{1}{4}|y|^2\, d\tilde{\nu}_n(y) \le C. \qquad \Box
\]

4 An integral condition for tightness

Proposition 4.1 ([3], Remark 5.1.5). Let $X \subseteq \mathbb{R}^d$ and let $\{\nu_n\}_{n \in \mathbb{N}} \subseteq \mathcal{P}_2(X)$ be a sequence of probability measures. If $\sup_n \int |x|^2\, d\nu_n < +\infty$, then $\{\nu_n\}_{n \in \mathbb{N}}$ is tight.

Proof. Let $\delta > 0$ be arbitrary and fix $n \in \mathbb{N}$. Suppose $\sup_n \int |x|^2\, d\nu_n < M$, and let $R_\delta$ be a real number such that $R_\delta > \sqrt{M/\delta}$. Define the compact set $K_\delta = \{x \in \mathbb{R}^d : |x| \le R_\delta\}$. Then
\[
\nu_n(\{x \in \mathbb{R}^d : |x| > R_\delta\}) \le \frac{1}{R_\delta^2} \int_{\mathbb{R}^d} |x|^2\, d\nu_n \le \frac{M}{R_\delta^2} < \delta, \tag{4.1}
\]
where the first inequality follows from Chebyshev's inequality. This proves that $\{\nu_n\}_{n \in \mathbb{N}}$ is tight. □
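The Chebyshev step in (4.1) holds for every probability measure with finite second moment. As a purely illustrative check (the Gaussian sample below is an arbitrary test case, not data from the thesis), one can compare the empirical tail mass $\nu_n(\{|x| > R\})$ against $\frac{1}{R^2} \int |x|^2\, d\nu_n$:

```python
# Illustrative check (not from the thesis) of the Chebyshev bound in (4.1):
# nu({|x| > R}) <= (1/R^2) * \int |x|^2 d(nu), applied to the empirical
# measure of a Gaussian sample in R^2 (an arbitrary test case).
import random

random.seed(0)
d, n = 2, 20000
pts = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
norms_sq = [sum(x * x for x in p) for p in pts]

second_moment = sum(norms_sq) / n                 # empirical second moment
bounds_hold = True
for R in (1.0, 2.0, 3.0):
    tail = sum(s > R * R for s in norms_sq) / n   # nu_n({|x| > R})
    bounds_hold = bounds_hold and tail <= second_moment / R ** 2
    print(R, tail, second_moment / R ** 2)
```

For the empirical measure the bound is exact (it is Markov's inequality applied to $|x|^2$), so it holds for every $R$, not only the three tested.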
5 The entropic dual functional is strictly concave

Given two Polish spaces $X$ and $Y$, a bounded cost function $c : X \times Y \to \mathbb{R}$, two probability measures $\mu \in \mathcal{P}(X)$ and $\nu \in \mathcal{P}(Y)$, and two functions $\phi \in L^{\exp}_\varepsilon(X, d\mu)$, $\psi \in L^{\exp}_\varepsilon(Y, d\nu)$, consider the functional
\[
D(\phi, \psi) := \int_X \phi\, d\mu + \int_Y \psi\, d\nu - \varepsilon \int_{X \times Y} e^{(\phi + \psi - c)/\varepsilon}\, d\mu \otimes \nu. \tag{5.1}
\]

Proposition 5.1. The functional $D$ is strictly concave.

Proof. Let $\phi_1, \phi_2 \in L^{\exp}_\varepsilon(X, d\mu)$, $\psi_1, \psi_2 \in L^{\exp}_\varepsilon(Y, d\nu)$, and let $0 < \lambda < 1$ be a real number. We need to show that
\[
D(\lambda \phi_1 + (1-\lambda)\phi_2,\ \lambda \psi_1 + (1-\lambda)\psi_2) > \lambda D(\phi_1, \psi_1) + (1-\lambda) D(\phi_2, \psi_2).
\]
Since the first two integrals in (5.1) are linear in $(\phi, \psi)$, it is sufficient to show that
\[
\int_{X \times Y} e^{(\lambda \phi_1 + (1-\lambda)\phi_2 + \lambda \psi_1 + (1-\lambda)\psi_2 - c)/\varepsilon}\, d\mu \otimes \nu < \lambda \int_{X \times Y} e^{(\phi_1 + \psi_1 - c)/\varepsilon}\, d\mu \otimes \nu + (1-\lambda) \int_{X \times Y} e^{(\phi_2 + \psi_2 - c)/\varepsilon}\, d\mu \otimes \nu. \tag{5.2}
\]
First, we will prove the pointwise inequality
\[
e^{(\lambda \phi_1 + (1-\lambda)\phi_2 + \lambda \psi_1 + (1-\lambda)\psi_2 - c)/\varepsilon} < \lambda e^{(\phi_1 + \psi_1 - c)/\varepsilon} + (1-\lambda) e^{(\phi_2 + \psi_2 - c)/\varepsilon}. \tag{5.3}
\]
Writing $c = \lambda c + (1-\lambda)c$, observe that
\[
e^{(\lambda \phi_1 + (1-\lambda)\phi_2 + \lambda \psi_1 + (1-\lambda)\psi_2 - \lambda c - (1-\lambda)c)/\varepsilon} = e^{\lambda \frac{\phi_1 + \psi_1 - c}{\varepsilon} + (1-\lambda) \frac{\phi_2 + \psi_2 - c}{\varepsilon}} < \lambda e^{(\phi_1 + \psi_1 - c)/\varepsilon} + (1-\lambda) e^{(\phi_2 + \psi_2 - c)/\varepsilon},
\]
where the last step holds due to the strict convexity of the exponential function. Now, by integrating both sides of (5.3) with respect to $d\mu \otimes \nu$ on $X \times Y$, we get the inequality (5.2). This completes the proof. □
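The concavity inequality for $D$ can also be observed on a small finite instance. In the sketch below, the spaces, cost, regularization parameter, and potential pairs are arbitrary choices made only for illustration:

```python
# Illustrative check (arbitrary finite test case) of Proposition 5.1:
# D evaluated at a convex combination of potential pairs strictly
# exceeds the corresponding combination of D-values.
import math

eps, lam = 0.5, 0.3
X, Y = range(3), range(4)
mu = [1.0 / 3] * 3                      # uniform probability weights
nu = [1.0 / 4] * 4
c = lambda x, y: (x - y) ** 2 / 10.0    # bounded cost (arbitrary choice)

def D(phi, psi):
    lin = sum(phi[x] * mu[x] for x in X) + sum(psi[y] * nu[y] for y in Y)
    pen = sum(math.exp((phi[x] + psi[y] - c(x, y)) / eps) * mu[x] * nu[y]
              for x in X for y in Y)
    return lin - eps * pen

phi1, psi1 = [0.1, -0.2, 0.3], [0.0, 0.5, -0.1, 0.2]
phi2, psi2 = [-0.3, 0.4, 0.0], [0.2, -0.2, 0.1, 0.0]

phi_lam = [lam * a + (1 - lam) * b for a, b in zip(phi1, phi2)]
psi_lam = [lam * a + (1 - lam) * b for a, b in zip(psi1, psi2)]

lhs = D(phi_lam, psi_lam)
rhs = lam * D(phi1, psi1) + (1 - lam) * D(phi2, psi2)
print(lhs > rhs)
```

The linear terms of $D$ contribute equally to both sides, so the strict gap comes entirely from the exponential penalty, exactly as in the pointwise inequality (5.3).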