COMPOSITIONALITY IN DIFFUSION MODELS
How can we train generative models to sample data with arbitrary logical compositions of statistically independent attributes? The prevailing solution is to sample from distributions expressed as a composition of the attributes' conditional marginal distributions, under the assumption that those attributes are statistically independent. However, we show that standard conditional diffusion models violate this assumption even when all attribute compositions are observed during training, and that the violation is significantly more severe when only a subset of the compositions is observed. We propose CoInD to address this problem. CoInD explicitly enforces statistical independence between the conditional marginal distributions by minimizing Fisher's divergence between the joint and marginal distributions. The theoretical advantages of CoInD are reflected in both qualitative and quantitative experiments, which demonstrate significantly more faithful and controlled generation of samples for arbitrary logical compositions of attributes. The benefit is most pronounced in the scenarios where current solutions relying on conditionally independent marginals struggle: logical compositions involving the NOT operation, and settings where only a subset of compositions is observed during training.

CoInD's ability to capture the compositional nature of the world yields faithful and controlled generation, which can be leveraged in many downstream applications. One such application is the task of compositional shift. Machine learning systems often struggle with robustness under subpopulation shifts, especially when only a subset of attribute combinations is observed during training, a severe form of subpopulation shift referred to as compositional shift. To address this problem, we ask: can we improve the robustness of a downstream classifier by training it on synthetic data that spans all possible attribute combinations? Conditional diffusion models trained on limited data learn an incorrect underlying distribution, so synthetic data sampled from these models is often unfaithful and does not enhance downstream performance. In contrast, CoInD produces faithful samples, which translates to state-of-the-art worst-group accuracy on compositional shift tasks on CelebA. Our code is available at https://github.com/sachit3022/compositional-generation/
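To make the independence constraint concrete, below is a minimal sketch (not the repository's released code) of how such a training objective could be written in PyTorch. Under conditional independence, the joint score decomposes as s(x | c1, c2) = s(x | c1) + s(x | c2) - s(x), and the Fisher-divergence penalty is the squared residual of this decomposition, expressed here in the noise-prediction parameterization. The names `eps_model`, `NULL`, `noise_sched.add_noise`, and `lam` are illustrative assumptions, not the released API.

```python
import torch
import torch.nn.functional as F

NULL = None  # hypothetical null token used to drop a condition

def coind_loss(eps_model, noise_sched, x0, t, c1, c2, lam=1.0):
    """Denoising score-matching loss plus a conditional-independence
    penalty, in the noise-prediction (epsilon) parameterization (sketch)."""
    noise = torch.randn_like(x0)
    x_t = noise_sched.add_noise(x0, noise, t)  # assumed scheduler method

    # Standard conditional denoising term for the joint model.
    eps_joint = eps_model(x_t, t, c1, c2)
    loss_dsm = F.mse_loss(eps_joint, noise)

    # Per-attribute marginal and unconditional predictions, obtained by
    # dropping conditions (classifier-free-guidance-style null tokens).
    eps_c1 = eps_model(x_t, t, c1, NULL)
    eps_c2 = eps_model(x_t, t, NULL, c2)
    eps_un = eps_model(x_t, t, NULL, NULL)

    # If the attributes are conditionally independent, the joint score
    # decomposes as s(x|c1,c2) = s(x|c1) + s(x|c2) - s(x); penalize the
    # squared residual of that decomposition.
    loss_ci = F.mse_loss(eps_joint, eps_c1 + eps_c2 - eps_un)

    return loss_dsm + lam * loss_ci
```

At sampling time, the same decomposition is what enables logical composition in standard composable-diffusion fashion: an AND of attributes sums the marginal scores as above, while a NOT can be realized by subtracting a marginal score from the unconditional one.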
- In Collections: Electronic Theses & Dissertations
- Copyright Status: Attribution 4.0 International
- Material Type: Theses
- Authors: Gaudi, Sachit
- Thesis Advisors: Boddeti, Vishnu
- Committee Members: Liu, Xiaoming; Kong, Yu; Xu, Felix Juefei
- Date Published: 2025
- Subjects: Computer science
- Program of Study: Computer Science - Master of Science
- Degree Level: Masters
- Language: English
- Pages: 77
- Permalink: https://doi.org/doi:10.25335/y7m5-z453