A METHODOLOGY FOR MODELING EMERGENCE IN COMPLEX SYSTEMS
                                     By
                            Nathan Brugnone
                           A DISSERTATION
                               Submitted to
                       Michigan State University
                in partial fulfillment of the requirements
                             for the degree of
           Community Sustainability—Doctor of Philosophy
     Computational Mathematics, Science, & Engineering—Dual Major
                                    2023


                                        ABSTRACT
Understanding how social and ecological systems interact and work together as complex
adaptive systems is essential for understanding the emergence of environmental and social
system states. However, domain-general complex systems methods are not readily accessible
to researchers and policymakers, limiting our ability to both understand and intervene
to address contemporary social and environmental problems. At the same time, rapid new
developments in machine learning (ML) have enabled a sea change in extracting information
from disparate datasets and making it useful for human decision-making. This dissertation
develops and studies novel ML methods to understand the emergent properties of complex
systems. By proposing new approaches, we show how these emerging technologies can be
applied to both (1) theoretical and (2) real-world problems. The first chapter introduces
a domain-general unsupervised machine learning method for identifying clusters in data
with hierarchical structure like that found in complex systems. Chapter 2 utilizes complex
systems theory and theoretical machine learning to drive development of a theoretical frame-
work and method for understanding graphical models of complex systems. The final chapter
demonstrates how these methods can be applied through a case study of maternal and child
healthcare (MCH) in Gombe, Nigeria.
    Complex systems exhibit hierarchical structure that can be studied at various scales.
Individuals are members of families who are in turn members of communities which are
members of local government areas and so on. Chapter 1 proposes a clustering method
that coarsens fine structures to reveal nested hierarchies in complex data. The method is
presented in a theoretical framework based upon graph signal processing. The performance
of this method is assessed on canonical ground-truth datasets. It is then shown to identify
novel structure in real-world complex system data.
    Complex systems can also be characterized by the patterns of emergent phenomena
that they generate. These patterns often exhibit self-similarity which can be modeled with
stochastic processes called textures. Chapter 2 introduces a theoretical framework that
generalizes statistically self-similar random fields to graphs, again via the graph signal processing
paradigm. Perhaps surprisingly, this generalization is shown to facilitate the classification
of cognitive maps based upon structure. We model cognitive maps with samples from ran-
dom graph families. We find that the statistical model produces sufficiently rich features to
enable the accurate classification of these random graph families, thereby paving the way to
application in unsupervised, real-world contexts.
    Chapter 3 builds upon Chapters 1 and 2 by introducing a novel clustering method and
comparing its performance with two others on real-world complex systems data. The data
consists of value-laden statements made by expectant mothers and fathers in Gombe, Nigeria
about utilization of the local maternal and child health (MCH) system. We take a
methodological approach to identifying groups of individuals expressing similar values, with which
we seek to improve wisdom-of-crowds estimates of healthcare utilization. Similarities and
differences among the identified values-based subpopulations provide insight into challenges
and potentials for application of machine learning in similar development contexts.


                               ACKNOWLEDGEMENTS
This dissertation would not have been possible without the support of a great number of
generous people. I am indebted to Steven Gray for academically adopting me and cooking
an awesome crab dinner around which this document coalesced. Steven introduced me
to James Gentile and the amazing complex & social systems research group at Two Six
Technologies where some of the research in this dissertation has been conducted. I owe
a great debt of gratitude to Matt Hirn and Robby Richardson whose patient guidance,
kindness, and curiosity enabled me to pursue ideas at the intersection of complex systems
modeling and machine learning. Matt’s Herculean teaching and research efforts set a high bar
that continues to inspire his students. Shout out to Dirk Colbry who continually welcomed
me to teach beside him and proved that mathematics and data science are human-centered
disciplines that can improve our well-being. Special thanks to Michael Murillo and his
agent-based modeling group for many interesting discussions and encouragement. Thanks
also to the MSU Modeling Ecological and Social Systems faculty Laura Schmitt Olabisi and
Arika Ligmann-Zielinska whose collaborative approaches continue to inform my perspective.
Thanks to Heather Williamson, Gail Vander Stoep, and the staff in CSUS and CMSE for
turning untimely paperwork into a fully functioning degree program. Thanks to the members
of my research groups: Timmy, Payam, Carissa, Mahdi, Anna, Mike et al. To everyone
involved with the Sustainable Michigan Endowed Project–Pat and Paul, Jessica and Laura
(CC ‘holla), Kyle, Zach, Bethany et al.–thank you for the formative experiences. Thanks also
to my mom, dad, and brothers, Bobby and Thomas, not only for the formative experiences,
but also for the interest and support–thank you. Thanks to my wife, Danielle, for believing
in and pushing me to stick with it through difficult times. To my youngest son, Miles, thank
you for the cuddles. And very special thanks to my oldest son, Asa, whose excitement and
kindness have been absolutely essential to seeing this dissertation through.


                            TABLE OF CONTENTS
CHAPTER 1      COARSE GRAINING OF DATA VIA
               INHOMOGENEOUS DIFFUSION CONDENSATION
    1.1 Abstract
    1.2 Introduction
    1.3 Related Work
    1.4 Preliminaries
    1.5 Diffusion Condensation
    1.6 Properties of Diffusion Condensation
    1.7 Empirical Results
    1.8 Conclusion
    1.9 Acknowledgements
    BIBLIOGRAPHY
CHAPTER 2      SELF-SIMILAR GRAPH SIGNALS & SYSTEM
               CLASSIFICATION
    2.1 Abstract
    2.2 Introduction
    2.3 Background
    2.4 Methods & Materials
    2.5 Results
    2.6 Discussion
    2.7 Conclusion
    BIBLIOGRAPHY
    APPENDIX           EXPERIMENTAL PARAMETERS
CHAPTER 3      FROM ‘OUGHT’ TO ‘IS’: A COMPARISON OF
               UNSUPERVISED METHODS FOR VALUES-INFORMED
               WISDOM OF CROWDS
    3.1 Abstract
    3.2 Introduction
    3.3 Background
    3.4 Methods & Materials
    3.5 Results
    3.6 Discussion
    3.7 Conclusion
    3.8 Acknowledgements
    BIBLIOGRAPHY
    APPENDIX           VALUE HYPOTHESES & DEMOGRAPHICS


                                         CHAPTER 1
                          COARSE GRAINING OF DATA VIA
                INHOMOGENEOUS DIFFUSION CONDENSATION
© 2019 IEEE. Reprinted, with permission, from Brugnone et al. (2019).
1.1    Abstract
    Big data often has emergent structure that exists at multiple levels of abstraction, which
are useful for characterizing complex interactions and dynamics of the observations. Here,
we consider multiple levels of abstraction via a multiresolution geometry of data points at
different granularities. To construct this geometry we define a time-inhomogeneous diffusion
process that effectively condenses data points together to uncover nested groupings at larger
and larger granularities. This inhomogeneous process creates a deep cascade of intrinsic low
pass filters on the data affinity graph that are applied in sequence to gradually eliminate local
variability while adjusting the learned data geometry to increasingly coarser resolutions. We
provide visualizations to exhibit our method as a “continuously-hierarchical” clustering with
directions of eliminated variation highlighted at each step. The utility of our algorithm is
demonstrated via neuronal data condensation, where the constructed multiresolution data
geometry uncovers the organization, grouping, and connectivity between neurons.
1.2    Introduction
    A fundamental task in data analysis is to characterize variability that separates infor-
mative data relations from disruptive ones, e.g., due to noise or collection artifacts. In
predictive tasks such as classification, for example, one might seek to extract and preserve
information that enhances class separation, while eliminating intra-class variance. However,
in descriptive tasks and data exploration, such knowledge does not a priori exist, and instead
data processing methods must detect emergent patterns that encode meaningful abstractions
of the data. Furthermore, it is often the case that data abstraction cannot be conducted
at a single scale, and instead one must consider multiresolution data representations that
generate several scales of abstraction – each emphasizing different properties in the data.


    The need for multiresolution data representations is of particular importance in biomed-
ical data exploration, where recent technological advances introduce vast amounts of unla-
beled data to be explored by limited numbers of domain experts. For example, in single-cell
transcriptomics, high-throughput genomic and epigenetic assays have led to an explosion in
high-dimensional biological data measured from various systems including imaging (Giesen
et al., 2014; Angelo et al., 2014), mass cytometry (Bendall et al., 2011), and scRNA-
seq (Shapiro et al., 2013; Kolodziejczyk et al., 2015). To fully utilize this transformative
big data availability, computational methods are needed that leverage the intrinsic data
geometry (e.g., using manifold learning techniques (Moon et al., 2018)) to enable exploratory
analysis upon it.
    A common approach towards data abstraction is to use clustering algorithms that pro-
vide coarse-grained representations of the data by grouping data points into salient clus-
ters (Levine et al., 2015; Galluccio et al., 2013; Von Luxburg, 2007), either at a single
scale or hierarchically (see Section 1.3). However, standard clustering algorithms such as k-
means (Lloyd, 1982; Kanungo et al., 2002) or expectation maximization (EM) (Moon, 1996)
have many limitations. For example, they fail to perform well on high-dimensional data, or
they require a number of assumptions about the underlying structure of the data (Ng et al.,
2002). In particular, a primary challenge in clustering is determining the optimal number
of clusters or groups. Many algorithms require the user to explicitly choose the number of
clusters (as in k-means) or tune a parameter that directly relates to the number of detected
clusters (e.g., as in Phenograph (Levine et al., 2015)). In exploratory settings, this makes
it particularly challenging to detect small, unique, or otherwise rare data type clusters, and
extract new knowledge from them.
    Here, we present a new approach to address the challenge of multiscale data coarse
graining by using a data-driven time-inhomogeneous diffusion process, which we call diffusion
condensation. Our proposed diffusion condensation process learns a “continuous hierarchy”
of coarse-grained representations by iteratively contracting the data points towards a
time-varying data manifold that represents increasingly coarser resolutions. At each iteration, the
data points move to the center of gravity of their local neighbors as defined by this data-
driven diffusion process (Coifman and Lafon, 2006; Nadler et al., 2005). This in turn alters
the next steps of the diffusion process to reflect the new data positions. Across iterations, this
construction creates a time-inhomogeneous Markov process on the data, which represents
the changing affinities between data points, along with changing granularities. The process
eventually collapses the entire data set to a single point. However, intermediate steps in this
process produce coarse-grained data representations at particular granularities or abstraction
levels. Importantly, our results show that distinct clusters emerge at different scales and each
data point (e.g., each cell in transcriptomic data) is represented by a time series of feature
vectors that capture multiresolution information in the data. Therefore, the data embedding
provided by the constructed diffusion condensation process can be thought of as a dynamic
video, as opposed to static snapshots provided by traditional manifold learning, such as
diffusion maps (Coifman and Lafon, 2006) and other dimensionality reduction methods (Van
Der Maaten et al., 2009).
1.3    Related Work
    Typical attempts at providing multiscale data abstraction or summarization rely on hi-
erarchical clustering, which is a family of methods that attempts to derive a tree of clusters
based on either recursive agglomeration of datapoints or recursive splitting. Agglomerative
methods include the popular linkage clustering and community detection methods such as
the Louvain method (Blondel et al., 2008). Splitting-based approaches include recursive
bisection (Dasgupta et al., 2006) and divisive analysis clustering (Kaufman and Rousseeuw,
2009). At each iteration, these methods explicitly attempt to discover the best split or merge,
thereby forcing points together or apart as the case may be. Diffusion condensation,
by contrast, does not force any splits or mergers at any iteration and simply allows
datapoints to come together naturally via repeated condensation steps. Thus, there may
be many iterations in which a cluster of datapoints remains distinct from other clusters.
This time length under which the cluster persists can itself be a metric of the distinctness
of a cluster, and the agglomeration of all such cluster persistence times creates a diagram
similar to those created in persistent homology (Wasserman, 2018; Kwitt et al., 2015). Thus
the hierarchical tree created by diffusion condensation (displayed as a Sankey diagram in
Figure 1.4) has branches whose lengths are meaningful in terms of cluster separation.
1.4    Preliminaries
1.4.1    Manifold learning
    High dimensional data can often be modeled as originating from a sampling
$Z = \{z_i\}_{i=1}^{N} \subset \mathcal{M}^d$ of a $d$ dimensional manifold $\mathcal{M}^d$ that is mapped to $n \gg d$ dimensional
observations $X = \{x_1, \ldots, x_N\} \subset \mathbb{R}^n$ via a nonlinear function $x_i = f(z_i)$. Intuitively, the reason for this
phenomenon is that data collection measurements (modeled here via f ) typically result in
high dimensional observations, even when the intrinsic dimensionality, or degrees of freedom,
in the data is relatively low. This manifold assumption is at the core of the vast field of
manifold learning (e.g., (Moon et al., 2018; Coifman and Lafon, 2006; Van Der Maaten et al.,
2009; Izenman, 2012), and references therein), which leverages the intrinsic data geometry,
modeled as a manifold, for exploring and understanding patterns, trends, and structure in
data.
1.4.2    Diffusion geometry
    In Coifman and Lafon (2006), diffusion maps were proposed as a robust way to capture
intrinsic manifold geometry in data using random walks that aggregate local affinity to reveal
nonlinear relations in data and allow their embedding in low dimensional coordinates. These
local affinities are commonly constructed using a Gaussian kernel
$$K(x_i, x_j) = \exp\left( -\frac{\|x_i - x_j\|^2}{\varepsilon} \right), \quad i, j = 1, \ldots, N \qquad (1.1)$$
where K is an N × N Gram matrix whose (i, j) entry is denoted by K(xi , xj ) to emphasize
the dependency on the data X. The bandwidth parameter ε controls neighborhood sizes. A
diffusion operator is defined as the row-stochastic matrix $P = D^{-1} K$, where $D$ is a diagonal
matrix with $D(x_i, x_i) = \sum_j K(x_i, x_j)$, which is referred to as the degree of $x_i$. The matrix $P$
defines single-step transition probabilities for a time-homogeneous diffusion process (which
is a Markovian random walk) over the data, and is thus referred to as the diffusion operator.
Furthermore, as shown in Coifman and Lafon (2006), powers of this matrix $P^t$, for $t > 0$,
can be used for multiscale organization of $X$, which can be interpreted geometrically when
the manifold assumption is satisfied.
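    As a minimal illustrative sketch (not the implementation used in this work), the construction above can be written with NumPy. The function name `diffusion_operator`, the synthetic Gaussian data, and the bandwidth value are assumptions made only for illustration.

```python
import numpy as np


def diffusion_operator(X, eps):
    """Return the row-stochastic matrix P = D^{-1} K for the Gaussian kernel of Eq. (1.1)."""
    # Pairwise squared Euclidean distances ||x_i - x_j||^2.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / eps)       # kernel (Gram) matrix K
    degrees = K.sum(axis=1)           # D(x_i, x_i) = sum_j K(x_i, x_j)
    return K / degrees[:, None]       # divide row i by the degree of x_i


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))          # hypothetical data: 50 points in R^3
P = diffusion_operator(X, eps=1.0)
P_4 = np.linalg.matrix_power(P, 4)    # P^t with t = 4 (four-step transition probabilities)
```

    Each row of `P` sums to one, and the same holds for its powers, which is what allows `P_4` to be read as four-step transition probabilities of the random walk.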
1.4.3    Diffusion filters
    While originally conceived for dimensionality reduction via the eigendecomposition of
the diffusion operator, recent works (e.g., Van Dijk et al. (2018); Lindenbaum et al. (2018);
Gama et al. (2019); Gao et al. (2019)) have extended the diffusion framework of Coifman
and Lafon (2006) to allow processing of data features by directly using the operator P.
In this case, P serves as a smoothing operator, and may be regarded as a generalization
of a low-pass filter for either unstructured or graph-structured data. Indeed, consider a
vector $v \in \mathbb{R}^N$ that we think of as a signal $v(x_i)$ over $X$. Then $Pv(x_i)$ replaces the value
$v(x_i)$ with a weighted average of the values $v(x_j)$ for those points $x_j$ such that $\|x_i - x_j\| = O(\sqrt{\varepsilon})$.
Applications of this approach include data denoising and imputation (Van Dijk et al.,
2018), data generation (Lindenbaum et al., 2018), and graph embedding with geometric
scattering (Gama et al., 2019; Gao et al., 2019).
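    The low-pass behavior of $P$ can be illustrated with a small sketch, under assumptions of our own choosing: points sampled on a unit circle, a sinusoidal signal with additive Gaussian noise, and a bandwidth of 0.05. None of these choices come from the experiments in this chapter, and `diffusion_operator` is a hypothetical helper name.

```python
import numpy as np


def diffusion_operator(X, eps):
    """Row-stochastic P = D^{-1} K built from a Gaussian kernel with bandwidth eps."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / eps)
    return K / K.sum(axis=1, keepdims=True)


rng = np.random.default_rng(1)
theta = np.sort(rng.uniform(0.0, 2.0 * np.pi, 200))
X = np.column_stack([np.cos(theta), np.sin(theta)])    # data points on a circle
clean = np.sin(theta)                                  # smooth signal over the data
noisy = clean + 0.3 * rng.normal(size=theta.size)      # signal v with high-frequency noise

P = diffusion_operator(X, eps=0.05)
smoothed = P @ noisy                                   # (Pv)(x_i): local weighted average

err_noisy = np.mean((noisy - clean) ** 2)
err_smoothed = np.mean((smoothed - clean) ** 2)
```

    Because every entry of `smoothed` is a convex combination of entries of `noisy`, the filtered signal stays within the range of the input while its high-frequency component is damped; comparing `err_smoothed` with `err_noisy` gives a quick check of the denoising effect.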
1.5    Diffusion Condensation
1.5.1    Time inhomogeneous heat diffusion
    The matrix P defines the transition probabilities of a random walk over the data set X.
Computing powers of $P$ runs the walk forward, so that $P^t$ gives the transition probabilities
of the t-step random walk. Since the same transition probabilities are used for every step of
the walk, the resulting diffusion process is time homogeneous.
    A time inhomogeneous diffusion process arises from an inhomogeneous random walk in
which the transition probabilities change with every step. Its $t$-step transition probabilities
are given by
$$P^{(t)} = P_t P_{t-1} \cdots P_1,$$
where $P_k$ is the Markov matrix that encodes the transition probabilities at step $k$.
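    The following sketch contrasts the two cases numerically. It assumes a hypothetical sequence of data snapshots (a point cloud that shrinks slightly at every step) and a plain Gaussian-kernel Markov matrix per snapshot; the helper names `diffusion_operator` and `inhomogeneous_operator` are illustrative only.

```python
from functools import reduce

import numpy as np


def diffusion_operator(X, eps):
    """Row-stochastic P = D^{-1} K built from a Gaussian kernel with bandwidth eps."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / eps)
    return K / K.sum(axis=1, keepdims=True)


def inhomogeneous_operator(step_matrices):
    """Compose [P_1, ..., P_t] into P^(t) = P_t P_{t-1} ... P_1."""
    # Each later step multiplies on the left, matching the ordering in the display above.
    return reduce(lambda acc, P_k: P_k @ acc, step_matrices)


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
snapshots = [X * (1.0 - 0.1 * k) for k in range(3)]    # hypothetical time-varying data
steps = [diffusion_operator(X_k, eps=1.0) for X_k in snapshots]

P_inhom = inhomogeneous_operator(steps)                # transition probabilities change per step
P_hom = inhomogeneous_operator([steps[0]] * 3)         # same matrix at every step
```

    When every step uses the same matrix, the composition reduces to the matrix power of the homogeneous case; in both cases the product of row-stochastic matrices is again row-stochastic.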
     Suppose the data set $X$ has an additional parameter $t = 0, 1, 2, \ldots$ that results from
measurements $X(t) = \{x_1(t), \ldots, x_N(t)\}$ of a time-varying manifold $\mathcal{M}^d(\tau)$ at discretely
sampled times $\tau_t = \varepsilon t$. Let $P_t$ be the resulting Markov matrix derived from $X(t)$, constructed
according to the anisotropic diffusion process of (Coifman and Lafon, 2006, Section 3) (which
is similar to the construction described in Section 1.4). One can show (Marshall and Hirn,
2018) the resulting inhomogeneous diffusion process $P^{(t)}$ approximates heat diffusion over the
time varying manifold $\mathcal{M}^d(\tau)$. The singular vectors of this process can be used to construct
a so-called time coupled diffusion map, which gives time-space geometric summaries of the
data $X$.
     The perspective of Marshall and Hirn (2018) is that the data is intrinsically time varying.
However, one can also start with a static data set X and construct a series of deformations
of the data. In this paper we take the latter perspective and deform the data set according
to an imposed, data driven time inhomogeneous process $P^{(t)}$ that reduces variability within
the data over time. The resulting process is referred to as condensation, and is described in
the next section.
1.5.2         The diffusion condensation process
     Recall from Section 1.4 that the application of the operator $P$ to a vector $v$ averages the
values of $v$ over small neighborhoods in the data. In the case of data $X = \{x_1, \ldots, x_N\} \subset \mathbb{R}^n$
measured from an underlying manifold $\mathcal{M}^d$ with the model $x_i = f(z_i)$ for $z_i \in \mathcal{M}^d$, this
averaging operator can be directly applied to the coordinate functions $f = (f_1, \ldots, f_n)$.
Let $\mathbf{f}_k \in \mathbb{R}^N$ be the vector corresponding to the coordinate function $f_k$ evaluated on the
data samples, i.e., $\mathbf{f}_k(z_i) = f_k(z_i)$. The resulting description of the data is given by
$\bar{X} = \{\bar{x}_1, \ldots, \bar{x}_N\}$ where $\bar{x}_i = (P\mathbf{f}_1(z_i), \ldots, P\mathbf{f}_n(z_i))$. The coordinates of $\bar{X}$ are smoothed versions
of the coordinates of $X$, which dampens high frequency variations in the coordinate functions
and thus removes small perturbations in the data. This smoothing technique is used in
Van Dijk et al. (2018) to impute and denoise data.
    Here we consider not only the task of eliminating variability that originates from noise,
but also coarse graining the data coordinates to provide multiple resolutions of the captured
information in them. Therefore, we aim to gradually eliminate local variability in the data
using a time inhomogeneous diffusion process that refines the constructed diffusion geometry
to the coarser resolution as time progresses. This condensation process proceeds as follows.
Let $X(0) = X$ be the original data set with Markov matrix $P_0 = P$ and $X(1) = \bar{X}$ the
coordinate-smoothed data described in the previous paragraph. We can iterate this process
to further reduce the variability in the data by computing the Markov matrix $P_1$ using
the coordinate representation $X(1)$. A new coordinate representation $X(2)$ is obtained by
applying $P_1$ to the coordinate functions of $X(1)$. In general, one can apply the process for
an arbitrary number of steps, which results in the condensation process. Let $X(t)$ be the
coordinate representation of the data after $t \geq 0$ steps so that $X(t) = \{x_1(t), \ldots, x_N(t)\}$
with $x_i(t) = (\mathbf{f}_1^{(t)}(z_i), \ldots, \mathbf{f}_n^{(t)}(z_i))$, where $\mathbf{f}_k^{(0)} = \mathbf{f}_k$. We obtain $X(t+1)$ by applying $P_t$, the
Markov matrix computed from $X(t)$, to the coordinate vectors $\mathbf{f}_k^{(t)}$. This process results in:
$$\mathbf{f}_k^{(t+1)} = P_t \mathbf{f}_k^{(t)} = P_t P_{t-1} \cdots P_1 P_0 \mathbf{f}_k, \quad t \geq 0 \qquad (1.2)$$
    From (1.2) we see the coordinate functions of the condensation process at time $t + 1$ are
derived from the imposed time inhomogeneous diffusion process $P^{(t)} = P_t \cdots P_0$. The low
pass operator $P_t$ applies a localized smoothing operation to the coordinate functions $\mathbf{f}_k^{(t)}$.
Over the entire condensation time, however, the original coordinate functions $\mathbf{f}_k$ are smoothed
by the cascade of diffusion operators $P_t \cdots P_0$. This process adaptively removes the high
frequency variations in the original coordinate functions. The effect on the data points X is
to draw them towards local barycenters, which are defined by the inhomogeneous diffusion
process. Once two or more points collapse into the same barycenter, they are identified as
being members of the same cluster. In Section 1.6 we demonstrate condensation’s dynamic
data deformations to remove variability and collapse points into clusters.


Input  : X ← N × M matrix of N data points, M features
         ϵ ← initial filter bandwidth
Output : X_t ← N × M data matrix after t condensations
begin
    i ← 0; i_prev ← −2
    Q′ ← I_N
    Q_diff ← ∞
    labels ← Range(0, N)
    while i − i_prev > 1 do
        i_prev ← i
        while Q_diff ≥ 1 × 10^{−4} do
            i ← i + 1
            D ← Distance(X)
            Merge(labels[Where(D < 1 × 10^{−3})])
            A ← Affinity(D)
            Q ← Diag(RowSum(A))
            K ← Q^{−1} A Q^{−1}
            P ← RowNormalize(K)
            X ← P × X
            Q_diff ← ||Diag(Q) − Diag(Q′)||_{l∞}
            Q′ ← Q
        end
        ϵ ← ϵ × 2
        Q_diff ← ∞
    end
end
                                    Algorithm 1.1 Condensation
1.5.3    Algorithm
    Pseudocode is provided in Algorithm 1.1. Although not strictly necessary, cluster convergence
may be accelerated by increasing the bandwidth, ϵ, when the $\ell_\infty$-norm of the difference
between densities of the previous and current iterations, Diag(Q′) and Diag(Q), falls below
a threshold.
    The present implementation provides a proof of concept. Its computational complexity
is dominated by matrix multiplication and is $O(n^4)$ when $t \geq n$. Thus, more research is needed
to scale the algorithm.
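    A NumPy sketch of Algorithm 1.1 is given here for illustration only; it is not the implementation used for the experiments. Several details are assumptions: `Affinity` is taken to be the Gaussian kernel of Eq. (1.1), `Merge` is realized as relabeling each merged group with its smallest label, and the `max_iter` guard, the function name `condensation`, and the two-blob test data are additions of this sketch.

```python
import numpy as np


def condensation(X, eps, merge_tol=1e-3, q_tol=1e-4, max_iter=1000):
    """Sketch of Algorithm 1.1; returns the condensed data and cluster labels."""
    X = np.array(X, dtype=float)
    n = X.shape[0]
    labels = np.arange(n)
    q_prev = np.ones(n)                      # Diag(Q') for Q' = I_N
    i, i_prev = 0, -2
    while i - i_prev > 1 and i < max_iter:   # max_iter is a safety guard (assumption)
        i_prev = i
        q_diff = np.inf
        while q_diff >= q_tol and i < max_iter:
            i += 1
            D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
            # Merge: points closer than merge_tol share the smaller of their labels.
            for a, b in zip(*np.where(D < merge_tol)):
                la, lb = labels[a], labels[b]
                if la != lb:
                    labels[labels == max(la, lb)] = min(la, lb)
            A = np.exp(-D ** 2 / eps)        # assumed Gaussian affinity
            q = A.sum(axis=1)                # Diag(Q): row sums (densities)
            K = A / np.outer(q, q)           # K = Q^{-1} A Q^{-1}
            P = K / K.sum(axis=1, keepdims=True)
            X = P @ X                        # one condensation step
            q_diff = np.max(np.abs(q - q_prev))
            q_prev = q
        eps *= 2.0                           # widen the filter once densities stabilize
    return X, labels


rng = np.random.default_rng(0)
blob_a = rng.normal(loc=0.0, scale=0.05, size=(20, 2))
blob_b = rng.normal(loc=10.0, scale=0.05, size=(20, 2))
X_final, labels = condensation(np.vstack([blob_a, blob_b]), eps=0.5)
```

    On this toy input each tight blob collapses to a single point within a few iterations. Because the blobs are far apart relative to the bandwidth, doubling ϵ no longer changes the densities, so the outer loop halts while the two groups are still distinct.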


1.6    Properties of Diffusion Condensation
1.6.1    Cluster self-organization
    Unlike other state-of-the-art clustering algorithms, such as k-means, diffusion conden-
sation does not require the user to a priori choose a potentially arbitrary number of data
clusters to find. Rather, condensation grows self-organizing cluster hierarchies that emerge
through local interactions among the data manifold’s sampling density and curvature vari-
ation. To disentangle and illustrate such properties, we provide condensation video stills
in Figure 1.1. To begin we highlight the hyperuniformly-sampled (i.e., grid-sampled) circle
manifold on the top-left of Figure 1.1, which demonstrates the base case of homogeneous
data density and constant curvature. Note the absence of cluster formation. Comparing this
to the hyperuniformly-sampled ellipse on the right of Figure 1.1, we observe the formation
of nontrivial condensation clusters, particularly in the regions of high curvature.
    Similarly, the uniformly-sampled circle manifold of constant curvature in the bottom-
left of Figure 1.1 exhibits local cluster formation. Hence, we conjecture that nontrivial
data density or curvature variation are sufficient conditions for the formation of diffusion
condensates.
1.6.2    Cluster characterization via spectral decay
    In addition to still frames, it is enlightening to consider cluster formation via its correspon-
dence with the spectral decay. Figure 1.2 demonstrates that data condensation corresponds
with sudden, rapid spectral decay. Recall that a nested series of hierarchical data representations
may be achieved through diffusion maps by taking successive powers of the diffusion operator,
$P^t$ (not $P^{(t)}$), or, equivalently, powering its eigenvalues, $\lambda_i^t \in [0, 1)$ for $i = 2, 3, \ldots, N$,
which function as coordinates of the spectral embedding (e.g., $x_j \mapsto \{\lambda_i^t \psi_i(x_j)\}_{i \geq 2}$, for all
$x_j \in X$). Of particular interest is the contrast between smooth decay to 0 of the diffusion
maps spectrum as $t \to \infty$ and the rapid, finite-time eigenvalue and singular value decays
of $P_t$ and $P^{(t)}$, respectively, pictured in Figure 1.3. The latter characterization may be
useful in the identification of hierarchical condensation events in high dimensions, for example.
We note that while the condensation operator, $P^{(t)}$, is constructed as in Marshall
and Hirn (2018), its use in clustering is novel. Spectral characterizations of cluster hierarchy
persistence further differentiate the present work. For instance, Figure 1.2 displays many
features of interest. Most striking is the correspondence between rapid spectral decay of $P_t$
and cluster formation, which are depicted just before the moment of condensation. We see
three major areas of cluster formation beginning near iteration 15, again near iteration 53,
and once again near the last iteration, 100, when the algorithm halts.

Figure 1.1 Condensation of hyperuniform circle (top-left), uniform circle (bottom-left), and
hyperuniform ellipse (right) at early/late iterations (left/right, respectively); point radius
corresponds to local density; arrows computed via the infinitesimal generator $\frac{P_k - I_N}{\epsilon}$ show
the gradient field and clearly depict data point acceleration during cluster condensation
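    The three spectra discussed in this section can be tracked with a short sketch. It is illustrative only and makes several assumptions: a plain row-normalized Gaussian kernel stands in for the Markov matrix, the bandwidth is held fixed, the ellipse is sampled at evenly spaced parameter angles, and the names `markov_matrix` and `spectra` are hypothetical.

```python
import numpy as np


def markov_matrix(X, eps):
    """Row-normalized Gaussian kernel matrix (a simplification assumed for this sketch)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / eps)
    return K / K.sum(axis=1, keepdims=True)


def spectra(X, eps, n_steps, n_vals=14):
    """Singular values 2..(n_vals + 1) of P_k, of P_k ... P_0, and of powers of P_0."""
    X = np.array(X, dtype=float)
    P0 = markov_matrix(X, eps)
    cumulative = np.eye(len(X))              # accumulates P_k P_{k-1} ... P_0
    power = np.eye(len(X))                   # accumulates powers of P_0 (diffusion maps)
    out = {"step": [], "cumulative": [], "power": []}
    for _ in range(n_steps):
        P_k = markov_matrix(X, eps)          # Markov matrix of the current, condensed data
        cumulative = P_k @ cumulative
        power = P0 @ power
        X = P_k @ X                          # condensation step
        for name, M in (("step", P_k), ("cumulative", cumulative), ("power", power)):
            s = np.linalg.svd(M, compute_uv=False)   # sorted in descending order
            out[name].append(s[1:1 + n_vals])        # skip the leading singular value
    return {name: np.array(vals) for name, vals in out.items()}


theta = np.linspace(0.0, 2.0 * np.pi, 60, endpoint=False)
ellipse = np.column_stack([2.0 * np.cos(theta), np.sin(theta)])
sv = spectra(ellipse, eps=0.5, n_steps=20)
```

    Each entry of `sv` is an array with one row per iteration, so plotting the columns against the iteration index gives curves of the kind summarized in Figures 1.2 and 1.3, up to the simplifications noted above.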
1.6.3    Condensation allows multiscale persistence analysis
     Since the condensation algorithm naturally allows points to come together via a low pass
filter application at each iteration, the time-point in the process at which clusters naturally
come together and the length of time for which a cluster persists (without merging) offer
notions of cluster metastability. This can be used to derive a partitioning of the dataspace
that has mixed levels of granularity. By contrast, most clustering methods are only able
                                               10


Figure 1.2 Alternative characterization of hyperuniform ellipse cluster formation via top 14
nontrivial singular values of the Markov/diffusion operators, {{σi (Pk )}tk=0 }15
                                                                                i=2 (far left),
and corresponding video stills of hyperuniformly-sampled ellipse condensation at iterations
15, 53, and 100
to produce results at a particular granularity; for example, k-means tends to favor clusters
that roughly divide the data into k partitions of similar sizes. However, different parts of
the dataspace may naturally separate at different levels of granularity and this is not visible
in other methods. Even hierarchical clustering, due to forced splits and merges, may not
reveal the levels of granularity at which data groupings are most distinct. We visualize this
persistence information using Sankey diagrams (see Figures 1.4 and 1.5) that show natural
groupings of the data. In Section 1.7.1 we use this capability of condensation to suggest
a more relevant subtyping of retinal bipolar neurons on the basis of their transcriptomic
profile, as compared to previous literature.
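To make the notion of persistence concrete, the following minimal sketch runs a simplified, fixed-bandwidth version of the low-pass iteration and counts how many iterations each number of clusters survives. It is an illustration under stated assumptions and not Algorithm 1.1 itself: the Gaussian kernel, the bandwidth `epsilon`, the merge tolerance `merge_tol`, the synthetic two-blob data, and all function names are hypothetical choices made for the example.

```python
import numpy as np

def condensation_step(X, epsilon):
    """One low-pass step: Gaussian affinities, row-normalization, then diffusion of the points."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-sq_dists / epsilon)
    P = K / K.sum(axis=1, keepdims=True)  # row-stochastic diffusion operator
    return P @ X

def label_by_proximity(X, merge_tol):
    """Give the same label to points connected by chains of distances below merge_tol."""
    labels = np.full(len(X), -1)
    next_label = 0
    for i in range(len(X)):
        if labels[i] != -1:
            continue
        labels[i] = next_label
        stack = [i]
        while stack:
            p = stack.pop()
            dist = np.linalg.norm(X - X[p], axis=1)
            near = np.flatnonzero((dist < merge_tol) & (labels == -1))
            labels[near] = next_label
            stack.extend(near.tolist())
        next_label += 1
    return labels

def cluster_count_history(X, epsilon, merge_tol, n_iter):
    """Number of clusters after each iteration of the simplified condensation loop."""
    counts = []
    for _ in range(n_iter):
        X = condensation_step(X, epsilon)
        counts.append(int(label_by_proximity(X, merge_tol).max()) + 1)
    return counts

def persistence(counts):
    """Map each cluster count to the number of iterations during which it was observed."""
    runs = {}
    for c in counts:
        runs[c] = runs.get(c, 0) + 1
    return runs

# Hypothetical data: two well-separated Gaussian blobs of 20 points each.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(20, 2))
blob_b = rng.normal(loc=[10.0, 0.0], scale=0.3, size=(20, 2))
X0 = np.vstack([blob_a, blob_b])

counts = cluster_count_history(X0, epsilon=1.0, merge_tol=1e-3, n_iter=50)
runs = persistence(counts)
```

In this toy setting the two-cluster state occupies most of the history, which is the sense in which a long-lived partition is treated as metastable. Tracking labels rather than counts would give the per-cluster branch lengths that a Sankey diagram displays.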
1.7    Empirical Results
1.7.1    Single-cell transcriptomics data
    A recent study of retinal bipolar neurons using single-cell transcriptomics was performed
(Shekhar et al., 2016) to classify cells into coherent subtypes. The study identified 15 cell sub-
types by using the method of Blondel et al. (2008), of which 13 were well known and 2 were
novel. We use the findings of said study to benchmark the condensation algorithm. From
the dataset, we use a randomly selected sample of 20,000 cells with gene expression counts


[Figure 1.3: three stacked panels (top, middle, bottom); horizontal axis: Iteration (0 to 80); vertical axis range: 0.0 to 1.0]
Figure 1.3 Characterization of hyperuniform ellipse cluster formation via top 14 nontrivial
singular values of the Markov/diffusion operators, $\{\{\sigma_i(P_k)\}_{k=0}^{t}\}_{i=2}^{15}$ (top, see also Figure
1.2), $\{\{\sigma_i(P^{(k)})\}_{k=0}^{t}\}_{i=2}^{15}$ (middle), and $\{\{\sigma_i(P^{k})\}_{k=0}^{t}\}_{i=2}^{15}$ (bottom, diffusion maps operator)
sequenced to a median depth of 8,200 mapped reads per cell to perform condensation. The
condensation ran for 64 iterations until it achieved a metastable state of 12 clusters (close to
the 15 reported in Shekhar et al. (2016)). However, due to the continuous clustering history
offered by condensation, we are able to assess when these metastable clusters first form;
see Figure 1.4 for a diagram of iterations 44 to 64. A key advantage of the condensation
method is its ability to compute cluster persistence based on the lengths of the clustering
tree branches, which we use to reassess the subtyping of retinal bipolar cells performed in
Shekhar et al. (2016).
   Using community detection methods, Shekhar et al. (2016) found that cluster BC1 (bipolar
cone cells, subtype 1) is better described as two clusters, BC1A and BC1B. Shekhar et al.
(2016) even confirm that, morphologically, BC1B seems to be a unipolar
cluster rather than bipolar. Condensation clustering corroborates this new finding. Indeed,
as shown in Figure 1.4, the dark grey BC1 subclusters stay persistently separated until the


Figure 1.4 Sankey diagram showing results of 20 iterations of diffusion condensation on the
scRNA-Seq retinal bipolar dataset; left side representing final clusters and right
representing earlier stages of the process; the two dark strands represent BC1B and BC1A
sub-populations; red representing BC3A cell type which becomes distinct quite early in the
process; light green being BC7, forms from three distinct strands suggesting possible subtle
sub-populations
last iteration shown. Therefore, the two-subcluster state is more persistent than the
single-cluster state.
     On the other hand, condensation suggests alternative groupings of other clusters not
identified by previous papers on retinal bipolar neurons, including Shekhar et al. (2016). Among
these, we find that although BC3 has been described in terms of two subcomponents, BC3A
and BC3B, in the biological literature and in Shekhar et al. (2016), these subclusters merge early
(iteration 53) and the transcriptional profiles are not significantly distinct overall, despite
certain selective markers such as Erbb4 and Nnat being different between the two. Additionally,
we find that our results strongly suggest that BC7 consists of 3 distinct subtypes that per-
sist separately until the last iteration. Previously, the BC7 cell type has been described as
a Vstm2b+Casp7+ cone cell that is distinct from other BC types as predicted by Shekhar
et al. (2016). Our analysis, however, reveals that there may be multiple sub-populations that
are distinct within this cell type designation. While additional experimentation is required
to follow up on this finding, condensation provides a way to examine granularities at which
data is best organized based on cluster persistence via the whole condensation history.
1.7.2    Neural connectome data
    Since the condensation algorithm operates via a series of diffusion operators, which can
be regarded as types of adjacency matrices, we sought to understand if the algorithm would
apply to coordinate-free spaces. To achieve this we took a datatype that naturally exists as
a graph: the neural connectome data of the Caenorhabditis elegans brain, a neuropil called
the “nerve ring” consisting of 181 neurons. Here an adjacency matrix was created from
the contact profiles determined by images along slices of the worm, i.e., neurons that were
more frequently in contact with one another were assumed to have a stronger connection
and communication with one another. This adjacency matrix was then eigendecomposed
to create a coordinate space in order to perform the condensation. The remainder of the
algorithm remained as described.
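The eigendecomposition step can be sketched as follows. The text does not specify which eigenvectors were retained or how they were scaled, so the choices below (largest-magnitude eigenvalues, coordinates scaled by their eigenvalues) are assumptions made for illustration, as are the function name and the toy adjacency matrix.

```python
import numpy as np

def spectral_coordinates(A, n_components):
    """Embed graph vertices using eigenvectors of a symmetric adjacency matrix.

    Assumed convention: keep the eigenpairs with largest |eigenvalue| and scale
    each eigenvector by its eigenvalue.
    """
    A = np.asarray(A, dtype=float)
    if not np.allclose(A, A.T):
        raise ValueError("expected a symmetric (undirected) adjacency matrix")
    eigvals, eigvecs = np.linalg.eigh(A)
    order = np.argsort(-np.abs(eigvals))[:n_components]
    return eigvecs[:, order] * eigvals[order]

# Hypothetical weighted contact graph: two triangles joined by one weak contact.
A_toy = np.array([
    [0.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 0.0],
])
coords = spectral_coordinates(A_toy, n_components=2)
coords_full = spectral_coordinates(A_toy, n_components=6)
```

The rows of `coords` can then be passed to a coordinate-based procedure in place of raw feature vectors. With all components retained, the Gram matrix of the coordinates equals the square of the adjacency matrix, so no contact information is discarded by the change of representation.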
    First we sought to test out the robustness of the condensation algorithm by applying it
to two complete connectomes of the Caenorhabditis elegans brain. Previous comparisons be-
tween these connectomes had concluded that they largely share similar structure at the level
of cell morphology and synaptic positions (White et al., 1986). We therefore hypothesized
that by comparing the output from these similar connectomes we could test the robustness
of our algorithm. Specifically, we focused on analyzing the relationship between cell-cell
contact profiles for every neuron within the two connectomes (Brittin et al., 2018). Cell-cell
contact relationships should define modules within the brain that are bundled together, and
we hypothesized that if the algorithm was working as expected, it should extract similar con-


Figure 1.5 A, B) Sankey diagrams of the condensation results for two C. elegans
connectomes: a young adult (A) and an adult worm (B). C, D) Sankey diagrams of a
subset of neurons (blue in A and B) for a young adult (C) and an adult worm (D); letter
code corresponds to specific neuron names; cells are pseudocolored based on the function of
the specific neuron in the circuit, with blue representing mechanosensory neurons, red
representing interneurons, and yellow representing additional neurons not known to function
in this circuit; note similar condense profiles at the level of single cells between both young
adult (C) and an adult worm (D) connectomes; cluster outlined in black is the cluster
analyzed further in (E); E) Circuit diagram for the anterior body mechanosensation circuit;
we color the specific neurons according to function as in (C and D); circuit diagram
adapted from Girard et al. (2006); F) Cartoon depicting the head of C. elegans; the
vertical black line shows where the electron micrograph serial section was collected; G)
Serial electron micrograph image; neurons corresponding to the brain of the animal are
highlighted green; H) Cropped view of a cross section from the serial electron micrographs
corresponding to the anterior body mechanosensation circuit (represented in E); neurons
are pseudocolored as in (C-E); note how both the relative positions and contact profiles of these
neurons are similar between both animals, as predicted by the algorithm


tact profiles among the two connectomes. We observed that our algorithm produced similar
condense profiles for the two connectomes (see Figure 1.5), suggesting that our method can
be used to robustly analyze connectomics data. We see that the Sankey diagrams preserve
much of the structure including an important mechanosensory circuit. To quantify the sim-
ilarity between condensation clusters generated from the two connectomes, we compute the
adjusted Rand index (ARI) at each condensation iteration from 0 to 24 and then take the
mean. This yields a mean ARI of 0.7, where −1 ≤ ARI ≤ 1 and values closer to 1 indicate stronger agreement.
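The averaging of the ARI over iterations can be sketched in a few lines. The code below uses the standard pair-counting formula for the adjusted Rand index and only the Python standard library. It assumes that both label histories are lists of per-iteration label sequences over the same neurons in the same order; the function names and the toy histories are hypothetical.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same items (pair-counting form)."""
    n = len(labels_a)
    if n != len(labels_b):
        raise ValueError("labelings must cover the same items")
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0
    same_in_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    same_in_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    same_in_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = same_in_a * same_in_b / total_pairs
    maximum = 0.5 * (same_in_a + same_in_b)
    if maximum == expected:
        # Both partitions are trivial and identical (all singletons or one cluster).
        return 1.0
    return (same_in_both - expected) / (maximum - expected)

def mean_ari(history_a, history_b):
    """Average the ARI over paired iterations of two condensation histories."""
    scores = [adjusted_rand_index(a, b) for a, b in zip(history_a, history_b)]
    return sum(scores) / len(scores)

# Hypothetical histories for four items over three iterations.
history_a = [[0, 1, 2, 3], [0, 0, 1, 1], [0, 0, 0, 0]]
history_b = [[0, 1, 2, 3], [1, 1, 0, 0], [0, 0, 0, 0]]
score = mean_ari(history_a, history_b)
```

The convention of returning 1 for two identical trivial partitions matters here, since at iteration 0 every point is its own cluster in both histories.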
    A major advantage of the diffusion condensation algorithm (Algorithm 1.1) is that it allows analyses of
computational iterations to extract biologically relevant information informing the clustering
steps. We hypothesized that these iterative steps could reveal units of circuit architecture
underlying the brain. To test this, we examined the clusters for well described circuits,
specifically, for the anterior body mechanosensation circuit (Girard et al., 2006). The anterior
body mechanosensation circuit contains 2 classes of mechanosensory neurons and 4 classes of
command interneurons that contact and connect to each other, and based on their contact
profile, should be identified by the algorithm (Chalfie et al., 1985; Wicks and Rankin, 1995).
Indeed, iteration 14 (Figure 1.5) identified the circuit in both worms, revealing the predicted
relationships between these connecting neurons. Interestingly, iteration 14 also contains
neurons of unknown function that, upon closer inspection, are closely associated with the
circuit, but have not been implicated in mechanosensory behaviors. Therefore, inspection of
the condensation algorithm not only extracted the known circuit, but also motivated a new
hypothesis regarding the function of unknown neurons associated with the circuit. Together,
our analyses demonstrate that the method can be used to compare connectomics data across
organisms, to extract biologically relevant units of circuit architecture and even to inform
new experiments and discoveries of biological importance. We propose that this method will
be broadly useful for systems level analysis of connectomics data.
1.7.3    Algorithm comparison
    We compare condensation at two times with Mini Batch K-Means, Agglomerative Clustering
with Ward linkage, and Agglomerative Clustering with average linkage. The two
condensation times are early and later, where early is half the iterations of later. The
computational experiments are conducted on part of the scikit-learn clustering comparison
datasets, using the default, tuned parameter values for each dataset. The variance of the center
blob in the Gaussian blobs dataset (Figure 1.6, row two) was decreased from 2.5 to 1.5 for separability.
Figure 1.6 Condensation results as compared with Mini Batch K-Means (center),
Agglomerative Clustering with Ward linkage (center-right), and Agglomerative Clustering
with average linkage (right) on the scikit-learn clustering dataset (N = 300); the early
condensation snapshot (left) is taken at half the iterations of the later (center-left)
    In Figure 1.6 we see the earlier iteration of condensation exhibits finer clustering by
curvature than the later. Similarly, row three of Figure 1.6 exhibits coarser clustering in the
later condensation labeling. These examples demonstrate the multiscale nature of clusters
assigned via condensation. We note that while we employ only the Euclidean metric in these
examples, preliminary tests using other metrics yield promising results.
1.8     Conclusion
     We presented a multiresolution data abstraction approach based on a time-inhomogeneous
diffusion condensation process that gradually coarse grains data features along the intrin-
sic data geometry. We demonstrated the application of this method to biomedical data
analysis, in particular in single cell transcriptomics. Furthermore, the presented diffusion
condensation can be seen as a cascade of data-driven lowpass filters that gradually elim-
inates variations in the data to extract increasingly abstract features. Indeed, under this
interpretation, the abstraction provided by the condensation process can be related to com-
mon intuitions of features extracted by hidden layers of deep convolutional networks, e.g., in
image processing. Such features are commonly considered as increasing in abstraction capa-
bilities together with the depth of the network. However, we note that while convolutional
networks typically employ relatively-simple pointwise nonlinearities, here the nonlinearity we
employ is the reconstruction of the diffusion geometry based on the coarse grained features
along the cascade. Therefore, our cascade both learns a multiresolution data geometry and
extracts multiresolution characterizations of groupings based on invariant features at each
iteration. Finally, we note the increasing interest in geometric deep learning, which aims to
tie together filter training in deep networks with non-Euclidean geometric structures that
often exist intrinsically in modern data. While our approach here relies only on lowpass
filters, it opens interesting directions in employing trained filters (or even designed diffusion
wavelets, as done in diffusion and geometric scattering (Gama et al., 2019; Gao et al., 2019))
together with geometric reconstruction for multiscale feature extraction from data.
1.9     Acknowledgements
     In reference to IEEE copyrighted material which is used with permission in this thesis,
the IEEE does not endorse any of Michigan State University’s products or services. Internal
or personal use of this material is permitted. If interested in reprinting/republishing IEEE
copyrighted material for advertising or promotional purposes or for creating new collective
works for resale or redistribution, please go to
http://www.ieee.org/publications_standards/publications/rights/rights_link.html to learn how to obtain a License from RightsLink. If
applicable, University Microfilms and/or ProQuest Library, or the Archives of Canada may
supply single copies of the dissertation.


                                    BIBLIOGRAPHY
Angelo, M., Bendall, S. C., Finck, R., Hale, M. B., Hitzman, C., Borowsky, A. D., Levenson,
  R. M., Lowe, J. B., Liu, S. D., Zhao, S., et al. (2014). Multiplexed ion beam imaging of
  human breast tumors. Nature medicine, 20(4):436.
Bendall, S. C., Simonds, E. F., Qiu, P., Amir, E.-a. D., Krutzik, P. O., Finck, R., Bruggner,
  R. V., Melamed, R., Trejo, A., Ornatsky, O. I., Balderas, R. S., Plevritis, S. K., Sachs, K.,
  Pe, D., Tanner, S. D., and Nolan, G. P. (2011). Single-Cell Mass Cytometry of Differential.
  Science, 332(May):687–695.
Blondel, V. D., Guillaume, J.-L., Lambiotte, R., and Lefebvre, E. (2008). Fast unfolding of
  communities in large networks. Journal of statistical mechanics: theory and experiment,
  2008(10):P10008.
Brittin, C. A., Cook, S. J., Hall, D. H., Emmons, S. W., and Cohen, N. (2018). Volumetric
  reconstruction of main Caenorhabditis elegans neuropil at two different time points.
  bioRxiv, page 485771.
Brugnone, N., Gonopolskiy, A., Moyle, M. W., Kuchroo, M., van Dijk, D., Moon, K. R.,
  Colon-Ramos, D., Wolf, G., Hirn, M. J., and Krishnaswamy, S. (2019). Coarse graining of
  data via inhomogeneous diffusion condensation. In 2019 IEEE International Conference
  on Big Data (Big Data), pages 2624–2633. IEEE.
Chalfie, M., Sulston, J. E., White, J. G., Southgate, E., Thomson, J. N., and Brenner,
  S. (1985). The neural circuit for touch sensitivity in Caenorhabditis elegans. Journal of
  Neuroscience, 5(4):956–964.
Coifman, R. R. and Lafon, S. (2006). Diffusion maps. Applied and computational harmonic
  analysis, 21(1):5–30.
Dasgupta, A., Hopcroft, J., Kannan, R., and Mitra, P. (2006). Spectral clustering by recur-
  sive partitioning. In European Symposium on Algorithms, pages 256–267. Springer.
Galluccio, L., Michel, O., Comon, P., Kliger, M., and Hero, A. O. (2013). Clustering with a
  new distance measure based on a dual-rooted tree. Information Sciences, 251:96–113.
Gama, F., Ribeiro, A., and Bruna, J. (2019). Diffusion scattering transforms on graphs. In
  International Conference on Learning Representations. arXiv:1806.08829.
Gao, F., Wolf, G., and Hirn, M. (2019). Geometric scattering for graph data analysis.
  To appear in the Proceedings of the 36th International Conference on Machine Learning,
  arXiv:1810.03068.
Giesen, C., Wang, H. A., Schapiro, D., Zivanovic, N., Jacobs, A., Hattendorf, B., Schüffler,
  P. J., Grolimund, D., Buhmann, J. M., Brandt, S., et al. (2014). Highly multiplexed
  imaging of tumor tissues with subcellular resolution by mass cytometry. Nature methods,
  11(4):417.
Girard, L. R., Fiedler, T. J., Harris, T. W., Carvalho, F., Antoshechkin, I., Han, M., Stern-
  berg, P. W., Stein, L. D., and Chalfie, M. (2006). Wormbook: the online review of
  Caenorhabditis elegans biology. Nucleic acids research, 35(suppl_1):D472–D475.
Izenman, A. J. (2012). Introduction to manifold learning. Wiley Interdisciplinary Reviews:
  Computational Statistics, 4(5):439–446.
Kanungo, T., Mount, D. M., Netanyahu, N. S., Piatko, C. D., Silverman, R., and Wu, A. Y.
  (2002). An efficient k-means clustering algorithm: Analysis and implementation. IEEE
  Transactions on Pattern Analysis & Machine Intelligence, 24(7):881–892.
Kaufman, L. and Rousseeuw, P. J. (2009). Finding groups in data: an introduction to cluster
  analysis, volume 344. John Wiley & Sons.
Kolodziejczyk, A. A., Kim, J. K., Svensson, V., Marioni, J. C., and Teichmann, S. A. (2015).
  The technology and biology of single-cell RNA sequencing. Molecular cell, 58(4):610–620.
Kwitt, R., Huber, S., Niethammer, M., Lin, W., and Bauer, U. (2015). Statistical topological
  data analysis-a kernel perspective. In Advances in neural information processing systems,
  pages 3070–3078.
Levine, J. H., Simonds, E. F., Bendall, S. C., Davis, K. L., El-ad, D. A., Tadmor, M. D.,
  Litvin, O., Fienberg, H. G., Jager, A., Zunder, E. R., et al. (2015). Data-driven phenotypic
  dissection of AML reveals progenitor-like cells that correlate with prognosis. Cell,
  162(1):184–197.
Lindenbaum, O., Stanley, J., Wolf, G., and Krishnaswamy, S. (2018). Geometry based data
  generation. In Advances in Neural Information Processing Systems, pages 1400–1411.
Lloyd, S. (1982). Least squares quantization in PCM. IEEE transactions on information
  theory, 28(2):129–137.
Marshall, N. and Hirn, M. J. (2018). Time-coupled diffusion maps. Applied and Computa-
  tional Harmonic Analysis, 45(3):709–728.
Moon, K. R., Stanley, J., Burkhardt, D., van Dijk, D., Wolf, G., and Krishnaswamy, S.
  (2018). Manifold learning-based methods for analyzing single-cell RNA-sequencing data.
  Current Opinion in Systems Biology, 7:36–46.
Moon, T. K. (1996). The expectation-maximization algorithm. IEEE Signal processing
  magazine, 13(6):47–60.
Nadler, B., Lafon, S., Kevrekidis, I., and Coifman, R. R. (2005). Diffusion maps, spectral
  clustering and eigenfunctions of Fokker-Planck operators. In Weiss, Y., Schölkopf, P. B.,
  and Platt, J. C., editors, Advances in Neural Information Processing Systems 18, pages
  955–962. MIT Press.
Ng, A. Y., Jordan, M. I., and Weiss, Y. (2002). On spectral clustering: Analysis and an
  algorithm. In Dietterich, T. G., Becker, S., and Ghahramani, Z., editors, Advances in
  Neural Information Processing Systems 14, pages 849–856. MIT Press.
Shapiro, E., Biezuner, T., and Linnarsson, S. (2013). Single-cell sequencing-based technolo-
  gies will revolutionize whole-organism science. Nature Reviews Genetics, 14(9):618.
Shekhar, K., Lapan, S. W., Whitney, I. E., Tran, N. M., Macosko, E. Z., Kowalczyk, M.,
  Adiconis, X., Levin, J. Z., Nemesh, J., Goldman, M., McCarroll, S. A., Cepko, C. L.,
  Regev, A., and Sanes, J. R. (2016). Comprehensive Classification of Retinal Bipolar
  Neurons by Single-Cell Transcriptomics. Cell, 166(5):1308–1323.e30.
Van Der Maaten, L., Postma, E., and Van den Herik, J. (2009). Dimensionality reduction:
  a comparative. J Mach Learn Res, 10(66-71):13.
Van Dijk, D., Sharma, R., Nainys, J., Yim, K., Kathail, P., Carr, A., Burdziak, C., Moon,
  K. R., Chaffer, C. L., Pattabiraman, D., et al. (2018). Recovering gene interactions from
  single-cell data using data diffusion. Cell, 174(3):716–729.
Von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and computing,
  17(4):395–416.
Wasserman, L. (2018). Topological data analysis. Annual Review of Statistics and Its Ap-
  plication, 5:501–532.
White, J. G., Southgate, E., Thomson, J. N., and Brenner, S. (1986). The structure of the
  nervous system of the nematode Caenorhabditis elegans. Philos Trans R Soc Lond B Biol
  Sci, 314(1165):1–340.
Wicks, S. R. and Rankin, C. H. (1995). Integration of mechanosensory stimuli in Caenorhabditis
  elegans. Journal of Neuroscience, 15(3):2434–2444.


                                         CHAPTER 2
                   SELF-SIMILAR GRAPH SIGNALS & SYSTEM
                                     CLASSIFICATION
2.1    Abstract
    Complex systems are characterized by emergent patterns. These patterns often exhibit
self-similarity that arises from interactions across various scales. Self-similar patterns can
be modeled as random fields called textures. In this paper we generalize a class of self-
similar random fields that are indexed by time to random fields that are indexed by the
vertices of a graph. We take the perspective of graph signal processing and draw upon
recent developments in the theory of self-similar random fields on manifolds. By invoking
the duality between graphs and graph signals, we then empirically demonstrate how the
proposed model can support the modeling of complex social-ecological systems phenomena
through data-driven classification of system models. We conclude by suggesting multiple use
cases.
2.2    Introduction
    From cauliflower to clouds to coastlines, self-similar patterns surround us. Mandelbrot
(1967) popularized the study of self-similarity through enchanting patterns called fractals.
Statistical self-similarity refers to properties of distributions that consistently reappear at
various scales. In natural systems, mountain elevations, river water levels, and forest tree
densities are said to be spatially self-similar. Self-similarity appears in tandem with long-
range dependence, which refers to (second-order stationary) signals with correlations that
persist across great distances (Pipiras and Taqqu, 2017). Self-similarity and long-range
dependence are characteristic of complex systems and notably emerge through agent-based
simulations. Such patterns are observed, for instance, in the size distribution of forest fires
and the spatial distribution of communities in Schelling’s segregation model (Batten, 2001).
In economic systems, Pareto studied (Amoroso, 1938; Schoenberg and Patel, 2012) self-
similar patterns in the distribution of income and wealth.
    According to Grimm et al. (2005, p.1):
      ...patterns are defining characteristics of a system and often, therefore, indicators
      of essential underlying processes and structures. Patterns contain information on
      the internal organization of a system, but in a coded form.
A major task for systems modelers, then, is to account for these patterns. We explore this
pattern-oriented approach to systems from the perspective of random processes.
2.3   Background
2.3.1    Random Processes & Fields
    A random process can be defined as a collection of real-valued random variables $\{X(t)\}_{t \in I}$
that are indexed by a set, $I$, which contains an origin at $t = 0$ as well as a linearly ordered
structure like $\mathbb{R}$ and $\mathbb{Z}$. Random processes are often used to model time-evolving phenomena
such as prices. A Euclidean random field is an extension of random processes to index sets
that have not only an origin, but also higher-dimensional vector space structure and larger
symmetry groups. For instance, $\mathbb{R}^d$ is a $d$-dimensional vector space whose symmetries include
rotation about the origin. The algebraic structure associated with an index set is often used
to endow a random field with certain properties. A canonical example is the Brownian
motion on $\mathbb{R}$, which has an ordered structure that is defined with respect to the origin. In
general, $I$ could be any set with or without additional structure, for instance, the vertex set
of a graph.
    A random field is called strictly stationary if $X(s) \overset{d}{=} X(t)$ for every $s, t \in I$, where $\overset{d}{=}$
denotes equality of all finite dimensional distributions. In applications, stationarity functions
as a modeling assumption that can facilitate parameter estimation and theoretical analysis.
Strict stationarity, however, can be too restrictive for modeling purposes, which motivates
consideration of second-order stationary fields for which only the mean and covariance are
assumed to be stationary (Pipiras and Taqqu, 2017), i.e., for every s, t ∈ I with I a vector
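space and $a \in \mathbb{R}$,
\[
  \mathbb{E}[X(s)] = a,
\]
and,
\[
\begin{aligned}
  \operatorname{Cov}(X(s), X(t)) &:= \mathbb{E}\big[(X(s) - \mathbb{E}[X(s)])\,(X(t) - \mathbb{E}[X(t)])\big] \\
  &= \mathbb{E}\big[(X(0) - \mathbb{E}[X(0)])\,(X(t - s) - \mathbb{E}[X(t - s)])\big] \\
  &= \operatorname{Cov}(X(0), X(t - s)).
\end{aligned}
\]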
The field is called centered if $a = 0$. Vector space structure also enters into random fields via
isotropy (Ayache, 2018), which says $X(Qt) \overset{d}{=} X(t)$ for each $t \in \mathbb{R}^d$ and every orthogonal
operator $Q$ on $\mathbb{R}^d$. Following Gelbaum (2014) it is then tempting to define a (centered)
second-order stationary random field as one for which,
\[
  \operatorname{Cov}(X(\iota(s)), X(\iota(t))) = \operatorname{Cov}(X(s), X(t)),
\]
for every $s, t \in I$ and every isometry $\iota$ on $I$. One could even neglect the metric structure
and define a second-order $G$-stationary random field as one for which,
\[
  \operatorname{Cov}(X(g \ast s), X(g \ast t)) = \operatorname{Cov}(X(s), X(t)),
\]
for every $s, t \in I$ and $g \in G$, with $G$ a subgroup of $\operatorname{Aut}(I)$, the group of automorphisms on
$I$, and $g \ast$ denoting the group action. We will say a random field $\{X(t)\}_{t \in I}$ has stationary
increments if,
\[
  \{X(g \ast s) - X(g \ast t)\} \overset{d}{=} \{X(s) - X(t)\},
\]
for $s, t \in I$ and $g \in \operatorname{Aut}(I)$. We will similarly say a field is $G$-stationary if,
\[
  \{X(g \ast t)\} \overset{d}{=} \{X(t)\},
\]
for all $g \in G$ for $G$ as above. If $I = \mathbb{R}^d$, for example, and $G$ consists of all orthogonal
transformations, then the field is isotropic. If $I$ is the integer lattice modulo periodic translations
and G consists of all translations, then the field is stationary in the traditional sense. When
the field is centered and characterized by second order moments, our notion of stationarity
corresponds to graph weak stationarity as defined in Marques et al. (2017).
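As a small numerical illustration of second-order G-stationarity, consider the cycle on n vertices with G the group of cyclic translations. A covariance that depends only on cyclic distance is unchanged when both indices are shifted, whereas a random-walk covariance of the form min(s, t) is not. The covariance kernel `0.5 ** distance`, the value `n = 8`, and the function names are assumptions chosen for this sketch.

```python
import numpy as np

def cyclic_distance_matrix(n):
    """Pairwise graph distances between the vertices of an n-cycle."""
    idx = np.arange(n)
    diff = np.abs(idx[:, None] - idx[None, :])
    return np.minimum(diff, n - diff)

def is_translation_invariant(C):
    """Check Cov(X(g*s), X(g*t)) == Cov(X(s), X(t)) for every cyclic shift g."""
    n = len(C)
    idx = np.arange(n)
    for k in range(n):
        perm = (idx + k) % n
        if not np.allclose(C[np.ix_(perm, perm)], C):
            return False
    return True

n = 8
# Hypothetical stationary covariance: decays with cyclic distance.
C_stationary = 0.5 ** cyclic_distance_matrix(n)
# Non-stationary comparison: covariance min(s, t) of a random walk on times 1..n.
idx = np.arange(n)
C_walk = np.minimum(idx[:, None], idx[None, :]) + 1.0

stationary_ok = is_translation_invariant(C_stationary)
walk_ok = is_translation_invariant(C_walk)
```

Here `stationary_ok` is true and `walk_ok` is false, which mirrors the distinction between a stationary field and a field that only has stationary increments.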
2.3.1.1     Statistical Self-Similarity
    We define a statistically self-similar Euclidean random field as one for which given a
scalar, $c \in \mathbb{R}_{\geq 0}$,
\[
  X(ct) \overset{d}{=} c^{H} X(t), \tag{2.1}
\]
where $H$ is the self-similarity, or Hurst, parameter (Ayache, 2018; Pipiras and Taqqu, 2017;
Gelbaum, 2014). It is notable that this definition implies that there are no stationary
statistically self-similar Euclidean random fields since $c^{H} X(t) \to \infty$ as $c \to \infty$. Furthermore,
$X(0) = 0$.
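Both consequences can be checked in a few lines from the scaling relation. The derivation below is a sketch that assumes $H > 0$ and that $X(t)$ has finite, nonzero variance for some $t \neq 0$; neither assumption is stated explicitly in the text. Applying the variance to both sides of the scaling relation gives
\[
  \operatorname{Var}\big(X(ct)\big) = \operatorname{Var}\big(c^{H} X(t)\big) = c^{2H}\, \operatorname{Var}\big(X(t)\big), \qquad c > 0.
\]
If the field were second-order stationary, the left-hand side would equal $\operatorname{Var}(X(t))$ for every $c > 0$, which would force $c^{2H} = 1$ for all $c > 0$ and contradict $H > 0$. Taking $c = 0$ in the same relation gives
\[
  X(0) = X(0 \cdot t) \overset{d}{=} 0^{H} X(t) = 0,
\]
so $X(0) = 0$ almost surely.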
2.3.2     Extension to graphs
    We would like to use equation 2.1 to develop models for classes of complex social-ecological
systems phenomena. The immediate challenge encountered when attempting to generalize
this definition to graphs and manifolds, however, is how to interpret the expression X (ct).
Let V denote the vertex set of an undirected graph, Γ = (V, E), with E the edge set.
Changing notation emphasizes the issue, as X (cv) for v ∈ V is nonsensical. This familiar
challenge arises when attempting to extend wavelets to graphs using the spatial/temporal
notions of dilation and translation (Hammond et al., 2011). In the context of manifolds,
Gelbaum (2014) circumvents this issue by utilizing the heat kernel to extend Euclidean
fractional Brownian fields to complete Riemannian manifolds and showing that distributional
scaling can be understood with respect to the underlying metric. We will draw on this
approach and introduce a weaker form of self-similarity based upon wavelet theory.
2.3.2.1     Wavelets
    Wavelets are spatially localized waveforms with concentrated frequency support. Their
localized nature enables measurement and representation of fine details in images and audio
(Mallat, 1999) as well as signal data supported on graphs and manifolds (Hammond et al.,
2011; Gao et al., 2019; Perlmutter et al., 2020). The study of wavelets has yielded insights into
the empirical successes of deep convolutional neural networks (both Euclidean and graph)
via the scattering paradigm (Bruna and Mallat, 2013; Mallat, 2016; Perlmutter et al., 2019),
and parallel work has discovered interpretable models of non-Gaussian random fields (Morel
et al., 2022; He and Hirn, 2021; Bruna and Mallat, 2019; Zhang and Mallat, 2021).
    Formally, we can define a wavelet as a zero-mean function,
\[
\begin{aligned}
  \psi : I &\to \mathbb{C} \\
  t &\mapsto \psi(t),
\end{aligned}
\]
with $\int |\psi(t)| \, dt = 1$ for some nice measure $dt$. For our purposes, it will be convenient to
consider a wavelet as a convolution operator on vectors, $f$, and write this as an inner product,
\[
\begin{aligned}
  \langle f, \psi_{v,j} \rangle_{L^2(I)} &= \int f(t)\, \psi_{v,j}^{*}(t) \, dt \\
  &= \Psi_j f(v), \qquad v \in I,
\end{aligned}
\]
where $\psi_{v,j}$ denotes $\Psi_j^{*} \delta_v$, and can be interpreted as a wavelet with spatial support concentrated
in a neighborhood of $v \in I$ at a scale $2^j$ for $j \leq J$ the maximal scale. For all wavelets
we will consider we have,
\[
  \psi_{v,j}(t) = \psi_{g \ast v, j}(g \ast t), \qquad g \in G, \tag{2.2}
\]
for G a subgroup of the group of automorphisms on I. This corresponds to translation
and/or rotational invariance about the origin in the Euclidean case.
2.3.2.2     Graph Wavelets
    Given the adjacency matrix, A, of a graph, we define the averaging operator,
\[
  P = \frac{1}{2}\left(I + D^{-1} A\right), \tag{2.3}
\]
where D is the diagonal matrix of degrees and I is the identity. Given a signal x on a graph,
Px averages x in 1-hop neighborhoods of each vertex. We next define the wavelet operator
                                                    27


at a scale 2j as,
                                           j−1       j
                                 Ψj = P 2      − P2 ,   j ≤ J.                          (2.4)
This difference between averaging operators applied to a signal yields the details lost to
averaging between scales 2j−1 and 2j . Our wavelets are related to those utilized in graph
scattering (Gama et al., 2019; Gao et al., 2019) and can be interpreted within the framework
of Perlmutter et al. (2019). We note that our wavelets as well as those considered in the
above works satisfy equation 2.2. This is easily seen since adjacency is preserved under
automorphisms.
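    As a minimal sketch of equations 2.3 and 2.4, the operators can be built with dense NumPy arrays. The function names `lazy_walk` and `graph_wavelets` and the 6-cycle test graph are illustrative assumptions, not the implementation used in this work; the sketch also assumes every vertex has at least one neighbor so that D is invertible.

```python
import numpy as np

def lazy_walk(A):
    """Averaging operator P = (I + D^{-1} A) / 2 from equation 2.3."""
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    if np.any(deg == 0):
        raise ValueError("every vertex needs at least one neighbor")
    # Dividing row i by deg[i] is the same as left-multiplying by D^{-1}.
    return 0.5 * (np.eye(len(A)) + A / deg[:, None])

def graph_wavelets(A, J):
    """Return [Psi_1, ..., Psi_J] with Psi_j = P^(2^(j-1)) - P^(2^j)."""
    P_pow = lazy_walk(A)            # holds P^(2^(j-1)), starting at j = 1
    wavelets = []
    for _ in range(J):
        P_next = P_pow @ P_pow      # squaring doubles the exponent
        wavelets.append(P_pow - P_next)
        P_pow = P_next
    return wavelets

# Illustrative graph (an assumption): the cycle on six vertices.
n = 6
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0

Psi = graph_wavelets(A, J=3)
```

Because P is row-stochastic, each operator in `Psi` maps constant signals to zero, and the wavelets telescope so that their sum equals P minus P raised to the power 2^J.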
    Wavelets are powerful tools for processing random fields as we show in the following
result.
Theorem 1. Let X be a random field on I with G-stationary increments, and let dt be the
density of a measure on I that is invariant to the action of G. Then ΨX is G-stationary.
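Proof. Let $g \in G$ be arbitrary. Since a wavelet has zero mean,
$$\begin{aligned}
\Psi_j X(v) &= \int X(t)\, \psi_{v,j}^{*}(t)\, dt \\
&= \int X(g * t)\, \psi_{v,j}^{*}(g * t)\, dt \\
&= \int X(g * t)\, \psi_{g^{-1} * v,\, j}^{*}(t)\, dt \qquad \text{(by 2.2)} \\
&= \int X(g * t)\, \psi_{g^{-1} * v,\, j}^{*}(t)\, dt - X(g * v) \int \psi_{g^{-1} * v,\, j}^{*}(t)\, dt \\
&= \int \left[X(g * t) - X(g * v)\right] \psi_{g^{-1} * v,\, j}^{*}(t)\, dt \\
&\overset{d}{=} \int \left[X(t) - X(v)\right] \psi_{g^{-1} * v,\, j}^{*}(t)\, dt \\
&= \int X(t)\, \psi_{g^{-1} * v,\, j}^{*}(t)\, dt \\
&= \Psi_j X\left(g^{-1} * v\right).
\end{aligned}$$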
    This is meaningful in the context of manifolds, for instance, when dt corresponds to the
Riemannian volume element, G corresponds to a group of isometries, and the wavelets are
constructed as in Perlmutter et al. (2020). This perspective may, in turn, be interpreted to
subsume the Euclidean cases where, for instance, wavelet filtering of the Brownian motion on
R produces stationary random processes due to the stationarity of increments of Brownian
motion.
    We note that Theorem 1 is most interesting when the automorphism group on I is nontrivial.
However, Theorem 1 is also interesting when considered across all graphs, as it indicates that there are
asymptotically few nonstationary processes with stationary increments since the proportion
of graphs on n vertices with nontrivial automorphism groups tends to zero as n tends to
infinity (Godsil and Royle, 2001).
2.3.2.3     Graph wide sense self-similarity
    As noted by Morel et al. (2022) in the Euclidean setting, the scaling definition 2.1 for
self-similarity can be too restrictive for modeling and is impossible to numerically verify.
This motivates a weaker definition based upon wavelets.
Definition 2.3.1 (Wide-Sense Self-Similarity). A graph random field will be called wide
sense self-similar up to a maximum scale $2^J$ if there exist coefficients $c_1$, $c_2$, $\zeta_1$, and $\zeta_2$ such
that for all $j \le J$,
$$\mathbb{E}\left[|\Psi_j X(v)|\right] = c_1 2^{j \zeta_1}, \tag{2.5}$$
and,
$$\mathbb{E}\left[|\Psi_j X(v)|^2\right] = c_2 2^{j \zeta_2}. \tag{2.6}$$
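    One way to probe Definition 2.3.1 numerically is to regress the base-2 logarithm of an empirical wavelet moment on the scale index j, since equations 2.5 and 2.6 are linear in j after taking logarithms. The sketch below is an illustration under stated assumptions: the helper name `fit_scaling`, the least-squares estimator, and the synthetic moments are ours and are not taken from the text.

```python
import numpy as np

def fit_scaling(moments):
    """Fit moments[j-1] ~ c * 2**(j * zeta) for j = 1..J; return (c, zeta).

    Taking log2 gives log2(moment_j) = log2(c) + zeta * j, a straight line in j.
    """
    moments = np.asarray(moments, dtype=float)
    j = np.arange(1, len(moments) + 1)
    zeta, log2_c = np.polyfit(j, np.log2(moments), deg=1)  # slope, intercept
    return 2.0 ** log2_c, zeta

# Synthetic moments that follow the power law exactly (hypothetical values).
scales = np.arange(1, 6)
synthetic = 3.0 * 2.0 ** (0.5 * scales)
c_hat, zeta_hat = fit_scaling(synthetic)
print(c_hat, zeta_hat)
```

With real data the moments would be replaced by averages of |Ψ_j X(v)| or |Ψ_j X(v)|^2 over vertices and samples, and the quality of the linear fit gives a rough indication of whether the wide-sense definition is plausible.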
2.3.3     Multiscale statistics
    Complex systems (e.g., economic, climate, and social systems) exhibit cross-scale depen-
dencies that generate emergent phenomena. Wavelets isolate fluctuations at different scales.
To study interactions between scales it is tempting to consider Gram statistics between these
scales, i.e. $\langle \Psi_j X, \Psi_k X \rangle$ for $j, k \le J$. However, interpreting our wavelets as polynomials on
the spectrum of the normalized graph Laplacian (e.g., Perlmutter et al. (2019)), we see that
$\Psi_j$ and $\Psi_k$ share very little spectral overlap, and hence $\langle \Psi_j X, \Psi_k X \rangle \approx 0$ for $j \neq k$. We can,
however, apply the ReLU operator to wavelet filtered signals to push higher frequency com-
ponents into a lower range so as to capture cross-scale statistics. Extending the Euclidean
model of He and Hirn (2021), we introduce our main statistical model for self-similar graph
signals.
Definition 2.3.2 (Nonlinear Graph Wavelet Model). Given a statistically self-similar random
field, $X$, we represent it as,
$$X \mapsto \left\{ \langle \sigma(\gamma \Psi_j X), \sigma(\gamma \Psi_k X) \rangle_{L^2(I)} : 1 \le j \le k \le J,\ \gamma \in \{-1, 1\} \right\}. \tag{2.7}$$
    Note that the ReLU (rectified linear unit) operator, defined as the pointwise nonlinearity
$\sigma(f) = \max\{f, 0\}$, fulfills,
$$|f| = \sigma(f) + \sigma(-f). \tag{2.8}$$
Hence, the ReLU graph wavelet statistics subsume the modulus wavelet statistics.
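    The statistics in equation 2.7 can be sketched for a single signal as follows. This is an illustration only: the function name `relu_gram_stats` is hypothetical, and random numbers stand in for the wavelet coefficients Ψ_j x so that the block stays self-contained.

```python
import numpy as np

def relu(z):
    """Pointwise ReLU, sigma(f) = max{f, 0}."""
    return np.maximum(z, 0.0)

def relu_gram_stats(coeffs):
    """ReLU wavelet Gram statistics for one signal.

    coeffs: array of shape (J, n) whose row j-1 holds Psi_j x.
    Returns {(gamma, j, k): <relu(gamma Psi_j x), relu(gamma Psi_k x)>}
    for 1 <= j <= k <= J and gamma in {-1, 1}.
    """
    J = coeffs.shape[0]
    stats = {}
    for gamma in (-1, 1):
        R = relu(gamma * coeffs)
        G = R @ R.T                      # all pairwise inner products
        for j in range(J):
            for k in range(j, J):
                stats[(gamma, j + 1, k + 1)] = G[j, k]
    return stats

rng = np.random.default_rng(0)
coeffs = rng.standard_normal((3, 8))     # stand-in for Psi_1 x, ..., Psi_3 x
stats = relu_gram_stats(coeffs)
```

For J scales this yields J(J + 1) numbers per signal. Because σ(f) and σ(−f) have disjoint supports, the two diagonal entries for a given j sum to the squared norm of Ψ_j x, which is one concrete sense in which the ReLU statistics contain the modulus statistics.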
2.4    Methods & Materials
    We employ the statistical model 2.7 to study classification of systems based upon struc-
tural similarity. System structure indicates function (Meadows, 2008). This structure can
be encoded in graphical models that are constructed via interview, collaborative modeling
processes, and even via natural language processing of scholarly texts. As we discuss in
chapter 3, there are certain estimation tasks that can be improved by aggregating knowledge
over expert subpopulations. In such applications, subpopulation identification amounts to
an unsupervised process. In this section, we study self-similar graph signal statistics in a
supervised setting to understand potentials for generalization to unsupervised tasks.
2.4.1    From Modeling to Classifying Systems
    Just as the function of a system is encoded in its structure, patterns of cognition about
system behavior can be represented in models called fuzzy cognitive maps (FCMs) (Gray
et al., 2013) and causal loop diagrams (CLDs) (Voinov et al., 2018). An example FCM is
depicted in Figure 2.1.
Figure 2.1 Manually drawn FCM describing food commodity prices and conflict dynamics
in Zamfara, Nigeria; based upon Mmahi and James (2023)
    These models represent causal knowledge about systems as graphs. Using the statistical
model in 2.7, we can place these models of potentially different sizes and provenance into a
common framework for comparison.
2.4.1.1     Random Graphs
    We generate graphical surrogates of FCMs and CLDs using random graph models. This
technique is used, for instance, in Aminpour et al. (2020). In our case, surrogates allow for
assessment with respect to ground truths for which we may vary parameters and study per-
formance in a controlled manner. We use random graphs drawn from the Erdös-Renyi, the
Barabasi-Albert, and the Watts-Strogatz model families (pictured in Figure 2.2). Each of
these families reproduces aspects of the FCM/CLD construction process. For instance, the
Barabasi-Albert process mimics the construction of FCMs by individual experts and small
groups of like-minded stakeholders, where new nodes are sequentially connected to more
central concepts with higher probability. By contrast, the Erdös-Renyi family follows a con-
struction process that mimics construction by individuals possessing more general knowledge
as well as larger, more diverse audiences engaged in collaborative modeling where there is less
emphasis placed upon a small handful of central concepts and more on the connectivity of
the entire system. Finally, the Watts-Strogatz small-world model mimics a reflective process
wherein a base model is presented and participants suggest corrections to the proposal.
Figure 2.2 Random graph models from the Erdös-Renyi (left), Barabasi-Albert (center),
and Watts-Strogatz (right) families
2.4.1.2    Experimental Setup
    Graphs are sampled from random graph families. We vary the number of nodes, sample
sizes, and parameters of the random graph models to understand performance of the method.
Except where noted in the small-sample experiments, we draw 40 samples from each graph
family. Where relevant and possible, we numerically tune the densities of these graph families
so they are as close as possible on average; this prevents this simple summary statistic from
being used as a proxy. Note the density of a graph is defined as the number of edges divided
by the total number of possible edges. Graph families on the same number of nodes with
approximately equal mean densities therefore have the same number of edges on average as
well. So, our problem really becomes one of studying topology, i.e., system structure.
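    The density matching described above can be illustrated with closed-form densities. The sketch below rests on assumptions about the generators, since the text does not specify an implementation: Erdös-Renyi graphs are G(n, p); the Barabasi-Albert construction ends with m(n − m) edges; and the Watts-Strogatz construction starts from a ring lattice with even k neighbors per node and rewires without changing the edge count. All function names are hypothetical.

```python
def er_density(p):
    """Expected density of an Erdos-Renyi G(n, p) graph."""
    return p

def ba_density(n, m):
    """Density of a Barabasi-Albert graph, assuming m * (n - m) edges."""
    return 2 * m * (n - m) / (n * (n - 1))

def ws_density(n, k):
    """Density of a Watts-Strogatz graph with even k, i.e. n * k / 2 edges."""
    return k / (n - 1)

def closest_ba_m(n, target):
    """Barabasi-Albert parameter m in 1..n//2 whose density is nearest target."""
    return min(range(1, n // 2 + 1), key=lambda m: abs(ba_density(n, m) - target))

# Match a Barabasi-Albert family on 100 nodes to an Erdos-Renyi family with p = 0.35.
m_star = closest_ba_m(100, er_density(0.35))
ba_max = max(ba_density(100, m) for m in range(1, 100))
print(m_star, round(ba_density(100, m_star), 4), round(ba_max, 4))
```

Under these assumptions the Barabasi-Albert density on 100 nodes peaks near 0.505 at m = 50, so this family cannot be tuned to match densities much above one half.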
    Each graph is represented via the covariance structure of a set of qualitatively self-similar
signals defined on the nodes of the graph. As in Gao et al. (2019), for each graph we com-
pute the node-wise eccentricity and clustering coefficient signals. We then form the graph
wavelets described in equation 2.4 and compute the ReLU graph wavelet Gram statistics for
both signals as in equation 2.7. We also sum these statistics (interpreted as a 1-1 convolution)
and compute the ReLU graph wavelet Gram statistics. This is essentially a 2-layer
graph convolutional neural network in which nothing is learned. These representations are
concatenated to form coordinates to which a support-vector machine (SVM) is applied. We
use nested 5-fold cross-validation to tune hyperparameters and assess prediction uncertainty.
Hyperparameters are selected via grid search.
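    The evaluation protocol above (an SVM tuned by grid search inside nested 5-fold cross-validation) can be sketched with scikit-learn as follows. Several details are assumptions made for illustration and are not stated in the text: the RBF kernel, the feature standardization, the grid values, and the Gaussian stand-in features that replace the wavelet Gram statistics.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Hypothetical stand-in features: 40 graphs per family, 12 statistics each.
X = np.vstack([rng.normal(0.0, 1.0, (40, 12)), rng.normal(1.0, 1.0, (40, 12))])
y = np.repeat([0, 1], 40)

# Inner folds choose hyperparameters; outer folds estimate accuracy.
inner = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
grid = {"svc__C": [0.1, 1.0, 10.0], "svc__gamma": ["scale", 0.01, 0.1]}

model = GridSearchCV(make_pipeline(StandardScaler(), SVC(kernel="rbf")), grid, cv=inner)
scores = cross_val_score(model, X, y, cv=outer)
print(f"accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```

Keeping the grid search inside the outer loop means hyperparameters are never tuned on the fold used for scoring, and the spread of the five outer scores gives a rough measure of prediction uncertainty.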
2.5   Results
2.5.1   Single-Class, Varied Parameters
Figure 2.3 Classification accuracy for Erdös-Renyi graphs with edge probabilities p and
p + 0.01. The lower of the probabilities is reported as the x coordinate.
    Figure 2.3 shows the results of classification of Erdös-Renyi graphs for pairs of graph
samples (n = 40 each) where the edge probability parameter of the first sample is p, and
that of the second sample is p + 0.01. The parameter p is varied over the set {0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.98}.
The method distinguishes the two samples most successfully outside of a neighborhood of p = 0.5,
with the highest accuracy at the largest values of p.
                                 Experiment        Accuracy
                                 Low-density       0.82 ± 0.020
                                 Medium-density    0.85 ± 0.018
                                 High-density      0.99 ± 0.004
                              Table 2.1 Barabasi-Albert results
    To assess multiple parameters from the same class, we apply the method to multiple
samples of Barabasi-Albert graphs with similar densities in a sequence of three experiments
enumerated by mean density. Parameters for these three experiments appear in the Appendix.
Table 2.1 displays these results. Accuracy is observed to increase with density.
2.5.2    Single-Class, Varied Size
Figure 2.4 Classification accuracy on pairs of samples from Erdös-Renyi graphs whose
number of nodes differs by the value along the x-axis (i.e., zero corresponds to identical
families)
    To understand how well the method distinguishes graph size, we draw samples from
two classes of Erdös-Renyi graphs with the same edge probability parameter, p, and vary
the number of nodes. For each experiment, one sample is drawn from a family with node
count fixed at 100, while the other is drawn from a family with 60, 80, 90, 95, 98, or 99
nodes. This corresponds to six pairs of samples in total. Figure
2.4 shows that classification accuracy decreases as the graph families become more similar.
There appears to be a qualitative increase in the variance of the predictive accuracy as well.
2.5.3    Multiclass Classification
Figure 2.5 Multiclass classification accuracy as a function of mean density of the graph
families
    To assess performance on a presumably more complex task, we introduce a multiclass
classification problem. For each of four experiments we draw six sets of samples from:
   1. Erdös-Renyi with 100 nodes
   2. Erdös-Renyi with 50 nodes
   3. Barabasi-Albert with 100 nodes
   4. Barabasi-Albert with 50 nodes
   5. Watts-Strogatz with 100 nodes
   6. Watts-Strogatz with 50 nodes.
These families are each tuned to have approximately the same density on average with the
exception of the highest density experiment for which the Barabasi-Albert model achieved
only a maximum mean density of 0.5. The other two graph family densities coincided.
Figure 2.5 indicates a drop in performance as density increases. However, the worst case
mean performance is conservatively lower-bounded by 86%.
2.5.4   Small Sample Sizes
     Figure 2.6 Small-sample classification accuracy of Erdös-Renyi and Watts-Strogatz
    Finally, we study the effect of small sample sizes through a sequence of pairs of exper-
iments. For sample sizes 20, 10, and 5, we compare classification of Erdös-Renyi graphs
with different numbers of nodes (60 and 100) and the same edge probability, p = 0.5. We
also compare classification of Erdös-Renyi and Watts-Strogatz families on 100 nodes at these
sample sizes. As depicted in Figure 2.6, smaller sample sizes result in a significant drop in
accuracy only at the most extreme end.
2.6    Discussion
    The ReLU graph wavelet statistics are able to differentiate families of graphs remarkably
well across a variety of parameter regimes, in multiclass tasks, and crucially for development
applications, in the low-data regime. The statistical model exhibits notably fine resolution
when distinguishing Erdös-Renyi graphs whose edge probabilities differ by only 0.01. We
see the same capabilities in the classification of Barabasi-Albert models with immediately
adjacent parameters.
    The drop in performance seen while reducing the difference in number of nodes between
classes while holding edge probability fixed (and equal) suggests that these statistics differ-
entiate systems based upon structure as encoded in the graph topology. This is emphasized
by (1) the fine detail captured when node numbers and families are equal and edge forma-
tion probability is similar, and (2) the low performance in distinguishing 99 from 100 nodes.
This is desirable as we want to learn emergent structure, and topologically we expect that
there is not much difference between two Erdös-Renyi models with the same edge formation
probability and very similar node counts. The analogous result holds for the other models
considered, and it is reasonable to expect this generally when there is some logic, formal pro-
cess, or consistent heuristics underlying construction. Given that we want to treat systems
as dynamically similar even if they differ by a small number of nodes, this can be seen as a
strength of the method and framework.
    A potential drawback of the method is that it treats directed, signed, weighted graphs as
undirected, unsigned, weighted graphs. Extension to directed graphs is straightforward and
there are a number of avenues of exploration that we leave to future work.
2.7    Conclusion
    In this work we used complex systems theory and theoretical machine learning to guide
development of a statistical model of self-similar emergent phenomena on graphs and more
general spaces. We then empirically demonstrated in silico that the theory leads to a mean-
ingful framework via a new method for classifying systems based upon structure. We con-
nected this with perspectives from collaborative systems modeling and suggested how it
could be implemented to support estimation tasks as described in chapter 3. This frame-
work opens a number of exciting possibilities. Preliminary work incorporating the texture
synthesis paradigm (He and Hirn, 2021) suggests that our statistical model might be success-
ful at supporting wisdom-of-crowds type forecasts of complex dynamical phenomena such as
the interactions among food prices and social-ecological processes in low-data environments.


                                    BIBLIOGRAPHY
Aminpour, P., Gray, S. A., Jetter, A. J., Introne, J. E., Singer, A., and Arlinghaus, R. (2020).
  Wisdom of stakeholder crowds in complex social–ecological systems. Nature Sustainability,
  3(3):191–199.
Amoroso, L. (1938). Vilfredo Pareto. Econometrica: Journal of the Econometric Society,
  pages 1–21.
Ayache, A. (2018). Multifractional stochastic fields: wavelet strategies in multifractional
  frameworks. World Scientific.
Batten, D. F. (2001). Complex landscapes of spatial interaction. The Annals of Regional
  Science, 35:81–111.
Bruna, J. and Mallat, S. (2013). Invariant scattering convolution networks. IEEE Transactions
  on Pattern Analysis and Machine Intelligence, 35(8):1872–1886.
Bruna, J. and Mallat, S. (2019). Multiscale sparse microcanonical models. Mathematical
  Statistics and Learning, 1(3):257–315.
Gama, F., Ribeiro, A., and Bruna, J. (2019). Diffusion scattering transforms on graphs. In
  International Conference on Learning Representations.
Gao, F., Wolf, G., and Hirn, M. (2019). Geometric scattering for graph data analysis. In
  International Conference on Machine Learning, pages 2122–2131. PMLR.
Gelbaum, Z. (2014). Fractional Brownian fields over manifolds. Transactions of the American
  Mathematical Society, 366(9):4781–4814.
Godsil, C. and Royle, G. F. (2001). Algebraic graph theory, volume 207. Springer Science &
  Business Media.
Gray, S. A., Zanre, E., and Gray, S. R. (2013). Fuzzy cognitive maps as representations of
  mental models and group beliefs. In Fuzzy cognitive maps for applied sciences and engi-
  neering: From fundamentals to extensions and learning algorithms, pages 29–48. Springer.
Grimm, V., Revilla, E., Berger, U., Jeltsch, F., Mooij, W. M., Railsback, S. F., Thulke,
  H.-H., Weiner, J., Wiegand, T., and DeAngelis, D. L. (2005). Pattern-oriented modeling
  of agent-based complex systems: lessons from ecology. Science, 310(5750):987–991.
Hammond, D. K., Vandergheynst, P., and Gribonval, R. (2011). Wavelets on graphs via
  spectral graph theory. Applied and Computational Harmonic Analysis, 30(2):129–150.
He, J. and Hirn, M. (2021). Texture synthesis via projection onto multiscale, multilayer
  statistics. arXiv preprint arXiv:2105.10825.
Mallat, S. (1999). A wavelet tour of signal processing. Elsevier.
Mallat, S. (2016). Understanding deep convolutional networks. Philosophical Trans-
  actions of the Royal Society A: Mathematical, Physical and Engineering Sciences,
  374(2065):20150203.
Mandelbrot, B. (1967). How long is the coast of Britain? Statistical self-similarity and
  fractional dimension. Science, 156(3775):636–638.
Marques, A. G., Segarra, S., Leus, G., and Ribeiro, A. (2017). Stationary graph processes
  and spectral estimation. IEEE Transactions on Signal Processing, 65(22):5911–5926.
Meadows, D. H. (2008). Thinking in systems: A primer. Chelsea Green Publishing.
Mmahi, O. P. and James, F. T. (2023). Brigandage and criminal victimization in Nahuche
  community, Zamfara State: impact on food security. Environment, Development and
  Sustainability, pages 1–18.
Morel, R., Rochette, G., Leonarduzzi, R., Bouchaud, J.-P., and Mallat, S. (2022). Scale
  dependencies and self-similarity through wavelet scattering covariance. arXiv preprint
  arXiv:2204.10177.
Perlmutter, M., Gao, F., Wolf, G., and Hirn, M. (2019). Understanding graph neural net-
  works with asymmetric geometric scattering transforms. arXiv preprint arXiv:1911.06253.
Perlmutter, M., Gao, F., Wolf, G., and Hirn, M. (2020). Geometric wavelet scattering
  networks on compact Riemannian manifolds. In Mathematical and Scientific Machine
  Learning, pages 570–604. PMLR.
Pipiras, V. and Taqqu, M. S. (2017). Long-range dependence and self-similarity, volume 45.
  Cambridge University Press.
Schoenberg, F. P. and Patel, R. D. (2012). Comparison of Pareto and tapered Pareto
  distributions for environmental phenomena. The European Physical Journal Special Topics,
  205(1):159–166.
Voinov, A., Jenni, K., Gray, S., Kolagani, N., Glynn, P. D., Bommel, P., Prell, C., Zellner,
  M., Paolisso, M., Jordan, R., et al. (2018). Tools and methods in participatory modeling:
  Selecting the right tool for the job. Environmental Modelling & Software, 109:232–255.
Zhang, S. and Mallat, S. (2021). Maximum entropy models from phase harmonic covariances.
  Applied and Computational Harmonic Analysis, 53:199–230.


                                    APPENDIX
                        EXPERIMENTAL PARAMETERS
Parameters for the three Barabasi-Albert experiments reported in Table 2.1 are (where n denotes
the number of nodes and m the Barabasi-Albert parameter),
   1. Low-Density Experiment
        a) n = 100, m = 22
        b) n = 100, m = 23
        c) n = 100, m = 21
        d) n = 50, m = 9
        e) n = 50, m = 10
        f) n = 50, m = 11
   2. Medium-Density Experiment
        a) n = 100, m = 44
        b) n = 100, m = 45
        c) n = 100, m = 43
        d) n = 50, m = 22
        e) n = 50, m = 23
        f) n = 50, m = 21
   3. High-Density Experiment
        a) n = 100, m = 65
        b) n = 100, m = 66
        c) n = 100, m = 67
        d) n = 50, m = 35
        e) n = 50, m = 34
        f) n = 50, m = 33


                                         CHAPTER 3
                   FROM ‘OUGHT’ TO ‘IS’: A COMPARISON OF
             UNSUPERVISED METHODS FOR VALUES-INFORMED
                                 WISDOM OF CROWDS
3.1    Abstract
    Many social and ecological problems require us to consider objectively verifiable phe-
nomena as well as subjective states of knowledge and associated value systems. When ap-
proximating the facts of reality, the wisdom of crowds phenomenon demonstrates that many
pooled estimates can be more accurate than individual or expert estimates. For complex
and social systems, wisdom of crowd approaches are improved by aggregating knowledge over
subpopulations. In this paper we introduce three unsupervised methods for identifying sub-
populations based upon value-laden statements in narrative data from hyperlocal maternal
and child health (MCH) contexts in Gombe State, Nigeria. We employ data science tech-
niques and compare methods to assess the stability of inferences. We find the hypothesized
groups to be method dependent and discuss implications for wisdom-of-crowd estimates in
sustainable development contexts.
3.2    Introduction
    Our most pressing social and ecological problems require consideration of not only objec-
tively verifiable phenomena, but also more subjective states of knowledge and value systems.
When approximating the facts of reality, the wisdom of crowds phenomenon empirically
demonstrates that the pooled estimates of a group of individuals can often provide a more
accurate model than most individual estimates (Yi et al., 2012; Galton, 1907). The key to
leveraging the wisdom of crowds rests in pooling estimates. The seminal research by Gal-
ton (1907) on the wisdom of crowds presents evidence that the median of a group of scalar
estimates of the weight of an ox provides a more sensible model than the mean due to its
greater stability to outliers. Yi et al. (2012) subsequently showed that individual estimates
of solutions to higher dimensional problems (specifically the Euclidean minimum spanning
tree and traveling salesman problems) can be combined by reference to the majority to
better approximate the optimal. Each of these tasks is designed around a ground truth
with respect to which estimates can be compared. This ground truth constitutes a uniquely
specified problem ontology, the is of the problem.
    Such an unambiguous specification is generally not possible, however, in complex social
and ecological problems. As observed by Levin et al. (2021), many sustainability problem
contexts exhibit what Breiman (2001) calls the Rashomon effect wherein multiple plausible
models account for the objectively verifiable aspects and yet remain mutually inconsistent.
This is further complicated by the wickedness of such problems, which signifies disagreement
among stakeholders regarding the facts, the nature of the problem, and/or the manner in
which it ought to be addressed (Rittel and Webber, 1973). Hence, a “ground truth” may
be ill-defined. Nevertheless, human actors must engage with these problems, which compels
wisdom-of-crowds researchers to develop frames of reference with respect to which progress
may be assessed. Aminpour et al. (2020) do so by comparing models aggregated from fisheries
stakeholders to that from a group of traditional scientific “experts.” Similar to Galton (1907)
and Yi et al. (2012), Aminpour et al. (2020, 2021) show that better models are attained by
first averaging over stakeholder subpopulations and then aggregating based upon the median
of these subgroup models, allowing noise to be reduced and the expertise to be represented.
In hyperlocal contexts relevant to the present study, however, formal expert models are
lacking. Instead, local stakeholders possess the expert knowledge. Researcher outsiders
must, in turn, develop methodologies that operationalize this knowledge while accounting
for heterogeneity among participant value systems and cognitions.
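    The two-stage pooling attributed above to Aminpour et al. (average within each stakeholder subpopulation, then take the median across the subgroup models) can be sketched as follows. The representation of each individual model as a matrix of edge weights, the function name `pool_models`, and the toy numbers are assumptions made for illustration.

```python
import numpy as np

def pool_models(weights, groups):
    """Average edge weights within each subgroup, then take the elementwise
    median across the subgroup averages.

    weights: array (n_individuals, n_nodes, n_nodes) of model edge weights.
    groups:  length n_individuals sequence of subgroup labels.
    """
    weights = np.asarray(weights, dtype=float)
    groups = np.asarray(groups)
    subgroup_means = np.stack(
        [weights[groups == g].mean(axis=0) for g in np.unique(groups)]
    )
    return np.median(subgroup_means, axis=0)

# Hypothetical data: five individuals, one causal edge, three subgroups.
weights = np.array([0.2, 0.4, 0.9, 0.5, 0.7]).reshape(5, 1, 1)
groups = ["a", "a", "b", "c", "c"]
pooled = pool_models(weights, groups)
print(pooled)
```

In this toy case the subgroup means are 0.3, 0.9, and 0.6, so the pooled weight is 0.6; the single high estimate from subgroup "b" does not dominate, which mirrors the robustness argument made for the median.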
    In this paper we investigate a process for hypothesizing salient subgroups within human
populations based upon values that individuals express in narrative. We call these sub-
groups cognitographic clusters. Inspired by replicability issues in general science as well as
the rapid expansion of machine learning into social science, we study three unsupervised
methods: the traditional method of K-means, an emerging method utilizing hierarchical
density-based spatial clustering of applications with noise (HDBSCAN) and uniform man-
ifold approximation and projection (UMAP), as well as a new method that draws upon
optimal transport (OT) theory and spectral clustering. We compare method performance
on data from the maternal and child health (MCH) service utilization contexts of three local
government areas (LGAs) in Gombe State, Nigeria. At each step, we return to
the traditional-and-emerging-method theme by employing a traditional, statistical method
of analysis and another emerging method from the field of explainable artificial intelligence
(AI). To understand the extent to which inferences are method-agnostic or -dependent we
compare across methods.
3.3   Background
    In hyperlocal sustainable development contexts, there is misalignment between formal
expert predictions and local behavior. This stems not only from constrained disciplinary
perspectives underlying expert models, but also uncertainty about, and heterogeneity among,
local behavior. Failure to account for this mismatch produces undesirable outcomes that
perpetuate old problems and generate new ones. For instance, failure to consider local burial
practices in the early stages of the 2014 Ebola outbreak in West Africa led to insufficient
projections of contagion, which generated the largest such epidemic in history (Maxmen,
2015). Similarly, reduction of the maternal mortality ratio has stagnated in the last decade
(WHO, 2023), particularly in northern Nigeria (GSMoH, 2010) where demand for maternal
and child health (MCH) services is influenced by sociocultural factors (Sinai et al., 2017).
In these contexts, behavior tied to heterogeneous local value systems produces outcomes at
odds with development goals.
    For outsider researchers tasked with decision-making support for sustainable development
missions, there is relatively little information available about cultural and system-level factors
that generate changes in behavior. There is, furthermore, uncertainty about how local
subpopulations understand and explain changes as a function of interactions among these
factors. This Rashomon effect (Breiman, 2001) has been observed in complex social system
contexts (Levin et al., 2021) where it intertwines with the wicked problems phenomenon
(Rittel and Webber, 1973) to present fundamental modeling challenges. Estimating a systems
model requires accounting for this epistemic heterogeneity. What if modelers were able to
leverage this diversity of perspective to improve sustainable development operations?
    Transdisciplinary research methodologies like participatory modeling (PM) (Gray et al.,
2015; Voinov et al., 2018) are often used to uncover connections among diverse values and
knowledge, and to identify factors that explain how these manifest in behavior. Recent
research in collective intelligence and PM suggests that local stakeholder knowledge may be
profitably combined across subpopulations to account for scientifically validated features of
complex social-ecological systems (Aminpour et al., 2020, 2021). We draw upon this wisdom
of stakeholder crowds to better understand hyperlocal drivers of change in MCH utilization.
3.3.1    Wisdom of Crowds
    The wisdom of crowds refers to the ability of groups to outperform individuals in decision-
making and estimation tasks. Wisdom of crowd inferences are typically generated by pooling
yes-no or scalar valued estimates made by individuals. The INFER project (INtegrated
Forecasting and Estimates of Risk) and Cosmic Bazaar, for example, use crowd-sourced
event probability estimates to produce early warning signals for policymakers (Team, 2023).
Investigations by Yi et al. (2012) show that success in the scalar regime can generalize
to certain higher dimensional phenomena, and in particular to the properties of graphs
embedded in Euclidean space. Aminpour et al. (2020) further suggest that the wisdom of
stakeholder crowds is able to approximate scientifically validated aspects of complex social
and ecological phenomena. In the context of MCH utilization, individual estimates of the
system are causal explanations. They relate system components to causal relationships
among them. Fuzzy cognitive maps (FCMs) can be used to tap into this locally specific
causal knowledge (Aminpour et al., 2020) of the MCH system.
3.3.2    Fuzzy Cognitive Mapping
    FCMs are signed, digraph causal models that provide representations of complex dy-
namical systems. Factors are represented as nodes whose values qualitatively increase and
decrease, and causal relationships are encoded as edges decorated with signs and weights
corresponding to causal direction and magnitude, respectively. The FCM framework was
introduced by Kosko (1986) to extend Axelrod’s cognitive maps in a manner that accounts
for the fuzziness of natural language. Fuzziness arises when attempting to precisely disam-
biguate, for instance, “a lot of rain” from “a little bit of rain.” FCMs have been employed
to understand human decision making in a wide range of complex contexts including war
games, organizational management, engineering (Papageorgiou and Salmeron, 2012), and
social and ecological systems (Özesmi and Özesmi, 2004; Gray et al., 2015). The graphical
structure yields dynamical models that may be simulated to inform users of counter-intuitive
dynamics, to identify potential leverage points and intervention strategies, and to explore
their consequences.
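    As a minimal sketch of how such a model may be simulated, the snippet below iterates a small, hypothetical three-factor FCM. The update rule (a sigmoid applied to each factor's current value plus the weighted sum of its incoming influences) is one common convention in the FCM literature and is an assumption here, as are the factor names and edge weights; none of them are taken from the Gombe maps.

```python
import math

def sigmoid(x, lam=1.0):
    """Squash an activation into (0, 1); lam controls steepness."""
    return 1.0 / (1.0 + math.exp(-lam * x))

def fcm_step(state, weights):
    """One synchronous FCM update (assumed rule, with self-memory).

    state[i]      : current activation of factor i, in [0, 1]
    weights[i][j] : signed causal weight of the edge i -> j, in [-1, 1]
    """
    n = len(state)
    return [
        sigmoid(state[j] + sum(state[i] * weights[i][j] for i in range(n)))
        for j in range(n)
    ]

def simulate(state, weights, steps=50, tol=1e-6):
    """Iterate until activations change by less than tol, or steps run out."""
    for _ in range(steps):
        new_state = fcm_step(state, weights)
        if max(abs(a - b) for a, b in zip(new_state, state)) < tol:
            return new_state
        state = new_state
    return state

# Hypothetical factors: 0 = household income, 1 = transport access, 2 = clinic visits
W = [
    [0.0, 0.6, 0.3],
    [0.0, 0.0, 0.7],
    [0.0, 0.0, 0.0],
]
baseline = simulate([0.5, 0.5, 0.5], W)

# Scenario: the same map with the income -> transport edge removed
W_cut = [row[:] for row in W]
W_cut[0][1] = 0.0
scenario = simulate([0.5, 0.5, 0.5], W_cut)
```

    Comparing `baseline` with `scenario` is the kind of what-if exploration the text refers to: removing a positive edge lowers the steady-state activation of the downstream factors.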
         Figure 3.1 FCM from a focus group discussion in Bojude, Gombe, Nigeria
    Aminpour et al. (2020, 2021) investigate the usage of FCM as wisdom of crowd models
of complex social and ecological systems. Similar to previous work, they find that a well-
chosen estimate pooling method is key to effectively harnessing the collective intelligence
of stakeholders. By first averaging over subpopulation models and then aggregating by
their median, stakeholder models can accurately reproduce scientifically validated system
models (Aminpour et al., 2020). In highly localized MCH contexts, however, locally adapted,
scientifically validated models are lacking, and, hence, a formally proposed ground truth
is absent. Instead, implicit representations of the target system–specifically, stakeholder
explanations for system behavior–can make for useful formal models when made explicit.
Supposing one has access to these mental models, how does one identify subgroups exhibiting
consensus that can be meaningfully differentiated from that of other subpopulations for
incorporation into wisdom of crowd models?
    One approach is to begin by considering features that are universal to all human systems.
Since FCMs capture heterogeneous cognitions about, and explanations for, system behavior
(i.e., what constitutes the system), we can augment with beliefs about how the system ought
to behave (e.g., preferences, desires). By explicitly incorporating stakeholder senses of "is"
and "ought," we may have some hope of successfully addressing challenges presented by the
wicked Rashomon effect and thereby improve MCH outcomes along sociocultural lines. This
motivates our consideration of human values.
3.3.3     Values
    Anthropologist David Graeber (Graeber, 2001, p.47) defines values as:
       ...the way people represent the importance of their own actions to themselves:
       normally, as reflected in one or another socially recognized form.
Values correspond to oughts expressed by individuals and groups, as in, “the federal gov-
ernment ought not restrict access to reproductive health services.” Through beliefs, norms,
attitudes, and behavioral intentions, values indicate–and are indicated in–behavioral ten-
dencies and shared ideals of human groups (Schwartz, 2006). The behavioral implications
of values suggest that their consideration may be useful for modeling interventions in uncertain
sustainable development contexts like localized MCH utilization. We can draw on domain
general quantitative approaches like the World Values Survey (WVS) to support these efforts.
3.3.3.1     World Values Survey
    Inglehart’s model of cultural values relies on data collected through the World Values
Survey for the last 40 years in 60 countries around the world (Inglehart and Welzel, 2010).
The model covers individual and country-level dimensions and theoretically rests upon the
assumption that economic development is linked to distinct value orientations. Analysis of a
broad range of political, social, and religious norms and beliefs yields two principal
cross-national dimensions: traditional/sacred versus secular/rational orientations toward
authority, and survival versus self-expression values. Together these provide a comprehensive
measurement of all major areas of human concern, from religion to politics to economic and
social life. Each society can be located on a global map of cross-cultural variation based
on these two dimensions, as shown in Figure 3.2 (Inglehart, 2020).
 Figure 3.2 Inglehart-Welzel World Cultural Map 2023, reproduced from Inglehart (2020)
    A particular point on the cultural map reflects the relative position on a number of topics
that often form the basis of influence campaigns because they are tied to core values. Tradi-
tional values emphasize the importance of religion, parent-child ties, deference to authority
and traditional family values. People who embrace these values also reject divorce, abor-
tion, euthanasia and suicide. Secular-rational values have the opposite preferences to the
traditional values. These societies place less emphasis on religion, traditional family values
and authority. Divorce, abortion, euthanasia and suicide are seen as relatively acceptable.
Survival values place emphasis on economic and physical security. They are linked with a rel-
atively ethnocentric outlook and low levels of trust and tolerance. Self-expression values, by
contrast, give high priority to environmental protection, growing tolerance of foreigners, gays
and lesbians and gender equality, and rising demands for participation in decision-making in
economic and political life.
    Using factor analysis these dimensions can be reduced to 10 items, which cover approxi-
mately 70% of all cross-national variation of values (Inglehart et al., 2000). However, societies
vary widely. For example, in Pakistan or Nigeria, 90% of the population say that God is
extremely important in their lives, while in Japan only 6% take this position. Similarly,
countries where survival values are high also exhibit low levels of subjective well-being, poor
health, low interpersonal trust, intolerance of outgroups, and low support for gender equal-
ity. These countries tend to perceive cultural diversity as threatening, emphasize materialist
values, absolute rules, and familiar norms, and favor authoritarian forms of government.
Foreigners tend to be seen as dangerous outsiders who reduce already scarce resources. The
opposite is seen in countries with low survival measures. Because these dimensions are highly
correlated, if people in a given population place a strong emphasis on religion, the relative
position of that population on many other variables can be predicted.
    Recently, researchers in the field of natural language processing (NLP) trained a model to
help identify WVS values that resonate within narrative or ethnographic text (Benkler et al.,
2022). In this paper, we explore the possibility of generating cultural clusters directly from
text utilizing this model alongside other automated techniques. These are used to support
modeling of MCH utilization.
3.3.4    Maternal and Child Healthcare (MCH) Utilization in Nigeria
    Improving outcomes and addressing inequalities in maternal and child health (MCH)
has been a central component of global health initiatives in the last two decades, figuring
prominently in the Millennium and later Sustainable Development Goals and international
humanitarian and development efforts. However, while the global maternal mortality ratio
(MMR) declined by 33% between 2000 and 2015, it has remained stubbornly stagnant since
2016, despite myriad initiatives around the world to improve global maternal outcomes
(WHO, 2023).
    Significant disparities persist in maternal outcomes across the world: In 2020, the lifetime
risk of maternal death in low-income countries was 1 in 49, compared to 1 in 5,300 in high-
income countries (WHO, 2023). Nigeria’s national MMR is one of the highest in the world
at 917 per 100,000 live births (WHO, 2023), and in Gombe State the MMR is even higher at
1,002 per 100,000 live births (GSMoH, 2010). Quality maternal health services can prevent
nearly all maternal deaths, but availability of, demand for, and utilization of these services
are low in many of the countries that experience the highest rates of maternal mortality
(WHO, 2023). Nigeria and Gombe State are no exception: the WHO standard for
healthcare worker density is 4.45 healthcare workers per 1,000 population (WHO, 2023), but
the density in Nigeria is just 2.52 per 1,000 (Uzochukwu, 2017), and the density in Gombe
State is even lower at about 1 per 1,000 (GSMoH, 2010).
    Demand for maternal health services in Northern Nigeria may be explained by socio-
cultural factors including religion, tradition, urbanization, education, family structure, and
marital practices (Sinai et al., 2017) as well as communal and cultural norms (WHO, 2023).
Understanding interactions among social determinants can support development of inter-
ventions that reduce social and structural barriers to utilization, which disproportionately
impact socially marginalized women and girls (WHO, 2023). However, highly granular social
determinant data is lacking and resource intensive to collect. Furthermore, locally specific
cultural factors may not be accounted for within existing scholarly literature, which often
spans national or multinational regions. To better understand drivers of change in MCH
utilization as a function of cultural practice in hyperlocal spaces, such as at the local government
area (LGA) level, stakeholder knowledge may be combined across subpopulations.
3.3.5    Summary
    In this study, we wish to simultaneously account for the cognitions and value systems
of local stakeholders to inform wisdom of crowd models of maternal and child healthcare
utilization within three local government areas (LGAs) in Nigeria. Tapping into the wisdom
of the crowds in such complex social system contexts requires a method of pooling estimates
that leverages both consistencies in understanding and diversities of perspective (Aminpour
et al., 2021) and produces low variance estimates. Aminpour et al. (2020, 2021) find that
this diversity can be accounted for by first averaging over subpopulation FCMs and then
aggregating by the median of the averaged maps. We propose to perform what we term a
cognitographic analysis, that is, to identify subpopulations who express similar value systems.
To this end, we introduce three unsupervised methods of cognitographic clustering and
compare them using a variety of measures.
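    The two-stage pooling scheme attributed above to Aminpour et al. (2020, 2021), averaging within subpopulations and then taking the median across them, can be sketched as follows. This is an illustrative sketch only: the function names, the representation of each FCM as a square list-of-lists adjacency matrix over a shared set of factors, and the toy weights are all assumptions made for the example.

```python
from statistics import mean, median

def average_maps(maps):
    """Elementwise mean of a list of equally sized adjacency matrices."""
    n = len(maps[0])
    return [[mean(m[i][j] for m in maps) for j in range(n)] for i in range(n)]

def median_maps(maps):
    """Elementwise median of a list of equally sized adjacency matrices."""
    n = len(maps[0])
    return [[median(m[i][j] for m in maps) for j in range(n)] for i in range(n)]

def pool_fcms(maps_by_group):
    """Average within each subpopulation, then take the median across them."""
    group_means = [average_maps(maps) for maps in maps_by_group.values()]
    return median_maps(group_means)

# Hypothetical two-factor maps from three illustrative subpopulations
maps_by_group = {
    "cluster_a": [[[0.0, 0.8], [0.0, 0.0]], [[0.0, 0.6], [0.0, 0.0]]],
    "cluster_b": [[[0.0, 0.2], [0.0, 0.0]]],
    "cluster_c": [[[0.0, -0.4], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]],
}
crowd_map = pool_fcms(maps_by_group)
```

    In this toy case the within-group means of the single edge are 0.7, 0.2, and -0.2, so the pooled weight is their median, 0.2. The median step keeps one unusually large or small subpopulation from dominating the crowd model.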
3.4    Methods & Materials
3.4.1    Overview
    We introduce a narrative dataset in which expectant mothers and fathers self-signify
espoused values and cognitions. We then describe an automated procedure called recognizing
value resonance (RVR) which uses natural language processing (NLP) to efficiently identify
values expressed within these narratives. Each document is associated with a feature vector
of densities over value hypotheses. To understand mesoscale similarities and differences
among these features, we employ three clustering methods that attend to different aspects
of the data.
    Our methodology is motivated by replicability issues in general science. As such we
adopt a data science perspective, defined as the science of learning from data (Donoho,
2017). We are specifically motivated by a result of Bernau et al. (2014) as discussed in
Donoho (2017). Bernau et al. (2014) present a meta-analysis of multiple medical studies
that evaluates a family of models across a family of data sets and assesses their performance.
Interestingly, the most generally successful model is found to have been developed on the most
troublesome data set. Donoho (2017) further identifies a cross-workflow analysis in medicine,
Madigan et al. (2014), which demonstrates the instability of inferences to various workflows
applied to the same data set. This instability produces not only differing inferences but also
contradicting conclusions. Similarly, our approach studies and compares three clustering
methods on a single data set. Analyses are performed with traditional and emerging methods.
3.4.2    Recognizing Value Resonance in Narrative
3.4.2.1   Narrative Completion Data
    The text data which we analyze for value resonance in this work is a collection of story
completions generated by local participants surveyed in Gombe state, Nigeria as a response
to narrative prompts provided by survey workers in the field. Story completion from narrative
prompts is a method aimed at collecting rich linguistic data containing information about
respondents’ implicit knowledge and worldviews in a specific domain, in contrast to responses
obtained by posing explicit questions devoid of context and imagery. The narrative prompts
used in data collection were specifically designed to elicit locals’ reasoning in the context
of maternal and child health care (MCH) as practiced in their geographical locale. Each
participant was provided (in sequence) with three narrative prompts randomly selected from
a pool of 45 prompts created to evoke a plausible scenario in the local context. Of these,
22 prompts were targeted toward male respondents, and 23 were targeted toward female
respondents. The respondents were then asked to complete the story in a few sentences,
which were audio-recorded by the enumerators and later translated from Hausa into English
by a professional translator.
    The narrative prompts were written in accordance with the precede-proceed model of
public health behavior (Green, 1974; Green and Kreuter, 1991), which offers a typology of
three top-level categories of factors influencing health care related beliefs and behaviors on
the individual and group levels. The three main sets of factors are:
   1. Predisposing (e.g. education, rootedness in one’s own culture, prior experience)
   2. Reinforcing (e.g. influence of family members, friends, community leaders, the media,
      and cultural authorities)
   3. Enabling (e.g. physical and financial security, access to health providers, quality of
      available care)
 Respondent sex   Narrative prompt                       Story completion
 Female           Sarah is not progressing well          . . . decides to take her to the bigger
                  during labor, and the staff in         hospital as advised by the hospital
                  the local clinic call for help at      staff, there emergency care is
                  a bigger hospital. They tell her       given to her by the doctor,
                  husband, who. . .                      and she safely delivers a baby
                                                         boy.
 Male             Asah runs a clinic in his village.     . . . that in the clinic the women
                  He notices that recent mothers         do not receive appropriate medical
                  nearby are not bringing their          care, others also complained
                  children in for postnatal visits.      that their husbands don’t give
                  He asks around to find out why         them permission to go because
                  that is. He learns. . .                even if they go they are not
                                                         cared for.
Table 3.1 Examples of narrative prompts and story completions from male and female
perspectives
    Table 3.1 shows two examples of a narrative prompt¹ intended for a respondent and a
story completion from one of the respondents. Factors present in both the prompt and the
response are highlighted in colors representing their categorization according to the precede-
proceed model (red for predisposing, green for reinforcing, and blue for enabling).
    While the goal of the story completion approach is to instantiate otherwise implicit
knowledge and beliefs devoid of bias introduced by explicit lines of questioning, the scenarios
present in the prompts reflect factors a priori believed to bear significant weight on locals’
reasoning in the MCH context. The set of factors embedded in these hypothetical situations
included gender imbalance in household decision-making power in the region, prevailing low
levels of financial security, limited physical access to health care, and the common use of
traditional health practice in contrast to institutional care, among others.
   ¹ Note: predisposing in red, reinforcing in green, and enabling in blue.
    To collect the story completion data, the survey team randomly selected three wards² in
Gombe state. The target population within each ward was defined using a stratified cluster
sample. Survey enumerators then utilized a systematic walk to sample participants from the
target community. Apart from recording the story completions, enumerators collected a rich
set of demographic information for each respondent. Answers to 37 different demographic
questions were solicited, including individual biological, social and cultural characteristics,
family information, and living conditions. Figure 3.3 shows the frequency of demographic
categories for a subset of these questions.
    All experimental protocols were approved by Health Media Lab (HML), a U.S.-based IRB
registered with the US Department of Health and Human Services (DHHS) Office for Human Research Protections (OHRP), and by a
Health Research Ethics Committee (HREC) managed by the Gombe State Ministry of Health
(SMOH) in Gombe, Nigeria. The team carried out the research in precise accordance with
the proposed approach, using the consent forms, instruments, items, and materials reviewed
and approved by HML and the Gombe SMOH HREC, and in accordance with the legal and
research ethical guidelines and requirements as stated by HML and the Gombe State Ministry
of Health. The approach, materials, consent forms, instruments, items, and activities were
also reviewed and overseen by the Army Human Research Protection Office (HRPO) in
accordance with Department of Defense Human Subjects Research (HSR) requirements. All
participants were given study information including privacy protection and anticipated risk
information, as well as contact information for the study PI, the Gombe SMOH HREC, and
the Health Media Lab, and provided informed consent before participating in the study.
    ² Lowest level administrative unit in the state, usually centered around an anguwa (“town” or “village”)
                         Figure 3.3 Sample demographic frequencies
3.4.2.2    Recognizing Value Resonance Model
    Recognizing Value Resonance (RVR) is a natural language processing task concerned
with detecting implicit endorsement of, rejection of, or neutrality towards a certain belief
given a span of text (Benkler et al., 2022). Given a belief hypothesis H and textual premise
P, H ‘resonates’ with P if P communicates the speaker’s belief in H, ‘conflicts’ with P if
P communicates the speaker’s opposition to H, and is ‘neutral’ if neither is true. Benkler
et al. (2022) published a hand-annotated dataset, the World Values Corpus (WVC), and a
fine-tuned model, Resonance-Tuned RoBERTa, designed to model the task of RVR.
The value hypotheses in the WVC were derived from the World Values Survey
questions. We utilize a technique described in the study’s analysis of folktales to extract the
document level value-density vectors behind our cluster analyses presented in this paper.
    We extract document-level value density vectors as follows. First, each document is
parsed out sentence by sentence. Next, each sentence is paired with each of the 384 value
hypotheses from the WVC. Each ⟨premise sentence, hypothesis⟩ pair is then scored for
RVR using Resonance-Tuned RoBERTa. Finally, for each value hypothesis we calculate the
proportion of sentences in which it was scored as “resonates” and the proportion of sentences
in which it was scored as “conflicts.” This returns two vectors of length 384, each of which
gives the document-level density of a single RVR score for every value hypothesis in
the WVC ($H_1, \ldots, H_{384}$). We then prepend the “resonates” vector to the “conflicts” vector
to create a vector of length 768 describing the sentence-level density distribution of all the
WVC values within each document.
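    The aggregation step just described can be sketched as follows. The sketch assumes the per-sentence labels have already been produced by the RVR model and are available as strings; the function name, the label strings, and the three-hypothesis toy document are illustrative assumptions, not part of the published pipeline.

```python
def value_density_vector(sentence_labels, n_hypotheses):
    """Build a document-level density vector from per-sentence RVR labels.

    sentence_labels[s][h] is the label assigned to hypothesis h for sentence s:
    "resonates", "conflicts", or "neutral".
    Returns the "resonates" proportions followed by the "conflicts" proportions.
    """
    n_sentences = len(sentence_labels)
    if n_sentences == 0:
        return [0.0] * (2 * n_hypotheses)
    resonates = [0] * n_hypotheses
    conflicts = [0] * n_hypotheses
    for labels in sentence_labels:
        for h, label in enumerate(labels):
            if label == "resonates":
                resonates[h] += 1
            elif label == "conflicts":
                conflicts[h] += 1
    return ([count / n_sentences for count in resonates]
            + [count / n_sentences for count in conflicts])

# Toy document: 4 sentences scored against 3 hypotheses (the WVC has 384)
labels = [
    ["resonates", "neutral", "conflicts"],
    ["resonates", "neutral", "neutral"],
    ["neutral", "neutral", "conflicts"],
    ["resonates", "conflicts", "neutral"],
]
vector = value_density_vector(labels, n_hypotheses=3)
# vector == [0.75, 0.0, 0.0, 0.0, 0.25, 0.5]
```

    With 384 hypotheses the same function would return the length-768 vector described in the text.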
    We focus on value hypotheses that associate with religion, social and gender roles, insti-
tutions, as well as self-expression, safety, and tradition (see Appendix 3.8 for full list). In
an attempt to reduce the dimensionality of the space, we calculate the story level density
distribution of all the WVC values and select values that resonate or conflict, at minimum,
in a fraction q of the story completions as dimensions along which the k-means clustering is
to be performed. For the experiments which we detail in this work we used q = 0.05.
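    A minimal sketch of this filtering step is given below. It assumes that "resonates" and "conflicts" densities are treated as separate dimensions and that a value counts as present in a story completion whenever its density is non-zero; both are interpretive assumptions, as are the function names and the toy data.

```python
def select_dimensions(vectors, q=0.05):
    """Indices of dimensions that are non-zero in at least a fraction q of documents."""
    n_docs = len(vectors)
    n_dims = len(vectors[0])
    keep = []
    for j in range(n_dims):
        support = sum(1 for v in vectors if v[j] > 0) / n_docs
        if support >= q:
            keep.append(j)
    return keep

def project(vectors, keep):
    """Restrict every vector to the retained dimensions."""
    return [[v[j] for j in keep] for v in vectors]

# Toy data: 4 documents, 3 dimensions, with a deliberately high q for illustration
docs = [
    [0.50, 0.0, 0.0],
    [0.25, 0.0, 0.1],
    [0.00, 0.0, 0.0],
    [0.75, 0.0, 0.0],
]
keep = select_dimensions(docs, q=0.5)
reduced = project(docs, keep)
# keep == [0]; only the first dimension is present in at least half of the documents
```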
3.4.3    Clustering
3.4.3.1     Overview
    Clustering is an unsupervised machine learning approach to pattern recognition that as-
signs a finite set of labels to data points relative to some measure of (dis)similarity. Formally,
given a dataset, $\mathcal{X}$, and data points, $x_i, x_j \in \mathcal{X}$, clusters are generated based upon,
\[
  d : \mathcal{X} \times \mathcal{X} \to \mathbb{R}_{\geq 0}, \qquad (x_i, x_j) \mapsto d(x_i, x_j),
\]
a mapping symmetric in its arguments (Hastie et al., 2009). In Euclidean space, for instance,
we have $d(x_i, x_j) = \|x_i - x_j\|_2$. Clusters are then identified with respect to this local measure
based upon statistical, geometrical, and/or other globally defined decision heuristics. The
following introduces three unsupervised clustering methods: The traditional method of k-
means, an emerging method utilizing HDBSCAN and UMAP, and a method based upon
optimal transport and spectral clustering.
3.4.3.2      k-means
     The k-means clustering method is one of the most widely used classical tools in the
machine learning problem space. The aim of k-means is to partition a set of n observations
into k clusters such that each observation belongs to the cluster whose centroid (the mean
of its members) is closest to it in terms of the L2-norm. This problem is NP-hard; however,
efficient heuristics that converge to locally optimal solutions exist. The number of clusters k is a
free parameter whose “optimal” or “appropriate” value is context-dependent and often ill-
defined. The method assumes real-valued data on an interval scale; thus, when working with
categorical or ordinal data, mappings to interval scales are necessary.
     We use the k-means method here due to its widespread familiarity and relative acces-
sibility both in terms of applying the basic algorithm as well as interpreting the resulting
partition. Focusing on interpretability in particular, one may simply think of the clusters as
compact “clouds” of points in Euclidean (albeit often high-dimensional) space that highlight
the existing separation or gaps in the population along one or more dimensions.
     In the context of partitioning our target population (recent mothers and fathers in Gombe
state, Nigeria) into clusters based on the value sets that their members (implicitly) espouse,
we first determine the resonance of or conflict with each value from the WVC in all of the
collected narrative prompt story completions³ using the RVR model. For each WVC entry
(counting resonance and conflict with each value as two separate values), we introduce a new
data dimension, with numeric values of 1 (where WVC values resonate/conflict in a given
text) and 0 (where WVC values do not resonate in the given manner in a response).
    ³ After removing items with obvious data entry errors and null or otherwise invalid story completions.
     We then perform k-means clustering along the selected value resonance dimensions with
k = 1, 2, . . . , 20,⁴ and compute the sum of squared errors (SSE) between points and their
centroids for each k-partition. We then select a partition that lies on the “elbow” of the
partitions arranged in the two-dimensional (k, SSE) space, that is, the first partition into
k′ clusters such that the improvement in SSE between the k′ and k′ + 1 partitions is below
some threshold value.⁵
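    The elbow procedure above can be sketched in a few lines. The sketch uses a plain implementation of Lloyd's algorithm with random restarts as the k-means heuristic; the restart count, seed, threshold, and toy binary data are illustrative assumptions and do not reproduce the settings used in this study.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_sse(points, k, n_init=10, n_iter=100, seed=0):
    """Best SSE over n_init runs of Lloyd's algorithm (a local-optimum heuristic)."""
    rng = random.Random(seed)
    best = float("inf")
    for _ in range(n_init):
        centroids = rng.sample(points, k)
        for _ in range(n_iter):
            # Assignment step: each point joins its nearest centroid
            clusters = [[] for _ in range(k)]
            for p in points:
                nearest = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                clusters[nearest].append(p)
            # Update step: move each centroid to the mean of its members
            updated = [
                [sum(col) / len(members) for col in zip(*members)] if members else centroids[c]
                for c, members in enumerate(clusters)
            ]
            if updated == centroids:
                break
            centroids = updated
        sse = sum(min(squared_distance(p, c) for c in centroids) for p in points)
        best = min(best, sse)
    return best

def elbow(sse_by_k, threshold):
    """First k whose successor improves the SSE by less than threshold."""
    ks = sorted(sse_by_k)
    for k, k_next in zip(ks, ks[1:]):
        if sse_by_k[k] - sse_by_k[k_next] < threshold:
            return k
    return ks[-1]

# Toy binary resonance data: three groups of identical 0/1 patterns
points = ([[1, 0, 0] for _ in range(4)]
          + [[0, 1, 0] for _ in range(4)]
          + [[0, 0, 1] for _ in range(4)])
sse_by_k = {k: kmeans_sse(points, k) for k in range(1, 6)}
best_k = elbow(sse_by_k, threshold=0.5)
```

    Because Lloyd's algorithm only finds local optima, the SSE curve (and hence the selected k) can depend on the initialization, which is one more source of the subjectivity noted in the text.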
3.4.3.3       HDBSCAN & UMAP
     Uniform manifold approximation and projection (UMAP) and hierarchical density-based
spatial clustering of applications with noise (HDBSCAN) are two popular machine learning
algorithms used for dimensionality reduction and clustering tasks, respectively. UMAP is
a nonlinear dimensionality reduction algorithm that constructs a high-dimensional graph
representation of the data and then optimizes a low-dimensional projection of this graph,
preserving both global and local structure of the data (McInnes et al., 2018). HDBSCAN
is a clustering algorithm that automatically determines the number of clusters in a dataset,
unlike traditional clustering algorithms that require the user to specify the number of clusters
in advance. HDBSCAN identifies high density regions of data and groups together points in
these regions, making it highly useful for handling datasets with varying densities and noise
(McInnes et al., 2017). Both UMAP and HDBSCAN have gained popularity in recent years
due to their performance on large datasets (Allaoui et al., 2020; Blanco-Portals et al., 2022;
Pealat et al., 2021; Asyaky and Mandala, 2021).
     These algorithms are commonly used together in machine learning workflows for several
reasons. First, reducing the number of dimensions makes the data more manageable
for clustering algorithms like HDBSCAN, which can be computationally expensive on
high-dimensional data. Second, UMAP can improve the separation of clusters by preserving the
local structure of the data. This can make it easier for HDBSCAN to identify clusters
and reduce the likelihood of false positives or false negatives. Lastly, UMAP can help to
reduce noise in the data by separating out irrelevant features. This can help to improve the
accuracy of the clustering results by reducing the impact of noisy data points. In short, by
combining these two techniques, it is possible to gain a more comprehensive understanding of
the structure of high-dimensional datasets and identify meaningful patterns and relationships
within the data.
     ⁴ Higher values were not tested under the assumption that such high numbers of clusters would be of
little value to analysts pursuing further insight into cognito-cultural determinants of MCH utilization (and
most other domains for that matter).
     ⁵ This choice is, once again, subjective and to some extent arbitrary.
    The use of UMAP and HDBSCAN together is promising for clustering analysis of our
RVR output vectors for three reasons. Firstly, HDBSCAN is robust with respect to noise,
reducing the potential for noisy input documents to bias the output clusters. Secondly,
since HDBSCAN does not require pre-specification of the number of output clusters, it may
be less influenced by researcher bias. Thirdly, UMAP enables high-dimensional data to be
represented in fewer dimensions, which supports the density-based clustering approach of
HDBSCAN. To grasp this last benefit, observe that our RVR output vectors include both
‘resonant’ and ‘conflict’ label densities for all hypotheses, and clustering the raw vectors
may be problematic due to multicollinearity. UMAP can reduce our RVR vectors to two-
dimensional coordinate pairs, with the x-axis representing the document position in the
UMAP ‘resonant’ space, and the y-axis representing the position in the ‘conflict’ space.
HDBSCAN can, then, cluster RVR vectors based on their location in a two-dimensional
‘resonant’ vs. ‘conflict’ space.
    Below we describe our process using UMAP and HDBSCAN to cluster the RVR output
vectors. Let RVR denote the set of all RVR output vectors for all the narrative documents we
process. For HDBSCAN, an individual RVR output vector for document $d$ is conceptualized
as $\mathrm{rvr}_d = [r_1, \ldots, r_{45}, c_1, \ldots, c_{45}]$, where $r_h$ is the proportion of sentences in document $d$
in which hypothesis $h$ is labeled ‘resonant,’ and $c_h$ is the proportion of sentences in which
hypothesis $h$ is labeled ‘conflicts.’ We begin this clustering approach by dividing the space
into two subspaces, $\mathrm{RVR}_r$, the ‘resonant’ set of all RVR output vectors, and $\mathrm{RVR}_c$, the
‘conflicting’ set of all RVR output vectors. We then use UMAP to approximate each of
these spaces independently and return a one-dimensional representation for each vector’s
position in its corresponding reduced space. We then combine these representations to form
the set of coordinate pairs $\mathrm{RVR}_{\mathrm{reduced}}$ representing all the narrative documents’ positions in
the UMAP-reduced ‘resonant’ vs. ‘conflicting’ space. We then utilize HDBSCAN with the
Minkowski distance metric to cluster this set of coordinate pairs.
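    The procedure above might be sketched as follows using the third-party umap-learn and hdbscan packages. This is a sketch under stated assumptions rather than the code used in this study: the function names, `min_cluster_size`, the Minkowski order `p`, and the random seed are all hypothetical choices, since the text does not report them.

```python
def split_subspaces(rvr_vectors, n_hypotheses):
    """Split each [r_1..r_n, c_1..c_n] vector into its resonant and conflicting halves."""
    resonant = [v[:n_hypotheses] for v in rvr_vectors]
    conflicting = [v[n_hypotheses:2 * n_hypotheses] for v in rvr_vectors]
    return resonant, conflicting

def cluster_documents(rvr_vectors, n_hypotheses, min_cluster_size=5, p=2, seed=0):
    """Reduce each half to 1-D with UMAP, then cluster the (x, y) pairs with HDBSCAN."""
    # Third-party packages (numpy, umap-learn, hdbscan), imported lazily so that
    # split_subspaces can be used without them installed.
    import numpy as np
    import umap
    import hdbscan

    resonant, conflicting = split_subspaces(rvr_vectors, n_hypotheses)
    x = umap.UMAP(n_components=1, random_state=seed).fit_transform(np.asarray(resonant))
    y = umap.UMAP(n_components=1, random_state=seed).fit_transform(np.asarray(conflicting))
    reduced = np.hstack([x, y])  # one (resonant, conflicting) pair per document

    # p is an assumed Minkowski order; p=2 is equivalent to Euclidean distance
    clusterer = hdbscan.HDBSCAN(min_cluster_size=min_cluster_size,
                                metric="minkowski", p=p)
    return clusterer.fit_predict(reduced)  # a label of -1 marks noise points
```

    Calling `cluster_documents(vectors, n_hypotheses=45)` on the full set of document vectors would return one cluster label per document, with documents that HDBSCAN treats as noise labeled -1.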
3.4.3.4     Optimal Transport & Diffusion Maps
    To amplify similarities and differences based upon the RVR data structure, we introduce
a novel method based upon optimal transport (OT) theory. For this method, individual value
resonance densities are conceptualized as normalized histograms encoding the frequency of
‘resonance,’ ‘neutrality,’ and ‘conflict’ with a particular value hypothesis by all sentences
within a set of narrative documents produced by an individual participant.
    OT was introduced to model the efficient allocation of resources (Peyré et al., 2019).
Given two probability distributions on a metric space, it provides the most efficient plan for
transporting (or transforming) one into the other subject to some costs of doing so. With this
transport plan comes a distance quantifying the total cost, the optimal transport distance.
Intuitively, given a mound of soil and a hole of equal volume, this distance gives the lowest
cost for filling the hole with the soil. Larger distances correspond to higher costs. Formally,
given two value resonance densities µ and ν, a symmetric matrix M encoding transport costs,
and denoting the set of all couplings with marginals µ and ν as Π (µ, ν), the OT distance
solves,
\[
  W(\mu, \nu) = \min_{\gamma \in \Pi(\mu, \nu)} \langle \gamma, M \rangle_F,
\]
where ⟨·, ·⟩F denotes the Frobenius inner product. In the present context, it ought to cost
more to change a narrative ‘resonance’ into a ‘conflict’ (and vice versa) than into a neutrality.
That is, expressing, “I believe in God,” is further from, “I do not believe in God,” than it is
from silence on the matter. This is assured by defining a cost matrix, M , for which there is an
order of magnitude difference between the cost of transporting from ‘resonate’ to ‘conflict’
(and vice versa) and the cost of transporting between ‘resonate’ or ‘conflict’ and ‘neutrality,’
e.g.,
\[
  M = \begin{pmatrix} 0 & 10 & 100 \\ 10 & 0 & 10 \\ 100 & 10 & 0 \end{pmatrix},
\]
where ordered rows/columns correspond to ‘resonance,’ ‘neutrality,’ and ‘conflict.’ To am-
plify areas of greatest participant-expressed diversity within data, each pairwise value hy-
pothesis distance is weighted according to the sample population-level entropy defined below.
Then, given these locally defined OT distances between value resonance densities, the data
is macroscopically organized for clustering via spectral embedding.
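    As a small worked illustration of how the cost matrix M shapes the OT distance, consider the following densities over (resonance, neutrality, conflict). The densities are hypothetical and chosen only for arithmetic convenience; the entropy weighting is omitted.

```latex
% Point masses admit a single coupling, so the distance is the matching entry of M.
\mu = (1, 0, 0), \quad \nu_1 = (0, 1, 0), \quad \nu_2 = (0, 0, 1)
\quad\Longrightarrow\quad
W(\mu, \nu_1) = M_{12} = 10, \qquad W(\mu, \nu_2) = M_{13} = 100.

% Mixed densities: every coupling of \mu' and \nu' has the form \gamma_t.
\mu' = \left(\tfrac{1}{2}, \tfrac{1}{2}, 0\right), \quad
\nu' = \left(0, \tfrac{1}{2}, \tfrac{1}{2}\right), \quad
\gamma_t = \begin{pmatrix}
  0 & \tfrac{1}{2} - t & t \\
  0 & t & \tfrac{1}{2} - t \\
  0 & 0 & 0
\end{pmatrix}, \quad t \in \left[0, \tfrac{1}{2}\right].

% The transport cost is linear in t and is minimized at t = 0.
\langle \gamma_t, M \rangle_F
  = 10\left(\tfrac{1}{2} - t\right) + 100\,t + 0 \cdot t + 10\left(\tfrac{1}{2} - t\right)
  = 10 + 80\,t
\quad\Longrightarrow\quad
W(\mu', \nu') = 10.
```

    In the mixed case the optimal plan moves each half unit of mass one step (resonance to neutrality, neutrality to conflict) and avoids the expensive direct move from resonance to conflict.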
    Spectral clustering refers to a family of methods that cluster data by applying k-means
not to the data coordinates themselves, but to the coordinates of an embedding of the data
into a Euclidean space where distances correspond to more general metrics on the data
(Hastie et al., 2009). The embedding functions map similar data points nearer to each other
and dissimilar points further apart via the coordinates of eigenfunctions of Laplacian type
operators on inner product spaces of functions defined on the data points.
    Initial exploratory analyses indicated the global geometry of the data to be influenced by
regions of high sampling density, which motivated a density normalized approach inspired
by diffusion maps (Coifman and Lafon, 2006). The diffusion maps framework accounts for
sampling by normalizing via kernel density estimation. This allows for the partial disentan-
glement of statistics and geometry and thereby facilitates separation based upon differences
among features and coalescence upon similarity. A diffusion map treats each data point not
as a point, but as a one-parameter family of probability densities that can be thought of
as ‘fuzzy’ combinations of data points and their neighbors. This yields a nested collection
of distances that correspond to the overlap (or lack thereof) of posterior densities of t-step
random walks centered at each data point on the value resonance manifold.
    To implement diffusion maps, we convert the OT distances–measures that increase with
greater dissimilarity–to affinities, which increase with greater similarity. We employ the
standard kernel machine method of exponentiating the negative of a scaled squared metric,
sometimes called a radial basis function when applied to Hilbertian metric spaces (Smola
and Schölkopf, 1998). This affinity takes the form,
\[ K(\mu, \nu) = e^{-d(\mu, \nu)^2 / \epsilon}, \]
where d (·, ·) is the entropy-weighted linear combination of OT distances between value res-
onance densities, µ and ν, corresponding to two participants. That is,
\[ d(\mu, \nu) = \sum_{h \in H} \alpha_h \, d_h(\mu, \nu), \]
where H is the set of value hypotheses, dh (·, ·) is the OT distance between the hth value
hypothesis density of value resonance densities µ and ν, and αh is the normalized entropy
associated to the hth value hypothesis. Normalization ensures $\sum_{h \in H} \alpha_h = 1$, which is
standard when defining data-adapted dissimilarity measures (Hastie et al., 2009). We observe
that taking $\alpha_h = 1/|H|$ for all h yields an aggregate distance, d (·, ·), that is essentially the OT
distance between densities on the product space of the value hypotheses with infinite cost
of transporting between different hypotheses. ϵ is a small positive hyperparameter, called
the bandwidth, that must be tuned. We do so by the multiscale eigenvalue approach to
spectral clustering introduced in Little et al. (2020). This simultaneously tunes ϵ and K̂,
the estimated number of clusters, by considering eigenvalues, λk , of the operator introduced
below as a function of ϵ (note: De Plaen et al. (2020) prove there is a regime of ϵ for which
this kernel is positive semi-definite).
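    The conversion from per-hypothesis OT distances to an affinity can be sketched as follows. This is a hedged illustration only: the entropy used for weighting is defined elsewhere in the chapter, so the sketch assumes Shannon entropy of a pooled three-category density per value hypothesis, and the function names, hypothesis labels, distances, and bandwidth are all hypothetical.

```python
import math


def shannon_entropy(density):
    """Shannon entropy (natural log) of a discrete density."""
    return -sum(p * math.log(p) for p in density if p > 0)


def entropy_weights(pooled_densities):
    """Normalized entropy weights alpha_h, which sum to 1 over hypotheses.

    Assumption: one pooled density per value hypothesis, keyed by label.
    """
    entropies = {h: shannon_entropy(d) for h, d in pooled_densities.items()}
    total = sum(entropies.values())
    if total == 0:  # degenerate case: fall back to uniform weights
        return {h: 1.0 / len(entropies) for h in entropies}
    return {h: e / total for h, e in entropies.items()}


def aggregate_distance(per_hypothesis_distances, weights):
    """d(mu, nu) = sum_h alpha_h * d_h(mu, nu)."""
    return sum(weights[h] * per_hypothesis_distances[h] for h in weights)


def affinity(distance, epsilon):
    """K(mu, nu) = exp(-d(mu, nu)^2 / epsilon)."""
    return math.exp(-(distance ** 2) / epsilon)


# Illustrative inputs (hypothetical labels and values, not study data).
pooled = {"H1": [0.1, 0.8, 0.1], "H2": [1 / 3, 1 / 3, 1 / 3]}
alpha = entropy_weights(pooled)
d_pair = aggregate_distance({"H1": 0.2, "H2": 0.4}, alpha)
k_pair = affinity(d_pair, epsilon=0.09)
```

    Here the more diverse hypothesis, H2, receives the larger weight, so differences on that hypothesis contribute more to the aggregate distance between two participants.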
    Following the diffusion maps approach, we form the diffusion operator, P = D−1 K,
where D is the diagonal matrix of row sums of K. The ith row of P is a probability density
centered at data point xi ∈ X . Powers of P are taken to denoise the data wherein it acts as
a low pass filter. This denoising approach has been similarly applied to the study of noisy,
high-dimensional biological data (Moon et al., 2019). Powering P simultaneously spreads each
probability density over neighboring data points and attenuates high frequency fluctuations,
allowing for dominant meso- and macroscopic patterns to emerge. The eigenfunctions of P
are orthogonal on a weighted inner product space of functions on the data and provide the
embedding coordinates for clustering via k-means. The hypothesized number of clusters,
K̂, is chosen by observing the largest difference between adjacent eigenvalues, known as
the spectral gap statistic, for which all other eigenvalues are small (note: this formalism
corresponds to observing the bottom eigenvalues of I−P, which are 1−λk for all λk ∈ Λ (P),
the spectrum of P). This is known to be optimal under certain assumptions when a ground
truth exists (Little et al., 2020), and it is a sensible geometrically informed heuristic in
general.
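    The construction of P, its powering, and the spectral gap heuristic can be sketched in a few lines. This is a simplified illustration rather than the multiscale procedure of Little et al. (2020): the function names and the threshold small are hypothetical, the kernel matrix is a toy example, and the eigenvalues of I − P are assumed to be supplied by a separate numerical eigensolver.

```python
def diffusion_operator(K):
    """Row-normalize an affinity matrix: P = D^{-1} K."""
    return [[v / sum(row) for v in row] for row in K]


def _matmul(A, B):
    inner, cols = len(B), len(B[0])
    return [[sum(row[k] * B[k][j] for k in range(inner)) for j in range(cols)]
            for row in A]


def diffusion_power(P, t):
    """Compute P^t by repeated squaring (t-step random walk)."""
    n = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    base = P
    while t > 0:
        if t & 1:
            result = _matmul(result, base)
        base = _matmul(base, base)
        t >>= 1
    return result


def estimate_num_clusters(eigenvalues, small=0.1):
    """Spectral gap heuristic on the bottom eigenvalues of I - P.

    Returns the k maximizing the gap between the k-th and (k+1)-th smallest
    eigenvalues, subject to the k-th eigenvalue being at most `small`.
    """
    eigs = sorted(eigenvalues)
    best_k, best_gap = None, -1.0
    for k in range(1, len(eigs)):
        gap = eigs[k] - eigs[k - 1]
        if eigs[k - 1] <= small and gap > best_gap:
            best_k, best_gap = k, gap
    return best_k


# Toy affinity matrix with two tight groups (hypothetical values).
K = [[1.0, 0.9, 0.01, 0.01],
     [0.9, 1.0, 0.01, 0.01],
     [0.01, 0.01, 1.0, 0.9],
     [0.01, 0.01, 0.9, 1.0]]
P = diffusion_operator(K)
P8 = diffusion_power(P, 8)

# Illustrative eigenvalues of I - P, assumed to come from an eigensolver.
k_hat = estimate_num_clusters([0.0, 0.01, 0.02, 0.03, 0.6, 0.65, 0.7])
```

    In the toy example, eight steps of the random walk equalize the probabilities within each group while the two groups remain separated, which is the low pass filtering behavior described above.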
3.4.4    Cluster Comparative Analysis
3.4.4.1    Within-Method
    Clustering methods are assessed with respect to 1) the generated clusters, 2) the between-
cluster statistics of demographic and value resonance features, and 3) SHAP values, a method
intended to increase model interpretability.
    To understand similarities and differences between clusters, we consider between-cluster
statistics of demographic features and value hypothesis resonances. These assess the degree
to which clustered subpopulations may be differentiated according to objectively verifiable
information. Demographic metadata associated with each participant is held out from clustering.
Dummy variables are generated for respondent categorical demographics, and scalar
demographic features are centered and standardized to unit variance. Cluster means are then
compared, and demographic variables are identified for which this contrast is significant for
at least one pair. The tested demographics include (a full list is provided in Appendix
3.8):
   1. Age of respondent
   2. Number of children in household
   3. Monthly income
   4. Religion (Christian, Islam)
   5. Marital status (married monogamous, married polygamous)
   6. Urban or rural household
Between-cluster differences in value hypothesis resonance and conflict are also tested.
    To gain qualitative insight into the clusters and methods, a surrogate model is intro-
duced. Surrogate models are simplified models built to approximate the behavior of complex,
black-box algorithms and provide insight into their predictions. They are generally used in
explainable AI to help humans understand and interpret the predictions of the complex
models (Danilevsky et al., 2020). We employ a surrogate modeling system to understand,
explain, and interpret clustering results.
    The first step in our surrogate modeling system is training a random forest classifier to
predict the cluster label of each document using its corresponding value density vector as
the random forest’s input. A random forest (RF) classifier is an ensemble learning method
that creates a large number of decision trees, each of which is trained on a subset of the
training data and a random subset of the features. This randomness helps reduce overfitting
and increases the generalization performance of the model. An RF prediction is made by
aggregating the predictions of all individual decision trees. The main advantages of the RF
algorithm are its relative ease of use, minimal hyperparameter tuning required, robustness
to outliers and missing data, and high explainability.⁶
    After constructing our random forest surrogate model we apply a model explainability
technique called SHAP (SHapley Additive exPlanations). SHAP is a technique for explain-
ing the predictions of machine learning models (Lundberg and Lee, 2017). SHAP values
are based on the concept of Shapley values from cooperative game theory, which assign a
contribution to each player in a coalition game based on their marginal contribution to the
coalition’s success. SHAP values decompose model predictions into linear contributions from
each feature. The SHAP value of a feature is the average change, over all possible feature
combinations, in a prediction as the feature value is varied.
    ⁶ It is worth qualifying that RF classifiers are complex for humans to interpret given the forest complexity,
but highly explainable with mechanized processing.
    The SHAP values of our random forest surrogate model communicate the contribution
of each feature (value hypothesis) to classifying a document as belonging to each cluster ac-
cording to the random forest approximation of our clustering methods. SHAP value outputs
indicate:
   1. The most important features in document classification, ranked both globally (across
      all clusters) and locally (within-clusters).
   2. The local direction and strength of feature impact. A positive SHAP value indicates
      a positive contribution to classification in the relevant cluster; a negative SHAP value
      indicates a feature negatively contributes.
   3. The magnitude of the SHAP value, which indicates the strength of the feature’s con-
      tribution to a document’s classification.
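    To make the underlying Shapley computation concrete, the sketch below evaluates exact Shapley values for a toy three-feature value function. It illustrates the game-theoretic definition only and is not the SHAP library or the procedure applied to the random forest surrogate: the feature labels, the “.CON” suffix, and the value function are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values: weighted average marginal contribution of each
    player over all coalitions of the remaining players."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                total += weight * (value(coalition | {i}) - value(coalition))
        phi[i] = total
    return phi


# Hypothetical additive effects plus one interaction between two features.
BASE = {"H1.RES": 0.3, "H2.CON": 0.1, "H3.RES": 0.0}


def toy_value(coalition):
    """Toy 'model output' when only the features in `coalition` are known."""
    total = sum(BASE[f] for f in coalition)
    if "H1.RES" in coalition and "H2.CON" in coalition:
        total += 0.2  # interaction shared equally by the two features
    return total


phi = shapley_values(list(BASE), toy_value)
```

    The resulting values sum to the difference between the full-coalition output and the empty-coalition output, which is the additivity property that allows SHAP values to decompose a prediction into per-feature contributions.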
3.4.4.2    Between-Method
    We introduce the Jaccard score to quantify similarities between clustering methods as a
function of consistencies between clusters as sets. Given two clusterings of the same data
set, A and B, the Jaccard score between two clusters a ∈ A and b ∈ B is defined as,
                                                    |a ∩ b|
                                        J (a, b) =          .
                                                    |a ∪ b|
The Jaccard similarity score maps cluster pairs to the interval [0, 1], with 0 assigned to
clusters that share no members and 1 to identical clusters.
    To evaluate the similarity of different clustering methods based on predicted relative fea-
ture importance in cluster assignment, we utilize pairwise Rank Biased Overlap (RBO) scores
(Webber et al., 2010). We first create ordered lists of features for each clustering algorithm
using the average impact of each feature on model output magnitude (mean(|SHAP|)). We
then evaluate the pairwise similarity between each pair of lists using an RBO score. The RBO
score is a robust measure of similarity between two rankings that considers rank positions
and depth of overlap between two lists. To account for top-weightedness, we specify that
the top 10 most important features should be weighted as 85% of the final similarity score.
RBO scores range from 0 to 1, with higher values indicating greater similarity between the
two rankings.
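    The RBO computation can be sketched as below, using the extrapolated form of Webber et al. (2010) for two equal-length rankings. The persistence parameter p is not stated in the text; the sketch assumes p = 0.9, for which the weight formula of Webber et al. assigns roughly 86% of the total weight to the top 10 ranks, close to the 85% specified above. Function names and feature labels are hypothetical.

```python
import math


def rbo_ext(ranking_a, ranking_b, p=0.9):
    """Extrapolated RBO for two equal-length rankings without duplicates."""
    if len(ranking_a) != len(ranking_b):
        raise ValueError("this sketch assumes equal-length rankings")
    depth = len(ranking_a)
    seen_a, seen_b = set(), set()
    weighted_sum = 0.0
    agreement = 0.0
    for d in range(1, depth + 1):
        seen_a.add(ranking_a[d - 1])
        seen_b.add(ranking_b[d - 1])
        agreement = len(seen_a & seen_b) / d   # overlap proportion at depth d
        weighted_sum += agreement * p ** d
    return agreement * p ** depth + (1 - p) / p * weighted_sum


def top_weight(p, d):
    """Share of the total RBO weight carried by the top d ranks."""
    partial = sum(p ** i / i for i in range(1, d))
    return 1 - p ** (d - 1) + (1 - p) / p * d * (math.log(1 / (1 - p)) - partial)


# Illustrative feature rankings by mean(|SHAP|) (hypothetical labels).
ranking_1 = ["H1.RES", "H2.RES", "H3.RES"]
ranking_2 = ["H2.RES", "H1.RES", "H3.RES"]
similarity = rbo_ext(ranking_1, ranking_2, p=0.9)
weight_top_10 = top_weight(0.9, 10)
```

    Because the weights decay geometrically with rank, a swap among the top features lowers the score more than the same swap further down the lists.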
3.5    Results
3.5.1    Exploratory Analysis
    Densities of mean value hypothesis coefficient magnitudes are depicted in Figure 3.4. Ver-
tical bars are the sample means for each value hypothesis. We see the majority of narratives
are identified by the RVR model as neutral to most value hypotheses. This suggests that
expressions of resonance and/or conflict with value hypotheses are the most informative, i.e.,
data regions where participant expressed diversity is greatest.
Figure 3.4 Sample level value hypothesis densities, averaged over the sample for each value
hypothesis
    In all of the following, clusters are sequentially numbered by size with 0 corresponding
to the largest cluster.
3.5.2    k-means
3.5.2.1    Overview
    Employing the elbow-plot heuristic to choose a reasonable number of clusters, we observe
that the sum of squared errors (SSE) drops off less rapidly when moving from 4 to greater
numbers of clusters. Hence, the number of clusters is estimated as K̂ = 4.
3.5.2.2    Statistical Differences Between Clusters
Demographics Demographic characteristics for which there is at least one statistically
significant difference between clusters (p < 0.1) appear in Figure 3.5. These include respon-
dent sex, highest level of educational attainment (post-secondary), occupation (business),
age of head of household, and number of deceased children.
               Figure 3.5 Significant between-cluster demographics for k-means
    Clusters 0 and 1 exhibit a larger proportion of female respondents than clusters 2 and 3.
Cluster 2 respondents report younger heads of household than larger clusters. As compared
with larger clusters, cluster 3 respondents report more post-secondary education and fewer
deceased children, with a smaller proportion identifying as businesspersons.
Value Hypotheses         Displayed in Figure 3.6 are cluster mean value resonances and conflicts
for which at least one between-cluster difference is statistically significant (p < 0.1).
Clusters are ordered largest to smallest, left to right.
                     Figure 3.6 Significant value hypotheses for k-means
     K-means identifies a largest, neutral cluster for which the magnitude of most coefficients
is small. This contrasts with cluster 3, the smallest, which exhibits largest overall coeffi-
cient magnitudes, particularly for value hypotheses related to social roles and tradition. The
greater proportion of male respondents in cluster 2 coincides with larger value hypothesis co-
efficients favoring men. The larger magnitudes could indicate a tendency towards expression
of values.
3.5.2.3    Cluster Explainability
    Figure 3.7 shows mean SHAP value magnitudes aggregated across clusters and ordered
from largest. All value hypotheses are notably resonances, denoted with the suffix “.RES.”
These are largely related to gender roles, social actors, and healthcare services.
                        Figure 3.7 Ordered SHAP values for k-means
    As with aggregate SHAP values, individual clusters present largest SHAP values for
resonances (.RES). Furthermore, value hypotheses with strongest impact relate to gender
and healthcare roles. Smaller clusters exhibit larger resonance magnitudes (bright red in
the SHAP plots) associated with cluster membership, qualitatively reiterating the statistical
findings in Figure 3.5. SHAP values associated with cluster 3 notably emphasize the knowledge
of social actors, whereas cluster 2 emphasizes gender roles. Clusters 0 and 1 exhibit mixtures
of these.
                     Figure 3.8 Top 5 SHAP values for k-means cluster 0
                      Figure 3.9 Top 5 SHAP values for k-means cluster 1
                      Figure 3.10 Top 5 SHAP values for k-means cluster 2
                      Figure 3.11 Top 5 SHAP values for k-means cluster 3
3.5.3    HDBSCAN & UMAP
3.5.3.1    Statistical Differences Between Clusters
Demographics Demographic characteristics for which at least one between-cluster differ-
ence is statistically significant (p < 0.1) appear in Figure 3.12. HDBSCAN identifies a signal
related to respondent occupation and number of children under 18 years of age or number
of children in school. Cluster 4 is associated with a larger proportion reporting agricultural
occupations, whereas cluster 3 has a larger representation of teachers. Cluster 2 exhibits a
larger proportion of respondents reporting trade as occupation and primary education as the
highest level of attainment. There is also some indication of cluster 2 respondents parenting
younger children.
             Figure 3.12 Significant between-cluster demographics for HDBSCAN
Value Hypotheses          Per-cluster mean value resonances and conflicts showing at least one
statistically significant difference between clusters (p < 0.1) appear in Figure 3.13. Clusters
are ordered largest to smallest, left to right. Cluster 1 exhibits, on average, a stronger conflict
with “the government should prohibit people from immigrating.” Clusters 0 and 4 resonate
with “it is not important to know about science in one’s daily life.” Cluster 3 indicates a
tendency towards neutrality with the notable exception of conflict with “it is not important
to know about science in one’s daily life.” Cluster 4 exhibits the highest degree of resonance-
conflict heterogeneity.
3.5.3.2     Cluster Explainability
    Figure 3.14 indicates cluster membership is broadly accounted for by conflict with the
value hypotheses “the government should prohibit people from immigrating” and “it is not
important to know about science in one’s daily life.” These accumulated coefficient
magnitudes eclipse all others. Among top represented value hypotheses, both resonance and
conflict appear.
                  Figure 3.13 Significant value hypotheses for HDBSCAN
                      Figure 3.14 Ordered SHAP values for HDBSCAN
    Clusters 0 and 1 differ by the SHAP value magnitude of conflict with “the govern-
ment should prohibit people from immigrating,” with cluster 1 showing larger values. This
between-cluster pattern is repeated for conflict with “it is not important to know about
science in one’s daily life” and resonance with “one possesses great deal of freedom of choice
and control over family planning,” and furthermore with each indicated value hypothesis.
There is notable across-cluster consistency in the top value hypotheses identified by SHAP
analysis.
                    Figure 3.15 Top 5 SHAP values for HDBSCAN cluster 0
                    Figure 3.16 Top 5 SHAP values for HDBSCAN cluster 1
                    Figure 3.17 Top 5 SHAP values for HDBSCAN cluster 2
   SHAP values for institution-, knowledge-, and gender-related value hypotheses are larger
overall. Clusters 0 and 1 present complementary SHAP values, where larger magnitudes
are associated with membership in 1. Clusters 2 and 3 show large SHAP values for greater
conflict with “it is not important to know about science in one’s daily life,” but complement
each other on gender role hypotheses. Cluster 4 SHAP values indicate the importance
of value hypothesis conflicts to cluster membership. This reiterates statistical results of
demographics.
                Figure 3.18 Top 5 SHAP values for HDBSCAN cluster 3
                Figure 3.19 Top 5 SHAP values for HDBSCAN cluster 4
3.5.4    Optimal Transport & Diffusion Maps
3.5.4.1    Overview
    To identify value hypotheses for which diversity of value resonance is greatest, we combine
all participant value resonances across each value hypothesis, compute the entropy of each,
and represent in Figure 3.20 (note: density plots utilize Gaussian kernels and are intended
as qualitative visual representations rather than quantitative density estimates). Higher
resonance entropy at the upper tail of this distribution corresponds to greater diversity
and suggests more informative value hypotheses. To inform selection of the bandwidth
parameter, ϵ, the distribution of weighted OT distances between each pair of resonance
densities is considered. ϵ should be large enough so that the underlying data graph remains
connected but not so large that important information is lost. Hence we explore a range
that lies within the highest density region of OT distances. Then the bottom 10 eigenvalues
of I − P and their differences are plotted as a function of ϵ in Figures 3.21 and 3.22.
                         Figure 3.20 Value hypothesis entropy density
Figure 3.21 Multiscale eigenvalues; darker colors correspond to lower frequency eigenvectors
Figure 3.22 Multiscale eigenvalue differences
    The largest gaps between adjacent eigenvalues for which all smaller eigenvalues are close
to zero occur between 4 and 5 as well as 7 and 8. These yield cluster number estimates
of K̂ = 4 and K̂ = 7; we find that these are robust to a large range of low-pass filtering
as parameterized by t, the number of times P is powered to smooth. A value of t = 256
is chosen as it falls on the lower end of this stability range, thus preserving more detail.
ϵ4 = 0.09 and ϵ7 = 0.045 are then identified at the corresponding peaks of the eigenvalue
differences.
3.5.4.2    Statistical Differences Between Clusters
Demographics Figure 3.23 displays demographic characteristics for which at least one
between-cluster difference is statistically significant for the 4-cluster outcome (p < 0.1).
Clusters 0 and 1 can be differentiated on the basis of sex. Cluster 0 has an approximate 60%
female representation and may be differentiated from clusters 1 and 2 by a greater prevalence
of members reporting no occupation (20% versus 9% and 10%, respectively).
         Figure 3.23 Significant between-cluster demographics for diffusion maps 4
    Significant demographic features for the 7-cluster outcome appear in Figure 3.24. We see
more culturally indicated features emerging with this greater granularity. Arabic education
accounts for a relatively large proportion of cluster 1 and is notable given the relative size of
the cluster and the relatively low prevalence of Arabic education in the data. This cluster
may be further differentiated from cluster 5 on the basis of number of children either living
or deceased, and from cluster 3 by a larger proportion reporting Islam for religion. Cluster
3 demographics suggest a more rural population with a larger proportion identifying as
Christian and reporting more secondary education as compared with larger clusters. There
is also a small trend towards younger heads of household and fewer young children when
moving from cluster 1 to smaller clusters. Cluster 5 participants report slightly lower income
than 4.
          Figure 3.24 Significant between-cluster demographics for diffusion maps 7
Value Hypotheses Displayed in Figure 3.25 and Figure 3.26 are per-cluster mean value
resonances and conflicts for which at least one between-cluster difference is statistically
significant for the 4 and 7 hypothesized clusters (p < 0.1), respectively.
    Cluster 1 exhibits larger resonance magnitudes with gender-related value hypotheses.
This correlates with the larger male representation identified through demographics. Clus-
ters 0, 1, and 2 exhibit similar patterns of value resonance, with cluster 0 exhibiting the
smallest coefficients. Clusters 1 and 3 show the largest conflict with “the government should
prohibit people from immigrating.” Cluster 3 has a more uniform distribution of coefficient
magnitudes with the exception of a large conflict with “it is not important to know about
science in one’s daily life.”
                 Figure 3.25 Significant value hypotheses for diffusion maps 4
    For the 7-cluster outcome, cluster 5 displays a visually striking resonance with “it is
not important to know about science in one’s daily life.” This is in contrast to cluster 3,
which shows a relatively strong conflict here. This cluster also exhibits a larger proportion
of Christians. Cluster 5 furthermore exhibits a moderate conflict with “the government
should prohibit people from immigrating.” Cluster 3 presents no conflict with “locally, one
can generally feel secure,” which is in contrast to the majority of clusters. Cluster 6 uniquely
and moderately conflicts with “the health services, clinic, or nearest hospital can be relied
upon to deliver” and slightly with “the midwives at the clinic have a great deal of knowledge
when it comes to family planning and childbirth.” Cluster 4 uniquely shows no resonance with
“men should help make decisions on major household purchases,” although those resonance
magnitudes are generally small across clusters. Cluster 1 exhibits overall smaller magnitude
coefficients. It also resonates least with many gender-related questions. Cluster 6 presents
the most resonance-conflict heterogeneity.
                 Figure 3.26 Significant value hypotheses for diffusion maps 7
3.5.4.3    Cluster Explainability
Diffusion Maps 4 Aggregate SHAP value magnitudes for the 4-cluster outcome (Fig-
ure 3.27) are largest for resonances with the exception of conflicts with “the government
should prohibit people from immigrating” and “it is not important to know about science in
one’s daily life.” Gender-related value hypotheses are also prominent. However, we observe
dominant representation by institution-related values.
                    Figure 3.27 Ordered SHAP values for diffusion maps 4
   SHAP values associated with clusters 0 and 1 fall along the lines of gender-related value
hypotheses, specifically resonances regarding decision making, with smaller magnitudes associated
with membership in cluster 0. This correlates with respondent sex. Cluster 2 SHAP
values indicate cluster membership is associated with lower levels of resonance and conflict
with “the government should prohibit people from immigrating” and more resonance with
gender-related value hypotheses. This is in contrast to other clusters, particularly cluster 3.
               Figure 3.28 Top 5 SHAP values for diffusion maps 4, cluster 0
               Figure 3.29 Top 5 SHAP values for diffusion maps 4, cluster 1
               Figure 3.30 Top 5 SHAP values for diffusion maps 4, cluster 2
                Figure 3.31 Top 5 SHAP values for diffusion maps 4, cluster 3
Diffusion Maps 7 There is a large representation by resonance with value hypotheses
in the aggregate SHAP value magnitudes (Figure 3.32). However, we also see conflict with
“the government should prohibit people from immigrating” and “it is not important to know
about science in one’s daily life” as well as “the health services, clinic, or nearest hospital
can be relied upon to deliver safe abortion.”
                    Figure 3.32 Ordered SHAP values for diffusion maps 7
    Cluster 0 SHAP values indicate membership is associated with greater magnitudes of
gender-related decision making value hypotheses and lesser magnitudes of government-related ones.
This is in contrast with cluster 1, where membership is associated with lower coefficient mag-
nitudes of resonances. Cluster 2 SHAP values emphasize institutions and safety, whereas
cluster 3 emphasizes knowledge, actors, and institutions. We observe that clusters 4 and
6 have large SHAP values associated with larger magnitude coefficients on single value hy-
potheses.
                Figure 3.33 Top 5 SHAP values for diffusion maps 7, cluster 0
                Figure 3.34 Top 5 SHAP values for diffusion maps 7, cluster 1
                Figure 3.35 Top 5 SHAP values for diffusion maps 7, cluster 2
Figure 3.36 Top 5 SHAP values for diffusion maps 7, cluster 3
Figure 3.37 Top 5 SHAP values for diffusion maps 7, cluster 4
Figure 3.38 Top 5 SHAP values for diffusion maps 7, cluster 5
Figure 3.39 Top 5 SHAP values for diffusion maps 7, cluster 6
3.5.5    Comparative Analysis
    Each of the three methods identifies a similar number of clusters, the size distributions of
which appear in Figure 3.40. Cluster size distributions of diffusion maps 4 and k-means are
remarkably similar. HDBSCAN partitions the respondent sample into two similarly sized
large clusters while identifying 3 smaller groups. Diffusion maps 7 generates more uniformly
sized clusters.
                    Figure 3.40 Cluster size distributions for each method
    The total number of pairwise statistically significant cluster mean value hypothesis res-
onances and conflicts identified by each approach is given in Figure 3.41.
            Figure 3.41 Number of significant pairwise cluster value hypotheses
3.5.5.1    Jaccard Score
    Jaccard similarity scores for each pair of methods are shown in Figures 3.42, 3.43, and
3.44. Sparser arrays with greater Jaccard scores signify greater agreement between methods.
We observe the most sparsity in heatmaps between HDBSCAN and other methods. This
may be partially attributed to the two large, primary clusters identified by HDBSCAN. The
largest Jaccard scores appear between diffusion maps 7 and HDBSCAN, diffusion maps 7
and k-means, and diffusion maps 4 and k-means. Diffusion maps 7 and HDBSCAN are the
most globally similar based upon sparsity and Jaccard score concentration.
        Figure 3.42 Jaccard similarity between diffusion maps 4 and other methods
Figure 3.43 Jaccard similarity between diffusion maps 7 and other methods
Figure 3.44 Jaccard similarity between k-means and HDBSCAN
3.5.5.2    Rank Biased Overlap
    Figure 3.45 presents the pairwise Rank Biased Overlap (RBO) similarity scores for each
pair of clustering algorithms. HDBSCAN appears in the most similar methodology pairing
alongside diffusion maps 7, and in the least similar when compared to both k-means and
diffusion maps 4. The 7 cluster diffusion maps algorithm showed the greatest levels of global
similarity to the other methods.
           Figure 3.45 Rank Biased Overlap (RBO) similarity scores for each pair
3.6   Discussion
    We observe the emergence of cognitographic clusters that express resonance and conflict
with values related to gender, decision making, institutions, social actors, and knowledge.
The methods can be arranged along a spectrum, with k-means and HDBSCAN falling at
the extremes, corresponding to greater emphasis on similarities and differences, respectively.
Diffusion maps may be viewed as a qualitative interpolation between them. This is supported
not only by the demographic analyses, RBO and Jaccard scores, but also by visual inspection
of SHAP plots, cluster counts, and cluster size distributions.
    At coarser scales, k-means finds clusters that are more readily differentiated by demo-
graphics and value hypotheses. The method also appears to be biased towards discriminating
largely on the most frequently embedded values and teasing out the differences in the degree
to which these commonly espoused values are present across clusters. This would also explain
why the values with the highest SHAP scores are exclusively resonances, as the data set as a
whole is heavily biased towards resonance, as opposed to conflicts. Because a large subset of
the most common values relate to gender roles, it is perhaps not surprising that differences
in alignment with such values would be most pronounced along the gender dimension, as
demonstrated by the demographic breakdown of the k-means clusters.
    At the greater granularities of diffusion maps 7 and HDBSCAN, we observe the emergence
of more culturally indicated features. The smaller clusters produced by these methods
exhibit large SHAP values for a few value hypotheses, suggesting greater heterogeneity that
coalesces around smaller numbers of beliefs. SHAP analyses of the diffusion maps methods and
of HDBSCAN place a large weight on “the government should prohibit people from immigrating.”
This is in contrast to the k-means SHAP analysis, where this hypothesis does not
appear. The HDBSCAN method apparently zeroes in on values with the largest variance of
resonance/conflict among the population (i.e., the most polarizing values). This would align
with “the government should prohibit people from immigrating” and “it is not important to
know about science in one’s daily life” being the two values assigned an overwhelming degree
of importance in determining cluster membership compared to the remaining set of values.
Because the maximally polarized values may not necessarily be well-represented throughout
the population (as is the case here), the resulting clusters may be skewed in size, and the
demographic differences may be more subtle and less interpretable. Finally, the diffusion
maps method strikes a balance between the others by bestowing roughly equal importance
on values both commonly espoused and polarizing, as one may observe from the cluster
explainability diagrams. There is notable agreement between SHAP and statistical analyses
across the methods.
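    The distinction drawn above between commonly espoused and polarizing values can be made
concrete with a small sketch. Assuming each value hypothesis has one signed score per
respondent (positive for resonance, negative for conflict), the mean score indicates how
commonly a value is espoused, while the variance indicates how polarizing it is. The scores
and hypothesis names below are synthetic and hypothetical; this is an illustration of the two
tendencies, not the analysis pipeline used in this chapter.

```python
from statistics import mean, pvariance

# Synthetic signed scores: positive = resonance, negative = conflict.
# Each key is a hypothetical value hypothesis; each list has one score per respondent.
scores = {
    "widely_shared": [0.8, 0.7, 0.9, 0.8],
    "polarizing": [0.9, -0.9, 0.8, -0.8],
    "rarely_expressed": [0.0, 0.1, 0.0, 0.0],
}

# Mean score: high when most respondents resonate with the value.
mean_score = {name: mean(vals) for name, vals in scores.items()}
# Population variance: high when respondents split between resonance and conflict.
spread = {name: pvariance(vals) for name, vals in scores.items()}

most_common = max(mean_score, key=mean_score.get)
most_polarizing = max(spread, key=spread.get)

print(most_common)
print(most_polarizing)
```

    A value can rank first on one statistic and last on the other, which is one way to see
why methods that weight these two properties differently may recover different clusters.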
    We remark that, although they are intended as soft validations, care is required in interpreting
statistical analyses of demographic features and value resonance magnitudes when considered
in aggregate. The emergence of patterns among cluster demographics suggests that demographics
may be used as proxies for cluster identification; however, the main interest is values-based
clustering. By contrast, the population could be clustered solely by demographics, but this
would miss heterogeneous values among demographic groups. Remarkably, although the analyses
of demographics are based upon 90% confidence intervals (CI), at an 85% CI all methods
identify a cluster that can be differentiated from the others based upon Arabic education.
Diffusion maps 7 captures this at the more stringent level.
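    To illustrate how the choice of confidence level can change whether two clusters appear
differentiated on a demographic attribute, the sketch below computes normal-approximation
(Wald) intervals for a proportion at the 90% and 85% levels and checks whether they overlap.
The interval type, the non-overlap criterion, and the counts are all assumptions made for
illustration; they are not taken from the analyses reported here.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, level):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = successes / n
    # Two-sided critical value, e.g. about 1.645 for a 90% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def intervals_overlap(ci_a, ci_b):
    """True when two (low, high) intervals share at least one point."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]


# Hypothetical counts of respondents with a given attribute in two clusters.
cluster_a = (30, 100)
cluster_b = (17, 100)

for level in (0.90, 0.85):
    ci_a = proportion_ci(*cluster_a, level)
    ci_b = proportion_ci(*cluster_b, level)
    print(level, intervals_overlap(ci_a, ci_b))
```

    With these made-up counts the intervals overlap at the 90% level but separate at the 85%
level, which mirrors the kind of sensitivity to the confidence level noted above.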
     These results indicate that each method is potentially well-suited for a different use case.
If the objective is to tease out differences in how different groups relate to values that are
among the most salient across the whole population, then k-means would be an appropriate
choice. If one, however, wishes to elicit how the population breaks down with respect to
highly polarizing (albeit perhaps fringe or niche) values, HDBSCAN would work well. To
capture a mixture of both phenomena, one could choose diffusion maps applied to optimal
transport distances.
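     As a rough sketch of the last option, the code below builds a diffusion map (Coifman and
Lafon, 2006) from a precomputed pairwise distance matrix. A toy matrix of distances between
points on a line stands in for the optimal transport distances; the Gaussian kernel, the
bandwidth epsilon, the diffusion time t, and the function name diffusion_map are assumptions
chosen for illustration rather than the configuration used in this chapter.

```python
import numpy as np


def diffusion_map(distances, epsilon, n_components=2, t=1):
    """Diffusion-map embedding from a symmetric matrix of pairwise distances."""
    # Gaussian affinity built on the supplied distances.
    kernel = np.exp(-distances ** 2 / epsilon)
    degree = kernel.sum(axis=1)
    # Symmetric conjugate S = D^(-1/2) K D^(-1/2) of the Markov matrix P = D^(-1) K.
    sym = kernel / np.sqrt(np.outer(degree, degree))
    eigvals, eigvecs = np.linalg.eigh(sym)
    # eigh returns ascending eigenvalues; reorder to descending.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Right eigenvectors of P are D^(-1/2) times the eigenvectors of S.
    psi = eigvecs / np.sqrt(degree)[:, None]
    # Skip the trivial constant eigenvector and scale by eigenvalue^t.
    return psi[:, 1:n_components + 1] * eigvals[1:n_components + 1] ** t


# Toy stand-in for a distance matrix: two groups of points on a line.
points = np.array([0.0, 0.1, 0.2, 2.0, 2.1, 2.2])
distances = np.abs(points[:, None] - points[None, :])

embedding = diffusion_map(distances, epsilon=1.0)
print(embedding[:, 0])
```

     In this toy example the first diffusion coordinate takes one sign on the first group and
the opposite sign on the second, so a standard clustering step applied to the embedding
would separate them.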
3.7      Conclusion
     Inferences about values-based subpopulations are method-dependent. This should be
interpreted not as a bug but as a feature. Consider a scenario in which three social scientists (say,
from different fields) are asked to manually cluster respondents based upon values expressed
in narratives. Surely one would observe some inter-researcher agreement. Yet without further
constraint, each would attend to different aspects of the data and draw conclusions thereon.
Furthermore, in light of intersectionality⁷, participants would likely self-organize into clusters
in context-specific ways. Perhaps this would occur, for instance, along the lines of sex or
gender if reproductive rights were invoked in a cultural context favoring self-direction. Yet
in a context favoring tradition, emergent structures might coalesce around religious and
political affiliation or some other points of consensus and divergence.
     Translating these insights into wisdom-of-crowds estimates is delicate. Is a greater emphasis
on value system consensus or divergence relevant to the task at hand? Relevant to whom?
And to what ends? While we cannot answer these questions generally, we can consider them
relative to sustainable development operations. In these contexts, one could imagine operationalizing
an ensemble of models formed at varying granularities, perhaps utilizing multiple
methods. Suppose the task is predictive modeling of hyperlocal MCH utilization on one-month
time lags. Interpreted within the framework of deep uncertainty (Walker et al., 2012),
one would hope for an ensemble that yields lower variance estimates or, perhaps, a family of
competing estimates that are associated with different subpopulations. Accounting for these
heterogeneous sources of uncertainty should simultaneously enable practitioners to explore
richer intervention scenarios and achieve more locally equitable, desirable, and meaningful
outcomes.

     ⁷ Intersectionality refers to the interconnected nature of social categorizations such as race, gender, and
religion as they apply to a particular individual or group.
3.8   Acknowledgements
    This material is based upon work supported by the Army Contracting Command, DARPA,
and ARO under Contract No. W911NF-21-C-0007. The views, opinions and/or findings expressed
are those of the author and should not be interpreted as representing the official views
or policies of the Department of Defense or the U.S. Government. Distribution Statement
"A" (Approved for Public Release, Distribution Unlimited).


                                    BIBLIOGRAPHY
Allaoui, M., Kherfi, M. L., and Cheriet, A. (2020). Considerably improving clustering
  algorithms using UMAP dimensionality reduction technique: a comparative study. In Image
  and Signal Processing: 9th International Conference, ICISP 2020, Marrakesh, Morocco,
  June 4–6, 2020, Proceedings 9, pages 317–325. Springer.
Aminpour, P., Gray, S. A., Jetter, A. J., Introne, J. E., Singer, A., and Arlinghaus, R. (2020).
  Wisdom of stakeholder crowds in complex social–ecological systems. Nature Sustainability,
  3(3):191–199.
Aminpour, P., Gray, S. A., Singer, A., Scyphers, S. B., Jetter, A. J., Jordan, R., Murphy Jr,
  R., and Grabowski, J. H. (2021). The diversity bonus in pooling local knowledge about
  complex problems. Proceedings of the National Academy of Sciences, 118(5):e2016887118.
Asyaky, M. S. and Mandala, R. (2021). Improving the performance of HDBSCAN on short text
  clustering by using word embedding and UMAP. In 2021 8th International Conference on
  Advanced Informatics: Concepts, Theory and Applications (ICAICTA), pages 1–6. IEEE.
Benkler, N., Friedman, S., Schmer-Galunder, S., Mosaphir, D., Sarathy, V., Kantharaju,
  P., McLure, M. D., and Goldman, R. P. (2022). Cultural value resonance in folktales: A
  transformer-based analysis with the World Value Corpus. In Social, Cultural, and Behavioral
  Modeling: 15th International Conference, SBP-BRiMS 2022, Pittsburgh, PA, USA,
  September 20–23, 2022, Proceedings, pages 209–218. Springer.
Bernau, C., Riester, M., Boulesteix, A.-L., Parmigiani, G., Huttenhower, C., Waldron, L.,
  and Trippa, L. (2014). Cross-study validation for the assessment of prediction algorithms.
  Bioinformatics, 30(12):i105–i112.
Blanco-Portals, J., Peiró, F., and Estradé, S. (2022). Strategies for EELS data analysis.
  Introducing UMAP and HDBSCAN for dimensionality reduction and clustering. Microscopy
  and Microanalysis, 28(1):109–122.
Breiman, L. (2001). Statistical modeling: The two cultures (with comments and a rejoinder
  by the author). Statistical Science, 16(3):199–231.
Coifman, R. R. and Lafon, S. (2006). Diffusion maps. Applied and Computational Harmonic
  Analysis, 21(1):5–30.
Danilevsky, M., Qian, K., Aharonov, R., Katsis, Y., Kawas, B., and Sen, P. (2020). A
  survey of the state of explainable AI for natural language processing. arXiv preprint
  arXiv:2010.00711.
De Plaen, H., Fanuel, M., and Suykens, J. A. (2020). Wasserstein exponential kernels. In
  2020 International Joint Conference on Neural Networks (IJCNN), pages 1–6. IEEE.
Donoho, D. (2017). 50 years of data science. Journal of Computational and Graphical
  Statistics, 26(4):745–766.
Galton, F. (1907). Vox populi.
Graeber, D. (2001). Toward an anthropological theory of value: The false coin of our own
  dreams. Springer.
Gray, S. A., Gray, S., De Kok, J. L., Helfgott, A. E., O’Dwyer, B., Jordan, R., and Nyaki,
  A. (2015). Using fuzzy cognitive mapping as a participatory approach to analyze change,
  preferred states, and perceived resilience of social-ecological systems. Ecology and Society,
  20(2).
Green, L. W. (1974). Toward cost-benefit evaluations of health education: some concepts,
  methods, and examples. Health Education Monographs, 2(1_suppl):34–64.
Green, L. W. and Kreuter, M. W. (1991). Health education planning. Mayfield Pub. Co.
GSMoH (Gombe State Ministry of Health) (2010). Gombe State Government strategic health
  development plan.
Hastie, T., Tibshirani, R., and Friedman, J. H. (2009). The elements of
  statistical learning: data mining, inference, and prediction, volume 2. Springer.
Inglehart, R. (2020). The Inglehart-Welzel World Cultural Map – World Values Survey 7
  [provisional version].
Inglehart, R., Basanez, M., Diez-Medrano, J., Halman, L., and Luijkx, R. (2000). World
  Values Surveys and European Values Surveys, 1981-1984, 1990-1993, and 1995-1997. Ann
  Arbor, Michigan: Institute for Social Research, ICPSR version.
Inglehart, R. and Welzel, C. (2010). The WVS cultural map of the world. World Values
  Survey.
Kosko, B. (1986). Fuzzy cognitive maps. International Journal of Man-Machine Studies,
  24(1):65–75.
Levin, P. S., Gray, S. A., Möllmann, C., and Stier, A. C. (2021). Perception and conflict in
  conservation: The Rashomon effect. BioScience, 71(1):64–72.
Little, A. V., Maggioni, M., and Murphy, J. M. (2020). Path-based spectral clustering:
  Guarantees, robustness to outliers, and fast algorithms. Journal of Machine Learning
  Research, 21.
Lundberg, S. M. and Lee, S.-I. (2017). A unified approach to interpreting model predictions.
  Advances in Neural Information Processing Systems, 30.
Madigan, D., Stang, P. E., Berlin, J. A., Schuemie, M., Overhage, J. M., Suchard, M. A.,
  Dumouchel, B., Hartzema, A. G., and Ryan, P. B. (2014). A systematic statistical approach
  to evaluating evidence from observational studies. Annual Review of Statistics and
  Its Application, 1:11–39.
Maxmen, A. (2015). How the fight against Ebola tested a culture’s traditions. National
  Geographic, 30.
McInnes, L., Healy, J., and Astels, S. (2017). hdbscan: Hierarchical density based clustering.
  Journal of Open Source Software, 2(11):205.
McInnes, L., Healy, J., and Melville, J. (2018). UMAP: Uniform manifold approximation and
  projection for dimension reduction. arXiv preprint arXiv:1802.03426.
Moon, K. R., van Dijk, D., Wang, Z., Gigante, S., Burkhardt, D. B., Chen, W. S., Yim,
  K., van den Elzen, A., Hirn, M. J., Coifman, R. R., et al. (2019). Visualizing structure and
  transitions in high-dimensional biological data. Nature Biotechnology, 37(12):1482–1492.
Papageorgiou, E. I. and Salmeron, J. L. (2012). A review of fuzzy cognitive maps research
  during the last decade. IEEE Transactions on Fuzzy Systems, 21(1):66–79.
Pealat, C., Bouleux, G., and Cheutet, V. (2021). Improved time-series clustering with
  UMAP dimension reduction method. In 2020 25th International Conference on Pattern
  Recognition (ICPR), pages 5658–5665. IEEE.
Peyré, G., Cuturi, M., et al. (2019). Computational optimal transport: With applications
  to data science. Foundations and Trends® in Machine Learning, 11(5-6):355–607.
Rittel, H. W. and Webber, M. M. (1973). Dilemmas in a general theory of planning. Policy
  Sciences, 4(2):155–169.
Schwartz, S. (2006). A theory of cultural value orientations: Explication and applications.
  Comparative Sociology, 5(2-3):137–182.
Sinai, I., Anyanti, J., Khan, M., Daroda, R., and Oguntunde, O. (2017). Demand for
  women’s health services in northern Nigeria: a review of the literature. African Journal of
  Reproductive Health, 21(2):96–108.
Smola, A. J. and Schölkopf, B. (1998). Learning with kernels, volume 4. Citeseer.
Team, I. (2023). INFER and UK Professional Head of Intelligence Assessment launch collaboration.
  https://www.infer-pub.com/the-pub/uk-collaboration/.
Uzochukwu, B. S. (2017). Primary health care systems (PRIMASYS): case study from Nigeria.
  Geneva: World Health Organization.
Voinov, A., Jenni, K., Gray, S., Kolagani, N., Glynn, P. D., Bommel, P., Prell, C., Zellner,
  M., Paolisso, M., Jordan, R., et al. (2018). Tools and methods in participatory modeling:
  Selecting the right tool for the job. Environmental Modelling & Software, 109:232–255.
Walker, W. E., Lempert, R. J., and Kwakkel, J. H. (2012). Deep uncertainty. Delft University
  of Technology, 1(2).
Webber, W., Moffat, A., and Zobel, J. (2010). A similarity measure for indefinite rankings.
  ACM Transactions on Information Systems (TOIS), 28(4):1–38.
WHO, UNICEF, UNFPA, World Bank Group, and the United Nations Population Division (2023).
  Trends in maternal mortality: 2000 to 2020.
Yi, S. K. M., Steyvers, M., Lee, M. D., and Dry, M. J. (2012). The wisdom of the crowd in
  combinatorial problems. Cognitive Science, 36(3):452–470.
Özesmi, U. and Özesmi, S. L. (2004). Ecological models based on people’s knowledge: a
  multi-step fuzzy cognitive mapping approach. Ecological Modelling, 176(1):43–64.


                                        APPENDIX
                     VALUE HYPOTHESES & DEMOGRAPHICS
The full list of values considered by the recognizing value resonance (RVR) model follows.
  1. family is important in life.
  2. friends are important in life.
  3. leisure time is important in life.
  4. politics are important in life.
  5. work is important in life.
  6. religion is important in life.
  7. it is important for children to have good manners.
  8. it is important for children to be independent.
  9. it is important for children to be hard workers.
 10. it is important for children to have a sense of responsibility.
 11. it is important for children to be imaginative.
 12. it is important for children to be tolerant and respect others.
 13. it is important for children to be thrifty in saving money and other economic pursuits.
 14. determination and perseverance are important qualities in children.
 15. it is important for children to possess religious faith.
 16. it is important for children to be unselfish.
 17. it is important for children to be obedient.
 18. drug addicts make bad neighbors.
 19. people of a different race make bad neighbors.
 20. people with AIDS make bad neighbors.
 21. immigrants and foreign workers make bad neighbors.
 22. homosexuals make bad neighbors.
 23. people who practice a different religion make bad neighbors.
 24. heavy drinkers make bad neighbors.
25. unmarried couples living together make bad neighbors.
26. people who speak a different language make bad neighbors.
27. making one’s parents proud is of central importance in life.
28. preschool children suffer from having a working mother.
29. men make better political leaders than women do.
30. university is more important for a boy than it is for a girl.
31. men make better business executives than women do.
32. being a housewife is just as fulfilling as working.
33. under employment scarcity, men should have more right to a job than women.
34. under employment scarcity, employers should give priority to the nation’s people over
    immigrants.
35. it is wrong if women have more income than their husbands.
36. homosexual couples make equally good parents as other couples.
37. it is an individual’s duty towards society to have children.
38. it is a child’s duty to take care of their ill parents.
39. people who don’t work turn lazy.
40. work is one’s duty towards society.
41. work should always come first even if it means less spare time.
42. the entire way society is organized must be radically changed by revolutionary action.
43. society must be gradually improved by reforms.
44. present society must be valiantly defended against all subversive forces.
45. in the future, less importance should be placed on work.
46. in the future, greater emphasis should be placed on technological advancement.
47. in the future, people should show greater respect for authority.
48. one generally feels happy.
49. one generally feels healthy.
50. there is a great deal of freedom of choice and control in one’s life.
51. the current state of one’s life is satisfactory.
52. the financial situation of one’s household is satisfactory.
53. there has been a recurring lack of sufficient food to eat.
54. one cannot feel safe from criminal activity, even in one’s own home.
55. proper medicine or medical treatment is frequently inaccessible.
56. a cash income is frequently unobtainable.
57. safe shelter is frequently inaccessible.
58. possessing a much higher standard of living than one’s parents is a familiar personal
    experience.
59. most people can be trusted.
60. one’s family is trustworthy.
61. one’s neighborhood is trustworthy.
62. the people one knows personally are trustworthy.
63. one should trust people upon first meeting them.
64. people of a different religion are trustworthy.
65. people of a different nationality are trustworthy.
66. one can have confidence in churches.
67. one can have confidence in the armed forces.
68. one can have confidence in the press.
69. one can have confidence in television.
70. one can have confidence in labor unions.
71. one can have confidence in the police.
72. one can have confidence in the justice system.
73. one can have confidence in the government.
74. one can have confidence in political parties.
75. one can have confidence in parliament.
76. one can have confidence in the civil services.
77. one can have confidence in universities.
78. one can have confidence in elections.
79. one can have confidence in major companies.
80. one can have confidence in banks.
81. one can have confidence in the environmental protection movement.
82. one can have confidence in charitable and humanitarian organizations.
83. one can have confidence in major regional organizations.
84. it is more important for international organizations to be effective than it is for them
    to be democratic.
85. one can have confidence in their religious community.
86. incomes should be made more equal.
87. private ownership of business and industry should be increased.
88. the government should take more responsibility to ensure that everyone is provided
    for.
89. competition is good. it stimulates people to work hard and develop new ideas.
90. in the long run, hard work usually brings a better life.
91. protecting the environment should be given priority, even if it causes slower economic
    growth and some loss of jobs.
92. there exists a tremendous amount of corruption.
93. most state authorities are involved in corruption.
94. most business executives are involved in corruption.
95. most local authorities are involved in corruption.
96. most civil service providers are involved in corruption.
97. most journalists and media personnel are involved in corruption.
98. ordinary people have to pay bribes, give gifts, and do favors for local officials and
    service providers all the time.
99. on the whole, women are less corrupt than men.
100. there exists a high risk of being held accountable for being involved in bribery.
101. immigrants have a positive impact on national development.
102. immigrants help fill job vacancies.
103. immigrants strengthen cultural diversity.
104. immigrants increase crime rates.
105. immigration gives asylum to political refugees who are persecuted elsewhere.
106. immigration increases the risk of terrorism.
107. immigration helps poor people establish new lives.
108. immigrants increase unemployment rates.
109. immigration leads to social conflict.
110. regarding immigration, the government should let anyone come who wants to.
111. regarding immigration, the government should let people come as long as there are
     jobs available.
112. the government should place strict limits on the number of foreigners who can immi-
     grate.
113. the government should prohibit people from immigrating.
114. locally, one can generally feel secure.
115. local robberies occur frequently.
116. locally, people frequently drink alcohol in the streets.
117. locally, the police or the military frequently interfere with people’s private lives.
118. locally, racist behavior happens frequently.
119. locally, people sell drugs on the streets all the time.
120. local violence and street fights happen frequently.
121. locally, there is a high rate of sexual harassment.
122. it is unsafe to carry too much money on one’s person.
123. it is unsafe to go out at night.
124. one should carry a weapon on their person for safety.
125. losing one’s job or being unable to find employment is of genuine concern.
126. being unable to give one’s children a good education is of genuine concern.
127. being personally victimized by crime is a familiar experience.
128. one’s family being victimized by crime is a familiar experience.
129. national involvement in war is of genuine concern.
130. terrorist attacks are of genuine concern.
131. civil war is of genuine concern.
132. freedom is more important than equality.
133. freedom is more important than security.
134. one must be willing to fight for one’s country.
135. science and technology make life healthier, easier, and more comfortable.
136. because of science and technology, there will be more opportunities for the next gen-
     eration.
137. society depends too much on science and not enough on faith.
138. one of the bad effects of science is that it breaks down people’s ideas of right and
     wrong.
139. it is not important to know about science in one’s daily life.
140. the world is better off because of science and technology.
141. God is incredibly important in life.
142. God exists.
143. there exists life after death.
144. hell exists.
145. heaven exists.
146. whenever science and religion conflict, religion is always right.
147. the only acceptable religion is one’s own religion.
148. frequently attending religious services is routine.
149. praying often is routine.
150. religiosity is an inseparable part of one’s identity.
151. religion is more about following religious norms and ceremonies than it is about doing
     good to other people.
152. religion is more about making sense of life after death than it is about making sense
     of life in this world.
153. these days, one often has trouble deciding which moral rules are the right ones to
     follow.
154. claiming government benefits to which you are not entitled is perfectly justifiable.
155. avoiding a fare on public transportation is eminently justifiable.
156. stealing property is justifiable.
157. cheating on one’s taxes is justifiable.
158. accepting a bribe in the course of one’s duties is perfectly justifiable.
159. homosexuality is completely justifiable.
160. prostitution is justifiable.
161. abortion is easily justifiable.
162. divorce is justifiable.
163. sex before marriage is justifiable.
164. suicide is justifiable.
165. euthanasia is justifiable.
166. it is justifiable for a man to beat his wife.
167. it is justifiable for parents to beat their children.
168. violence against other people is justifiable.
169. terrorism as a political, ideological, or religious means is justifiable.
170. having casual sex is justifiable.
171. political violence is justifiable.
172. the death penalty is justifiable.
173. the government has the right to surveil people in public areas.
174. the government has the right to monitor all emails and other information exchanged
     online.
175. the government has the right to collect information about its residents without their
     knowledge.
176. politics are of interest.
177. discussions of political matters with one’s friends are standard.
178. the daily newspaper is a reliable source of information.
179. tv news is a reliable source of information.
180. news radio stations are reliable sources of information.
181. one’s mobile phone is a reliable source of information.
182. one’s email is a reliable source of information.
183. the internet is a reliable source of information.
184. social media is a reliable source of information.
185. conversations with friends or colleagues are reliable sources of information.
186. signing a petition as a political action is both viable and justifiable.
187. boycotts are a viable and justifiable political action.
188. peaceful political demonstrations are viable and justifiable political actions.
189. going on strike is a viable and justifiable political action.
190. donating to a campaign fund or group one believes in is a viable and justifiable political
     action.
191. contacting a government official for a cause one believes in is a viable and justifiable
     political action.
192. encouraging others to take action about political issues is a viable and justifiable po-
     litical action.
193. encouraging people to vote during elections is a viable and justifiable political action.
194. searching for information about politics and political events online is a viable and
     justifiable political action.
195. signing an electronic petition is a viable and justifiable political action.
196. encouraging others to take political action using the internet is a viable and justifiable
     political action.
197. organizing political activities, events, and protests using the internet is a viable and
     justifiable political action.
198. voting in local elections is standard.
199. voting in national elections is standard.
200. votes are always counted fairly during national elections.
201. opposition candidates are always prevented from running during national elections.
202. TV news always favors the governing party during national elections.
203. voters are always offered bribes during national elections.
204. journalists always provide fair coverage of national elections.
205. election officials are always fair during national elections.
206. rich people always buy the national elections.
207. during national elections, voters are always threatened with violence at the polls.
208. voters are always offered a genuine choice during national elections.
209. women always have equal opportunities to run for office during national elections.
210. having honest elections is important.
211. people have a great deal of say in what the government does under the current political
     system.
212. politically, it is good to have a strong leader who does not have to bother with parlia-
     ment and elections.
213. politically, it is good to have experts, not the government, make decisions according to
     what they think is best for the country.
214. politically, it is good to have the army rule.
215. it is good to have a democratic political system.
216. it is good to have a system governed by religious law in which there are no political
     parties or elections.
217. one’s political views should align with the political left.
218. one’s political views should align with the political right.
219. governments taxing the rich and subsidizing the poor is an essential characteristic of
     democracy.
220. religious authorities interpreting the laws is an essential characteristic of democracy.
221. people choosing their leaders in free elections is an essential characteristic of democracy.
222. people receiving state aid for unemployment is an essential characteristic of democracy.
223. military leadership under governmental incompetence is an essential characteristic of
     democracy.
224. civil rights, designed to protect people’s liberty against oppression are essential char-
     acteristics of democracy.
225. state-ensured income equality is an essential characteristic of democracy.
226. obedience to the governing body is an essential characteristic of democracy.
227. equal rights for women is an essential characteristic of democracy.
228. it is important to live in a democratically governed country.
229. one’s nation is completely democratically governed.
230. the current national political system is functioning satisfactorily. nationally, there is a
     great deal of respect for individual human rights.
231. one possesses national pride.
232. one experiences fellowship with one’s town, one’s village, or one’s city.
233. one experiences fellowship with one’s district or one’s region.
234. one experiences fellowship with one’s country.
235. one experiences fellowship with one’s continent.
236. one experiences fellowship with the world.
237. girls and women should themselves decide when, if, and with whom they should marry.
238. a girl should wait to marry until she has completed secondary school.
239. a boy should wait to marry until he has completed secondary school.
240. marrying girls young can help provide them security.
241. a boy should wait to have children until he is at least 18 years old.
242. it is important for a woman to have children as soon as possible after she has married.
243. it is important for a man to have children as soon as possible after he has married.
244. a woman should be in love with someone before having sex with that person.
245. a man should be in love with someone before having sex with that person.
246. women who carry condoms on them are easy.
247. men should be outraged if their wife or partner asks them to use a condom.
248. a real man produces a male child.
249. a couple should decide together if they want to have children.
250. a man and a woman should decide together whether to use contraceptives.
251. women in the community are motivated to use modern contraceptives.
252. men in the community are motivated to use modern contraceptives, including support-
     ing female partners.
253. it is easy for women in the community to access and use modern contraceptives.
254. it is acceptable for women in the community and neighborhood to use contraceptives.
255. unplanned pregnancies are not a familiar personal experience.
256. doctors have a great deal of knowledge when it comes to family planning and childbirth.
257. nurses have a great deal of knowledge when it comes to family planning and childbirth.
258. auxiliary nurses have a great deal of knowledge when it comes to family planning and
     childbirth.
259. the midwives at the clinic have a great deal of knowledge when it comes to family
     planning and childbirth.
260. family planning counselors have a great deal of knowledge when it comes to family
     planning and childbirth.
261. community health workers have a great deal of knowledge when it comes to family
     planning and childbirth.
262. traditional birth attendants have a great deal of knowledge when it comes to family
     planning and childbirth.
263. traditional healers have a great deal of knowledge when it comes to family planning
     and childbirth.
264. religious leaders have a great deal of knowledge when it comes to family planning and
     childbirth.
265. the youth clinic nearby has a great deal of knowledge when it comes to family planning
     and childbirth.
266. one’s family has a great deal of knowledge when it comes to family planning and
     childbirth.
267. the health services, clinic, or nearest hospital can be relied upon to deliver safe con-
     traceptives.
268. the health services, clinic, or nearest hospital can be relied upon to deliver family
     planning counseling.
269. the health services, clinic, or nearest hospital can be relied upon to deliver safe child
     delivery.
270. the health services, clinic, or nearest hospital can be relied upon to deliver good
     antenatal care.
271. the health services, clinic, or nearest hospital can be relied upon to deliver good
     postnatal care.
272. the health services, clinic, or nearest hospital can be relied upon to deliver safe abortion.
273. the health services, clinic, or nearest hospital can be relied upon to deliver HIV testing
     and counseling.
274. the health services, clinic, or nearest hospital can be relied upon to deliver antiretroviral
     therapy.
275. the health services, clinic, or nearest hospital can be relied upon to prevent mother-to-
     child transmission of HIV.
276. the health services, clinic, or nearest hospital can be relied upon to deliver support for
     gender-based violence.
277. one possesses a great deal of freedom of choice and control over family planning.
278. it is important for girls to continue their schooling even if they become pregnant and
     have children.
279. a girl is ready for marriage once she starts menstruating.
280. a girl should honor the decisions/wishes of her family even if she does not want to
     marry.
281. a boy should honor the decisions/wishes of his family even if he does not want to marry.
282. a girl should wait to have children until she is at least 18 years old, even if she is
     married.
283. it is safer for a woman to give birth at a clinic than at home.
284. women should have access to safe abortion services to terminate an unwanted
     pregnancy.
285. it is a woman’s responsibility to avoid getting pregnant.
286. only when a woman has a child is she a real woman.
287. having a son is always better than having a daughter.
288. contraceptives should be available for everyone, whether or not one is married.
289. sexual education promotes sexual activity among young people.
290. men, and not women, should decide how earnings will be used.
291. men, and not women, should make decisions on major household purchases.
292. men, and not women, should make decisions concerning healthcare visits and spending.
293. men, and not women, should decide whether a woman should give birth at a clinic.
294. men, and not women, should make decisions concerning care for children’s health.
295. men, and not women, should make decisions concerning visiting family or relatives.
296. men, and not women, should decide whether girls should go to school.
297. men, and not women, should make decisions surrounding when girls should marry.
298. men, and not women, should decide whom girls should marry.
299. men, and not women, should decide if and when to have children.
300. men, and not women, should decide on the number of children.
301. men, and not women, should decide if and when to have sex.
302. men, and not women, should decide whether to use condoms.
303. men, and not women, should decide whether to use modern contraceptives other than
     condoms.
304. men, and not women, should decide if girls should be circumcised.
305. men, and not women, should decide if boys should be circumcised.
306. men should help decide how earnings will be used.
307. men and women should make decisions on how earnings will be used together.
308. men should help make decisions on major household purchases.
309. men and women should make decisions on major household purchases together.
310. men should help make decisions on healthcare visits and spending.
311. men and women should make decisions on healthcare visits and spending together.
312. men should help make decisions on whether a woman should give birth at a clinic.
313. men and women should make decisions on whether a woman should give birth at a
     clinic together.
314. men should help make decisions on care for children’s health.
315. men and women should make decisions on care for children’s health together.
316. men should help make decisions on visits to family or relatives.
317. men and women should make decisions on visits to family or relatives together.
318. men should help make decisions on whether girls should go to school.
319. men and women should make decisions on whether girls should go to school together.
320. men should help make decisions on when girls should marry.
321. men and women should make decisions on when girls should marry together.
322. men should help make decisions on whom girls should marry.
323. men and women should make decisions together on whom girls should marry.
324. men should help make decisions on if and when to have children.
325. men and women should make decisions on if and when to have children together.
326. men should help make decisions on the number of children.
327. men and women should make decisions on the number of children together.
328. men should play a role in deciding if and when to have sex.
329. men and women should decide together if and when to have sex.
330. men should play a role in deciding whether to use condoms.
331. men and women should decide whether to use condoms together.
332. men should play a role in deciding whether to use modern contraceptives other than
     condoms.
333. men and women should decide together whether to use modern contraceptives other
     than condoms.
334. men should play a role in deciding if girls should be circumcised.
335. men and women should decide together if girls should be circumcised.
336. men should play a role in deciding if boys should be circumcised.
337. men and women should decide together if boys should be circumcised.
338. saving money is realistically feasible.
339. one must spend savings or borrow money to get by.
340. over the coming years, the government should emphasize a high level of economic
     growth.
341. over the coming years, the government should prioritize ensuring the country has strong
     defense forces.
342. over the coming years, the government should focus on ensuring that people have more
     say about how things are done at their jobs and in their communities.
343. over the coming years, the government should prioritize work to make the nation’s
     cities and countryside more beautiful.
344. maintaining order in the nation is of utmost importance.
345. giving people more say in important government decisions is of utmost importance.
346. fighting rising prices is of utmost importance.
347. protecting freedom of speech is of utmost importance.
348. having a stable economy is of utmost importance.
349. progress toward a less impersonal and more humane society is of utmost importance.
350. progress toward a society in which ideas count more than money is of utmost
     importance.
351. fighting crime is of utmost importance.
   Demographic features that we considered include:
  1. Respondent sex
        • Male
        • Female
  2. Income
        • Monthly income
  3. Religion
        • Christianity
        • Islam
  4. Residence
        • Urban
        • Rural
  5. Occupation
        • Agriculture
        • Business
        • Teacher
        • Trade
        • Other
        • None
  6. Education
        • Arabic education
        • Primary
        • Secondary
        • Post-secondary
        • Unclear
  7. Age
        • Respondent age
        • Age as of household head
  8. Children
        • Number of children going to school
        • Number of children (under 18) in the household
        • Number of children ever born
        • Number of children alive
        • Number of children dead
  9. Cowives
        • Rank among wives
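
   The categorical demographic features listed above could be turned into numeric
indicator variables before analysis. The following is only a minimal sketch of one
common option (one-hot encoding) and is not a description of the preprocessing used in
this study. The dictionary keys, the function name one_hot, and the example respondent
are hypothetical; only the category levels are taken from the list above.

```python
# Category levels copied from the demographic list above.
# ASSUMPTION: the feature keys and the one-hot scheme are illustrative only.
CATEGORICAL_LEVELS = {
    "sex": ["Male", "Female"],
    "religion": ["Christianity", "Islam"],
    "residence": ["Urban", "Rural"],
    "occupation": ["Agriculture", "Business", "Teacher", "Trade", "Other", "None"],
    "education": ["Arabic education", "Primary", "Secondary",
                  "Post-secondary", "Unclear"],
}


def one_hot(respondent):
    """Return a flat list of 0/1 indicators, one per (feature, level) pair."""
    vector = []
    for feature, levels in CATEGORICAL_LEVELS.items():
        value = respondent[feature]
        if value not in levels:
            raise ValueError(f"unknown level {value!r} for feature {feature!r}")
        # Exactly one indicator per feature is set to 1.
        vector.extend(1 if value == level else 0 for level in levels)
    return vector


# Hypothetical respondent, used only to exercise the function.
example = {
    "sex": "Female",
    "religion": "Islam",
    "residence": "Rural",
    "occupation": "Trade",
    "education": "Primary",
}
encoded = one_hot(example)
```

   Numeric features such as monthly income, ages, and the child counts would be kept as
numbers and appended to this vector; whether and how to rescale them is a separate
modeling choice.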