ON PERMUTATION PATTERNS, PINNACLE SETS, AND BACKBONES OF BIPARTITE
PROJECTIONS
By
Rachel Domagalski
A DISSERTATION
Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of
Mathematics – Doctor of Philosophy
2021
ABSTRACT
This dissertation studies two different fields: permutations, including pattern containment and
pinnacle sets, and weighted networks, specifically bipartite projections and their backbones.
The study of pattern containment and avoidance for linear permutations is a well-established
area of enumerative combinatorics. A cyclic permutation is the set of all rotations of a linear
permutation. Callan initiated the study of permutation avoidance in cyclic permutations and
characterized the avoidance classes for all single permutations of length 4. We continue this
work. In particular, we establish a cyclic variant of the Erdős-Szekeres Theorem that any linear
permutation of length 𝑚𝑛 + 1 must contain either the increasing pattern of length 𝑚 + 1 or the
decreasing pattern of length 𝑛 + 1. We then derive results about avoidance of multiple patterns of
length 4. We also determine generating functions for the cyclic descent statistic on these classes.
We then study the pinnacle set, which is the value analogue of a well-studied permutation
statistic, the peak set. Let 𝜋 = 𝜋1 𝜋2 . . . 𝜋𝑛 be a permutation in the symmetric group 𝔖𝑛 written in
one-line notation. The pinnacle set of 𝜋, denoted Pin 𝜋, is the set of all 𝜋𝑖 such that 𝜋𝑖−1 < 𝜋𝑖 > 𝜋𝑖+1 .
The classic peak set statistic consists of the positions of these values. The pinnacle set was
introduced by Davis, Nelson, Petersen, and Tenner who showed that it has many interesting
properties. In particular, they proved that the number of subsets of [𝑛] = {1, 2, . . . , 𝑛} which can
be the pinnacle set of some permutation is a binomial coefficient. Their proof involved a bijection
with lattice paths and was somewhat involved. We give a simpler demonstration of this result which
does not need lattice paths. Moreover, we show that our map and theirs are different descriptions
of the same function. Davis et al. also studied the number of pinnacle sets with maximum 𝑚 and
cardinality 𝑑 which they denoted by 𝔭(𝑚, 𝑑). We show that these integers are the well-known ballot
numbers and give two proofs of this fact: one using finite differences and one bijective. Diaz-
Lopez, Harris, Huang, Insko, and Nilsen found a summation formula for calculating the number
of permutations in 𝔖𝑛 having a given pinnacle set. We derive a new expression for this number
which is faster to calculate in many cases. We also show how this method can be adapted to find
the number of orderings of a pinnacle set which can be realized by some 𝜋 ∈ 𝔖𝑛 . This concludes
our research on permutations.
Bipartite projections are used in a wide range of network contexts including politics (bill co-
sponsorship), geography (firm co-location), genetics (gene co-expression), economics (executive
board co-membership), and innovation (patent co-authorship). However, because bipartite pro-
jections are always weighted graphs, which are inherently challenging to analyze and visualize,
it is often useful to examine the ‘backbone,’ an unweighted subgraph containing only the most
significant edges. We introduce the R package backbone for extracting the backbone of weighted
bipartite projections, and use two empirical datasets to demonstrate its functionality: bill
sponsorship data from the 114th session of the United States Senate and a Globalization and World
Cities data set regarding firm locations in 2000.
After introducing and demonstrating five different models for backbone extraction, the fixed fill
model (FFM), fixed row model (FRM), fixed column model (FCM), fixed degree sequence model
(FDSM), and stochastic degree sequence model (SDSM), we compare them in terms of accuracy,
speed, statistical power, similarity, and community detection. Here, we aim to find which models
perform similarly to the FDSM, since the FDSM controls for both degree sequences exactly.
We find that the computationally fast SDSM offers a statistically conservative but close
approximation of the computationally impractical FDSM under a wide range of conditions, and that it
correctly recovers a known community structure even when the signal is weak. Therefore, although
each backbone model may have particular applications, we recommend SDSM for extracting the
backbone of most bipartite projections.
Copyright by
RACHEL DOMAGALSKI
2021
ACKNOWLEDGEMENTS
First I would like to thank my advisor, Dr. Bruce Sagan, for all his help and support since I arrived
at MSU. You have always made me feel like I can do and achieve anything and have helped me
feel capable in my skills as a mathematician. I am so grateful for your advice and kindness. To
Dr. Zachary Neal, thank you for taking me under your wing and into your research projects. You
have shown me a new way of applying mathematics and it’s been so much fun and so exciting
working with you. Thank you for your support and trust. Without both of your encouragement and
guidance, this thesis would not be possible.
To my committee, Drs. Robert Bell, Peter Magyar, and Elizabeth Munch, thank you so much
for dedicating your time to preparing me to become a mathematician worthy of this degree. To
Dr. Sivaram Narayan, thank you for your guidance and inspiring me to pursue graduate school.
I greatly appreciate your wisdom, support, and friendship through my undergrad, masters, PhD,
and beyond. My collaborators and friends, Jinting Liang, Quinn Minnich, Jamie Schmidt, Alex
Sietsema, and Xiaoqin Yan, thank you for your encouragement and knowledge. It’s been a pleasure
working with you all both in-person and online. You are all destined for great things, and I can’t
wait to see what you accomplish.
Davis, words cannot even begin to come close to covering how grateful I am for your support over
the past decade. Through every life step we’ve taken, you’ve always encouraged me to follow every
dream and passion. I love you. Here’s to our next chapter with the beautiful family we’ve made.
Which brings me to our dogs, Atlas and Tsuki. You two are magic. All extra hours post-graduation
go to you my loves.
Mom and Dad, thank you for instilling a love of math and science and discovery in me from
day one. I’m so lucky to have such wonderful people as parents. Thank you for being my biggest
cheerleaders and for all of your love. Steven, my best friend since birth. Thank you for always
being there for me, no matter how many miles between us. You inspire me every single day. I love
you all and can’t wait to be together again.
To my cohort and beyond, I couldn’t have done this without you. The laughter, the late nights,
the self-deprecating humor, the game nights, the life talks, the tree climbing, the walks to the river,
the El Oasis taco truck trips, the downs, and the so so many ups, I love you all and miss seeing you
daily. To my friends, especially Emilee, Nikki, Olivia, Brooke, Paige, and to my extended family,
thank you for your support, love, and all the joy we have shared.
Finally, thank you to the faculty and staff members of the mathematics department at Michigan
State University for creating such a wonderful home during my time here, and those at Central
Michigan University and Holly High School who got me here in the first place.
Portions of Chapter 3 appear in “Domagalski, R., Liang, J., Minnich, Q., Sagan, B. E., Schmidt,
J., & Sietsema, A. (2021). Cyclic Pattern Containment and Avoidance. ArXiv:2106.02534 [Math].
http://arxiv.org/abs/2106.02534” and are reprinted here under a CC BY 4.0 license.
Portions of Chapter 4 appear in “Domagalski, R., Liang, J., Minnich, Q., Sagan, B. E., Schmidt,
J., & Sietsema, A. (2021). Pinnacle Set Properties. ArXiv:2105.10388 [Math].
http://arxiv.org/abs/2105.10388” and are reprinted here under a CC BY 4.0 license.
Portions of Chapter 7 were originally published in “Domagalski, R., Neal, Z. P., & Sagan, B.
(2021). Backbone: An R package for extracting the backbone of bipartite projections. PLOS ONE,
16(1), e0244363,” reprinted here under a CC BY 4.0 license, and in “Neal, Z. P., Domagalski, R., &
Sagan, B. (2021). Analysis of Spatial Networks From Bipartite Projections Using the R Backbone
Package. Geographical Analysis. https://doi.org/10.1111/gean.12275 [NDS21a],” and “Neal, Z. P.,
Domagalski, R., & Yan, X. (2022). Homophily in collaborations among US House Representatives,
1981–2018. Social Networks, 68, 97–106. https://doi.org/10.1016/j.socnet.2021.04.007,” both
reprinted with journal permissions.
Portions of Chapters 6 and 8 originally appeared in “Neal, Z. P., Domagalski, R., & Sagan, B.
(2021). Comparing Models for Extracting the Backbone of Bipartite Projections. arXiv preprint
arXiv:2105.13396.” They are reprinted here under a CC BY-SA 4.0 license.
The work in Chapters 6-8 was supported by funding from the National Science Foundation
(#1851625 & #2016320) and Michigan State University Center for Business and Social Analytics.
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
CHAPTER 1 INTRODUCTION
CHAPTER 2 BACKGROUND ON PERMUTATION PATTERNS AND STATISTICS
CHAPTER 3 CYCLIC PATTERN CONTAINMENT AND AVOIDANCE
3.1 A cyclic Erdős-Szekeres Theorem
3.2 Pattern avoidance of doubletons
3.3 Three or more patterns
3.4 Cyclic descent generating functions
3.5 Open problems and concluding remarks
3.5.1 Longer patterns
3.5.2 Other statistics
3.5.3 Vincular patterns
CHAPTER 4 PINNACLE SET PROPERTIES
4.1 Counting admissible pinnacle sets
4.2 Ballot numbers
4.3 Permutations with a given pinnacle set
4.4 Open problems and concluding remarks
CHAPTER 5 BACKGROUND ON BACKBONE EXTRACTION
CHAPTER 6 BACKBONE MODELS AND THEIR PROBABILITY MASS FUNCTIONS
6.1 Bipartite ensemble backbone models
6.2 Fixed degree sequence model (FDSM)
6.3 Fixed fill model (FFM)
6.4 Fixed row model (FRM)
6.5 Fixed column model (FCM)
6.6 Stochastic degree sequence model (SDSM)
CHAPTER 7 BACKBONE: AN R PACKAGE FOR EXTRACTING THE BACKBONE OF WEIGHTED GRAPHS
7.1 Two Illuminating Data Sets
7.1.1 Legislative Networks
7.1.2 Spatial Networks
7.2 Universal Threshold universal()
7.3 Fixed fill model fixedfill()
7.4 Fixed row model fixedrow()
7.5 Fixed column model fixedcol()
7.6 Stochastic degree sequence model sdsm()
7.7 Fixed degree sequence model fdsm()
CHAPTER 8 COMPARING MODELS FOR BACKBONE EXTRACTION
8.1 Study 1: Choosing cell-filling probabilities for the SDSM
8.1.1 Methods
8.1.2 Results
8.2 Study 2: Statistical power of SDSM
8.2.1 Methods
8.2.2 Results
8.3 Study 3: Backbone equivalence under varying degree distributions
8.3.1 Methods
8.3.2 Results
8.4 Study 4: Recovery of community structure
8.4.1 Methods
8.4.2 Results
8.5 Recommendations for Backbone Selection
BIBLIOGRAPHY
LIST OF TABLES
Table 3.1: Wilf equivalence classes and cardinalities of Av𝑛 [Π] for certain [Π] and 𝑛 ≥ 5
Table 4.1: Run times in seconds compared when most 𝑛𝑖 are equal
Table 4.2: Run times in seconds compared when most 𝑛𝑖 are constant
Table 8.1: SDSM probabilities given agent and artifact degree sequences [1,1,2]
Table 8.2: Bipartite degree distributions, with examples in the context of a scholarly authorship bipartite network
LIST OF FIGURES
Figure 2.1: The graph of 42351 on the left and of [42351] on the right
Figure 2.2: The diagram of 132 (left) and 132⟨𝜎1 , 𝜎2 , 𝜎3⟩ (right)
Figure 3.1: The graph of [𝜎] when 𝑚 = 5 and 𝑛 = 3
Figure 4.1: The lattice path 𝐿 for 𝐴 = {2, 3, 7, 9}
Figure 4.2: Example of a pinnacle set ordering [𝜏] = [7612354] with corresponding dales.
Figure 5.1: Bipartite and bipartite projection networks
Figure 7.1: An example of an extracted backbone, with Democratic senators represented by blue vertices, and Republican senators represented by red vertices.
Figure 7.2: The distribution of (A) row sums and (B) column sums in the GaWC Dataset 11.
Figure 7.3: The positive backbone of the US Senate co-sponsorship network with edges retained between two senators if they sponsored at least 1 bill together.
Figure 7.4: The positive backbone of the US Senate co-sponsorship network with edges retained between two senators if they sponsored more bills together than one standard deviation above the mean.
Figure 7.5: The positive backbone of the US Senate co-sponsorship network under the fixed row model.
Figure 7.6: The positive backbone of the US Senate co-sponsorship network under the fixed column model.
Figure 7.7: The positive backbone of the US Senate co-sponsorship network under the stochastic degree sequence model.
Figure 7.8: Null weight distributions generated using the backbone package from the GaWC Dataset 11
Figure 7.9: A histogram of the expected co-sponsorships between Senators Cory Booker and Elizabeth Warren under the fixed degree sequence model (1000 samples). A positive edge between Booker and Warren would be preserved in the FDSM backbone because their actual number of co-sponsorships (98) is statistically significantly larger.
Figure 7.10: The positive backbone of the US Senate co-sponsorship network under the fixed degree sequence model.
Figure 8.1: (A) Accuracy and (B) speed computing $p^*_{ik}$ using different methods.
Figure 8.2: Statistical power of SDSM. (A) Distribution of weights for the Paris-Milan edge in projections derived from FDSM and SDSM ensembles. (B) Similarity of an FDSM backbone extracted at 𝛼 = 0.05 to SDSM backbones extracted at various 𝛼 from an empirical bipartite network (green line) and from 100 synthetic bipartite networks (purple line = mean, purple region = 10th–90th percentile).
Figure 8.3: Jaccard similarity of a backbone extracted at 𝛼 = 0.05 using the Fixed Degree Sequence Model and a backbone extracted using (A) the Fixed Fill Model, (B) Fixed Row Model, (C) Fixed Column Model, (D) Stochastic Degree Sequence Model. Each cell represents the mean over 100 instances of a 100 × 100 bipartite network with given agent and artifact degree distributions.
Figure 8.4: (A) Given agent and artifact degree distributions, there exists a statistical significance level 𝛼 that maximizes the similarity between an SDSM backbone extracted at this level and an FDSM backbone extracted at 𝛼 = 0.05, and (B) when used yields an SDSM backbone that is very similar to the corresponding FDSM backbone.
Figure 8.5: (A) Synthetic bipartite networks with varying levels of block structure, from which (B) backbones extracted using different models exhibit varying modularity. (C) When 65% of bipartite edges are within-block, a backbone extracted using FDSM shows a clear group structure (top) while a backbone extracted using FCM does not (bottom).
CHAPTER 1
INTRODUCTION
This doctoral thesis is the culmination of two combinatorial projects. The first explores permutation
patterns and statistics, specifically looking at pattern containment and avoidance of cyclic permu-
tations, and generating functions of cyclic descent statistics. Additionally, we study a particular
permutation statistic, the pinnacle set. The second project involves bipartite projections, a type of
weighted graph. When a weighted graph represents a social relationship, it is of interest to know
whether an edge weight should be considered particularly strong or weak. We provide various
probabilistic null models to which one can compare an edge weight to determine its statistical
significance. Edges deemed significant are part of the backbone subgraph.
Chapters 2 through 4 describe the project on permutations, beginning with background
information in chapter 2, then discussing permutation patterns and avoidance in chapter 3, and fi-
nally pinnacle set properties in chapter 4. We begin by expanding on the well-studied field of pattern
avoidance in linear permutations by considering its implications in cyclic permutations. Specifi-
cally, we begin chapter 3 by proving a cyclic variant of the Erdős-Szekeres theorem in section 3.1.
This new theorem states that in any cyclic permutation of size 𝑚𝑛 + 2, there is either an increasing
subsequence of length 𝑚 + 2 or a decreasing subsequence of length 𝑛 + 2. This theorem becomes of
great use in our study of length four pattern avoidance in sections 3.2 and 3.3. While linear pattern
avoidance has origins reaching back to the early 1900s, the study of cyclic pattern avoidance was
introduced relatively recently by Callan [Cal02] in 2002. He was able to count the number of
cyclic permutations that avoid single patterns of length four (length three pattern avoidance being
relatively trivial). We complete this study of length four pattern avoidance by counting the
cyclic permutations that avoid any set of length four patterns, giving proofs for all pairs and
triples. These proofs use generating trees. Since enlarging the set of patterns can only shrink
the set of permutations that avoid it, these results allow us to count the avoidance classes for
sets of length four patterns of any size. After
this classification, we discuss cyclic descent generating functions in section 3.4. These generating
functions allow us to count the numbers of cyclic descents in permutations that avoid a given set of
patterns, refining our enumerations of the avoidance classes. Chapter 3 concludes with a section
on open problems raised by this work. Now that patterns of length three and four are
characterized, future projects could look for enumerative formulas for patterns of length
five and higher. It is also of interest to look at the generating functions for other permutation
statistics over the avoidance classes. We provide one result which counts the joint distribution of
cyclic descents and cyclic peaks. Additionally, vincular pattern avoidance can be studied. In this
scenario, an occurrence of the pattern in a permutation may require certain elements to be adjacent
to one another. We conjecture an exponential generating function which counts the
permutations that avoid 123 and 213. Recently, Sergi Elizalde and Bruce
Sagan have proven this conjecture [ES21].
Using the background on permutations and permutation statistics presented in chapter 2, in chap-
ter 4 we will explore the pinnacle set of a permutation and prove a number of results related to
counting either the number of pinnacle sets or the number of permutations with a given pinnacle set.
In section 4.1, we reprove a result of [DNKPT18] that counts the number of pinnacle sets. Their
proof involved lattice paths and was somewhat complicated, while ours is a simpler demonstration
that does not need lattice paths. In fact, we show that our map and theirs are different descriptions
of the same function. We then turn our attention to counting pinnacle sets with a defined maximum
and size in section 4.2. While [DNKPT18] proved these counts satisfied a nice recurrence, they
did not provide a formula to find the exact count. We show that these counts are actually just ballot
numbers, and do this in two ways: using the theory of finite differences and via a bijection. Since we
now have counts of the number of pinnacle sets of given sizes, it is natural to turn one’s attention to
counting the number of permutations with a given pinnacle set. We address this area in section 4.3.
While a summation formula that counts such permutations was given in [DLHH+ 21], we construct
a new formula that is more computationally efficient in many cases. We also show how this formula
can be modified to answer a similar question: how many admissible orderings of a pinnacle set are
there? Both of the enumerations found in this section have been of great interest to the research
community in recent weeks, and we conclude this chapter by describing the recent progress made
in constructing even faster formulas in section 4.4, which completes our study of permutations.
The remaining chapters will discuss the backbone of a weighted network. We begin by
introducing the concept of bipartite projections and backbone extraction in chapter 5. While
bipartite networks are used to describe and represent a wide range of scenarios, their projections
are challenging to analyze as they are dense and weighted. In addition, the projection loses
information about the original row and column degree sequences of the bipartite network. Ideally,
we’d like to reduce the complexity of these networks to a backbone network that contains only
the most important edges. The edges retained should be those that had a higher or lower weight
than would be expected in a random scenario. To find these backbone networks, we introduce five
different bipartite ensemble backbone models in chapter 6. Each of these models constrains, in a
different way, the degree sequences of the bipartite networks in the ensemble to which we compare
our data. We derive the probability mass functions for the stochastic degree sequence model (SDSM),
fixed row model (FRM), fixed column model (FCM), and fixed fill model (FFM). The fixed degree
sequence model (FDSM) is considered the ‘gold standard’ model as it exactly fixes both degree
sequences. However, its distribution remains unknown, and therefore we must approximate it through
Monte Carlo methods.
While methods for backbone extraction including a few of the ones mentioned above have existed
in the literature for several years, there did not exist one central software package or program where
they were all implemented. This meant that researchers who wanted to find a backbone of their
network would have to first find which method they wanted to use, potentially guessing which was
best for their purposes, and then see if the algorithm was already implemented or available for
use. To increase the ease of access for backbone methods, we’ve implemented the SDSM, FDSM,
FRM, FCM, and FFM in the new R package backbone. The package and its usage are described
in chapter 7. To demonstrate how to use backbone, we apply the functions to two different data sets,
a legislative network and a spatial network. Through implementing the R package and increasing
its user base, we’re often met with the same question from researchers: “which model should be
used for my data?” This is the question we investigate in chapter 8.
In chapter 8 we consider each of the five aforementioned models and compare their accuracy,
speed, statistical power, similarity, and community detection. These analyses are conducted in four
studies. In section 8.1, we evaluate the accuracy and speed of different approaches for estimating
cell-filling probabilities used by the SDSM. In section 8.2, we evaluate the statistical power of
the SDSM relative to the FDSM. In section 8.3, we examine how degree distributions impact
the similarity of backbones extracted using different models. In section 8.4, we examine the
extent to which backbones extracted using different models accurately recover a known community
structure. Finally, we conclude in section 8.5 with recommendations for backbone model selection
and opportunities for future model development.
CHAPTER 2
BACKGROUND ON PERMUTATION PATTERNS AND STATISTICS
We begin by reviewing some notions from the well-studied theory of patterns in (linear) permuta-
tions. We then discuss permutation statistics and generating functions for cyclic descents. We’ll
finish by exploring what is known about the pinnacle set. The pinnacle set is the value analogue
of a particular permutation statistic, the peak set. More information on the topic of patterns in
permutations can be found in the texts of Bóna [Bón04], Sagan [Sag20], or Stanley [Sta97, Sta99].
Let N and P be the nonnegative and positive integers, respectively. If 𝑚, 𝑛 ∈ N then we define
[𝑚, 𝑛] = {𝑚, 𝑚 + 1, . . . , 𝑛}; if 𝑚 = 1 we then abbreviate to [𝑛] = [1, 𝑛]. Consider the symmetric
group 𝔖𝑛 of all permutations 𝜋 = 𝜋1 𝜋2 . . . 𝜋𝑛 of [𝑛] written in one-line notation. We call 𝑛 the
length of 𝜋 and write |𝜋| = 𝑛. We will also use this notation to represent the cardinality of a
set, where the difference should be clear by context. We will sometimes put commas between the
elements of 𝜋 for readability. We say that two sequences of distinct integers 𝜋 = 𝜋1 . . . 𝜋𝑘 and
𝜎 = 𝜎1 . . . 𝜎𝑘 are order isomorphic, written 𝜋 ≅ 𝜎, whenever 𝜋𝑖 < 𝜋𝑗 if and only if 𝜎𝑖 < 𝜎𝑗 . If
𝜎 ∈ 𝔖𝑛 and 𝜋 ∈ 𝔖𝑘 then 𝜎 contains 𝜋 as a pattern if there is a subsequence 𝜎′ of 𝜎 with |𝜎′| = 𝑘
and 𝜎′ ≅ 𝜋. If no such subsequence exists then 𝜎 avoids 𝜋. We use the notation
Av𝑛 (𝜋) = {𝜎 ∈ 𝔖𝑛 | 𝜎 avoids 𝜋}
for the avoidance class of 𝜋. For example 𝜎 = 42351 contains the pattern 𝜋 = 3241 because of the
subsequence 4251, among others. But it avoids 1234 because it has no increasing subsequence of
length 4. One can extend this notion to sets of permutations Π by letting
\[ \mathrm{Av}_n(\Pi) = \{\sigma \in \mathfrak{S}_n \mid \sigma \text{ avoids all } \pi \in \Pi\} = \bigcap_{\pi \in \Pi} \mathrm{Av}_n(\pi). \]
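The definitions of order isomorphism, containment, and avoidance classes can be checked by brute force on small cases. The following is a minimal sketch, not code from the dissertation; the function names `standardize`, `contains`, and `avoidance_class` are illustrative choices, and permutations are assumed to be stored as tuples in one-line notation.

```python
from itertools import combinations, permutations

def standardize(seq):
    """Return the permutation of [k] that is order isomorphic to seq."""
    ranks = {v: r for r, v in enumerate(sorted(seq), start=1)}
    return tuple(ranks[v] for v in seq)

def contains(sigma, pi):
    """True if some subsequence of sigma is order isomorphic to pi."""
    target = tuple(pi)
    # combinations() keeps the left-to-right order of sigma, so each
    # candidate is a genuine subsequence.
    return any(standardize(sub) == target
               for sub in combinations(sigma, len(target)))

def avoidance_class(n, patterns):
    """Av_n(Pi): all permutations of [n] avoiding every pattern in Pi."""
    return [s for s in permutations(range(1, n + 1))
            if not any(contains(s, p) for p in patterns)]

sigma = (4, 2, 3, 5, 1)
print(standardize((4, 2, 5, 1)))       # (3, 2, 4, 1)
print(contains(sigma, (3, 2, 4, 1)))   # True
print(contains(sigma, (1, 2, 3, 4)))   # False
```

This search examines every subsequence of length 𝑘, so it is only practical for short permutations, which is enough to confirm the examples in this chapter.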
A famous theorem of Erdős and Szekeres [ES35] can be stated in terms of pattern containment
and avoidance. Let
\[ \iota_n = 12 \ldots n \quad \text{and} \quad \delta_n = n \ldots 21 \]
be the increasing and decreasing permutations of length 𝑛, respectively.
Figure 2.1: The graph of 42351 on the left and of [42351] on the right
Theorem 2.0.1 ([ES35]). Suppose 𝑚, 𝑛 ∈ N. Then any 𝜎 ∈ 𝔖𝑚𝑛+1 contains either 𝜄𝑚+1 or 𝛿𝑛+1 .
This is the best possible in that there exist permutations in 𝔖𝑚𝑛 which avoid both 𝜄𝑚+1 and 𝛿𝑛+1 .
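Theorem 2.0.1 and its sharpness can be confirmed exhaustively for small 𝑚 and 𝑛. The sketch below is illustrative only; the helper names `longest_monotone` and `satisfies_es` are assumptions made for this example. It uses the fact that 𝜎 contains 𝜄𝑚+1 exactly when 𝜎 has an increasing subsequence of length 𝑚 + 1, and similarly for 𝛿𝑛+1.

```python
from itertools import permutations

def longest_monotone(seq, increasing=True):
    """Length of the longest increasing (or decreasing) subsequence."""
    best = []  # best[j] = longest monotone subsequence ending at seq[j]
    for j, x in enumerate(seq):
        prev = [best[i] for i in range(j)
                if (seq[i] < x if increasing else seq[i] > x)]
        best.append(1 + max(prev, default=0))
    return max(best, default=0)

def satisfies_es(sigma, m, n):
    """True if sigma contains iota_{m+1} or delta_{n+1}."""
    return (longest_monotone(sigma, True) >= m + 1
            or longest_monotone(sigma, False) >= n + 1)

m, n = 2, 2
# Every permutation of length mn + 1 contains one of the two patterns ...
all_ok = all(satisfies_es(s, m, n)
             for s in permutations(range(1, m * n + 2)))
# ... but some permutation of length mn avoids both.
sharp = any(not satisfies_es(s, m, n)
            for s in permutations(range(1, m * n + 1)))
print(all_ok, sharp)  # True True
```

For 𝑚 = 𝑛 = 2 the permutation 2143 is one witness of sharpness: it has no monotone subsequence of length 3.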
The diagram of 𝜋 ∈ 𝔖𝑛 is the collection of points (𝑖, 𝜋𝑖 ) in the first quadrant of the Cartesian
plane. The graphical representation of 𝜋 = 42351 is given on the left in Figure 2.1. It follows that
we can act on 𝜋 with the dihedral group of the square
\[ D_4 = \{\rho_0, \rho_{90}, \rho_{180}, \rho_{270}, r_0, r_1, r_{-1}, r_\infty\} \]
where 𝜌𝜃 is rotation counterclockwise through 𝜃 degrees and 𝑟𝑚 is reflection in a line of slope 𝑚.
We wish to write some of these rigid motions in terms of the one-line notation for 𝜋 = 𝜋1 𝜋2 . . . 𝜋𝑛 .
Reflection in a vertical line gives the reversal of 𝜋 which is
\[ \pi^r = \pi_n \ldots \pi_2 \pi_1. \]
Similarly, reflection in a horizontal line results in the complement of 𝜋
\[ \pi^c = n+1-\pi_1,\ n+1-\pi_2,\ \ldots,\ n+1-\pi_n. \]
Combining these two operations gives rotation by 180 degrees or reverse complement
\[ \pi^{rc} = n+1-\pi_n,\ \ldots,\ n+1-\pi_2,\ n+1-\pi_1. \]
Figure 2.2: The diagram of 132 (left) and 132⟨𝜎1 , 𝜎2 , 𝜎3⟩ (right)
We apply any of these operations to sets of permutations by applying them to each element of the
set.
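The reversal, complement, and reverse complement are easy to compute from one-line notation. A minimal sketch follows, with permutations stored as tuples; the function names are illustrative and not taken from the dissertation.

```python
def reverse(pi):
    """Reflection in a vertical line: pi_n ... pi_2 pi_1."""
    return tuple(reversed(pi))

def complement(pi):
    """Reflection in a horizontal line: replace each pi_i by n + 1 - pi_i."""
    n = len(pi)
    return tuple(n + 1 - x for x in pi)

def reverse_complement(pi):
    """Rotation by 180 degrees: reversal followed by complement."""
    return complement(reverse(pi))

def apply_to_set(op, perms):
    """Apply an operation to every element of a set of permutations."""
    return {op(p) for p in perms}

pi = (4, 2, 3, 5, 1)
print(reverse(pi))             # (1, 5, 3, 2, 4)
print(complement(pi))          # (2, 4, 3, 1, 5)
print(reverse_complement(pi))  # (5, 1, 3, 4, 2)
```

Reversal and complement commute, so applying them in either order gives the same reverse complement.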
We can use diagrams to inflate permutations. If we are given 𝜋 = 𝜋1 𝜋2 . . . 𝜋𝑛 ∈ 𝔖𝑛 and
permutations 𝜎1 , 𝜎2 , . . . , 𝜎𝑛 then the inflation of 𝜋 by the 𝜎𝑖 is the permutation 𝜋⟨𝜎1 , 𝜎2 , . . . , 𝜎𝑛⟩
whose diagram is obtained from that of 𝜋 by replacing each vertex (𝑖, 𝜋𝑖 ) by a copy of 𝜎𝑖 . For
example, given 𝜋 = 132 and 𝜎1 , 𝜎2 , 𝜎3 then a schematic of the diagram of 132⟨𝜎1 , 𝜎2 , 𝜎3⟩ is given
on the right in Figure 2.2. More concretely, if 𝜎1 = 21, 𝜎2 = 1, and 𝜎3 = 213 then
132⟨𝜎1 , 𝜎2 , 𝜎3⟩ = 216435.
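Inflation can be computed directly in one-line notation by shifting the values of each block. The sketch below is one possible implementation under the assumption that permutations are tuples; the name `inflate` is hypothetical.

```python
def inflate(pi, blocks):
    """Inflation pi<sigma_1, ..., sigma_n> in one-line notation."""
    if len(pi) != len(blocks):
        raise ValueError("need exactly one block per entry of pi")
    # Blocks are stacked vertically in the order of the values of pi:
    # the block replacing (i, pi_i) sits above every block whose
    # pi-value is smaller than pi_i.
    offset = {}
    total = 0
    for i in sorted(range(len(pi)), key=lambda i: pi[i]):
        offset[i] = total
        total += len(blocks[i])
    # Blocks are read left to right in the order of their positions.
    result = []
    for i, block in enumerate(blocks):
        result.extend(offset[i] + v for v in block)
    return tuple(result)

print(inflate((1, 3, 2), [(2, 1), (1,), (2, 1, 3)]))  # (2, 1, 6, 4, 3, 5)
```

Inflating by blocks of length one returns 𝜋 itself, which gives a quick sanity check.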
We say that patterns 𝜋 and 𝜋′ are Wilf equivalent, written 𝜋 ≡ 𝜋′, if # Av𝑛 (𝜋) = # Av𝑛 (𝜋′) for
all 𝑛 ∈ N, where the hash symbol denotes cardinality. This definition extends in the obvious way
to sets of patterns. Note that if 𝜋 and 𝜋′ are Wilf equivalent then both must be in the same 𝔖𝑛 . It
is easy to see that if 𝜙 ∈ 𝐷4 then 𝜋 ≡ 𝜙(𝜋) and so these are called trivial Wilf equivalences. It is
well known that all elements of 𝔖3 are Wilf equivalent.
Theorem 2.0.2. If 𝜋 ∈ 𝔖3 then
\[ \#\mathrm{Av}_n(\pi) = C_n \]
where \( C_n = \frac{1}{n+1}\binom{2n}{n} \) is the 𝑛th Catalan number.
Trivial Wilf equivalence carries over to sets Π of permutations. Simion and Schmidt [SS85]
determined all Wilf equivalences among the Av𝑛 (Π) for all Π ⊆ 𝔖3 .
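Theorem 2.0.2 can be verified by exhaustive enumeration for small 𝑛. The following sketch is illustrative only: the helper names and the cutoff `N_MAX = 6` are choices made for this example, and the brute-force search is far too slow for large 𝑛.

```python
from itertools import combinations, permutations
from math import comb

def standardize(seq):
    """Return the permutation of [k] that is order isomorphic to seq."""
    ranks = {v: r for r, v in enumerate(sorted(seq), start=1)}
    return tuple(ranks[v] for v in seq)

def contains(sigma, pi):
    """True if some subsequence of sigma is order isomorphic to pi."""
    return any(standardize(sub) == tuple(pi)
               for sub in combinations(sigma, len(pi)))

def count_avoiders(n, pi):
    """#Av_n(pi), computed by checking every permutation of [n]."""
    return sum(1 for s in permutations(range(1, n + 1))
               if not contains(s, pi))

def catalan(n):
    """C_n = binom(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)

N_MAX = 6  # illustrative cutoff; 6! = 720 permutations is still fast
for pi in permutations((1, 2, 3)):
    counts = [count_avoiders(n, pi) for n in range(N_MAX + 1)]
    print("".join(map(str, pi)), counts == [catalan(n) for n in range(N_MAX + 1)])
```

Each of the six patterns in 𝔖3 should print `True`, matching the Catalan numbers 1, 1, 2, 5, 14, 42, 132 for 𝑛 = 0, . . . , 6.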
A permutation statistic is a map st : ⊎𝑛≥0 𝔖𝑛 → 𝑆 where 𝑆 is some set. Famous permutation
statistics include the descent set statistic
Des 𝜋 = {𝑖 | 𝜋𝑖 > 𝜋𝑖+1 },
where the elements 𝑖 ∈ Des 𝜋 are called descents and if 𝜋𝑖 < 𝜋𝑖+1 then 𝑖 is called an ascent,
the descent number statistic
des 𝜋 = # Des 𝜋,
the major index statistic
$\operatorname{maj} \pi = \sum_{i \in \operatorname{Des} \pi} i,$
the inversion statistic
inv 𝜋 = #{(𝑖, 𝑗) | 𝑖 < 𝑗 and 𝜋𝑖 > 𝜋 𝑗 },
the excedance statistic
exc 𝜋 = #{𝑖 | 𝜋(𝑖) > 𝑖},
and the peak set statistic
Pk 𝜋 = {𝑖 | 𝜋𝑖−1 < 𝜋𝑖 > 𝜋𝑖+1 }.
Returning to the example given in Figure 2.1, the permutation 𝜋 = 42351 has Des 𝜋 = {1, 4},
des 𝜋 = 2, maj 𝜋 = 5, inv 𝜋 = 6, exc 𝜋 = 2, and Pk 𝜋 = {4}.
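The statistics just defined translate almost verbatim into code. In this sketch the function names are ours and positions are 1-indexed to match the text.

```python
def descent_set(pi):
    """Des pi: positions i with pi_i > pi_{i+1}."""
    return {i for i in range(1, len(pi)) if pi[i - 1] > pi[i]}

def maj(pi):
    """Major index: the sum of the descent positions."""
    return sum(descent_set(pi))

def inv(pi):
    """Number of pairs i < j with pi_i > pi_j."""
    n = len(pi)
    return sum(1 for i in range(n) for j in range(i + 1, n) if pi[i] > pi[j])

def exc(pi):
    """Number of positions i with pi(i) > i."""
    return sum(1 for i, v in enumerate(pi, start=1) if v > i)

def peak_set(pi):
    """Pk pi: positions i with pi_{i-1} < pi_i > pi_{i+1}."""
    return {i for i in range(2, len(pi)) if pi[i - 2] < pi[i - 1] > pi[i]}

pi = (4, 2, 3, 5, 1)
print(descent_set(pi), len(descent_set(pi)), maj(pi), inv(pi), exc(pi), peak_set(pi))
```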
Let st be a statistic whose range is N and let 𝑞 be a variable. If Π is a set of patterns then its
avoidance class has a corresponding generating function
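$F_n^{\mathrm{st}}(\Pi) = F_n^{\mathrm{st}}(\Pi; q) = \sum_{\sigma \in \mathrm{Av}_n(\Pi)} q^{\mathrm{st}\, \sigma}.$
Say that Π and Π′ are st-Wilf equivalent and write $\Pi \overset{\mathrm{st}}{\equiv} \Pi'$ if 𝐹𝑛st (Π) = 𝐹𝑛st (Π′) for all 𝑛 ≥ 0.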
Clearly st-Wilf equivalence implies Wilf equivalence. The maj- and inv-Wilf equivalence classes
for Π ⊆ 𝔖3 were determined by Dokos, Dwyer, Johnson, Sagan, and Selsor [DDJ+ 12].
If 𝜋 = 𝜋1 𝜋2 . . . 𝜋𝑛 ∈ 𝔖𝑛 then the corresponding cyclic permutation is the set of all rotations of
𝜋, denoted
[𝜋] = {𝜋1 𝜋2 . . . 𝜋𝑛 , 𝜋2 . . . 𝜋𝑛 𝜋1 , . . . , 𝜋𝑛 𝜋1 . . . 𝜋𝑛−1 }.
Continuing our example from the beginning of the section,
[42351] = {42351, 23514, 35142, 51423, 14235}.
If necessary, we will call permutations from 𝔖𝑛 linear to distinguish them from their cyclic cousins.
We also use square brackets to denote cyclic analogues of objects defined in the linear case. For
example, [𝔖𝑛 ] is the set of all cyclic permutations of length 𝑛. We say a cyclic permutation [𝜎]
contains [𝜋] as a pattern if there is some rotation 𝜎′ of 𝜎 which contains 𝜋 linearly. Otherwise
[𝜎] avoids [𝜋]. In our perennial example, even though 42351 avoids 1234 we have that [42351]
contains [1234] since the rotation 14235 has the copy 1235 of this pattern. Given a set [Π] of cyclic
patterns the cyclic avoidance class Av𝑛 [Π] is defined as expected. Note that when using a specific
set of cyclic permutations the square brackets will be put around the permutations themselves, for
example, Av𝑛 ([𝜋], [𝜋′]). Callan [Cal02] determined # Av𝑛 [𝜋] for all [𝜋] ∈ [𝔖4 ]. Gray, Lanning,
and Wang continued work in this direction considering cyclic packing of patterns [GLW18] and
patterns in colored cyclic permutations [GLW19].
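Cyclic containment can be tested straight from the definition by trying every rotation. The sketch below does exactly that; the helper names are ours, and no attempt is made at efficiency.

```python
from itertools import combinations

def standardize(seq):
    """Replace the entries of seq by their relative ranks 1, 2, ..., k."""
    rank = {v: r for r, v in enumerate(sorted(seq), start=1)}
    return tuple(rank[v] for v in seq)

def rotations(seq):
    """All rotations of seq, i.e. the elements of the cyclic permutation [seq]."""
    seq = tuple(seq)
    return [seq[i:] + seq[:i] for i in range(len(seq))]

def contains(sigma, pi):
    """Linear containment of the pattern pi in sigma."""
    return any(standardize(sub) == tuple(pi)
               for sub in combinations(sigma, len(pi)))

def cyc_contains(sigma, pi):
    """True if [sigma] contains [pi]: some rotation of sigma contains pi linearly."""
    return any(contains(rot, pi) for rot in rotations(sigma))

sigma = (4, 2, 3, 5, 1)
print(rotations(sigma))
print(contains(sigma, (1, 2, 3, 4)), cyc_contains(sigma, (1, 2, 3, 4)))
```

On the running example the linear test fails while the cyclic test succeeds, in line with the discussion of the rotation 14235.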
The graph of a cyclic permutation [𝜋] is obtained by embedding the graph of 𝜋 on a cylinder.
This is indicated on the right in Figure 2.1 by identifying the two dotted arrows. Cyclic Wilf
equivalence has the obvious definition. But note that now there are fewer trivial cyclic Wilf
equivalences since we need the chosen group element to preserve the cylinder, not just the square.
So the only trivial equivalences are
[𝜋] ≡ [𝜋𝑟 ] ≡ [𝜋 𝑐 ] ≡ [𝜋𝑟𝑐 ]. (2.1)
Certain linear permutation statistics have obvious cyclic analogues. For example, if 𝜋 ∈ 𝔖𝑛
then its cyclic descent number is
cdes[𝜋] = #{𝑖 | 𝜋𝑖 > 𝜋𝑖+1 where subscripts are taken modulo 𝑛}.
Note that this is well defined because the cardinality does not depend on which representative of
[𝜋] is chosen. To illustrate, 𝜋 = 23514 has cyclic descents at indices 3 and 5 so cdes[𝜋] = 2. The
corresponding generating function 𝐹𝑛cdes [Π] where [Π] is a set of cyclic permutations, and cdes-
Wilf equivalence should now need no definition. Note that cdes is another form of the excedance
statistic on linear permutations. In particular, if 𝜋 = 𝜋1 𝜋2 . . . 𝜋𝑛 then
cdes[𝜋] = exc(𝜋𝑛 , . . . , 𝜋2 , 𝜋1 )
where (𝜋𝑛 , 𝜋𝑛−1 , . . . , 𝜋1 ) is cycle notation for the linear permutation which, as a function, sends 𝜋𝑖
to 𝜋𝑖−1 for all 𝑖 modulo 𝑛.
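The identity between cdes and exc is easy to test numerically. In this sketch (our naming) the cycle (𝜋𝑛 , . . . , 𝜋1 ) is stored as a dictionary sending 𝜋𝑖 to 𝜋𝑖−1 , with indices taken modulo 𝑛.

```python
from itertools import permutations

def cdes(pi):
    """Cyclic descent number: positions i with pi_i > pi_{i+1}, indices mod n."""
    n = len(pi)
    return sum(1 for i in range(n) if pi[i] > pi[(i + 1) % n])

def exc_of_reversed_cycle(pi):
    """Excedances of the permutation with cycle notation (pi_n, ..., pi_2, pi_1)."""
    # As a function the cycle sends pi_i to pi_{i-1}; pi[i - 1] wraps at i = 0.
    f = {pi[i]: pi[i - 1] for i in range(len(pi))}
    return sum(1 for j in f if f[j] > j)

print(cdes((2, 3, 5, 1, 4)), exc_of_reversed_cycle((2, 3, 5, 1, 4)))
print(all(cdes(p) == exc_of_reversed_cycle(p)
          for p in permutations(range(1, 7))))
```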
We return our attention to the peak set statistic on linear permutations,
Pk 𝜋 = {𝑖 | 𝜋𝑖−1 < 𝜋𝑖 > 𝜋𝑖+1 } ⊆ [2, 𝑛 − 1].
For example, if 𝜋 = 18524376 then Pk 𝜋 = {2, 5, 7} since 𝜋2 = 8, 𝜋5 = 4, and 𝜋7 = 7 are all bigger
than the elements directly to their left and right. It is easy to see that 𝑆 ⊆ [2, 𝑛 − 1] is the peak set
of some 𝜋 ∈ 𝔖𝑛 if and only if no two elements of 𝑆 are consecutive. So the number of possible
peak sets is a Fibonacci number. One could also ask how many permutations have a given peak
set. This question was answered by Billey, Burdzy and Sagan.
Theorem 2.0.3 ([BBS13]). If 𝑛 ∈ P and 𝑆 ⊆ [2, 𝑛] then
#{𝜋 | Pk 𝜋 = 𝑆} = 𝑝(𝑆; 𝑛) 2^{𝑛−#𝑆−1}
where # denotes cardinality and 𝑝(𝑆; 𝑛) is a polynomial in 𝑛 depending on 𝑆.
It is natural to study the values at the peak indices. This line of research was initiated by Davis,
Nelson, Petersen, and Tenner [DNKPT18] and continued by Rusu [Rus20]; Diaz-Lopez, Harris,
Huang, Insko, and Nilsen [DLHH+ 21]; and Rusu and Tenner [RT]. Define the pinnacle set of a
permutation 𝜋 ∈ 𝔖𝑛 to be
Pin 𝜋 = {𝜋𝑖 | 𝜋𝑖−1 < 𝜋𝑖 > 𝜋𝑖+1 } ⊆ [3, 𝑛].
Continuing with the example 𝜋 = 18524376 we see that Pin 𝜋 = {4, 7, 8}. Following Davis et al.,
call a set 𝑆 an admissible pinnacle set if there is some permutation 𝜋 with Pin 𝜋 = 𝑆. They found a
criterion for 𝑆 to be admissible which will be useful in this work. This result was stated in recursive
fashion, but it is clearly equivalent to the following non-recursive version.
Theorem 2.0.4 ([DNKPT18]). Let 𝑆 = {𝑠1 < 𝑠2 < . . . < 𝑠𝑑 } ⊂ P. The set 𝑆 is an admissible
pinnacle set if and only if we have
𝑠𝑖 > 2𝑖
for all 𝑖 ∈ [𝑑].
Davis et al. were able to count the number of admissible pinnacle sets for 𝜋 ∈ 𝔖𝑛 .
Theorem 2.0.5 ([DNKPT18]). If
A𝑛 = {𝑆 | 𝑆 = Pin 𝜋 for some 𝜋 ∈ 𝔖𝑛 }
then
$\#\mathcal{A}_n = \binom{n-1}{\lfloor (n-1)/2 \rfloor}.$
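Both the admissibility criterion 𝑠𝑖 > 2𝑖 and the binomial count of admissible pinnacle sets can be compared against an exhaustive search for small 𝑛. The sketch below uses our own function names and is a numerical sanity check, not a proof.

```python
from itertools import combinations, permutations
from math import comb

def pinnacle_set(pi):
    """Pin pi: the values pi_i with pi_{i-1} < pi_i > pi_{i+1}."""
    return frozenset(pi[i] for i in range(1, len(pi) - 1)
                     if pi[i - 1] < pi[i] > pi[i + 1])

def satisfies_criterion(S):
    """The non-recursive criterion: the i-th smallest element exceeds 2i."""
    return all(s > 2 * i for i, s in enumerate(sorted(S), start=1))

def admissible_sets(n):
    """All pinnacle sets realized by some permutation in S_n (brute force)."""
    return {pinnacle_set(pi) for pi in permutations(range(1, n + 1))}

for n in range(1, 8):
    brute = admissible_sets(n)
    by_criterion = {frozenset(S) for d in range(n + 1)
                    for S in combinations(range(1, n + 1), d)
                    if satisfies_criterion(S)}
    print(n, len(brute), brute == by_criterion, comb(n - 1, (n - 1) // 2))
```

For each 𝑛 the printed count should agree with the binomial coefficient in the last column.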
They also studied the more refined constants
𝔭(𝑚, 𝑑) = #{𝑆 ∈ A𝑛 | max 𝑆 = 𝑚 and #𝑆 = 𝑑}
where 𝑛 ≥ 𝑚. Note that if 𝑆 = Pin 𝜋 for some 𝜋 ∈ 𝔖𝑛 then 𝑆 is also a pinnacle set of some 𝜋′ ∈ 𝔖𝑛′
for all 𝑛′ ≥ 𝑛 since one can just add values larger than 𝑛 to the beginning of 𝜋 in decreasing order.
It follows that the exact value of 𝑛 does not play a role in the definition of 𝔭(𝑚, 𝑑).
A number of questions have been raised about pinnacle sets. For example, if
𝑝 𝑆 (𝑛) = #{𝜋 ∈ 𝔖𝑛 | Pin 𝜋 = 𝑆}
then how can one compute these numbers? There does not seem to be an analogue of Theorem 2.0.3
in the context of pinnacles. Davis et al. gave a recursive procedure for doing so, and a
non-recursive summation formula for determining the 𝑝 𝑆 (𝑛) was later proposed in the paper of Diaz-Lopez
et al.
Another problem suggested earlier is as follows. Given an admissible 𝑆, a permutation 𝜎 of 𝑆
is called an admissible ordering if there is a 𝜋 ∈ 𝔖𝑛 with Pin 𝜋 = 𝑆 and the pinnacles of 𝜋 occur in
the same order as they do in 𝜎. Let
O (𝑆) = {𝜎 | 𝜎 is an admissible ordering of 𝑆}.
For example, if 𝑆 = {3, 5, 7} then 𝜎 = 537 ∈ O (𝑆) as witnessed by 𝜋 = 4513276. But 375 ∉ O (𝑆)
since in order for 6 not to be a pinnacle, it must be directly to the left or right of 7 and both
choices lead to a contradiction. The set O (𝑆) was studied in the articles of Rusu, and of Rusu and
Tenner [Rus20, RT]. In the latter paper, the authors asked for a function to compute #O (𝑆).
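For a small admissible set the admissible orderings can be listed by exhaustive search. The sketch below restricts attention to a single value of 𝑛, which is an assumption made for the sake of a finite computation; the function names are ours.

```python
from itertools import permutations

def pinnacle_sequence(pi):
    """The pinnacles of pi listed in left-to-right order."""
    return tuple(pi[i] for i in range(1, len(pi) - 1)
                 if pi[i - 1] < pi[i] > pi[i + 1])

def admissible_orderings(S, n):
    """Orderings of S that occur as the pinnacle sequence of some pi in S_n."""
    S = frozenset(S)
    found = set()
    for pi in permutations(range(1, n + 1)):
        seq = pinnacle_sequence(pi)
        if frozenset(seq) == S:
            found.add(seq)
    return found

orderings = admissible_orderings({3, 5, 7}, 7)
print(sorted(orderings))
print(pinnacle_sequence((4, 5, 1, 3, 2, 7, 6)))
```

The witness 4513276 from the example has pinnacle sequence 5, 3, 7, and the ordering 3, 7, 5 should be absent from the printed list.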
With these definitions and results in hand, we first examine cyclic pattern containment and
avoidance in chapter 3. We begin by proving a cyclic version of the Erdős-Szekeres
Theorem 3.1.1 in section 3.1. This result is used to help us count # Av𝑛 ([Π]) in sections 3.2
and 3.3, where [Π] ⊆ [𝔖4 ] and #[Π] ≥ 2 . We then consider cyclic descent generating functions
over Av𝑛 ( [Π]) in section 3.4, and find 𝐹𝑛cdes [Π] for #[Π] = 1, 2 and [Π] ⊆ [𝔖4 ].
We then continue to our study of pinnacle sets in chapter 4. We begin by counting the number
of admissible pinnacle sets in section 4.1. This quantity, given in theorem 2.0.5, was already
found in [DNKPT18]. Here we provide a simpler proof via a bijection involving interleaved and right
canonical permutations. As mentioned, [DNKPT18] also studied the values 𝔭(𝑚, 𝑑). In section 4.2,
we show these constants are actually ballot numbers, specifically $\mathfrak{p}(m, d) = \frac{m-2d+1}{m-1}\binom{m-1}{d-1}$. We
do this in two ways, using finite differences and a bijection. Once we have counted the number
of admissible pinnacle sets, we consider the number of permutations with a given pinnacle set
in section 4.3. We provide a sum to count 𝑝 𝑆 (𝑛) in theorem 4.3.1 which is asymptotically
more efficient than previously existing methods. We then extend this result to count #O (𝑆)
in theorem 4.3.12.
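The ballot-number claim for 𝔭(𝑚, 𝑑) can be tested numerically. The sketch below reads the formula as (𝑚 − 2𝑑 + 1)/(𝑚 − 1) times the binomial coefficient of 𝑚 − 1 and 𝑑 − 1, which is our reading of the display, and counts admissible sets via the criterion 𝑠𝑖 > 2𝑖 of Theorem 2.0.4. We only test the range 𝑚 > 2𝑑 ≥ 2, where sets with maximum 𝑚 and size 𝑑 can be admissible.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def is_admissible(S):
    """Criterion of Theorem 2.0.4: the i-th smallest element exceeds 2i."""
    return all(s > 2 * i for i, s in enumerate(sorted(S), start=1))

def p_count(m, d):
    """Number of admissible pinnacle sets with maximum m and size d (d >= 1)."""
    return sum(1 for rest in combinations(range(1, m), d - 1)
               if is_admissible(rest + (m,)))

def ballot(m, d):
    """The ballot-number formula as we read it from the text."""
    return Fraction(m - 2 * d + 1, m - 1) * comb(m - 1, d - 1)

for m in range(3, 11):
    for d in range(1, (m - 1) // 2 + 1):
        print(m, d, p_count(m, d), p_count(m, d) == ballot(m, d))
```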
CHAPTER 3
CYCLIC PATTERN CONTAINMENT AND AVOIDANCE
This chapter contains material from Domagalski, Liang, Minnich, Sagan, Schmidt, and Siet-
sema [DLM+ 21a]. All results in this chapter come from this manuscript except as otherwise
noted.
3.1 A cyclic Erdős-Szekeres Theorem
In this section we will use the linear Erdős-Szekeres Theorem to prove a cyclic analogue. We
will need a variant of the decreasing permutation 𝛿𝑛 defined as follows. Given nonnegative integers
𝑛 (the length), 𝑑 (the difference), and 𝑠 (the smallest value) define the decreasing sequence
𝛿𝑛,𝑑,𝑠 = 𝑠 + (𝑛 − 1)𝑑, 𝑠 + (𝑛 − 2)𝑑, . . . , 𝑠 + 𝑑, 𝑠.
For example
𝛿5,2,3 = 11, 9, 7, 5, 3.
Theorem 3.1.1. Suppose 𝑚, 𝑛 ∈ N. Then any [𝜎] ∈ [𝔖𝑚𝑛+2 ] contains either [𝜄𝑚+2 ] or [𝛿𝑛+2 ].
This is the best possible in that there exist permutations in [𝔖𝑚𝑛+1 ] which avoid both [𝜄𝑚+2 ] and
[𝛿𝑛+2 ].
Proof. To prove the first statement we can assume, by rotating 𝜎 if necessary, that
𝜎 = 𝜎1 , 𝜎2 , . . . , 𝜎𝑚𝑛+1 , 𝑚𝑛 + 2.
So 𝜎′ = 𝜎1 𝜎2 . . . 𝜎𝑚𝑛+1 ∈ 𝔖𝑚𝑛+1 and, by Theorem 2.0.1, 𝜎′ contains a copy 𝜅 of either 𝜄𝑚+1 or 𝛿𝑛+1 .
In the first case, the concatenation 𝜅, 𝑚𝑛 + 2 is a copy of [𝜄𝑚+2 ] in [𝜎]. In the second case, we have
that 𝑚𝑛 + 2, 𝜅 is a copy of [𝛿𝑛+2 ] in [𝜎].
To prove the second statement, consider the concatenation
𝜎 = 1, 𝛿𝑛,𝑚,2 , 𝛿𝑛,𝑚,3 , . . . , 𝛿𝑛,𝑚,𝑚+1 .
Figure 3.1: The graph of [𝜎] when 𝑚 = 5 and 𝑛 = 3
For example, when 𝑚 = 5 and 𝑛 = 3, then
[𝜎] = [1, 12, 7, 2, 13, 8, 3, 14, 9, 4, 15, 10, 5, 16, 11, 6]
whose graph is shown in Figure 3.1. Define 𝜎′ by 𝜎 = 1𝜎′ and note that 𝜎′ can be written either
as a disjoint union of 𝑚 decreasing subsequences of length 𝑛, or of 𝑛 increasing subsequences
of length 𝑚. In a linear permutation, any increasing subsequence can intersect any decreasing
subsequence at most once. So any increasing subsequence of 𝜎′ has length at most 𝑚, and any
decreasing subsequence has length at most 𝑛. Now let [𝜋] be a subsequence of [𝜎]. We consider
two cases.
Suppose first that [𝜋] contains 1. If [𝜋] is increasing then rotate, if necessary, until 𝜋 = 1𝜋′ for
some 𝜋′ which is a subsequence of 𝜎′. But from the previous paragraph, |𝜋′| ≤ 𝑚 which implies
|𝜋| ≤ 𝑚 + 1 as desired. If [𝜋] is decreasing then we pick a representative 𝜋 = 𝜋′1 and proceed as
in the increasing case to get |𝜋| ≤ 𝑛 + 1.
Now consider the possibility that [𝜋] does not contain 1. Again, we start with the subcase when
[𝜋] is increasing. Suppose, for simplicity, that 𝜋 contains an element 𝑥 ∈ 𝛿𝑛,𝑚,2 as the proof
will be similar for the other deltas. As before, 𝜋 can contain at most one element of each of 𝛿𝑛,𝑚,3
through 𝛿𝑛,𝑚,𝑚+1 . Now [𝜋] can wrap around and pick up other elements. But those elements must
come before 𝑥. And since 𝛿𝑛,𝑚,2 is decreasing, at most one other element can be added in this way.
It follows that |𝜋| ≤ 𝑚 + 1. On the other hand, if [𝜋] is decreasing then the proof is similar. The
only difference is that if one attempts to pick up elements of 𝛿𝑛,𝑚,2 before 𝑥 then this is impossible
since such elements are larger than 𝑥 and [𝜋] is decreasing. So |𝜋| ≤ 𝑛 which is an even tighter
bound. This completes the demonstration of the theorem.
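Both halves of Theorem 3.1.1 can be explored numerically. The sketch below builds the extremal concatenation from the proof, measures the longest increasing and decreasing subsequences over all rotations, and exhaustively checks the first statement for 𝑚 = 𝑛 = 2. All function names are ours, and the quadratic subsequence routine is chosen for brevity rather than speed.

```python
from itertools import permutations

def delta(n, d, s):
    """The decreasing sequence with length n, difference d, and smallest value s."""
    return tuple(s + j * d for j in range(n - 1, -1, -1))

def extremal(m, n):
    """The concatenation 1, delta_{n,m,2}, ..., delta_{n,m,m+1} used in the proof."""
    sigma = (1,)
    for s in range(2, m + 2):
        sigma += delta(n, m, s)
    return sigma

def longest_increasing(seq):
    """Length of a longest increasing subsequence (simple quadratic method)."""
    best = []
    for j, v in enumerate(seq):
        best.append(1 + max((best[i] for i in range(j) if seq[i] < v), default=0))
    return max(best, default=0)

def cyc_longest(seq, decreasing=False):
    """Longest increasing (or decreasing) subsequence over all rotations of seq."""
    if decreasing:
        seq = tuple(-v for v in seq)
    return max(longest_increasing(seq[i:] + seq[:i]) for i in range(len(seq)))

sigma = extremal(5, 3)
print(sigma)
print(cyc_longest(sigma), cyc_longest(sigma, decreasing=True))

# First statement for m = n = 2: every cyclic permutation of length 6
# contains [1234] or [4321].
m = n = 2
all_contain = all(
    cyc_longest((1,) + rest) >= m + 2
    or cyc_longest((1,) + rest, decreasing=True) >= n + 2
    for rest in permutations(range(2, m * n + 3)))
print(all_contain)
```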
We will see the advantages that Theorem 3.1.1 brings in our study of length four pattern avoidance,
specifically, when considering # Av𝑛 (𝑆) where 𝑆 contains [1234] and [1432].
3.2 Pattern avoidance of doubletons
In this section we will enumerate Av𝑛 [Π] for all [Π] ⊂ [𝔖4 ] with #[Π] = 2. Any cyclic Wilf
equivalences stated without proof are trivial.
Let us first dispose of the simplest singleton avoidance classes where [𝜋] ∈ [𝔖𝑘 ] for 𝑘 < 4. In
[𝔖2 ] there is only one cyclic permutation [12] and it is easy to see that every [𝜎] of length at least
2 contains it. In [𝔖3 ] there are only the patterns [123] and [321], and these are only avoided by
[𝛿𝑛 ] and [𝜄𝑛 ], respectively.
Callan [Cal02] enumerated Av𝑛 [𝜋] for any given [𝜋] ∈ [𝔖4 ]. Recall the version of the
Fibonacci numbers defined by 𝐹1 = 𝐹2 = 1 and 𝐹𝑛 = 𝐹𝑛−1 + 𝐹𝑛−2 for 𝑛 ≥ 3. Unlike the case of
linear permutations in 𝔖3 , there are no nontrivial Wilf equivalences.
Theorem 3.2.1 ([Cal02]). For 𝑛 ≥ 2 we have
$\#\mathrm{Av}_n[1234] = \#\mathrm{Av}_n[1432] = 2^n + 1 - 2n - \binom{n}{3},$
$\#\mathrm{Av}_n[1243] = \#\mathrm{Av}_n[1342] = 2^{n-1} - n + 1,$
$\#\mathrm{Av}_n[1324] = \#\mathrm{Av}_n[1423] = F_{2n-3}.$
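The three enumerations of Theorem 3.2.1 can be compared with an exhaustive count for small 𝑛. The sketch below reads the first formula as 2^𝑛 + 1 − 2𝑛 minus the binomial coefficient of 𝑛 and 3, which is our reading of the display, and represents each cyclic permutation by its rotation starting with 1. The helper names are ours.

```python
from itertools import combinations, permutations
from math import comb

def standardize(seq):
    """Replace the entries of seq by their relative ranks 1, 2, ..., k."""
    rank = {v: r for r, v in enumerate(sorted(seq), start=1)}
    return tuple(rank[v] for v in seq)

def cyc_avoids(sigma, patterns):
    """True if [sigma] avoids every cyclic pattern in patterns (all of one length).

    A subsequence of a rotation of sigma is a rotation of a subsequence of
    sigma, so we compare linear subsequences with all rotations of each pattern.
    """
    forbidden = {p[i:] + p[:i] for p in patterns for i in range(len(p))}
    k = len(patterns[0])
    return not any(standardize(sub) in forbidden
                   for sub in combinations(sigma, k))

def count_cyc_avoiders(n, patterns):
    """Brute-force size of the cyclic avoidance class at length n."""
    return sum(1 for rest in permutations(range(2, n + 1))
               if cyc_avoids((1,) + rest, patterns))

def fib(k):
    """Fibonacci numbers with F_1 = F_2 = 1."""
    a, b = 1, 1
    for _ in range(k - 1):
        a, b = b, a + b
    return a

for n in range(2, 8):
    print(n,
          count_cyc_avoiders(n, [(1, 2, 3, 4)]) == 2**n + 1 - 2 * n - comb(n, 3),
          count_cyc_avoiders(n, [(1, 2, 4, 3)]) == 2**(n - 1) - n + 1,
          count_cyc_avoiders(n, [(1, 3, 2, 4)]) == fib(2 * n - 3))
```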
In presenting the enumerations for doubletons, we make the following conventions to facilitate
locating a given result. All cyclic patterns will be listed starting with 1. And all sets of cyclic
patterns will be given in lexicographic order. We will also use terms like “just before” or “just
after” in [𝜎] to refer to the left-to-right order on the cylinder of a cyclic permutation in the form of
Figure 2.1. For example, in [𝜎] = [42351] the 5 comes just before 1 and the 4 just after. We
also say that an element 𝑥 is between 𝑦 and 𝑧 if it is in the subsequence of [𝜎] traversed going
left-to-right around the cylinder from 𝑦 to 𝑧. Continuing our example, between 2 and 5 we have 3,
while between 5 and 2 we have 1 and 4.
One of our tools will be generating trees. To the best of our knowledge, these trees were
introduced by Chung, Graham, Hoggatt, and Kleiman [CGHK78] for studying Baxter permutations.
Since then, they have become an integral technique in the theory of pattern avoidance
[BBMD+ 02, BM03, Kre00, Wes95, Wes96]. The generating tree for an avoidance class
Av[Π], denoted 𝑇 [Π], has as its root the permutation [12]. The children of any [𝜎] ∈ Av𝑛 [Π] are
all the [𝜎′] ∈ Av𝑛+1 [Π] which can be formed by inserting 𝑛 + 1 into one of the spaces of [𝜎]. A
space, also called a site, where insertion of 𝑛 + 1 produces a permutation of the avoidance class is
called active while the other spaces are inactive. A useful observation is that if a space is inactive
it must be because inserting 𝑛 + 1 there results in a copy of a forbidden pattern [𝜋] where 𝑛 + 1 plays
the role of the largest element of 𝜋. Once we have picked a representative 𝜎 = 𝜎1 𝜎2 . . . 𝜎𝑛 for [𝜎]
we will label the spaces as 1, 2, . . . , 𝑛 left to right where space 𝑖 comes between 𝜎𝑖 and 𝜎𝑖+1 . The
nodes for Av𝑛 [Π] will be said to be at level 𝑛 in 𝑇 [Π]. We call the number of children of a vertex
its degree which is denoted deg[𝜎]. Given 𝑑 ∈ N, suppose that every cyclic permutation with
deg[𝜎] = 𝑑 has children of degrees 𝑐1 , 𝑐2 , . . . , 𝑐𝑑 . Then this is denoted by the production rule
(𝑑) → (𝑐1 )(𝑐2 ) . . . (𝑐𝑑 ).
There may be other nodes having some special characteristic 𝑋 which always produces nodes
having characteristics 𝑌1 , 𝑌2 , . . . , 𝑌𝑑 which correspond to a production rule
(𝑋) → (𝑌1 )(𝑌2 ) . . . (𝑌𝑑 ).
In particular, the characteristic of being the root of the tree is denoted in a production rule by (∗).
We can also have production rules which mix numbers for degrees and letters for characteristics.
If 𝑇 [Π] can be characterized by production rules, these can often be used to calculate # Av𝑛 [Π].
Theorem 3.2.2. We have
{[1234], [1243]} ≡ {[1234], [1342]} ≡ {[1243], [1432]} ≡ {[1342], [1432]}.
And for 𝑛 ≥ 3
# Av𝑛 ([1234], [1342]) = 2(𝑛 − 2).
Proof. We claim that 𝑇 = 𝑇 ([1234], [1342]) has the following production rules
(∗) → (2)(2),
(1) → (1),
(2) → (1)(2).
Once these are proven then the enumeration follows easily since one can inductively show that, for
𝑛 ≥ 3, level 𝑛 consists of two nodes of degree 2 and 2(𝑛 − 3) nodes of degree 1.
It is easy to check the production rules at levels 𝑛 = 2 and 3, so we assume that 𝑛 ≥ 4 and also
that [𝜎] ∈ Av𝑛 ([1234], [1342]). First of all, note that the site before 𝑛 is always active. If it were
not then the result [𝜎′] of inserting 𝑛 + 1 would have a copy 𝜅 of one of the patterns containing
𝑛 + 1. But 𝑛 can not be in 𝜅 since neither of the patterns has 4 followed immediately in the cycle
by 3. So replacing 𝑛 + 1 by 𝑛 in 𝜅 would give a forbidden pattern in [𝜎] which is a contradiction.
Thus every [𝜎] has at least one child. Also [𝜎] has at most two children. For suppose
𝜎′ = 𝑛 + 1, 𝜌, 𝑛, 𝜏
is the result of inserting 𝑛 + 1 in 𝜎. It follows that |𝜌| ≤ 1 since if |𝜌| ≥ 2 then [𝜎′] has a copy
of either [4123] or [4213]. Thus 𝑛 + 1 must be inserted either directly before 𝑛 or two elements
before 𝑛.
Now consider
𝛿 = 𝑛, 𝑛 − 1, . . . , 3, 2, 1, and 𝜖 = 𝑛, 𝑛 − 1, . . . , 3, 1, 2. (3.1)
It is easy to check that both sites 𝑛 and 𝑛 − 1 are active in these permutations and so both have
degree 2. It is also obvious that if one inserts 𝑛 + 1 in site 𝑛 in either permutation then one gets
another permutation of the same form.
From what we have done, we can finish the proof if we show that deg[𝜎] = 2 implies [𝜎] = [𝛿]
or [𝜎] = [𝜖]. Write
𝜎 = 𝑛𝜌𝑚
where 𝑚 is the last element of 𝜎 and 𝜌 is everything between 𝑛 and 𝑚. Since deg[𝜎] = 2, site 𝑛 − 1
is active and inserting 𝑛 + 1 there yields
𝜎′ = 𝑛, 𝜌, 𝑛 + 1, 𝑚.
Then 𝑚 ≤ 2, for otherwise, because 𝑛 ≥ 4, [𝜎] contains a copy of [4123] or [4213]. In the case 𝑚 = 1
we must have 𝜌 decreasing. For if there is an ascent 𝑥 < 𝑦 in 𝜌 then [𝜎′] contains [𝑥, 𝑦, 𝑛 + 1, 1]
which is a copy of [2341], a contradiction. So in this case 𝜌 is decreasing and 𝜎 = 𝛿. The other
possibility is that 𝑚 = 2. This forces the last element of 𝜌 to be 1. For if 1 is elsewhere and 𝑥 is
the last element of 𝜌 then [𝜎′] contains [1, 𝑥, 𝑛 + 1, 2] which is a contradictory copy of [1342].
Similarly to the first case, one can now show that 𝜌 is decreasing and so 𝜎 = 𝜖 as desired.
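The generating tree 𝑇 ([1234], [1342]) can be grown level by level on a computer, which gives an independent check of the degree profile claimed in the proof. The sketch below is an illustration with our own helper names; each node is stored as its representative beginning with 1, and space 𝑖 is taken to lie just after 𝜎𝑖 .

```python
from collections import Counter
from itertools import combinations

def standardize(seq):
    """Replace the entries of seq by their relative ranks 1, 2, ..., k."""
    rank = {v: r for r, v in enumerate(sorted(seq), start=1)}
    return tuple(rank[v] for v in seq)

def cyc_avoids(sigma, patterns):
    """True if [sigma] avoids all the cyclic patterns (all of the same length)."""
    forbidden = {p[i:] + p[:i] for p in patterns for i in range(len(p))}
    k = len(patterns[0])
    return not any(standardize(sub) in forbidden
                   for sub in combinations(sigma, k))

def children(sigma, patterns):
    """Children of [sigma]: insert n + 1 into each active space."""
    n = len(sigma)
    kids = []
    for i in range(1, n + 1):               # space i lies just after sigma_i
        child = sigma[:i] + (n + 1,) + sigma[i:]
        if cyc_avoids(child, patterns):
            kids.append(child)
    return kids

def tree_levels(patterns, max_level):
    """Yield (n, nodes at level n) for the tree rooted at [12]."""
    level = [(1, 2)]
    for n in range(2, max_level + 1):
        yield n, level
        level = [kid for s in level for kid in children(s, patterns)]

patterns = [(1, 2, 3, 4), (1, 3, 4, 2)]
for n, level in tree_levels(patterns, 8):
    degrees = Counter(len(children(s, patterns)) for s in level)
    print(n, len(level), dict(degrees))
```

For 𝑛 ≥ 3 each printed level should show two nodes of degree 2 and 2(𝑛 − 3) nodes of degree 1.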
Comparing our next result with the previous one will provide our first nontrivial Wilf equiva-
lence.
Theorem 3.2.3. We have
{[1234], [1324]} ≡ {[1423], [1432]}.
And for 𝑛 ≥ 3
# Av𝑛 ([1234], [1324]) = 2(𝑛 − 2).
Proof. Let 𝐷 stand for the decreasing permutation and 𝐸 for the decreasing permutation with its
largest two elements swapped. We consider the root [12] to be of type 𝐷. We will show that
𝑇 = 𝑇 ( [1234], [1324]) has production rules
(1) → (1),
(𝐷) → (𝐷)(𝐸),
(𝐸) → (1)(1).
It follows by induction that level 𝑛 ≥ 3 of 𝑇 has a 𝐷, an 𝐸, and 2(𝑛 − 3) nodes of degree one,
proving the theorem.
The same demonstration as in the previous theorem shows that the site before 𝑛 in any [𝜎] ∈
Av𝑛 ( [1234], [1324]) is active. So again, every such permutation has at least one child. Also, every
[𝜎] has at most two children. Indeed, write
𝜎 = 1𝜎2 . . . 𝜎𝑛 (3.2)
and put 𝑛 + 1 in site 𝑖 ≥ 3. Then 1, 𝜎2 , 𝜎3 , 𝑛 + 1 is a copy of either 1234 or 1324, another
contradiction.
Now consider permutations corresponding to 𝐷 and 𝐸 at level 𝑛
𝛿 = 1, 𝑛, 𝑛 − 1, 𝑛 − 2, 𝑛 − 3, . . . , 2 and 𝜖 = 1, 𝑛 − 1, 𝑛, 𝑛 − 2, 𝑛 − 3, . . . , 2. (3.3)
It is easy to check that both sites 1 and 2 are active in 𝛿, 𝜖. So, by the previous paragraph, they both
have degree 2. Furthermore, the two children of 𝛿 have the form 𝐷 and 𝐸.
We will be done if we can show that [𝜎] having two children implies [𝜎] = [𝛿] or [𝜖]. Write
𝜎 as in (3.2). Since the active sites must be 1 and 2, and the site before 𝑛 must be active, either
𝜎2 = 𝑛 or 𝜎3 = 𝑛. If 𝜎2 = 𝑛 and there is an ascent 𝑥 < 𝑦 in the rest of the permutation, then after
inserting 𝑛 + 1 in position 2 we have [𝑥, 𝑦, 𝑛, 𝑛 + 1] which is a copy of [1234], a contradiction. So in
this case [𝜎] = [𝛿]. Alternatively, suppose 𝜎3 = 𝑛. This forces 𝜎2 = 𝑛 − 1, since if 𝜎2 = 𝑥 < 𝑛 − 1
then 𝑛 − 1 comes after 𝑛. But inserting 𝑛 + 1 in position 1 gives [𝑥, 𝑛, 𝑛 − 1, 𝑛 + 1] which is a copy
of [1324]. And similarly to the first case we see that the rest of 𝜎 is decreasing. The result is that
[𝜎] = [𝜖]. This completes the proof.
Theorem 3.2.4. We have
{[1234], [1423]} ≡ {[1324], [1432]}.
And for 𝑛 ≥ 1
$\#\mathrm{Av}_n([1234], [1423]) = 1 + \binom{n-1}{2}.$
Proof. Suppose [𝜎] ∈ Av𝑛 ([1234], [1423]) and write
𝜎 = 1𝜌𝑛𝜏 (3.4)
where 𝜌 and 𝜏 are the subsequences between 1 and 𝑛, and between 𝑛 and 1, respectively. Now
𝜌 and 𝜏 must be decreasing since [𝜎] avoids [1234] and [1423], respectively. Furthermore, 𝜌
must consist of consecutive integers since, if not, then we have 𝑥 < 𝑦 < 𝑧 such that 1𝑧𝑥𝑛𝑦 is a
subsequence of 𝜎. So [𝑥𝑛𝑦𝑧] is a copy of [1423] in [𝜎], which is a contradiction. Conversely,
it is easy to check that if 𝜎 has the form (3.4) with 𝜌 and 𝜏 decreasing and 𝜌 consecutive then
[𝜎] ∈ Av𝑛 ( [1234], [1423]). So we have characterized the elements of this class.
To finish the enumeration, if 𝜌 = ∅ there is one corresponding 𝜎. But if 𝜌 ≠ ∅ then choosing
the smallest and largest element of 𝜌 from the elements 2, 3, . . . , 𝑛 − 1 completely determines 𝜎.
Since these two elements could be equal, we are choosing 2 elements from 𝑛 − 2 elements with
repetition which is counted by $\binom{n-1}{2}$.
The following result follows immediately from Theorem 3.1.1.
Theorem 3.2.5. We have
# Av𝑛 ([1234], [1432]) = 0
for 𝑛 ≥ 6.
We now have, by comparison with Theorem 3.2.4, another nontrivial Wilf equivalence.
Theorem 3.2.6. We have
{[1243], [1324]} ≡ {[1243], [1423]} ≡ {[1324], [1342]} ≡ {[1342], [1423]}.
And for 𝑛 ≥ 1
$\#\mathrm{Av}_n([1324], [1342]) = 1 + \binom{n-1}{2}.$
Proof. Take [𝜎] ∈ Av𝑛 ([1324], [1342]) and write 𝜎 as in (3.4). Then 𝜌 is increasing since [𝜎]
avoids [1324]. And every element of 𝜌 is smaller than every element of 𝜏 since [𝜎] avoids [1342].
To avoid a copy of one of the forbidden patterns containing the 1 of 𝜎 we must have that 𝜏 avoids
213 and 231. And to avoid a copy of [1324] where 𝑛 plays the role of 4, it must be that 𝜏 avoids
132. The 𝜏 which avoid these three patterns are exactly those which are inflations of the form
𝜏 = 21⟨𝛿𝑘 , 𝜄𝑙⟩ for some 𝑘, 𝑙 ≥ 0 (see the chart on page 2773 of [DDJ+ 12]). Absorbing the 1 and 𝑛
of 𝜎 into 𝜌 and 𝜏, respectively, we see that
𝜎 = 132⟨𝜄𝑗 , 𝛿𝑘 , 𝜄𝑙⟩ (3.5)
where 𝑗, 𝑘 ≥ 1 and 𝑙 ≥ 0. Again, it is not hard to check that for every 𝜎 of this form we have
[𝜎] ∈ Av𝑛 ([1324], [1342]).
To enumerate these 𝜎, we distinguish two cases. If 𝑙 ≥ 2 then picking the smallest and
largest elements of the copy of 𝜄𝑙 from 2, 3, . . . , 𝑛 − 1 completely determines 𝜎. So in this case
there are $\binom{n-2}{2}$ choices. If 𝑙 ≤ 1 then the copy of 𝜄𝑙 can be appended to the copy of 𝛿𝑘 so that
𝜎 = 12⟨𝜄𝑗 , 𝛿𝑛−𝑗⟩. Since we must have 1 and 𝑛 in the ascending and decreasing subsequences, there
are now 𝑛 − 1 choices. Adding the two counts gives the desired result.
Theorem 3.2.7. For 𝑛 ≥ 4 we have
# Av𝑛 ([1243], [1342]) = 4.
Proof. Take [𝜎] ∈ Av𝑛 ([1243], [1342]) and write 𝜎 as in (3.4). Then 𝜌 and 𝜏 can not both be
nonempty. For if 𝑥 ∈ 𝜌 and 𝑦 ∈ 𝜏 then 1𝑥𝑛𝑦 is a copy of either 1243 or 1342.
Assume first that 𝜌 = ∅ so that
𝜎 = 1𝑛𝜏. (3.6)
Then 𝜏 must be increasing or decreasing. For suppose it was neither. Then it would contain a copy
of one of the patterns 132, 231, 213, or 312. In the first two cases this would give, together with
the 1, a copy of 1243 or 1342 in 𝜎. And in the last two cases, prepending 𝑛 gives a copy of 4213
or 4312. Conversely, if 𝜎 is given by (3.6) with 𝜏 increasing or decreasing then it is easy to verify
that [𝜎] ∈ Av𝑛 ([1243], [1342]).
Using the same ideas, one can also show that if 𝜏 = ∅ then one gets exactly two elements of
Av𝑛 ( [1243], [1342]), of the form 𝜎 = 1𝜌𝑛 where 𝜌 is either increasing or decreasing. Thus there
are a total of four elements in the avoidance class.
Theorem 3.2.8. For 𝑛 ≥ 3 we have
# Av𝑛 ([1324], [1423]) = 2^{𝑛−2} .
Proof. Take [𝜎] ∈ Av𝑛 ([1324], [1423]) and write
𝜎 = 𝑛, 𝜌, 𝑛 − 1, 𝜏.
Similar to the previous proof, one of 𝜌 or 𝜏 must be empty since otherwise 4132 or 4231 is a
pattern in 𝜎. If 𝜌 = ∅ then one shows similarly that 𝑛 − 2 either begins or ends 𝜏. Continuing in
this manner, we see that there are 2 choices for the positions of 𝑛 − 1, 𝑛 − 2, . . . , 2. Checking, as
usual, that all such permutations are actually in the avoidance set, the enumeration follows.
This fully characterizes all non-trivial Wilf equivalences for all length four doubletons.
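The doubleton enumerations of this section lend themselves to a brute-force cross-check. The sketch below tabulates one representative pair from each theorem together with the formula stated for it; the dictionary `formulas` and the helper names are ours, and the check is run only for 4 ≤ 𝑛 ≤ 7.

```python
from itertools import combinations, permutations
from math import comb

def standardize(seq):
    """Replace the entries of seq by their relative ranks 1, 2, ..., k."""
    rank = {v: r for r, v in enumerate(sorted(seq), start=1)}
    return tuple(rank[v] for v in seq)

def cyc_avoids(sigma, patterns):
    """True if [sigma] avoids all the cyclic patterns (all of the same length)."""
    forbidden = {p[i:] + p[:i] for p in patterns for i in range(len(p))}
    k = len(patterns[0])
    return not any(standardize(sub) in forbidden
                   for sub in combinations(sigma, k))

def count_cyc_avoiders(n, patterns):
    """Brute-force size of the cyclic avoidance class at length n."""
    return sum(1 for rest in permutations(range(2, n + 1))
               if cyc_avoids((1,) + rest, patterns))

P = {name: tuple(int(c) for c in name)
     for name in ("1234", "1243", "1324", "1342", "1423", "1432")}

formulas = {
    ("1234", "1342"): lambda n: 2 * (n - 2),         # Theorem 3.2.2
    ("1234", "1324"): lambda n: 2 * (n - 2),         # Theorem 3.2.3
    ("1234", "1423"): lambda n: 1 + comb(n - 1, 2),  # Theorem 3.2.4
    ("1324", "1342"): lambda n: 1 + comb(n - 1, 2),  # Theorem 3.2.6
    ("1243", "1342"): lambda n: 4,                   # Theorem 3.2.7
    ("1324", "1423"): lambda n: 2 ** (n - 2),        # Theorem 3.2.8
}

for n in range(4, 8):
    for pair, formula in formulas.items():
        count = count_cyc_avoiders(n, [P[x] for x in pair])
        print(n, pair, count, count == formula(n))

# Theorem 3.2.5: the class is empty once n >= 6.
print([count_cyc_avoiders(n, [P["1234"], P["1432"]]) for n in (6, 7)])
```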
3.3 Three or more patterns
We will now compute # Av𝑛 [Π] for [Π] ⊆ [𝔖4 ] having #[Π] ≥ 3. We will not consider those
[Π] containing both [1234] and [1432] since for such classes # Av𝑛 [Π] = 0 for 𝑛 ≥ 6 as in
Theorem 3.2.5.
Theorem 3.3.1. We have
{[1234], [1243], [1324]} ≡ {[1234], [1324], [1342]} ≡ {[1243], [1423], [1432]}
≡ {[1342], [1423], [1432]}.
And for 𝑛 ≥ 4
# Av𝑛 ([1234], [1324], [1342]) = 3.
Proof. If [𝜎] ∈ Av𝑛 ([1234], [1324], [1342]) then [𝜎] avoids [1324] and [1342]. So, by the
proof of Theorem 3.2.6, we can write 𝜎 in the form (3.5) for 𝑗, 𝑘, 𝑙 ≥ 1. But since [𝜎] also avoids
[1234] we must have 𝑗 + 𝑙 ≤ 3. For the same reason, 𝑗 ≤ 2 since if 𝑗 = 3 then the copy of 𝜄3 and
one element of the copy of 𝛿 𝑘 would form a [1234]. Thus the only possibilities are ( 𝑗, 𝑙) = (1, 1),
(1, 2), or (2, 1) which proves the result.
Theorem 3.3.2. We have
{[1234], [1243], [1342]} ≡ {[1243], [1342], [1432]}.
And for 𝑛 ≥ 5
# Av𝑛 ([1234], [1243], [1342]) = 2.
Proof. If [𝜎] ∈ Av𝑛 ([1234], [1243], [1342]) then [𝜎] avoids [1243] and [1342]. So, by the proof
of Theorem 3.2.7, we can write
𝜎 = 𝑥𝑦𝜌 (3.7)
where {𝑥, 𝑦} = {1, 𝑛} and 𝜌 is either increasing or decreasing. Since 𝑛 ≥ 5 we have |𝜌| ≥ 3.
But [𝜎] also avoids [1234] and this forces 𝜌 to be decreasing. So there are two choices for [𝜎]
depending on the values of 𝑥 and 𝑦.
Theorem 3.3.3. We have
{[1234], [1243], [1423]} ≡ {[1234], [1342], [1423]} ≡ {[1243], [1324], [1432]}
≡ {[1324], [1342], [1432]}.
And for 𝑛 ≥ 2
# Av𝑛 ([1234], [1342], [1423]) = 𝑛 − 1.
Proof. We will show that 𝑇 = 𝑇 ([1234], [1342], [1423]) has production rules
(∗) → (1)(2),
(1) → (1),
(2) → (1)(2).
Then, by induction, level 𝑛 ≥ 2 of 𝑇 will contain one node of degree 2 and 𝑛 − 2 nodes of degree
1. Checking the root is easy, so assume 𝑛 ≥ 3.
By Theorem 3.2.2, 𝑇 is a subtree of 𝑇 ([1234], [1342]). So we just need to check which nodes
of that tree also avoid [1423]. As in the proof of that theorem, the site before 𝑛 in [𝜎] at level 𝑛 in
𝑇 is still active since 4 is not followed immediately by 3 in [1423]. Thus it suffices to show that
both sites of 𝛿 remain active, but only one in 𝜖 where 𝛿, 𝜖 are defined by (3.1). Indeed, the two sites
of 𝛿 give rise to copies of 𝛿 and 𝜖 at level 𝑛 + 1 of 𝑇. But site 𝑛 − 1 of 𝜖 which was active in the
larger tree is now inactive since inserting 𝑛 + 1 there gives the copy [1, 𝑛 + 1, 2, 𝑛] of [1423]. This
completes the proof.
We now have, in comparison with the previous theorem, a nontrivial Wilf equivalence.
Theorem 3.3.4. We have
{[1234], [1324], [1423]} ≡ {[1324], [1423], [1432]}.
And for 𝑛 ≥ 2
# Av𝑛 ([1234], [1324], [1423]) = 𝑛 − 1.
Proof. It suffices to show that 𝑇 = 𝑇 ([1234], [1324], [1423]) satisfies the same production rules
as in the previous theorem. Now 𝑇 is a subtree of 𝑇 ([1234], [1324]) which was constructed in the
proof of Theorem 3.2.3. And we see in the usual way that the site before 𝑛 in any [𝜎] remains
active in 𝑇 because 4 is not immediately followed by 3 in [1423].
So it suffices to show, with 𝛿 and 𝜖 as in (3.3), that site 1 remains active in 𝛿, but not in 𝜖.
Indeed, inserting 𝑛 + 1 in this site of 𝛿 just produces another descending sequence. But in 𝜖 such a
placement gives the copy [1, 𝑛 + 1, 𝑛 − 1, 𝑛] of [1423].
We now have another nontrivial Wilf equivalence with Theorem 3.3.1.
Theorem 3.3.5. We have
{[1243], [1324], [1342]} ≡ {[1243], [1342], [1423]}.
And for 𝑛 ≥ 4
# Av𝑛 ([1243], [1324], [1342]) = 3.
Proof. By Theorem 3.2.7, we just need to show that exactly 3 of the 4 permutations [𝜎] avoiding
{[1243], [1342]} also avoid [1324]. These permutations are described in equation (3.7). If 𝑥 = 𝑛,
𝑦 = 1, and 𝜌 is decreasing then [𝜎] contains the copy [𝑛132] of this pattern. It is also easy to check that the other
three avoid it.
We now have our last nontrivial Wilf equivalence for triples.
Theorem 3.3.6. We have
{[1243], [1324], [1423]} ≡ {[1324], [1342], [1423]}.
And for 𝑛 ≥ 2
# Av𝑛 ([1324], [1342], [1423]) = 𝑛 − 1.
Proof. Comparing the description of Av𝑛 ([1324], [1342]) in the proof of Theorem 3.2.6 and that
of Av𝑛 ( [1324], [1423]) in the proof of Theorem 3.2.8, we see that any
[𝜎] ∈ Av𝑛 ( [1324], [1342], [1423]) can be put in the form
𝜎 = 21⟨𝛿𝑘 , 𝜄𝑛−𝑘⟩
with 𝑘 ≥ 1. Also, 𝑘 = 𝑛 − 1 and 𝑘 = 𝑛 yield the same permutation. So there are 𝑛 − 1 choices for
𝑘 and we are done.
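As with doubletons, the counts for triples can be confirmed by exhaustive search. The sketch below checks one representative triple from each of Theorems 3.3.1 through 3.3.6 for 𝑛 = 5, 6, 7, a range in which every stated formula applies. The names `formulas` and `count_cyc_avoiders` are ours.

```python
from itertools import combinations, permutations

def standardize(seq):
    """Replace the entries of seq by their relative ranks 1, 2, ..., k."""
    rank = {v: r for r, v in enumerate(sorted(seq), start=1)}
    return tuple(rank[v] for v in seq)

def cyc_avoids(sigma, patterns):
    """True if [sigma] avoids all the cyclic patterns (all of the same length)."""
    forbidden = {p[i:] + p[:i] for p in patterns for i in range(len(p))}
    k = len(patterns[0])
    return not any(standardize(sub) in forbidden
                   for sub in combinations(sigma, k))

def count_cyc_avoiders(n, patterns):
    """Brute-force size of the cyclic avoidance class at length n."""
    return sum(1 for rest in permutations(range(2, n + 1))
               if cyc_avoids((1,) + rest, patterns))

def as_patterns(names):
    """Turn strings such as '1324' into tuples such as (1, 3, 2, 4)."""
    return [tuple(int(c) for c in name) for name in names]

formulas = {
    ("1234", "1324", "1342"): lambda n: 3,      # Theorem 3.3.1
    ("1234", "1243", "1342"): lambda n: 2,      # Theorem 3.3.2
    ("1234", "1342", "1423"): lambda n: n - 1,  # Theorem 3.3.3
    ("1234", "1324", "1423"): lambda n: n - 1,  # Theorem 3.3.4
    ("1243", "1324", "1342"): lambda n: 3,      # Theorem 3.3.5
    ("1324", "1342", "1423"): lambda n: n - 1,  # Theorem 3.3.6
}

for n in range(5, 8):
    for triple, formula in formulas.items():
        count = count_cyc_avoiders(n, as_patterns(triple))
        print(n, triple, count, count == formula(n))
```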
[Π]                                          # Av𝑛 [Π]
======================================================
{[1234], [1243], [1324], [1342]}             1
{[1243], [1342], [1423], [1432]}
======================================================
{[1234], [1243], [1324], [1423]}             2
{[1234], [1243], [1342], [1423]}
{[1234], [1324], [1342], [1423]}
{[1243], [1324], [1342], [1423]}
{[1243], [1324], [1342], [1432]}
{[1243], [1324], [1423], [1432]}
{[1324], [1342], [1423], [1432]}
======================================================
{[1234], [1243], [1324], [1342], [1423]}     1
{[1243], [1324], [1342], [1423], [1432]}
Table 3.1: Wilf equivalence classes and cardinalities of Av𝑛 [Π] for certain [Π] and 𝑛 ≥ 5
When #[Π] ≥ 4 where [Π] ⊂ [𝔖4 ], the size of Av𝑛 [Π] becomes constant for 𝑛 ≥ 5. And this
size is trivial to calculate for 𝑛 ≤ 4. Furthermore, the description of the surviving permutations
for large 𝑛 is easy to obtain given our previous proofs. So we content ourselves with a listing of
the equivalence classes and associated constants in Table 3.1. Classes are separated by a double
horizontal line. As usual, we do not consider classes containing both the increasing and decreasing
permutations because of the cyclic Erdős-Szekeres Theorem.
3.4 Cyclic descent generating functions
We will now consider the generating function for the number of cyclic descents over various
avoidance classes [Π] ⊂ [𝔖4 ], starting with those defined by a single element. We will sometimes
use the characterizations given by Callan [Cal02] for these classes to facilitate our work, and use
the abbreviation
$D_n([\Pi]) = D_n([\Pi]; q) = \sum_{\sigma \in \mathrm{Av}_n[\Pi]} q^{\mathrm{cdes}\, \sigma}$
for the generating function.
To begin, we have a lemma showing that trivial Wilf equivalences also give simple relationships
between the corresponding generating functions.
Lemma 3.4.1. For any [Π], we have
$D_n([\Pi]^r; q) = D_n([\Pi]^c; q) = q^n D_n([\Pi]; 1/q)$
and
$D_n([\Pi]^{rc}; q) = D_n([\Pi]; q).$
Proof. Reversing or complementing a permutation turns all cyclic descents into cyclic ascents and
vice-versa. Translating this into generating functions gives the first displayed equalities. And the
second displayed equation follows from the previous display.
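Lemma 3.4.1 can be illustrated by computing the generating functions as coefficient lists. In the sketch below, which uses our own helper names, the polynomial 𝐷𝑛 is stored as a list of length 𝑛 + 1 whose 𝑘th entry counts the classes with 𝑘 cyclic descents, so replacing 𝑞 by 1/𝑞 and multiplying by 𝑞^𝑛 amounts to reversing the list.

```python
from itertools import combinations, permutations

def standardize(seq):
    """Replace the entries of seq by their relative ranks 1, 2, ..., k."""
    rank = {v: r for r, v in enumerate(sorted(seq), start=1)}
    return tuple(rank[v] for v in seq)

def cyc_avoids(sigma, patterns):
    """True if [sigma] avoids all the cyclic patterns (all of the same length)."""
    forbidden = {p[i:] + p[:i] for p in patterns for i in range(len(p))}
    k = len(patterns[0])
    return not any(standardize(sub) in forbidden
                   for sub in combinations(sigma, k))

def cdes(sigma):
    """Cyclic descent number of [sigma]."""
    n = len(sigma)
    return sum(1 for i in range(n) if sigma[i] > sigma[(i + 1) % n])

def D(n, patterns):
    """Coefficient list of D_n: entry k counts avoiders with cdes equal to k."""
    coeffs = [0] * (n + 1)
    for rest in permutations(range(2, n + 1)):
        sigma = (1,) + rest
        if cyc_avoids(sigma, patterns):
            coeffs[cdes(sigma)] += 1
    return coeffs

def rev(p):
    return p[::-1]

def comp(p):
    return tuple(len(p) + 1 - v for v in p)

pi = (1, 3, 2, 4)
for n in range(4, 7):
    base = D(n, [pi])
    print(n, base,
          D(n, [rev(pi)]) == base[::-1],
          D(n, [comp(pi)]) == base[::-1],
          D(n, [comp(rev(pi))]) == base)
```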
Now consider the possible 𝐷 𝑛 ([𝜋]) for [𝜋] ∈ [𝔖4 ]. We begin with the simplest case.
Theorem 3.4.2. We have 𝐷𝑛 ([1423]; 𝑞) = 𝑞^𝑛 𝐷𝑛 ([1324]; 1/𝑞) where, for 𝑛 ≥ 2,
$D_n([1324]; q) = \sum_{k=1}^{n-1} \binom{n+k-3}{n-k-1} q^k.$
Proof. We use Callan’s characterization of this avoidance class to obtain a recursion for 𝐷 𝑛 ([1324]).
If [𝜎] ∈ Av𝑛 ([1324]) and 𝑛 ≥ 3 then write 𝜎 = 𝜎1 𝜎2 . . . 𝜎𝑛−1 𝑛. Let 𝑘 be the index such that
𝜎𝑘 = 𝑛 − 1. There are two cases.
If 𝑘 = 𝑛 − 1 then 𝜎 = 𝜏, 𝑛 − 1, 𝑛 where [𝜏, 𝑛 − 1] ∈ Av𝑛−1 ([1324]) and this is a bijection. Since
cdes[𝜎] = cdes[𝜏, 𝑛 − 1], this case contributes 𝐷 𝑛−1 ([1324]) to the recursion.
If 1 ≤ 𝑘 ≤ 𝑛 − 2 then this forces
𝜎 = 2314⟨𝜄𝑘−1 , 1, 𝜏, 1⟩
for some 𝜏 such that [𝜏𝑛] avoids [1324]. Because of the extra descent caused by 𝑛 − 1 we have
cdes[𝜎] = 1 + cdes[𝜏𝑛]. So this case gives a contribution of $\sum_{k=1}^{n-2} q D_{n-k}([1324])$.
Putting everything together, we have
$D_n([1324]) = D_{n-1}([1324]) + \sum_{k=1}^{n-2} q D_{n-k}([1324])$
for 𝑛 ≥ 3 and 𝐷2 ([1324]) = 𝑞. It is now a simple matter of manipulating binomial coefficients to
show that the formula given in the theorem satisfies this initial value problem.
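The final step of the proof, that the closed form satisfies the recursion, can be confirmed numerically before doing the binomial manipulation by hand. The sketch below stores polynomials in 𝑞 as coefficient lists; the function names are ours, and we read the coefficient of 𝑞^𝑘 in the closed form as the binomial coefficient of 𝑛 + 𝑘 − 3 and 𝑛 − 𝑘 − 1.

```python
from math import comb

def closed_form(n):
    """Coefficient list c with c[k] the coefficient of q^k in D_n([1324]; q)."""
    return [0] + [comb(n + k - 3, n - k - 1) for k in range(1, n)]

def by_recursion(n_max):
    """D_n([1324]) for 2 <= n <= n_max from the recursion in the proof."""
    D = {2: [0, 1]}                       # D_2 = q
    for n in range(3, n_max + 1):
        coeffs = [0] * n                  # degrees 0, ..., n - 1
        for k, c in enumerate(D[n - 1]):  # the term D_{n-1}
            coeffs[k] += c
        for j in range(2, n):             # j = n - k runs over 2, ..., n - 1
            for k, c in enumerate(D[j]):
                coeffs[k + 1] += c        # the term q * D_j
        D[n] = coeffs
    return D

D = by_recursion(12)
print(all(D[n] == closed_form(n) for n in range(2, 13)))
print(closed_form(5))
```

Setting 𝑞 = 1 in either form should recover the Fibonacci numbers of Theorem 3.2.1.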
For the next case, we will use a characterization of the class different from the one found by
Callan. This will permit us to avoid the use of a recurrence.
Lemma 3.4.3. Suppose [𝜎] ∈ [𝔖𝑛 ] and write 𝜎 = 1𝜌𝑛𝜏. We have [𝜎] ∈ Av𝑛 ([1342]) if and only
if the following three conditions are satisfied:
(a) 𝜌 and 𝜏 both avoid {213, 231},
(b) max 𝜌 < min 𝜏,
(c) there is not both a descent in 𝜌 and an ascent in 𝜏.
Proof. For the forward direction, suppose [𝜎] ∈ Av𝑛 ([1342]). Condition (a) is true since if either
𝜌 or 𝜏 contains 213 then, together with 𝑛, we have that [𝜎] contains [2134]. Similarly, if either
contains 231 then [𝜎] contains the forbidden pattern by prepending the 1. As far as (b), if there is
𝑦 > 𝑥 with 𝑦 ∈ 𝜌 and 𝑥 ∈ 𝜏 then [1𝑦𝑛𝑥] is a copy of [1342]. Finally for (c), if there were a descent
in 𝜌 and an ascent in 𝜏 then, because of (b), putting them together would again give a copy of the
pattern to avoid.
The converse is similar where one assumes that a copy of [1342] exists and then considers all
the different intersections it could have with 1, 𝜌, 𝑛, and 𝜏. We leave the details to the reader.
In order to use this lemma, we will need a result about the ordinary descent statistic on linear
permutations avoiding {213, 231}. The next result is a specialization of Proposition 5.2 of the
paper of Dokos, Dwyer, Johnson, Sagan, and Selsor [DDJ+ 12] and so the proof is omitted.
Lemma 3.4.4 ([DDJ+ 12]). We have
\[ \sum_{\sigma \in \mathrm{Av}_n(213,231)} q^{\operatorname{des} \sigma} = (1+q)^{n-1}. \]
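For small $n$ the lemma can be confirmed by exhaustive enumeration. The sketch below is illustrative only; the helper names `contains` and `descent_distribution` are ours and the containment test is the naive one.

```python
from itertools import combinations, permutations
from math import comb

def contains(sigma, pattern):
    """True if the linear word sigma has a subsequence order-isomorphic to pattern."""
    k = len(pattern)
    for idx in combinations(range(len(sigma)), k):
        sub = [sigma[i] for i in idx]
        if all((sub[a] < sub[b]) == (pattern[a] < pattern[b])
               for a in range(k) for b in range(a + 1, k)):
            return True
    return False

def descent_distribution(n, patterns):
    """coeffs[d] = number of permutations in Av_n(patterns) with d descents."""
    coeffs = [0] * n
    for sigma in permutations(range(1, n + 1)):
        if any(contains(sigma, p) for p in patterns):
            continue
        des = sum(sigma[i] > sigma[i + 1] for i in range(n - 1))
        coeffs[des] += 1
    return coeffs

for n in range(1, 7):
    lhs = descent_distribution(n, [(2, 1, 3), (2, 3, 1)])
    rhs = [comb(n - 1, d) for d in range(n)]     # coefficients of (1+q)^(n-1)
    print(n, lhs == rhs)
```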
We need one last well-known definition. Call a polynomial $f(q) = \sum_{k=0}^{n} a_k q^k$ of degree $n$
symmetric if $a_k = a_{n-k}$ for all $0 \le k \le n$. Note that $f(q)$ of degree $n$ is symmetric if and only if
\[ q^n f(1/q) = f(q). \qquad (3.8) \]
Theorem 3.4.5. We have $D_n([1243]; q) = D_n([1342]; q)$ where, for $n \ge 2$,
\[ D_n([1342]; q) = 2q(1+q)^{n-2} - q \cdot \frac{1-q^{n-1}}{1-q} \]
is symmetric.
Proof. It is easy to prove from the explicit form of 𝐷 𝑛 ([1342]) that it satisfies equation (3.8) and
so is symmetric. So once this is proved, the equality of the two generating functions follows from
Lemma 3.4.1.
We adopt the notation of Lemma 3.4.3 and let $\sigma_k = n$ where $2 \le k \le n$. We will consider cases
depending on whether $\rho$ or $\tau$ is empty. If $\rho = \emptyset$ then by Lemma 3.4.3 (a) and Lemma 3.4.4 we have
that the generating function for the possible linear $\tau$ is $(1+q)^{n-3}$. Also, $\operatorname{cdes}[\sigma] = 2 + \operatorname{des} \tau$ by the
form of $\sigma$, so the contribution of such $[\sigma]$ to $D_n([1342])$ is $q^2(1+q)^{n-3}$. In an analogous way,
we see that those $[\sigma]$ with $\tau = \emptyset$ yield $q(1+q)^{n-3}$. Adding these, we have a total of $q(1+q)^{n-2}$
so far.
We now assume that $\rho, \tau$ are both nonempty so that $3 \le k \le n-1$. By parts (b) and (c)
of Lemma 3.4.3, either $\rho$ must be an increasing subsequence of consecutive integers or $\tau$ must
be a decreasing one. Using Lemma 3.4.4 again, we see that in the first subcase a contribution
of $q^2(1+q)^{n-k-1}$ is obtained. And in the second, taking into account the descents in $\rho$, the
contribution is $q^{n-k+1}(1+q)^{k-3}$. However, these two subcases overlap when $\rho$ is increasing and
$\tau$ is decreasing. So we must subtract $q^{n-k+1}$.
Thus we get a grand total of
\[ D_n([1342]) = q(1+q)^{n-2} + \sum_{k=3}^{n-1} \left[ q^2(1+q)^{n-k-1} + q^{n-k+1}(1+q)^{k-3} - q^{n-k+1} \right]. \]
Summing the geometric series and simplifying completes the proof.
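The closed form can be compared with a direct enumeration of cyclic permutations for small $n$. The sketch below is ours and is only a sanity check: each cyclic class is represented by its rotation starting with 1, and $[\sigma]$ is taken to contain $[\pi]$ when some rotation of $\sigma$ contains $\pi$ linearly. The names `cyc_contains`, `cdes_poly` and `closed_form_1342` are hypothetical.

```python
from itertools import combinations, permutations
from math import comb

def cyc_contains(sigma, pattern):
    """True if some rotation of sigma contains pattern as a linear subsequence."""
    n, k = len(sigma), len(pattern)
    for r in range(n):
        rot = sigma[r:] + sigma[:r]
        for idx in combinations(range(n), k):
            sub = [rot[i] for i in idx]
            if all((sub[a] < sub[b]) == (pattern[a] < pattern[b])
                   for a in range(k) for b in range(a + 1, k)):
                return True
    return False

def cdes_poly(n, patterns):
    """coeffs[d] = number of [sigma] in Av_n(patterns) with cdes[sigma] = d."""
    coeffs = [0] * (n + 1)
    for rest in permutations(range(2, n + 1)):
        sigma = (1,) + rest                      # one representative per class
        if any(cyc_contains(sigma, p) for p in patterns):
            continue
        cdes = sum(sigma[i] > sigma[(i + 1) % n] for i in range(n))
        coeffs[cdes] += 1
    return coeffs

def closed_form_1342(n):
    """Coefficients of 2q(1+q)^(n-2) - q(1 + q + ... + q^(n-2))."""
    coeffs = [0] * (n + 1)
    for j in range(n - 1):
        coeffs[j + 1] = 2 * comb(n - 2, j) - 1
    return coeffs

for n in range(2, 8):
    print(n, cdes_poly(n, [(1, 3, 4, 2)]) == closed_form_1342(n))
```

The same `cdes_poly` function can be pointed at any other set of cyclic patterns, for instance `[(1, 3, 2, 4)]` for the class of Theorem 3.4.2.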
For the avoidance class of the increasing (or decreasing) pattern in $[\mathfrak{S}_4]$, we will need another
concept. Given sequences $\rho$ and $\tau$ of distinct integers, their shuffle set is
\[ \rho \shuffle \tau = \{ \sigma : |\sigma| = |\rho| + |\tau| \text{ and both } \rho, \tau \text{ are subsequences of } \sigma \}. \]
For example,
\[ 12 \shuffle 34 = \{1234, 1324, 1342, 3124, 3142, 3412\}. \]
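The shuffle set is easy to generate recursively, since the first letter of a shuffle is the first letter of one of the two sequences. A small sketch follows, with sequences represented as tuples; the function name `shuffles` is ours.

```python
def shuffles(rho, tau):
    """All interleavings of rho and tau keeping the internal order of each."""
    if not rho:
        return {tuple(tau)}
    if not tau:
        return {tuple(rho)}
    first_from_rho = {(rho[0],) + s for s in shuffles(rho[1:], tau)}
    first_from_tau = {(tau[0],) + s for s in shuffles(rho, tau[1:])}
    return first_from_rho | first_from_tau

for s in sorted(shuffles((1, 2), (3, 4))):
    print("".join(map(str, s)))
```

When the entries are distinct, as assumed in the definition, the set has $\binom{|\rho|+|\tau|}{|\rho|}$ elements.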
In the statement of the next result we make the usual convention that $\binom{n}{k} = 0$ if $k > n$.
Theorem 3.4.6. We have $D_n([1234]; q) = q^n D_n([1432]; 1/q)$ where, for $n \ge 2$,
\[ D_n([1432]; q) = q + (2^{n-1} - n) q^2 + \sum_{j \ge 3} \binom{n}{2j-1} q^j. \]
Proof. We use Callan’s description of the avoidance class for [1234] translated by complementation to
apply to [1432]. We are going to derive a recursion for $D_n([1432]; q)$. If $[\sigma] \in \mathrm{Av}_n[1432]$ then
suppose $\sigma_n = 1$ and $\sigma_k = 2$ for some $1 \le k \le n-1$. There are three cases.
If 𝑘 = 1 then there is a bijection between such [𝜎] and Av𝑛−1 [1432] obtained by removing
1 and taking the order isomorphic cyclic permutation on [𝑛 − 1]. Since 2 immediately follows 1
cyclically in [𝜎], the descent into 1 remains a descent after applying the map. So the contribution
of this case is 𝐷 𝑛−1 ([1432]; 𝑞).
Now suppose that $2 \le k \le n-1$ and write
\[ \sigma = \rho 2 \tau 1 \]
where $|\rho| = k-1$, $|\tau| = n-k-1$. As Callan proves, $\rho$ must be increasing. So there are two
more cases depending upon whether the elements of $\rho$ are consecutive or not. Suppose first that
they are not consecutive. In this case, $\tau$ must also be increasing so $\operatorname{cdes}[\sigma] = 2$. To compute
the number of such $\sigma$, note that once the elements of $\rho$ have been picked from $[3, n]$, all of $\sigma$ is
determined. The total number of nonempty subsets of this interval is $2^{n-2} - 1$. And those which
consist of consecutive integers are determined by their minimum and maximum element, which
could be equal. So there are $\binom{n-1}{2}$ subsets to exclude. The contribution of this case is then
\[ \left( 2^{n-2} - \binom{n-1}{2} - 1 \right) q^2. \]
Finally we consider the case when 𝜌 ≠ ∅ is consecutive (and still increasing), say with minimum
𝑚 + 1 and maximum 𝑀 − 1. Note that if 𝑙 = |𝜏| then 0 ≤ 𝑙 ≤ 𝑛 − 3. Callan shows that the possible 𝜏
are the elements of $(34 \ldots m) \shuffle (M, M+1, \ldots, n)$. Since a permutation can be written as a shuffle
in many ways, the same shuffle could occur for different $\rho$. So it will be convenient to color the
elements of the second sequence by marking them with a hat. Thus the $\sigma$ in this case are in bijection
with colored shuffles $(34 \ldots m) \shuffle (\hat{M}, \widehat{M+1}, \ldots, \hat{n})$. It will also be convenient to consider these
as corresponding to the sequences $2\tau$ by prepending a 2 to each shuffle and considering 2 as an
uncolored element. Let $S$ be the set of such sequences $s = 2 s_2 s_3 \ldots s_{l+1}$ where $l, m, M$ are allowed
to vary over all possible values. Note that if $s$ corresponds to $\sigma$ then $\operatorname{des} \sigma = 2 + \operatorname{des} s$. To compute
$\operatorname{des} s$, we consider the transition indices
\[ \operatorname{Tr} s = \{ i \mid s_i \text{ is colored and } s_{i+1} \text{ is not, or vice-versa} \}. \]
For example, if $s = 2\,3\,\hat{6}\,4\,5\,\hat{7}\,\hat{8}$ then $\operatorname{Tr} s = \{2, 3, 5\}$. It is easy to see that the map $\operatorname{Tr} : S \to 2^{[l]}$, the
range being all subsets of $[l]$, is a bijection. Also, every other transition index of $s$ starting with the
second corresponds to a descent. So, using the round down function, $\operatorname{des} s = \lfloor \# \operatorname{Tr} s / 2 \rfloor$. We can
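now complete this case using $i = \# \operatorname{Tr} s$ to see that the contribution is
\begin{align*}
\sum_{l=0}^{n-3} \sum_{i=0}^{l} \binom{l}{i} q^{\lfloor i/2 \rfloor + 2}
&= \sum_{i=0}^{n-3} q^{\lfloor i/2 \rfloor + 2} \sum_{l=i}^{n-3} \binom{l}{i} \\
&= \sum_{i=0}^{n-3} \binom{n-2}{i+1} q^{\lfloor i/2 \rfloor + 2} \\
&= q^2 \sum_{j \ge 0} \left[ \binom{n-2}{2j+1} + \binom{n-2}{2j+2} \right] q^j \\
&= q^2 \sum_{j \ge 0} \binom{n-1}{2j+2} q^j.
\end{align*}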
Putting all the cases together we have
\[ D_n([1432]; q) = D_{n-1}([1432]; q) + q^2 \left[ 2^{n-2} - \binom{n-1}{2} - 1 + \sum_{j \ge 0} \binom{n-1}{2j+2} q^j \right]. \]
As usual, the routine verification that our desired formula satisfies this recursion and the initial
condition is left to the reader.
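The routine verification can be mirrored numerically. The following sketch, which is ours and not part of the proof, applies the recursion above starting from $D_2([1432]; q) = q$ and compares the result with the closed form of the theorem. Polynomials are coefficient lists and the function names are hypothetical.

```python
from math import comb

def closed_form(n):
    """Coefficients of q + (2^(n-1) - n) q^2 + sum_{j>=3} C(n, 2j-1) q^j."""
    coeffs = [0] * (n + 1)
    coeffs[1] = 1
    coeffs[2] = 2 ** (n - 1) - n
    for j in range(3, n + 1):
        coeffs[j] = comb(n, 2 * j - 1)     # comb returns 0 when 2j-1 > n
    return coeffs

def recursion_step(prev, n):
    """Coefficients of D_n computed from those of D_{n-1}."""
    coeffs = prev + [0]                    # D_{n-1}, padded by one slot
    coeffs[2] += 2 ** (n - 2) - comb(n - 1, 2) - 1
    for j in range(n - 1):                 # q^2 * C(n-1, 2j+2) q^j
        coeffs[j + 2] += comb(n - 1, 2 * j + 2)
    return coeffs

D = closed_form(2)                         # [0, 1, 0], i.e. D_2 = q
ok = True
for n in range(3, 13):
    D = recursion_step(D, n)
    ok = ok and (D == closed_form(n))
print(ok)
```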
We now turn to the cyclic descent polynomials for pairs in [𝔖4 ]. To simplify notation, for any
polynomial $f(q)$ and $n \in \mathbb{N}$ we let
\[ f^{(n)}(q) = q^n f(1/q). \]
Theorem 3.4.7. We have the following descent polynomials.
(a) We have
\[ D_n^{(n)}([1234], [1243]) = D_n([1342], [1432]) = D_n([1243], [1432]) = D_n^{(n)}([1234], [1342]). \]
And for $n \ge 3$
\[ D_n([1234], [1342]; q) = (2n-5) q^{n-2} + q^{n-1}. \]
(b) We have
\[ D_n([1423], [1432]) = D_n^{(n)}([1234], [1324]). \]
And for $n \ge 3$
\[ D_n([1234], [1324]; q) = (2n-5) q^{n-2} + q^{n-1}. \]
(c) We have
\[ D_n([1324], [1432]) = D_n^{(n)}([1234], [1423]). \]
And for $n \ge 1$
\[ D_n([1234], [1423]; q) = q^{n-1} + \binom{n-1}{2} q^{n-2}. \]
(d) We have
\[ D_n([1243], [1423]) = D_n([1342], [1423]) = D_n^{(n)}([1243], [1324]) = D_n^{(n)}([1324], [1342]). \]
And for $n \ge 1$
\[ D_n([1324], [1342]; q) = q + \sum_{k=2}^{n-1} (n-k) q^k. \]
(e) For $n \ge 4$ we have
\[ D_n([1243], [1342]; q) = q + q^2 + q^{n-1} + q^{n-2}. \]
(f) For $n \ge 3$ we have
\[ D_n([1324], [1423]; q) = q(1+q)^{n-2}. \]
Proof. We will only prove (a) as the others follow easily in a similar fashion from the descriptions
of the avoidance classes in Section 3.2. We adopt the notation of the proof of Theorem 3.2.2.
We will use the description of the generating tree to obtain a recursion for $D_{n+1}([1234], [1342])$,
the class for which the explicit formula in (a) is stated.
Note that if $n+1$ is inserted in site $i$ of $\sigma$ to form $\sigma'$ then
\[ \operatorname{cdes}[\sigma'] = \begin{cases} \operatorname{cdes}[\sigma] & \text{if } i \text{ is a cyclic descent,} \\ \operatorname{cdes}[\sigma] + 1 & \text{if } i \text{ is a cyclic ascent.} \end{cases} \]
Since the site before $n$ is always active, and such a site is a cyclic ascent, these children will
give a contribution of $q D_n([1234], [1342])$. In $\delta$ and $\epsilon$, insertion in the other active site gives
permutations with $n-1$ descents. So
\[ D_{n+1}([1234], [1342]) = 2q^{n-1} + q D_n([1234], [1342]). \]
It is now easy to check that the formula in (a) satisfies this recursion and is also valid at $n = 3$,
completing the proof.
For classes avoiding 3 or more patterns, we will only write down the results for those which are
not eventually constant. The interested reader can easily compute the polynomials for the remaining
classes. We also content ourselves with stating the polynomial for one member of every trivial Wilf
equivalence class since the rest can be computed from Lemma 3.4.1.
Theorem 3.4.8. We have the descent polynomials
\[ D_n([1234], [1342], [1423]; q) = D_n([1234], [1324], [1423]; q) = (n-2) q^{n-2} + q^{n-1} \]
and
\[ D_n([1324], [1342], [1423]; q) = q \cdot \frac{1-q^{n-1}}{1-q} \]
for $n \ge 2$.
3.5 Open problems and concluding remarks
We collect here various areas for future research in the hopes that the reader will be interested
in pursuing this work.
3.5.1 Longer patterns
There has been very little work about containment and avoidance for cyclic patterns of length
longer than 4. Of course, the cyclic Erdős-Szekeres Theorem, Theorem 3.1.1 above, is one such
result. There is also a paper of Gray, Lanning and Wang [GLW18] where the authors consider
cyclic packing (maximizing the number of copies of a given pattern among all the permutations
[𝜎] ∈ [𝔖𝑛 ] for some 𝑛) and superpatterns (permutations containing all the patterns [𝜋] ∈ [𝔖𝑘 ] for
some 𝑘). It would be interesting to see if there are nice enumerative formulas for classes consisting
of cyclic patterns of length 5 and up.
3.5.2 Other statistics
We have previously mentioned the peak set of a linear permutation,
Pk 𝜋 = {𝑖 | 𝜋𝑖−1 < 𝜋𝑖 > 𝜋𝑖+1 }
which has corresponding peak number
pk 𝜋 = # Pk 𝜋.
Peaks are an important part of Stembridge’s theory of enriched 𝑃-partitions [Ste97] where 𝑃 is
a partially ordered set. On the enumerative side, the study of permutations which have a given
peak set has been a subject of current interest [BBPS15, BBS13, BFT16, CVDLO+ 17, DLHIO17a,
DLHIO17b, DLHIPL17]. Now define the cyclic peak number to be
cpk[𝜋] = #{𝑖 | 𝜋𝑖−1 < 𝜋𝑖 > 𝜋𝑖+1 where subscripts are taken modulo 𝑛}.
As with cdes, this is well defined since it is independent of the choice of representative of [𝜋].
There should be interesting generating functions for the distribution of cpk over avoidance classes,
or even for the joint distribution of cdes and cpk. As evidence, we prove one such result.
Theorem 3.5.1. For $n \ge 3$
\[ \sum_{[\sigma] \in \mathrm{Av}_n([1234],[1342])} q^{\operatorname{cdes}[\sigma]} t^{\operatorname{cpk}[\sigma]} = q^{n-2} t + (2n-6) q^{n-2} t^2 + q^{n-1} t. \]
Proof. Let 𝐹𝑛 (𝑞, 𝑡) denote the desired generating function. We proceed as in the proof of Theo-
rem 3.4.7 (a) to find a recursion for 𝐹𝑛+1 (𝑞, 𝑡). Since the largest element of [𝜎] is always a cyclic
peak, inserting 𝑛 + 1 before 𝑛 does not change cpk. So this contributes 𝑞𝐹𝑛 (𝑞, 𝑡) to the recursion.
For $\delta$ and $\epsilon$, inserting $n+1$ in the other active site increases the number of peaks to 2. So the
contribution from these cases is $2q^{n-1}t^2$. In summary
\[ F_{n+1}(q, t) = 2q^{n-1}t^2 + q F_n(q, t) \]
and the desired polynomial is easily seen to be the solution.
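The joint distribution can also be tabulated by brute force for small $n$. The sketch below is illustrative and ours: cyclic classes are represented by the rotation starting with 1, containment is tested on all rotations, and the names `cyc_contains`, `joint_distribution` and `claimed` are hypothetical.

```python
from collections import Counter
from itertools import combinations, permutations

def cyc_contains(sigma, pattern):
    """True if some rotation of sigma contains pattern as a linear subsequence."""
    n, k = len(sigma), len(pattern)
    for r in range(n):
        rot = sigma[r:] + sigma[:r]
        for idx in combinations(range(n), k):
            sub = [rot[i] for i in idx]
            if all((sub[a] < sub[b]) == (pattern[a] < pattern[b])
                   for a in range(k) for b in range(a + 1, k)):
                return True
    return False

def joint_distribution(n, patterns):
    """Counter mapping (cdes, cpk) to the number of [sigma] in Av_n(patterns)."""
    dist = Counter()
    for rest in permutations(range(2, n + 1)):
        sigma = (1,) + rest
        if any(cyc_contains(sigma, p) for p in patterns):
            continue
        cdes = sum(sigma[i] > sigma[(i + 1) % n] for i in range(n))
        cpk = sum(sigma[i - 1] < sigma[i] > sigma[(i + 1) % n] for i in range(n))
        dist[(cdes, cpk)] += 1
    return dist

def claimed(n):
    """Exponent pairs and coefficients of q^(n-2) t + (2n-6) q^(n-2) t^2 + q^(n-1) t."""
    dist = Counter({(n - 2, 1): 1, (n - 1, 1): 1})
    if n > 3:
        dist[(n - 2, 2)] = 2 * n - 6
    return dist

for n in range(3, 8):
    print(n, joint_distribution(n, [(1, 2, 3, 4), (1, 3, 4, 2)]) == claimed(n))
```

Setting $t = 1$ in the tabulated distribution recovers the cyclic descent polynomial of Theorem 3.4.7 (a).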
In a recent paper Adin, Gessel, Reiner, and Roichman [AGRR20] defined a cyclic analogue of
the Hopf algebra of quasisymmetric functions. In this context the cyclic descent set of a linear
permutation arises naturally in the description of the product in this algebra. They also raise the
following intriguing question.
Question 3.5.2. Find an analogue of the major index for cyclic permutations that has nice proper-
ties, such as a generating function over [𝔖𝑛 ] which factors nicely as does the generating function
for the ordinary major index over 𝔖𝑛 .
3.5.3 Vincular patterns
The study of vincular patterns was originated by Babson and Steingrímsson [BS00] and has since
become a mainstay of the pattern field. We consider $\pi$ as a vincular pattern if one only counts
occurrences in $\sigma$ where certain adjacent elements of $\pi$ must also be adjacent in the copy in $\sigma$. Such
adjacent elements are overlined in $\pi$. For example, $\sigma = 24513$ contains two copies of $\pi = 132$,
namely 243 and 253. But only 243 is a copy of $\overline{13}2$, since 2 and 4 are adjacent in $\sigma$ while 2 and 5 are not. Avoidance and Wilf equivalence are defined
in the obvious way. These notions and the corresponding notation carry over to cyclic patterns
without change. There are undoubtedly nice results which can be proven about vincular cyclic
patterns. As an example, we show how one vincular class is enumerated by the Catalan numbers.
Theorem 3.5.3. We have
\[ [13\overline{24}] \equiv [1\overline{42}3] \equiv [\overline{13}24] \equiv [2\overline{31}4]. \]
And for $n \ge 1$
\[ \# \mathrm{Av}_n[13\overline{24}] = C_{n-1}. \]
Proof. The Wilf equivalences are trivial. To prove the Catalan formula, suppose that $[\sigma] \in
\mathrm{Av}_n[13\overline{24}]$ for $n \ge 2$ and write $\sigma$ so that $\sigma_n = n$ and $\sigma_{n-1} = m$ for some $m \in [n-1]$. First notice
that $\sigma = \rho\tau m n$ where $\rho$ and $\tau$ are permutations of $[m+1, n-1]$ and $[m-1]$, respectively. For if
there are $x < m < y < n$ with $x$ before $y$ in $\sigma$ then $[xymn]$ is a copy of $[13\overline{24}]$. Furthermore, it is
clear that $[m\rho]$ and $[\tau m]$ must avoid the forbidden pattern.
We claim that if $\sigma = \rho\tau m n$ where $\rho$ and $\tau$ obey the restrictions of the previous paragraph then
$[\sigma]$ avoids $[13\overline{24}]$. Suppose, towards a contradiction, that a copy $[\kappa] = [wyxz]$ exists with $wyxz$
order isomorphic to 1324. Consider the elements $x$ and $z$ which play the roles of 2 and 4. The
possibility that they are $m$ and $n$, respectively, is ruled out by the fact that every element of $\rho$ is
larger than every element of $\tau$. If $z \in \tau m$ then all of $\kappa$ must be in this subsequence since $z$ is
the largest element of the copy. But this is impossible since $[\tau m]$ avoids the bad pattern. Finally,
suppose $z \in \rho$. This forces $x \in \rho$ since it comes cyclically just before $z$, and $n$ is too large to be
$x$. We must also have $y \in \rho$ since $x < y < z$. But now there is no possible choice for $w$. Indeed,
if $w \in [m\rho]$ then $[\kappa]$ is in this subsequence, contradicting our assumption. And if $w \in \tau$ then it
could be replaced by $m$ since $x, y, z > m$, yielding the same contradiction as before.
From the first two paragraphs we immediately get the recursion
\[ \# \mathrm{Av}_n[13\overline{24}] = \sum_{m=1}^{n-1} \# \mathrm{Av}_m[13\overline{24}] \cdot \# \mathrm{Av}_{n-m}[13\overline{24}]. \]
From this the Catalan enumeration follows by induction.
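The Catalan count can be observed experimentally for small $n$. The sketch below is ours and rests on one assumption about the notation: the vincular pattern is read as 1324 with the entries playing the roles of 2 and 4 required to be cyclically adjacent, as used in the proof. The function names are hypothetical.

```python
from itertools import combinations, permutations
from math import comb

def has_vincular_copy(sigma):
    """True if [sigma] has entries w, y, x, z, met in this cyclic order and
    order-isomorphic to 1324, with x immediately followed by z."""
    n = len(sigma)
    for r in range(n):
        rot = sigma[r:] + sigma[:r]
        for c in range(2, n - 1):              # x at index c, z at index c + 1
            x, z = rot[c], rot[c + 1]
            if x > z:
                continue
            for a, b in combinations(range(c), 2):
                if rot[a] < x < rot[b] < z:    # w < x < y < z
                    return True
    return False

def count_avoiders(n):
    """Number of cyclic permutations of [n] with no such copy."""
    return sum(not has_vincular_copy((1,) + rest)
               for rest in permutations(range(2, n + 1)))

def catalan(k):
    return comb(2 * k, k) // (k + 1)

print([count_avoiders(n) for n in range(1, 8)])
print([catalan(n - 1) for n in range(1, 8)])
```

Rotating so that $w$ comes first shows that it suffices to look for linear copies in every rotation, which is what the loop over `r` does.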
It appears that sometimes rather than trying to find the size of the avoidance class directly, it
may be easier to use exponential generating functions. Given a set of (possibly vincular) patterns
$[\Pi]$, let
\[ E[\Pi] = \sum_{n \ge 0} \# \mathrm{Av}_n[\Pi] \, \frac{x^n}{n!}. \]
We have the following conjectures for two vincular avoidance classes. Once the corresponding
differential equation is proved, an explicit solution can easily be found using separation of variables.
Conjecture 3.5.4. We have the following.
1. If $E = E[123]$ then
\[ E' = E^2 - E + 1. \]
2. If $E = E[213]$ then
\[ E' = e^{E - x^2/2}. \]
Recently, Sergi Elizalde and Bruce Sagan have constructed proofs of both conjectured results
through a more general result using generating functions which keep track of the number of cyclic
occurrences, instead of just avoidance [ES21].
CHAPTER 4
PINNACLE SET PROPERTIES
This chapter contains material from Domagalski, Liang, Minnich, Sagan, Schmidt, and Siet-
sema [DLM+ 21c]. All results in this chapter are from this manuscript except as otherwise noted.
4.1 Counting admissible pinnacle sets
In this section we give our proof of Theorem 2.0.5, which gives the number of admissible
pinnacle sets 𝑆 = Pin 𝜋 for some 𝜋 ∈ 𝔖𝑛 . Our strategy will be as follows. First, we will introduce
the set of interleaved permutations which are obviously counted by the desired binomial coefficient.
Next, we will associate with each admissible pinnacle set 𝑆 a particular permutation 𝜋 such that
Pin 𝜋 = 𝑆. This permutation will be called right canonical because its pinnacles will be as far
right as possible. Finally, we will show that the set of interleaved permutations and the set of right
canonical permutations are, in fact, the same. This will complete the proof of the theorem.
An interleaved permutation $\pi \in \mathfrak{S}_n$ is one constructed in the following manner. Pick any
$A \subseteq [2, n]$ with $\#A = \lfloor (n-1)/2 \rfloor$.
I1 Fill the first $\lfloor (n-1)/2 \rfloor$ even positions of $\pi$ with the elements of $A$ in increasing order.
I2 Fill the remaining positions of $\pi$ with the elements of $\bar{A} = [n] - A$ in increasing order.
As an example, suppose $n = 9$ and $A = \{2, 3, 7, 9\}$. After step I1 we have
\[ \pi = \_\ 2\ \_\ 3\ \_\ 7\ \_\ 9\ \_ . \]
Since $\bar{A} = \{1, 4, 5, 6, 8\}$, after I2 we have the full interleaved permutation
\[ \pi = 1\ 2\ 4\ 3\ 5\ 7\ 6\ 9\ 8. \qquad (4.1) \]
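Steps I1 and I2 translate directly into code. The following is a minimal sketch with names of our choosing (`interleaved`, `all_interleaved`); permutations are returned as tuples in one-line notation.

```python
from itertools import combinations
from math import comb

def interleaved(n, A):
    """Interleaved permutation of [n] built from A by steps I1 and I2."""
    A = sorted(A)
    assert len(A) == (n - 1) // 2 and all(2 <= a <= n for a in A)
    rest = iter(sorted(set(range(1, n + 1)) - set(A)))
    pi = [None] * n
    for j, a in enumerate(A):
        pi[2 * j + 1] = a                 # I1: positions 2, 4, ... (1-based)
    # I2: fill the empty positions, left to right, in increasing order
    return tuple(x if x is not None else next(rest) for x in pi)

def all_interleaved(n):
    """The set of all interleaved permutations of [n]."""
    k = (n - 1) // 2
    return {interleaved(n, A) for A in combinations(range(2, n + 1), k)}

print(interleaved(9, {2, 3, 7, 9}))       # (1, 2, 4, 3, 5, 7, 6, 9, 8)
print(len(all_interleaved(9)), comb(8, 4))
```

Since $A$ can be read off from the even positions, distinct choices of $A$ give distinct permutations, which is why the size of `all_interleaved(n)` is a binomial coefficient.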
Let
I𝑛 = {𝜋 ∈ 𝔖𝑛 | 𝜋 is interleaved}.
Clearly $\pi \in \mathcal{I}_n$ is completely determined by the choice of $A$. It follows immediately that
\[ \# \mathcal{I}_n = \binom{n-1}{\lfloor (n-1)/2 \rfloor}. \qquad (4.2) \]
Now given an admissible pinnacle set $S = \{s_1 < s_2 < \ldots < s_d\} \subset [n]$ we wish to construct
a permutation $\pi \in \mathfrak{S}_n$ with $\operatorname{Pin} \pi = S$. We use the following algorithm to construct the right
canonical permutation $\pi$ from $S$. We first deal with the case where $n$ is odd. Let $\bar{S} = [n] - S$.
C1 Place elements of $\bar{S}$ in $\pi$ moving right to left, starting with the largest unused element of $\bar{S}$
and then decreasing until an element less than the largest unused element of $S$ is placed.
C2 Place the largest unused element of $S$ in the rightmost unused position.
C3 Iterate C1 and C2 until all elements of $S$ and $\bar{S}$ are placed.
If $n$ is even, the only change to this procedure is that we fill both $\pi_n$ and $\pi_{n-1}$ with elements of $\bar{S}$
before considering whether to place an element of $S$. To illustrate, consider $n = 9$ and $S = \{4, 7, 9\}$.
So $\bar{S} = \{1, 2, 3, 5, 6, 8\}$. Here is the construction of $\pi$ where, at each stage, we note whether C1 or
C2 is being used.
step   C1   C2   C1    C2     C1      C1       C2        C1         C1
π      8    98   698   7698   57698   357698   4357698   24357698   124357698
So the right canonical permutation for $S = \{4, 7, 9\}$ is $\pi = 124357698$. Note that $\operatorname{Pin} \pi = S$.
Furthermore, this is the same permutation as obtained in (4.1). However, neither of the sets $A$ and $\bar{A}$
equals $S$. Let
C𝑛 = {𝜋 ∈ 𝔖𝑛 | 𝜋 is right canonical}.
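The procedure C1–C3 can be sketched as follows. This is our own illustrative implementation: the even case is handled by first placing the largest non-pinnacle in position $n$ and then proceeding as in the odd case, and the names `right_canonical`, `pinnacles` and `admissible_sets` are hypothetical. Admissible sets are found by brute force rather than by any criterion.

```python
from itertools import permutations

def pinnacles(pi):
    """Pinnacle set of a linear permutation given as a tuple."""
    return {pi[i] for i in range(1, len(pi) - 1) if pi[i - 1] < pi[i] > pi[i + 1]}

def right_canonical(n, S):
    """Right canonical permutation for S, built right to left by C1-C3."""
    pinn = sorted(S)                                   # unused elements of S
    non = sorted(set(range(1, n + 1)) - set(S))        # unused elements of the complement
    word = []                                          # entries in order of placement
    if n % 2 == 0:
        word.append(non.pop())                         # even case: fill position n first
    while non or pinn:
        while non:                                     # C1
            x = non.pop()
            word.append(x)
            if pinn and x < pinn[-1]:
                break
        if pinn:                                       # C2
            word.append(pinn.pop())
    return tuple(reversed(word))

def admissible_sets(n):
    """All pinnacle sets of permutations of [n], by exhaustive search."""
    return {frozenset(pinnacles(p)) for p in permutations(range(1, n + 1))}

print(right_canonical(9, {4, 7, 9}))                   # (1, 2, 4, 3, 5, 7, 6, 9, 8)
print(all(pinnacles(right_canonical(7, S)) == set(S) for S in admissible_sets(7)))
```

The placement order recorded in `word` for $S = \{4, 7, 9\}$ is 8, 9, 6, 7, 5, 3, 4, 2, 1, matching the table of steps.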
We first need to show that C1–C3 is well defined in that every position of 𝜋 gets filled and that
we always have Pin 𝜋 = 𝑆. Recall that A𝑛 = {𝑆 | 𝑆 = Pin 𝜋 for some 𝜋 ∈ 𝔖𝑛 }.
Lemma 4.1.1. If 𝑆 ⊂ [𝑛] is an admissible set then C1–C3 produces a permutation 𝜋 with Pin 𝜋 = 𝑆.
Thus
#C𝑛 = #A𝑛 .
Proof. Clearly the second sentence follows from the first. For the first sentence, we will present
details for the case when $n$ is odd. If $n$ is even, then one can just place the largest element of $\bar{S}$ in
position $n$ and proceed as in the odd case.
The following notation will be useful. Let
\[ S = \{s_1 < s_2 < \ldots < s_d\}, \]
\[ \bar{S} = \{\bar{s}_1 < \bar{s}_2 < \ldots < \bar{s}_{n-d}\}. \]
We will also let $S_p$ and $\bar{S}_p$ denote the elements of $S$ and of $\bar{S}$, respectively, which have not been
used during the placement of $\pi_n, \pi_{n-1}, \ldots, \pi_{p+1}$, so that these are the elements available when position $p$ is filled.
We will use reverse induction on the position $p$ being filled in $\pi$. When $p = n$, we have $\bar{S} \ne \emptyset$
since 1, which is always a non-pinnacle, must be in $\bar{S}$. So there is an element $\bar{s}_{n-d}$ to place in
position $n$. Furthermore this element can not be a pinnacle since it is the last element of the
permutation, which agrees with the fact that it is in $\bar{S}$.
Suppose that $\pi_n, \pi_{n-1}, \ldots, \pi_{p+1}$ have been constructed. Suppose first that $\pi_{p+1} \in \bar{S}$. One
subcase is if either $S_p = \emptyset$, or $S_p \ne \emptyset$ and $\pi_{p+1} > \max S_p$. We must show that $\bar{S}_p \ne \emptyset$ so that we
can let $\pi_p = \max \bar{S}_p$. This is true when $S_p = \emptyset$ since $|S_p \uplus \bar{S}_p| = p$. If the second option holds then
we have $\pi_{p+1} > \max S_p$. But there must be at least two elements of $\bar{S}$ smaller than $\max S_p$ since
$S$ is admissible and so there is some permutation making $\max S_p$ a pinnacle. Also, these elements
must still be in $\bar{S}_p$ since elements of this set are placed in decreasing order right to left. Thus this
set is nonempty as desired. Furthermore, $\pi_p$ is not a pinnacle since it is smaller than $\pi_{p+1}$.
Now consider the subcase when $\pi_{p+1} < \max S_p$. Then we let $\pi_p = \max S_p$ which is well
defined. But we must show that $\pi_p$ is a pinnacle. We know $\pi_p > \pi_{p+1}$. So there remains to check
whether one can construct $\pi_{p-1}$ with $\pi_{p-1} < \pi_p$. For this, it suffices to show that $\bar{S}_{p-1} \ne \emptyset$ since
then we will have $\pi_{p-1} = \max \bar{S}_{p-1} < \pi_{p+1} < \pi_p$. Note that this will also finish the induction
step.
We claim that if $\pi_p = s_i$ and $\pi_{p+1} = \bar{s}_j$ then $j > i$. It will then follow that $\bar{s}_{j-1}$ exists and can
be used for $\pi_{p-1}$. But by Theorem 2.0.4 we have $s_i > \bar{s}_{i+1}$. Indeed, if $s_i < \bar{s}_{i+1}$ then at most the
elements $s_1, \ldots, s_{i-1}, \bar{s}_1, \ldots, \bar{s}_i$ are less than $s_i$ so that $s_i \le 2i$, a contradiction. Also, elements of
$\bar{S}$ are placed in decreasing order with $s_i$ being placed as early as possible with a smaller element to
its right. The desired bound on $j$ follows.
We are now ready to give our proof of Theorem 2.0.5.
Theorem 4.1.2. We have $\mathcal{C}_n = \mathcal{I}_n$. Thus
\[ \# \mathcal{A}_n = \binom{n-1}{\lfloor (n-1)/2 \rfloor}. \]
Proof. The second statement follows directly from the first, Lemma 4.1.1, and equation (4.2). So
we only need to prove that the two sets are the same. We will consider the case when 𝑛 is odd, as
the even case is similar.
We begin by showing that any right canonical permutation 𝜋 is interleaved. That is to say, the
subword consisting of all even indices is an increasing sequence, and the subword consisting of all
odd indices is an increasing sequence starting with 1.
In terms of the placement of 1, note that 𝜋1 is not a pinnacle. And since non-pinnacles are
placed in decreasing order from right to left, we must have 𝜋1 = 1.
To finish this direction, it is enough to show that for any elements 𝜋𝑖 and 𝜋𝑖+2 , we have that
𝜋𝑖+2 > 𝜋𝑖 . Note that we are done immediately if 𝜋𝑖 and 𝜋𝑖+2 are either both pinnacles or both
non-pinnacles since the construction places them in decreasing order from right to left. If 𝜋𝑖+2 is
a pinnacle and 𝜋𝑖 is not, then by the pinnacle assumption 𝜋𝑖+2 > 𝜋𝑖+1 . And since non-pinnacles
are placed in decreasing order right to left, 𝜋𝑖+1 > 𝜋𝑖 . Combining the two inequalities gives the
desired result. Finally, suppose 𝜋𝑖 is a pinnacle and 𝜋𝑖+2 is not. Then 𝜋𝑖+1 is not a pinnacle, being
adjacent to 𝜋𝑖 . And, by construction, 𝜋𝑖+1 must be the first available non-pinnacle right to left
which is smaller than 𝜋𝑖 . It follows that 𝜋𝑖+2 > 𝜋𝑖 .
For set containment the other way, let 𝜋 be an interleaved permutation. It suffices to show that if
the elements of 𝜋 are placed right to left then they follow C1–C3. Consider 𝜋𝑖 placed after 𝜋𝑖+1 with
1 < 𝑖 < 𝑛. The boundary cases when 𝑖 = 1 or 𝑛 are similar. If 𝜋𝑖 < 𝜋𝑖+1 then 𝜋𝑖 is a non-pinnacle
and 𝜋𝑖+1 is either a non-pinnacle or a pinnacle. In the first case, the non-pinnacles are being placed
in decreasing order as desired. In the second, the previously placed non-pinnacle is 𝜋𝑖+2 . So the
same conclusion holds by the interleaving condition. Now consider the possibility 𝜋𝑖 > 𝜋𝑖+1 . By
the interleaving condition, 𝜋𝑖−1 < 𝜋𝑖+1 so 𝜋𝑖 is a pinnacle. Either 𝜋𝑖+2 is a pinnacle or not, the
latter possibility including the case that 𝜋𝑖+2 does not exist. If it is, then the interleaving condition
shows that pinnacles are being placed in decreasing order. If 𝜋𝑖+2 is not a pinnacle, then this fact
and the interleaving condition again imply 𝜋𝑖 < 𝜋𝑖+2 < 𝜋𝑖+3 . It follows that 𝜋𝑖 was placed after the
first smaller non-pinnacle and, by the interleaving condition one last time, that any pinnacles to its
right are larger. This completes the proof of the other containment.
[Figure 4.1: The lattice path 𝐿 for 𝐴 = {2, 3, 7, 9}]
Given a set $A$ and $k \in \mathbb{N}$ we let $\binom{A}{k}$ be the set of all $k$-element subsets of $A$. The above construction
gives us a bijection
\[ \psi : \binom{[2, n]}{\lfloor (n-1)/2 \rfloor} \to \mathcal{A}_n \]
given by
\[ \psi(A) = \operatorname{Pin} \pi \]
where $\pi$ is the interleaved permutation corresponding to $A$.
In [DNKPT18], the authors proved Theorem 2.0.5 using a bijection
\[ \phi : \binom{[2, n]}{\lfloor (n-1)/2 \rfloor} \to \mathcal{A}_n \]
defined as follows. An up-down lattice path 𝐿 starts at the origin and uses steps which are either
up (𝑈) or down (𝐷) parallel to the vectors [1, 1] and [1, −1], respectively. For more information
about lattice paths, see the text of Sagan [Sag20]. It will be convenient to index the steps of $L$ with
$[2, n]$ and write $L = s_2 s_3 \ldots s_n$. Associate with $A \in \binom{[2,n]}{\lfloor (n-1)/2 \rfloor}$ the lattice path $L$ such that
\[ s_i = \begin{cases} D & \text{if } i \in A, \\ U & \text{if } i \notin A. \end{cases} \]
To illustrate, if 𝑛 = 9 and 𝐴 = {2, 3, 7, 9} as in the example beginning this section then
𝐿 = 𝐷𝐷𝑈𝑈𝑈𝐷𝑈𝐷
as depicted in Figure 4.1 where each step is labeled by its index. We now define
𝜙( 𝐴) = {𝑖 | in 𝐿 either 𝑠𝑖 = 𝑈 strictly below the 𝑥-axis, or 𝑠𝑖 = 𝐷 weakly above the 𝑥-axis}.
Continuing our example, 𝜙({2, 3, 7, 9}) = {4, 7, 9} = 𝜓({2, 3, 7, 9}). This is not an accident.
Proposition 4.1.3. We have
𝜙 = 𝜓.
Proof. We will give the proof for $n$ odd as the even case is similar. Let $l = (n-1)/2$. We
need to show that $\phi(A) = \psi(A)$ for all $A \in \binom{[2,n]}{l}$. Suppose $A = \{a_1 < a_2 < \ldots < a_l\}$ and
$\bar{A} = [n] - A = \{\bar{a}_1 < \bar{a}_2 < \ldots < \bar{a}_{n-l}\}$. Let $L$ and $\pi$ be the lattice path and interleaved permutation,
respectively, associated with $A$. So $\psi(A) = \operatorname{Pin} \pi$ and there will be two cases depending on whether
a pinnacle of $\pi$ comes from $A$ or $\bar{A}$.
In the first case, suppose $a_i \in \operatorname{Pin} \pi$. Since $\pi$ is interleaved, this is equivalent to $a_i = \pi_{2i} >
\pi_{2i+1} = \bar{a}_{i+1}$. Recall that $a_i$ indexes the $i$th $D$ step of $L$, and similarly for $\bar{a}_{i+1}$ and $U$ steps. So the
previous inequality is equivalent to step $s_{a_i} = D$ being preceded by more up steps than down steps.
And this is precisely the condition for $a_i$ to be the index of a down step weakly above the $x$-axis,
which means it is in $\phi(A)$. Thus this case is complete.
In a similar manner, one proves that $\bar{a}_i \in \operatorname{Pin} \pi$ if and only if $\bar{a}_i$ is the index of an up step strictly
below the $x$-axis. This completes the second case and the proof.
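The equality of the two maps can be tested exhaustively for small $n$. In the sketch below, which is ours, a step is taken to lie weakly above the $x$-axis when its endpoint has nonnegative height and strictly below when its endpoint has negative height; this reading is an assumption, chosen to agree with the example $\phi(\{2, 3, 7, 9\}) = \{4, 7, 9\}$. The function names are hypothetical.

```python
from itertools import combinations

def interleaved(n, A):
    """Interleaved permutation of [n] built from A (steps I1 and I2)."""
    rest = iter(sorted(set(range(1, n + 1)) - set(A)))
    pi = [None] * n
    for j, a in enumerate(sorted(A)):
        pi[2 * j + 1] = a
    return tuple(x if x is not None else next(rest) for x in pi)

def pinnacles(pi):
    return {pi[i] for i in range(1, len(pi) - 1) if pi[i - 1] < pi[i] > pi[i + 1]}

def psi(n, A):
    return pinnacles(interleaved(n, A))

def phi(n, A):
    """Lattice path map: step s_i is D if i is in A and U otherwise, i = 2..n."""
    height, result = 0, set()
    for i in range(2, n + 1):
        if i in A:
            height -= 1
            if height >= 0:        # down step weakly above the x-axis
                result.add(i)
        else:
            height += 1
            if height < 0:         # up step strictly below the x-axis
                result.add(i)
    return result

print(phi(9, {2, 3, 7, 9}), psi(9, {2, 3, 7, 9}))
print(all(phi(n, set(A)) == psi(n, set(A))
          for n in range(1, 10)
          for A in combinations(range(2, n + 1), (n - 1) // 2)))
```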
4.2 Ballot numbers
Davis et al. derived a number of properties of the constants 𝔭(𝑚, 𝑑) which count the number of
admissible pinnacle sets 𝑆 with 𝑑 elements and maximum 𝑚. In this section we prove that these
constants are, in fact, ballot numbers. We give two proofs of this result. In the first, we derive
a formula for 𝔭(𝑚, 𝑑) using finite differences and then show that it agrees with the well-known
expression for ballot numbers. In the second, we give an explicit bijection between these admissible
sets and ballot sequences.
Suppose we are given nonnegative integers 𝑝 > 𝑞. A ( 𝑝, 𝑞) ballot sequence is a permutation
𝛽 = 𝛽1 𝛽2 . . . 𝛽 𝑝+𝑞 of 𝑝 copies of the letter 𝑋 and 𝑞 copies of the letter 𝑌 such that in any nonempty
prefix 𝛽1 𝛽2 . . . 𝛽𝑖 the number of 𝑋’s is greater than the number of 𝑌 ’s. Let
B 𝑝,𝑞 = {𝛽 | 𝛽 is a ( 𝑝, 𝑞) ballot sequence}.
The following result is well known.
Theorem 4.2.1 ([And87],[Ber87]). For nonnegative integers $p > q$ we have
\[ \# \mathcal{B}_{p,q} = \frac{p-q}{p+q} \binom{p+q}{q}. \]
Note that if we let $p = d+1$ and $q = d$ then the previous result gives
\[ \# \mathcal{B}_{d+1,d} = \frac{1}{2d+1} \binom{2d+1}{d} = C_d \]
where $C_d$ is the $d$th Catalan number.
Our first proof that the 𝔭(𝑚, 𝑑) are ballot numbers will use the theory of finite differences. If
𝑓 (𝑚) is a function of a nonnegative integer 𝑚 then its forward difference is the function Δ 𝑓 defined
by
Δ 𝑓 (𝑚) = 𝑓 (𝑚 + 1) − 𝑓 (𝑚).
For a fixed $d \in \mathbb{P}$, define the following polynomial in $m$ of degree $d-1$
\[ p_d(m) = \frac{m-2d+1}{(d-1)!} \prod_{i=2}^{d-1} (m-i). \]
Lemma 4.2.2. The polynomial 𝑝 𝑑 (𝑚) satisfies
Δ 𝑝 𝑑 (𝑚) = 𝑝 𝑑−1 (𝑚)
and
𝑝 𝑑 (2𝑑 + 1) = 𝐶𝑑 .
Proof. To prove the first equality, we compute
\begin{align*}
\Delta p_d(m) &= p_d(m+1) - p_d(m) \\
&= \frac{m-2d+2}{(d-1)!} \prod_{i=2}^{d-1} (m+1-i) - \frac{m-2d+1}{(d-1)!} \prod_{i=2}^{d-1} (m-i) \\
&= \frac{(m-2d+2)(m-1) - (m-2d+1)(m-d+1)}{(d-1)!} \prod_{i=2}^{d-2} (m-i) \\
&= \frac{(d-1)(m-2d+3)}{(d-1)!} \prod_{i=2}^{d-2} (m-i) \\
&= p_{d-1}(m).
\end{align*}
For the second equality, we have
\begin{align*}
p_d(2d+1) &= \frac{2}{(d-1)!} \prod_{i=2}^{d-1} (2d+1-i) \\
&= \frac{2d}{d!} \cdot \frac{(2d-1)!}{(d+1)!} \\
&= \frac{(2d)!}{d!\,(d+1)!} \\
&= C_d
\end{align*}
which finishes the proof.
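Both identities of the lemma can be spot-checked with exact rational arithmetic. The sketch below is ours. It assumes the convention $p_1(m) = 1$, which is the value the binomial form in Theorem 4.2.3 gives for $d = 1$; the product formula itself is only used for $d \ge 2$.

```python
from fractions import Fraction
from math import comb, factorial

def p(d, m):
    """p_d(m) = (m - 2d + 1)/(d - 1)! * prod_{i=2}^{d-1} (m - i) for d >= 2,
    with p_1(m) = 1 taken as an assumption."""
    if d == 1:
        return Fraction(1)
    value = Fraction(m - 2 * d + 1, factorial(d - 1))
    for i in range(2, d):
        value *= m - i
    return value

def catalan(d):
    return comb(2 * d, d) // (d + 1)

# forward difference: p_d(m+1) - p_d(m) = p_{d-1}(m)
print(all(p(d, m + 1) - p(d, m) == p(d - 1, m)
          for d in range(2, 9) for m in range(0, 30)))
# initial values: p_d(2d+1) = C_d
print(all(p(d, 2 * d + 1) == catalan(d) for d in range(1, 12)))
```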
Note that by the criterion in Theorem 2.0.4, 𝔭(𝑚, 𝑑) can only be nonzero if 𝑚 > 2𝑑. We thank
Richard Stanley who, on being shown Lemma 4.2.2, pointed out that 𝔭(𝑚, 𝑑) is a ballot number.
Theorem 4.2.3. If $m, d \in \mathbb{P}$ with $m > 2d$ then $\mathfrak{p}(m, d) = p_d(m)$. Thus
\[ \mathfrak{p}(m, d) = \frac{m-2d+1}{m-1} \binom{m-1}{d-1} = \# \mathcal{B}_{m-d,d-1}. \]
Proof. Induct on 𝑑 where the base case of 𝑑 = 1 is trivial to verify. To finish the first claim, it suffices
to use the previous lemma and show that both Δ𝔭(𝑚, 𝑑) = 𝔭(𝑚, 𝑑 − 1) and 𝔭(2𝑑 + 1, 𝑑) = 𝐶𝑑 . But
these were proved in [DNKPT18, Sections 2.2–2.3]. The first displayed equality now follows from
simple manipulation of the definition of 𝑝 𝑑 (𝑚), while the second comes from Theorem 4.2.1.
We would like to give a bijective proof of the relationship between admissible pinnacle sets and
ballot sequences from the previous theorem. Let
𝔓(𝑚, 𝑑) = {𝑆 | 𝑆 admissible with max 𝑆 = 𝑚 and #𝑆 = 𝑑}
so that #𝔓(𝑚, 𝑑) = 𝔭(𝑚, 𝑑). For 𝑚 > 2𝑑, define a map
𝜂 : B𝑚−𝑑,𝑑−1 → 𝔓(𝑚, 𝑑)
by sending ballot sequence $\beta = \beta_1 \beta_2 \ldots \beta_{m-1}$ to
\[ \eta(\beta) = \{ i \mid \beta_i = Y \} \uplus \{m\}. \]
For example, if $m = 9$, $d = 3$ and $\beta = XXXYXXYX$ then
\[ \eta(\beta) = \{4, 7\} \uplus \{9\} = \{4, 7, 9\}. \]
Theorem 4.2.4. The map 𝜂 is a well-defined bijection.
Proof. We must first show that 𝜂 is well defined in that 𝜂(𝛽) ∈ 𝔓(𝑚, 𝑑). Since 𝛽 ∈ B𝑚−𝑑,𝑑−1
we see that the set {𝑖 | 𝛽𝑖 = 𝑌 } is contained in [𝑚 − 1] and has cardinality 𝑑 − 1. It follows that
𝑆 = 𝜂(𝛽) has maximum 𝑚 and cardinality 𝑑.
There remains to show that 𝑆 = {𝑠1 < 𝑠2 < . . . < 𝑠 𝑑 } is admissible. By Theorem 2.0.4, it
suffices to show that 𝑠𝑖 > 2𝑖 for all 𝑖. But 𝑠𝑖 is the index of the 𝑖th 𝑌 in 𝛽. Since 𝛽 is a ballot
sequence, this 𝑌 is preceded by 𝑖 copies of 𝑌 (including itself) and at least 𝑖 + 1 copies of 𝑋. So
𝑠𝑖 ≥ 𝑖 + (𝑖 + 1) = 2𝑖 + 1 which is what we wished to prove.
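To show that $\eta$ is a bijection, we create its inverse. Given $S \in \mathfrak{P}(m, d)$ we define $\eta^{-1}(S) = \beta =
\beta_1 \beta_2 \ldots \beta_{m-1}$ by letting
\[ \beta_i = \begin{cases} X & \text{if } i \notin S, \\ Y & \text{if } i \in S. \end{cases} \]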
The proof that 𝜂−1 is well defined is similar to the one for 𝜂. And proving that the compositions of
𝜂 with 𝜂−1 are identity maps is easy. So we are done.
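The bijection can be exercised on small cases. In the sketch below, which is ours, admissibility of $S = \{s_1 < \ldots < s_d\}$ is tested by the condition $s_i > 2i$ for all $i$, which is the form of Theorem 2.0.4 used in the proof above. The function names are hypothetical.

```python
from itertools import combinations

def is_ballot(word):
    """Every nonempty prefix has strictly more X's than Y's."""
    lead = 0
    for letter in word:
        lead += 1 if letter == "X" else -1
        if lead <= 0:
            return False
    return True

def ballot_sequences(p, q):
    words = []
    for ys in combinations(range(1, p + q + 1), q):
        word = "".join("Y" if i in ys else "X" for i in range(1, p + q + 1))
        if is_ballot(word):
            words.append(word)
    return words

def eta(beta, m):
    """Positions of the Y's in beta, together with m."""
    return frozenset(i for i, letter in enumerate(beta, start=1) if letter == "Y") | {m}

def eta_inverse(S, m):
    return "".join("Y" if i in S else "X" for i in range(1, m))

def admissible_with_max(m, d):
    """Sets S with #S = d, max S = m and s_i > 2i for all i."""
    sets = set()
    for rest in combinations(range(1, m), d - 1):
        S = rest + (m,)
        if all(s > 2 * i for i, s in enumerate(S, start=1)):
            sets.add(frozenset(S))
    return sets

print(sorted(eta("XXXYXXYX", 9)))                       # [4, 7, 9]
image = {eta(b, 9) for b in ballot_sequences(6, 2)}
print(image == admissible_with_max(9, 3), len(image))   # True 14
```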
4.3 Permutations with a given pinnacle set
Given an admissible set 𝑆, there does not seem to be an expression for 𝑝 𝑆 (𝑛), the number of
permutations in 𝔖𝑛 with 𝑆 as pinnacle set, analogous to the one in Theorem 2.0.3 for peak sets.
In [DNKPT18], they found expressions for 𝑝 𝑆 (𝑛) when #𝑆 ≤ 2 as well as bounds for general 𝑆, and
asked whether an exact formula could be given in the general case. Such an expression was given
in [DLHH+ 21] as a summation. In this section we will give another sum which is asymptotically
more efficient. In addition, this method can be extended to count #O (𝑆), the number of admissible
orderings of 𝑆.
Since our sum will involve a significant amount of new notation, we will collect it here and
then explain its relevance afterwards. Fix 𝑛 ∈ P. Suppose we have an admissible pinnacle set
𝑆 = {𝑠1 < 𝑠2 < . . . < 𝑠 𝑑 } for permutations in 𝔖𝑛 . We use the convention 𝑠0 = 0 and 𝑠 𝑑+1 = 𝑛 + 1
and let
𝑛𝑖 = 𝑠𝑖+1 − 𝑠𝑖 − 1
for 0 ≤ 𝑖 ≤ 𝑑. Let
𝐷 = {1𝑙 , 1𝑟 , 2𝑙 , 2𝑟 , . . . , 𝑑𝑙 , 𝑑𝑟 }
and give the following total order to 𝐷’s elements
1𝑙 < 1𝑟 < 2𝑙 < 2𝑟 < . . . < 𝑑𝑙 < 𝑑𝑟 .
We call 𝑖 𝑙 and 𝑖𝑟 the elements of rank 𝑖 in 𝐷. If 𝐵 ⊆ 𝐷 then we will let
𝑏 = #𝐵
and
𝑟 𝑗 = the rank of the 𝑗th smallest element of 𝐵
for 1 ≤ 𝑗 ≤ 𝑏. We also define
𝑏𝑖 = the number of elements in 𝐵 with rank at least 𝑖.
Note that we always have 𝑏 1 = 𝑏 and 𝑏 𝑑+1 = 0 since 𝑑 is the largest rank. For example, if 𝑑 = 4,
then 𝐷 = {1𝑙 , 1𝑟 , 2𝑙 , 2𝑟 , 3𝑙 , 3𝑟 , 4𝑙 , 4𝑟 } and one possible 𝐵 might be 𝐵 = {1𝑙 , 3𝑙 , 3𝑟 , 4𝑟 } which has
𝑟 1 = 1, 𝑟 2 = 3, 𝑟 3 = 3, 𝑟 4 = 4 and 𝑏 1 = 4, 𝑏 2 = 3, 𝑏 3 = 3, 𝑏 4 = 1, 𝑏 5 = 0. We can now state the first
main result of this section.
Theorem 4.3.1. Given $n \in \mathbb{P}$ and admissible $S = \{s_1 < s_2 < \ldots < s_d\}$ we have
\[ p_S(n) = 2^{n-2d-1} \sum_{B \subseteq D:\ |B| \le d} (-1)^b (d-b)! \left( \prod_{i=0}^{b-1} (d+1-i-r_{b-i}) \right) \left( \prod_{i=0}^{d} (d+1-i-b_{i+1})^{n_i} \right). \]
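As a numerical sanity check on the statement (not a substitute for the proof), the sum can be evaluated directly and compared with a brute-force count for small $n$. The sketch below is ours: the names `p_formula` and `pinnacle_counts` are hypothetical, and a subset $B \subseteq D$ is represented only by the sorted list of ranks of its elements, with each rank available twice.

```python
from collections import Counter
from itertools import combinations, permutations
from math import factorial

def p_formula(n, S):
    """Right-hand side of the theorem for an admissible set S."""
    s = sorted(S)
    d = len(s)
    ext = [0] + s + [n + 1]                                  # s_0, ..., s_{d+1}
    gaps = [ext[i + 1] - ext[i] - 1 for i in range(d + 1)]   # n_0, ..., n_d
    ranks = [r for r in range(1, d + 1) for _ in range(2)]   # ranks of 1l, 1r, ..., dl, dr
    total = 0
    for b in range(d + 1):
        for r in combinations(ranks, b):                     # r_1 <= ... <= r_b
            first = 1
            for i in range(b):
                first *= d + 1 - i - r[b - i - 1]            # uses r_{b-i}
            second = 1
            for i in range(d + 1):
                b_next = sum(1 for x in r if x >= i + 1)     # b_{i+1}
                second *= (d + 1 - i - b_next) ** gaps[i]
            total += (-1) ** b * factorial(d - b) * first * second
    return 2 ** (n - 2 * d - 1) * total

def pinnacle_counts(n):
    """Counter mapping each pinnacle set to the number of permutations having it."""
    counts = Counter()
    for pi in permutations(range(1, n + 1)):
        pin = frozenset(pi[i] for i in range(1, n - 1) if pi[i - 1] < pi[i] > pi[i + 1])
        counts[pin] += 1
    return counts

print(p_formula(5, {3, 5}))                                  # 4
print(all(p_formula(n, S) == c
          for n in range(1, 8) for S, c in pinnacle_counts(n).items()))
```

The number of subsets $B$ grows with $d$ only, not with $n$, which is the source of the efficiency mentioned at the start of this section.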
To prove this, it will be convenient to convert the linear permutations we have been studying
into cyclic ones in order to avoid considering boundary cases. Given a linear permutation 𝜋 =
𝜋1 𝜋2 . . . 𝜋𝑛 the corresponding cyclic permutation is the set of permutations
[𝜋] = {𝜋1 𝜋2 . . . 𝜋𝑛 , 𝜋2 . . . 𝜋𝑛 𝜋1 , ..., 𝜋𝑛 𝜋1 . . . 𝜋𝑛−1 }.
Intuitively, we think of [𝜋] as the result of arranging the elements of 𝜋 on a circle. Let
[𝔖𝑛 ] = {[𝜋] | 𝜋 ∈ 𝔖𝑛 }.
For example if 𝜋 = 1324 then
[𝜋] = {1324, 3241, 2413, 4132}.
We are also using the bracket notation in [𝑛] where 𝑛 ∈ N but this should not cause any confusion.
Cyclic permutations are of interest in part because of their relation with pattern avoidance, standard
Young tableaux, quasisymmetric functions, and other mathematical objects [AGRR20, Cal02,
DLM+ 21a, DLM+ 21b, GLW18, GLW19].
We define the pinnacle set of [𝜋] = [𝜋1 𝜋2 . . . 𝜋𝑛 ] to be
Pin[𝜋] = {𝜋𝑖 | 𝜋𝑖−1 < 𝜋𝑖 > 𝜋𝑖+1 where subscripts are taken modulo 𝑛}.
Continuing our example from the last paragraph
Pin[1324] = {3, 4}.
Note in particular that Pin[12] = {2} and, more generally, 𝑛 ∈ Pin[𝜋] for any [𝜋] ∈ [𝔖𝑛 ] where
𝑛 ≥ 2.
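The definitions of [𝜋] and Pin[𝜋] translate directly into code. The sketch below is illustrative only; the function names are hypothetical and permutations are assumed to be given as tuples in one-line notation.

```python
def rotations(perm):
    """Return the cyclic permutation [perm] as the set of all rotations of perm."""
    perm = tuple(perm)
    return {perm[k:] + perm[:k] for k in range(len(perm))}


def cyclic_pinnacle_set(perm):
    """Return Pin[perm]: the values larger than both cyclic neighbours."""
    n = len(perm)
    # perm[i - 1] wraps around for i = 0, and (i + 1) % n wraps around for i = n - 1.
    return {perm[i] for i in range(n)
            if perm[i - 1] < perm[i] > perm[(i + 1) % n]}
```

Since the pinnacle set is defined with subscripts modulo 𝑛, every rotation in `rotations(perm)` yields the same value of `cyclic_pinnacle_set`.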
Lemma 4.3.2. For 𝑛 ∈ P, there is a bijection between linear permutations in 𝔖𝑛 with pinnacle set
𝑆 and cyclic permutations in [𝔖𝑛+1 ] with pinnacle set 𝑆′ = 𝑆 ∪ {𝑛 + 1}.
Proof. Given a linear 𝜋, append the element 𝑛 + 1 to the end of 𝜋 and take the corresponding
equivalence class in 𝔖𝑛+1 to form an element of [𝔖𝑛+1 ]. The map is clearly invertible and does
not destroy or create any pinnacles for elements in [𝑛]. Since 𝑛 + 1 ≥ 2, we know that 𝑛 + 1 will
become a pinnacle. Therefore the map has the desired properties concerning the pinnacle set.
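The map in Lemma 4.3.2 can be checked exhaustively for small 𝑛. The following sketch is an illustration under the assumption that a cyclic permutation is represented by its unique rotation ending in 𝑛 + 1; the function names are hypothetical.

```python
from itertools import permutations


def linear_pinnacle_set(perm):
    """Pinnacle set of a linear permutation (interior positions only)."""
    return {perm[i] for i in range(1, len(perm) - 1)
            if perm[i - 1] < perm[i] > perm[i + 1]}


def cyclic_pinnacle_set(perm):
    """Pinnacle set of the cyclic permutation represented by perm."""
    n = len(perm)
    return {perm[i] for i in range(n)
            if perm[i - 1] < perm[i] > perm[(i + 1) % n]}


def to_cyclic_representative(perm):
    """Append n + 1 to a permutation of [n]; the result represents a class in [S_{n+1}]."""
    return tuple(perm) + (len(perm) + 1,)


def lemma_holds(n):
    """Check that appending n + 1 adds exactly the pinnacle n + 1, for all of S_n."""
    return all(
        cyclic_pinnacle_set(to_cyclic_representative(p)) == linear_pinnacle_set(p) | {n + 1}
        for p in permutations(range(1, n + 1))
    )
```

Every class in [𝔖𝑛+1 ] has exactly one rotation ending in 𝑛 + 1, so deleting that final entry inverts `to_cyclic_representative`.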
Consider some admissible pinnacle set 𝑆 = {𝑠1 < 𝑠2 < . . . < 𝑠 𝑑 }. Given the above lemma,
we may count the number of permutations in 𝔖𝑛 with pinnacle set 𝑆 by counting the number of
cyclic permutations [𝜋] ∈ [𝔖𝑛+1 ] with pinnacle set 𝑆′ = 𝑆 ∪ {𝑛 + 1} where we let 𝑠 𝑑+1 = 𝑛 + 1.
Therefore, much of what follows will concern cyclic permutations with pinnacle set 𝑆′.
A factor of a (cyclic) permutation is a subsequence of consecutive elements. We may attempt
to construct a [𝜋] with pinnacle set 𝑆′ by first putting the elements of 𝑆′ in some cyclic order, and
then placing all of the non-pinnacles, that is, the elements of [𝑛 + 1] − 𝑆′, into either decreasing
factors starting with some 𝑠𝑖 , or into increasing factors ending with some 𝑠𝑖 . Such a [𝜋] will then
be completely determined by the increasing/decreasing factors that each non-pinnacle falls into,
and we will call every such assignment a placement. Note that it is possible for multiple placements
to result in the same permutation since each vale (an element of [𝜋] smaller than the elements on
either side) can be part of the factor on either side. For example, start with a desired pinnacle set
{4, 5} and place
non-pinnacles between these elements to form the cyclic permutation [𝜋] = [14325]. Then [𝜋]
would be associated with a placement where the decreasing factor starting with 4 is 43 and the
increasing factor ending with 5 is 25. But it would also be associated with a placement having
these factors be 432 and 5, respectively.
It is also possible, depending on the placement, that [𝜋] will not have pinnacle set 𝑆′ if no
sufficiently small elements are placed between two pinnacles. In our example above, this could
have happened if we had placed 1, 2 and 3 all in the increasing factor ending in 5, resulting in
the cyclic permutation [41235] in which only 5 is a pinnacle. It is true, however, that any [𝜋] so
constructed will have a pinnacle set that is a subset of 𝑆′ since every non-pinnacle was placed so
that its factor contains an 𝑠𝑖 which is the largest element. For our arguments, we will focus on
counting placements and then convert them into permutations later.
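The passage from a placement to a cyclic permutation described above can be sketched as follows. This is an illustrative encoding only: a placement is assumed to be a dictionary sending each non-pinnacle to a pair (pinnacle, side), where the hypothetical labels "inc" and "dec" mean the increasing factor ending at that pinnacle and the decreasing factor starting at it.

```python
def build_cyclic_permutation(pinnacle_order, placement):
    """Build one representative of [pi] from a cyclic order of pinnacles and a placement."""
    word = []
    for p in pinnacle_order:
        inc = sorted(x for x, (q, side) in placement.items() if q == p and side == "inc")
        dec = sorted((x for x, (q, side) in placement.items() if q == p and side == "dec"),
                     reverse=True)
        if any(x >= p for x in inc + dec):
            raise ValueError("a non-pinnacle must be smaller than the pinnacle of its factor")
        word.extend(inc)   # increasing factor ending with p
        word.append(p)
        word.extend(dec)   # decreasing factor starting with p
    return tuple(word)


def cyclic_pinnacle_set(perm):
    """Values larger than both cyclic neighbours."""
    n = len(perm)
    return {perm[i] for i in range(n)
            if perm[i - 1] < perm[i] > perm[(i + 1) % n]}


# The two placements from the text that both give [14325].
first = {1: (4, "inc"), 3: (4, "dec"), 2: (5, "inc")}
second = {1: (4, "inc"), 3: (4, "dec"), 2: (4, "dec")}
# A placement that fails to make 4 a pinnacle, giving [41235].
third = {1: (5, "inc"), 2: (5, "inc"), 3: (5, "inc")}
```

The first two placements differ only in which factor contains the vale 2, which is why several placements can correspond to one permutation.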
Fix a cyclic ordering of the pinnacle indices and write it as [𝜏] = [𝜏1 · · · 𝜏𝑑+1 ] ∈ [𝔖𝑑+1 ]. An
example is shown in Figure 4.2 where [𝜏] = [7612354]. Now given a placement consistent with this
ordering, for every space between two adjacent elements in [𝜏] define the dale set of this placement
to consist of all elements between the two corresponding pinnacles that are also smaller than both
pinnacles. So in Figure 4.2 the dales are outlined by triangles with solid lines as sides. If 𝑠𝑖 is the
smaller of the two pinnacles, then we say that the dale has rank 𝑖. Note that the rank is from the
index of 𝑠𝑖 and not its actual value. We will further denote the rank as either 𝑖 𝑙 or 𝑖𝑟 depending on
whether the dale is to the left, or right of the pinnacle 𝑠𝑖 . In Figure 4.2 the dale ranks are given
along the 𝑥-axis. Define the dale rank set 𝐷 [𝜏] to be the set of the dale ranks of [𝜏]. And define
the master dale rank set to be
𝐷 = {1𝑙 , 1𝑟 , 2𝑙 , 2𝑟 , . . . , 𝑑𝑙 , 𝑑𝑟 }
so that 𝐷 ⊇ 𝐷 [𝜏] for all [𝜏]. In Figure 4.2, we have that 𝐷 [𝜏] = {1𝑙 , 1𝑟 , 2𝑟 , 3𝑟 , 4𝑙 , 4𝑟 , 6𝑙 } while
𝐷 = {1𝑙 , 1𝑟 , 2𝑙 , 2𝑟 , . . . , 6𝑙 , 6𝑟 }. Note that, by our definitions, there will be no dales in the case
where 𝑑 = 0.
Clearly 𝐷 [𝜏] will be a subset of 𝐷 consisting of exactly 𝑑 + 1 elements if 𝑑 > 0, and empty
otherwise. We can derive further information about 𝐷 [𝜏] if we want, such as how it will always
contain both 1𝑙 and 1𝑟 if 𝑑 > 0, how it will never contain both 𝑑𝑙 and 𝑑𝑟 if 𝑑 > 1, and how 𝐷 [𝜏] will
never be able to have certain combinations of the higher ranked dales. These facts are not necessary
for proving our formula, although further analysis of them might help to improve its efficiency.
[Figure 4.2: Example of a pinnacle set ordering [𝜏] = [7612354] with corresponding dales. The
pinnacles 𝑠1 , . . . , 𝑠7 are drawn by height, with 𝑛𝑖 elements between the levels of 𝑠𝑖 and 𝑠𝑖+1 ; the
dale ranks 6𝑙 , 1𝑙 , 1𝑟 , 2𝑟 , 3𝑟 , 4𝑙 , 4𝑟 are listed along the 𝑥-axis.]
Lemma 4.3.3. For 𝑛 ∈ P, a given placement will correspond to a permutation [𝜋] ∈ [𝔖𝑛+1 ] with
pinnacle set 𝑆′ if and only if every dale is non-empty.
Proof. First, suppose 𝑑 = 0. In this case, the lemma is trivial since there are no dales, and every
placement will automatically result in only one pinnacle, namely 𝑛 + 1, as long as 𝑛 > 0.
Now suppose 𝑑 > 0. Clearly if any dale of rank 𝑖 (whether left or right) is empty, then the
pinnacle 𝑠𝑖 will have no smaller elements between itself and the higher pinnacle next to it, which
will force 𝑠𝑖 to not be a pinnacle. On the other hand, if all dales have at least one element, then
the space between any two pinnacles will always contain an element smaller than both, and all
elements of 𝑆′ will in fact be pinnacles.
We can now enumerate all placements corresponding to a given cyclic ordering of the indices
of the pinnacle set 𝑆′.
Lemma 4.3.4. Given an admissible pinnacle set 𝑆′, fix an order [𝜏] of the pinnacle indices. The
total number of placements with order [𝜏] that will result in a permutation with pinnacle set 𝑆′ is
given by
\[
\sum_{B \subset D[\tau]} (-1)^b \prod_{i=0}^{d} 2^{n_i} (d+1-i-b_{i+1})^{n_i}
\]
where 𝑏, 𝑑, the 𝑏𝑖 , and the 𝑛𝑖 are defined above.
Proof. We will use the Principle of Inclusion and Exclusion or PIE. We let our universal set be all
possible placements with no restrictions. We then wish to exclude any placement where at least
one dale is empty. Therefore, if 𝐵 is some subset of the dales, we must be able to count the number
of placements where all dales in 𝐵 (and possibly others) are empty.
First consider the case when 𝐵 = ∅. There are 2(𝑑 + 1) factors of which 2𝑖 only exist below
𝑠𝑖 . So each of the 𝑛𝑖 non-pinnacles between 𝑠𝑖 and 𝑠𝑖+1 may be placed in any of the 2(𝑑 + 1 − 𝑖)
factors that are long enough to extend above 𝑠𝑖 . As an example, in Figure 4.2, if we look between the
horizontal boundary lines for the elements counted by 𝑛2 we see there are 10 = 2(6 + 1 − 2) such
factors represented by the diagonal lines (solid or dotted) which intersect the region.
For non-empty 𝐵, each dale of rank at least 𝑖 + 1 that we require to be empty will result in a loss
of two additional factors, and so there are only 2(𝑑 + 1 − 𝑖 − 𝑏𝑖+1 ) choices. Therefore, for a given
𝐵, the total number of placements guaranteeing the dales in 𝐵 are empty is
\[
\prod_{i=0}^{d} 2^{n_i} (d+1-i-b_{i+1})^{n_i}.
\]
To use the PIE, we must also attach the sign (−1) |𝐵| = (−1) 𝑏 to this term before summing.
Therefore, given a fixed order [𝜏] of the pinnacle indices of 𝑆′, we have that the total number of
placements that will result in a permutation with pinnacle set 𝑆′ is
\[
\sum_{B \subseteq D[\tau]} (-1)^b \prod_{i=0}^{d} 2^{n_i} (d+1-i-b_{i+1})^{n_i}.
\]
Finally, when 𝐵 = 𝐷 [𝜏] then 𝑏 1 = #𝐵 = 𝑑 + 1. So we can ignore this term because the product has
a factor of 𝑑 + 1 − 𝑏 1 = 0.
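As a sanity check on Lemma 4.3.4, the case 𝑑 = 1 can be worked out by hand. The computation below is only an illustration in the notation above, taking 𝑆 = {𝑙}, so that 𝑛0 = 𝑙 − 1, 𝑛1 = 𝑛 − 𝑙, and the unique cyclic order has 𝐷 [𝜏] = {1𝑙 , 1𝑟 }.

```latex
\begin{align*}
B = \emptyset &: \quad 2^{n_0} (2-0-0)^{n_0} \cdot 2^{n_1} (2-1-0)^{n_1} = 4^{n_0}\, 2^{n_1}, \\
B = \{1_l\} \text{ or } \{1_r\} &: \quad -\,2^{n_0} (2-0-1)^{n_0} \cdot 2^{n_1} (2-1-0)^{n_1} = -\,2^{n_0}\, 2^{n_1}, \\
\text{total} &: \quad 4^{n_0}\, 2^{n_1} - 2 \cdot 2^{n_0}\, 2^{n_1} = 2^{n-1} \left( 2^{l-1} - 2 \right).
\end{align*}
```

For 𝑙 = 𝑛 = 3 this gives 8 placements. Each of the two permutations 132 and 231 with pinnacle set {3} has two vales in its cyclic form, each of which may belong to the factor on either side, so each permutation accounts for 4 of these placements.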
The above formula must be summed over all possible [𝜏] to give a final count for the number
of [𝜋] with Pin[𝜋] = 𝑆′. This results in a computationally expensive double sum. Also, note that in
the above formula there may be multiple 𝐵 resulting in the same term. For example, {1𝑙 , 2𝑟 , 5𝑙 }
is not the same as {1𝑟 , 2𝑟 , 5𝑙 } even though both produce the same 𝑏𝑖 . We will take care of this
redundancy when we optimize our formula below.
To fix the double sum problem, note that each 𝐵 in Lemma 4.3.4 is a subset of the master dale
rank set 𝐷. We will fix some subset 𝐵 ⊆ 𝐷 and count the number of orderings [𝜏] that will produce
a 𝐷 [𝜏] which can have 𝐵 as a subset. This will allow us to just sum over all subsets 𝐵 ⊆ 𝐷 without
having to keep track of [𝜏]. Furthermore, we only have to sum over the subsets 𝐵 of cardinality at
most 𝑑 since requiring more than 𝑑 dales to be empty is impossible for an admissible pinnacle set.
Lemma 4.3.5. Fix some 𝐵 ⊆ 𝐷 with |𝐵| ≤ 𝑑. The number of orderings [𝜏] that will produce a
𝐷 [𝜏] such that 𝐵 ⊆ 𝐷 [𝜏] is given by
\[
(d-b)! \prod_{i=0}^{b-1} (d+1-i-r_{b-i})
\]
where 𝑏, 𝑑, and the 𝑟𝑖 are defined as above.
Proof. We will start by viewing all 𝑑 + 1 pinnacles as separate and then adjoin them in pairs in such
a way so that the desired dales are formed. Here, “adjoining a pair of pinnacles” means requiring
that they be adjacent in [𝜏].
We start with the dale of rank 𝑟 𝑏 the largest rank in 𝐵. In that case, the only way to generate
such a dale is to order 𝑠𝑟 𝑏 so that one of the 𝑑 + 1 − 𝑟 𝑏 higher pinnacles is directly to its left or
right depending on whether the corresponding element of 𝐵 is a left or right rank, respectively. So
select one such pinnacle and adjoin it to the appropriate side of 𝑠𝑟 𝑏 .
Next we will examine the dale in 𝐵 with the next highest rank, 𝑟 𝑏−1 . If 𝑟 𝑏−1 is a smaller rank
than 𝑟 𝑏 , we may once again select a taller pinnacle to place next to 𝑠𝑟 𝑏−1 , on either the left or right
as necessary, in order to produce the desired dale. This time however, although there are 𝑑 +1−𝑟 𝑏−1
pinnacles higher than 𝑠𝑟 𝑏−1 , one of them is unavailable since we have already adjoined two of the
higher-ranked pinnacles together. More specifically, because of adjoining a higher pinnacle with
𝑠𝑟 𝑏 , we know that one taller pinnacle cannot be joined to its left and another cannot be joined to
its right. So no matter whether 𝑟 𝑏−1 corresponded to a left or right dale, there is one less option.
Therefore, the number of ways to append a larger pinnacle is 𝑑 + 1 − 𝑟 𝑏−1 − 1. On the other hand, if
𝑟 𝑏−1 = 𝑟 𝑏 then we need to adjoin a second pinnacle to 𝑠𝑟 𝑏 on the side opposite the one used when
considering 𝑟 𝑏 . Again, the pinnacle already adjoined to 𝑠𝑟 𝑏 removes one option so the number of
choices is 𝑑 + 1 − 𝑟 𝑏−1 − 1 as before. So in either case we have the same number of possibilities.
Similar considerations show that, in general, each 𝑟 𝑏−𝑖 results in 𝑑 + 1 − 𝑖 − 𝑟 𝑏−𝑖 choices for adjoining
pinnacles. Note that for this argument we are using the fact that 𝑏 ≤ 𝑑 since if 𝑏 = 𝑑 + 1 then the
string of pinnacles would wrap into a circle before creating the final dale.
Once all dales have been created by the above process, we only need to count the number of
ways to join the resulting strings of pinnacles together. Since we have adjoined pinnacles together
𝑏 times, we have 𝑑 + 1 − 𝑏 strings which we then must arrange in a circle. This can be done in
(𝑑 + 1 − 𝑏 − 1)! = (𝑑 − 𝑏)! ways. Therefore,
\[
(d-b)! \prod_{i=0}^{b-1} (d+1-i-r_{b-i})
\]
is the number of orderings [𝜏] that will allow for a given 𝐵 to be a subset of 𝐷 [𝜏] .
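Lemma 4.3.5 can be tested by enumerating all cyclic orderings for small 𝑑. The sketch below is an illustration only; it assumes that dale ranks are encoded as (rank, side) pairs and that each cyclic class is represented by the rotation starting with 𝑑 + 1. All function names are hypothetical.

```python
from itertools import combinations, permutations
from math import factorial


def dale_rank_set(tau):
    """Return D[tau] for a cyclic ordering tau of the indices 1, ..., d + 1."""
    m = len(tau)
    if m < 2:
        return set()                      # no dales when d = 0
    dales = set()
    for k in range(m):
        left, right = tau[k], tau[(k + 1) % m]
        if left < right:
            dales.add((left, "r"))        # dale lies to the right of the smaller pinnacle
        else:
            dales.add((right, "l"))       # dale lies to the left of the smaller pinnacle
    return dales


def orderings_containing(B, d):
    """Brute force: number of cyclic orderings [tau] with B contained in D[tau]."""
    B = set(B)
    return sum(1 for rest in permutations(range(1, d + 1))
               if B <= dale_rank_set((d + 1,) + rest))


def lemma_formula(B, d):
    """The count (d - b)! * prod_{i=0}^{b-1} (d + 1 - i - r_{b-i}) from Lemma 4.3.5."""
    r = sorted(rank for rank, _ in B)
    b = len(r)
    value = factorial(d - b)
    for i in range(b):
        value *= d + 1 - i - r[b - i - 1]
    return value
```

With this encoding, `dale_rank_set((7, 6, 1, 2, 3, 5, 4))` reproduces the dale rank set of the ordering shown in Figure 4.2.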
We are now in a position to prove Theorem 4.3.1 which we restate here for ease of reference.
Theorem 4.3.6. Given 𝑛 ∈ P and admissible 𝑆 = {𝑠1 < 𝑠2 < . . . < 𝑠 𝑑 } we have
\[
p_S(n) = 2^{n-2d-1} \sum_{B \subseteq D:\ |B| \le d} (-1)^b (d-b)!
\left( \prod_{i=0}^{b-1} (d+1-i-r_{b-i}) \right)
\left( \prod_{i=0}^{d} (d+1-i-b_{i+1})^{n_i} \right).
\]
Proof. It is easy to verify the formula if 𝑑 = 0, so we assume 𝑑 > 0. From Lemma 4.3.2, the
number of permutations 𝜋 ∈ 𝔖𝑛 with pinnacle set 𝑆 equals the number of cyclic permutations
[𝜋] ∈ [𝔖𝑛+1 ] with pinnacle set 𝑆′ = 𝑆 ∪ {𝑛 + 1}. So we will count the latter. From Lemma 4.3.4,
the number of placements which correspond to a cyclic permutation with pinnacle set 𝑆′ is given
by
\[
\sum_{[\tau]} \sum_{B \subseteq D[\tau]} (-1)^b \prod_{i=0}^{d} 2^{n_i} (d+1-i-b_{i+1})^{n_i}
\]
where the outer sum is over all possible cyclic orderings [𝜏] of the index set of 𝑆′. We now wish
to swap the summations so that the outer sum is over all 𝐵 ⊆ 𝐷 with |𝐵| ≤ 𝑑. We may restrict to
size at most 𝑑 since any larger 𝐵 will either consist of a combination of dales that cannot exist, or
will require all 𝑑 + 1 dales to be empty which is impossible because of the assumption that 𝑑 > 0.
In order to interchange the summations we must multiply the term corresponding to each 𝐵 by
the number of distinct permutations [𝜏] that could have generated it. This was counted in Lemma
4.3.5, and so we get the formula
\[
\sum_{B \subseteq D:\ |B| \le d} (-1)^b (d-b)!
\left( \prod_{i=0}^{b-1} (d+1-i-r_{b-i}) \right)
\left( \prod_{i=0}^{d} 2^{n_i} (d+1-i-b_{i+1})^{n_i} \right)
\]
for the number of placements.
Now we seek to turn the placements into permutations. Since all dales are guaranteed to be
non-empty, we have that every permutation corresponding to one of these placements will have
𝑑 + 1 non-pinnacle elements that are part of both a decreasing factor and an increasing factor. This
means that every such corresponding [𝜋] has been counted by 2^{𝑑+1} placements. Dividing by this,
and also pulling some common factors of two out from the second product, we have
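\begin{align*}
p_S(n) &= 2^{-d-1} \left( \prod_{i=0}^{d} 2^{n_i} \right) \sum_{B \subseteq D:\ |B| \le d} (-1)^b (d-b)!
\left( \prod_{i=0}^{b-1} (d+1-i-r_{b-i}) \right)
\left( \prod_{i=0}^{d} (d+1-i-b_{i+1})^{n_i} \right) \\
&= 2^{n-2d-1} \sum_{B \subseteq D:\ |B| \le d} (-1)^b (d-b)!
\left( \prod_{i=0}^{b-1} (d+1-i-r_{b-i}) \right)
\left( \prod_{i=0}^{d} (d+1-i-b_{i+1})^{n_i} \right)
\end{align*}
where the second equality uses 𝑛0 + 𝑛1 + · · · + 𝑛𝑑 = 𝑛 − 𝑑, the number of non-pinnacles in [𝑛 + 1].
This is the formula we set out to prove.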
In [DNKPT18], explicit formulas were given for 𝑝 𝑆 (𝑛) when |𝑆| ≤ 3. These expressions follow
easily from the previous result.
Corollary 4.3.7. We have the following values for 𝑝 𝑆 (𝑛).
(1) If 𝑆 = ∅ then
\[ p_S(n) = 2^{n-1}. \]
(2) If 𝑆 = {𝑙} where 3 ≤ 𝑙 ≤ 𝑛 then
\[ p_S(n) = 2^{n-2} (2^{l-2} - 1). \]
(3) If 𝑆 = {𝑙, 𝑚} where 𝑙 ≥ 3, 𝑚 ≥ 5, and 𝑙 < 𝑚 ≤ 𝑛, then
\[ p_S(n) = 2^{n+m-l-5} (3^{l-1} - 2^{l} + 1) - 2^{n-3} (2^{l-2} - 1). \]
Proof. In each of the results we apply Theorem 4.3.6.
(1) When 𝑑 = 0, the first product in Theorem 4.3.6 is always empty and the second always
equals one. Therefore, everything reduces immediately to 𝑝 𝑆 (𝑛) = 2^{𝑛−1}, as desired.
(2) When 𝑑 = 1 we have 𝑛0 = 𝑙 − 1, 𝑛1 = 𝑛 − 𝑙. Therefore, we have the following possibilities
for 𝐵, and the corresponding terms in the summation:
• 𝐵 = ∅: 2^{𝑙−1}
• 𝐵 = {1𝑙 } or {1𝑟 }: −1
which when substituted into the formula gives
\[ p_S(n) = 2^{n-3} (2^{l-1} - 2) = 2^{n-2} (2^{l-2} - 1). \]
(3) When 𝑑 = 2 we have 𝑛0 = 𝑙 − 1, 𝑛1 = 𝑚 − 𝑙 − 1, and 𝑛2 = 𝑛 − 𝑚. Additionally, the first inner
product will always zero out if 2𝑟 , 2𝑙 are both in 𝐵. Therefore, we have the following possibilities
for 𝐵, and the corresponding terms in the summation:
• 𝐵 = ∅: (2)3^{𝑙−1} 2^{𝑚−𝑙−1}
• 𝐵 = {1𝑙 } or {1𝑟 }: (−2)2^{𝑙−1} 2^{𝑚−𝑙−1}
• 𝐵 = {2𝑙 } or {2𝑟 }: (−1)2^{𝑙−1}
• 𝐵 = {1𝑙 , 1𝑟 }: (2)2^{𝑚−𝑙−1}
• 𝐵 = {1𝑙 , 2𝑟 } or {1𝑟 , 2𝑟 } or {1𝑙 , 2𝑙 } or {1𝑟 , 2𝑙 }: 1.
When we substitute all these into the formula, we get
\begin{align*}
p_S(n) &= 2^{n-5} \left[ (2) 3^{l-1} 2^{m-l-1} - (4) 2^{l-1} 2^{m-l-1} - (2) 2^{l-1} + (2) 2^{m-l-1} + 4 \right] \\
&= 2^{n-5} \left[ (2) 3^{l-1} 2^{m-l-1} - (4) 2^{l-1} 2^{m-l-1} + (2) 2^{m-l-1} \right] - 2^{n-5} \left[ (2) 2^{l-1} - 4 \right] \\
&= 2^{n+m-l-5} (3^{l-1} - 2^{l} + 1) - 2^{n-3} (2^{l-2} - 1)
\end{align*}
as desired.
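The closed forms of Corollary 4.3.7 are easy to compare with direct enumeration for small 𝑛. The following sketch is an illustration only; the function names are hypothetical, and the brute-force count is feasible only for small 𝑛.

```python
from itertools import permutations


def pinnacle_set(perm):
    """Pinnacle set of a linear permutation given as a tuple."""
    return {perm[i] for i in range(1, len(perm) - 1)
            if perm[i - 1] < perm[i] > perm[i + 1]}


def brute_force_count(S, n):
    """Count permutations of [n] with pinnacle set S by direct enumeration."""
    target = set(S)
    return sum(1 for p in permutations(range(1, n + 1)) if pinnacle_set(p) == target)


def closed_form(S, n):
    """The values of p_S(n) from Corollary 4.3.7 for |S| <= 2."""
    S = sorted(S)
    if len(S) == 0:
        return 2 ** (n - 1)
    if len(S) == 1:
        (l,) = S                       # requires 3 <= l <= n
        return 2 ** (n - 2) * (2 ** (l - 2) - 1)
    if len(S) == 2:
        l, m = S                       # requires l >= 3, m >= 5, l < m <= n
        return (2 ** (n + m - l - 5) * (3 ** (l - 1) - 2 ** l + 1)
                - 2 ** (n - 3) * (2 ** (l - 2) - 1))
    raise ValueError("Corollary 4.3.7 only covers |S| <= 2")
```

For example, `closed_form([4, 5], 5)` can be checked against `brute_force_count([4, 5], 5)`.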
We can make Theorem 4.3.6 more efficient by summing over certain weak compositions rather
than subsets. A weak composition of 𝑛 ∈ N is a sequence 𝛼 = [𝛼1 , 𝛼2 , . . . , 𝛼 𝑘 ] of nonnegative
integers called parts such that Σ𝑖 𝛼𝑖 = 𝑛. In this case we write 𝛼 |= 𝑛 or |𝛼| = 𝑛 where |𝛼| = Σ𝑖 𝛼𝑖 .
To 𝐵 ⊆ 𝐷 we associate the composition 𝛼 = [𝛼1 , 𝛼2 , . . . , 𝛼𝑑 ] where 𝛼𝑖 is the number of dales
in 𝐵 of rank 𝑖. To illustrate, for the example in Figure 4.2 the corresponding composition is
𝛼 = [2, 1, 1, 2, 0, 1]. Note that all the necessary parameters for 𝐵 can be read off of 𝛼. In particular
𝑟 𝑗 = min{𝑖 | 𝛼1 + 𝛼2 + · · · + 𝛼𝑖 ≥ 𝑗 },
and
𝑏𝑖 = 𝛼𝑖 + 𝛼𝑖+1 + · · · + 𝛼𝑑 .
Note that
𝑏 = 𝑏 1 = |𝛼|.
Thus we will be able to sum over the following set
𝐶 (𝑑) = {𝛼 = [𝛼1 , 𝛼2 , . . . , 𝛼𝑑 ] | 𝛼𝑖 ∈ [0, 2] for all 𝑖 and |𝛼| ≤ 𝑑}.
We must find how many 𝐵 correspond to a given 𝛼. If 𝛼𝑖 = 0 then 𝐵 contains no dales of rank
𝑖. If 𝛼𝑖 = 2 then 𝐵 contains both dales of rank 𝑖. So the only choice comes if 𝛼𝑖 = 1 in which case
𝐵 could contain either 𝑖 𝑙 or 𝑖𝑟 . Letting
𝑜 = the number of 𝛼𝑖 = 1
we see that the number of 𝐵 represented by 𝛼 is 2^{𝑜}. Thus we have proved the following result.
Corollary 4.3.8. Given 𝑛 ∈ P and admissible 𝑆 = {𝑠1 < 𝑠2 < . . . < 𝑠 𝑑 } we have
\[
p_S(n) = 2^{n-2d-1} \sum_{\alpha \in C(d)} (-1)^b\, 2^{o}\, (d-b)!
\left( \prod_{i=0}^{b-1} (d+1-i-r_{b-i}) \right)
\left( \prod_{i=0}^{d} (d+1-i-b_{i+1})^{n_i} \right).
\]
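The sum over compositions in Corollary 4.3.8 can be sketched as follows. This is an illustrative evaluation, not the code behind the timings reported below; the function names are hypothetical, and a composition 𝛼 is stored as a 0-indexed tuple so that `alpha[k]` holds 𝛼𝑘+1.

```python
from itertools import permutations, product
from math import factorial


def pinnacle_count_compositions(S, n):
    """Evaluate the sum of Corollary 4.3.8 for an admissible pinnacle set S."""
    s = [0] + sorted(S) + [n + 1]
    d = len(S)
    n_gaps = [s[i + 1] - s[i] - 1 for i in range(d + 1)]   # n_0, ..., n_d
    total = 0
    for alpha in product(range(3), repeat=d):              # every alpha_i in {0, 1, 2}
        b = sum(alpha)
        if b > d:                                          # C(d) requires |alpha| <= d
            continue
        o = alpha.count(1)                                 # number of parts equal to 1
        # Ranks r_1 <= ... <= r_b, each rank repeated alpha_i times.
        r = [rank for rank, mult in enumerate(alpha, start=1) for _ in range(mult)]
        first = 1
        for i in range(b):
            first *= d + 1 - i - r[b - i - 1]              # factor with r_{b-i}
        second = 1
        for i in range(d + 1):
            b_next = sum(alpha[i:])                        # b_{i+1} = alpha_{i+1} + ... + alpha_d
            second *= (d + 1 - i - b_next) ** n_gaps[i]
        total += (-1) ** b * 2 ** o * factorial(d - b) * first * second
    return 2 ** (n - 2 * d - 1) * total


def brute_force_count(S, n):
    """Count permutations of [n] with pinnacle set S by direct enumeration."""
    target = set(S)
    return sum(
        1 for p in permutations(range(1, n + 1))
        if {p[i] for i in range(1, n - 1) if p[i - 1] < p[i] > p[i + 1]} == target
    )
```

The loop visits at most 3^𝑑 tuples, and 𝑛 enters only through the exponents 𝑛𝑖 , which is why the cost of this approach is governed by 𝑑 rather than by 𝑛.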
In order to compare this formula to the one in [DLHH+ 21], we need to introduce some notation.
The vale set of a permutation 𝜋 is
Val 𝜋 = {𝜋𝑖 | 𝜋𝑖−1 > 𝜋𝑖 < 𝜋𝑖+1 }.
Call a pair (𝑆, 𝑇) 𝑛-admissible if there is a permutation 𝜋 ∈ 𝔖𝑛 with Pin 𝜋 = 𝑆 and Val 𝜋 = 𝑇.
Define
V𝑛 (𝑆) = {𝑇 | (𝑆, 𝑇) is 𝑛-admissible}.
Theorem 4.3.9 ([DLHH+ 21]). Given 𝑛 ∈ P and admissible 𝑆 with #𝑆 = 𝑑 we have
\[
p_S(n) = 2^{n-2d-1} \sum_{T \in \mathcal{V}_n(S)} \prod_{s \in S} \binom{N_{ST}(s)}{2}
\prod_{t \in [n] - (S \uplus T)} N_{ST}(t)
\]
where 𝑆𝑖 = {𝑠 ∈ 𝑆 | 𝑠 < 𝑖}, 𝑇𝑖 = {𝑡 ∈ 𝑇 | 𝑡 < 𝑖}, and 𝑁 𝑆𝑇 (𝑖) = #𝑇𝑖 − #𝑆𝑖 .
In order to estimate the number of terms in this sum, we need a formula for #V𝑛 (𝑆). Let
𝐾 (𝑑) = {𝛼 = [𝛼1 , 𝛼2 , . . . , 𝛼𝑑 ] |= 𝑑 | 𝛼1 + 𝛼2 + · · · + 𝛼 𝑘 ≥ 𝑘 for all 𝑘 ∈ [𝑑]}.
Theorem 4.3.10 ([DLHH+ 21]). Given 𝑛 ∈ P and admissible 𝑆 = {𝑠1 < 𝑠2 < . . . < 𝑠 𝑑 } we have
\[
\#\mathcal{V}_n(S) = \sum_{\alpha \in K(d)} \binom{n_0 - 1}{\alpha_1} \prod_{i=2}^{d} \binom{n_{i-1}}{\alpha_i}.
\]
We can now compare the number of terms in the sums of Corollary 4.3.8 and Theorem 4.3.9.
In the former we have 𝑐(𝑑) := #𝐶 (𝑑) ≤ 3^{𝑑} terms, where the inequality comes from the fact that
every 𝛼𝑖 ∈ {0, 1, 2}. In the latter, we have 𝑣 𝑛 (𝑆) := #V𝑛 (𝑆) terms, a number which depends on 𝑛
and 𝑆, and not just 𝑑, as seen in Theorem 4.3.10. If 𝑛0 ≤ 4 and 𝑛𝑖 ≤ 3 for 𝑖 ≥ 1 then each of the
binomial coefficients in the sum is at most 3 and so 𝑣 𝑛 (𝑆) could be significantly smaller than 𝑐(𝑑).
But if even one of the 𝑛𝑖 is large, then the inequality will be reversed. For example, suppose
𝑛0 ≥ 2𝑑 + 1 and take 𝛼 = [𝑑, 0, 0, . . . , 0] ∈ 𝐾 (𝑑). Then, by Stirling’s approximation,
\[
v_n(S) \ge \binom{2d}{d} \sim \frac{4^d}{\sqrt{\pi d}}
\]
which will eventually be greater than 3^{𝑑}. So, for fixed 𝑑, there are only finitely many 𝑛 such
that 𝑣 𝑛 (𝑆) ≤ 𝑐(𝑑). Thus, in most cases, Corollary 4.3.8 will be more efficient. We should
mention that Diaz-Lopez, Insko, and Nilsen [DLIN21] have come up with a refinement of the ideas
in [DLHH+ 21] which permits the product of binomial coefficients in Theorem 4.3.10 to be replaced
by 2^{𝑑}.

𝑆                                        DLHHIN           DLMSSS
{3, 5, 7, 9, 11, 13, 15, 17, 19, 21}     9.2 × 10^{−5}    0.72
{3, 6, 9, 12, 15, 18, 21, 24, 27, 30}    0.11             0.73
{3, 7, 11, 15, 19, 23, 27, 31, 35, 39}   9.5              0.73
{3, 8, 13, 18, 23, 28, 33, 38, 43, 48}   210              0.78
Table 4.1: Run times in seconds compared when most 𝑛𝑖 are equal
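The term counts compared above can be tabulated for small 𝑑. The sketch below is purely illustrative: it counts 𝐶 (𝑑) by enumeration and prints it next to the bound 3^𝑑 and the central binomial coefficient used in the Stirling estimate. The function name `c` mirrors the notation 𝑐(𝑑) and is otherwise an assumption of this example.

```python
from itertools import product
from math import comb


def c(d):
    """Number of sequences alpha in {0, 1, 2}^d with |alpha| <= d, i.e. #C(d)."""
    return sum(1 for alpha in product(range(3), repeat=d) if sum(alpha) <= d)


if __name__ == "__main__":
    print(" d    c(d)     3^d   binom(2d, d)")
    for d in range(1, 9):
        print(f"{d:2d} {c(d):7d} {3 ** d:7d} {comb(2 * d, d):14d}")
```

Running the script shows at which 𝑑 the central binomial coefficient first exceeds 3^𝑑, which is the sense of "eventually" in the estimate above.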
The observations of the previous paragraph are borne out by actual computer computations. In
Tables 4.1 and 4.2 we show the results of computing 𝑝 𝑆 (1000) for various sets 𝑆 (first column) with
constant 𝑑 by the algorithm in [DLHH+ 21] (second column) and our algorithm (third column).
The run times are in seconds and are the average over 10 trials for each set using a 15-inch 2017
MacBook Pro with a 3.1 GHz Quad-Core Intel Core i7 processor. In Table 4.1 the 𝑛𝑖 for 0 < 𝑖 < 𝑑
are constant in each set, but allowed to increase as one goes down the table. As expected, the
algorithm using vales starts out orders of magnitude faster than the one using dales but quickly
becomes orders of magnitude slower, with the latter’s times being virtually constant. Similar
behaviour is shown in the two parts of Table 4.2 which keep all of the 𝑛𝑖 for 0 ≤ 𝑖 < 𝑑 constant
except for one which is allowed to grow. Note the difference in growth rate of the vale algorithm
between increasing 𝑛4 (upper chart) and 𝑛0 (lower chart).
Increase 𝑛4 with other 𝑛𝑖 constant
𝑆                       DLHHIN           DLMSSS
{3, 5, 7, 9, 11}        2.9 × 10^{−5}    0.0014
{3, 5, 7, 9, 21}        7.1 × 10^{−5}    0.0014
{3, 5, 7, 9, 31}        0.00012          0.0015
{3, 5, 7, 9, 41}        0.00017          0.0015

Increase 𝑛0 with other 𝑛𝑖 constant
𝑆                       DLHHIN           DLMSSS
{3, 5, 7, 9, 11}        2.9 × 10^{−5}    0.0014
{13, 15, 17, 19, 21}    0.012            0.0015
{23, 25, 27, 29, 31}    0.26             0.0015
{33, 35, 37, 39, 41}    1.8              0.0015
Table 4.2: Run times in seconds compared when most 𝑛𝑖 are constant
Another advantage to this approach is that it can be modified to count #O (𝑆), the number of
admissible orderings of an admissible pinnacle set 𝑆. First, if we fix 𝑛 > 0 we have that Lemma 4.3.2
will again allow us to reduce to the case of cyclic orderings of the pinnacle set 𝑆′ for permutations
in 𝔖𝑛+1 . We now prove the following intermediate result.
Lemma 4.3.11. Consider a cyclic ordering [𝜏] with dale set 𝐷 [𝜏] and corresponding 𝑟 𝑗 . The
ordering is admissible if and only if
𝑗 ≤ 𝑛0 + 𝑛1 + · · · + 𝑛𝑟 𝑗 −1
for all 𝑗 ∈ [𝑑 + 1].
Proof. Note that, by definition of the 𝑛𝑖 and 𝑟𝑖 , the right hand side of the inequality is simply the
number of non-pinnacles smaller than the pinnacle of index 𝑟 𝑗 , and these are the only non-pinnacles
that can be placed in a dale of rank at most 𝑟 𝑗 . So if for any 𝑗 we have
𝑗 > 𝑛0 + 𝑛1 + · · · + 𝑛𝑟 𝑗 −1 , then there are at least 𝑗 dales having rank at most 𝑟 𝑗 but fewer than 𝑗
non-pinnacles that can be placed in them. This means there would not be enough small
non-pinnacle elements to fill them all. Therefore,
any such ordering is not admissible. On the other hand, if we have that 𝑗 ≤ 𝑛0 + 𝑛1 + · · · + 𝑛𝑟 𝑗 −1
for all 𝑗, then we may always fill all the dales by placing the smallest non-pinnacle in the lowest
ranked dale, and proceeding upwards. The inequalities guarantee that we will always have enough
non-pinnacles to do this at every step, and so we are done.
Since the problem is trivial if 𝑑 = 0, we may assume that 𝑑 > 0. We also define for the
master dale rank set 𝐷
𝐷′ = 𝐷 − {1𝑙 , 1𝑟 }
and for any subset 𝐵
\[
\delta_B =
\begin{cases}
1 & \text{if } j \le n_0 + n_1 + \cdots + n_{r_j - 1} \text{ for all } j \in [b], \\
0 & \text{otherwise.}
\end{cases}
\]
With this notation, we can count admissible orderings.
Theorem 4.3.12. If 𝑑 ∈ P and 𝑆 is admissible then
\[
\#\mathcal{O}(S) = \sum_{B \subseteq D':\ |B| = d-1} \delta_{B \cup \{1_l, 1_r\}} \prod_{i=0}^{d-2} (d+1-i-r_{d-1-i}).
\]
Proof. We first wish to sum over all possible orderings, partitioned by their dales. Since every dale
set for 𝑑 > 0 is guaranteed to have 𝑑 + 1 elements and contain {1𝑙 , 1𝑟 }, we may index the dales by
taking 𝐵 ⊆ 𝐷′ where |𝐵| = 𝑑 − 1. We then consider the following summation
\[
\sum_{B \subseteq D':\ |B| = d-1} \prod_{i=0}^{d-2} (d+1-i-r_{d-1-i}).
\]
Clearly this sums over every possible dale set once, and the expression inside comes from Lemma
4.3.5, which counts the number of cyclic orderings of 𝑆′ which have dales containing those in 𝐵.
However, due to the restrictions placed on the size of 𝐵 and the comments above, this expression
will count those cyclic orderings of 𝑆′ which have dales equal to 𝐵 ∪ {1𝑙 , 1𝑟 } instead of just a
subset. Therefore, no ordering can be counted twice by two different 𝐵’s and so every ordering is
accounted for exactly once in the above summation, making the total 𝑑!.
Finally, using Lemma 4.3.11, we may exclude from this sum precisely those orderings which
are not admissible by writing it as
\[
\sum_{B \subseteq D':\ |B| = d-1} \delta_{B \cup \{1_l, 1_r\}} \prod_{i=0}^{d-2} (d+1-i-r_{d-1-i}).
\]
This completes the proof.
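Theorem 4.3.12 can also be evaluated and checked against enumeration for small cases. In the sketch below, which is illustrative only, the function names are hypothetical, the ranks 𝑟 𝑗 in the product are read from 𝐵, and the ranks in 𝛿 are read from 𝐵 ∪ {1𝑙 , 1𝑟 }. The brute-force routine records the left-to-right order in which the pinnacles appear in each permutation of [𝑛] with pinnacle set 𝑆.

```python
from itertools import combinations, permutations


def count_admissible_orderings(S):
    """Evaluate the sum of Theorem 4.3.12 for an admissible S with d = |S| >= 1."""
    s = [0] + sorted(S)
    d = len(S)
    n_gaps = [s[i + 1] - s[i] - 1 for i in range(d)]        # n_0, ..., n_{d-1}
    D_prime = [(rank, side) for rank in range(2, d + 1) for side in "lr"]
    total = 0
    for B in combinations(D_prime, d - 1):
        r = [rank for rank, _ in B]                          # ranks of B, increasing
        full = [1, 1] + r                                    # ranks of B together with 1l, 1r
        # delta: the j-th smallest rank needs at least j sufficiently small non-pinnacles.
        if not all(j <= sum(n_gaps[:full[j - 1]]) for j in range(1, d + 2)):
            continue
        term = 1
        for i in range(d - 1):
            term *= d + 1 - i - r[d - 2 - i]                 # factor with r_{d-1-i}
        total += term
    return total


def brute_force_orderings(S, n):
    """Number of distinct orders in which the pinnacles of S appear in permutations of [n]."""
    target = set(S)
    seen = set()
    for p in permutations(range(1, n + 1)):
        pins = tuple(p[i] for i in range(1, n - 1) if p[i - 1] < p[i] > p[i + 1])
        if set(pins) == target:
            seen.add(pins)
    return len(seen)
```

For 𝑆 = {3, 5, 7} only four of the 3! = 6 orderings survive, since 3 must sit between 1 and 2, and 6 can only be adjacent to 7.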
We may also rewrite our result in terms of compositions for a faster summation. Lemma 4.3.11
still holds as the 𝑟 𝑗 are the same whether or not the dale sets are represented as compositions, but
now we will need to make some new definitions. Let
𝐶′(𝑑) = {𝛼 = [𝛼1 , 𝛼2 , . . . , 𝛼𝑑−1 ] |= 𝑑 − 1 | 𝛼𝑖 ∈ [0, 2] for all 𝑖},
and if 𝛼 |= 𝑏
\[
\delta_\alpha =
\begin{cases}
1 & \text{if } j \le n_0 + n_1 + \cdots + n_{r_j - 1} \text{ for all } j \in [b], \\
0 & \text{otherwise.}
\end{cases}
\]
Also, if 𝛼 = [𝛼1 , 𝛼2 , . . . , 𝛼𝑑−1 ] then define
2 ⊕ 𝛼 = [2, 𝛼1 , 𝛼2 , . . . , 𝛼𝑑−1 ].
The following result follows from Theorem 4.3.12 in much the same way that Corollary 4.3.8
followed from Theorem 4.3.6, so the proof is omitted.
Corollary 4.3.13. If 𝑑 ∈ P and 𝑆 is admissible then
\[
\#\mathcal{O}(S) = \sum_{\alpha \in C'(d)} \delta_{2 \oplus \alpha}\, 2^{o} \prod_{i=0}^{d-2} (d+1-i-r_{d-1-i}).
\]
4.4 Open problems and concluding remarks
Others have also been working on finding a fast formula for computing the number of permu-
tations with a given pinnacle set. Recently, Falque, Novelli, and Thibon [FNT21] have constructed
an efficient recursion to compute 𝑝 𝑆 (𝑛). This formula is a low degree polynomial in both 𝑚, the
maximum of the pinnacle set 𝑆, and 𝑑, the cardinality of the set, and has complexity O(𝑚𝑑^2). While
the result was originally stated in terms of 𝑛 instead of 𝑚, we can simplify in the following way. A
permutation in 𝔖𝑚 with pinnacle set 𝑆 can be extended to a permutation in 𝔖𝑚+1 by placing 𝑚 + 1
at either the far left or the far right of the permutation. Any other way of inserting 𝑚 + 1 into the
permutation would make 𝑚 + 1 a pinnacle. Recursively applying this procedure to 𝜋 ∈ 𝔖𝑚 with
the elements {𝑚 + 1, 𝑚 + 2, . . . , 𝑛} will extend it to a permutation 𝜋′ ∈ 𝔖𝑛 . Since there were two
possible positions to place each of the elements {𝑚 + 1, 𝑚 + 2, . . . , 𝑛}, we have
𝑝 𝑆 (𝑛) = 2^{𝑛−𝑚} 𝑝 𝑆 (𝑚)
and thus the result can be stated in terms of 𝑚. In addition, they provide a conjectured formula for
the weighted sum introduced in [DNKPT18]:
\[
q_S(n) := \sum_{I \subset S} 2^{|I|}\, p_I(n). \tag{4.3}
\]
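The reduction from 𝑛 to 𝑚 described above, in which each of the values 𝑚 + 1, . . . , 𝑛 is placed at the far left or far right, can be made concrete. The following sketch is illustrative only, with hypothetical function names; it generates all such extensions of a permutation in 𝔖𝑚 .

```python
from itertools import permutations


def pinnacle_set(perm):
    """Pinnacle set of a linear permutation given as a tuple."""
    return {perm[i] for i in range(1, len(perm) - 1)
            if perm[i - 1] < perm[i] > perm[i + 1]}


def extensions(perm, n):
    """All permutations of [n] obtained from perm in S_m by repeatedly placing
    the next larger value at the far left or the far right."""
    results = [tuple(perm)]
    for value in range(len(perm) + 1, n + 1):
        results = [ext for p in results for ext in ((value,) + p, p + (value,))]
    return results
```

Each pass through the loop doubles the number of permutations and leaves the pinnacle set unchanged, which is the source of the factor of two per added element.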
Following this work, Fang [Fan21] provided another recurrence to compute 𝑝 𝑆 (𝑛) with
complexity O(𝑑^4 + 𝑑 log 𝑛). He also proved an expression for eq. (4.3) which is simpler than the earlier
conjecture and which is very combinatorial in nature. Quinn Minnich has recently found a simpler
proof of this result.
CHAPTER 5
BACKGROUND ON BACKBONE EXTRACTION
Bipartite or two-mode networks are composed of two types of nodes, which we call agents and
artifacts, and edges between nodes of one type and nodes of the other type. They can be used to
represent a wide range of phenomena and therefore are studied in a diverse range of disciplines.
For example, natural selection unfolds as species (the agents) compete over sites (the artifacts),
commerce is possible as traders exchange resources, scientific advances are reported as scholars
write papers, and laws are adopted as legislators sponsor bills. Although bipartite networks
are useful in their own right, they can also be useful for inferring unipartite (i.e., one-mode)
networks that would otherwise be difficult or impossible to measure directly. A bipartite projection
transforms a bipartite network into a unipartite co-occurrence network in which agents are connected
to the extent that they share artifacts. For example, competitive interaction networks can be
inferred from species’ co-occurrence in sites [Dia75], trade networks can be inferred from firm
co-location [TCW02] or product co-exchange [SDCGS15], scholarly collaboration networks can
be inferred from paper co-authorship [New01], and political alliance networks can be inferred from
bill co-sponsorship [Nea20]. Throughout this thesis, we use these applications to offer concrete
examples; however, the models we discuss are perfectly general and can be applied to derive
unipartite backbones in a range of contexts [AABB11, Tol21, ZH05]. Indeed, in principle any
unipartite network can be represented as the projection of some bipartite network [VFO20, GL04,
NP03].
Despite their promise, bipartite projections (i.e., co-occurrence networks) are challenging to
analyse because they are typically dense and weighted, and because the edge weights do not
necessarily capture the strength of the relationship between nodes [Nea14]. In particular, when
transforming a bipartite graph into a unipartite graph via projection, information about the artifacts
responsible for edges between vertices is lost [LMDV08]. Specifically, one no longer knows which
artifact(s) gave rise to a given edge and therefore no longer knows whether the artifact(s) are large
or small (i.e. the column sums of the bipartite matrix). This is important because co-participation
in small artifacts provides more information about the relationship between two vertices than co-
participation in large artifacts [Nea14]. For example, observing two people attending the same small
party provides more information about a potential social relationship between them than observing
these individuals attending the same large gathering. Similarly, observing two legislators co-
sponsoring the same unpopular bill (i.e. one that is co-sponsored by no one else) provides more
information about a potential political relationship between them than observing these legislators
co-sponsoring the same popular bill (i.e. one that is co-sponsored by many others also).
Bipartite projection also involves the loss of information about the individual vertices: one no
longer knows how many artifacts a given vertex participated in (i.e. the row sums of the bipartite
matrix). This information is important to consider because the scale of each edge weight in
a bipartite projection is driven by the number of artifacts participated in by the two vertices it
connects [Nea14]. For example, on average the number of events co-attended by two people who
each attend many events will be larger than the number of events co-attended by two
people who each attend few events. Similarly, on average the number of bills co-sponsored by
two legislators who each sponsor many bills will be larger than the number of bills
co-sponsored by two legislators who each sponsor few bills. Therefore, what counts as a ‘large’
or ‘small’ number of co-attendances or co-sponsorships depends in part on the total number of
attendances or sponsorships of both members of a dyad. As we will see, the backbone extraction
methods we consider cope with these challenges by controlling for the row and column sums of the
bipartite matrix associated with the bipartite graph in question.
As a result of these challenges, it is often useful to analyze the backbone of a bipartite projection,
which is an unweighted and typically sparser network that retains only the most ‘important’ edges.
Although well-known methods exist for extracting the backbone of weighted networks that are not
bipartite projections [SBV09, Dia16], methods designed specifically for bipartite projections have
recently been developed [Nea14, ZK11, SSDC+ 17, TML+ 11].
To begin, we’ll define notation and language for discussing bipartite projections and backbones.
Throughout this chapter, we use the ecological case of Darwin’s Finches to provide a concrete
example [San00, Got00]. On his voyage to the Galapagos Islands on the H.M.S. Beagle, Darwin
observed that only some species of finches lived on each island. These patterns can be represented
as a bipartite network in which finch species (the agent nodes) are connected to the islands (the
artifact nodes) where they are found [NN20]. A bipartite network can be represented as a binary
matrix in which the agents are arrayed as rows, and the artifacts are arrayed as columns. We use
B to denote a bipartite network’s representation as a matrix, where 𝐵𝑖𝑘 = 1 if agent 𝑖 is connected
to artifact 𝑘, and otherwise is 0. The sequence of row sums and the sequence of column sums of
B are called the agent and artifact degree sequences, respectively. These sequences are among
the bipartite network’s most significant features and are known to have implications for bipartite
projections and backbones [VFO20, DNS21, NDS21a]. In the ecological case, the agent degree
sequence captures the number of islands where each species is found, while the artifact degree
sequence captures the number of species found on each island.
The projection of a bipartite network is a weighted unipartite co-occurrence network in which
a pair of agents is connected by an edge with a weight equal to their number of shared artifacts. For
example, the bipartite projection of Darwin’s species location network is a species co-occurrence
network in which a pair of species is connected by an edge with a weight equal to the number of
islands where they are both found. We use P to denote the matrix representation of a bipartite
projection, which is computed as BB^𝑇 , where B^𝑇 indicates the transpose of B. In a projection P,
𝑃𝑖 𝑗 indicates the number of artifacts 𝑘 in B to which both 𝑖 and 𝑗 are connected. The
diagonal entries of P, 𝑃𝑖𝑖 , are equal to the agent degrees. Typically the backbone of P will discard
these diagonal entries, though their values are used in deciding which other edges are deemed
important.
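As a minimal sketch of these definitions, the following R code builds a small bipartite matrix, its
two degree sequences, and its projection. The species names, island names, and 0/1 entries are
invented for illustration and are not Darwin's actual data.

```r
# Toy agent-by-artifact matrix (hypothetical values, not the real finch data).
B <- matrix(c(1, 1, 0, 1,
              1, 0, 0, 1,
              0, 1, 1, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("finchA", "finchB", "finchC"),
                            c("isl1", "isl2", "isl3", "isl4")))

agent_degrees    <- rowSums(B)   # number of islands where each species is found
artifact_degrees <- colSums(B)   # number of species found on each island

# Projection P = B B^T: entry (i, j) counts the islands shared by species i and j.
P <- B %*% t(B)

P["finchA", "finchB"]   # islands shared by finchA and finchB
diag(P)                 # diagonal entries reproduce agent_degrees
```

The diagonal of P is computed along with the off-diagonal weights, which is why a backbone method
can use the agent degrees even though the diagonal itself is discarded afterwards.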
As the reader may have inferred, bipartite networks and their weighted projections are equivalent
to bipartite and weighted graphs. This equivalence supports the visualization and analysis techniques
used in the network sciences. A graph 𝐺 is a set of objects called vertices, together with a set of
2-element subsets of the vertices which are called edges. An edge between vertices 𝑖 and 𝑗 can be
denoted as 𝑒 = 𝑖𝑗. If there exists an edge 𝑒 = 𝑖𝑗 between vertices 𝑖 and 𝑗, we say that 𝑖 and 𝑗 are
adjacent. We call a graph weighted if each edge has an associated numeric value, and unweighted
otherwise. The weight of edge 𝑒 = 𝑖𝑗 is denoted 𝑤(𝑖𝑗); in unweighted graphs, we set 𝑤(𝑖𝑗) = 1 for
all present edges. The degree of vertex 𝑖 is the number of edges of the form 𝑖𝑗 for some 𝑗. Graphs
are often discussed by viewing their adjacency matrices G, where 𝐺𝑖𝑗 = 𝑤(𝑖𝑗). As mentioned
above, the matrix representation of a bipartite network B is the graph’s bipartite adjacency matrix,
while the matrix P is the adjacency matrix of the weighted graph. See fig. 5.1 for an example of
this connection.
Figure 5.1: Bipartite and bipartite projection networks
The backbone of a bipartite projection is a binary representation of P that contains only the most
‘important’ or ‘significant’ edges. For example, the backbone of a species co-occurrence network
connects pairs of species if they are found on a significant number of the same islands, which
might be interpreted as evidence that the two species do not compete for resources and perhaps are
symbiotic. We use $\mathbf{P}'$ to denote the matrix representation of the backbone of P. Because multiple
methods exist for deciding when an edge is significant and thus should occur in the backbone, we
use $\mathbf{P}'_M$ to denote a backbone extracted using method 𝑀.
Backbone extraction methods that were originally developed for non-projection weighted net-
works are often also applied to weighted bipartite projections. One simple method preserves an
edge in the backbone if its weight in the projection exceeds some universal threshold 𝑇. A common
choice is 𝑇 = 0, but because each artifact of degree 𝑑 induces 𝑑(𝑑 − 1)/2 edges in the backbone,
this leads to a very dense backbone with a high clustering coefficient [LMDV08]. Here, density
refers to the number of edges present in the network divided by the maximum possible number of
edges. A network’s clustering coefficient measures how many ‘triangles’ (sets of three pairwise
adjacent vertices) are present in the network compared to all triples. Backbones with high density
and clustering coefficient may not elucidate any interesting information regarding the network.
Using 𝑇 > 0 can yield a sparser and less clustered backbone [DT05, Fon20, BR11], but the choice
of a particular threshold value is arbitrary, and applying the same threshold to all edges yields
backbones that overlook agents with low degree in the projection [SBV09]. More sophisticated
methods, including the disparity filter [SBV09] and likelihood filter [Dia16], aim to overcome these
limitations of the universal threshold method by using a different threshold for each edge based on
a null model. However, all methods that can be applied to non-projection weighted networks have
the same shortcoming when applied to weighted bipartite projections: they ignore information
about the artifacts [Nea14]. In the ecological case, the universal threshold, disparity filter, and
likelihood filter methods all decide whether two species should be connected in the backbone only
by examining how many islands they are both found on, but do not consider the characteristics of
those islands, including how many other species are found there, or even how many islands there are.
Therefore, although these methods are promising for extracting the backbone from non-projection
weighted networks, different methods are required for extracting the backbone from a bipartite
projection.
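The universal threshold method and the density measure described above can be sketched in a few
lines of R. The matrix of weights and the function names `universal_backbone` and `edge_density`
are hypothetical and serve only to illustrate how raising 𝑇 thins out the backbone.

```r
# Toy weighted projection (hypothetical weights); the diagonal holds agent degrees.
P <- matrix(c(3, 2, 0, 1,
              2, 2, 0, 1,
              0, 0, 2, 1,
              1, 1, 1, 3), nrow = 4, byrow = TRUE)

# Keep an edge when its weight exceeds the universal threshold; discard the diagonal.
universal_backbone <- function(P, threshold = 0) {
  backbone <- (P > threshold) * 1
  diag(backbone) <- 0
  backbone
}

# Density: number of edges present divided by the maximum possible number of edges.
edge_density <- function(A) {
  sum(A[upper.tri(A)]) / choose(nrow(A), 2)
}

edge_density(universal_backbone(P, threshold = 0))  # 4 of the 6 possible edges
edge_density(universal_backbone(P, threshold = 1))  # 1 of the 6 possible edges
```

Note that the single threshold treats every pair of agents identically, which is the limitation that
the edge-specific null models discussed above are meant to address.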
CHAPTER 6
BACKBONE MODELS AND THEIR PROBABILITY MASS FUNCTIONS
This chapter contains material from Neal, Domagalski, and Sagan [NDS21b]. All results in this
chapter are from this manuscript unless otherwise noted.
6.1 Bipartite ensemble backbone models
Bipartite ensemble backbone models decide whether an edge’s observed weight $P_{ij}$ is significantly
large, and thus whether a corresponding edge should be included in the backbone, in the
following way. Let $\mathcal{B}$ be the set of all bipartite networks $\mathbf{B}^*$ having the same number of agents
and artifacts as B. In the ecological case, $\mathbf{B}^*$ might be viewed as representing a possible world
containing the same species and islands, but in which the locations of species on islands are different,
and likewise $\mathcal{B}$ is the set of all such possible worlds. We will create our ensembles by taking a
subset $\mathcal{B}_M$ of $\mathcal{B}$ subject to certain constraints 𝑀 and imposing a probability distribution on it. In all
our models except the SDSM, we impose the uniform probability distribution on $\mathcal{B}_M$, that is, each
element of the ensemble is equally likely. We will then extract the backbone from the projection of
B by using the distribution of edge weights arising from projections of members of the ensemble
under consideration.
We use $P^*_{ij}$ to denote a random variable equal to $(\mathbf{B}^* \mathbf{B}^{*T})_{ij}$ for $\mathbf{B}^* \in \mathcal{B}_M$. That is, $P^*_{ij}$ is the
number of artifacts shared by 𝑖 and 𝑗 in a bipartite network randomly drawn from $\mathcal{B}_M$. In the
ecological case, $P^*_{ij}$ represents the number of islands that are home to both species 𝑖 and 𝑗 in a
possible world, while the distribution of $P^*_{ij}$ is the distribution of the number of islands shared by
species 𝑖 and 𝑗 in all possible worlds.
Decisions about which edges should appear in a backbone extracted at the two-tailed statistical
significance level $\alpha$ are made by comparing $P_{ij}$ to $P^*_{ij}$:
\[
P'_{ij} =
\begin{cases}
1 & \text{if } \Pr(P^*_{ij} \geq P_{ij}) < \frac{\alpha}{2}, \\
0 & \text{otherwise.}
\end{cases}
\]
This test preserves an edge in the backbone if its weight in the observed projection is uncommonly
large compared to its weight in projections of members of the ensemble. A two-tailed significance
test is used because, in principle, an edge’s weight in the observed projection could be uncommonly
larger or uncommonly smaller than its weight in projections of members of the ensemble. One can
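use the same principles to obtain a signed backbone by comparing $P_{ij}$ to $P^*_{ij}$ with
\[
P'_{ij} =
\begin{cases}
1 & \text{if } \Pr(P^*_{ij} \geq P_{ij}) < \frac{\alpha}{2}, \\
-1 & \text{if } \Pr(P^*_{ij} \leq P_{ij}) < \frac{\alpha}{2}, \\
0 & \text{otherwise.}
\end{cases}
\]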
In the ecological case, two species are connected in the backbone if their number of shared
islands in the observed world is uncommonly large compared to their number of shared islands in
all possible worlds.
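The decision rule can be sketched in R as follows, assuming the two tail probabilities are
estimated from a vector of edge weights drawn from projections of ensemble members. The function
name `classify_edge` and the binomial null distribution are hypothetical stand-ins chosen only so
that the example runs on its own.

```r
# Classify one edge from its observed weight and a vector of weights drawn from
# projections of ensemble members.
classify_edge <- function(observed, null_weights, alpha = 0.05) {
  upper <- mean(null_weights >= observed)  # estimates Pr(P*_ij >= P_ij)
  lower <- mean(null_weights <= observed)  # estimates Pr(P*_ij <= P_ij)
  if (upper < alpha / 2) {
    1L        # uncommonly large weight: positive edge
  } else if (lower < alpha / 2) {
    -1L       # uncommonly small weight: negative edge (signed backbone only)
  } else {
    0L        # no edge
  }
}

set.seed(1)
# Hypothetical null distribution, used here only for illustration.
null_weights <- rbinom(10000, size = 20, prob = 0.3)

classify_edge(15, null_weights)  # far above the typical null weight
classify_edge(6,  null_weights)  # a typical null weight
classify_edge(0,  null_weights)  # far below the typical null weight
```

For a binary backbone, the negative outcome would simply be recorded as 0.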
There are many ways that $\mathcal{B}$ can be constrained [SUG18], with each set of constraints describing
a different ensemble $\mathcal{B}_M$ and a different ensemble backbone model; however, in this work we focus on
five possibilities. We describe each of these models and their meaning in the context of Darwin’s
species and islands, and derive the probability mass functions of the respective edge weight
distributions. These probability mass functions of $P^*_{ij}$ are used by ensemble backbone models to
evaluate the statistical significance of the edge weight $P_{ij}$ in a bipartite projection. We use the
following notation:
• Let B be an $m \times n$ bipartite matrix, with a vector of row sums $R = (r_1, \ldots, r_m)$, a vector of
column sums $C = (c_1, \ldots, c_n)$, and $f$ cells containing a 1. So
\[
f = \sum_{i=1}^{m} r_i = \sum_{j=1}^{n} c_j.
\]
• Let $\mathcal{B}_M$ be the ensemble of all $m \times n$ matrices $\mathbf{B}^* = (B^*_{ij})$ that obey the constraints of the
respective model. In all models, the probability distribution on $\mathcal{B}_M$ is uniform except in the
stochastic case.
• Let $P^*_{ij}$ be a random variable equal to $(\mathbf{B}^* \mathbf{B}^{*T})_{ij}$ for all $\mathbf{B}^* \in \mathcal{B}_M$. Note that we have
\[
P^*_{ij} = B^*_{i1} B^*_{j1} + B^*_{i2} B^*_{j2} + \cdots + B^*_{in} B^*_{jn}. \tag{6.1}
\]
6.2 Fixed degree sequence model (FDSM)
In the fixed degree sequence model (FDSM), $\mathbf{B}^* \in \mathcal{B}_{\mathrm{FDSM}}$ are constrained to have the same agent
and artifact degree sequences as B. Adopting the FDSM implies, for example, that in all possible
worlds a given species is found on exactly the same number of islands, and a given island is home to
exactly the same number of species. The distribution of $P^*_{ij}$ arising from $\mathcal{B}_{\mathrm{FDSM}}$ is unknown, but
can be approximated by uniformly sampling $\mathbf{B}^*$ from $\mathcal{B}_{\mathrm{FDSM}}$, constructing $\mathbf{P}^*$, and saving the values
$P^*_{ij}$. In the studies below, we use 1000 samples of $\mathbf{B}^*$ generated using the ‘curveball’ algorithm,
which is among the fastest methods to sample $\mathcal{B}_{\mathrm{FDSM}}$ uniformly at random [SNB+14, Car15].
The FDSM has been used to extract the backbone of bipartite projections of, for example, movies
co-liked by viewers [ZK11] and conference panel co-participation by scholars [SR12, DL16]. In
this work, we use the FDSM as the reference model to which other ensemble models are compared
because it fully controls for both degree sequences.
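The sampling procedure can be illustrated with a simplified R sketch in the spirit of the curveball
algorithm. This is not the implementation used in the studies: the function names, the toy matrix,
and in particular the number of trades performed between samples are assumptions made for
illustration, with no guarantee that this many trades is enough for uniform sampling.

```r
# One curveball-style trade: two rows exchange some of the columns that only one
# of them occupies, which leaves every row sum and every column sum unchanged.
curveball_trade <- function(B) {
  rows <- sample.int(nrow(B), 2)
  a <- which(B[rows[1], ] == 1)
  b <- which(B[rows[2], ] == 1)
  only_a <- setdiff(a, b)
  only_b <- setdiff(b, a)
  if (length(only_a) == 0 || length(only_b) == 0) return(B)  # nothing to trade
  pool <- c(only_a, only_b)
  # Draw positions within pool rather than calling sample(pool, k) directly.
  new_a <- pool[sample.int(length(pool), length(only_a))]
  new_b <- setdiff(pool, new_a)
  B[rows, pool] <- 0
  B[rows[1], new_a] <- 1
  B[rows[2], new_b] <- 1
  B
}

# Approximate the distribution of P*_ij from n_samples randomized matrices.
# trades_per_sample is an arbitrary illustrative choice.
sample_null_weights <- function(B, i, j, n_samples = 1000, trades_per_sample = 50) {
  weights <- numeric(n_samples)
  Bstar <- B
  for (s in seq_len(n_samples)) {
    for (step in seq_len(trades_per_sample)) Bstar <- curveball_trade(Bstar)
    weights[s] <- sum(Bstar[i, ] * Bstar[j, ])  # (B* B*^T)_ij
  }
  weights
}

set.seed(42)
B <- matrix(rbinom(6 * 10, 1, 0.4), nrow = 6)   # hypothetical 6 x 10 network
null_12 <- sample_null_weights(B, 1, 2, n_samples = 200)
mean(null_12 >= sum(B[1, ] * B[2, ]))           # estimate of Pr(P*_12 >= P_12)
```

Only the two rows of interest are multiplied here; forming the full projection for every sample is
what makes the general procedure expensive.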
The primary limitation of the FDSM is its computational cost. First, constructing each $\mathbf{P}^*$
requires matrix multiplication, which must be performed repeatedly and has complexity $O(n^{2.37})$
for two $n \times n$ matrices using the fast Coppersmith-Winograd algorithm [CW90]. Second, computing
$\Pr(P^*_{ij} \geq P_{ij})$ with sufficient precision to achieve a two-tailed familywise error rate of $\alpha$ requires
at least $\frac{0.5m^2 - 0.5m}{\alpha/2} + 1$ samples, where $m$ is the number of rows (i.e., agents) in B and P. Thus, for
example, extracting the backbone of a bipartite projection with 1000 agents at a familywise error
rate of 0.05 would require performing roughly 20 million matrix multiplications. Therefore, the
tightly-constrained FDSM is frequently impractical for backbone extraction. However, models that
rely on ensembles with more relaxed constraints offer computationally-feasible alternatives.
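The sample-size bound is easy to evaluate directly. The short R sketch below, with the hypothetical
function name `min_samples`, reproduces the figure quoted for 1000 agents.

```r
# Minimum number of ensemble samples for a two-tailed familywise error rate alpha,
# when one test is performed for each pair of the m agents.
min_samples <- function(m, alpha = 0.05) {
  n_tests <- 0.5 * m^2 - 0.5 * m   # number of agent pairs, equal to choose(m, 2)
  n_tests / (alpha / 2) + 1
}

min_samples(1000, alpha = 0.05)
```

For 𝑚 = 1000 and 𝛼 = 0.05 there are 499,500 pairs of agents, so the bound evaluates to 19,980,001
samples.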
6.3 Fixed fill model (FFM)
In the highly relaxed fixed fill model (FFM), $\mathbf{B}^* \in \mathcal{B}_{\mathrm{FFM}}$ are simply constrained to contain
the same number of 1s as B. Adopting the FFM implies, for example, that in all possible worlds
only the total number of species-habitat pairs is fixed, but any given species may be found on a
different number of islands and any given island may be home to a different number of species.
The distribution of $P^*_{ij}$ arising from $\mathcal{B}_{\mathrm{FFM}}$ has not been described before. We derive it and call it a
Jacobi distribution because it is related to Jacobi polynomials.
Let the fixed fill model constrain all $\mathbf{B}^* \in \mathcal{B}_{\mathrm{FFM}}$ to contain the same number of 1s (i.e. fill) as
B.
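Theorem 6.3.1. Under the fixed fill model, the distribution of $P^*_{ij}$ for $i \neq j$ satisfies
\[
\Pr(P^*_{ij} = k) =
\frac{\displaystyle \binom{n}{k} \sum_{r} \binom{n-k}{r} \, 2^{\,n-k-r} \binom{(m-2)n}{f-n-k+r}}
     {\displaystyle \binom{mn}{f}}. \tag{6.2}
\]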
Proof. For the denominator we need to compute the cardinality $\#\mathcal{B}_{\mathrm{FFM}}$. If $\mathbf{B}^* \in \mathcal{B}_{\mathrm{FFM}}$ then $\mathbf{B}^*$
has $mn$ entries of which $f$ must be chosen to be ones. So
\[
\#\mathcal{B}_{\mathrm{FFM}} = \binom{mn}{f}.
\]
For the numerator, suppose $P^*_{ij} = k$. We see from equation (6.1) that there are exactly $k$
columns $c$ where $B^*_{ic} = B^*_{jc} = 1$. There are $\binom{n}{k}$ ways to choose these columns. Now define the
following parameters:
\begin{align*}
p &= \text{number of columns } c \text{ where } B^*_{ic} = 1 \text{ and } B^*_{jc} = 0, \\
q &= \text{number of columns } c \text{ where } B^*_{ic} = 0 \text{ and } B^*_{jc} = 1, \\
r &= \text{number of columns } c \text{ where } B^*_{ic} = 0 \text{ and } B^*_{jc} = 0.
\end{align*}
The number of ways to pick the columns counted by these parameters from the $n - k$ columns
which do not contain ones in both rows is the trinomial coefficient $\binom{n-k}{p,q,r}$. Now we have used
$2k + p + q$ ones in rows $i$ and $j$. So there are $f - 2k - p - q$ left to distribute to the remaining $m - 2$
rows. And these rows have $(m-2)n$ entries. So the number of possibilities for these remaining
ones is $\binom{(m-2)n}{f-2k-p-q}$. Thus the total number of choices from this and the previous paragraph is
\begin{align*}
\binom{n}{k} \sum_{p+q+r=n-k} \binom{n-k}{p,q,r} \binom{(m-2)n}{f-2k-p-q}
&= \binom{n}{k} \sum_{p+q+r=n-k} \binom{n-k}{r} \binom{n-k-r}{p} \binom{(m-2)n}{f-n-k+r} \\
&= \binom{n}{k} \sum_{r} \binom{n-k}{r} \binom{(m-2)n}{f-n-k+r} \sum_{p} \binom{n-k-r}{p} \\
&= \binom{n}{k} \sum_{r} \binom{n-k}{r} \, 2^{\,n-k-r} \binom{(m-2)n}{f-n-k+r}
\end{align*}
as desired. Here the first equality uses $p + q = n - k - r$, and the last uses
$\sum_{p} \binom{n-k-r}{p} = 2^{\,n-k-r}$.
For even modestly large B, computing equation (6.2) involves values larger than can be handled
by some programs. In practice, we use logs to make these computations practical.
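One way to carry out the log-scale computation is sketched below in R. The function name `ffm_pmf`
and the toy values of 𝑚, 𝑛, and 𝑓 are assumptions made for illustration, and this is not
necessarily how the backbone package implements it. Each term of the sum in equation (6.2) is
computed with `lchoose`, the terms are combined with the log-sum-exp trick, and the result is
exponentiated only at the end.

```r
# Pr(P*_ij = k) under the fixed fill model, evaluated on the log scale.
# m, n: dimensions of B; f: number of ones in B.
ffm_pmf <- function(k, m, n, f) {
  r <- 0:(n - k)
  # Log of each summand: choose(n-k, r) * 2^(n-k-r) * choose((m-2)n, f-n-k+r).
  # lchoose returns -Inf whenever the binomial coefficient is zero.
  log_terms <- lchoose(n - k, r) + (n - k - r) * log(2) +
    lchoose((m - 2) * n, f - n - k + r)
  mx <- max(log_terms)
  if (!is.finite(mx)) return(0)               # every summand is zero
  log_sum <- mx + log(sum(exp(log_terms - mx)))   # log-sum-exp
  exp(lchoose(n, k) + log_sum - lchoose(m * n, f))
}

m <- 4; n <- 5; f <- 8                        # hypothetical small example
probs <- sapply(0:n, ffm_pmf, m = m, n = n, f = f)
sum(probs)                                    # the probabilities should sum to 1
```

Summing the returned probabilities over 𝑘 = 0, . . . , 𝑛 is a convenient sanity check, since every
matrix in the ensemble is counted for exactly one value of 𝑘.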
We now show that the sum in the numerator of this probability is related to the famous Jacobi
orthogonal polynomials. This sum is a terminating hypergeometric series. Given a real number $a$
and a nonnegative integer $r$ the corresponding Pochhammer symbol or rising factorial is
\[
(a)_r = a(a+1)(a+2) \cdots (a+r-1).
\]
Note that if $a$ is an integer with $-r < a \leq 0$ then $(a)_r = 0$ because the product contains 0 as a factor.
Given real numbers $a_1, a_2, \ldots, a_p$ and $b_1, b_2, \ldots, b_q$ as well as a variable $z$, the corresponding
hypergeometric series is
\[
{}_pF_q\!\left( \begin{matrix} a_1 \ a_2 \ \ldots \ a_p \\ b_1 \ b_2 \ \ldots \ b_q \end{matrix} ; z \right)
= \sum_{r \geq 0} \frac{(a_1)_r (a_2)_r \cdots (a_p)_r}{(b_1)_r (b_2)_r \cdots (b_q)_r} \, \frac{z^r}{r!}.
\]
Note that if any of the $a_i$ are negative integers then, because of the remark above, this series will
terminate and become a polynomial in $z$.
To convert a binomial coefficient into Pochhammer symbols, we write
\begin{align*}
\binom{n}{r} &= \frac{(n)(n-1) \cdots (n-r+1)}{r!} \\
&= \frac{(-1)^r (-n)(-n+1) \cdots (-n+r-1)}{(1)_r} \\
&= \frac{(-1)^r (-n)_r}{(1)_r}.
\end{align*}
The following identity will also be useful:
\begin{align*}
(a)_{b+r} &= (a)(a+1) \cdots (a+b-1) \times (a+b)(a+b+1) \cdots (a+b+r-1) \\
&= (a)_b \, (a+b)_r.
\end{align*}
We now return to the sum in the numerator of equation (6.2). We will ignore the factor of
$2^{n-k}$ since it is constant with respect to the sum and so can be pulled outside. For simplicity of
calculation we will also use the substitutions
\[
s = (m-2)n, \qquad t = f - n - k.
\]
Thus we have
\begin{align*}
\sum_{r} \binom{n-k}{r} 2^{-r} \binom{(m-2)n}{f-n-k+r}
&= \sum_{r} \binom{n-k}{r} \binom{s}{t+r} (1/2)^r \\
&= \sum_{r} \frac{(-1)^r (k-n)_r}{(1)_r} \cdot \frac{(-1)^{t+r} (-s)_{t+r}}{(1)_{t+r}} \, (1/2)^r \\
&= (-1)^t \sum_{r} \frac{(k-n)_r \, (-s)_t \, (-s+t)_r}{(1)_t \, (t+1)_r} \, \frac{(1/2)^r}{(1)_r} \\
&= \frac{(-1)^t (-s)_t}{(1)_t} \sum_{r} \frac{(k-n)_r \, (-s+t)_r}{(t+1)_r} \, \frac{(1/2)^r}{r!} \\
&= \binom{s}{t} \, {}_2F_1\!\left( \begin{matrix} k-n \ \ -s+t \\ t+1 \end{matrix} ; \frac{1}{2} \right).
\end{align*}
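The final equality can be checked numerically for small parameters. The R sketch below is only a
sanity check under assumed toy values of 𝑚, 𝑛, 𝑓, and 𝑘 (chosen so that 𝑡 ≥ 0); the helper names
`pochhammer` and `hyp2f1_terminating` are hypothetical.

```r
# Rising factorial (a)_r = a (a + 1) ... (a + r - 1), with (a)_0 = 1.
pochhammer <- function(a, r) if (r == 0) 1 else prod(a + 0:(r - 1))

# Terminating 2F1(a, b; cpar; z), summed for r = 0, ..., n_terms.
hyp2f1_terminating <- function(a, b, cpar, z, n_terms) {
  sum(sapply(0:n_terms, function(r) {
    pochhammer(a, r) * pochhammer(b, r) / pochhammer(cpar, r) * z^r / factorial(r)
  }))
}

# Hypothetical small parameters with t = f - n - k >= 0.
m <- 4; n <- 5; f <- 8; k <- 2
s  <- (m - 2) * n
tt <- f - n - k

r <- 0:(n - k)
lhs <- sum(choose(n - k, r) * choose(s, tt + r) * (1 / 2)^r)
rhs <- choose(s, tt) * hyp2f1_terminating(k - n, -s + tt, tt + 1, 1 / 2, n - k)

all.equal(lhs, rhs)
```

Because the upper parameter 𝑘 − 𝑛 is a nonpositive integer, the series has only 𝑛 − 𝑘 + 1 nonzero
terms, so summing up to 𝑟 = 𝑛 − 𝑘 evaluates it exactly.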
We are indebted to Marko Petkovšek [personal communication] for pointing out that this ${}_2F_1$
is, up to a factor, a specialization of a Jacobi polynomial. Given a nonnegative integer $\ell$ and real
numbers $\alpha, \beta$ the associated Jacobi polynomial is
\[
P_\ell^{(\alpha,\beta)}(z) = \binom{\alpha+\ell}{\ell} \,
{}_2F_1\!\left( \begin{matrix} -\ell \ \ \ell+\alpha+\beta+1 \\ \alpha+1 \end{matrix} ; \frac{1-z}{2} \right).
\]
To make these ${}_2F_1$ polynomials agree we can let $\ell = n - k$, $\alpha = t = f - n - k$,
\[
\beta = -s + t - (\ell + \alpha + 1) = k - (m-1)n - 1
\]
and $z = 0$. With these substitutions we get
\[
\sum_{r} 2^{-r} \binom{n-k}{r} \binom{(m-2)n}{f-n-k+r}
= \frac{\displaystyle \binom{(m-2)n}{f-n-k}}{\displaystyle \binom{f-2k}{n-k}} \;
P_{n-k}^{(f-n-k,\; k-(m-1)n-1)}(0).
\]
6.4 Fixed row model (FRM)
In the more constrained fixed row model (FRM), $\mathbf{B}^* \in \mathcal{B}_{\mathrm{FRM}}$ are constrained to have the same
agent degree sequence as B, but have unconstrained artifact degree sequences. Adopting the FRM
for backbone extraction implies, for example, that in all possible worlds a given species is found
on the same number of islands, but a given island may be home to a different number of species.
The distribution of $P^*_{ij}$ arising from $\mathcal{B}_{\mathrm{FRM}}$ is hypergeometric [TML+11, Nea13]. The FRM has
been used to extract the backbone of bipartite projections of, for example, movies co-starring
actors [TML+11], papers co-written by authors [TML+11], parties co-attended by women [Nea13],
majority opinions joined by Supreme Court justices [Nea13], and microRNAs co-associated with
diseases [CXW+18].
Let the fixed row model constrain all $\mathbf{B}^* \in \mathcal{B}_{\mathrm{FRM}}$ to have the same row sums as B.
Theorem 6.4.1. Under the fixed row model, the distribution of $P^*_{ij}$ for $i \neq j$ is hypergeometric and
satisfies
\[
\Pr(P^*_{ij} = k) = \frac{\displaystyle \binom{r_j}{k} \binom{n-r_j}{r_i-k}}{\displaystyle \binom{n}{r_i}}.
\]
Proof. The total number of ways to pick $r_i$ of the $n$ columns for ones in the $i$th row and $r_j$ of the $n$
columns for ones in the $j$th row is
\[
\binom{n}{r_i} \binom{n}{r_j} = \binom{n}{r_i} \frac{n!}{r_j! \, (n-r_j)!}. \tag{6.3}
\]
So that will go in the denominator of the desired probability.
For the numerator we follow the same line of reasoning as in the previous proof, where the
parameters therein can be expressed as
\begin{align*}
p &= r_i - k, \\
q &= r_j - k, \\
r &= n - r_i - r_j + k.
\end{align*}
So we have a total of
\[
\binom{n}{k} \binom{n-k}{p,q,r} = \frac{n!}{k! \, (r_i-k)! \, (r_j-k)! \, (n-r_i-r_j+k)!} \tag{6.4}
\]
choices.
Dividing equation (6.4) by (6.3) and cancelling $n!$ gives
\[
\Pr(P^*_{ij} = k)
= \frac{\dfrac{r_j!}{k! \, (r_j-k)!} \cdot \dfrac{(n-r_j)!}{(r_i-k)! \, (n-r_i-r_j+k)!}}{\dbinom{n}{r_i}}
= \frac{\dbinom{r_j}{k} \dbinom{n-r_j}{r_i-k}}{\dbinom{n}{r_i}}
\]
as desired.
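Because the distribution is hypergeometric, it can be evaluated with base R's `dhyper` and `phyper`.
The sketch below compares the formula of Theorem 6.4.1 with `dhyper` and then computes the upper
tail used in the backbone test. The function name `frm_pmf` and the values of 𝑛, 𝑟𝑖, 𝑟𝑗, and the
observed weight are hypothetical.

```r
# Pr(P*_ij = k) under the fixed row model, written directly from Theorem 6.4.1.
frm_pmf <- function(k, r_i, r_j, n) {
  choose(r_j, k) * choose(n - r_j, r_i - k) / choose(n, r_i)
}

# Hypothetical values: n artifacts, with agents i and j of degree r_i and r_j.
n <- 17; r_i <- 6; r_j <- 9
k <- 0:min(r_i, r_j)

# Same distribution via dhyper: k successes when drawing r_i columns from
# r_j "success" columns and n - r_j "failure" columns.
all.equal(frm_pmf(k, r_i, r_j, n), dhyper(k, r_j, n - r_j, r_i))

# Upper-tail probability Pr(P*_ij >= observed) used to test an edge.
observed <- 5
phyper(observed - 1, r_j, n - r_j, r_i, lower.tail = FALSE)
```

The subtraction of 1 in the call to `phyper` is needed because `lower.tail = FALSE` returns the
probability of strictly exceeding its first argument.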
6.5 Fixed column model (FCM)
In the closely related fixed column model (FCM), $\mathbf{B}^* \in \mathcal{B}_{\mathrm{FCM}}$ are constrained to have the same
artifact degree sequence as B, but have unconstrained agent degree sequences. Adopting the FCM
for backbone extraction implies, for example, that in all possible worlds a given species may be
found on a different number of islands, but a given island is home to the same number of species.
The distribution of $P^*_{ij}$ arising from $\mathcal{B}_{\mathrm{FCM}}$ has not been described before, but we derive it here to
show it is Poisson binomial.
Let the fixed column model constrain all $\mathbf{B}^* \in \mathcal{B}_{\mathrm{FCM}}$ to have the same column sums as B.
Let $X_1, \ldots, X_n$ be independent Bernoulli random variables. Let the probability of success for
$X_i$ be
\[
\Pr(X_i = 1) = p_i.
\]
The random variable
\[
X = X_1 + \cdots + X_n \tag{6.5}
\]
is said to have the Poisson binomial distribution with parameters $p_1, \ldots, p_n$.
Theorem 6.5.1. Under the fixed column model, the distribution of $P^*_{ij}$ for $i \neq j$ is Poisson binomial
with parameters
\[
p_1 = \frac{c_1(c_1-1)}{m(m-1)}, \quad p_2 = \frac{c_2(c_2-1)}{m(m-1)}, \quad \ldots, \quad p_n = \frac{c_n(c_n-1)}{m(m-1)}.
\]
Proof. The $B^*_{ik}$ are all either zero or one and are independent in different columns when only the
column sums are fixed. So as $k$ varies, the products $B^*_{ik} B^*_{jk}$ are independent Bernoulli random
variables. Comparing equations (6.1) and (6.5), we see that the distribution of $P^*_{ij}$ is Poisson
binomial.
If column $k$ has column sum $c = c_k$ then all zero-one vectors with sum $c$ are equally likely for
that column of $\mathbf{B}^*$. So there are $\binom{m}{c}$ possible $k$th columns. The number of ways to have a success
is the number of possible columns which have ones in both positions $i$ and $j$ where $i \neq j$. So the
number of choices is the number of ways to choose the remaining $c - 2$ ones in that column from
the other $m - 2$ positions, that is, $\binom{m-2}{c-2}$. Thus
\[
p_k = \Pr(B^*_{ik} B^*_{jk} = 1) = \frac{\dbinom{m-2}{c-2}}{\dbinom{m}{c}} = \frac{c(c-1)}{m(m-1)}
\]
which finishes the demonstration.
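A Poisson binomial probability mass function can be computed exactly by adding one Bernoulli
variable at a time. The R sketch below does this for the parameters of Theorem 6.5.1; the function
names and the column sums are hypothetical, and the simple convolution is only one of several ways
to evaluate this distribution.

```r
# Success probabilities p_k = c_k (c_k - 1) / (m (m - 1)) from Theorem 6.5.1.
fcm_probs <- function(col_sums, m) col_sums * (col_sums - 1) / (m * (m - 1))

# Exact Poisson binomial pmf by convolving one Bernoulli variable at a time.
# Returns the vector Pr(X = 0), ..., Pr(X = n) for n = length(p).
poisson_binomial_pmf <- function(p) {
  pmf <- 1
  for (pk in p) {
    pmf <- c(pmf * (1 - pk), 0) + c(0, pmf * pk)
  }
  pmf
}

m <- 5                          # hypothetical number of agents
col_sums <- c(2, 3, 1, 5, 4)    # hypothetical artifact degrees
p <- fcm_probs(col_sums, m)
pmf <- poisson_binomial_pmf(p)

sum(pmf)                        # total probability
sum((0:length(p)) * pmf)        # mean of the distribution, equal to sum(p)
```

An artifact of degree 1 contributes 𝑝𝑘 = 0, since it can never be shared, while an artifact
connected to all 𝑚 agents contributes 𝑝𝑘 = 1.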
6.6 Stochastic degree sequence model (SDSM)
Finally, the stochastic degree sequence model (SDSM) takes $\mathcal{B}_{\mathrm{SDSM}}$ to be all binary $m \times n$
matrices, but also gives a process for generating these matrices with different probabilities. Each
$\mathbf{B}^*$ is generated by filling the cells $B^*_{ik}$ with a 0 or 1 depending on the outcome of an independent
Bernoulli trial with probability $p^*_{ik}$. The distribution of the random variable $P^*_{ij}$ arising from $\mathcal{B}_{\mathrm{SDSM}}$
is Poisson binomial with parameters which can be computed using the $p^*_{ik}$ [DNS21, LR16]. There
are many ways to choose $p^*_{ik}$, but in the studies in chapter 8, we choose $p^*_{ik}$ so that it approximates
$\Pr(B^*_{ik} = 1)$ for $\mathbf{B}^* \in \mathcal{B}_{\mathrm{FDSM}}$, with the goal of ensuring that the expected agent and artifact
degree sequences of $\mathbf{B}^* \in \mathcal{B}_{\mathrm{SDSM}}$ match those of B. Adopting such a version of SDSM implies,
for example, that in each possible world a given species may be found on many or few islands
and a given island may be home to many or few species, but the average number of islands on
which a given species lives in all possible worlds and the average number of species that live on
a given island in all possible worlds match these values in the observed world. The SDSM has
been used to extract the backbone of bipartite projections of, for example, legislators co-sponsoring
bills [Nea20, Nea14, SB20], zebrafish (Danio rerio) sharing operational taxonomic units [BDS+20],
countries sharing exports [SDCGS15], and genes expressed in genesets [MLLS21].
In the stochastic degree sequence model, $\mathcal{B}_{\mathrm{SDSM}}$ consists of all binary $m \times n$ matrices. A method
is then chosen to generate probabilities $p^*_{ik}$. Finally, matrices $\mathbf{B}^* \in \mathcal{B}_{\mathrm{SDSM}}$ are generated using
these probabilities for independent Bernoulli trials, where $B^*_{ik}$ is filled with a one with probability
$p^*_{ik}$ and zero otherwise.
Theorem 6.6.1. Under the stochastic degree sequence model, the distribution of $P^*_{ij}$ for $i \neq j$ is
Poisson binomial with parameters
\[
p_1 = p^*_{i1} p^*_{j1}, \quad \ldots, \quad p_n = p^*_{in} p^*_{jn}.
\]
Proof. The fact that the distribution is Poisson binomial follows immediately from the independence
assumption on the $B^*_{ik}$ and equation (6.1). Furthermore, the probability that the $k$th variable is
one is
\[
p_k = \Pr(B^*_{ik} B^*_{jk} = 1) = \Pr(B^*_{ik} = 1) \Pr(B^*_{jk} = 1) = p^*_{ik} p^*_{jk}.
\]
So we are done.
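The theorem can be illustrated by simulation. In the R sketch below the matrix of cell
probabilities is invented for illustration and is not fitted to any data, and the function names
are hypothetical. The simulated mean and variance of the edge weight are compared with the Poisson
binomial values, which are the sum of the 𝑝𝑘 and the sum of the 𝑝𝑘(1 − 𝑝𝑘) respectively.

```r
# Hypothetical 3 x 4 matrix of cell probabilities p*_ik (not fitted to any data).
p_star <- matrix(c(0.9, 0.5, 0.2, 0.7,
                   0.8, 0.1, 0.3, 0.6,
                   0.2, 0.4, 0.9, 0.1), nrow = 3, byrow = TRUE)

# Poisson binomial parameters for the pair (i, j): p_k = p*_ik p*_jk.
sdsm_params <- function(p_star, i, j) p_star[i, ] * p_star[j, ]

# Draw one B* cell by cell and return the edge weight (B* B*^T)_ij.
simulate_weight <- function(p_star, i, j) {
  Bstar <- matrix(rbinom(length(p_star), 1, p_star), nrow = nrow(p_star))
  sum(Bstar[i, ] * Bstar[j, ])
}

set.seed(7)
draws <- replicate(20000, simulate_weight(p_star, 1, 2))
p <- sdsm_params(p_star, 1, 2)

c(simulated_mean = mean(draws), theoretical_mean = sum(p))
c(simulated_var  = var(draws),  theoretical_var  = sum(p * (1 - p)))
```

In practice the exact distribution would be used rather than simulation; the simulation is shown
only because it makes the generating process explicit.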
In the following chapter, we will implement these ensemble methods in the R package
backbone.
CHAPTER 7
BACKBONE: AN R PACKAGE FOR EXTRACTING THE BACKBONE OF WEIGHTED
GRAPHS
This chapter contains material from Domagalski, Neal, and Sagan [DNS21, NDS21a], and back-
ground from Neal, Domagalski, and Yan [NDY22]. Replication materials are available at
https://www.github.com/domagal9/dissertation.
We now introduce the R package backbone that implements these five models, fixed degree
sequence model (FDSM), fixed fill model (FFM), fixed column model (FCM), fixed row model
(FRM), and the stochastic degree sequence model (SDSM). The backbone package provides these
methods in a common framework making them both accessible and easy to use for scientists and
researchers. It can be installed in R [R C18] from The Comprehensive R Archive Network (CRAN)
via install.packages("backbone") and used with library(backbone) [DNS20]. Informa-
tion regarding the CRAN distribution is found at https://CRAN.R-project.org/package=backbone.
Additional materials relating to backbone including papers, presentations, workshop materials,
and datasets are available at https://rbackbone.net.
7.1 Two Illuminating Data Sets
We illustrate the use of the R backbone package to extract the backbone of two networks:
the first is a network of bill co-sponsorship relations among Senators in the 114th session of the
United States Senate, the second is a network of world city firm co-locations amongst large cities
in the year 2000. Both of these networks, legislative and spatial, are used as templates for network
research in their corresponding fields.
7.1.1 Legislative Networks
For more than a decade, legislative networks have shed new light on understanding legislative
behavior [Fow06a, Fow06b]. Although legislative networks clarify that governance is an interactive
and interdependent process, they are most useful if they help us explain or predict key parts of this
process. The most consequential action a legislator can take is voting, and several studies have
shown that a legislator’s position in a legislative network helps explain their voting behavior. For
example, [Fow06a] found that US legislators were more likely to vote in favor of bills sponsored by
well-connected legislators, even after controlling for shared party membership, and therefore that
well-connected legislators were more effective at advancing their legislative agendas. Similarly,
[RNH13] found that social ties among European legislators exacerbated ideological voting patterns:
friendship increased the likelihood of political allies voting the same way, but decreased the
likelihood of political adversaries voting the same way. [Fon20] offers one potential explanation
for the network’s influence over voting behavior: “When legislators are called on to vote on a
question that they do not understand, they take cues from experts who are nearby in the legislative
network” (p. 270). Although voting is particularly consequential, legislative networks have also
been used to explain how the coalitions that shape voting outcomes change over time. For example,
[Nea20] demonstrated that the US Congress has become substantially more partisan since 1973
with legislators increasingly collaborating only with members of the same party, and opposing
members of the other party. However, [KMN16] and [AN20a] clarified that these coalitions are
not strictly partisan and frequently include members from both parties.
Directly measuring legislative networks (e.g., simply asking legislators who they work with) is
challenging because legislators are busy and may have motivations to conceal or misrepresent their
true collaborations. As a result, most studies of legislative networks rely on more indirect mea-
surements derived from bill sponsorship [e.g., Nea20], committee memberships [e.g., PMNW05],
attendance at press events [e.g., DMSK15], and roll call votes [e.g., ALH+ 15a]. What do such
indirectly measured legislative networks measure? Different source data provides information
about different types of relations among legislators. For example, voting similarly in roll call votes
provides information about ideological alignment, whereas sharing membership on a committee
provides information about alignment on prioritized issues. The majority of legislative networks
are derived from patterns of bill sponsorship, which also provides information about ideological
and issue alignment, but more directly provides information about collaboration as legislators join
together in lending their collective support to bills [Kir11, KK96].
All but the most popular legislative measures require collaboration to cultivate support and
ensure their eventual passage. Past studies have identified many factors that influence when
legislators choose to collaborate, consistently finding support for homophily [MSLC01]: similar
legislators are more likely to collaborate [CP87]. In the context of legislative collaboration,
homophily with respect to political party is known as partisanship, which when particularly intense
leads to partisan polarization. Both research [e.g., Nea20, LCH06, MM13] and media reports [e.g.,
Ing15] confirm that polarization has become a hallmark of legislative relations in the US Congress,
so observing party homophily in networks of legislative collaboration is expected.
To demonstrate how the backbone package works, we apply it to a co-sponsorship
network of the United States Senate during the 114th session. Since both prior research [LCH06,
Nea20, SB20, ALH+ 15b, AN20b] and media accounts [Dru16] of the current US political climate
provide us with a priori expectations about what structure a properly extracted backbone should
have, we expect positive relationships to form primarily between those in the same political party,
and accordingly a relatively large modularity statistic computed from a partition of the nodes by
political party. Modularity measures the strength of division within the network. Specifically, for
a network G with vertex degree sequence $(d_1, \ldots, d_n)$, it is given by the quantity
\[
Q = \frac{1}{2 \sum \mathbf{G}} \sum_{i,j} \left( G_{ij} - \frac{d_i d_j}{2 \sum \mathbf{G}} \right) \delta(c_i, c_j),
\]
where 𝑐𝑖 and 𝑐 𝑗 represent the communities (in this case political party) that vertices 𝑖 and 𝑗 belong
to, and 𝛿(𝑐𝑖 , 𝑐 𝑗 ) is the Kronecker delta function. In visualizations of the extracted backbones, we
depict Republican senators by red vertices, and both Democratic and Independent senators who are
left-leaning and caucused with Democrats by blue vertices. Although we discuss signed backbones
in the text, for visual clarity we only provide figures for binary backbones which contain positive
edges. Positive relations of collaboration between two Republicans are depicted in red, between
two Democrats are blue, and for all other pairs are purple. For an example, see fig. 7.1.
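The modularity of a given partition can be sketched in R as follows. This sketch assumes an
undirected, unweighted network and takes the normalizing constant to be the total degree (twice
the number of edges), which is the usual convention; that reading of the denominator, the function
name `modularity_by_partition`, and the toy network are assumptions made for illustration.

```r
# Modularity of a partition of an undirected, unweighted network.
# A: symmetric 0/1 adjacency matrix with zero diagonal; community: one label per vertex.
modularity_by_partition <- function(A, community) {
  d <- rowSums(A)                              # vertex degrees
  total <- sum(A)                              # total degree = twice the number of edges
  same <- outer(community, community, "==")    # Kronecker delta of the community labels
  sum((A - outer(d, d) / total) * same) / total
}

# Toy network: two disjoint edges, 1-2 and 3-4.
A <- matrix(0, 4, 4)
A[1, 2] <- A[2, 1] <- 1
A[3, 4] <- A[4, 3] <- 1

modularity_by_partition(A, c("R", "R", "D", "D"))  # partition follows the edges
modularity_by_partition(A, c("R", "D", "R", "D"))  # partition cuts every edge
```

The first partition places every edge inside a community and gives a positive value (0.5), while
the second places every edge between communities and gives a negative value (−0.5).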
Figure 7.1: An example of an extracted backbone, with Democratic senators represented by blue
vertices, and Republican senators represented by red vertices.
The data set consists of 100 senators and the 3589 bills that they have sponsored or co-sponsored
in the 114th session of Congress [USG20]. This data takes the form of a bipartite network B, where
the agents are the senators (rows) and the artifacts are the bills (columns). Here, 𝐵𝑖𝑘 = 1 if senator
𝑖 sponsored or co-sponsored bill 𝑘, and otherwise is 0. Below we examine the data set. Notice
that the row names correspond to each senator (including their party affiliation and the state they
represent) and the column names refer to the bill number.
> set.seed(19)
> library(backbone)
> senate <- read.csv("S114.csv", row.names = 1, header = TRUE)
> senate <- as.matrix(senate)
> dim(senate)
[1] 100 3589
> senate[1:5, 1:5]
                     sj9 sj8 sj7 sj6 sj5
Alexander, L. (TN-R)   0   1   0   1   0
Boxer, B. (CA-D)       0   0   0   0   1
Cantwell, M. (WA-D)    0   0   0   0   1
Carper, T. (DE-D)      0   0   0   0   1
Cochran, T. (MS-R)     0   1   0   1   0
A weighted network P can be constructed from B via bipartite projection, where P = BB𝑇 and
𝑃𝑖 𝑗 contains the number of bills that both senator 𝑖 and senator 𝑗 sponsored. Notice the network is
now 100 rows by 100 columns.
> G <- senate%*%t(senate)
> dim(G)
[1] 100 100
> G[1:5, 1:2]
                     Alexander, L. (TN-R) Boxer, B. (CA-D)
Alexander, L. (TN-R)                  141               10
Boxer, B. (CA-D)                       10              303
Cantwell, M. (WA-D)                    15               82
Carper, T. (DE-D)                      12               55
Cochran, T. (MS-R)                     40               25
The projected network P now indicates that Senator Lamar Alexander sponsored a total of 141
bills in the 114th session. Among these 141 bills, 10 were co-sponsored with Senator Barbara
Boxer, and 15 were co-sponsored with Senator Maria Cantwell.
We can use the values of graph P to observe differences between those with similar or dissimilar
ideology. Below, we compare the number of bills co-sponsored by two individuals with similar
political ideology, Senators Cory Booker and Elizabeth Warren, versus those with dissimilar
ideology, Senators Ted Cruz and Bernie Sanders. The results are consistent with the expectation
that legislators sharing a similar ideology engage in more co-sponsorships.
> G["Booker, C. (NJ-D)", "Warren, E. (MA-D)"]
[1] 98
> G["Cruz, T. (TX-R)", "Sanders, B. (VT-I)"]
[1] 5
The differences in the number of bills co-sponsored prompt an important underlying question:
how many bills do two senators have to co-sponsor before we would be justified in concluding they
are political collaborators? Similarly, how few bills do they have to co-sponsor before we would
be justified in concluding they are political opponents? These questions are what the backbone
package seeks to answer.
7.1.2 Spatial Networks
The second type of network we will examine with the backbone package is a spatial network.
Bipartite projections appear in spatial analysis, where they can take two distinct forms depending
on whether the agents or artifacts are spatial entities (i.e., locations). In the locations-as-agents
approach, a spatial bipartite projection is a network of locations, such that a pair of locations is
connected to the extent that they share artifacts. Calling it the “interlocking world city network
model,” this is the approach that [Tay01] proposed and which launched a wave of research on
world city networks: major cities (the agents, which are locations) are connected to the extent that
they house branch offices of the same advanced producer services firms (e.g., finance, accounting,
consulting; the artifacts). It rests on the logic that offices of the same firm must communicate and
interact with one another, and therefore that when two cities have an office of the same firm, there
is likely interaction between them. Spatial networks that adopt the locations-as-agents approach to
measurement via bipartite projection are quite common at multiple spatial scales. They have been used
to measure networks among urban locations connected by Twitter users [Poo18] and bus routes [LD20];
networks among cities connected by patents [BR17] and banking syndicates [PWK19]; and networks among
countries connected by treaties [HBKM09], trade [SCS17], and corporate executives [HFC16].
In the locations-as-artifacts approach, a spatial bipartite projection is a network of agents (often
people or other social actors), such that a pair of agents is connected to the extent that they share
locations. The locations-as-artifacts approach is less common in geography because the spatial
units play only an instrumental role in the network, forging the links between agents, but do not
appear in the bipartite projection network itself. However, it is common in sociological research,
where the focus is on social networks emerging from spatial interactions. For example, [BCS+ 17]
and [XCB20] use this approach to measure and study the social network among households in
Los Angeles: households (the agents) are connected to the extent that they visit the same routine
activity locations (e.g., school, work; the artifacts). This rests on the logic that places offer
opportunities for casual encounters which lead to the formation of social bonds, and therefore when
two households frequent the same places, they are more likely to interact with each other [Jac61].
[HKBH07] adopted a similar locations-as-artifacts approach to derive a ‘product space’ in which
export products were connected to the extent that they were exported by the same countries. This
follows the logic that “if [the production of] two goods...require similar institutions, infrastructure,
physical factors, technology, or some combination thereof, they will tend to be produced [in the same
location],” and therefore the spatial co-production of products indirectly captures their production
technology similarity [HKBH07, p. 484].
There is an important link between these two approaches. When B is a bipartite network where
the rows represent locations, then BB^T will yield a locations-as-agents bipartite projection, while
B^T B will yield a locations-as-artifacts bipartite projection. Therefore, a single bipartite network
can be studied from both perspectives. For example, although the world cities literature usually
focuses on cities linked by sharing firms, some have simultaneously examined a network of firms
linked by their co-location in cities [e.g., Nea08, VMND16]. Similarly, [SCS17] examined not
only a network of countries linked by trading the same products, but also a network of products
that are traded by the same countries.
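As a minimal sketch of this duality, the base R functions tcrossprod() and crossprod() compute the two projections of a single matrix. The city and firm names below are hypothetical and serve only to make the dimensions visible.

```r
# Hypothetical data: 3 cities (rows) by 4 firms (columns).
B <- matrix(c(1, 1, 0, 0,
              1, 1, 1, 0,
              0, 0, 1, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("cityA", "cityB", "cityC"),
                            c("firm1", "firm2", "firm3", "firm4")))

# Locations-as-agents: cities linked by shared firms (B times its transpose).
city_net <- tcrossprod(B)

# Locations-as-artifacts: firms linked by shared cities (transpose of B times B).
firm_net <- crossprod(B)

dim(city_net)  # one row and column per city
dim(firm_net)  # one row and column per firm
```

Both projections come from the same data, so the choice between them is a choice about which set of nodes the research question concerns.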
The key advantage to measuring spatial networks using bipartite projections lies in the relative
ease of data collection. For example, data about economic exchanges between cities may not be
available from official government sources, and collecting such data directly is often impractical.
However, data about where firms’ offices are located is readily available, usually on the firms’
own websites. Accordingly, bipartite projections offer a practical way for researchers to indirectly
approximate a city-level economic network. Similarly, because social network analysis requires
data from a population (not a sample) and is sensitive to missingness, it is often impractical to
collect data on the social network among residents of a large city. However, data about the places
residents visit or tweet about can be collected using routine surveys, remote sensing, and digital
trace measures. Accordingly, bipartite projections also offer a practical way for researchers to
indirectly approximate social networks in large geographic areas.
In the context of spatial analysis, the backbone package can be used for research adopting a
locations-as-agents approach, to infer the spatial network among a set of locations from data on
their shared characteristics. However, it can also be used for research adopting a
locations-as-artifacts approach, to infer
a social network among a set of actors from data on their shared locations. To illustrate backbone’s
application in one specific spatial analytic context, we will demonstrate its use to examine the world
city network and identify the most central cities in it.
The Globalization and World Cities (GaWC) “Data Set 11” was originally collected in 2000,
and records the extent of 100 advanced producer services firms’ presence in each of 315 large cities
[TCW02]. These data served as the foundation for one of the earliest and most comprehensive
empirical studies of the world city network [Tay04], and as a template for a substantial body of
empirical research conducted by those associated with the GaWC research network. Formally, the
data set takes the form of a rectangular 315 × 100 bipartite matrix B, in which 𝐵𝑖𝑘 contains the
‘service value’ of firm 𝑘’s presence in city 𝑖. The service values form an ordinal scale intended to
capture the importance or extent of a firm’s presence in a city, and range from 0 (no presence) to
5 (global headquarters), with a value of 2 representing a presence that provides “the ‘normal’ or
‘typical’ service level of the given firm in a city” [TCW02, p. 2370]. These publicly available data
can be loaded into R directly from the GaWC website (as of July 2021) and converted to matrix
form. This data set is also included in the replication materials.
> cities <- read.csv(file = "https://www.lboro.ac.uk/gawc/datasets/da11.csv",
                     header = TRUE,
                     row.names = 1)
> cities <- as.matrix(cities)
The backbone package is designed for use with binary bipartite data, so for this illustration we
transform the original ordinal B into a binary B′ such that
\[ B'_{ij} = \begin{cases} 1 & \text{if } B_{ij} \geq 3 \\ 0 & \text{if } B_{ij} \leq 2. \end{cases} \]
This transformation can be achieved, and the cities that contain no firms with a larger-than-typical
presence can be excluded, by typing:
> cities[cities <= 2] <- 0
> cities[cities >= 3] <- 1
> cities <- cities[rowSums(cities) != 0,]
This transformation allows us to focus only on firms that maintain a larger-than-typical presence in
a given city, and only on the 196 cities that contain at least one such firm. For convenience, we use
B to refer to this binary matrix in the remainder of this section. Once the bipartite data has been
loaded and transformed, it is possible to examine some of its features. For example, it is possible
to look at the pattern of firms’ presence in cities.
> cities[114:117,8:11]
            Horwath KPMG Summit...Baker RSM
MELBOURNE         0    1              0   1
MEXICO CITY       0    1              0   0
MIAMI             1    1              0   1
MILAN             0    0              0   1
This command shows the portion of B that includes the 114th to 117th cities, and 8th to 11th firms.
The output shows that while the accounting firms of KPMG and RSM maintained offices in several
of these cities, Horwath and Summit International+Baker Tilley did not.
Two key characteristics of any bipartite data are the row sums and column sums. In these data,
the row sums indicate the number of firms located in a city, while the column sums indicate the
number of cities in which a firm maintains a presence.
> rowSums(cities)["AMSTERDAM"]
AMSTERDAM
29
> rowSums(cities)["NEW YORK"]
NEW YORK
74
> colSums(cities)["KPMG"]
KPMG
76
> colSums(cities)["HSBC"]
HSBC
43
For example, there are 74 firms that maintain a larger-than-typical presence in New York,
but only 29 firms that maintain a larger-than-typical presence in Amsterdam. Likewise, KPMG
maintains a larger-than-typical presence in 76 cities, while HSBC maintains a larger-than-typical
presence in only 43 cities. Figure 7.2 illustrates these values for all cities and firms in these data.
Specifically, Figure 7.2A shows that while most cities contain fewer than 20 firms, some cities
contain many more firms. Similarly, Figure 7.2B shows that while most firms maintain a presence
in fewer than 40 cities, some firms maintain a presence in many more cities.
Figure 7.2: The distribution of (A) row sums and (B) column sums in the GaWC Dataset 11.
The conventional “specification of the world city network” used in GaWC research involves
computing a weighted bipartite projection P from the original bipartite data B [Tay01].
> P <- cities %*% t(cities)
Following this specification, the cities are treated as agents and the firms are treated as artifacts.
The resulting square matrix P is treated as a weighted world city network in which the strength
of the connection between a pair of cities is measured by their number of co-located firms. For
example, examining the matrix cell corresponding to the connection between Amsterdam and New
York
> P["AMSTERDAM","NEW YORK"]
[1] 26
indicates that 26 firms maintain a presence in both cities, and might be interpreted as evidence that
they interact economically.
Many analyses of the world city network focus on cities’ degree centrality, or what is sometimes
called a city’s “global network centrality” (GNC). This value measures a city’s total number or
strength of connections in the network, and is interpreted as an indicator of a city’s status or
importance in the network.
> sort(rowSums(P), decreasing = TRUE)[1:5]
LONDON NEW YORK PARIS HONG KONG SINGAPORE
1496 1403 1043 1032 913
In these data, London and New York have the greatest centrality, occupying the top tier of the
urban hierarchy as what GaWC research calls Alpha++ cities [BST99]. They are followed by a
second tier of Alpha+ cities that include Paris, Hong Kong, and Singapore. This approach appears
to successfully identify what nearly any scholar of globalization would regard as the cities “used
by global capital as basing points in the spatial organization and articulation of production and
markets” [Fri86, p. 71].
However, these values and this weighted spatial network are less informative than they might
seem. The centrality values derived from this network are almost perfectly correlated with the
number of firms located in each city (i.e. the row sums of B).
> cor(rowSums(P), rowSums(cities))
[1] 0.9767704
The high correlation indicates that this approach to identifying central cities in a world city network
is actually just identifying cities that contain many firms. This occurs because measuring a world
city network using a weighted bipartite projection of firm locations guarantees that cities with many
firms will have stronger connections and larger centrality values [Nea12]. If world city researchers
were simply interested in finding cities with many firms, there are much simpler ways to achieve this
(e.g., counting a city’s number of firms).
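One way to see why this dependence arises is through a simple identity: the row sums of P = BB^T equal B multiplied by the vector of column sums of B, so a city's strength is a sum over its own firms, each weighted by how widespread that firm is. The sketch below checks this on randomly generated data, which is purely illustrative and unrelated to the GaWC data.

```r
set.seed(1)

# Illustrative random binary matrix: 20 hypothetical cities by 8 hypothetical firms.
B <- matrix(rbinom(20 * 8, size = 1, prob = 0.3), nrow = 20)

# Weighted projection and each city's total strength (diagonal included,
# as in rowSums(P) above).
P <- B %*% t(B)
strength <- rowSums(P)

# The same quantity computed directly from B: each firm a city hosts
# contributes that firm's total number of cities.
weighted_count <- as.vector(B %*% colSums(B))

all.equal(as.vector(strength), weighted_count)
```

Because every firm a city hosts adds a positive amount to its strength, cities with more firms tend to receive larger centrality values regardless of which firms they host.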
In practice, world city researchers are interested in something more nuanced: studying cities that
are central in a network of economic interactions. The challenge is that although firm co-location
may provide information about which cities interact economically, firm co-location is not the same
as economic interaction. The backbone package can be used to make inferences about which
cities are engaged in economic interaction based on firm co-location patterns. Specifically, it can
be used to estimate whether the number of firms co-located in two cities is large enough to warrant
concluding that the two cities are engaged in meaningful economic interaction. The backbone of
the world city network is a binary network in which pairs of cities are connected only if their number
of co-located firms suggests they are engaged in meaningful economic interaction, and therefore
provides a simplified and potentially more focused depiction of the world city network.
We’ll now examine how the backbone package’s functionality provides insights on both the
spatial and legislative networks described.
7.2 Universal Threshold universal()
The simplest approach to backbone extraction applies a single threshold value 𝑇 to all edges.
As mentioned previously, 𝑇 = 0 is often used, which leads to very dense and highly clustered
backbones. While we do not recommend using a universal threshold method, it is included in
the backbone package for comparison purposes. The universal() function allows the user to
extract a backbone using a single threshold 𝑇, or to extract a signed backbone by selecting upper
and lower thresholds 𝑇+ and 𝑇−.
For both the senate and the world cities data sets, we’ll use the universal() function to
compute a backbone with a single threshold of 0. Thus in the legislative network, if two senators
have co-sponsored one or more bills, there will be an edge between them. Similarly, any number of
firm co-locations is interpreted as evidence of economic interaction between a pair of cities. Notice
that our backbone graph is represented by a square adjacency matrix with 0-1 entries.
> universalbb <- universal(senate, upper = 0, bipartite = TRUE)
> universalbb$backbone[1:5, 1:2]
Alexander, L. (TN-R) Boxer, B. (CA-D)
Alexander, L. (TN-R) 0 1
Boxer, B. (CA-D) 1 0
Cantwell, M. (WA-D) 1 1
Carper, T. (DE-D) 1 1
Cochran, T. (MS-R) 1 1
Figure 7.3: The positive backbone of the US Senate co-sponsorship network with edges retained
between two senators if they sponsored at least 1 bill together.
The density of a network is the number of edges in the network, divided by the number of
possible edges in the network. Plotting this backbone using the igraph package [CN06] reveals
that it is extremely dense, as only 1 pair of senators out of the 4950 unique pairs has not
sponsored at least one bill together (see fig. 7.3). Accordingly, this universal threshold backbone is
uninformative about the underlying structure of the network. Moreover, partitioning this backbone
into two groups by political party yields a modularity near zero, which indicates that this backbone
does not reflect the partisan polarization known to exist in the US Senate.
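The thresholding and density calculations described above can be sketched in base R without the backbone package. The matrix below is a small hypothetical example, and the code is only an approximation of what a universal threshold does, not the package's implementation.

```r
# Hypothetical bipartite matrix: 4 agents (rows) by 4 artifacts (columns).
B <- matrix(c(1, 1, 0, 1,
              1, 0, 0, 1,
              0, 1, 1, 0,
              0, 0, 1, 0),
            nrow = 4, byrow = TRUE)
P <- B %*% t(B)

# Universal threshold: keep an edge wherever the projection weight exceeds T.
threshold <- 0
backbone <- (P > threshold) * 1
diag(backbone) <- 0          # ignore self-loops

# Density: number of edges divided by the number of possible edges.
n <- nrow(backbone)
n_edges <- sum(backbone[upper.tri(backbone)])
density <- n_edges / choose(n, 2)
density
```

With T = 0, a single shared artifact is enough to create an edge, which is why this choice tends to produce dense backbones.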
We see a similar density problem occur in the world cities network.
> universal0 <- universal(cities, upper = 0, bipartite = TRUE)
> table(universal0$backbone)
0 1
21506 16910
> mean(universal0$backbone)
[1] 0.4401812
> sort(rowSums(universal0$backbone), decreasing = TRUE)[1:5]
LONDON NEW YORK PARIS HONG KONG LOS ANGELES
191 185 175 171 171
> cor(rowSums(universal0$backbone), rowSums(cities))
[1] 0.7407175
A backbone extracted using 𝑇 = 0 is quite dense (44% of possible inter-city connections
are present) because it treats even small numbers of firm co-locations as evidence of economic
interaction between cities. As a result, the most central cities are still obviously large cities
that contain many firms, and indeed, cities’ centrality in this network remains highly correlated
(𝑟 = 0.74) with their total number of firms.
A sparser network containing fewer inter-city connections can be obtained using a higher (i.e.
more stringent) threshold that retains only particularly strong connections [e.g., DT05]. For
example, the universal() function can be used to extract a backbone where 𝑇 = 25, and therefore
only cities with more than 25 co-located firms are counted as connected:
> universal25 <- universal(cities, upper = 25, bipartite = TRUE)
> mean(universal25$backbone)
[1] 0.001665973
> sort(rowSums(universal25$backbone), decreasing = TRUE)[1:5]
LONDON NEW YORK HONG KONG PARIS CHICAGO
15 12 5 5 3
> cor(rowSums(universal25$backbone), rowSums(cities))
[1] 0.8381523
The backbone extracted with this more stringent universal threshold is indeed much less dense (only
about 0.17% of possible edges are present). However, it still remains focused on the largest cities,
whose centrality is highly correlated (𝑟 = 0.84) with the total number of firms.
These approaches involve an arbitrarily selected threshold; however, the universal() function
can also be used to apply a universal threshold that is based on characteristics of the weighted
bipartite projection P. For example, it is possible to extract a backbone in which cities are
connected if their number of co-located firms is more than two standard deviations above the
average.
> universal.meansd <- universal(cities, upper = function(x)mean(x)+2*sd(x),
bipartite = TRUE)
> mean(universal.meansd$backbone)
[1] 0.03092461
> sort(rowSums(universal.meansd$backbone), decreasing = TRUE)[1:5]
LONDON NEW YORK HONG KONG PARIS SINGAPORE
64 61 51 49 42
> cor(rowSums(universal.meansd$backbone), rowSums(cities))
[1] 0.9655334
This backbone also has a lower density (3% of possible edges are present), but once again it focuses
only on large cities, whose centrality is almost perfectly correlated with their total number of
firms (𝑟 = 0.97).
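A data-driven threshold of this kind can be sketched in base R as follows. The data are randomly generated for illustration, and the sketch assumes that the mean and standard deviation are taken over the off-diagonal projection weights; the set of entries that universal() passes to the threshold function may differ.

```r
set.seed(42)

# Illustrative random binary matrix: 30 hypothetical cities by 12 hypothetical firms.
B <- matrix(rbinom(30 * 12, size = 1, prob = 0.25), nrow = 30)
P <- B %*% t(B)

# Assumption: the threshold is computed from one weight per pair of cities.
w <- P[upper.tri(P)]
upper <- mean(w) + 2 * sd(w)

# Keep only the edges whose weight exceeds the data-driven threshold.
backbone <- (P > upper) * 1
diag(backbone) <- 0

# Share of city pairs that remain connected.
mean(backbone[upper.tri(backbone)])
```

The threshold still takes a single value for every pair, so it inherits the limitation of any universal threshold: pairs of large cities clear it far more easily than pairs of small ones.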
To create a signed backbone, we can apply both an upper and a lower threshold value. The
following code returns a backbone in which a positive edge indicates that two senators co-sponsored
more bills than 1 standard deviation above the mean number of co-sponsored bills, and a negative
edge indicates that two senators co-sponsored fewer bills than 1 standard deviation below the mean
number of co-sponsored bills. The graph of the positive edges of this backbone can be seen in fig. 7.4.
> universalbb2 <- universal(senate, upper = function(x) mean(x)+sd(x),
lower = function(x) mean(x)-sd(x), bipartite = TRUE)
Figure 7.4: The positive backbone of the US Senate co-sponsorship network with edges retained
between two senators if they sponsored more bills together than one standard deviation above the
mean.
The resulting graph in fig. 7.4 is much less dense than when using an upper threshold of 0.
Additionally, the polarized structure of the Senate by political party is visible, and is confirmed by a
larger modularity (𝑄 = 0.277). However, it still does not necessarily reveal the underlying structure
of the network among legislators. In this case, “the application of a threshold to the global weight
distribution...belittles nodes with a small [degree],” resulting in a backbone that preserves edges
only among legislators who sponsor many bills, and treating legislators who sponsor few bills as
isolates [SBV09, p. 6484]. Similarly in the world cities network, the universal threshold backbone
extraction does not take into account variations in the number of firms located in each city. By
not controlling for these variations (which are substantial in these data, see Figure 7.2A) when
deciding whether two cities are connected, it privileges cities that contain many firms. Because
these data contain large variations in the number of firms located in each city that must be
controlled for, a universal threshold backbone is not appropriate.
To obtain meaningfully sparse graphs that do not ignore the multi-scalar character of node
degrees, we must allow the threshold to vary for different edges. To improve our backbone results,
we turn to backbone extraction methods for bipartite projections that rely on a distinct threshold
value for each pair of vertices.
Extracting a null model backbone: backbone.extract()
Instead of using a universal threshold to determine a backbone, the backbone package
incorporates the five ensemble models previously mentioned in chapter 6: FFM, FRM,
FCM, SDSM, and FDSM. These models 𝑀 do control for variation in the row and column degree
sequences of B* ∈ ℬ^𝑀. To use these methods in backbone, one first calls an ensemble model
function (fixedfill(), fixedrow(), fixedcol(), sdsm(), or fdsm()), which finds the
probability of observing an edge with the observed weight in the corresponding null model and
returns an object of class ‘backbone’. This object contains the following: a positive matrix with
(𝑖, 𝑗) entry equal to the probability that 𝐺*_𝑖𝑗 is equal to or above the corresponding entry in 𝐺;
a negative matrix with (𝑖, 𝑗) entry equal to the probability that 𝐺*_𝑖𝑗 is equal to or below the
corresponding entry in 𝐺; and summary, a data frame summarizing the input matrix and model,
including the class, model name, number of rows, and number of columns.
This ‘backbone’ object is then supplied to backbone.extract(), which performs the hypothesis
test for a given significance value and returns a backbone graph. The user can input bipartite
graph objects of class ‘matrix’, ‘sparseMatrix’, ‘Matrix’, ‘igraph’, ‘network’, and ‘edgelist’ (a matrix
of two columns), and can choose the type of backbone returned by specifying the desired class in
backbone.extract(). The backbone.extract() function allows the user to input the backbone
class object and obtain either a signed or positive backbone. It has six arguments: matrix,
signed, alpha, class, narrative, and fwer. The matrix argument takes a backbone object
generated by fixedfill(), fixedrow(), fixedcol(), sdsm(), or fdsm(), and the function
returns a backbone graph of class = class using a two-tailed significance test with
significance value 𝛼 = alpha. If the signed parameter is set to TRUE, then a signed backbone is
returned; if it is set to FALSE, then a positive backbone is returned. If the narrative parameter is
set to TRUE, then suggested narrative text for a manuscript, including possible citations, is displayed.
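The two-tailed decision rule can be sketched in base R. The probabilities below are hypothetical values standing in for the positive and negative matrices of a ‘backbone’ object, and the rule shown (retain an edge when a tail probability falls below alpha/2) is a sketch of the logic described here rather than the package's own code.

```r
# Hypothetical upper-tail probabilities Pr(G* >= G) for 3 vertices.
positive <- matrix(c(NA,    0.010, 0.600,
                     0.010, NA,    0.990,
                     0.600, 0.990, NA),
                   nrow = 3, byrow = TRUE)

# Hypothetical lower-tail probabilities Pr(G* <= G) for the same vertices.
negative <- matrix(c(NA,    0.995, 0.450,
                     0.995, NA,    0.020,
                     0.450, 0.020, NA),
                   nrow = 3, byrow = TRUE)

alpha <- 0.05

# Two-tailed test: each tail is compared against alpha / 2.
signed <- matrix(0, nrow = 3, ncol = 3)
signed[!is.na(positive) & positive < alpha / 2] <- 1
signed[!is.na(negative) & negative < alpha / 2] <- -1
signed
```

Setting the negative entries of such a matrix to zero would give the corresponding positive backbone.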
Extracting the backbone of a bipartite projection involves conducting an independent statistical
test on ℓ = 𝑚(𝑚 − 1)/2 edges in the projection, where 𝑚 is the number of vertices in the bipartite
projection. Because each of these tests is independent, this can inflate the familywise error rate
beyond the desired alpha. The fwer parameter offers two ways to correct for this: the classical
Bonferroni correction is applied when fwer = "bonferroni", and the more powerful
Holm-Bonferroni correction is applied when fwer = "holm" [Hol79].
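The difference between the two corrections can be illustrated with the base R function p.adjust(). The p-values below are hypothetical, and the sketch shows the general corrections rather than how the fwer argument applies them internally.

```r
# Hypothetical p-values from four edge-level tests.
p <- c(0.001, 0.012, 0.020, 0.040)
alpha <- 0.05

# Bonferroni: every p-value is multiplied by the number of tests.
bonf <- p.adjust(p, method = "bonferroni") < alpha

# Holm-Bonferroni: a step-down procedure in which the multiplier shrinks
# as the sorted p-values grow.
holm <- p.adjust(p, method = "holm") < alpha

bonf   # tests still significant after the Bonferroni correction
holm   # tests still significant after the Holm-Bonferroni correction
```

In this example the Holm-Bonferroni correction retains more significant tests than the Bonferroni correction at the same alpha, which is the sense in which it is more powerful.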
7.3 Fixed fill model fixedfill()
The fixedfill() function applies the fixed fill ensemble model to the bipartite network.
Due to the large binomial coefficients in the probability distribution, this model as currently
implemented in backbone v1.5.0 is infeasible on large networks like the Senate data set. However,
we can still apply it to the world cities network and do so below. Regardless, as we’ll see in
chapter 8, FFM is not the recommended model for bipartite backbone extraction when
there is concern regarding the degree sequences.
> fixedprobs <- fixedfill(cities)
> fixedbb <- backbone.extract(fixedprobs)
In this null model, the number of edges in the bipartite network is held constant; that is, our
observed world cities network is compared to all other possible networks with the same density.
Specifically, in this instance, the total number of firm presences in cities remains fixed, but the
number of cities per firm and the number of firms per city may vary. Notice that we applied the
backbone.extract() function after calling the fixedfill() function, which determined the
ensemble model. Under the default settings, backbone.extract() has extracted a positive backbone
under an alpha value of 0.05. Since all statistical tests are two-tailed tests, an edge is retained
in the cities network if the probability of two cities having at least the observed number of
co-located firms is less than 0.025, i.e., if the observed number falls in the upper tail of the
Jacobi distribution.
> mean(fixedbb)
[1] 0.07418784
> sort(rowSums(fixedbb), decreasing = TRUE)[1:5]
LONDON NEW YORK PARIS HONG KONG TOKYO
94 87 76 71 71
> cor(rowSums(fixedbb), rowSums(cities))
[1] 0.9293961
This FFM backbone network has a low density but again provides information focused around
the largest cities. The centrality is highly correlated with number of firms. Instead of this model
which compares a bipartite B with other networks of the same density, we’ll now apply the remaining
models which are based upon the degree sequences.
7.4 Fixed row model fixedrow()
To apply the fixed row distribution to a bipartite graph, one uses the fixedrow() function. The
FRM is also often called hypergeometric as it estimates a hypergeometric probability distribution
for each pair of nodes in the network. As an example,
> rowprobs <- fixedrow(senate)
> rowbb <- backbone.extract(rowprobs, alpha = .01)
We can now examine how this method has changed the appearance of our network, focusing
only on the positive edges of the signed backbone in fig. 7.5. We can see that the FRM has reduced
the density of our network and that we begin to see some of the two-party structure that is inherent
in the United States Senate. The known polarized structure is also apparent, which is reflected in
this network’s modularity (𝑄 = 0.215).
Figure 7.5: The positive backbone of the US Senate co-sponsorship network under the fixed row
model.
Specifically, for our example, the fixed row model fixes the number of bills that each senator
sponsors, while allowing each bill to be sponsored by a varying number of senators. The function
computes the probability of each pair of senators co-sponsoring at least (or at most) the observed
number of bills when the bills that each senator sponsors are chosen at random.
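For a single pair of senators, these tail probabilities can be sketched with the base R hypergeometric functions. All of the counts below are hypothetical, and the sketch illustrates the hypergeometric calculation in general rather than the internals of fixedrow().

```r
# Hypothetical counts for one pair of senators.
n_bills <- 20   # total number of bills (columns of B)
r_i     <- 8    # bills sponsored by senator i
r_j     <- 6    # bills sponsored by senator j
w_ij    <- 5    # bills sponsored by both (observed projection weight)

# Treat senator i's bills as the marked items and senator j's bills as a
# random draw of size r_j; the overlap is then hypergeometric.

# Probability of at least w_ij shared bills (upper tail).
p_upper <- phyper(w_ij - 1, r_i, n_bills - r_i, r_j, lower.tail = FALSE)

# Probability of at most w_ij shared bills (lower tail).
p_lower <- phyper(w_ij, r_i, n_bills - r_i, r_j)

p_upper
p_lower
```

Comparing p_upper (or p_lower) with alpha/2 then gives the decision for a positive (or negative) edge between this pair.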
Similarly, we can see how the fixed row model affects the world cities network.
> rowprobs2 <- fixedrow(cities)
> rowbb2 <- backbone.extract(rowprobs2, alpha = .1)
> mean(rowbb2)
[1] 0.09225323
> sort(rowSums(rowbb2), decreasing = TRUE)[1:5]
INDIANAPOLIS PORTLAND MELBOURNE LYON AUCKLAND
60 54 52 49 44
> cor(rowSums(rowbb2), rowSums(cities))
[1] 0.3039028
First, the FRM backbone is less dense than the 𝑇 = 0 universal threshold backbone, but denser than
the 25-threshold or mean-threshold backbones, containing 9.2% of possible edges. That is, this model
does reduce the complexity of the original network, but still preserves many intercity connections.
Second, and perhaps more notably, because the FRM controls for the number of firms in each city
when deciding which intercity connections to keep, it does not simply focus on cities that are large
and contain many firms. Indeed, while the most central cities are major financial centers, they
are not the obvious ones typically highlighted in world cities research. Moreover, cities’ centrality
and total firm count are only modestly correlated (𝑟 = 0.30), indicating that cities’ centrality in
this network provides information that is distinct from what could have been learned by simply
counting their number of firms.
Although the FRM does control for the number of firms in each city (i.e., the row sums of
$\mathbf{B}^* \in \mathcal{B}^{FRM}$), it does not control for the number of cities where each firm
maintains a presence (i.e., the column sums of $\mathbf{B}^* \in \mathcal{B}^{FRM}$). However, there is
substantial variation in the number of cities where each firm maintains a presence (see Figure 7.2B),
and not controlling for this variation can distort decisions about whether a particular city pair’s
number of co-located firms is significant. For example, if Firm X maintains a presence in every city,
then observing that it is co-located in Amsterdam and New York is trivial. In contrast, if Firm Y
maintains a presence in only two cities, then observing that it is co-located in Amsterdam and New
York is quite noteworthy. Because these data contain not only large variations in the number of
firms in each city (see Figure 7.2A) but also large variations in the number of cities where each
firm maintains a presence (see Figure 7.2B), the FRM is not appropriate. More generally, an FRM
backbone and the fixedrow() function are appropriate only when there is variation in the row sums
of $\mathbf{B}$, but limited variation in the column sums of $\mathbf{B}$.
Figure 7.6: The positive backbone of the US Senate co-sponsorship network under the fixed
column model.
7.5 Fixed column model fixedcol()
The fixed column distribution can be used through the fixedcol() function. In this scenario,
the fixed column function fixes the number of senators that sponsor each bill, while allowing each
senator to sponsor a varying number of bills.
> colprobs <- fixedcol(senate)
> colbb <- backbone.extract(colprobs, alpha = .01)
We can now examine how the fixed column model (also called Poisson binomial) has changed
the appearance of our co-sponsorship network, again examining the positive edges in fig. 7.6. We
can see that the fixed column function has again reduced the density of our network and that the
two-party structure is more apparent. The known polarized structure is reflected in this network’s
even higher modularity (𝑄 = 0.424).
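The Poisson binomial name can be illustrated with a small base R sketch. It assumes that, under the fixed column model, the sponsors of each bill are chosen uniformly at random, so that two given senators both sponsor a bill with $c_k$ sponsors with probability $c_k(c_k - 1)/(n(n - 1))$, independently across bills. The helper `poisson_binomial_pmf` and all numbers below are hypothetical illustrations, not the implementation used by fixedcol().

```r
# Exact Poisson binomial pmf via sequential convolution.
# Returns Pr(X = 0), ..., Pr(X = length(probs)) for a sum of independent
# Bernoulli variables with success probabilities probs.
poisson_binomial_pmf <- function(probs) {
  pmf <- 1                                   # empty sum: Pr(X = 0) = 1
  for (p in probs) {
    pmf <- c(pmf * (1 - p), 0) + c(0, pmf * p)
  }
  pmf
}

n_senators <- 10                 # illustrative values only
col_sums   <- c(2, 3, 5, 5, 8)   # number of sponsors of each of five bills

# Probability that two given senators both sponsor bill k when its c_k
# sponsors are chosen uniformly at random from the n senators.
pair_probs <- col_sums * (col_sums - 1) / (n_senators * (n_senators - 1))

pmf <- poisson_binomial_pmf(pair_probs)

observed <- 3
p_upper  <- sum(pmf[(observed + 1):length(pmf)])   # Pr(P*_ij >= observed)

print(round(pmf, 4))
print(p_upper)
```

Because every pair of senators faces the same bill sizes, this null distribution is identical for all pairs, which is one way to see why the model cannot account for differences in how many bills each senator sponsors.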
As noted above, the FRM is not a good choice for the world cities network because of the
substantial variation in the column sums. Here, the FCM controls for this variation in the number
of cities where each firm maintains a presence, but introduces a similar problem in that the row
sums are no longer controlled for. The high correlation between cities’ centrality and their total
number of firms exemplifies this issue.
> colprobs2 <- fixedcol(cities)
> colbb2 <- backbone.extract(colprobs2, alpha = 0.1, signed = FALSE)
> mean(colbb2)
[1] 0.07418784
> sort(rowSums(colbb2), decreasing = TRUE)[1:5]
LONDON NEW YORK PARIS HONG KONG TOKYO
94 87 76 71 71
> cor(rowSums(colbb2), rowSums(cities))
[1] 0.9293961
We’ll now attempt to approach our ‘gold-standard’ model, where we compare our observed data
set to all other bipartite networks with the exact same degree sequences. The backbone package
provides two ways to do this: the SDSM, where the degree sequences are approximately fixed and the
probability mass function is known, and the FDSM, where the probability mass function is unknown
and thus the distribution is constructed through sampling.
7.6 Stochastic degree sequence model sdsm()
When describing the stochastic degree sequence model in chapter 6, we chose probabilities
$p^*_{ik}$ that approximate $\Pr(B^*_{ik} = 1)$ for $\mathbf{B}^* \in \mathcal{B}^{SDSM}$. Here we use
the Bipartite Configuration Model (BiCM) to compute those probabilities for the Poisson binomial
distribution, which is used in the SDSM. In chapter 8, we will demonstrate why BiCM is the right
choice for computing these probabilities.
In the context of the senate co-sponsorship matrix, the stochastic degree sequence model will
compare our observed values to a distribution where each senator sponsors roughly the same number
of bills, and each bill is sponsored by roughly the same number of people, as in the observed data.
Also demonstrated is the ‘narrative’ parameter, which prints out information regarding the backbone
network and the citations for the model used.
> sdsm <- sdsm(senate)
> sdsmbb <- backbone.extract(sdsm, narrative = TRUE, alpha = .01)
Suggested manuscript text and citations:
From a bipartite graph containing 100 agents and 3589 artifacts, we obtained
the weighted bipartite projection, then extracted its binary backbone using
the backbone package (Domagalski, Neal, & Sagan, 2021). Edges were retained
in the backbone if their weights were statistically significant
(alpha = 0.01) by comparison to a null Stochastic Degree Sequence Model
(SDSM; Neal, 2014).
Domagalski, R., Neal, Z. P., and Sagan, B. (2021). backbone: An R Package
for Backbone Extraction of Weighted Graphs. PLoS ONE.
https://doi.org/10.1371/journal.pone.0244363
Neal, Z. P. (2014). The backbone of bipartite projections: Inferring
relationships from co-authorship, co-sponsorship, co-attendance and other
co-behaviors. Social Networks, 39, 84-97.
https://doi.org/10.1016/j.socnet.2014.06.001
We are able to see more of the partisan structure that is suggested to be present in the US
Senate in fig. 7.7, and this visualization provides more information than the extremely dense graph
found using a universal threshold. Moreover, the known polarized structure of the US Senate is
particularly evident, and confirmed by the much larger modularity (𝑄 = 0.471).
Figure 7.7: The positive backbone of the US Senate co-sponsorship network under the stochastic
degree sequence model.
Before examining the entire SDSM world cities backbone, consider how it determines whether
the number of co-located firms is statistically significant for a single city-pair. In fig. 7.8, three
of our ensemble models are drawn. The blue curve shows the number of firms that would be
co-located in Amsterdam and New York if all firms located in cities randomly, but on average
the number of firms in each city did not change and on average the number of cities where each
firm maintains a presence did not change. The SDSM distribution is wider and flatter than the
FDSM distribution, but has nearly the same midpoint. These differences arise because the SDSM
distribution is an approximation of the more targeted FDSM distribution. As an approximation
with a wider distribution, the SDSM is less statistically powerful; we therefore use a more liberal
threshold of statistical significance so that it will more closely mirror the FDSM. The 26 co-located
firms actually observed in Amsterdam and New York fall in the middle of the SDSM distribution,
which indicates that this value is about what might be expected even under random conditions
(i.e., not statistically significant). Therefore, the SDSM backbone does not include a link between
Amsterdam and New York.
Figure 7.8: Null weight distributions generated using the backbone package from the GaWC
Dataset 11.
> sdsm2 <- sdsm(cities)
> sdsmbb2 <- backbone.extract(sdsm2)  # extraction arguments (e.g., alpha, signed) are not shown in the source
> mean(sdsmbb2)
[1] 0.01973136
> sort(rowSums(sdsmbb2), decreasing = TRUE)[1:5]
KANSAS CITY CHARLOTTE RICHMOND INDIANAPOLIS BORDEAUX
24 21 20 18 17
> cor(rowSums(sdsmbb2), rowSums(cities))
[1] -0.1062661
The SDSM backbone is a sparse network, in which medium-sized regional centers are the most
central cities, and cities’ centrality and total firm count are uncorrelated (𝑟 = −0.11).
7.7 Fixed degree sequence model fdsm()
As mentioned in the previous chapter, the fixed degree sequence model first samples a random
bipartite network $\mathbf{B}^* \in \mathcal{B}^{FDSM}$ that preserves both degree sequences using the
curveball algorithm [SUG18]. This bipartite graph $\mathbf{B}^*$ is then projected to obtain a random
weighted bipartite projection $\mathbf{P}^* = \mathbf{B}^* \mathbf{B}^{*\top}$. These two steps are
repeated a number of times to sample the space of possible $P^*_{ij}$. At each iteration, we compare
$P_{ij}$ to the value of $P^*_{ij}$ and keep a record of how often it was above, below, or equal to the
generated value. The fdsm() function returns a backbone object containing a matrix object positive
of the proportion of times $P^*_{ij}$ is equal to or above the corresponding entry in $\mathbf{P}$, and
a matrix object negative containing the proportion of times $P^*_{ij}$ is equal to or below the
corresponding entry in $\mathbf{P}$. This differs from the previous ensemble methods, where the exact
probability mass function is known and a probability can be given.
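The sampling loop described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the implementation inside fdsm(): the helpers `curveball_trade` and `fdsm_sketch`, the toy matrix, and the choice of five trades per row between samples are all hypothetical.

```r
# One curveball-style trade: two rows exchange the columns that only one of
# them occupies, which leaves every row sum and column sum unchanged.
curveball_trade <- function(B) {
  rows   <- sample.int(nrow(B), 2)
  in_a   <- which(B[rows[1], ] == 1)
  in_b   <- which(B[rows[2], ] == 1)
  only_a <- setdiff(in_a, in_b)
  only_b <- setdiff(in_b, in_a)
  if (length(only_a) > 0 && length(only_b) > 0) {
    pool  <- c(only_a, only_b)
    pool  <- pool[sample.int(length(pool))]   # shuffle the tradeable columns
    new_a <- pool[seq_along(only_a)]          # row a keeps its original count
    new_b <- pool[-seq_along(only_a)]         # row b receives the remainder
    B[rows, pool]     <- 0
    B[rows[1], new_a] <- 1
    B[rows[2], new_b] <- 1
  }
  B
}

# Monte Carlo proportions of P*_ij >= P_ij (positive) and P*_ij <= P_ij (negative).
fdsm_sketch <- function(B, trials = 200, trades = 5 * nrow(B)) {
  P <- B %*% t(B)                             # observed projection
  at_or_above <- matrix(0, nrow(P), ncol(P))
  at_or_below <- matrix(0, nrow(P), ncol(P))
  B_star <- B
  for (trial in seq_len(trials)) {
    for (s in seq_len(trades)) B_star <- curveball_trade(B_star)
    P_star <- B_star %*% t(B_star)            # projection of the random graph
    at_or_above <- at_or_above + (P_star >= P)
    at_or_below <- at_or_below + (P_star <= P)
  }
  list(positive = at_or_above / trials,
       negative = at_or_below / trials,
       B_star   = B_star)
}

set.seed(1)
B   <- matrix(rbinom(8 * 12, 1, 0.4), nrow = 8)   # toy 8 x 12 bipartite matrix
out <- fdsm_sketch(B)
print(round(out$positive, 2))
```

Because the proportions are estimated from a finite number of trials, their precision depends on the number of trials, which is why the examples in this chapter use 1000 or more.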
The fdsm() function can also save each value of $P^*_{ij}$ for a given $i, j$. This is useful for
visualizing an example of the empirical null edge weight distribution generated by the model. The
values $i, j$ correspond to the row and column indices of a cell in the projected matrix and can be
input as either numeric values or a string containing the row names. These values are returned in
the list dyad_values.
Using the fixed degree sequence model on the senate data set will allow us to compare our
observed values to a distribution where each senator sponsors exactly the same number of bills, and
each bill is sponsored by exactly the same number of people, as in the observed data. We can find
the backbone using the fixed degree sequence model as follows:
> fdsm <- fdsm(senate, trials = 1000, dyad = c("Booker, C. (NJ-D)",
"Warren, E. (MA-D)"))
The dyad_values output is a list of the $P^*_{ij}$ values for each of the 1000 trials, where $i$ =
“Booker, C. (NJ-D)” and $j$ = “Warren, E. (MA-D)”. These values correspond to the number of bills
Senators Booker and Warren would be expected to co-sponsor when we create a random bipartite
graph with the curveball algorithm where: (a) the number of bills sponsored by Senator Booker, by
Senator Warren, and by all other senators is fixed, and (b) the number of senators sponsoring each
bill is fixed. We can compare their actual number of co-sponsorships, 98, to what is generated
under our null model. We can view a histogram of the expected co-sponsorships generated in each
of the 1000 trials as follows (see fig. 7.9):
> hist(unlist(fdsm$dyad_values), freq = FALSE, xlab = "Number of Co-Sponsorships")
> lines(density(unlist(fdsm$dyad_values)))
> fdsmbb <- backbone.extract(fdsm, alpha = 0.01, signed = TRUE)
Figure 7.9: A histogram of the expected co-sponsorships between Senators Cory Booker and
Elizabeth Warren under the fixed degree sequence model (1000 samples). A positive edge
between Booker and Warren would be preserved in the FDSM backbone because their actual
number of co-sponsorships (98) is statistically significantly larger.
The FDSM backbone, based on 1000 Monte Carlo samples, requires approximately 81 seconds
to extract. Using the fixed degree sequence model allows us to see more of the partisan structure
we assume to be present in the United States Senate in fig. 7.10. This expected partisan structure
is confirmed by the backbone’s high modularity (𝑄 = 0.468).
Figure 7.10: The positive backbone of the US Senate co-sponsorship network under the fixed
degree sequence model.
The spatial network backbone extracted using FDSM is noticeably different from the other
networks extracted using FFM, FRM, FCM, and SDSM.
> fdsm2 <- fdsm(cities, trials = 10000)
> fdsmbb2 <- backbone.extract(fdsm2, alpha = 0.1, signed = FALSE)
> mean(fdsmbb2)
[1] 0.02207414
> sort(rowSums(fdsmbb2), decreasing = TRUE)[1:5]
KANSAS CITY CHARLOTTE INDIANAPOLIS RICHMOND BORDEAUX
24 21 20 20 17
> cor(rowSums(fdsmbb2), rowSums(cities))
[1] -0.001015871
> cor(as.vector(fdsmbb2),as.vector(sdsmbb2))
[1] 0.9315762
First, it has a very low density, containing only 2.2% of possible edges. Second, the cities with
the highest centrality are medium-sized regional centers. Moreover, cities’ centrality and total firm
count are uncorrelated (𝑟 = −0.001), indicating that the FDSM backbone is detecting interaction
patterns unrelated to a city’s number of firms. Importantly, the patterns of intercity links in the
SDSM and FDSM backbones are highly correlated (𝑟 = 0.93).
The original bipartite firm location data are known to contain substantial variation in both the
number of firms in each city (see Figure 7.2A) and the number of cities where each firm maintains
a presence (see Figure 7.2B). Because the FDSM controls for variation in these
two characteristics, it is an appropriate model to use for backbone extraction in this case. Using
it yields a world city network backbone that contains only those intercity links that are not simply
the product of these characteristics. That is, the FDSM backbone allows world city researchers
to look beyond these characteristics to identify pairs of cities with unexpectedly large numbers of
firm co-locations, which are potentially indicative of unexpectedly strong economic interaction.
More generally, the FDSM and fdsm() function are appropriate when there is variation in both
the row sums of B and the column sums of B, which is likely to occur in most empirical bipartite
data. However, although the FDSM may often be the most suitable model for empirical data,
its simulation-based approach can be impractically slow when applied to bipartite data containing
many agents and artifacts. As we’ll see in chapter 8, in such cases the SDSM
is the recommended alternative. Additionally, we’ll investigate the relationship between the alpha
values used in SDSM and those used in FDSM. In the future, the backbone R package will also be
home to additional backbone extraction methods, adding functionality for weighted networks that
are not bipartite projections.
CHAPTER 8
COMPARING MODELS FOR BACKBONE EXTRACTION
All results in this chapter are from Neal, Domagalski, and Sagan [NDS21b]. Replication materials
are available at https://www.github.com/domagal9/dissertation.
In this chapter we compare the different bipartite ensemble backbone models. We begin
by examining different methods for choosing the cell-filling probabilities in SDSM. As mentioned
in chapter 7, this study concludes that the Bipartite Configuration Model is the best choice for
these values. Having defined the SDSM, we study its
statistical power as compared to the FDSM backbones. Again, we’ll use the world cities network for
this analysis. Following this comparison, we evaluate each of the five different models under
varying degree distributions, examining their speed, accuracy, similarity, and community
detection. The culmination of these studies allows us to recommend that, in general,
SDSM is the correct backbone extraction method to use for most bipartite projections.
8.1 Study 1: Choosing cell-filling probabilities for the SDSM
The SDSM requires choosing $p^*_{ik}$, which we want to approximate $\Pr(B^*_{ik} = 1)$ for
$\mathbf{B}^* \in \mathcal{B}^{FDSM}$. There are three types of methods that might be used for doing so:
arithmetic, general linear models, and entropy maximization. First, we can choose
$p^*_{ik} = (r_i \times c_k)/f$, where $r_i$ is the sum of entries in row $i$ of $\mathbf{B}$, $c_k$ is
the sum of entries in column $k$ of $\mathbf{B}$, and $f$ is the sum of all entries in $\mathbf{B}$. When
$p^*_{ik}$ falls outside the [0, 1] range, it is truncated toward 0 or 1, respectively [Got00]. We call
this method RCF because the value is chosen based on a row sum, a column sum, and the number of
entries of $\mathbf{B}$ that are filled with a one. Second, an estimate can be obtained by fitting a
general linear model of the form:
$B_{ik} = \beta_0 + \beta_1 r_i + \beta_2 c_k + \epsilon$, or
$B_{ik} = \beta_0 + \beta_1 r_i + \beta_2 c_k + \beta_3 r_i c_k + \epsilon$,
where the $\beta$’s are estimated coefficients and $\epsilon$ is an error term. If the model is treated
as a linear regression and the coefficients are estimated using ordinary least squares, then the
predicted value of $B_{ik}$ is chosen for $p^*_{ik}$, either truncating values outside the required
[0, 1] range (linear probability model; LPM) or transforming them into the required range using a
linear discriminant model (LDM) [AWvH20]. If the model is treated as a logistic regression and the
coefficients are estimated using maximum likelihood, then the predicted probability that
$B_{ik} = 1$ is chosen for $p^*_{ik}$. In prior work, the logistic regression approach has used a
scobit or logit link function, with or without an interaction term ($\beta_3$) [Nea14, SB20, Nea20].
Finally, an estimate can be obtained by entropy maximization methods, including the polytope
method (Poly) [DNS21, NDY22] or bipartite configuration model (BiCM) [SSDC+17]. In this study, we
evaluate the accuracy and speed of these methods for choosing $p^*_{ik}$ that approximate
$\Pr(B^*_{ik} = 1)$ for $\mathbf{B}^* \in \mathcal{B}^{FDSM}$.
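As a rough illustration of the regression-based options, the base R sketch below fits the model without an interaction term to a simulated bipartite matrix, once by ordinary least squares (LPM, with truncation) and once as a logistic regression with a logit link. The simulated matrix and all object names (`cells`, `p_lpm`, `p_logit`) are assumptions made for the example; this is not the code used in the study.

```r
set.seed(42)
n_agents    <- 20
n_artifacts <- 15

# Toy bipartite matrix with heterogeneous row and column propensities.
prob <- outer(runif(n_agents, 0.1, 0.9), runif(n_artifacts, 0.1, 0.9))
B    <- matrix(rbinom(n_agents * n_artifacts, 1, prob), nrow = n_agents)

# One observation per cell (i, k): the cell value, its row sum r_i, and its
# column sum c_k, all stored in column-major order to match as.vector(B).
cells <- data.frame(
  b  = as.vector(B),
  rs = rowSums(B)[as.vector(row(B))],
  cs = colSums(B)[as.vector(col(B))]
)

# LPM: ordinary least squares, with predictions truncated into [0, 1].
lpm   <- lm(b ~ rs + cs, data = cells)
p_lpm <- matrix(pmin(pmax(fitted(lpm), 0), 1), nrow = n_agents)

# Logit: logistic regression fitted by maximum likelihood; fitted values are
# already probabilities, so no truncation is needed.
logit   <- glm(b ~ rs + cs, data = cells, family = binomial)
p_logit <- matrix(fitted(logit), nrow = n_agents)

print(round(p_lpm[1:3, 1:5], 3))
print(round(p_logit[1:3, 1:5], 3))
```

Adding the interaction term would amount to changing the formula to `b ~ rs * cs` in either call.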
8.1.1 Methods
To evaluate accuracy, we begin by enumerating all the members of a small $\mathcal{B}^{FDSM}$. For
example, given an agent degree sequence of [1, 1, 2] and an artifact degree sequence of [1, 1, 2],
$\mathcal{B}^{FDSM}$ contains 5 members (see Table 8.1A). Second, from this complete enumeration, we
compute the probabilities we wish $p^*_{ik}$ to approximate (i.e., $\Pr(B^*_{ik} = 1)$ for
$\mathbf{B}^* \in \mathcal{B}^{FDSM}$, see Table 8.1B). Third, we compute $p^*_{ik}$ using each of nine
methods (see Table 8.1C for values obtained using the BiCM method). Finally, we quantify the
accuracy with which $p^*_{ik}$ approximates the desired probabilities using the mean absolute
difference over all $i, k$. In the example shown in Table 8.1, BiCM’s accuracy for these degree
sequences is 0.028. That is, on average $p^*_{ik}$ chosen using BiCM deviates from the desired
probabilities by ± 0.028. Because evaluating accuracy in this way requires enumerating all
members of $\mathcal{B}^{FDSM}$, it is possible only for short degree sequences that define
$\mathcal{B}^{FDSM}$ with small cardinality. We focus on degree sequences ranging in length from 2 to
5, which define 384 unique $\mathcal{B}^{FDSM}$ ranging in cardinality from 4 to 2040.
After identifying each method’s accuracy, we evaluate the computational running time of the
four most accurate methods by using them to choose $p^*_{ik}$ for bipartite graphs defined by up to
3162 agents and up to 3162 artifacts, and thus requiring choosing up to 10,000,000 probabilities.
(A) Members of $\mathcal{B}^{FDSM}$
1 0 0    0 0 1    0 0 1    0 0 1    0 1 0
0 0 1    1 0 0    0 0 1    0 1 0    0 0 1
0 1 1    0 1 1    1 1 0    1 0 1    1 0 1
(B) Desired probabilities    (C) $p^*_{ik}$ computed using BiCM
0.2 0.2 0.6                  0.216 0.216 0.568
0.2 0.2 0.6                  0.216 0.216 0.568
0.6 0.6 0.8                  0.568 0.568 0.863
Table 8.1: SDSM probabilities given agent and artifact degree sequences [1,1,2]
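The enumeration behind Table 8.1 is small enough to reproduce by brute force. The base R sketch below lists every binary 3 × 3 matrix, keeps those with row and column sums [1, 1, 2], and computes the desired probabilities and the mean absolute difference. The BiCM values are copied from Table 8.1C rather than recomputed, the RCF values follow the formula given in section 8.1, and the object names are hypothetical.

```r
row_sums <- c(1, 1, 2)   # agent degree sequence
col_sums <- c(1, 1, 2)   # artifact degree sequence

# Enumerate all 2^9 binary 3 x 3 matrices and keep those with the right degrees.
cells   <- as.matrix(expand.grid(rep(list(0:1), 9)))
members <- list()
for (idx in seq_len(nrow(cells))) {
  M <- matrix(cells[idx, ], nrow = 3)
  if (all(rowSums(M) == row_sums) && all(colSums(M) == col_sums)) {
    members[[length(members) + 1]] <- M
  }
}
print(length(members))   # cardinality of the ensemble

# Desired probabilities: the share of members in which each cell equals 1.
desired <- Reduce(`+`, members) / length(members)
print(desired)

# RCF probabilities (r_i * c_k) / f, truncated at 1.
rcf <- pmin(outer(row_sums, col_sums) / sum(row_sums), 1)

# BiCM probabilities copied from Table 8.1C.
bicm <- matrix(c(0.216, 0.216, 0.568,
                 0.216, 0.216, 0.568,
                 0.568, 0.568, 0.863), nrow = 3, byrow = TRUE)

# Accuracy: mean absolute difference from the desired probabilities.
accuracy <- function(p) mean(abs(p - desired))
print(round(c(RCF = accuracy(rcf), BiCM = accuracy(bicm)), 3))
```

The same brute-force approach becomes infeasible quickly, since the number of candidate matrices doubles with every additional cell, which is why the study is limited to degree sequences of length 2 to 5.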
8.1.2 Results
Figure 8.1A shows the accuracy of each method’s computation of $p^*_{ik}$. Each gray line plots the
accuracy of each method for a single $\mathcal{B}^{FDSM}$, while the red line plots the mean accuracy
of each method over all 384 $\mathcal{B}^{FDSM}$. We find that choosing $p^*_{ik}$ using a logistic
regression with an interaction term (i.e., Scobit-I and Logit-I) is on average least accurate
[Nea14, Nea20], while choosing $p^*_{ik}$ using entropy maximization (i.e., BiCM and Poly) is on
average most accurate [DNS21, SDCGS15].
Figure 8.1B shows the number of seconds required to compute $p^*_{ik}$ using a 2.3 GHz Intel i7
processor. Among the two most accurate methods, BiCM is several orders of magnitude faster
than Polytope. When computing more than $10^4$ probabilities, BiCM is also faster than the two
slightly less accurate Logit and LDM methods. In the largest case we evaluated, computing $10^7$
probabilities, BiCM took only about 0.3 seconds. Therefore, we use BiCM for choosing $p^*_{ik}$ when
extracting SDSM backbones in the remaining studies because it is both the most accurate and the
fastest. In previous versions of the R package backbone, different methods for determining these
probabilities were included. However, based on these results, the sdsm() function now uses the
BiCM method.
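For readers who want to see what a BiCM computation involves, the base R sketch below solves for probabilities of the form $p^*_{ik} = x_i y_k / (1 + x_i y_k)$ whose row and column sums match the observed degrees. Both this functional form and the simple fixed-point iteration are taken from the general BiCM literature and are assumptions here; the helper `bicm_probs` is hypothetical and is not the solver used by sdsm(). It also assumes every degree is strictly between zero and the maximum possible.

```r
# Fixed-point iteration for BiCM-style probabilities
# p_ik = x_i * y_k / (1 + x_i * y_k), constrained so that the expected row
# sums equal row_sums and the expected column sums equal col_sums.
bicm_probs <- function(row_sums, col_sums, tol = 1e-12, max_iter = 10000) {
  x <- row_sums / sqrt(sum(row_sums))   # starting values
  y <- col_sums / sqrt(sum(col_sums))
  for (iter in seq_len(max_iter)) {
    inv   <- 1 / (1 + outer(x, y))               # 1 / (1 + x_i y_k)
    new_x <- row_sums / as.vector(inv %*% y)     # r_i / sum_k y_k / (1 + x_i y_k)
    new_y <- col_sums / as.vector(x %*% inv)     # c_k / sum_i x_i / (1 + x_i y_k)
    delta <- max(abs(new_x - x), abs(new_y - y))
    x <- new_x
    y <- new_y
    if (delta < tol) break
  }
  xy <- outer(x, y)
  xy / (1 + xy)
}

p_bicm <- bicm_probs(c(1, 1, 2), c(1, 1, 2))
print(round(p_bicm, 3))      # compare with Table 8.1C
print(rowSums(p_bicm))       # expected agent degrees
print(colSums(p_bicm))       # expected artifact degrees
```

For the degree sequences of Table 8.1, the rounded result can be checked against panel C of that table. This plain iteration converges only linearly, so a production solver would likely use a faster method.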
[Figure 8.1. Panel A: accuracy (mean absolute difference) by probability estimation method
(Logit-I, Scobit-I, RCF, LPM, Scobit, LDM, Logit, Poly, BiCM), for one $\mathcal{B}^{FDSM}$ and as the
mean for all 384 $\mathcal{B}^{FDSM}$. Panel B: seconds to estimate probabilities by number of
probabilities to estimate ($10^2$ to $10^7$) for Logit, LDM, Poly, and BiCM.]
Figure 8.1: (A) Accuracy and (B) speed computing $p^*_{ik}$ using different methods.
8.2 Study 2: Statistical power of SDSM
Ensemble backbone models require the specification of a statistical significance level $\alpha$, which
determines how uncommonly large an observed edge weight $P_{ij}$ must be when compared to edge
weights $P^*_{ij}$ arising from an ensemble in order for a corresponding edge to be included in the
backbone. For a given model, smaller values of $\alpha$ represent more stringent criteria for retaining
edges, and therefore yield sparser backbones. Although FDSM and SDSM define their respective
ensembles by constraining both agent and artifact degree sequences, and thus aim to yield similar
backbones, a given $\alpha$ does not necessarily represent the same level of stringency in these two
models. Because the SDSM allows variation in the degree sequences of
$\mathbf{B}^* \in \mathcal{B}^{SDSM}$, the distribution of $P^*_{ij}$ is wider. These wider distributions
mean that the SDSM provides a more conservative test of edge weight significance than FDSM, or
alternatively the SDSM has less statistical power to detect significant edges than FDSM.
A concrete example serves to illustrate this difference. As in chapter 7, we study the world city
network using a bipartite projection where two cities are linked to the extent that firms maintain
locations in both cities. Recall that the Globalization and World Cities (GaWC) data set takes the
form of a bipartite network recording the presence or absence of 100 firms (artifacts) in 196 cities
(agents) in the year 2000 [TCW02, NDS21a]. In this bipartite network, the agent degrees are
right-tailed because most cities contain only a few firms, while a few cities such as New York
contain many (see fig. 7.2). Likewise, the artifact degrees are also right-tailed because most firms
maintain locations in only a few cities, while a few firms such as the accounting firm KPMG
maintain locations in many.
Figure 8.2A illustrates the distribution of the Milan-Paris edge weight in projections arising
from $\mathcal{B}^{FDSM}$ and $\mathcal{B}^{SDSM}$ of which the observed bipartite network is a member
(i.e., the random variable $P^*_{ij}$). These distributions allow a researcher to decide whether Milan
and Paris’s observed number of co-located firms is significantly large, and therefore whether Milan
and Paris should be connected in a world city network backbone. The SDSM distribution is wider
than the FDSM distribution, which has implications for whether the Milan-Paris edge will be
included in a backbone extracted at a given significance level using each model. In the observed
data, there are 26 firms co-located in Milan and Paris (i.e., $P_{ij} = 26$). The probability of
observing the same or larger edge weight in projections from the FDSM ensemble is 0.0033, which is
less than $0.05/2 = 0.025$, and therefore a Milan-Paris edge is deemed significant by the FDSM and is
included in the FDSM backbone extracted at $\alpha = 0.05$. In contrast, the probability of observing
the same or larger edge weight in projections from the SDSM ensemble is 0.0275, which is not less
than $0.05/2 = 0.025$, and therefore a Milan-Paris edge is not deemed significant by the SDSM and is
not included in the SDSM backbone extracted at $\alpha = 0.05$. For a given level of significance
$\alpha$, this difference in statistical power leads the SDSM backbone to be sparser than the FDSM
backbone (density = 0.004 vs. 0.012), and means that these two backbones are dissimilar
(Jaccard = 0.36).
In this study, we investigate SDSM’s statistical power relative to FDSM, and specifically whether
extracting an SDSM backbone using a more liberal (i.e., larger) 𝛼 makes it more similar to an FDSM
backbone extracted at 𝛼 = 0.05.
8.2.1 Methods
To evaluate SDSM’s statistical power and the effect of significance levels on the similarity of SDSM
and FDSM backbones, we first extracted the FDSM backbone from the GaWC bipartite network
at $\alpha = 0.05$. We then extracted several SDSM backbones from the GaWC bipartite network at
$0.01 \le \alpha \le 0.3$ in 0.001 increments, each time computing the Jaccard index ($J$) to measure
the similarity between the SDSM and FDSM backbones. The Jaccard index is the number of edges
that $\mathbf{P}'_{SDSM}$ and $\mathbf{P}'_{FDSM}$ have in common divided by the number of edges that
appear in at least one of them. After comparing SDSM and FDSM
backbones extracted from the empirical GaWC bipartite network, we repeat this process using 100
synthetic bipartite networks with the same dimensions (196 × 100), density (0.08), and right-tailed
agent and artifact degree distributions.
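The Jaccard index used here can be computed directly from two binary backbone adjacency matrices. The base R sketch below assumes undirected backbones stored as symmetric 0/1 matrices over the same nodes; the helpers `jaccard_edges` and `adjacency_from_edges` and the two toy backbones are hypothetical and serve only to illustrate the calculation.

```r
# Jaccard index of two undirected binary backbones on the same nodes:
# (edges present in both) / (edges present in at least one).
jaccard_edges <- function(A, B) {
  a <- A[upper.tri(A)] != 0      # count each unordered pair once
  b <- B[upper.tri(B)] != 0
  union_size <- sum(a | b)
  if (union_size == 0) return(NA_real_)   # undefined when both are empty
  sum(a & b) / union_size
}

# Build a symmetric adjacency matrix from a list of node pairs.
adjacency_from_edges <- function(n, edges) {
  A <- matrix(0, n, n)
  for (e in edges) {
    A[e[1], e[2]] <- 1
    A[e[2], e[1]] <- 1
  }
  A
}

# Toy backbones on four nodes (illustrative only).
bb_fdsm <- adjacency_from_edges(4, list(c(1, 2), c(1, 3), c(2, 3)))
bb_sdsm <- adjacency_from_edges(4, list(c(1, 2), c(2, 3), c(3, 4)))

# Two shared edges out of four edges in the union.
print(jaccard_edges(bb_sdsm, bb_fdsm))
```

Repeating this calculation for SDSM backbones extracted over a grid of significance levels yields a similarity curve of the kind described in the results.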
8.2.2 Results
The green line in Figure 8.2B shows the Jaccard similarity between an FDSM backbone extracted
from the empirical GaWC network at $\alpha = 0.05$ and SDSM backbones extracted at the significance
levels shown on the x-axis. We find that an SDSM backbone achieves its maximum similarity
to the FDSM backbone ($J = 0.81$) when it is extracted using the more liberal significance level
of $\alpha = 0.12$. Returning to the example in Figure 8.2A, using this more liberal significance level
would result in the Milan-Paris edge being deemed significant and included in the SDSM backbone
because its SDSM p-value is $0.0275 < 0.12/2 = 0.06$. Because this more liberal significance level
results in the inclusion of additional edges, the new SDSM backbone extracted at $\alpha = 0.12$ has a
density (0.01) that is closer to that of the FDSM backbone extracted at $\alpha = 0.05$ (0.012).
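The three decisions about the Milan-Paris edge can be checked with a few lines of base R. The sketch assumes the comparison against $\alpha/2$ used in the example; `retain_edge` is a hypothetical helper, and the p-values are those reported for Figure 8.2A.

```r
# An edge is retained when its upper-tail p-value is below alpha / 2.
retain_edge <- function(p_value, alpha) p_value < alpha / 2

p_fdsm <- 0.0033   # Milan-Paris upper-tail probability under FDSM
p_sdsm <- 0.0275   # Milan-Paris upper-tail probability under SDSM

decisions <- c(
  fdsm_at_0.05 = retain_edge(p_fdsm, 0.05),   # 0.0033 < 0.025
  sdsm_at_0.05 = retain_edge(p_sdsm, 0.05),   # 0.0275 is not < 0.025
  sdsm_at_0.12 = retain_edge(p_sdsm, 0.12)    # 0.0275 < 0.06
)
print(decisions)
```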
The purple line in Figure 8.2B shows the mean Jaccard similarity between an FDSM backbone
extracted using $\alpha = 0.05$ and SDSM backbones extracted using $0.01 \le \alpha \le 0.3$ from 100
bipartite networks generated to resemble the empirical GaWC network. The shaded purple region
shows the 10th and 90th percentiles of the Jaccard similarities of these backbones. We find that
these synthetic networks behave similarly to the empirical network. Specifically, SDSM and FDSM
backbones extracted from a low-density 196 × 100 bipartite network with right-tailed degree
distributions achieve a maximum similarity of $0.49 < J < 0.76$ when the FDSM backbone is extracted
using $\alpha = 0.05$ and the SDSM backbone is extracted using $\alpha = 0.14$. This is promising
because it suggests that, given the characteristics of an empirical bipartite network, it may be
possible to select a significance level for extracting a computationally efficient SDSM backbone
that closely resembles a computationally infeasible FDSM backbone.
Figure 8.2: Statistical power of SDSM. (A) Distribution of weights for the Paris-Milan edge in
projections derived from FDSM and SDSM ensembles. (B) Similarity of an FDSM backbone
extracted at $\alpha = 0.05$ to SDSM backbones extracted at various $\alpha$ from an empirical bipartite
network (green line) and from 100 synthetic bipartite networks (purple line = mean, purple region
= 10th–90th percentile).
8.3 Study 3: Backbone equivalence under varying degree distributions
Agent and artifact degree distributions are a key feature of a bipartite network, and are known to
have implications for bipartite projections [VFO20, DNS21, NDS21a]. The FDSM is particularly
appealing because it allows decisions about the significance of edges in a projection to be conditioned
on both bipartite degree sequences, thereby taking into account these important features.
However, because the computational requirements of the FDSM make it impractical for extracting
the backbone from most bipartite projections, it is often necessary to use a different backbone
model. In this study, we evaluate the equivalence of an FDSM backbone and backbones extracted
using more computationally efficient models. We perform this comparison for backbones extracted
from bipartite networks characterized by five types of degree distributions: right-tailed, left-tailed,
normal, constant, and uniform.
Degree distribution | Authors (agents) | Papers (artifacts)
Right-tailed ∼ 𝛽(1, 10) | Most write some papers, but a few are prolific (most departments). | Most papers are sole-authored, but some are written by large teams (e.g., sociology).
Left-tailed ∼ 𝛽(10, 1) | Most are prolific, but some are inactive (elite departments). | Most papers are written by large teams, but some are sole-authored (e.g., physics).
Uniform ∼ 𝛽(1, 1) | There is substantial diversity in scholarly output (e.g., interdisciplinary departments). | There is substantial diversity in the size of authorship teams (e.g., an entire university).
Constant ∼ 𝛽(10000, 10000) | There are strong norms about how many papers an author should have (e.g., for performance evaluations). | There are strong norms about how many authors a paper should have (e.g., a senior author & a junior author).
Normal ∼ 𝛽(10, 10) | Scholarly output varies around some typical level. | Authorship teams vary around some typical size.
Table 8.2: Bipartite degree distributions, with examples in the context of a scholarly authorship
bipartite network
For the sake of concreteness, in this section we use the example of a bipartite network in
which authors (agents) are linked to the papers they have written (artifacts). The projection of
such a network yields a co-authorship network in which the edge weight between a pair of authors
indicates their number of co-authored papers [New01]. These edge weight values will depend
heavily on the distribution of papers written by authors (i.e., the agent degree sequence), and on
the distribution of authors on each paper (i.e., the artifact degree sequence). Different degree
distributions describe different kinds of scholarly environments as shown in Table 8.2. The choice
of a backbone model affects whether these distributions are considered, and in this example affects
whether decisions about the significance of two authors’ number of co-authored papers consider
the scholarly environment. The FDSM compares their observed number of co-authored papers to
the number that might be observed in alternative realizations of the same environment, while other
backbone models relax the extent to which the environment is held constant.
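As a minimal illustration of the projection described above (a sketch only, not the code used in these studies; the small matrix and all variable names are made up for this example), the co-authorship weights can be obtained by multiplying the binary author-by-paper matrix by its transpose. The row and column sums of the same matrix are the agent and artifact degree sequences.

```python
import numpy as np

# Hypothetical bipartite matrix: 3 authors (rows) x 4 papers (columns);
# an entry of 1 means the author wrote the paper.
B = np.array([
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
])

# Projection: P[i, j] is the number of papers co-authored by authors i and j.
# The diagonal P[i, i] equals the number of papers written by author i.
P = B @ B.T

agent_degrees = B.sum(axis=1)     # papers per author
artifact_degrees = B.sum(axis=0)  # authors per paper

print(P)
print(agent_degrees, artifact_degrees)
```

Here authors 0 and 1 share two papers, authors 0 and 2 share one, and authors 1 and 2 share none. A backbone model then asks whether each of these weights is larger than expected given the constraints it holds fixed.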
8.3.1 Methods
We evaluate similarities among the backbones extracted using different models by comparing
backbones extracted from synthetic 100 × 100 bipartite networks with a density of 0.1 and with each
combination of the agent and artifact degree distributions shown in Table 8.2. Following our example,
these synthetic bipartite networks might represent a college of 100 faculty who collectively wrote
100 papers, in a particular type of scholarly environment where each individual had a 10% chance
of being an author on each paper. After generating a bipartite network with a given size, density,
and degree distributions, we extract five different backbones from the generated bipartite network,
using the fixed fill model, fixed row model, fixed column model, stochastic degree sequence model,
and fixed degree sequence model; in all cases we use 𝛼 = 0.05. We compute the similarity of the
first four backbones to the FDSM backbone using a Jaccard index, repeating this process 100 times
for each of the 25 possible combinations of agent and artifact degree distributions.
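The Jaccard index used for these comparisons is the number of edges present in both backbones divided by the number of edges present in at least one of them. The following is a minimal sketch, assuming each backbone is stored as a binary, symmetric, unsigned adjacency matrix; the function names and the toy backbones are hypothetical, and the treatment of two empty backbones is a convention chosen for this example.

```python
import numpy as np

def jaccard(backbone_a, backbone_b):
    """Jaccard index of the edge sets of two binary, symmetric backbones."""
    upper = np.triu_indices_from(backbone_a, k=1)  # count each undirected edge once
    a = backbone_a[upper].astype(bool)
    b = backbone_b[upper].astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # assumption: two empty backbones are treated as identical
    return np.logical_and(a, b).sum() / union

def edges_to_matrix(n, edges):
    """Build a symmetric 0/1 adjacency matrix from a list of (i, j) pairs."""
    A = np.zeros((n, n), dtype=int)
    for i, j in edges:
        A[i, j] = A[j, i] = 1
    return A

# Toy backbones on 4 agents (illustrative values only).
fdsm_backbone = edges_to_matrix(4, [(0, 1), (1, 2), (2, 3)])
other_backbone = edges_to_matrix(4, [(0, 1), (1, 2), (0, 3)])

# 2 shared edges out of 4 distinct edges.
print(jaccard(fdsm_backbone, other_backbone))
```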
8.3.2 Results
The heatmaps in Figure 8.3 illustrate the similarity between an FDSM backbone and a backbone
extracted using an alternative model. The rows of each heatmap correspond to different agent
degree distributions, and the columns correspond to different artifact degree distributions, in the
synthetic bipartite networks from which the backbones were extracted. The lightest patches identify
conditions under which a given backbone model yields a backbone that is similar to what would be
obtained using the computationally costly FDSM, while darker patches identify conditions under
which these two backbones differ. We find that when agent degrees are constant (i.e., every agent
has the same degree) and artifact degrees are constant or left-tailed, all backbone models yield the
same backbone as FDSM (Mean 𝐽 = 1). However, beyond this special case, which is likely to be
rare in empirical data, similarity to FDSM-extracted backbones varies.
As expected, the similarity of backbones extracted using FRM and FDSM depends primarily on
the distribution of artifact degrees, not agent degrees (see Figure 8.3B). For example, for any agent
degree distribution, these two models yield very different backbones when artifact degrees follow a
right-tailed distribution (Mean 𝐽 = 0.186), but very similar backbones when artifact degrees follow
a normal distribution (Mean 𝐽 = 0.863). This occurs because both models exactly control for agent
degrees, but FDSM also controls for artifact degrees, while FRM does not.
[Figure 8.3: four heatmaps, (A) Fixed Fill Model, (B) Fixed Row Model, (C) Fixed Column Model, (D) Stochastic Degree Sequence Model. In each panel the rows are agent degree distributions and the columns are artifact degree distributions (Right, Unif, Cons, Left, Norm), and cell shading gives the Jaccard index on a scale from 0.00 to 1.00.]
Figure 8.3: Jaccard similarity of a backbone extracted at 𝛼 = 0.05 using the Fixed Degree
Sequence Model and a backbone extracted using (A) the Fixed Fill Model, (B) Fixed Row Model,
(C) Fixed Column Model, (D) Stochastic Degree Sequence Model. Each cell represents the mean
over 100 instances of a 100 × 100 bipartite network with given agent and artifact degree
distributions.
A similar but rotated pattern emerges when considering the FCM: the similarity of backbones
extracted using FCM and FDSM depends primarily on the distribution of agent degrees, not artifact
degrees (see Figure 8.3C). For any artifact degree distribution, these two models yield very different
backbones when agent degrees follow a right-tailed or uniform distribution (Mean 𝐽 = 0.084), but
more similar backbones when agent degrees follow a left-tailed distribution or are constant (Mean
𝐽 = 0.617). This occurs because both models exactly control for artifact degrees, but FDSM
also controls for agent degrees, while FCM does not. However, there is a notable exception to this
general pattern: when artifact degrees follow a uniform distribution, FCM and FDSM always yield
different backbones (Mean 𝐽 = 0.151).
The conditions under which the FFM yields FDSM-similar backbones occur at the intersection
of the conditions under which the FRM and FCM both yield FDSM-equivalent backbones (see
Figure 8.3A). When artifact degrees follow a right-tailed distribution and/or the agent degrees follow
a right-tailed or uniform distribution, then FFM and FDSM backbones differ (Mean 𝐽 = 0.1). In
contrast, for other combinations of degree distributions, FFM and FDSM backbones are more
similar (Mean 𝐽 = 0.724).
Finally, as expected based on the findings from study 2, we observe that the SDSM generally
yields different backbones than FDSM when both are extracted at 𝛼 = 0.05 (see Figure 8.3D).
Specifically, except in the narrow case where agent degrees are constant and artifact degrees are
constant or left-tailed (Mean 𝐽 = 1), SDSM and FDSM backbones exhibit only modest similarity
(Mean 𝐽 = 0.314). This lack of similarity or equivalence occurs because SDSM offers a less
statistically powerful (or more conservative) test of edges’ statistical significance than FDSM, and
therefore retains fewer edges in the backbone. However, findings from study 2 also suggested that
careful selection of the significance level used for extracting an SDSM backbone can yield results
more similar to FDSM.
To explore this possibility, we expanded the analysis reported in Figure 8.3D by extracting
SDSM backbones at different significance levels. We find that when a suitably more liberal (i.e.,
larger) significance level 𝛼 is used to extract an SDSM backbone, the resulting SDSM backbone
is very similar to an FDSM backbone extracted at 𝛼 = 0.05 (see Figure 8.4A). Specifically, for
backbones extracted from bipartite networks with any agent or artifact degree distributions, these
two backbones tend to be nearly equivalent (Mean 𝐽 = 0.865). This suggests that in principle the
fast SDSM can be used to obtain a close approximation of a computationally-infeasible FDSM
backbone from any bipartite network.
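The search for a suitably liberal SDSM significance level can be sketched as a simple sweep. This is an illustration under stated assumptions rather than the procedure used in the study: the edge p-values are invented, the function name is hypothetical, and an edge is assumed to be retained when its upper-tail p-value falls below 𝛼/2 (a two-tailed rule).

```python
import numpy as np

def best_sdsm_alpha(p_sdsm, p_fdsm, alphas, fdsm_alpha=0.05):
    """Return (alpha, J) for the SDSM level whose backbone best matches FDSM.

    p_sdsm, p_fdsm: 1-D arrays of p-values, one per candidate edge.
    Assumed retention rule: keep an edge when p < alpha / 2.
    Ties are resolved in favor of the first (smallest listed) alpha.
    """
    target = p_fdsm < fdsm_alpha / 2
    best_alpha, best_j = None, -1.0
    for alpha in alphas:
        candidate = p_sdsm < alpha / 2
        union = np.logical_or(candidate, target).sum()
        j = 1.0 if union == 0 else np.logical_and(candidate, target).sum() / union
        if j > best_j:
            best_alpha, best_j = alpha, j
    return best_alpha, best_j

# Invented p-values for six candidate edges; the SDSM values are larger,
# mimicking a more conservative test of the same edges.
p_fdsm = np.array([0.010, 0.020, 0.015, 0.200, 0.400, 0.030])
p_sdsm = np.array([0.020, 0.045, 0.030, 0.300, 0.500, 0.070])

alpha, j = best_sdsm_alpha(p_sdsm, p_fdsm, alphas=[0.05, 0.10, 0.15])
print(alpha, j)
```

In this toy case the SDSM backbone at 𝛼 = 0.05 keeps only one of the three FDSM edges, the backbone at 𝛼 = 0.10 matches the FDSM backbone exactly, and the backbone at 𝛼 = 0.15 adds an edge that FDSM does not retain.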
In practice, using SDSM to obtain an FDSM-like backbone requires selecting an 𝛼 value for the
SDSM that corresponds to 𝛼 = 0.05 in the FDSM. We observe that there are three distinct values
of such an ‘optimal’ 𝛼 that depend on agent and artifact degree distributions (see Figure 8.4B).
First, when agent degrees are constant, a value only slightly higher than 0.05 (Mean = 0.062, SD
= 0.021) achieves the best approximation of an FDSM backbone. Second, when artifact degrees
are constant, a value roughly double (Mean = 0.09, SD = 0.022) achieves the best approximation
of an FDSM backbone. Finally, when neither agent nor artifact degrees are constant, which is
likely in most empirical bipartite networks, a value roughly 2.5 times larger (Mean = 0.13, SD
= 0.014) achieves the best approximation of an FDSM backbone. Although further work is needed
to facilitate the a priori selection of an 𝛼 that allows an SDSM backbone to closely approximate
an FDSM backbone, these results suggest that under the most common circumstances (i.e., when
there is variation in degrees) 𝛼 ≈ 0.13 may be appropriate.
[Figure 8.4: two heatmaps. Panel A: similarity of SDSM (𝛼 = optimal) and FDSM (𝛼 = 0.05), shaded by Jaccard index from 0.00 to 1.00. Panel B: SDSM 𝛼 that maximizes similarity with FDSM (𝛼 = 0.05), shaded by optimal 𝛼 from 0.050 to 0.150. In both panels the rows are agent degree distributions and the columns are artifact degree distributions (Right, Unif, Cons, Left, Norm).]
Figure 8.4: (A) Similarity between an FDSM backbone extracted at 𝛼 = 0.05 and an SDSM
backbone extracted at the statistical significance level 𝛼 that maximizes this similarity, and (B) the
value of this similarity-maximizing SDSM significance level, for given agent and artifact degree
distributions.
8.4 Study 4: Recovery of community structure
Studies 1-3 examine the backbones extracted from synthetic random bipartite networks; however,
empirical bipartite networks are generally not random, but instead have a clustered or blocked
structure. In this study, we evaluate the extent to which backbones extracted using different models
reflect a known community structure that is encoded in the bipartite data from which they are
extracted [CWW18]. As shown in chapter 7, SDSM and FDSM backbones extracted from a bipartite
network representing bill co-sponsorship in the 114th session of the US Senate more clearly
captured the known partisan community structure than an FRM backbone [DNS21]. For the sake
of concreteness, we use this legislative network context as an example in this section, but we extend
this prior work by considering a broader range of backbone models, and by examining their ability
to recover community structures from bipartite data containing varying levels of evidence for this
structure.
8.4.1 Methods
We investigate the ability of backbones to recover a known community structure in three steps.
First, we simulate a 200 × 1000 bipartite network with a density of 0.1 and right-tailed agent and
artifact degree distributions. We focus on a bipartite network with more artifacts than agents to
ensure that these data contain sufficient information to encode potential community memberships.
We focus on a bipartite network with right-tailed degree distributions because they are common
in many empirical unipartite [BC19] and bipartite networks [Nea20, NDS21a, AABB11]. Similar
to the Senate data set we examined in chapter 7, this synthetic bipartite network could represent a
legislative body composed of 200 legislators casting votes on 1000 bills, where any given legislator
had a 10% chance of voting in favor of any given bill. The right-tailed degree distributions capture
the fact that most legislators vote in favor of only a few bills, and that most bills receive the support
of only a few legislators, which is typical of legislative bodies. The backbone of a projection
of such a bipartite network would represent a network of collaboration or ideological alignment
among legislators [Nea20].
Second, we incorporate evidence of communities in this bipartite network by randomly assigning
each agent and each artifact to one of two groups. We then perform checkerboard swaps, which
preserve the degree distributions, until a given fraction of edges 𝑊 are within-group, connecting
an agent and artifact from the same group [GSPA07]. Figure 8.5A provides graphical depictions
of the matrices describing synthetic bipartite networks at two values of 𝑊. In each plot, the rows
represent agents assigned to group A or B, the columns represent artifacts assigned to group A or
B, and a cell is shaded black if the row agent is connected to the column artifact. When 𝑊 = 0.5,
agents in a given group are equally likely to associate with artifacts in either group, placing ≈ 0.5
of the edges (i.e., shaded cells) in the diagonal blocks and ≈ 0.5 of the edges in the off-diagonal
blocks. In contrast, when 𝑊 = 0.8, agents in a given group are much more likely to associate with
artifacts from their own group than artifacts in the other group, placing ≈ 0.8 of the edges in the
diagonal blocks and ≈ 0.2 of the edges in the off-diagonal blocks. Returning to our example, the
groups could represent political parties: each legislator belongs to one of two parties (i.e., there
are conservative and liberal legislators), and each bill advances the agenda of one of these parties
(i.e., there are conservative and liberal bills). When 𝑊 = 0.5, a conservative legislator is equally
likely to vote for conservative and liberal bills, while when 𝑊 = 0.8, a conservative legislator is
four times more likely to vote for a conservative bill than a liberal bill (0.8/0.2 = 4).
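The second step can be sketched as follows. This is a hedged illustration, not the implementation used in the study: it assumes that a checkerboard swap is proposed at random and kept only when it increases the number of within-group edges, and the function names, matrix size, and seed are all hypothetical. A checkerboard swap exchanges the 1s and 0s of a 2 × 2 submatrix of the form [[1, 0], [0, 1]], which leaves every row sum and column sum (and hence both degree sequences) unchanged.

```python
import numpy as np

def within_fraction(B, agent_group, artifact_group):
    """Fraction of edges joining an agent and an artifact of the same group (W)."""
    same = agent_group[:, None] == artifact_group[None, :]
    return B[same].sum() / B.sum()

def raise_within_fraction(B, agent_group, artifact_group, target_w, rng,
                          max_tries=200_000):
    """Apply degree-preserving checkerboard swaps until W >= target_w.

    Assumption for this sketch: only swaps that add within-group edges are kept.
    Stops early if max_tries proposals are exhausted.
    """
    B = B.copy()
    same = agent_group[:, None] == artifact_group[None, :]
    total = B.sum()
    within = B[same].sum()
    n_rows, n_cols = B.shape
    for _ in range(max_tries):
        if within / total >= target_w:
            break
        i, j = rng.integers(n_rows, size=2)
        k, l = rng.integers(n_cols, size=2)
        # Checkerboard: 1s at (i, k) and (j, l), 0s at (i, l) and (j, k).
        # Proposals with i == j or k == l can never satisfy this test.
        if B[i, k] and B[j, l] and not B[i, l] and not B[j, k]:
            gain = (int(same[i, l]) + int(same[j, k])
                    - int(same[i, k]) - int(same[j, l]))
            if gain > 0:
                B[i, k] = B[j, l] = 0
                B[i, l] = B[j, k] = 1
                within += gain
    return B

rng = np.random.default_rng(0)
B0 = (rng.random((20, 40)) < 0.3).astype(int)   # small random bipartite matrix
agents = np.repeat([0, 1], 10)                   # two agent groups
artifacts = np.repeat([0, 1], 20)                # two artifact groups
B1 = raise_within_fraction(B0, agents, artifacts, target_w=0.7, rng=rng)
print(within_fraction(B0, agents, artifacts), within_fraction(B1, agents, artifacts))
```

Because each accepted swap moves two edges from the off-diagonal blocks into the diagonal blocks, 𝑊 can only increase, while the agent and artifact degree sequences of B1 remain identical to those of B0.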
Finally, we extract a backbone from the bipartite network using a given model and compute the
backbone’s modularity 𝑄 with respect to the agents’ group assignments [NG04]. If a backbone
model is able to recover the community structure from evidence in the bipartite network, then
we expect a positive association between 𝑊 and 𝑄. In the legislative example, if legislators are
bipartisan in their voting patterns (i.e., 𝑊 = 0.5), then legislators should not be clustered by party
in the backbone (i.e., 𝑄 ≈ 0). In contrast, if legislators are strongly partisan in their voting patterns
(i.e., 𝑊 = 0.8), then legislators should be clustered by party in the backbone (i.e., 𝑄 ≈ 0.5).
We repeat these three steps 10 times for each value of 𝑊 from 0.5 to 0.8 in increments of 0.05. When evaluating
the SDSM backbone, we consider both a backbone extracted using the conventional significance
level of 𝛼 = 0.05 and one extracted at the more liberal 𝛼 = 0.13, which study 3 suggests yields a
backbone similar to FDSM.
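The modularity computed in the final step can be sketched as below. This is a minimal example that assumes the backbone is a binary, undirected, unsigned adjacency matrix and uses the standard Newman-Girvan definition; the function name and the toy network are hypothetical. It also shows why 𝑄 ≈ 0.5 is the natural benchmark for two well-separated groups of equal size.

```python
import numpy as np

def modularity(A, groups):
    """Newman-Girvan modularity Q of a binary undirected network for a partition."""
    degrees = A.sum(axis=1)
    two_m = degrees.sum()                        # twice the number of edges
    same_group = groups[:, None] == groups[None, :]
    expected = np.outer(degrees, degrees) / two_m
    return ((A - expected) * same_group).sum() / two_m

# Toy backbone: two disjoint triangles, one per group.
A = np.zeros((6, 6), dtype=int)
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    A[i, j] = A[j, i] = 1
groups = np.array([0, 0, 0, 1, 1, 1])

print(modularity(A, groups))  # all edges within groups, two equal groups: Q = 0.5
```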
8.4.2 Results
Figure 8.5B shows the modularity (y-axis; with respect to known community memberships) of
backbones extracted using different models from bipartite networks with different fractions of
within-community edges (x-axis). All six lines increase monotonically, confirming that all backbone
models yield backbones that can recover a known community structure. However, there is
notable variation among the models. As evidence of community structure grows stronger in the
bipartite network, the modularity of backbones extracted using the FFM and FCM increases slowly,
but even when the evidence of such a structure is quite strong (i.e., when 𝑊 = 0.8) they only achieve
average values of 𝑄 = 0.15 and 0.18, respectively. Backbones extracted using the FRM display a
similar pattern, but achieve a higher average modularity (𝑄 = 0.39) when 𝑊 is large.
[Figure 8.5. Panel A: bipartite matrices with agents (rows, Group A and Group B) and artifacts (columns, Group A and Group B), for a fraction of within-group edges 𝑊 = 0.5 and 𝑊 = 0.8. Panel B: modularity (𝑄, y-axis) against the fraction of within-group edges (𝑊, x-axis, 0.50 to 0.80) for the backbone models FDSM, SDSM (𝛼 = 0.13), SDSM (𝛼 = 0.05), FRM, FCM, and FFM. Panel C: two backbone network drawings.]
Figure 8.5: (A) Synthetic bipartite networks with varying levels of block structure, from which
(B) backbones extracted using different models exhibit varying modularity. (C) When 65% of
bipartite edges are within-block, a backbone extracted using FDSM shows a clear group structure
(top) while a backbone extracted using FCM does not (bottom).
In contrast, backbones extracted using FDSM and SDSM are virtually indistinguishable in
their ability to recover the known community structure, and do so very well. As evidence of a
community structure grows stronger in the bipartite network, the modularity of backbones extracted
using these models rapidly increases. When the evidence of community structure is strong (i.e.,
when 𝑊 = 0.8), these backbones have very high modularity (mean 𝑄 = 0.49). However, even
when there is only modest evidence of community structure in the bipartite network (e.g., when
𝑊 = 0.65), these backbones are still able to identify the community structure and have a distinctively
high modularity (mean 𝑄 = 0.37).
Figure 8.5C illustrates the difference between two backbone models’ abilities to recover a known
community structure, when evidence of that structure is modest in the bipartite data from which
the backbone is extracted (𝑊 = 0.65). In both plots, agents from group A (e.g., conservatives, in
the legislative example) are colored red, while agents from group B (e.g., liberals, in the legislative
example) are colored blue. The FDSM-extracted backbone clearly places agents from different
groups in separate clusters. In contrast, the FCM-extracted backbone is unable to distinguish this
group structure and fails to cluster agents according to their known group memberships. These
findings suggest that although all backbone models can yield backbones that recover a known
community structure, SDSM and FDSM backbones are able to detect this structure more clearly
and from a weaker signal.
8.5 Recommendations for Backbone Selection
Bipartite networks can be used to represent a wide range of phenomena in the social and
natural worlds including interspecies competition, global trade, scientific advances, and legislative
deliberation. Likewise, projections of bipartite networks, which take the form of co-occurrence
networks, can be useful for inferring unipartite networks that would otherwise be difficult to measure
directly. Several models have been proposed for extracting the backbone of bipartite projections,
and thus for making such inferences, including the fixed fill model (FFM), fixed row model (FRM),
fixed column model (FCM), fixed degree sequence model (FDSM), and stochastic degree sequence
model (SDSM). We have introduced each of these models and found their probability mass functions
in chapter 6. To facilitate their use, in chapter 7 we described the R package backbone, in which we have
implemented each model. Finally, in chapter 8, we systematically compared these models in terms of
their relative accuracy, speed, statistical power, similarity, and ability to recover a known community
structure.
In study 1, we examined several methods for choosing the probabilities necessary for applying
the stochastic degree sequence model (SDSM), finding that the bipartite configuration model
(BiCM) is both the fastest and most accurate. In study 2, we examined the statistical power of the
SDSM relative to the fixed degree sequence model (FDSM), finding that the SDSM can be viewed
as a statistically less powerful (or more conservative) variant of the FDSM. In study 3, we examined
the similarity of an FDSM-extracted backbone to backbones extracted using other models, finding
that the SDSM and FDSM extract very similar backbones from bipartite networks with a wide
range of possible degree distributions when an appropriate significance level 𝛼 is chosen. Finally,
in study 4, we examined the ability of backbones extracted using different models to recover a
known community structure, finding that although all models can recover the structure, SDSM and
FDSM can detect a community structure more clearly and from a weaker signal.
Based on these findings, and with the goal of offering researchers some guidance in extracting
the backbones of bipartite projections, we offer three recommendations. First, we recommend the
stochastic degree sequence model (SDSM) for extracting the backbones of bipartite projections because
it is fast, controls for both agent and artifact degree sequences, and yields modular backbones
when the bipartite data contain even modest evidence of within-community clustering. Second,
when the SDSM is used, we recommend that the cell-filling probabilities $p^{*}_{ik}$ be chosen using
the Bipartite Configuration Model (BiCM) because it is faster and more accurate than any other
currently available method. Third, when an FDSM backbone extracted at the 𝛼 = 0.05 significance
level is desired but computationally infeasible, we recommend extracting an SDSM backbone at the
𝛼 = 0.13 significance level, which we observe yields a very similar backbone when there is variation
in the agent and artifact degree sequences. The models and options necessary to adopt these recommendations
are implemented in the backbone package for R [DNS21].
These findings and recommendations must be viewed in light of the fact that, due to the
computational requirements of the FDSM and of extracting a large number of backbones across
the four studies, these studies have relied on small synthetic bipartite networks ranging in size
from 3 × 3 (study 1) to 200 × 1000 (study 4). However, in practice bipartite networks may be
several orders of magnitude larger. For example, a bipartite network used to infer collaborations
in the US House of Representatives includes 435 agents (representatives) and over 6000 artifacts
(bills) [Nea20, NDY22], while a bipartite network used to infer movie recommendations includes
17,770 agents (films) and nearly 500,000 artifacts (viewers) [ZK11]. Future research should
explore whether these findings extend to backbones extracted from such large bipartite networks.
Limitations of existing backbone models also point to directions for future research. First, using
the FDSM will generally be computationally infeasible in practice because the distribution of $P^{*}_{ij}$
arising from $\mathcal{B}^{\mathrm{FDSM}}$ must be estimated via numerical simulation. Identifying this distribution’s
probability mass function, which is known for the other ensembles (as discussed in chapter 6),
would facilitate the use of this otherwise attractive model; however, this is a well-studied problem
and so is probably very hard to solve. Second, all the ensemble models we have considered impose
constraints on the degree sequences, but other types of constraints may also be useful. For example,
in some contexts it may be necessary to constrain all members of an ensemble to contain a 0 in a
particular cell (e.g., to represent that an author was not alive to co-author a paper, or a legislator
was not present to co-sponsor a bill). These limitations and future directions notwithstanding, the
results presented above provide a starting point for further development of backbone models, and
provide applied researchers with some practical guidance on model selection.
BIBLIOGRAPHY
[AABB11] Yong-Yeol Ahn, Sebastian E Ahnert, James P Bagrow, and Albert-László Barabási.
Flavor network and the principles of food pairing. Scientific Reports, 1(1):1–7,
2011.
[AGRR20] Ron M. Adin, Ira M. Gessel, Victor Reiner, and Yuval Roichman. Cyclic quasi-
symmetric functions. Sém. Lothar. Combin., 82B:Art. 67, 12, 2020.
[ALH+ 15a] C. Andris, D. Lee, M. J. Hamilton, M. Martino, C. E. Gunning, and J. A. Selden.
The rise of partisanship and super-cooperators in the US House of Representatives.
PloS One, 10:e0123507, 2015.
[ALH+ 15b] Clio Andris, David Lee, Marcus J Hamilton, Mauro Martino, Christian E Gunning,
and John Armistead Selden. The rise of partisanship and super-cooperators in the
US House of Representatives. PloS One, 10(4):e0123507, 2015.
[AN20a] S. Aref and Z. P. Neal. Detecting coalitions by optimally partitioning signed networks
of political collaboration. Scientific Reports, 10:1506, 2020.
[AN20b] Samin Aref and Zachary P. Neal. Detecting coalitions by optimally partitioning
signed networks of political collaboration. Scientific Reports, 10(1):1–10, 2020.
[And87] Désiré André. Solution directe du problème résolu par M. Bertrand. CR Acad. Sci.
Paris, 105(436):7, 1887.
[AWvH20] Paul Allison, R. A. Williams, and P. von Hippel. Better predicted probabilities
from linear probability models with applications to multiple imputation. 2020 Stata
Conference 1, Stata Users Group, 2020.
[BBMD+ 02] Cyril Banderier, Mireille Bousquet-Mélou, Alain Denise, Philippe Flajolet, Danièle
Gardy, and Dominique Gouyou-Beauchamps. Generating functions for generating
trees. Discrete Math., 246:29–55, 2002. Formal power series and algebraic combinatorics
(Barcelona, 1999).
[BBPS15] Sara Billey, Krzysztof Burdzy, Soumik Pal, and Bruce E. Sagan. On meteors,
earthworms and WIMPs. Ann. Appl. Probab., 25(4):1729–1779, 2015.
[BBS13] Sara Billey, Krzysztof Burdzy, and Bruce E. Sagan. Permutations with given peak
set. J. Integer Seq., 16(6):Article 13.6.1, 18, 2013.
[BC19] Anna D Broido and Aaron Clauset. Scale-free networks are rare. Nature Communications,
10(1):1–10, 2019.
[BCS+ 17] Christopher R Browning, Catherine A Calder, Brian Soller, Aubrey L Jackson,
and Jonathan Dirlam. Ecological networks and neighborhood social organization.
American Journal of Sociology, 122(6):1939–1988, 2017.
[BDS+ 20] Amanda N Buerger, David T Dillon, Jordan Schmidt, Tao Yang, Jasenka Zubcevic,
Christopher J Martyniuk, and Joseph H Bisesi Jr. Gastrointestinal dysbiosis
following diethylhexyl phthalate exposure in zebrafish (Danio rerio): Altered microbial
diversity, functionality, and network connectivity. Environmental Pollution,
265:114496, 2020.
[Ber87] J. Bertrand. Solution d’un problème. CR Acad. Sci. Paris, 105:369, 1887.
[BFT16] Sara Billey, Matthew Fahrbach, and Alan Talmage. Coefficients and roots of peak
polynomials. Exp. Math., 25(2):165–175, 2016.
[BM03] Mireille Bousquet-Mélou. Four classes of pattern-avoiding permutations under one
roof: generating trees with two labels. Electron. J. Combin., 9(2):Research paper
19, 31, 2002/03. Permutation patterns (Otago, 2003).
[Bón04] Miklós Bóna. Combinatorics of permutations. Discrete Mathematics and its Applications
(Boca Raton). Chapman & Hall/CRC, Boca Raton, FL, 2004.
[BR11] K. A. Bratton and S. M. Rouse. Networks in the legislative arena: How group
dynamics affect cosponsorship. Legislative Studies Quarterly, 36:423–460, 2011.
[BR17] Pierre-Alexandre Balland and David Rigby. The geography of complex knowledge.
Economic Geography, 93(1):1–23, 2017.
[BS00] Eric Babson and Einar Steingrímsson. Generalized permutation patterns and a
classification of the Mahonian statistics. Sém. Lothar. Combin., 44:Art. B44b, 18,
2000.
[BST99] Jonathan V Beaverstock, Richard G Smith, and Peter J Taylor. A roster of world
cities. Cities, 16(6):445–458, 1999.
[Cal02] David Callan. Pattern avoidance in circular permutations. Preprint arXiv:0210014,
2002.
[Car15] C. J. Carstens. Proof of uniform sampling of binary matrices with fixed row sums
and column sums for the fast curveball algorithm. Physical Review E, 91(4), Apr
2015.
[CGHK78] F. R. K. Chung, R. L. Graham, V. E. Hoggatt, Jr., and M. Kleiman. The number of
Baxter permutations. J. Combin. Theory Ser. A, 24(3):382–394, 1978.
[CN06] Gabor Csardi and Tamas Nepusz. The igraph software package for complex
network research, 2006.
[CP87] G. A. Caldeira and S. C. Patterson. Political friendship in the legislature. The
Journal of Politics, 49:953–975, 1987.
[CVDLO+ 17] Francis Castro-Velez, Alexander Diaz-Lopez, Rosa Orellana, José Pastrana, and
Rita Zevallos. The number of permutations with the same peak set for signed
permutations. J. Comb., 8(4):631–652, 2017.
[CW90] Don Coppersmith and Shmuel Winograd. Matrix multiplication via arithmetic
progressions. Journal of Symbolic Computation, 9:251–280, 1990.
[CWW18] Tristan JB Cann, Iain S Weaver, and Hywel TP Williams. Is it correct to project and
detect? assessing performance of community detection on unipartite projections
of bipartite networks. In International Conference on Complex Networks and their
Applications, pages 267–279. Springer, 2018.
[CXW+ 18] Xing Chen, Di Xie, Lei Wang, Qi Zhao, Zhu-Hong You, and Hongsheng Liu.
BNPMDA: Bipartite network projection for miRNA–disease association prediction.
Bioinformatics, 34(18):3178–3186, 2018.
[DDJ+ 12] Theodore Dokos, Tim Dwyer, Bryan P. Johnson, Bruce E. Sagan, and Kimberly
Selsor. Permutation patterns and statistics. Discrete Math., 312(18):2760–2775,
2012.
[Dia75] Jared M Diamond. Assembly of species communities. In M. L. Cody and J. M.
Diamond, editors, Ecology and evolution of communities, pages 342–444. Harvard
University Press, 1975.
[Dia16] Navid Dianati. Unwinding the hairball graph: Pruning algorithms for weighted
complex networks. Physical Review E, 93(1):012304, 2016.
[DL16] Ben Derudder and Xingjian Liu. How international is the annual meeting of the
Association of American Geographers? A social network analysis perspective. Environment
and Planning A, 48(2):309–329, 2016.
[DLHH+ 21] Alexander Diaz-Lopez, Pamela E. Harris, Isabella Huang, Erik Insko, and Lars
Nilsen. A formula for enumerating permutations with a fixed pinnacle set. Discrete
Math., 344(6):112375, 2021.
[DLHIO17a] Alexander Diaz-Lopez, Pamela E. Harris, Erik Insko, and Mohamed Omar. A proof
of the peak polynomial positivity. Sém. Lothar. Combin., 78B:Art. 6, 9, 2017.
[DLHIO17b] Alexander Diaz-Lopez, Pamela E. Harris, Erik Insko, and Mohamed Omar. A proof
of the peak polynomial positivity conjecture. J. Combin. Theory Ser. A, 149:21–29,
2017.
[DLHIPL17] Alexander Diaz-Lopez, Pamela E. Harris, Erik Insko, and Darleen Perez-Lavin.
Peak sets of classical Coxeter groups. Involve, 10(2):263–290, 2017.
[DLIN21] Alexander Diaz-Lopez, Erik Insko, and Lars Nilsen. Pinnacle ordering. In preparation,
2021.
[DLM+ 21a] Rachel Domagalski, Jinting Liang, Quinn Minnich, Bruce E. Sagan, Jamie
Schmidt, and Alexander Sietsema. Cyclic pattern containment and avoidance.
arXiv:2106.02534 [math], Jun 2021. arXiv: 2106.02534.
[DLM+ 21b] Rachel Domagalski, Jinting Liang, Quinn Minnich, Bruce E. Sagan, Jamie Schmidt,
and Alexander Sietsema. Cyclic shuffle compatibility. arXiv:2106.10182 [math],
Jun 2021. arXiv: 2106.10182.
[DLM+ 21c] Rachel Domagalski, Jinting Liang, Quinn Minnich, Bruce E. Sagan, Jamie Schmidt,
and Alexander Sietsema. Pinnacle set properties. arXiv:2105.10388 [math], May
2021. arXiv: 2105.10388.
[DMSK15] B. A. Desmarais, V. G. Moscardelli, B. F. Schaffner, and M. S. Kowal. Measuring
legislative collaboration: The Senate press events network. Social Networks, 40:43–
54, 2015.
[DNKPT18] Robert Davis, Sarah A. Nelson, T. Kyle Petersen, and Bridget E. Tenner. The
pinnacle set of a permutation. Discrete Math., 341(11):3249–3270, 2018.
[DNS20] Rachel Domagalski, Zachary P. Neal, and Bruce Sagan. backbone: Extracts the
Backbone from Weighted Graphs, 2020. R package version 1.2.0.
[DNS21] Rachel Domagalski, Zachary P Neal, and Bruce Sagan. Backbone: An R package
for extracting the backbone of bipartite projections. PLoS One, 16(1):e0244363,
2021.
[Dru16] L. Drutman. American politics has reached peak polarization, 2016.
[DT05] Ben Derudder and Peter Taylor. The cliquishness of world cities. Global Networks,
5(1):71–91, 2005.
[ES35] P. Erdős and G. Szekeres. A combinatorial problem in geometry. Compositio Math.,
2:463–470, 1935.
[ES21] Sergi Elizalde and Bruce Sagan. Consecutive patterns in circular permutations.
2021.
[Fan21] Wenjie Fang. Efficient recurrence for the enumeration of permutations with fixed
pinnacle set. arXiv:2106.09147 [math], Jun 2021. arXiv: 2106.09147.
[FNT21] Justine Falque, Jean-Christophe Novelli, and Jean-Yves Thibon. Pinnacle sets re-
visited. arXiv:2106.05248 [math], Jun 2021. arXiv: 2106.05248.
[Fon20] Christian Fong. Expertise, networks, and interpersonal influence in Congress. The
Journal of Politics, 82(1):269–284, 2020.
[Fow06a] J. H. Fowler. Connecting the Congress: A study of cosponsorship networks. Political
Analysis, 14:456–487, 2006.
[Fow06b] J. H. Fowler. Legislative cosponsorship networks in the US House and Senate. Social
Networks, 28:454–465, 2006.
[Fri86] John Friedmann. The world city hypothesis. Development and change, 17(1):69–83,
1986.
[GL04] Jean-Loup Guillaume and Matthieu Latapy. Bipartite structure of all complex
networks. Information Processing Letters, 90(5):215–221, 2004.
[GLW18] Daniel Gray, Charles Lanning, and Hua Wang. Pattern containment in circular
permutations. Integers, 18B:Paper No. A4, 13, 2018.
[GLW19] Daniel Gray, Charles Lanning, and Hua Wang. Patterns in colored circular permu-
tations. Involve, 12(1):157–169, 2019.
[Got00] Nicholas J Gotelli. Null model analysis of species co-occurrence patterns. Ecology,
81(9):2606–2621, 2000.
[GSPA07] Roger Guimera, Marta Sales-Pardo, and Luís A Nunes Amaral. Module identifica-
tion in bipartite and directed networks. Physical Review E, 76(3):036102, 2007.
[HBKM09] Emilie M Hafner-Burton, Miles Kahler, and Alexander H Montgomery. Network
analysis for international relations. International organization, pages 559–592,
2009.
[HFC16] Eelke M Heemskerk, Meindert Fennema, and William K Carroll. The global
corporate elite after the financial crisis: evidence from the transnational network of
interlocking directorates. Global Networks, 16(1):68–88, 2016.
[HKBH07] César A Hidalgo, Bailey Klinger, A-L Barabási, and Ricardo Hausmann. The
product space conditions the development of nations. Science, 317(5837):482–487,
2007.
[Hol79] Sture Holm. A simple sequentially rejective multiple test procedure. Scandinavian
journal of statistics, pages 65–70, 1979.
[Ing15] Christopher Ingraham. A stunning visualization of our divided congress. Washing-
ton Post, Apr 2015.
[Jac61] Jane Jacobs. The death and life of great American cities. Random House, 1961.
[Kir11] J. H. Kirkland. The relational determinants of legislative outcomes: Strong and
weak ties between legislators. The Journal of Politics, 73:887–898, 2011.
[KK96] Daniel Kessler and Keith Krehbiel. Dynamics of cosponsorship. The American
Political Science Review, 90(3):555–566, 1996.
[KMN16] G. Koger, S. Masket, and H. Noel. No disciplined army: American political parties
as networks. In J. N. Victor, A. H. Montgomery, and M. Lubell, editors, The Oxford
Handbook of Political Networks, chapter 18, pages 453–470. Oxford University
Press, Oxford, 2016.
[Kre00] Darla Kremer. Permutations with forbidden subsequences and a generalized
Schröder number. Discrete Math., 218(1-3):121–130, 2000.
[LCH06] Geoffrey C Layman, Thomas M Carsey, and Juliana Menasce Horowitz. Party
polarization in American politics: Characteristics, causes, and consequences. Annu.
Rev. Polit. Sci., 9:83–110, 2006.
[LD20] Chengliang Liu and Dezhong Duan. Spatial inequality of bus transit dependence
on urban streets and its relationships with socioeconomic intensities: A tale of two
megacities in China. Journal of Transport Geography, 86:102768, 2020.
[LMDV08] Matthieu Latapy, Clémence Magnien, and Nathalie Del Vecchio. Basic notions for
the analysis of large two-mode networks. Social Networks, 30(1):31–48, 2008.
[LR16] J. Liebig and A. Rao. Fast extraction of the backbone of projected bipartite networks
to aid community detection. Europhysics Letters, 113(2):28003, 2016.
[MLLS21] Federico Marini, Annekathrin Ludt, Jan Linke, and Konstantin Strauch. GeneTonic:
an R/Bioconductor package for streamlining the interpretation of RNA-seq data.
bioRxiv, 2021.
[MM13] J. Moody and P. J. Mucha. Portrait of political party polarization. Network Science,
1:119–121, 2013.
[MSLC01] M. McPherson, L. Smith-Lovin, and J. M. Cook. Birds of a feather: Homophily in
social networks. Annual Review of Sociology, 27:415–444, 2001.
[NDS21a] Z. P. Neal, R. Domagalski, and B. Sagan. Analysis of spatial networks from bipartite
projections using the R backbone package. Geographical Analysis, 2021.
[NDS21b] Zachary P. Neal, Rachel Domagalski, and Bruce Sagan. Comparing models for
extracting the backbone of bipartite projections. arXiv:2105.13396 [cs, stat], Jun
2021. arXiv: 2105.13396.
[NDY22] Zachary P Neal, Rachel Domagalski, and Xiaoqin Yan. Homophily in collaborations
among US House representatives, 1981–2018. Social Networks, 68:97–106, 2022.
[Nea08] Zachary P. Neal. The duality of world cities and firms: comparing networks,
hierarchies, and inequalities in the global economy. Global Networks, 8(1):94–115,
2008.
[Nea12] Zachary P. Neal. Structural determinism in the interlocking world city network.
Geographical Analysis, 44(2):162–170, 2012.
[Nea13] Zachary P. Neal. Identifying statistically significant edges in one-mode projections.
Social Network Analysis and Mining, 3(4):915–924, Dec 2013.
[Nea14] Zachary P. Neal. The backbone of bipartite projections: Inferring relationships
from co-authorship, co-sponsorship, co-attendance and other co-behaviors. Social
Networks, 39:84–97, Oct 2014.
[Nea20] Zachary P. Neal. A sign of the times? Weak and strong polarization in the US
Congress, 1973–2016. Social Networks, 60:103–112, 2020.
[New01] Mark EJ Newman. Scientific collaboration networks. I. Network construction and
fundamental results. Physical Review E, 64(1):016131, 2001.
[NG04] Mark EJ Newman and Michelle Girvan. Finding and evaluating community structure
in networks. Physical review E, 69(2):026113, 2004.
[NN20] Zachary P. Neal and Jennifer W Neal. Out of bounds? The boundary specification
problem for centrality in psychological networks, Aug 2020.
[NP03] Mark EJ Newman and Juyong Park. Why social networks are different from other
types of networks. Physical review E, 68(3):036122, 2003.
[PMNW05] M. A. Porter, P. J. Mucha, M. E. Newman, and C. M. Warmbrand. A network
analysis of committees in the US House of Representatives. Proceedings of the
National Academy of Sciences, 102:7057–7062, 2005.
[Poo18] Ate Poorthuis. How to draw a neighborhood? The potential of big data, regionalization,
and community detection for understanding the heterogeneous nature of urban
neighborhoods. Geographical Analysis, 50(2):182–203, 2018.
[PWK19] Vladimír Pažitka, Dariusz Wójcik, and Eric Knight. Critiquing construct validity
in world city network research: Moving from office location networks to inter-
organizational projects in the modeling of intercity business flows. Geographical
Analysis, 2019.
[R C18] R Core Team. R: A Language and Environment for Statistical Computing. R
Foundation for Statistical Computing, Vienna, Austria, 2018.
[RNH13] N. Ringe, J. N. Victor, and J. H. Gross. Keeping your friends close and your enemies
closer? Information networks in legislative politics. British Journal of Political
Science, 43:601–628, 2013.
[RT] Irena Rusu and Bridget E. Tenner. Admissible pinnacle orderings. Preprint
arXiv:2001.08185.
[Rus20] Irena Rusu. Sorting permutations with fixed pinnacle set. Electron. J. Combin.,
27(3):Paper No. 3.23, 21, 2020.
[Sag20] Bruce E. Sagan. Combinatorics: The art of counting, volume 210 of Graduate
Studies in Mathematics. American Mathematical Society, Providence, RI, 2020.
[San00] James G Sanderson. Testing ecological patterns. American Scientist, 88(4):332,
2000.
[SB20] David Schoch and Ulrik Brandes. Legislators’ roll-call voting behavior increasingly
corresponds to intervals in the political spectrum. Scientific Reports, 10(1):1–9,
2020.
[SBV09] M Ángeles Serrano, Marián Boguñá, and Alessandro Vespignani. Extracting the
multiscale backbone of complex weighted networks. Proceedings of the National
Academy of Sciences, 106(16):6483–6488, 2009.
[SCS17] Mika J Straka, Guido Caldarelli, and Fabio Saracco. Grand canonical validation of
the bipartite international trade network. Physical Review E, 96(2):022306, 2017.
[SDCGS15] Fabio Saracco, Riccardo Di Clemente, Andrea Gabrielli, and Tiziano Squartini.
Randomizing bipartite networks: the case of the world trade web. Scientific Reports,
5(1):1–18, 2015.
[SNB+ 14] Giovanni Strona, Domenico Nappo, Francesco Boccacci, Simone Fattorini, and
Jesus San-Miguel-Ayanz. A fast and unbiased procedure to randomize ecological
binary matrices with fixed row and column totals. Nature Communications, 5:4114,
Jun 2014.
[SR12] Christian Stegbauer and Alexander Rausch. How international are international
congresses? Connections, 32(1):1–11, 2012.
[SS85] Rodica Simion and Frank W. Schmidt. Restricted permutations. European J.
Combin., 6(4):383–406, 1985.
[SSDC+ 17] Fabio Saracco, Mika J Straka, Riccardo Di Clemente, Andrea Gabrielli, Guido
Caldarelli, and Tiziano Squartini. Inferring monopartite projections of bipartite
networks: An entropy-based approach. New Journal of Physics, 19(5):053022,
2017.
[Sta97] Richard P. Stanley. Enumerative Combinatorics. Vol. 1, volume 49 of Cambridge
Studies in Advanced Mathematics. Cambridge University Press, Cambridge, 1997.
With a foreword by Gian-Carlo Rota, Corrected reprint of the 1986 original.
[Sta99] Richard P. Stanley. Enumerative Combinatorics. Vol. 2, volume 62 of Cambridge
Studies in Advanced Mathematics. Cambridge University Press, Cambridge, 1999.
With a foreword by Gian-Carlo Rota and appendix 1 by Sergey Fomin.
[Ste97] John R. Stembridge. Enriched 𝑃-partitions. Trans. Amer. Math. Soc., 349(2):763–
788, 1997.
[SUG18] Giovanni Strona, Werner Ulrich, and Nicholas J. Gotelli. Bi-dimensional null model
analysis of presence-absence binary matrices. Ecology, 99(1):103–115, 2018.
[Tay01] Peter J Taylor. Specification of the world city network. Geographical analysis,
33(2):181–194, 2001.
[Tay04] Peter J Taylor. World city network: a global urban analysis. Routledge, 2004.
[TCW02] Peter J Taylor, Gilda Catalano, and David RF Walker. Measurement of the world
city network. Urban Studies, 39(13):2367–2376, 2002.
[TML+ 11] Michele Tumminello, Salvatore Miccichè, Fabrizio Lillo, Jyrki Piilo, and Rosario N.
Mantegna. Statistically validated networks in bipartite complex systems. PLoS One,
6(3):e17994, Mar 2011.
[Tol21] Jeff Tollefson. Tracking QAnon: How Trump turned conspiracy-theory research
upside down. Nature, 2021.
[USG20] USGPO. govinfo – Bulk Data - Bill Status. United States Government Publishing
Office (GPO), 2020.
[VFO20] Demival Vasques Filho and Dion R. J. O’Neale. Transitivity and degree assortativity
explained: The bipartite structure of social networks. Phys. Rev. E, 101:052305,
May 2020.
[VMND16] Michiel Van Meeteren, Zachary P. Neal, and Ben Derudder. Disentangling ag-
glomeration and network externalities: A conceptual typology. Papers in Regional
Science, 95(1):61–80, 2016.
[Wes95] Julian West. Generating trees and the Catalan and Schröder numbers. Discrete
Math., 146(1-3):247–262, 1995.
[Wes96] Julian West. Generating trees and forbidden subsequences. In Proceedings of
the 6th Conference on Formal Power Series and Algebraic Combinatorics (New
Brunswick, NJ, 1994), volume 157, pages 363–374, 1996.
[XCB20] Wenna Xi, Catherine A Calder, and Christopher R Browning. Beyond activity
space: Detecting communities in ecological networks. Annals of the American
Association of Geographers, pages 1–20, 2020.
[ZH05] Bin Zhang and Steve Horvath. A general framework for weighted gene co-expression
network analysis. Statistical Applications in Genetics and Molecular Biology, 4(1),
2005.
[ZK11] Katharina Anna Zweig and Michael Kaufmann. A systematic approach to the
one-mode projection of bipartite graphs. Social Network Analysis and Mining,
1(3):187–218, Jul 2011.