ON PERMUTATION PATTERNS, PINNACLE SETS, AND BACKBONES OF BIPARTITE PROJECTIONS

By

Rachel Domagalski

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Mathematics – Doctor of Philosophy

2021

ABSTRACT

This dissertation encompasses the study of two different fields: one concerning permutations, including pattern containment and pinnacle sets, and the other concerning weighted networks, specifically bipartite projections and their backbones.

The study of pattern containment and avoidance for linear permutations is a well-established area of enumerative combinatorics. A cyclic permutation is the set of all rotations of a linear permutation. Callan initiated the study of permutation avoidance in cyclic permutations and characterized the avoidance classes for all single permutations of length 4. We continue this work. In particular, we establish a cyclic variant of the Erdős-Szekeres Theorem that any linear permutation of length $mn + 1$ must contain either the increasing pattern of length $m + 1$ or the decreasing pattern of length $n + 1$. We then derive results about avoidance of multiple patterns of length 4. We also determine generating functions for the cyclic descent statistic on these classes.

We then study the pinnacle set, which is the value analogue of a well-studied permutation statistic, the peak set. Let $\pi = \pi_1 \pi_2 \ldots \pi_n$ be a permutation in the symmetric group $\mathfrak{S}_n$ written in one-line notation. The pinnacle set of $\pi$, denoted $\operatorname{Pin} \pi$, is the set of all $\pi_i$ such that $\pi_{i-1} < \pi_i > \pi_{i+1}$. The classic peak set statistic consists of the positions of these values. The pinnacle set was introduced by Davis, Nelson, Petersen, and Tenner, who showed that it has many interesting properties. In particular, they proved that the number of subsets of $[n] = \{1, 2, \ldots, n\}$ which can be the pinnacle set of some permutation is a binomial coefficient. Their proof involved a bijection with lattice paths and was somewhat involved. We give a simpler demonstration of this result which does not need lattice paths. Moreover, we show that our map and theirs are different descriptions of the same function. Davis et al. also studied the number of pinnacle sets with maximum $m$ and cardinality $d$, which they denoted by $\mathfrak{p}(m, d)$. We show that these integers are the well-known ballot numbers and give two proofs of this fact: one using finite differences and one bijective. Diaz-Lopez, Harris, Huang, Insko, and Nilsen found a summation formula for calculating the number of permutations in $\mathfrak{S}_n$ having a given pinnacle set. We derive a new expression for this number which is faster to calculate in many cases. We also show how this method can be adapted to find the number of orderings of a pinnacle set which can be realized by some $\pi \in \mathfrak{S}_n$. This concludes our research on permutations.

Bipartite projections are used in a wide range of network contexts including politics (bill co-sponsorship), geography (firm co-location), genetics (gene co-expression), economics (executive board co-membership), and innovation (patent co-authorship). However, because bipartite projections are always weighted graphs, which are inherently challenging to analyze and visualize, it is often useful to examine the ‘backbone,’ an unweighted subgraph containing only the most significant edges. We introduce the R package backbone for extracting the backbone of weighted bipartite projections, and use two empirical datasets to demonstrate its functionality: bill sponsorship data from the 114th session of the United States Senate and a Globalization and World Cities data set regarding firm locations in 2000.
After introducing and demonstrating five different models for backbone extraction, the fixed fill model (FFM), fixed row model (FRM), fixed column model (FCM), fixed degree sequence model (FDSM), and stochastic degree sequence model (SDSM), we compare them in terms of accuracy, speed, statistical power, similarity, and community detection. Here, we aim to find which models perform similarly to the FDSM, since the FDSM controls for both degree sequences exactly. We find that the computationally fast SDSM offers a statistically conservative but close approximation of the computationally impractical FDSM under a wide range of conditions, and that it correctly recovers a known community structure even when the signal is weak. Therefore, although each backbone model may have particular applications, we recommend SDSM for extracting the backbone of most bipartite projections.

Copyright by
RACHEL DOMAGALSKI
2021

ACKNOWLEDGEMENTS

First I would like to thank my advisor, Dr. Bruce Sagan, for all his help and support since I arrived at MSU. You have always made me feel like I can do and achieve anything and have helped me feel capable in my skills as a mathematician. I am so grateful for your advice and kindness.

To Dr. Zachary Neal, thank you for taking me under your wing and into your research projects. You have shown me a new way of applying mathematics and it’s been so much fun and so exciting working with you. Thank you for your support and trust. Without both of your encouragement and guidance, this thesis would not be possible.

To my committee, Drs. Robert Bell, Peter Magyar, and Elizabeth Munch, thank you so much for dedicating your time to preparing me to become a mathematician worthy of this degree.

To Dr. Sivaram Narayan, thank you for your guidance and inspiring me to pursue graduate school. I greatly appreciate your wisdom, support, and friendship through my undergrad, masters, PhD, and beyond.

My collaborators and friends, Jinting Liang, Quinn Minnich, Jamie Schmidt, Alex Sietsema, and Xiaoqin Yan, thank you for your encouragement and knowledge. It’s been a pleasure working with you all both in-person and online. You are all destined for great things, and I can’t wait to see what you accomplish.

Davis, words cannot even begin to come close to cover how grateful I am for your support over the past decade. Through every life step we’ve taken, you’ve always encouraged me to follow every dream and passion. I love you. Here’s to our next chapter with the beautiful family we’ve made. Which brings me to our dogs, Atlas and Tsuki. You two are magic. All extra hours post-graduation go to you my loves.

Mom and Dad, thank you for instilling a love of math and science and discovery in me from day one. I’m so lucky to have such wonderful people as parents. Thank you for being my biggest cheerleaders and for all of your love. Steven, my best friend since birth. Thank you for always being there for me, no matter how many miles between us. You inspire me every single day. I love you all and can’t wait to be together again.

To my cohort and beyond, I couldn’t have done this without you. The laughter, the late nights, the self-deprecating humor, the game nights, the life talks, the tree climbing, the walks to the river, the El Oasis taco truck trips, the downs, and the so so many ups, I love you all and miss seeing you daily.

To my friends, especially Emilee, Nikki, Olivia, Brooke, Paige, and to my extended family, thank you for your support, love, and all the joy we have shared.

Finally, thank you to the faculty and staff members of the mathematics department at Michigan State University for creating such a wonderful home during my time here, and those at Central Michigan University and Holly High School who got me here in the first place.

Portions of Chapter 3 appear in “Domagalski, R., Liang, J., Minnich, Q., Sagan, B. E., Schmidt, J., & Sietsema, A. (2021).
Cyclic Pattern Containment and Avoidance. ArXiv:2106.02534 [Math]. http://arxiv.org/abs/2106.02534” and are reprinted here under a CC BY 4.0 license. Portions of Chapter 4 appear in “Domagalski, R., Liang, J., Minnich, Q., Sagan, B. E., Schmidt, J., & Sietsema, A. (2021). Pinnacle Set Properties. ArXiv:2105.10388 [Math]. http://arxiv.org/abs/2105.10388” and are reprinted here under a CC BY 4.0 license.

Portions of Chapter 7 were originally published in “Domagalski, R., Neal, Z. P., & Sagan, B. (2021). Backbone: An R package for extracting the backbone of bipartite projections. PLOS ONE, 16(1), e0244363,” reprinted here under a CC BY 4.0 license, and in “Neal, Z. P., Domagalski, R., & Sagan, B. (2021). Analysis of Spatial Networks From Bipartite Projections Using the R Backbone Package. Geographical Analysis. https://doi.org/10.1111/gean.12275 [NDS21a],” and “Neal, Z. P., Domagalski, R., & Yan, X. (2022). Homophily in collaborations among US House Representatives, 1981–2018. Social Networks, 68, 97–106. https://doi.org/10.1016/j.socnet.2021.04.007,” both reprinted with journal permissions. Portions of Chapters 6 and 8 originally appeared in “Neal, Z. P., Domagalski, R., & Sagan, B. (2021). Comparing Models for Extracting the Backbone of Bipartite Projections. arXiv preprint arXiv:2105.13396.” They are reprinted here under a CC BY-SA 4.0 license.

The work in Chapters 6-8 was supported by funding from the National Science Foundation (#1851625 & #2016320) and Michigan State University Center for Business and Social Analytics.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
CHAPTER 1 INTRODUCTION
CHAPTER 2 BACKGROUND ON PERMUTATION PATTERNS AND STATISTICS
CHAPTER 3 CYCLIC PATTERN CONTAINMENT AND AVOIDANCE
    3.1 A cyclic Erdős-Szekeres Theorem
    3.2 Pattern avoidance of doubletons
    3.3 Three or more patterns
    3.4 Cyclic descent generating functions
    3.5 Open problems and concluding remarks
        3.5.1 Longer patterns
        3.5.2 Other statistics
        3.5.3 Vincular patterns
CHAPTER 4 PINNACLE SET PROPERTIES
    4.1 Counting admissible pinnacle sets
    4.2 Ballot numbers
    4.3 Permutations with a given pinnacle set
    4.4 Open problems and concluding remarks
CHAPTER 5 BACKGROUND ON BACKBONE EXTRACTION
CHAPTER 6 BACKBONE MODELS AND THEIR PROBABILITY MASS FUNCTIONS
    6.1 Bipartite ensemble backbone models
    6.2 Fixed degree sequence model (FDSM)
    6.3 Fixed fill model (FFM)
    6.4 Fixed row model (FRM)
    6.5 Fixed column model (FCM)
    6.6 Stochastic degree sequence model (SDSM)
CHAPTER 7 BACKBONE: AN R PACKAGE FOR EXTRACTING THE BACKBONE OF WEIGHTED GRAPHS
    7.1 Two Illuminating Data Sets
        7.1.1 Legislative Networks
        7.1.2 Spatial Networks
    7.2 Universal Threshold universal()
    7.3 Fixed fill model fixedfill()
    7.4 Fixed row model fixedrow()
    7.5 Fixed column model fixedcol()
    7.6 Stochastic degree sequence model sdsm()
    7.7 Fixed degree sequence model fdsm()
CHAPTER 8 COMPARING MODELS FOR BACKBONE EXTRACTION
    8.1 Study 1: Choosing cell-filling probabilities for the SDSM
        8.1.1 Methods
        8.1.2 Results
    8.2 Study 2: Statistical power of SDSM
        8.2.1 Methods
        8.2.2 Results
    8.3 Study 3: Backbone equivalence under varying degree distributions
        8.3.1 Methods
        8.3.2 Results
    8.4 Study 4: Recovery of community structure
        8.4.1 Methods
        8.4.2 Results
    8.5 Recommendations for Backbone Selection
BIBLIOGRAPHY

LIST OF TABLES

Table 3.1: Wilf equivalence classes and cardinalities of $\operatorname{Av}_n[\Pi]$ for certain $[\Pi]$ and $n \ge 5$
Table 4.1: Run times in seconds compared when most $n_i$ are equal
Table 4.2: Run times in seconds compared when most $n_i$ are constant
Table 8.1: SDSM probabilities given agent and artifact degree sequences [1,1,2]
Table 8.2: Bipartite degree distributions, with examples in the context of a scholarly authorship bipartite network

LIST OF FIGURES

Figure 2.1: The graph of 42351 on the left and of [42351] on the right
Figure 2.2: The diagram of 132 (left) and $132\langle\sigma_1, \sigma_2, \sigma_3\rangle$ (right)
Figure 3.1: The graph of $[\sigma]$ when $m = 5$ and $n = 3$
Figure 4.1: The lattice path $L$ for $A = \{2, 3, 7, 9\}$
Figure 4.2: Example of a pinnacle set ordering $[\tau] = [7612354]$ with corresponding dales.
Figure 5.1: Bipartite and bipartite projection networks
Figure 7.1: An example of an extracted backbone, with Democratic senators represented by blue vertices, and Republican senators represented by red vertices.
Figure 7.2: The distribution of (A) row sums and (B) column sums in the GaWC Dataset 11.
Figure 7.3: The positive backbone of the US Senate co-sponsorship network with edges retained between two senators if they sponsored at least 1 bill together.
Figure 7.4: The positive backbone of the US Senate co-sponsorship network with edges retained between two senators if they sponsored more bills together than one standard deviation above the mean.
Figure 7.5: The positive backbone of the US Senate co-sponsorship network under the fixed row model.
Figure 7.6: The positive backbone of the US Senate co-sponsorship network under the fixed column model.
Figure 7.7: The positive backbone of the US Senate co-sponsorship network under the stochastic degree sequence model.
Figure 7.8: Null weight distributions generated using the backbone package from the GaWC Dataset 11
Figure 7.9: A histogram of the expected co-sponsorships between Senators Cory Booker and Elizabeth Warren under the fixed degree sequence model (1000 samples). A positive edge between Booker and Warren would be preserved in the FDSM backbone because their actual number of co-sponsorships (98) is statistically significantly larger.
Figure 7.10: The positive backbone of the US Senate co-sponsorship network under the fixed degree sequence model.
Figure 8.1: (A) Accuracy and (B) speed computing $p^*_{ik}$ using different methods.
Figure 8.2: Statistical power of SDSM. (A) Distribution of weights for the Paris-Milan edge in projections derived from FDSM and SDSM ensembles. (B) Similarity of an FDSM backbone extracted at $\alpha = 0.05$ to SDSM backbones extracted at various $\alpha$ from an empirical bipartite network (green line) and from 100 synthetic bipartite networks (purple line = mean, purple region = 10th–90th percentile).
Figure 8.3: Jaccard similarity of a backbone extracted at $\alpha = 0.05$ using the Fixed Degree Sequence Model and a backbone extracted using (A) the Fixed Fill Model, (B) Fixed Row Model, (C) Fixed Column Model, (D) Stochastic Degree Sequence Model. Each cell represents the mean over 100 instances of a 100 × 100 bipartite network with given agent and artifact degree distributions.
Figure 8.4: (A) Given agent and artifact degree distributions, there exists a statistical significance level $\alpha$ that maximizes the similarity between an SDSM backbone extracted at this level and an FDSM backbone extracted at $\alpha = 0.05$, and (B) when used yields an SDSM backbone that is very similar to the corresponding FDSM backbone.
Figure 8.5: (A) Synthetic bipartite networks with varying levels of block structure, from which (B) backbones extracted using different models exhibit varying modularity. (C) When 65% of bipartite edges are within-block, a backbone extracted using FDSM shows a clear group structure (top) while a backbone extracted using FCM does not (bottom).

CHAPTER 1 INTRODUCTION

This doctoral thesis is the culmination of two combinatorial projects. The first explores permutation patterns and statistics, specifically looking at pattern containment and avoidance of cyclic permutations, and generating functions of cyclic descent statistics. Additionally, we study a particular permutation statistic, the pinnacle set. The second project involves bipartite projections, a type of weighted graph.
When a weighted graph represents a social relationship, it is of interest to know whether an edge weight should be considered particularly strong or weak. We provide various probabilistic null models to which one can compare an edge weight to determine its statistical significance. Edges deemed significant are part of the backbone subgraph.

The next three chapters describe the project on permutations, beginning with background information in chapter 2, then discussing permutation patterns and avoidance in chapter 3, and finally pinnacle set properties in chapter 4. We begin by expanding on the well-studied field of pattern avoidance in linear permutations by considering its implications in cyclic permutations. Specifically, we begin chapter 3 by proving a cyclic variant of the Erdős-Szekeres theorem in section 3.1. This new theorem states that in any cyclic permutation of size $mn + 2$, there is either an increasing subsequence of length $m + 2$ or a decreasing subsequence of length $n + 2$. This theorem becomes of great use in our study of length four pattern avoidance in sections 3.2 and 3.3. While linear pattern avoidance has origins reaching back to the early 1900s, the study of cyclic pattern avoidance was introduced relatively recently by Callan [Cal02] in 2002. He was able to count the number of cyclic permutations that avoid single patterns of length four (length three pattern avoidance being relatively trivial). We complete this study of length four pattern avoidance by counting the number of cyclic permutations that avoid any set of length four patterns, specifically providing proofs for all pairs and triples. These proofs utilize the technique of generating trees. As the cardinality of the set of patterns increases, the number of permutations that avoid the set can only decrease. These results allow us to completely count all avoidance sets of any size of length four patterns.
After this classification, we discuss cyclic descent generating functions in section 3.4. These generating functions allow us to count the numbers of cyclic descents in permutations that avoid a given set of patterns, refining our enumerations of the avoidance classes. Chapter 3 concludes with a section on open problems raised by this work. Now that patterns of length three and four are characterized, future projects could include looking for enumerative formulas for patterns of length five and higher. It is also of interest to look at the generating functions for other permutation statistics over the avoidance classes. We provide one result which counts the joint distribution of cyclic descents and cyclic peaks. Additionally, vincular pattern avoidance can be studied. In this scenario, occurrences of the pattern in a permutation may require certain elements to be adjacent to one another. We conjecture an exponential generating function which counts the number of permutations that avoid 123 and 213, concluding chapter 3. Recently, Sergi Elizalde and Bruce Sagan have proven this conjecture [ES21].

Using the background on permutations and permutation statistics presented in chapter 2, in chapter 4 we explore the pinnacle set of a permutation and prove a number of results related to counting either the number of pinnacle sets or the number of permutations with a given pinnacle set. In section 4.1, we reprove a result of [DNKPT18] that counts the number of pinnacle sets. Their proof involved lattice paths and was somewhat complicated, while ours is a simpler demonstration that does not need lattice paths. In fact, we show that our map and theirs are different descriptions of the same function. We then turn our attention to counting pinnacle sets with a prescribed maximum and size in section 4.2. While [DNKPT18] proved these counts satisfied a nice recurrence, they did not provide a formula to find the exact count.
We show that these counts are actually just ballot numbers, and do this in two ways: using the theory of finite differences and via a bijection. Since we now have counts of the number of pinnacle sets of given sizes, it is natural to turn one’s attention to counting the number of permutations with a given pinnacle set. We address this area in section 4.3. While a summation formula that counts such permutations was given in [DLHH+21], we construct a new formula that is more computationally efficient in many cases. We also show how this formula can be modified to answer a similar question: how many admissible orderings of a pinnacle set are there? Both of the enumerations found in this section have been of great interest to the research community in recent weeks, and we conclude this chapter by describing the recent progress made in constructing even faster formulas in section 4.4, which completes our study of permutations.

The remaining chapters discuss the backbone of a weighted network. We begin by introducing the concept of bipartite projections and backbone extraction in chapter 5. While bipartite networks are used to describe and represent a wide range of scenarios, their projections are challenging to analyze as they are dense and weighted. In addition, the projection loses information about the original row and column degree sequences of the bipartite network. Ideally, we’d like to reduce the complexity of these networks to a backbone network that contains only the most important edges. The edges retained should be those that had a higher or lower weight than would be expected in a random scenario. To find these backbone networks, we introduce five different bipartite ensemble backbone models in chapter 6. Each of the different bipartite ensemble models constrains the degree sequences of the set of all bipartite networks to which we compare our data.
We derive the probability mass functions for the stochastic degree sequence model (SDSM), fixed row model (FRM), fixed column model (FCM), and fixed fill model (FFM). The FDSM is considered the ‘gold standard’ model as it exactly fixes both degree sequences. However, its distribution remains unknown, and therefore we must approximate it through Monte Carlo methods.

While methods for backbone extraction, including a few of the ones mentioned above, have existed in the literature for several years, there did not exist one central software package or program where they were all implemented. This meant that researchers who wanted to find a backbone of their network would have to first find which method they wanted to use, potentially guessing which was best for their purposes, and then see if the algorithm was already implemented or available for use. To increase the ease of access for backbone methods, we’ve implemented the SDSM, FDSM, FRM, FCM, and FFM in the new R package backbone. The package and its usage are described in chapter 7. To demonstrate how to use backbone, we apply the functions to two different data sets, a legislative network and a spatial network.

Through implementing the R package and increasing its user base, we’re often met with the same question from researchers: “which model should be used for my data?” This is the question we investigate in chapter 8, where we consider each of the five aforementioned models and compare their accuracy, speed, statistical power, similarity, and community detection. These analyses are conducted in four studies. In section 8.1, we evaluate the accuracy and speed of different approaches for estimating cell-filling probabilities used by the SDSM. In section 8.2, we evaluate the statistical power of the SDSM relative to the FDSM. In section 8.3, we examine how degree distributions impact the similarity of backbones extracted using different models.
In section 8.4, we examine the extent to which backbones extracted using different models accurately recover a known community structure. Finally, we conclude in section 8.5 with recommendations for backbone model selection and opportunities for future model development.

CHAPTER 2 BACKGROUND ON PERMUTATION PATTERNS AND STATISTICS

We begin by reviewing some notions from the well-studied theory of patterns in (linear) permutations. We then discuss permutation statistics and generating functions for cyclic descents. We’ll finish by exploring what is known about the pinnacle set. The pinnacle set is the value analogue of a particular permutation statistic, the peak set. More information on the topic of patterns in permutations can be found in the texts of Bóna [Bón04], Sagan [Sag20], or Stanley [Sta97, Sta99].

Let $\mathbb{N}$ and $\mathbb{P}$ be the nonnegative and positive integers, respectively. If $m, n \in \mathbb{N}$ then we define $[m, n] = \{m, m + 1, \ldots, n\}$; if $m = 1$ we then abbreviate to $[n] = [1, n]$. Consider the symmetric group $\mathfrak{S}_n$ of all permutations $\pi = \pi_1 \pi_2 \ldots \pi_n$ of $[n]$ written in one-line notation. We call $n$ the length of $\pi$ and write $|\pi| = n$. We will also use this notation to represent the cardinality of a set, where the difference should be clear by context. We will sometimes put commas between the elements of $\pi$ for readability. We say that two sequences of distinct integers $\pi = \pi_1 \ldots \pi_k$ and $\sigma = \sigma_1 \ldots \sigma_k$ are order isomorphic, written $\pi \cong \sigma$, whenever $\pi_i < \pi_j$ if and only if $\sigma_i < \sigma_j$ for all indices $i, j$. If $\sigma \in \mathfrak{S}_n$ and $\pi \in \mathfrak{S}_k$ then $\sigma$ contains $\pi$ as a pattern if there is a subsequence $\sigma'$ of $\sigma$ with $|\sigma'| = k$ and $\sigma' \cong \pi$. If no such subsequence exists then $\sigma$ avoids $\pi$. We use the notation
$$\operatorname{Av}_n(\pi) = \{\sigma \in \mathfrak{S}_n \mid \sigma \text{ avoids } \pi\}$$
for the avoidance class of $\pi$. For example, $\sigma = 42351$ contains the pattern $\pi = 3241$ because of the subsequence 4251, among others. But it avoids 1234 because it has no increasing subsequence of length 4.
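
The definitions of order isomorphism, containment, and avoidance can be checked by brute force on small cases. The following Python sketch is illustrative only: the helper names `standardize`, `contains`, and `avoidance_class` are our own labels rather than notation from the text, and the search over all subsequences is exponential, so it is meant for short permutations.

```python
from itertools import combinations, permutations


def standardize(seq):
    """Return the permutation of [k] that is order isomorphic to seq (distinct integers)."""
    rank = {value: r for r, value in enumerate(sorted(seq), start=1)}
    return tuple(rank[value] for value in seq)


def contains(sigma, pi):
    """True if sigma has a subsequence order isomorphic to the pattern pi."""
    pattern = tuple(pi)
    # combinations() keeps left-to-right order, so it enumerates subsequences.
    return any(standardize(sub) == pattern
               for sub in combinations(sigma, len(pattern)))


def avoidance_class(n, patterns):
    """List every sigma in S_n that avoids all of the given patterns."""
    return [sigma for sigma in permutations(range(1, n + 1))
            if not any(contains(sigma, pi) for pi in patterns)]


sigma = (4, 2, 3, 5, 1)
print(standardize((4, 2, 5, 1)))      # the subsequence 4251 viewed as a pattern
print(contains(sigma, (3, 2, 4, 1)))  # does 42351 contain 3241?
print(contains(sigma, (1, 2, 3, 4)))  # does 42351 contain 1234?
```

Passing a list of several patterns to `avoidance_class` gives the class of permutations avoiding all of them at once.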
One can extend this notion to sets of permutations $\Pi$ by letting
$$\operatorname{Av}_n(\Pi) = \{\sigma \in \mathfrak{S}_n \mid \sigma \text{ avoids all } \pi \in \Pi\} = \bigcap_{\pi \in \Pi} \operatorname{Av}_n(\pi).$$

A famous theorem of Erdős and Szekeres [ES35] can be stated in terms of pattern containment and avoidance. Let
$$\iota_n = 12 \ldots n \quad \text{and} \quad \delta_n = n \ldots 21$$
be the increasing and decreasing permutations of length $n$, respectively.

Figure 2.1: The graph of 42351 on the left and of [42351] on the right

Theorem 2.0.1 ([ES35]). Suppose $m, n \in \mathbb{N}$. Then any $\sigma \in \mathfrak{S}_{mn+1}$ contains either $\iota_{m+1}$ or $\delta_{n+1}$. This is the best possible in that there exist permutations in $\mathfrak{S}_{mn}$ which avoid both $\iota_{m+1}$ and $\delta_{n+1}$.

The diagram of $\pi \in \mathfrak{S}_n$ is the collection of points $(i, \pi_i)$ in the first quadrant of the Cartesian plane. The graphical representation of $\pi = 42351$ is given on the left in Figure 2.1. It follows that we can act on $\pi$ with the dihedral group of the square
$$D_4 = \{\rho_0, \rho_{90}, \rho_{180}, \rho_{270}, r_0, r_1, r_{-1}, r_\infty\}$$
where $\rho_\theta$ is rotation counterclockwise through $\theta$ degrees and $r_m$ is reflection in a line of slope $m$. We wish to write some of these rigid motions in terms of the one-line notation for $\pi = \pi_1 \pi_2 \ldots \pi_n$. Reflection in a vertical line gives the reversal of $\pi$, which is
$$\pi^r = \pi_n \ldots \pi_2 \pi_1.$$
Similarly, reflection in a horizontal line results in the complement of $\pi$,
$$\pi^c = n + 1 - \pi_1,\ n + 1 - \pi_2,\ \ldots,\ n + 1 - \pi_n.$$
Combining these two operations gives rotation by 180 degrees, or the reverse complement
$$\pi^{rc} = n + 1 - \pi_n,\ \ldots,\ n + 1 - \pi_2,\ n + 1 - \pi_1.$$

Figure 2.2: The diagram of 132 (left) and $132\langle\sigma_1, \sigma_2, \sigma_3\rangle$ (right)

We apply any of these operations to sets of permutations by applying them to each element of the set.

We can use diagrams to inflate permutations. If we are given $\pi = \pi_1 \pi_2 \ldots \pi_n \in \mathfrak{S}_n$ and permutations $\sigma_1, \sigma_2, \ldots, \sigma_n$ then the inflation of $\pi$ by the $\sigma_i$ is the permutation $\pi\langle\sigma_1, \sigma_2, \ldots, \sigma_n\rangle$ whose diagram is obtained from that of $\pi$ by replacing each vertex $(i, \pi_i)$ by a copy of $\sigma_i$.
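
Theorem 2.0.1 and the reversal and complement maps lend themselves to a quick computational sanity check. The sketch below is a brute-force illustration under our own naming (`longest_monotone`, `satisfies_erdos_szekeres`, `reverse`, `complement`, and `reverse_complement` are hypothetical helpers, not part of any package discussed in this thesis); it tests the theorem only for the small values of $m$ and $n$ one passes in.

```python
from itertools import permutations


def longest_monotone(sigma, increasing=True):
    """Length of a longest increasing (or decreasing) subsequence, by O(n^2) dynamic programming."""
    best = []
    for j, value in enumerate(sigma):
        # Entries are distinct, so "not smaller" means "larger".
        prev = [best[i] for i in range(j) if (sigma[i] < value) == increasing]
        best.append(1 + max(prev, default=0))
    return max(best, default=0)


def satisfies_erdos_szekeres(m, n):
    """Check that every sigma in S_{mn+1} contains iota_{m+1} or delta_{n+1}."""
    return all(longest_monotone(s, True) >= m + 1
               or longest_monotone(s, False) >= n + 1
               for s in permutations(range(1, m * n + 2)))


def reverse(pi):
    """Reflection of the diagram in a vertical line."""
    return tuple(reversed(pi))


def complement(pi):
    """Reflection of the diagram in a horizontal line."""
    n = len(pi)
    return tuple(n + 1 - value for value in pi)


def reverse_complement(pi):
    """Rotation of the diagram by 180 degrees."""
    return reverse(complement(pi))


print(satisfies_erdos_szekeres(2, 2))
print(reverse((4, 2, 3, 5, 1)), complement((4, 2, 3, 5, 1)))
```

For sharpness with $m = n = 2$, the permutation 2143 in $\mathfrak{S}_4$ has no monotone subsequence of length 3, which `longest_monotone((2, 1, 4, 3))` confirms in both directions.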
For example, given $\pi = 132$ and $\sigma_1, \sigma_2, \sigma_3$, a schematic of the diagram of $132\langle\sigma_1, \sigma_2, \sigma_3\rangle$ is given on the right in Figure 2.2. More concretely, if $\sigma_1 = 21$, $\sigma_2 = 1$, and $\sigma_3 = 213$ then $132\langle\sigma_1, \sigma_2, \sigma_3\rangle = 216435$.

We say that patterns $\pi$ and $\pi'$ are Wilf equivalent, written $\pi \equiv \pi'$, if $\# \operatorname{Av}_n(\pi) = \# \operatorname{Av}_n(\pi')$ for all $n \in \mathbb{N}$, where the hash symbol denotes cardinality. This definition extends in the obvious way to sets of patterns. Note that if $\pi$ and $\pi'$ are Wilf equivalent then both must be in the same $\mathfrak{S}_n$. It is easy to see that if $\phi \in D_4$ then $\pi \equiv \phi(\pi)$, and so these are called trivial Wilf equivalences. It is well known that all elements of $\mathfrak{S}_3$ are Wilf equivalent.

Theorem 2.0.2. If $\pi \in \mathfrak{S}_3$ then $\# \operatorname{Av}_n(\pi) = C_n$ where $C_n = \frac{1}{n+1}\binom{2n}{n}$ is the $n$th Catalan number.

Trivial Wilf equivalence carries over to sets $\Pi$ of permutations. Simion and Schmidt [SS85] determined all Wilf equivalences among the $\operatorname{Av}_n(\Pi)$ for all $\Pi \subseteq \mathfrak{S}_3$.

A permutation statistic is a map $\operatorname{st} : \biguplus_{n \ge 0} \mathfrak{S}_n \to S$ where $S$ is some set. Famous permutation statistics include the descent set statistic
$$\operatorname{Des} \pi = \{i \mid \pi_i > \pi_{i+1}\},$$
where the elements $i \in \operatorname{Des} \pi$ are called descents and if $\pi_i < \pi_{i+1}$ then $i$ is called an ascent, the descent number statistic
$$\operatorname{des} \pi = \# \operatorname{Des} \pi,$$
the major index statistic
$$\operatorname{maj} \pi = \sum_{i \in \operatorname{Des} \pi} i,$$
the inversion statistic
$$\operatorname{inv} \pi = \#\{(i, j) \mid i < j \text{ and } \pi_i > \pi_j\},$$
the excedance statistic
$$\operatorname{exc} \pi = \#\{i \mid \pi(i) > i\},$$
and the peak set statistic
$$\operatorname{Pk} \pi = \{i \mid \pi_{i-1} < \pi_i > \pi_{i+1}\}.$$
Returning to the example given in Figure 2.1, the permutation $\pi = 42351$ has $\operatorname{Des} \pi = \{1, 4\}$, $\operatorname{des} \pi = 2$, $\operatorname{maj} \pi = 5$, $\operatorname{inv} \pi = 6$, $\operatorname{exc} \pi = 2$, and $\operatorname{Pk} \pi = \{4\}$.

Let st be a statistic whose range is $\mathbb{N}$ and let $q$ be a variable. If $\Pi$ is a set of patterns then its avoidance class has a corresponding generating function
$$F_n^{\operatorname{st}}(\Pi) = F_n^{\operatorname{st}}(\Pi; q) = \sum_{\sigma \in \operatorname{Av}_n(\Pi)} q^{\operatorname{st} \sigma}.$$
Say that $\Pi$ and $\Pi'$ are st-Wilf equivalent and write $\Pi \overset{\operatorname{st}}{\equiv} \Pi'$ if $F_n^{\operatorname{st}}(\Pi) = F_n^{\operatorname{st}}(\Pi')$ for all $n \ge 0$. Clearly st-Wilf equivalence implies Wilf equivalence.
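
To make inflation, the statistics above, and the generating function $F_n^{\operatorname{st}}(\Pi; q)$ concrete, here is a minimal Python sketch. All function names are our own choices for illustration, the generating function is represented simply as a map from exponents of $q$ to coefficients, and the avoidance test is brute force, so this is suitable only for small $n$.

```python
from collections import Counter
from itertools import combinations, permutations


def inflate(pi, sigmas):
    """Inflation pi<sigma_1, ..., sigma_n>: replace each point (i, pi_i) by a copy of sigma_i."""
    result = []
    for i, block in enumerate(sigmas):
        # Blocks lying below block i in the diagram use up the smaller values.
        offset = sum(len(sigmas[j]) for j in range(len(pi)) if pi[j] < pi[i])
        result.extend(offset + value for value in block)
    return tuple(result)


def descent_set(pi):
    return {i for i in range(1, len(pi)) if pi[i - 1] > pi[i]}


def des(pi):
    return len(descent_set(pi))


def maj(pi):
    return sum(descent_set(pi))


def inv(pi):
    return sum(1 for i, j in combinations(range(len(pi)), 2) if pi[i] > pi[j])


def exc(pi):
    return sum(1 for i, value in enumerate(pi, start=1) if value > i)


def peak_set(pi):
    # 1-based interior indices i with pi_{i-1} < pi_i > pi_{i+1}.
    return {i for i in range(2, len(pi)) if pi[i - 2] < pi[i - 1] > pi[i]}


def contains(sigma, pi):
    """Brute-force linear pattern containment."""
    def standardize(seq):
        rank = {v: r for r, v in enumerate(sorted(seq), start=1)}
        return tuple(rank[v] for v in seq)
    return any(standardize(sub) == tuple(pi)
               for sub in combinations(sigma, len(pi)))


def generating_function(n, patterns, stat):
    """Coefficients of F_n^st(Pi; q), as a map from exponent to coefficient."""
    return Counter(stat(s) for s in permutations(range(1, n + 1))
                   if not any(contains(s, p) for p in patterns))


print(inflate((1, 3, 2), [(2, 1), (1,), (2, 1, 3)]))
print(generating_function(3, [(1, 2, 3)], des))
```

Summing the coefficients returned by `generating_function` recovers $\# \operatorname{Av}_n(\Pi)$, which is one way to see that st-Wilf equivalence implies Wilf equivalence.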
The maj- and inv-Wilf equivalence classes for $\Pi \subseteq \mathfrak{S}_3$ were determined by Dokos, Dwyer, Johnson, Sagan, and Selsor [DDJ+ 12].

If $\pi = \pi_1 \pi_2 \ldots \pi_n \in \mathfrak{S}_n$ then the corresponding cyclic permutation is the set of all rotations of $\pi$, denoted
\[
[\pi] = \{\pi_1 \pi_2 \ldots \pi_n,\ \pi_2 \ldots \pi_n \pi_1,\ \ldots,\ \pi_n \pi_1 \ldots \pi_{n-1}\}.
\]
Continuing our example from the beginning of the section,
\[
[42351] = \{42351, 23514, 35142, 51423, 14235\}.
\]
If necessary, we will call permutations from $\mathfrak{S}_n$ linear to distinguish them from their cyclic cousins. We also use square brackets to denote cyclic analogues of objects defined in the linear case. For example, $[\mathfrak{S}_n]$ is the set of all cyclic permutations of length $n$. We say a cyclic permutation $[\sigma]$ contains $[\pi]$ as a pattern if there is some rotation $\sigma'$ of $\sigma$ which contains $\pi$ linearly. Otherwise $[\sigma]$ avoids $[\pi]$. In our perennial example, even though 42351 avoids 1234 we have that [42351] contains [1234] since the rotation 14235 has the copy 1235 of this pattern. Given a set $[\Pi]$ of cyclic patterns the cyclic avoidance class $\mathrm{Av}_n[\Pi]$ is defined as expected. Note that when using a specific set of cyclic permutations the square brackets will be put around the permutations themselves, for example, $\mathrm{Av}_n([\pi], [\pi'])$. Callan [Cal02] determined $\#\mathrm{Av}_n[\pi]$ for all $[\pi] \in [\mathfrak{S}_4]$. Gray, Lanning, and Wang continued work in this direction, considering cyclic packing of patterns [GLW18] and patterns in colored cyclic permutations [GLW19].

The graph of a cyclic permutation $[\pi]$ is obtained by embedding the graph of $\pi$ on a cylinder. This is indicated on the right in Figure 2.1 by identifying the two dotted arrows. Cyclic Wilf equivalence has the obvious definition. But note that now there are fewer trivial cyclic Wilf equivalences since we need the chosen group element to preserve the cylinder, not just the square. So the only trivial equivalences are
\[
[\pi] \equiv [\pi^r] \equiv [\pi^c] \equiv [\pi^{rc}]. \tag{2.1}
\]
Certain linear permutation statistics have obvious cyclic analogues.
For example, if $\pi \in \mathfrak{S}_n$ then its cyclic descent number is
\[
\mathrm{cdes}[\pi] = \#\{i \mid \pi_i > \pi_{i+1} \text{ where subscripts are taken modulo } n\}.
\]
Note that this is well defined because the cardinality does not depend on which representative of $[\pi]$ is chosen. To illustrate, $\pi = 23514$ has cyclic descents at indices 3 and 5 so $\mathrm{cdes}[\pi] = 2$. The corresponding generating function $F_n^{\mathrm{cdes}}[\Pi]$, where $[\Pi]$ is a set of cyclic permutations, and cdes-Wilf equivalence should now need no definition. Note that cdes is another form of the excedance statistic on linear permutations. In particular, if $\pi = \pi_1 \pi_2 \ldots \pi_n$ then
\[
\mathrm{cdes}[\pi] = \mathrm{exc}(\pi_n, \ldots, \pi_2, \pi_1)
\]
where $(\pi_n, \pi_{n-1}, \ldots, \pi_1)$ is cycle notation for the linear permutation which, as a function, sends $\pi_i$ to $\pi_{i-1}$ for all $i$ modulo $n$.

We return our attention to the peak set statistic on linear permutations,
\[
\mathrm{Pk}\,\pi = \{i \mid \pi_{i-1} < \pi_i > \pi_{i+1}\} \subseteq [2, n-1].
\]
For example, if $\pi = 18524376$ then $\mathrm{Pk}\,\pi = \{2, 5, 7\}$ since $\pi_2 = 8$, $\pi_5 = 4$, and $\pi_7 = 7$ are all bigger than the elements directly to their left and right. It is easy to see that $S \subseteq [2, n-1]$ is the peak set of some $\pi \in \mathfrak{S}_n$ if and only if no two elements of $S$ are consecutive. So the number of possible peak sets is a Fibonacci number. One could also ask how many permutations have a given peak set. This question was answered by Billey, Burdzy and Sagan.

Theorem 2.0.3 ([BBS13]). If $n \in \mathbb{P}$ and $S \subseteq [2, n]$ then
\[
\#\{\pi \mid \mathrm{Pk}\,\pi = S\} = p(S; n)\, 2^{n - \#S - 1}
\]
where $\#$ denotes cardinality and $p(S; n)$ is a polynomial in $n$ depending on $S$.

It is natural to study the values at the peak indices. This line of research was initiated by Davis, Nelson, Petersen, and Tenner [DNKPT18] and continued by Rusu [Rus20]; Diaz-Lopez, Harris, Huang, Insko, and Nilsen [DLHH+ 21]; and Rusu and Tenner [RT]. Define the pinnacle set of a permutation $\pi \in \mathfrak{S}_n$ to be
\[
\mathrm{Pin}\,\pi = \{\pi_i \mid \pi_{i-1} < \pi_i > \pi_{i+1}\} \subseteq [3, n].
\]
Continuing with the example $\pi = 18524376$ we see that $\mathrm{Pin}\,\pi = \{4, 7, 8\}$.
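The cyclic notions above can be checked mechanically on the running examples. Below is a small Python sketch under the same conventions as the text (one-line notation, 1-based positions); the helper names are hypothetical and chosen only for illustration. It lists the rotations of 42351, tests cyclic containment of [1234], and computes cdes, the peak set, and the pinnacle set.

```python
from itertools import combinations


def rotations(p):
    """All rotations of p, i.e. the elements of the cyclic permutation [p]."""
    return [p[i:] + p[:i] for i in range(len(p))]


def contains(sigma, pi):
    """True if sigma has a subsequence order isomorphic to pi (linear containment)."""
    k = len(pi)
    pairs = list(combinations(range(k), 2))
    for idx in combinations(range(len(sigma)), k):
        sub = [sigma[i] for i in idx]
        if all((sub[a] < sub[b]) == (pi[a] < pi[b]) for a, b in pairs):
            return True
    return False


def cyc_contains(sigma, pi):
    """[sigma] contains [pi] if some rotation of sigma contains pi linearly."""
    return any(contains(r, pi) for r in rotations(sigma))


def cdes(p):
    """Cyclic descent number: positions i with p_i > p_{i+1}, indices mod n."""
    n = len(p)
    return sum(1 for i in range(n) if p[i] > p[(i + 1) % n])


def exc_of_cycle(p):
    """Excedances of the cycle (p_n, ..., p_2, p_1), which sends p_i to p_{i-1}."""
    image = {p[i]: p[i - 1] for i in range(len(p))}  # p[-1] wraps around to p_n
    return sum(1 for v in image if image[v] > v)


def peak_set(p):
    return {i + 1 for i in range(1, len(p) - 1) if p[i - 1] < p[i] > p[i + 1]}


def pinnacle_set(p):
    return {p[i] for i in range(1, len(p) - 1) if p[i - 1] < p[i] > p[i + 1]}
```

Note that `cdes` returns the same value on every rotation of its argument, which is the well-definedness remark made in the text.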
Following Davis et al., call a set $S$ an admissible pinnacle set if there is some permutation $\pi$ with $\mathrm{Pin}\,\pi = S$. They found a criterion for $S$ to be admissible which will be useful in this work. This result was stated in recursive fashion, but it is clearly equivalent to the following non-recursive version.

Theorem 2.0.4 ([DNKPT18]). Let $S = \{s_1 < s_2 < \ldots < s_d\} \subset \mathbb{P}$. The set $S$ is an admissible pinnacle set if and only if we have $s_i > 2i$ for all $i \in [d]$.

Davis et al. were able to count the number of admissible pinnacle sets for $\pi \in \mathfrak{S}_n$.

Theorem 2.0.5 ([DNKPT18]). If $\mathcal{A}_n = \{S \mid S = \mathrm{Pin}\,\pi \text{ for some } \pi \in \mathfrak{S}_n\}$ then
\[
\#\mathcal{A}_n = \binom{n-1}{\lfloor (n-1)/2 \rfloor}.
\]

They also studied the more refined constants
\[
\mathfrak{p}(m, d) = \#\{S \in \mathcal{A}_n \mid \max S = m \text{ and } \#S = d\}
\]
where $n \ge m$. Note that if $S = \mathrm{Pin}\,\pi$ for some $\pi \in \mathfrak{S}_n$ then $S$ is also a pinnacle set of some $\pi' \in \mathfrak{S}_{n'}$ for all $n' \ge n$ since one can just add values larger than $n$ to the beginning of $\pi$ in decreasing order. It follows that the exact value of $n$ does not play a role in the definition of $\mathfrak{p}(m, d)$.

A number of questions have been raised about pinnacle sets. For example, if
\[
p_S(n) = \#\{\pi \in \mathfrak{S}_n \mid \mathrm{Pin}\,\pi = S\}
\]
then how can one compute these numbers? There does not seem to be an analogue of Theorem 2.0.3 in the context of pinnacles. Davis et al. gave a recursive procedure for doing so, and then a non-recursive summation formula for determining the $p_S(n)$ was proposed in the paper of Diaz-Lopez et al.

Another problem suggested earlier is as follows. Given an admissible $S$, a permutation $\sigma$ of $S$ is called an admissible ordering if there is a $\pi \in \mathfrak{S}_n$ with $\mathrm{Pin}\,\pi = S$ and the pinnacles of $\pi$ occur in the same order as they do in $\sigma$. Let
\[
\mathcal{O}(S) = \{\sigma \mid \sigma \text{ is an admissible ordering of } S\}.
\]
For example, if $S = \{3, 5, 7\}$ then $\sigma = 537 \in \mathcal{O}(S)$ as witnessed by $\pi = 4513276$. But $375 \notin \mathcal{O}(S)$ since in order for 6 not to be a pinnacle, it must be directly to the left or right of 7 and both choices lead to a contradiction.
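Theorems 2.0.4 and 2.0.5 and the admissible-ordering example can be sanity-checked by exhaustive search for small $n$. The Python sketch below is illustrative only: the function names are hypothetical, and `admissible_orderings` assumes it suffices to search $\mathfrak{S}_n$ with $n = \max S$, which is an assumption of this sketch rather than a statement from the text.

```python
from itertools import combinations, permutations
from math import comb


def pinnacles_in_order(p):
    """Pinnacle values of p, listed left to right."""
    return tuple(p[i] for i in range(1, len(p) - 1) if p[i - 1] < p[i] > p[i + 1])


def admissible_sets(n):
    """A_n: all sets that occur as Pin(p) for some p in S_n (brute force)."""
    return {frozenset(pinnacles_in_order(p)) for p in permutations(range(1, n + 1))}


def sets_meeting_criterion(n):
    """Subsets {s_1 < ... < s_d} of [n] with s_i > 2i for all i (Theorem 2.0.4)."""
    found = set()
    for d in range(n + 1):
        for S in combinations(range(1, n + 1), d):  # tuples come out sorted
            if all(s > 2 * i for i, s in enumerate(S, start=1)):
                found.add(frozenset(S))
    return found


def admissible_orderings(S):
    """O(S), searching S_n with n = max(S) (an assumption of this sketch)."""
    n = max(S)
    orderings = set()
    for p in permutations(range(1, n + 1)):
        pins = pinnacles_in_order(p)
        if set(pins) == set(S):
            orderings.add(pins)
    return orderings
```

With these helpers, `len(admissible_sets(n))` can be compared with `comb(n - 1, (n - 1) // 2)`, and `admissible_orderings({3, 5, 7})` can be inspected for the orderings 537 and 375 discussed above.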
The set $\mathcal{O}(S)$ was studied in the articles of Rusu, and of Rusu and Tenner [Rus20, RT]. In the latter paper, the authors asked for a function to compute $\#\mathcal{O}(S)$.

With these definitions and results in hand, we first examine cyclic pattern containment and avoidance in Chapter 3. We begin by proving a cyclic version of the Erdős-Szekeres Theorem, Theorem 3.1.1, in Section 3.1. This result is used to help us count $\#\mathrm{Av}_n([\Pi])$ in Sections 3.2 and 3.3, where $[\Pi] \subseteq [\mathfrak{S}_4]$ and $\#[\Pi] \ge 2$. We then consider cyclic descent generating functions over $\mathrm{Av}_n([\Pi])$ in Section 3.4, and find $F_n^{\mathrm{cdes}}[\Pi]$ for $\#[\Pi] = 1, 2$ and $[\Pi] \subseteq [\mathfrak{S}_4]$.

We then continue to our study of pinnacle sets in Chapter 4. We begin by counting the number of admissible pinnacle sets in Section 4.1. This quantity, given in Theorem 2.0.5, was already found in [DNKPT18]. Here we provide a simpler proof via a bijection using interleaved and right canonical permutations. As mentioned, [DNKPT18] also studied the values $\mathfrak{p}(m, d)$. In Section 4.2, we show these constants are actually ballot numbers, specifically
\[
\mathfrak{p}(m, d) = \frac{m - 2d + 1}{m - 1}\binom{m-1}{d-1}.
\]
We do this in two ways, using finite differences and a bijection. Once we have counted the number of admissible pinnacle sets, we consider the number of permutations with a given pinnacle set in Section 4.3. We provide a sum to count $p_S(n)$ in Theorem 4.3.1 which is asymptotically more efficient than previously existing methods. We then extend this result to count $\#\mathcal{O}(S)$ in Theorem 4.3.12.

CHAPTER 3 CYCLIC PATTERN CONTAINMENT AND AVOIDANCE

This chapter contains material from Domagalski, Liang, Minnich, Sagan, Schmidt, and Sietsema [DLM+ 21a]. All results in this chapter come from this manuscript except as otherwise noted.

3.1 A cyclic Erdős-Szekeres Theorem

In this section we will use the linear Erdős-Szekeres Theorem to prove a cyclic analogue. We will need a variant of the decreasing permutation $\delta_n$ defined as follows.
Given nonnegative integers $n$ (the length), $d$ (the difference), and $s$ (the smallest value) define the decreasing sequence
\[
\delta_{n,d,s} = s + (n-1)d,\ s + (n-2)d,\ \ldots,\ s + d,\ s.
\]
For example $\delta_{5,2,3} = 11, 9, 7, 5, 3$.

Theorem 3.1.1. Suppose $m, n \in \mathbb{N}$. Then any $[\sigma] \in [\mathfrak{S}_{mn+2}]$ contains either $[\iota_{m+2}]$ or $[\delta_{n+2}]$. This is the best possible in that there exist permutations in $[\mathfrak{S}_{mn+1}]$ which avoid both $[\iota_{m+2}]$ and $[\delta_{n+2}]$.

Proof. To prove the first statement we can assume, by rotating $\sigma$ if necessary, that
\[
\sigma = \sigma_1, \sigma_2, \ldots, \sigma_{mn+1}, mn+2.
\]
So $\sigma' = \sigma_1 \sigma_2 \ldots \sigma_{mn+1} \in \mathfrak{S}_{mn+1}$ and, by Theorem 2.0.1, $\sigma'$ contains a copy $\kappa$ of either $\iota_{m+1}$ or $\delta_{n+1}$. In the first case, the concatenation $\kappa, mn+2$ is a copy of $[\iota_{m+2}]$ in $[\sigma]$. In the second case, we have that $mn+2, \kappa$ is a copy of $[\delta_{n+2}]$ in $[\sigma]$.

To prove the second statement, consider the concatenation
\[
\sigma = 1, \delta_{n,m,2}, \delta_{n,m,3}, \ldots, \delta_{n,m,m+1}.
\]
For example, when $m = 5$ and $n = 3$, then
\[
[\sigma] = [1, 12, 7, 2, 13, 8, 3, 14, 9, 4, 15, 10, 5, 16, 11, 6]
\]
whose graph is shown in Figure 3.1.

Figure 3.1: The graph of $[\sigma]$ when $m = 5$ and $n = 3$

Define $\sigma'$ by $\sigma = 1\sigma'$ and note that $\sigma'$ can be written either as a disjoint union of $m$ decreasing subsequences of length $n$, or of $n$ increasing subsequences of length $m$. In a linear permutation, any increasing subsequence can intersect any decreasing subsequence at most once. So any increasing subsequence of $\sigma'$ has length at most $m$, and any decreasing subsequence has length at most $n$.

Now let $[\pi]$ be a subsequence of $[\sigma]$. We consider two cases. Suppose first that $[\pi]$ contains 1. If $[\pi]$ is increasing then rotate, if necessary, until $\pi = 1\pi'$ for some $\pi'$ which is a subsequence of $\sigma'$. But from the previous paragraph, $|\pi'| \le m$ which implies $|\pi| \le m + 1$ as desired. If $[\pi]$ is decreasing then we pick a representative $\pi = \pi' 1$ and proceed as in the increasing case to get $|\pi| \le n + 1$.

Now consider the possibility that $[\pi]$ does not contain 1.
Again, we start with the subcase when $[\pi]$ is increasing. Suppose, for simplicity, that $\pi$ contains an element $x$ of $\delta_{n,m,2}$, as the proof will be similar for the other deltas. As before, $\pi$ can contain at most one element of each of $\delta_{n,m,3}$ through $\delta_{n,m,m+1}$. Now $[\pi]$ can wrap around and pick up other elements. But those elements must come before $x$. And since $\delta_{n,m,2}$ is decreasing, at most one other element can be added in this way. It follows that $|\pi| \le m + 1$. On the other hand, if $[\pi]$ is decreasing then the proof is similar. The only difference is that if one attempts to pick up elements of $\delta_{n,m,2}$ before $x$ then this is impossible since such elements are larger than $x$ and $[\pi]$ is decreasing. So $|\pi| \le n$ which is an even tighter bound. This completes the demonstration of the theorem.

We will see the advantages that Theorem 3.1.1 brings in our study of length four pattern avoidance, specifically, when considering $\#\mathrm{Av}_n(S)$ where $S$ contains [1234] and [1432].

3.2 Pattern avoidance of doubletons

In this section we will enumerate $\mathrm{Av}_n[\Pi]$ for all $[\Pi] \subset [\mathfrak{S}_4]$ with $\#[\Pi] = 2$. Any cyclic Wilf equivalences stated without proof are trivial.

Let us first dispose of the simplest singleton avoidance classes where $[\pi] \in [\mathfrak{S}_k]$ for $k < 4$. In $[\mathfrak{S}_2]$ there is only one cyclic permutation [12] and it is easy to see that every $[\sigma]$ of length at least 2 contains it. In $[\mathfrak{S}_3]$ there are only the patterns [123] and [321], and these are only avoided by $[\delta_n]$ and $[\iota_n]$, respectively.

Callan [Cal02] enumerated $\mathrm{Av}_n[\pi]$ for any given $[\pi] \in [\mathfrak{S}_4]$. Recall the version of the Fibonacci numbers defined by $F_1 = F_2 = 1$ and $F_n = F_{n-1} + F_{n-2}$ for $n \ge 3$. Unlike the case of linear permutations in $\mathfrak{S}_3$, there are no nontrivial Wilf equivalences.

Theorem 3.2.1 ([Cal02]). For $n \ge 2$ we have
\begin{align*}
\#\mathrm{Av}_n[1234] = \#\mathrm{Av}_n[1432] &= 2^n + 1 - 2n - \binom{n}{3},\\
\#\mathrm{Av}_n[1243] = \#\mathrm{Av}_n[1342] &= 2^{n-1} - n + 1,\\
\#\mathrm{Av}_n[1324] = \#\mathrm{Av}_n[1423] &= F_{2n-3}.
\end{align*}
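For small parameters, both Theorem 3.1.1 and Callan's formulas in Theorem 3.2.1 can be confirmed by exhaustive search. The following Python sketch is one way to do this; it is not taken from the source, all function names are illustrative, and each cyclic permutation is represented by its rotation beginning with 1.

```python
from itertools import combinations, permutations
from math import comb


def contains(sigma, pi):
    """Linear containment: sigma has a subsequence order isomorphic to pi."""
    k = len(pi)
    pairs = list(combinations(range(k), 2))
    for idx in combinations(range(len(sigma)), k):
        sub = [sigma[i] for i in idx]
        if all((sub[a] < sub[b]) == (pi[a] < pi[b]) for a, b in pairs):
            return True
    return False


def cyc_contains(sigma, pi):
    """[sigma] contains [pi]: some rotation of sigma contains pi linearly."""
    return any(contains(sigma[i:] + sigma[:i], pi) for i in range(len(sigma)))


def cyclic_perms(n):
    """One representative, starting with 1, of each element of [S_n]."""
    return [(1,) + rest for rest in permutations(range(2, n + 1))]


def count_cyclic_avoiders(n, patterns):
    return sum(1 for s in cyclic_perms(n)
               if not any(cyc_contains(s, p) for p in patterns))


def fib(k):
    """Fibonacci numbers with F_1 = F_2 = 1."""
    a, b = 1, 1
    for _ in range(k - 1):
        a, b = b, a + b
    return a


def delta(n, d, s):
    """The decreasing sequence delta_{n,d,s} of length n, step d, minimum s."""
    return tuple(s + k * d for k in range(n - 1, -1, -1))


def extremal(m, n):
    """The sequence 1, delta_{n,m,2}, ..., delta_{n,m,m+1} used in the proof."""
    sigma = (1,)
    for s in range(2, m + 2):
        sigma += delta(n, m, s)
    return sigma


def iota(k):
    return tuple(range(1, k + 1))


def dec(k):
    return tuple(range(k, 0, -1))
```

For example, `extremal(5, 3)` returns the length 16 sequence displayed in the proof of Theorem 3.1.1, and `count_cyclic_avoiders(6, [iota(4), dec(4)])` tests the case $m = n = 2$ of that theorem. The search is factorial in $n$, so it is only meant for very small cases.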
In presenting the enumerations for doubletons, we make the following conventions to facilitate locating a given result. All cyclic patterns will be listed starting with 1. And all sets of cyclic patterns will be given in lexicographic order. We will also use terms like “just before” or “just after” in $[\sigma]$ to refer to the left-to-right order on the cylinder of a cyclic permutation in the form of Figure 2.1. For example, in $[\sigma] = [42351]$ the 5 comes just before 1 and the 4 just after. We also say that an element $x$ is between $y$ and $z$ if it is in the subsequence of $[\sigma]$ traversed going left-to-right around the cylinder from $y$ to $z$. Continuing our example, between 2 and 5 we have 3, while between 5 and 2 we have 1 and 4.

One of our tools will be generating trees. To the best of our knowledge, these trees were introduced by Chung, Graham, Hoggatt, and Kleiman [CGHK78] for studying Baxter permutations. Since then, they have become an integral technique in the theory of pattern avoidance [BBMD+ 02, BM03, Kre00, Wes95, Wes96]. The generating tree for an avoidance class $\mathrm{Av}[\Pi]$, denoted $T[\Pi]$, has as its root the permutation [12]. The children of any $[\sigma] \in \mathrm{Av}_n[\Pi]$ are all the $[\sigma'] \in \mathrm{Av}_{n+1}[\Pi]$ which can be formed by inserting $n + 1$ into one of the spaces of $[\sigma]$. A space, also called a site, where insertion of $n + 1$ produces a permutation of the avoidance class is called active while the other spaces are inactive. A useful observation is that if a space is inactive it must be because inserting $n + 1$ there results in a copy of a forbidden pattern $[\pi]$ where $n + 1$ plays the role of the largest element of $\pi$. Once we have picked a representative $\sigma = \sigma_1 \sigma_2 \ldots \sigma_n$ for $[\sigma]$ we will label the spaces as $1, 2, \ldots, n$ left to right, where space $i$ comes between $\sigma_i$ and $\sigma_{i+1}$ (with subscripts taken modulo $n$). The nodes for $\mathrm{Av}_n[\Pi]$ will be said to be at level $n$ in $T[\Pi]$. We call the number of children of a vertex its degree, which is denoted $\deg[\sigma]$.
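The generating tree construction just described can be sketched in a few lines of Python. This is an illustrative implementation with hypothetical names, not code from the source: each node is stored as its rotation starting with 1, space $i$ is the gap after $\sigma_i$, and a space is active exactly when the resulting cyclic permutation still avoids every pattern. The pair {[1234], [1342]} treated in Theorem 3.2.2 is a convenient test case.

```python
from itertools import combinations


def contains(sigma, pi):
    """Linear containment: sigma has a subsequence order isomorphic to pi."""
    k = len(pi)
    pairs = list(combinations(range(k), 2))
    for idx in combinations(range(len(sigma)), k):
        sub = [sigma[i] for i in idx]
        if all((sub[a] < sub[b]) == (pi[a] < pi[b]) for a, b in pairs):
            return True
    return False


def cyc_contains(sigma, pi):
    """[sigma] contains [pi]: some rotation of sigma contains pi linearly."""
    return any(contains(sigma[i:] + sigma[:i], pi) for i in range(len(sigma)))


def children(sigma, patterns):
    """Children of [sigma] in T[Pi]: insert n+1 into each active space."""
    n = len(sigma)
    kids = []
    for i in range(1, n + 1):  # space i lies just after sigma_i; space n wraps around
        child = sigma[:i] + (n + 1,) + sigma[i:]
        if not any(cyc_contains(child, p) for p in patterns):
            kids.append(child)
    return kids


def tree_levels(max_n, patterns):
    """Levels 2, ..., max_n of T[Pi], starting from the root [12]."""
    levels = {2: [(1, 2)]}
    for n in range(2, max_n):
        levels[n + 1] = [c for s in levels[n] for c in children(s, patterns)]
    return levels


def degree_profile(level, patterns):
    """Sorted list of deg[sigma] over the nodes of one level."""
    return sorted(len(children(s, patterns)) for s in level)
```

For the pair above, `degree_profile(tree_levels(7, P)[n], P)` with `P = [(1, 2, 3, 4), (1, 3, 4, 2)]` can be compared with the degree counts claimed in the proof of Theorem 3.2.2.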
Given $d \in \mathbb{N}$, suppose that every cyclic permutation with $\deg[\sigma] = d$ has children of degrees $c_1, c_2, \ldots, c_d$. Then this is denoted by the production rule
\[
(d) \to (c_1)(c_2) \ldots (c_d).
\]
There may be other nodes having some special characteristic $X$ which always produces nodes having characteristics $Y_1, Y_2, \ldots, Y_d$, which corresponds to a production rule
\[
(X) \to (Y_1)(Y_2) \ldots (Y_d).
\]
In particular, the characteristic of being the root of the tree is denoted in a production rule by $(*)$. We can also have production rules which mix numbers for degrees and letters for characteristics. If $T[\Pi]$ can be characterized by production rules, these can often be used to calculate $\#\mathrm{Av}_n[\Pi]$.

Theorem 3.2.2. We have
\[
\{[1234], [1243]\} \equiv \{[1234], [1342]\} \equiv \{[1243], [1432]\} \equiv \{[1342], [1432]\}.
\]
And for $n \ge 3$
\[
\#\mathrm{Av}_n([1234], [1342]) = 2(n - 2).
\]

Proof. We claim that $T = T([1234], [1342])$ has the following production rules
\begin{align*}
(*) &\to (2)(2),\\
(1) &\to (1),\\
(2) &\to (1)(2).
\end{align*}
Once these are proven then the enumeration follows easily since one can inductively show that, for $n \ge 3$, level $n$ consists of two nodes of degree 2 and $2(n - 3)$ nodes of degree 1. It is easy to check the production rules at levels $n = 2$ and 3, so we assume that $n \ge 4$ and also that $[\sigma] \in \mathrm{Av}_n([1234], [1342])$.

First of all, note that the site before $n$ is always active. If it were not then the result $[\sigma']$ of inserting $n + 1$ would have a copy $\kappa$ of one of the patterns containing $n + 1$. But $n$ can not be in $\kappa$ since neither of the patterns have 4 followed immediately in the cycle by 3. So replacing $n + 1$ by $n$ in $\kappa$ would give a forbidden pattern in $[\sigma]$ which is a contradiction. Thus every $[\sigma]$ has at least one child.

Also $[\sigma]$ has at most two children. For suppose
\[
\sigma' = n + 1, \rho, n, \tau
\]
is the result of inserting $n + 1$ in $\sigma$. It follows that $|\rho| \le 1$ since if $|\rho| \ge 2$ then $[\sigma']$ has a copy of either [4123] or [4213]. Thus $n + 1$ must be inserted either directly before $n$ or two elements before $n$.

Now consider
\[
\delta = n, n - 1, \ldots, 3, 2, 1 \quad\text{and}\quad \epsilon = n, n - 1, \ldots, 3, 1, 2. \tag{3.1}
\]
It is easy to check that both sites $n$ and $n - 1$ are active in these permutations and so both have degree 2. It is also obvious that if one inserts $n + 1$ in site $n$ in either permutation then one gets another permutation of the same form.

From what we have done, we can finish the proof if we show that $\deg[\sigma] = 2$ implies $[\sigma] = [\delta]$ or $[\sigma] = [\epsilon]$. Write $\sigma = n\rho m$ where $m$ is the last element of $\sigma$ and $\rho$ is everything between $n$ and $m$. Since $\deg[\sigma] = 2$, site $n - 1$ is active and inserting $n + 1$ there yields
\[
\sigma' = n, \rho, n + 1, m.
\]
Then $m \le 2$ since otherwise $[\sigma]$ contains a copy of [4123] or [4213] since $n \ge 4$. In the case $m = 1$ we must have $\rho$ decreasing. For if there is an ascent $x < y$ in $\rho$ then $[\sigma']$ contains $[x, y, n + 1, 1]$ which is a copy of [2341], a contradiction. So in this case $\rho$ is decreasing and $\sigma = \delta$.

The other possibility is that $m = 2$. This forces the last element of $\rho$ to be 1. For if 1 is elsewhere and $x$ is the last element of $\rho$ then $[\sigma']$ contains $[1, x, n + 1, 2]$ which is a contradictory copy of [1342]. Similarly to the first case, one can now show that $\rho$ is decreasing and so $\sigma = \epsilon$ as desired.

Comparing our next result with the previous one will provide our first nontrivial Wilf equivalence.

Theorem 3.2.3. We have
\[
\{[1234], [1324]\} \equiv \{[1423], [1432]\}.
\]
And for $n \ge 3$
\[
\#\mathrm{Av}_n([1234], [1324]) = 2(n - 2).
\]

Proof. Let $D$ stand for the decreasing permutation and $E$ for the decreasing permutation with its largest two elements swapped. We consider the root [12] to be of type $D$. We will show that $T = T([1234], [1324])$ has production rules
\begin{align*}
(1) &\to (1),\\
(D) &\to (D)(E),\\
(E) &\to (1)(1).
\end{align*}
It follows by induction that level $n \ge 3$ of $T$ has a $D$, an $E$, and $2(n - 3)$ nodes of degree one, proving the theorem.

The same demonstration as in the previous theorem shows that the site before $n$ in any $[\sigma] \in \mathrm{Av}_n([1234], [1324])$ is active. So again, every such permutation has at least one child.

Also, every $[\sigma]$ has at most two children.
Indeed, write
\[
\sigma = 1\sigma_2 \ldots \sigma_n \tag{3.2}
\]
and put $n + 1$ in site $i \ge 3$. Then $1, \sigma_2, \sigma_3, n + 1$ is a copy of either 1234 or 1324, a contradiction.

Now consider permutations corresponding to $D$ and $E$ at level $n$,
\[
\delta = 1, n, n - 1, n - 2, n - 3, \ldots, 2 \quad\text{and}\quad \epsilon = 1, n - 1, n, n - 2, n - 3, \ldots, 2. \tag{3.3}
\]
It is easy to check that both sites 1 and 2 are active in $\delta, \epsilon$. So, by the previous paragraph, they both have degree 2. Furthermore, the two children of $\delta$ have the form $D$ and $E$.

We will be done if we can show that $[\sigma]$ having two children implies $[\sigma] = [\delta]$ or $[\epsilon]$. Write $\sigma$ as in (3.2). Since the active sites must be 1 and 2, and the site before $n$ must be active, either $\sigma_2 = n$ or $\sigma_3 = n$. If $\sigma_2 = n$ and there is an ascent $x < y$ in the rest of the permutation, then after inserting $n + 1$ in site 2 we have $[x, y, n, n + 1]$ which is a copy of [1234], a contradiction. So in this case $[\sigma] = [\delta]$.

Alternatively, suppose $\sigma_3 = n$. This forces $\sigma_2 = n - 1$, since if $\sigma_2 = x < n - 1$ then $n - 1$ comes after $n$. But inserting $n + 1$ in site 1 gives $[x, n, n - 1, n + 1]$ which is a copy of [1324]. And similarly to the first case we see that the rest of $\sigma$ is decreasing. The result is that $[\sigma] = [\epsilon]$. This completes the proof.

Theorem 3.2.4. We have
\[
\{[1234], [1423]\} \equiv \{[1324], [1432]\}.
\]
And for $n \ge 1$
\[
\#\mathrm{Av}_n([1234], [1423]) = 1 + \binom{n-1}{2}.
\]

Proof. Suppose $[\sigma] \in \mathrm{Av}_n([1234], [1423])$ and write
\[
\sigma = 1\rho n\tau \tag{3.4}
\]
where $\rho$ and $\tau$ are the subsequences between 1 and $n$, and between $n$ and 1, respectively. Now $\rho$ and $\tau$ must be decreasing since $[\sigma]$ avoids [1234] and [1423], respectively. Furthermore, $\rho$ must consist of consecutive integers since, if not, then we have $x < y < z$ such that $1zxny$ is a subsequence of $\sigma$. So $[xnyz]$ is a copy of [1423] in $[\sigma]$, which is a contradiction. Conversely, it is easy to check that if $\sigma$ has the form (3.4) with $\rho$ and $\tau$ decreasing and $\rho$ consecutive then $[\sigma] \in \mathrm{Av}_n([1234], [1423])$. So we have characterized the elements of this class.
To finish the enumeration, if $\rho = \emptyset$ there is one corresponding $\sigma$. But if $\rho \ne \emptyset$ then choosing the smallest and largest element of $\rho$ from the elements $2, 3, \ldots, n - 1$ completely determines $\sigma$. Since these two elements could be equal, we are choosing 2 elements from $n - 2$ elements with repetition, which is counted by $\binom{n-1}{2}$.

The following result follows immediately from Theorem 3.1.1.

Theorem 3.2.5. We have
\[
\#\mathrm{Av}_n([1234], [1432]) = 0
\]
for $n \ge 6$.

We now have, by comparison with Theorem 3.2.4, another nontrivial Wilf equivalence.

Theorem 3.2.6. We have
\[
\{[1243], [1324]\} \equiv \{[1243], [1423]\} \equiv \{[1324], [1342]\} \equiv \{[1342], [1423]\}.
\]
And for $n \ge 1$
\[
\#\mathrm{Av}_n([1324], [1342]) = 1 + \binom{n-1}{2}.
\]

Proof. Take $[\sigma] \in \mathrm{Av}_n([1324], [1342])$ and write $\sigma$ as in (3.4). Then $\rho$ is increasing since $[\sigma]$ avoids [1324]. And every element of $\rho$ is smaller than every element of $\tau$ since $[\sigma]$ avoids [1342]. To avoid a copy of one of the forbidden patterns containing the 1 of $\sigma$ we must have that $\tau$ avoids 213 and 231. And to avoid a copy of [1324] where $n$ plays the role of 4, it must be that $\tau$ avoids 132. The $\tau$ which avoid these three patterns are exactly those which are inflations of the form $\tau = 21\langle\delta_k, \iota_l\rangle$ for some $k, l \ge 0$ (see the chart on page 2773 of [DDJ+ 12]). Absorbing the 1 and $n$ of $\sigma$ into $\rho$ and $\tau$, respectively, we see that
\[
\sigma = 132\langle\iota_j, \delta_k, \iota_l\rangle \tag{3.5}
\]
where $j, k \ge 1$ and $l \ge 0$. Again, it is not hard to check that for every $\sigma$ of this form we have $[\sigma] \in \mathrm{Av}_n([1324], [1342])$.

To enumerate these $\sigma$, we distinguish two cases. If $l \ge 2$ then picking the smallest and largest elements of the copy of $\iota_l$ from $2, 3, \ldots, n - 1$ completely determines $\sigma$. So in this case there are $\binom{n-2}{2}$ choices. If $l \le 1$ then the copy of $\iota_l$ can be appended to the copy of $\delta_k$ so that $\sigma = 12\langle\iota_j, \delta_{n-j}\rangle$. Since we must have 1 and $n$ in the ascending and decreasing subsequences, respectively, there are now $n - 1$ choices. Adding the two counts gives the desired result, since $\binom{n-2}{2} + (n - 1) = 1 + \binom{n-1}{2}$.

Theorem 3.2.7. For $n \ge 4$ we have
\[
\#\mathrm{Av}_n([1243], [1342]) = 4.
\]

Proof.
Take $[\sigma] \in \mathrm{Av}_n([1243], [1342])$ and write $\sigma$ as in (3.4). Then $\rho$ and $\tau$ can not both be nonempty. For if $x \in \rho$ and $y \in \tau$ then $1xny$ is a copy of either 1243 or 1342.

Assume first that $\rho = \emptyset$ so that
\[
\sigma = 1n\tau. \tag{3.6}
\]
Then $\tau$ must be increasing or decreasing. For suppose it was neither. Then it would contain a copy of one of the patterns 132, 231, 213, or 312. In the first two cases this would give, together with the 1, a copy of 1243 or 1342 in $\sigma$. And in the last two cases, prepending $n$ gives a copy of 4213 or 4312. Conversely, if $\sigma$ is given by (3.6) with $\tau$ increasing or decreasing then it is easy to verify that $[\sigma] \in \mathrm{Av}_n([1243], [1342])$.

Using the same ideas, one can also show that if $\tau = \emptyset$ then one gets exactly two elements of $\mathrm{Av}_n([1243], [1342])$, of the form $\sigma = 1\rho n$ where $\rho$ is either increasing or decreasing. Thus there are a total of four elements in the avoidance class.

Theorem 3.2.8. For $n \ge 3$ we have
\[
\#\mathrm{Av}_n([1324], [1423]) = 2^{n-2}.
\]

Proof. Take $[\sigma] \in \mathrm{Av}_n([1324], [1423])$ and write
\[
\sigma = n, \rho, n - 1, \tau.
\]
Similar to the previous proof, one of $\rho$ or $\tau$ must be empty since otherwise 4132 or 4231 is a pattern in $\sigma$. If $\rho = \emptyset$ then one shows similarly that $n - 2$ either begins or ends $\tau$. Continuing in this manner, we see that there are 2 choices for the position of each of $n - 1, n - 2, \ldots, 2$. Checking, as usual, that all such permutations are actually in the avoidance set, the enumeration follows.

This fully characterizes all nontrivial Wilf equivalences for all length four doubletons.

3.3 Three or more patterns

We will now compute $\#\mathrm{Av}_n[\Pi]$ for $[\Pi] \subseteq [\mathfrak{S}_4]$ having $\#[\Pi] \ge 3$. We will not consider those $[\Pi]$ containing both [1234] and [1432] since for such classes $\#\mathrm{Av}_n[\Pi] = 0$ for $n \ge 6$ as in Theorem 3.2.5.

Theorem 3.3.1. We have
\[
\{[1234], [1243], [1324]\} \equiv \{[1234], [1324], [1342]\} \equiv \{[1243], [1423], [1432]\} \equiv \{[1342], [1423], [1432]\}.
\]
And for $n \ge 4$
\[
\#\mathrm{Av}_n([1234], [1324], [1342]) = 3.
\]

Proof. If $[\sigma] \in \mathrm{Av}_n([1234], [1324], [1342])$ then $[\sigma]$ avoids [1324] and [1342].
So, by the proof of Theorem 3.2.6, we can write $\sigma$ in the form (3.5) for $j, k, l \ge 1$. But since $[\sigma]$ also avoids [1234] we must have $j + l \le 3$. For the same reason, $j \le 2$ since if $j = 3$ then the copy of $\iota_3$ and one element of the copy of $\delta_k$ would form a [1234]. Thus the only possibilities are $(j, l) = (1, 1), (1, 2)$, or $(2, 1)$, which proves the result.

Theorem 3.3.2. We have
\[
\{[1234], [1243], [1342]\} \equiv \{[1243], [1342], [1432]\}.
\]
And for $n \ge 5$
\[
\#\mathrm{Av}_n([1234], [1243], [1342]) = 2.
\]

Proof. If $[\sigma] \in \mathrm{Av}_n([1234], [1243], [1342])$ then $[\sigma]$ avoids [1243] and [1342]. So, by the proof of Theorem 3.2.7, we can write
\[
\sigma = xy\rho \tag{3.7}
\]
where $\{x, y\} = \{1, n\}$ and $\rho$ is either increasing or decreasing. Since $n \ge 5$ we have $|\rho| \ge 3$. But $[\sigma]$ also avoids [1234] and this forces $\rho$ to be decreasing. So there are two choices for $[\sigma]$ depending on the values of $x$ and $y$.

Theorem 3.3.3. We have
\[
\{[1234], [1243], [1423]\} \equiv \{[1234], [1342], [1423]\} \equiv \{[1243], [1324], [1432]\} \equiv \{[1324], [1342], [1432]\}.
\]
And for $n \ge 2$
\[
\#\mathrm{Av}_n([1234], [1342], [1423]) = n - 1.
\]

Proof. We will show that $T = T([1234], [1342], [1423])$ has production rules
\begin{align*}
(*) &\to (1)(2),\\
(1) &\to (1),\\
(2) &\to (1)(2).
\end{align*}
Then, by induction, level $n \ge 2$ of $T$ will contain one node of degree 2 and $n - 2$ nodes of degree 1.

Checking the root is easy, so assume $n \ge 3$. By Theorem 3.2.2, $T$ is a subtree of $T([1234], [1342])$. So we just need to check which nodes of that tree also avoid [1423]. As in the proof of that theorem, the site before $n$ in $[\sigma]$ at level $n$ in $T$ is still active since 4 is not followed immediately by 3 in [1423]. Thus it suffices to show that both sites of $\delta$ remain active, but only one in $\epsilon$, where $\delta, \epsilon$ are defined by (3.1). Indeed, the two sites of $\delta$ give rise to copies of $\delta$ and $\epsilon$ at level $n + 1$ of $T$. But site $n - 1$ of $\epsilon$, which was active in the larger tree, is now inactive since inserting $n + 1$ there gives the copy $[1, n + 1, 2, n]$ of [1423]. This completes the proof.
We now have, in comparison with the previous theorem, a nontrivial Wilf equivalence.

Theorem 3.3.4. We have
\[
\{[1234], [1324], [1423]\} \equiv \{[1324], [1423], [1432]\}.
\]
And for $n \ge 2$
\[
\#\mathrm{Av}_n([1234], [1324], [1423]) = n - 1.
\]

Proof. It suffices to show that $T = T([1234], [1324], [1423])$ satisfies the same production rules as in the previous theorem. Now $T$ is a subtree of $T([1234], [1324])$ which was constructed in the proof of Theorem 3.2.3. And we see in the usual way that the site before $n$ in any $[\sigma]$ remains active in $T$ because 4 is not immediately followed by 3 in [1423].

So it suffices to show, with $\delta$ and $\epsilon$ as in (3.3), that site 1 remains active in $\delta$, but not in $\epsilon$. Indeed, inserting $n + 1$ in this site of $\delta$ just produces another descending sequence. But in $\epsilon$ such a placement gives the copy $[1, n + 1, n - 1, n]$ of [1423].

We now have another nontrivial Wilf equivalence with Theorem 3.3.1.

Theorem 3.3.5. We have
\[
\{[1243], [1324], [1342]\} \equiv \{[1243], [1342], [1423]\}.
\]
And for $n \ge 4$
\[
\#\mathrm{Av}_n([1243], [1324], [1342]) = 3.
\]

Proof. By Theorem 3.2.7, we just need to show that exactly 3 of the 4 permutations $[\sigma]$ avoiding $\{[1243], [1342]\}$ also avoid [1324]. These permutations are described in equation (3.7). If $x = n$, $y = 1$, and $\rho$ is decreasing then $[\sigma]$ contains the copy $[n132]$ of this pattern. It is also easy to check that the other three avoid it.

We now have our last nontrivial Wilf equivalence for triples.

Theorem 3.3.6. We have
\[
\{[1243], [1324], [1423]\} \equiv \{[1324], [1342], [1423]\}.
\]
And for $n \ge 2$
\[
\#\mathrm{Av}_n([1324], [1342], [1423]) = n - 1.
\]

Proof. Comparing the description of $\mathrm{Av}_n([1324], [1342])$ in the proof of Theorem 3.2.6 and that of $\mathrm{Av}_n([1324], [1423])$ in the proof of Theorem 3.2.8, we see that any $[\sigma] \in \mathrm{Av}_n([1324], [1342], [1423])$ can be put in the form $\sigma = 21\langle\delta_k, \iota_{n-k}\rangle$ with $k \ge 1$. Also, $k = n - 1$ and $k = n$ yield the same permutation. So there are $n - 1$ choices for $k$ and we are done.
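The doubleton and triple enumerations of Sections 3.2 and 3.3 lend themselves to a brute-force cross-check for small $n$. The Python sketch below tabulates the formulas stated in Theorems 3.2.2 through 3.2.8 and 3.3.1 through 3.3.6 next to an exhaustive count; the table `STATED` and all function names are illustrative constructions of this sketch, and the lower bounds on $n$ are the ones given in the theorem statements.

```python
from itertools import combinations, permutations
from math import comb


def contains(sigma, pi):
    """Linear containment: sigma has a subsequence order isomorphic to pi."""
    k = len(pi)
    pairs = list(combinations(range(k), 2))
    for idx in combinations(range(len(sigma)), k):
        sub = [sigma[i] for i in idx]
        if all((sub[a] < sub[b]) == (pi[a] < pi[b]) for a, b in pairs):
            return True
    return False


def cyc_contains(sigma, pi):
    """[sigma] contains [pi]: some rotation of sigma contains pi linearly."""
    return any(contains(sigma[i:] + sigma[:i], pi) for i in range(len(sigma)))


def count_cyclic_avoiders(n, patterns):
    """#Av_n[Pi], using the representative of each cyclic permutation starting with 1."""
    reps = ((1,) + rest for rest in permutations(range(2, n + 1)))
    return sum(1 for s in reps if not any(cyc_contains(s, p) for p in patterns))


def pat(word):
    """Turn a string such as '1342' into the tuple (1, 3, 4, 2)."""
    return tuple(int(c) for c in word)


# (patterns, stated formula, smallest n for which the formula is stated)
STATED = [
    (("1234", "1342"), lambda n: 2 * (n - 2), 3),          # Theorem 3.2.2
    (("1234", "1324"), lambda n: 2 * (n - 2), 3),          # Theorem 3.2.3
    (("1234", "1423"), lambda n: 1 + comb(n - 1, 2), 1),   # Theorem 3.2.4
    (("1234", "1432"), lambda n: 0, 6),                    # Theorem 3.2.5
    (("1324", "1342"), lambda n: 1 + comb(n - 1, 2), 1),   # Theorem 3.2.6
    (("1243", "1342"), lambda n: 4, 4),                    # Theorem 3.2.7
    (("1324", "1423"), lambda n: 2 ** (n - 2), 3),         # Theorem 3.2.8
    (("1234", "1324", "1342"), lambda n: 3, 4),            # Theorem 3.3.1
    (("1234", "1243", "1342"), lambda n: 2, 5),            # Theorem 3.3.2
    (("1234", "1342", "1423"), lambda n: n - 1, 2),        # Theorem 3.3.3
    (("1234", "1324", "1423"), lambda n: n - 1, 2),        # Theorem 3.3.4
    (("1243", "1324", "1342"), lambda n: 3, 4),            # Theorem 3.3.5
    (("1324", "1342", "1423"), lambda n: n - 1, 2),        # Theorem 3.3.6
]


def mismatches(max_n):
    """Triples (patterns, n, brute-force count) that disagree with STATED."""
    bad = []
    for words, formula, lo in STATED:
        for n in range(lo, max_n + 1):
            got = count_cyclic_avoiders(n, [pat(w) for w in words])
            if got != formula(n):
                bad.append((words, n, got))
    return bad
```

Calling `mismatches(6)` runs quickly and returns the list of disagreements between the table and the exhaustive counts; larger values of `max_n` grow factorially in cost.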
Table 3.1: Wilf equivalence classes and cardinalities of $\mathrm{Av}_n[\Pi]$ for certain $[\Pi]$ and $n \ge 5$

    [Π]                                          # Av_n[Π]
    =======================================================
    {[1234], [1243], [1324], [1342]}             1
    {[1243], [1342], [1423], [1432]}
    =======================================================
    {[1234], [1243], [1324], [1423]}             2
    {[1234], [1243], [1342], [1423]}
    {[1234], [1324], [1342], [1423]}
    {[1243], [1324], [1342], [1423]}
    {[1243], [1324], [1342], [1432]}
    {[1243], [1324], [1423], [1432]}
    {[1324], [1342], [1423], [1432]}
    =======================================================
    {[1234], [1243], [1324], [1342], [1423]}     1
    {[1243], [1324], [1342], [1423], [1432]}

When $\#[\Pi] \ge 4$ where $[\Pi] \subset [\mathfrak{S}_4]$, the size of $\mathrm{Av}_n[\Pi]$ becomes constant for $n \ge 5$. And this size is trivial to calculate for $n \le 4$. Furthermore, the description of the surviving permutations for large $n$ is easy to obtain given our previous proofs. So we content ourselves with a listing of the equivalence classes and associated constants in Table 3.1. Classes are separated by a double horizontal line. As usual, we do not consider classes containing both the increasing and decreasing permutations because of the cyclic Erdős-Szekeres Theorem.

3.4 Cyclic descent generating functions

We will now consider the generating function for the number of cyclic descents over various avoidance classes $[\Pi] \subset [\mathfrak{S}_4]$, starting with those defined by a single element. We will sometimes use the characterizations given by Callan [Cal02] for these classes to facilitate our work, and use the abbreviation
\[
D_n([\Pi]) = D_n([\Pi]; q) = \sum_{[\sigma] \in \mathrm{Av}_n[\Pi]} q^{\mathrm{cdes}[\sigma]}
\]
for the generating function.

To begin, we have a lemma showing that trivial Wilf equivalences also give simple relationships between the corresponding generating functions.

Lemma 3.4.1. For any $[\Pi]$, we have
\[
D_n([\Pi]^r; q) = D_n([\Pi]^c; q) = q^n D_n([\Pi]; 1/q)
\]
and
\[
D_n([\Pi]^{rc}; q) = D_n([\Pi]; q).
\]

Proof. Reversing or complementing a permutation turns all cyclic descents into cyclic ascents and vice-versa. Translating this into generating functions gives the first displayed equalities. And the second displayed equation follows from the previous display.
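Lemma 3.4.1 can be illustrated numerically by storing $D_n([\Pi]; q)$ as a table of coefficients. In the Python sketch below, which uses hypothetical helper names and is not taken from the source, `D(n, patterns)[k]` is the number of $[\sigma] \in \mathrm{Av}_n[\Pi]$ with $\mathrm{cdes}[\sigma] = k$, so the identity $D_n([\Pi]^r; q) = q^n D_n([\Pi]; 1/q)$ becomes a statement about reversing the coefficient list.

```python
from collections import Counter
from itertools import combinations, permutations


def contains(sigma, pi):
    """Linear containment: sigma has a subsequence order isomorphic to pi."""
    k = len(pi)
    pairs = list(combinations(range(k), 2))
    for idx in combinations(range(len(sigma)), k):
        sub = [sigma[i] for i in idx]
        if all((sub[a] < sub[b]) == (pi[a] < pi[b]) for a, b in pairs):
            return True
    return False


def cyc_contains(sigma, pi):
    """[sigma] contains [pi]: some rotation of sigma contains pi linearly."""
    return any(contains(sigma[i:] + sigma[:i], pi) for i in range(len(sigma)))


def cdes(p):
    """Cyclic descent number, with indices taken modulo n."""
    n = len(p)
    return sum(1 for i in range(n) if p[i] > p[(i + 1) % n])


def D(n, patterns):
    """Coefficients of D_n([Pi]; q): maps k to #{[sigma] avoiding Pi, cdes = k}."""
    reps = ((1,) + rest for rest in permutations(range(2, n + 1)))
    return Counter(cdes(s) for s in reps
                   if not any(cyc_contains(s, p) for p in patterns))


def reverse(p):
    return tuple(reversed(p))


def complement(p):
    return tuple(len(p) + 1 - v for v in p)
```

For example, `D(4, [(1, 3, 2, 4)])` gives the coefficients of $q + 3q^2 + q^3$, and comparing `D(n, [reverse(p)])[k]` with `D(n, [p])[n - k]` checks the first identity of the lemma coefficient by coefficient.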
Now consider the possible $D_n([\pi])$ for $[\pi] \in [\mathfrak{S}_4]$. We begin with the simplest case.

Theorem 3.4.2. We have $D_n([1423]; q) = q^n D_n([1324]; 1/q)$ where, for $n \ge 2$,
\[
D_n([1324]; q) = \sum_{k=1}^{n-1} \binom{n+k-3}{n-k-1} q^k.
\]

Proof. We use Callan's characterization of this avoidance class to obtain a recursion for $D_n([1324])$. If $[\sigma] \in \mathrm{Av}_n([1324])$ and $n \ge 3$ then write $\sigma = \sigma_1 \sigma_2 \ldots \sigma_{n-1} n$. Let $k$ be the index such that $\sigma_k = n - 1$. There are two cases. If $k = n - 1$ then $\sigma = \tau, n - 1, n$ where $[\tau, n - 1] \in \mathrm{Av}_{n-1}([1324])$ and this is a bijection. Since $\mathrm{cdes}[\sigma] = \mathrm{cdes}[\tau, n - 1]$, this case contributes $D_{n-1}([1324])$ to the recursion. If $1 \le k \le n - 2$ then this forces $\sigma = 2314\langle\iota_{k-1}, 1, \tau, 1\rangle$ for some $\tau$ such that $[\tau n]$ avoids [1324]. Because of the extra descent caused by $n - 1$ we have $\mathrm{cdes}[\sigma] = 1 + \mathrm{cdes}[\tau n]$. So this case gives a contribution of $\sum_{k=1}^{n-2} q D_{n-k}([1324])$. Putting everything together, we have
\[
D_n([1324]) = D_{n-1}([1324]) + \sum_{k=1}^{n-2} q D_{n-k}([1324])
\]
for $n \ge 3$ and $D_2([1324]) = q$. It is now a simple matter of manipulating binomial coefficients to show that the formula given in the theorem satisfies this initial value problem.

For the next case, we will use a characterization of the class different from the one found by Callan. This will permit us to avoid the use of a recurrence.

Lemma 3.4.3. Suppose $[\sigma] \in [\mathfrak{S}_n]$ and write $\sigma = 1\rho n\tau$. We have $[\sigma] \in \mathrm{Av}_n([1342])$ if and only if the following three conditions are satisfied:
(a) $\rho$ and $\tau$ both avoid $\{213, 231\}$,
(b) $\max\rho < \min\tau$,
(c) there is not both a descent in $\rho$ and an ascent in $\tau$.

Proof. For the forward direction, suppose $[\sigma] \in \mathrm{Av}_n([1342])$. Condition (a) is true since if either $\rho$ or $\tau$ contains 213 then, together with $n$, we have that $[\sigma]$ contains [2134]. Similarly, if either contains 231 then $[\sigma]$ contains the forbidden pattern by prepending the 1. As far as (b), if there is $y > x$ with $y \in \rho$ and $x \in \tau$ then $[1ynx]$ is a copy of [1342].
Finally for (c), if there were a descent in $\rho$ and an ascent in $\tau$ then, because of (b), putting them together would again give a copy of the pattern to avoid.

The converse is similar where one assumes that a copy of [1342] exists and then considers all the different intersections it could have with $1$, $\rho$, $n$, and $\tau$. We leave the details to the reader.

In order to use this lemma, we will need a result about the ordinary descent statistic on linear permutations avoiding $\{213, 231\}$. The next result is a specialization of Proposition 5.2 of the paper of Dokos, Dwyer, Johnson, Sagan, and Selsor [DDJ+ 12] and so the proof is omitted.

Lemma 3.4.4 ([DDJ+ 12]). We have
\[
\sum_{\sigma \in \mathrm{Av}_n(213, 231)} q^{\mathrm{des}\,\sigma} = (1 + q)^{n-1}.
\]

We need one last well-known definition. Call a polynomial $f(q) = \sum_{k=0}^{n} a_k q^k$ of degree $n$ symmetric if $a_k = a_{n-k}$ for all $0 \le k \le n$. Note that $f(q)$ of degree $n$ is symmetric if and only if
\[
q^n f(1/q) = f(q). \tag{3.8}
\]

Theorem 3.4.5. We have $D_n([1243]; q) = D_n([1342]; q)$ where, for $n \ge 2$,
\[
D_n([1342]; q) = 2q(1 + q)^{n-2} - q \cdot \frac{1 - q^{n-1}}{1 - q}
\]
is symmetric.

Proof. It is easy to prove from the explicit form of $D_n([1342])$ that it satisfies equation (3.8) and so is symmetric. So once the formula is proved, the equality of the two generating functions follows from Lemma 3.4.1.

We adopt the notation of Lemma 3.4.3 and let $\sigma_k = n$ where $2 \le k \le n$. We will consider cases depending on whether $\rho$ or $\tau$ is empty. If $\rho = \emptyset$ then by Lemma 3.4.3 (a) and Lemma 3.4.4 we have that the generating function for the possible linear $\tau$ is $(1 + q)^{n-3}$. Also, $\mathrm{cdes}[\sigma] = 2 + \mathrm{des}\,\tau$ by the form of $\sigma$, so the contribution of such $[\sigma]$ to $D_n([1342])$ is $q^2(1 + q)^{n-3}$. In an analogous way, we see that those $[\sigma]$ with $\tau = \emptyset$ yield $q(1 + q)^{n-3}$. Adding these, we have a total of $q(1 + q)^{n-2}$ so far.

We now assume that $\rho, \tau$ are both nonempty so that $3 \le k \le n - 1$. By parts (b) and (c) of Lemma 3.4.3, either $\rho$ must be an increasing subsequence of consecutive integers or $\tau$ must be a decreasing one.
Using Lemma 3.4.4 again, we see that in the first subcase a contribution of $q^2(1+q)^{n-k-1}$ is obtained. And in the second, taking into account the descents in $\rho$, the contribution is $q^{n-k+1}(1+q)^{k-3}$. However, these two subcases overlap when $\rho$ is increasing and $\tau$ is decreasing. So we must subtract $q^{n-k+1}$. Thus we get a grand total of
\[ D_n([1342]) = q(1+q)^{n-2} + \sum_{k=3}^{n-1} \left[ q^2(1+q)^{n-k-1} + q^{n-k+1}(1+q)^{k-3} - q^{n-k+1} \right]. \]
Summing the geometric series and simplifying completes the proof.

For the avoidance class of the increasing (or decreasing) pattern in $[\mathfrak{S}_4]$, we will need another concept. Given sequences $\rho$ and $\tau$ of distinct integers, their shuffle set is
\[ \rho \shuffle \tau = \{\sigma : |\sigma| = |\rho| + |\tau| \text{ and both } \rho, \tau \text{ are subsequences of } \sigma\}. \]
For example,
\[ 12 \shuffle 34 = \{1234, 1324, 1342, 3124, 3142, 3412\}. \]
In the statement of the next result we make the usual convention that $\binom{n}{k} = 0$ if $k > n$.

Theorem 3.4.6. We have $D_n([1234]; q) = q^n D_n([1432]; 1/q)$ where, for $n \ge 2$,
\[ D_n([1432]; q) = q + (2^{n-1} - n) q^2 + \sum_{j \ge 3} \binom{n}{2j-1} q^j. \]

Proof. We use Callan’s description of the avoidance class for $[1234]$ translated by complementation to apply to $[1432]$. We are going to derive a recursion for $D_n([1432]; q)$. If $[\sigma] \in \operatorname{Av}_n([1432])$ then suppose $\sigma_n = 1$ and $\sigma_k = 2$ for some $1 \le k \le n-1$. There are three cases.

If $k = 1$ then there is a bijection between such $[\sigma]$ and $\operatorname{Av}_{n-1}([1432])$ obtained by removing 1 and taking the order isomorphic cyclic permutation on $[n-1]$. Since 2 immediately follows 1 cyclically in $[\sigma]$, the descent into 1 remains a descent after applying the map. So the contribution of this case is $D_{n-1}([1432]; q)$.

Now suppose that $2 \le k \le n-1$ and write $\sigma = \rho 2 \tau 1$ where $|\rho| = k-1$, $|\tau| = n-k-1$. As Callan proves, $\rho$ must be increasing. So there are two more cases depending upon whether the elements of $\rho$ are consecutive or not. Suppose first that they are not consecutive. In this case, $\tau$ must also be increasing so $\operatorname{cdes}[\sigma] = 2$.
To compute the number of such $\sigma$, note that once the elements of $\rho$ have been picked from $[3, n]$, all of $\sigma$ is determined. The total number of nonempty subsets of this interval is $2^{n-2} - 1$. And those which consist of consecutive integers are determined by their minimum and maximum element, which could be equal. So there are $\binom{n-1}{2}$ subsets to exclude. The contribution of this case is then
\[ \left( 2^{n-2} - \binom{n-1}{2} - 1 \right) q^2. \]

Finally we consider the case when $\rho \neq \emptyset$ is consecutive (and still increasing), say with minimum $m+1$ and maximum $M-1$. Note that if $l = |\tau|$ then $0 \le l \le n-3$. Callan shows that the possible $\tau$ are the elements of $(34 \ldots m) \shuffle (M, M+1, \ldots, n)$. Since a permutation can be written as a shuffle in many ways, the same shuffle could occur for different $\rho$. So it will be convenient to color the elements of the second sequence by marking them with a hat. Thus the $\sigma$ in this case are in bijection with colored shuffles $(34 \ldots m) \shuffle (\hat{M}, \widehat{M+1}, \ldots, \hat{n})$. It will also be convenient to consider these as corresponding to the sequences $2\tau$ by prepending a 2 to each shuffle and considering 2 as an uncolored element. Let $S$ be the set of such sequences $s = 2 s_2 s_3 \ldots s_{l+1}$ where $l, m, M$ are allowed to vary over all possible values. Note that if $s$ corresponds to $\sigma$ then $\operatorname{des} \sigma = 2 + \operatorname{des} s$. To compute $\operatorname{des} s$, we consider the transition indices
\[ \operatorname{Tr} s = \{i \mid s_i \text{ is colored and } s_{i+1} \text{ is not, or vice-versa}\}. \]
For example, if $s = 2\,3\,\hat{6}\,4\,5\,\hat{7}\,\hat{8}$ then $\operatorname{Tr} s = \{2, 3, 5\}$. It is easy to see that, for fixed $l$, the map $\operatorname{Tr} : S \to 2^{[l]}$, the range being all subsets of $[l]$, is a bijection. Also, every other transition index of $s$ starting with the second corresponds to a descent. So, using the round down function, $\operatorname{des} s = \lfloor \# \operatorname{Tr} s / 2 \rfloor$. We can now complete this case using $i = \# \operatorname{Tr} s$ to see that the contribution is
\begin{align*}
\sum_{l=0}^{n-3} \sum_{i=0}^{l} \binom{l}{i} q^{\lfloor i/2 \rfloor + 2}
&= \sum_{i=0}^{n-3} q^{\lfloor i/2 \rfloor + 2} \sum_{l=i}^{n-3} \binom{l}{i} \\
&= \sum_{i=0}^{n-3} \binom{n-2}{i+1} q^{\lfloor i/2 \rfloor + 2} \\
&= q^2 \sum_{j \ge 0} \left[ \binom{n-2}{2j+1} + \binom{n-2}{2j+2} \right] q^j \\
&= q^2 \sum_{j \ge 0} \binom{n-1}{2j+2} q^j.
\end{align*}

Putting all the cases together we have
\[ D_n([1432]; q) = D_{n-1}([1432]; q) + q^2 \left[ 2^{n-2} - \binom{n-1}{2} - 1 + \sum_{j \ge 0} \binom{n-1}{2j+2} q^j \right]. \]
As usual, the routine verification that our desired formula satisfies this recursion and the initial condition is left to the reader.

We now turn to the cyclic descent polynomials for pairs in $[\mathfrak{S}_4]$. To simplify notation, for any polynomial $f(q)$ and $n \in \mathbb{N}$ we let
\[ f^{(n)}(q) = q^n f(1/q). \]

Theorem 3.4.7. We have the following descent polynomials.
(a) We have
\[ D_n^{(n)}([1234],[1243]) = D_n([1342],[1432]) = D_n([1243],[1432]) = D_n^{(n)}([1234],[1342]). \]
And for $n \ge 3$
\[ D_n([1234],[1342]; q) = (2n-5) q^{n-2} + q^{n-1}. \]
(b) We have
\[ D_n^{(n)}([1423],[1432]) = D_n([1234],[1324]). \]
And for $n \ge 3$
\[ D_n([1234],[1324]; q) = (2n-5) q^{n-2} + q^{n-1}. \]
(c) We have
\[ D_n^{(n)}([1324],[1432]) = D_n([1234],[1423]). \]
And for $n \ge 1$
\[ D_n([1234],[1423]; q) = q^{n-1} + \binom{n-1}{2} q^{n-2}. \]
(d) We have
\[ D_n([1243],[1423]) = D_n([1342],[1423]) = D_n^{(n)}([1243],[1324]) = D_n^{(n)}([1324],[1342]). \]
And for $n \ge 1$
\[ D_n([1324],[1342]; q) = q + \sum_{k=2}^{n-1} (n-k) q^k. \]
(e) For $n \ge 4$ we have
\[ D_n([1243],[1342]; q) = q + q^2 + q^{n-1} + q^{n-2}. \]
(f) For $n \ge 3$ we have
\[ D_n([1324],[1423]; q) = q(1+q)^{n-2}. \]

Proof. We will only prove (a) as the others follow easily in a similar fashion from the descriptions of the avoidance classes in Section 3.2. We adopt the notation of the proof of Theorem 3.2.2. We will use the description of the generating tree to obtain a recursion for $D_{n+1}([1243],[1432])$. Note that if $n+1$ is inserted in site $i$ of $\sigma$ to form $\sigma'$ then
\[ \operatorname{cdes}[\sigma'] = \begin{cases} \operatorname{cdes}[\sigma] & \text{if $i$ is a cyclic descent,} \\ \operatorname{cdes}[\sigma] + 1 & \text{if $i$ is a cyclic ascent.} \end{cases} \]
Since the site before $n$ is always active, and such a site is a cyclic ascent, these children will give a contribution of $q D_n([1243],[1432])$. In $\delta$ and $\epsilon$, insertion in the other active site gives permutations with $n-1$ descents.
So
\[ D_{n+1}([1243],[1432]) = 2q^{n-1} + q D_n([1243],[1432]). \]
It is now easy to check that the formula in (a) satisfies this recursion and is also valid at $n = 3$, completing the proof.

For classes avoiding 3 or more patterns, we will only write down the results for those which are not eventually constant. The interested reader can easily compute the polynomials for the remaining classes. We also content ourselves with stating the polynomial for one member of every trivial Wilf equivalence class since the rest can be computed from Lemma 3.4.1.

Theorem 3.4.8. We have the descent polynomials
\[ D_n([1234],[1342],[1423]; q) = D_n([1234],[1324],[1423]; q) = (n-2) q^{n-2} + q^{n-1} \]
and
\[ D_n([1324],[1342],[1423]; q) = q \cdot \frac{1-q^{n-1}}{1-q} \]
for $n \ge 2$.

3.5 Open problems and concluding remarks

We collect here various areas for future research in the hopes that the reader will be interested in pursuing this work.

3.5.1 Longer patterns

There has been very little work about containment and avoidance for cyclic patterns of length longer than 4. Of course, the cyclic Erdős-Szekeres Theorem, Theorem 3.1.1 above, is one such result. There is also a paper of Gray, Lanning and Wang [GLW18] where the authors consider cyclic packing (maximizing the number of copies of a given pattern among all the permutations $[\sigma] \in [\mathfrak{S}_n]$ for some $n$) and superpatterns (permutations containing all the patterns $[\pi] \in [\mathfrak{S}_k]$ for some $k$). It would be interesting to see if there are nice enumerative formulas for classes consisting of cyclic patterns of length 5 and up.

3.5.2 Other statistics

We have previously mentioned the peak set of a linear permutation,
\[ \operatorname{Pk} \pi = \{i \mid \pi_{i-1} < \pi_i > \pi_{i+1}\}, \]
which has corresponding peak number $\operatorname{pk} \pi = \# \operatorname{Pk} \pi$. Peaks are an important part of Stembridge’s theory of enriched $P$-partitions [Ste97] where $P$ is a partially ordered set.
On the enumerative side, the study of permutations which have a given peak set has been a subject of current interest [BBPS15, BBS13, BFT16, CVDLO+17, DLHIO17a, DLHIO17b, DLHIPL17]. Now define the cyclic peak number to be
\[ \operatorname{cpk}[\pi] = \#\{i \mid \pi_{i-1} < \pi_i > \pi_{i+1} \text{ where subscripts are taken modulo } n\}. \]
As with cdes, this is well defined since it is independent of the choice of representative of $[\pi]$. There should be interesting generating functions for the distribution of cpk over avoidance classes, or even for the joint distribution of cdes and cpk. As evidence, we prove one such result.

Theorem 3.5.1. For $n \ge 3$
\[ \sum_{[\sigma] \in \operatorname{Av}_n([1234],[1342])} q^{\operatorname{cdes}[\sigma]} t^{\operatorname{cpk}[\sigma]} = q^{n-2} t + (2n-6) q^{n-2} t^2 + q^{n-1} t. \]

Proof. Let $F_n(q, t)$ denote the desired generating function. We proceed as in the proof of Theorem 3.4.7 (a) to find a recursion for $F_{n+1}(q, t)$. Since the largest element of $[\sigma]$ is always a cyclic peak, inserting $n+1$ before $n$ does not change cpk. So this contributes $q F_n(q, t)$ to the recursion. For $\delta$ and $\epsilon$, inserting $n+1$ in the other active site increases the number of peaks to 2. So the contribution from these cases is $2 q^{n-1} t^2$. In summary
\[ F_{n+1}(q, t) = 2 q^{n-1} t^2 + q F_n(q, t) \]
and the desired polynomial is easily seen to be the solution.

In a recent paper Adin, Gessel, Reiner, and Roichman [AGRR20] defined a cyclic analogue of the Hopf algebra of quasisymmetric functions. In this context the cyclic descent set of a linear permutation arises naturally in the description of the product in this algebra. They also raise the following intriguing question.

Question 3.5.2. Find an analogue of the major index for cyclic permutations that has nice properties, such as a generating function over $[\mathfrak{S}_n]$ which factors nicely as does the generating function for the ordinary major index over $\mathfrak{S}_n$.

3.5.3 Vincular patterns

The study of vincular patterns was originated by Babson and Steingrímsson [BS00] and has since become a mainstay of the pattern field.
We consider $\pi$ as a vincular pattern if one only counts occurrences in $\sigma$ where certain adjacent elements of $\pi$ must also be adjacent in the copy in $\sigma$. Such adjacent elements are overlined in $\pi$. For example, $\sigma = 24513$ contains two copies of $\pi = 132$, namely 243 and 253. But only 243 is a copy of $\overline{13}2$, since 2 and 4 are adjacent in $\sigma$ while 2 and 5 are not. Avoidance and Wilf equivalence are defined in the obvious way. These notions and the corresponding notation carry over to cyclic patterns without change. There are undoubtedly nice results which can be proven about vincular cyclic patterns. As an example, we show how one vincular class is enumerated by the Catalan numbers.

Theorem 3.5.3. We have
\[ [13\overline{24}] \equiv [1\overline{42}3] \equiv [\overline{13}24] \equiv [2\overline{31}4]. \]
And for $n \ge 1$
\[ \# \operatorname{Av}_n[13\overline{24}] = C_{n-1}. \]

Proof. The Wilf equivalences are trivial. To prove the Catalan formula, suppose that $[\sigma] \in \operatorname{Av}_n[13\overline{24}]$ for $n \ge 2$ and write $\sigma$ so that $\sigma_n = n$ and $\sigma_{n-1} = m$ for some $m \in [n-1]$. First notice that $\sigma = \rho \tau m n$ where $\rho$ and $\tau$ are permutations of $[m+1, n-1]$ and $[m-1]$, respectively. For if there are $x < m < y < n$ with $x$ before $y$ in $\sigma$ then $[xymn]$ is a copy of $[13\overline{24}]$. Furthermore, it is clear that $[m\rho]$ and $[\tau m]$ must avoid the forbidden pattern.

We claim that if $\sigma = \rho \tau m n$ where $\rho$ and $\tau$ obey the restrictions of the previous paragraph then $[\sigma]$ avoids $[13\overline{24}]$. Suppose, towards a contradiction, that a copy $[\kappa] = [wyxz]$ exists with $wyxz$ order isomorphic to 1324. Consider the elements $x$ and $z$ which play the roles of 2 and 4. The possibility that they are $m$ and $n$, respectively, is ruled out by the fact that every element of $\rho$ is larger than every element of $\tau$. If $z \in \tau m$ then all of $\kappa$ must be in this subsequence since $z$ is the largest element of the copy. But this is impossible since $[\tau m]$ avoids the bad pattern. Finally, suppose $z \in \rho$. This forces $x \in \rho$ since it comes cyclically just before $z$, and $n$ is too large to be $x$. We must also have $y \in \rho$ since $x < y < z$. But now there is no possible choice for $w$.
Indeed, if $w \in [m\rho]$ then $[\kappa]$ is in this subsequence, contradicting our assumption. And if $w \in \tau$ then it could be replaced by $m$ since $x, y, z > m$, yielding the same contradiction as before.

From the first two paragraphs we immediately get the recursion
\[ \# \operatorname{Av}_n[13\overline{24}] = \sum_{m=1}^{n-1} \# \operatorname{Av}_m[13\overline{24}] \cdot \# \operatorname{Av}_{n-m}[13\overline{24}]. \]
From this the Catalan enumeration follows by induction.

It appears that sometimes rather than trying to find the size of the avoidance class directly, it may be easier to use exponential generating functions. Given a set of (possibly vincular) patterns $[\Pi]$, let
\[ E[\Pi] = \sum_{n \ge 0} \# \operatorname{Av}_n[\Pi] \frac{x^n}{n!}. \]
We have the following conjectures for two vincular avoidance classes. Once the corresponding differential equation is proved, an explicit solution can easily be found using separation of variables.

Conjecture 3.5.4. We have the following.
1. If $E = E[123]$ then $E' = E^2 - E + 1$.
2. If $E = E[213]$ then $E' = e^{E - x^2/2}$.

Recently, Sergi Elizalde and Bruce Sagan have constructed proofs of both conjectured results through a more general result using generating functions which keep track of the number of cyclic occurrences, instead of just avoidance [ES21].

CHAPTER 4
PINNACLE SET PROPERTIES

This chapter contains material from Domagalski, Liang, Minnich, Sagan, Schmidt, and Sietsema [DLM+21c]. All results in this chapter are from this manuscript except as otherwise noted.

4.1 Counting admissible pinnacle sets

In this section we give our proof of Theorem 2.0.5, which gives the number of admissible pinnacle sets $S = \operatorname{Pin} \pi$ for some $\pi \in \mathfrak{S}_n$. Our strategy will be as follows. First, we will introduce the set of interleaved permutations which are obviously counted by the desired binomial coefficient. Next, we will associate with each admissible pinnacle set $S$ a particular permutation $\pi$ such that $\operatorname{Pin} \pi = S$. This permutation will be called right canonical because its pinnacles will be as far right as possible.
Finally, we will show that the set of interleaved permutations and the set of right canonical permutations are, in fact, the same. This will complete the proof of the theorem.

An interleaved permutation $\pi \in \mathfrak{S}_n$ is one constructed in the following manner. Pick any $A \subseteq [2, n]$ with $\#A = \lfloor (n-1)/2 \rfloor$.
I1 Fill the first $\lfloor (n-1)/2 \rfloor$ even positions of $\pi$ with the elements of $A$ in increasing order.
I2 Fill the remaining positions of $\pi$ with the elements of $\overline{A} = [n] - A$ in increasing order.

As an example, suppose $n = 9$ and $A = \{2, 3, 7, 9\}$. After step I1 we have
\[ \pi = \_\ 2\ \_\ 3\ \_\ 7\ \_\ 9\ \_\,. \]
Since $\overline{A} = \{1, 4, 5, 6, 8\}$, after I2 we have the full interleaved permutation
\[ \pi = 1\ 2\ 4\ 3\ 5\ 7\ 6\ 9\ 8. \tag{4.1} \]
Let
\[ \mathcal{I}_n = \{\pi \in \mathfrak{S}_n \mid \pi \text{ is interleaved}\}. \]
Clearly $\pi \in \mathcal{I}_n$ is completely determined by the choice of $A$. It follows immediately that
\[ \#\mathcal{I}_n = \binom{n-1}{\lfloor \frac{n-1}{2} \rfloor}. \tag{4.2} \]

Now given an admissible pinnacle set $S = \{s_1 < s_2 < \ldots < s_d\} \subset [n]$ we wish to construct a permutation $\pi \in \mathfrak{S}_n$ with $\operatorname{Pin} \pi = S$. We use the following algorithm to construct the right canonical permutation $\pi$ from $S$. We first deal with the case where $n$ is odd. Let $\overline{S} = [n] - S$.
C1 Place elements of $\overline{S}$ in $\pi$ moving right to left, starting with the largest unused element of $\overline{S}$ and then decreasing until an element less than the largest unused element of $S$ is placed.
C2 Place the largest unused element of $S$ in the rightmost unused position.
C3 Iterate C1 and C2 until all elements of $S$ and $\overline{S}$ are placed.

If $n$ is even, the only change to this procedure is that we fill both $\pi_n$ and $\pi_{n-1}$ with elements of $\overline{S}$ before considering whether to place an element of $S$.

To illustrate, consider $n = 9$ and $S = \{4, 7, 9\}$. So $\overline{S} = \{1, 2, 3, 5, 6, 8\}$. Here is the construction of $\pi$ where, at each stage, we note whether C1 or C2 is being used.
\[
\begin{array}{c|ccccccccc}
\text{step} & \text{C1} & \text{C2} & \text{C1} & \text{C2} & \text{C1} & \text{C1} & \text{C2} & \text{C1} & \text{C1} \\ \hline
\pi & 8 & 98 & 698 & 7698 & 57698 & 357698 & 4357698 & 24357698 & 124357698
\end{array}
\]
So the right canonical permutation for $S = \{4, 7, 9\}$ is $\pi = 124357698$. Note that $\operatorname{Pin} \pi = S$. Furthermore, this is the same permutation as obtained in (4.1).
However, neither of the sets $A$ and $\overline{A}$ equals $S$. Let
\[ \mathcal{C}_n = \{\pi \in \mathfrak{S}_n \mid \pi \text{ is right canonical}\}. \]
We first need to show that C1–C3 is well defined in that every position of $\pi$ gets filled and that we always have $\operatorname{Pin} \pi = S$. Recall that
\[ \mathcal{A}_n = \{S \mid S = \operatorname{Pin} \pi \text{ for some } \pi \in \mathfrak{S}_n\}. \]

Lemma 4.1.1. If $S \subset [n]$ is an admissible set then C1–C3 produces a permutation $\pi$ with $\operatorname{Pin} \pi = S$. Thus $\#\mathcal{C}_n = \#\mathcal{A}_n$.

Proof. Clearly the second sentence follows from the first. For the first sentence, we will present details for the case when $n$ is odd. If $n$ is even, then one can just place the largest element of $\overline{S}$ in position $n$ and proceed as in the odd case. The following notation will be useful. Let
\[ S = \{s_1 < s_2 < \ldots < s_d\}, \qquad \overline{S} = \{\bar{s}_1 < \bar{s}_2 < \ldots < \bar{s}_{n-d}\}. \]
We will also let $S_p$ and $\overline{S}_p$ denote the elements of $S$ and of $\overline{S}$, respectively, which have not been used during the placement of $\pi_n, \pi_{n-1}, \ldots, \pi_{p+1}$, so that these are the elements still available when position $p$ is filled.

We will use reverse induction on the position $p$ being filled in $\pi$. When $p = n$, we have $\overline{S} \neq \emptyset$ since 1, which is always a non-pinnacle, must be in $\overline{S}$. So there is an element $\bar{s}_{n-d}$ to place in position $n$. Furthermore this element can not be a pinnacle since it is the last element of the permutation, which agrees with the fact that it is in $\overline{S}$.

Suppose that $\pi_n, \pi_{n-1}, \ldots, \pi_{p+1}$ have been constructed. Suppose first that $\pi_{p+1} \in \overline{S}$. One subcase is if either $S_p = \emptyset$, or $S_p \neq \emptyset$ and $\pi_{p+1} > \max S_p$. We must show that $\overline{S}_p \neq \emptyset$ so that we can let $\pi_p = \max \overline{S}_p$. This is true when $S_p = \emptyset$ since $|S_p \uplus \overline{S}_p| = p$. If the second option holds then we have $\pi_{p+1} > \max S_p$. But there must be at least two elements of $\overline{S}$ smaller than $\max S_p$ since $S$ is admissible and so there is some permutation making $\max S_p$ a pinnacle. Also, these elements must still be in $\overline{S}_p$ since elements of this set are placed in decreasing order right to left. Thus this set is nonempty as desired. Furthermore, $\pi_p$ is not a pinnacle since it is smaller than $\pi_{p+1}$.

Now consider the subcase when $\pi_{p+1} < \max S_p$.
Then we let $\pi_p = \max S_p$ which is well defined. But we must show that $\pi_p$ is a pinnacle. We know $\pi_p > \pi_{p+1}$. So there remains to check whether one can construct $\pi_{p-1}$ with $\pi_{p-1} < \pi_p$. For this, it suffices to show that $\overline{S}_{p-1} \neq \emptyset$ since then we will have $\pi_{p-1} = \max \overline{S}_{p-1} < \pi_{p+1} < \pi_p$. Note that this will also finish the induction step. We claim that if $\pi_p = s_i$ and $\pi_{p+1} = \bar{s}_j$ then $j > i$. It will then follow that $\bar{s}_{j-1}$ exists and can be used for $\pi_{p-1}$. But by Theorem 2.0.4 we have $s_i > \bar{s}_{i+1}$. Indeed, if $s_i < \bar{s}_{i+1}$ then at most the elements $s_1, \ldots, s_{i-1}, \bar{s}_1, \ldots, \bar{s}_i$ are less than $s_i$ so that $s_i \le 2i$, a contradiction. Also, elements of $\overline{S}$ are placed in decreasing order with $s_i$ being placed as early as possible with a smaller element to its right. The desired bound on $j$ follows.

We are now ready to give our proof of Theorem 2.0.5.

Theorem 4.1.2. We have $\mathcal{C}_n = \mathcal{I}_n$. Thus
\[ \#\mathcal{A}_n = \binom{n-1}{\lfloor \frac{n-1}{2} \rfloor}. \]

Proof. The second statement follows directly from the first, Lemma 4.1.1, and equation (4.2). So we only need to prove that the two sets are the same. We will consider the case when $n$ is odd, as the even case is similar.

We begin by showing that any right canonical permutation $\pi$ is interleaved. That is to say, the subword consisting of all even indices is an increasing sequence, and the subword consisting of all odd indices is an increasing sequence starting with 1. In terms of the placement of 1, note that $\pi_1$ is not a pinnacle. And since non-pinnacles are placed in decreasing order from right to left, we must have $\pi_1 = 1$. To finish this direction, it is enough to show that for any elements $\pi_i$ and $\pi_{i+2}$, we have that $\pi_{i+2} > \pi_i$. Note that we are done immediately if $\pi_i$ and $\pi_{i+2}$ are either both pinnacles or both non-pinnacles since the construction places them in decreasing order from right to left. If $\pi_{i+2}$ is a pinnacle and $\pi_i$ is not, then by the pinnacle assumption $\pi_{i+2} > \pi_{i+1}$.
And since non-pinnacles are placed in decreasing order right to left, $\pi_{i+1} > \pi_i$. Combining the two inequalities gives the desired result. Finally, suppose $\pi_i$ is a pinnacle and $\pi_{i+2}$ is not. Then $\pi_{i+1}$ is not a pinnacle, being adjacent to $\pi_i$. And, by construction, $\pi_{i+1}$ must be the first available non-pinnacle right to left which is smaller than $\pi_i$. It follows that $\pi_{i+2} > \pi_i$.

For set containment the other way, let $\pi$ be an interleaved permutation. It suffices to show that if the elements of $\pi$ are placed right to left then they follow C1–C3. Consider $\pi_i$ placed after $\pi_{i+1}$ with $1 < i < n$. The boundary cases when $i = 1$ or $n$ are similar. If $\pi_i < \pi_{i+1}$ then $\pi_i$ is a non-pinnacle and $\pi_{i+1}$ is either a non-pinnacle or a pinnacle. In the first case, the non-pinnacles are being placed in decreasing order as desired. In the second, the previously placed non-pinnacle is $\pi_{i+2}$. So the same conclusion holds by the interleaving condition. Now consider the possibility $\pi_i > \pi_{i+1}$. By the interleaving condition, $\pi_{i-1} < \pi_{i+1}$ so $\pi_i$ is a pinnacle. Either $\pi_{i+2}$ is a pinnacle or not, the latter possibility including the case that $\pi_{i+2}$ does not exist. If it is, then the interleaving condition shows that pinnacles are being placed in decreasing order. If $\pi_{i+2}$ is not a pinnacle, then this fact and the interleaving condition again imply $\pi_i < \pi_{i+2} < \pi_{i+3}$. It follows that $\pi_i$ was placed after the first smaller non-pinnacle and, by the interleaving condition one last time, that any pinnacles to its right are larger. This completes the proof of the other containment.

[Figure 4.1: The lattice path $L$ for $A = \{2, 3, 7, 9\}$]

Given a set $A$ and $k \in \mathbb{N}$ we let $\binom{A}{k}$ be the set of all $k$-element subsets of $A$. The above construction gives us a bijection
\[ \psi : \binom{[2, n]}{\lfloor \frac{n-1}{2} \rfloor} \to \mathcal{A}_n \]
given by $\psi(A) = \operatorname{Pin} \pi$ where $\pi$ is the interleaved permutation corresponding to $A$.
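The two constructions above can be checked by machine for small $n$. The following Python sketch implements steps I1–I2 and C1–C3 and compares the resulting families; it is a brute-force sanity check under stated assumptions, not part of the proof. All function names (`pinnacle_set`, `interleaved`, `right_canonical`, `admissible_sets`) are illustrative choices and do not come from the text, and the even-$n$ branch follows the remark in the proof of Lemma 4.1.1 that one first places the largest non-pinnacle in position $n$.

```python
from itertools import combinations, permutations
from math import comb


def pinnacle_set(pi):
    """Values pi[i] with pi[i-1] < pi[i] > pi[i+1] (interior positions only)."""
    return frozenset(pi[i] for i in range(1, len(pi) - 1)
                     if pi[i - 1] < pi[i] > pi[i + 1])


def interleaved(n, A):
    """Steps I1-I2: A fills the first even positions, its complement the rest."""
    A = sorted(A)
    rest = sorted(set(range(1, n + 1)) - set(A))
    pi = [0] * n
    for j, a in enumerate(A):            # I1: 1-indexed positions 2, 4, ..., 2|A|
        pi[2 * j + 1] = a
    free = [i for i in range(n) if pi[i] == 0]
    for i, x in zip(free, rest):         # I2: remaining positions, increasing
        pi[i] = x
    return tuple(pi)


def right_canonical(n, S):
    """Steps C1-C3, building the word from right to left."""
    pinn = sorted(S)                                 # unused pinnacles
    non = sorted(set(range(1, n + 1)) - set(S))      # unused non-pinnacles
    word = []                                        # word[0] will be pi_n
    if n % 2 == 0 and non:
        word.append(non.pop())                       # even n: fill pi_n first
    while non or pinn:
        while non:                                   # C1
            x = non.pop()
            word.append(x)
            if pinn and x < pinn[-1]:
                break
        if pinn:                                     # C2
            word.append(pinn.pop())
    return tuple(reversed(word))


def admissible_sets(n):
    """All pinnacle sets realised by some permutation of [n] (brute force)."""
    return {pinnacle_set(p) for p in permutations(range(1, n + 1))}


def interleaved_family(n):
    k = (n - 1) // 2
    return {interleaved(n, A) for A in combinations(range(2, n + 1), k)}


def right_canonical_family(n):
    return {right_canonical(n, S) for S in admissible_sets(n)}


if __name__ == "__main__":
    print(interleaved(9, {2, 3, 7, 9}))      # the permutation in (4.1)
    print(right_canonical(9, {4, 7, 9}))     # the worked example for S = {4, 7, 9}
    for n in range(1, 8):
        assert interleaved_family(n) == right_canonical_family(n)
        assert len(interleaved_family(n)) == comb(n - 1, (n - 1) // 2)
```

Building the right canonical word as a list that is reversed at the end keeps the right-to-left placement of C1–C3 explicit. The enumeration of admissible sets is exponential in $n$, so the check is only meant for small cases.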
In [DNKPT18], the authors proved Theorem 2.0.5 using a bijection
\[ \phi : \binom{[2, n]}{\lfloor \frac{n-1}{2} \rfloor} \to \mathcal{A}_n \]
defined as follows. An up-down lattice path $L$ starts at the origin and uses steps which are either up ($U$) or down ($D$) parallel to the vectors $[1, 1]$ and $[1, -1]$, respectively. For more information about lattice paths, see the text of Sagan [Sag20]. It will be convenient to index the steps of $L$ with $[2, n]$ and write $L = s_2 s_3 \ldots s_n$. Associate with $A \in \binom{[2,n]}{\lfloor \frac{n-1}{2} \rfloor}$ the lattice path $L$ such that
\[ s_i = \begin{cases} D & \text{if } i \in A, \\ U & \text{if } i \notin A. \end{cases} \]
To illustrate, if $n = 9$ and $A = \{2, 3, 7, 9\}$ as in the example beginning this section then
\[ L = DDUUUDUD \]
as depicted in Figure 4.1 where each step is labeled by its index. We now define
\[ \phi(A) = \{i \mid \text{in } L \text{ either } s_i = U \text{ strictly below the } x\text{-axis, or } s_i = D \text{ weakly above the } x\text{-axis}\}. \]
Continuing our example, $\phi(\{2, 3, 7, 9\}) = \{4, 7, 9\} = \psi(\{2, 3, 7, 9\})$. This is not an accident.

Proposition 4.1.3. We have $\phi = \psi$.

Proof. We will give the proof for $n$ odd as the even case is similar. Let $l = (n-1)/2$. We need to show that $\phi(A) = \psi(A)$ for all $A \in \binom{[2,n]}{l}$. Suppose $A = \{a_1 < a_2 < \ldots < a_l\}$ and
\[ \overline{A} = [n] - A = \{\bar{a}_1 < \bar{a}_2 < \ldots < \bar{a}_{n-l}\}. \]
Let $L$ and $\pi$ be the lattice path and interleaved permutation, respectively, associated with $A$. So $\psi(A) = \operatorname{Pin} \pi$ and there will be two cases depending on whether a pinnacle of $\pi$ comes from $A$ or $\overline{A}$.

In the first case, suppose $a_i \in \operatorname{Pin} \pi$. Since $\pi$ is interleaved, this is equivalent to $a_i = \pi_{2i} > \pi_{2i+1} = \bar{a}_{i+1}$. Recall that $a_i$ indexes the $i$th $D$ step of $L$, and similarly for $\bar{a}_{i+1}$ and $U$ steps. So the previous inequality is equivalent to step $s_{a_i} = D$ being preceded by more up steps than down steps. And this is precisely the condition for $a_i$ to be the index of a down step weakly above the $x$-axis, which means it is in $\phi(A)$. Thus this case is complete.

In a similar manner, one proves that $\bar{a}_i \in \operatorname{Pin} \pi$ if and only if $\bar{a}_i$ is the index of an up step strictly below the $x$-axis. This completes the second case and the proof.
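Proposition 4.1.3 can also be confirmed numerically for small $n$. The sketch below is illustrative only: the names `psi` and `phi` are ours, and we read "strictly below the $x$-axis" as meaning that the step ends below the axis and "weakly above" as meaning that it ends on or above the axis, which is the reading that reproduces the example $\phi(\{2, 3, 7, 9\}) = \{4, 7, 9\}$.

```python
from itertools import combinations


def psi(n, A):
    """Pinnacle set of the interleaved permutation built from A (steps I1-I2)."""
    A = sorted(A)
    rest = sorted(set(range(1, n + 1)) - set(A))
    pi = [0] * n
    for j, a in enumerate(A):
        pi[2 * j + 1] = a                  # 1-indexed even positions 2, 4, ...
    free = [i for i in range(n) if pi[i] == 0]
    for i, x in zip(free, rest):
        pi[i] = x
    return {pi[i] for i in range(1, n - 1) if pi[i - 1] < pi[i] > pi[i + 1]}


def phi(n, A):
    """Lattice-path map: step i (for i = 2..n) is D if i is in A, else U."""
    A = set(A)
    height, out = 0, set()
    for i in range(2, n + 1):
        if i in A:
            height -= 1
            if height >= 0:                # D step ending weakly above the axis
                out.add(i)
        else:
            height += 1
            if height < 0:                 # U step ending strictly below the axis
                out.add(i)
    return out


def maps_agree(n):
    """Check phi(A) == psi(A) for every admissible choice of A."""
    k = (n - 1) // 2
    return all(phi(n, A) == psi(n, A)
               for A in combinations(range(2, n + 1), k))


if __name__ == "__main__":
    print(sorted(phi(9, {2, 3, 7, 9})), sorted(psi(9, {2, 3, 7, 9})))
    print(all(maps_agree(n) for n in range(1, 11)))
```

Tracking only the running height of the path avoids storing the path itself, since membership in $\phi(A)$ depends only on the step type and the height reached.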
4.2 Ballot numbers

Davis et al. derived a number of properties of the constants $\mathfrak{p}(m, d)$ which count the number of admissible pinnacle sets $S$ with $d$ elements and maximum $m$. In this section we prove that these constants are, in fact, ballot numbers. We give two proofs of this result. In the first, we derive a formula for $\mathfrak{p}(m, d)$ using finite differences and then show that it agrees with the well-known expression for ballot numbers. In the second, we give an explicit bijection between these admissible sets and ballot sequences.

Suppose we are given nonnegative integers $p > q$. A $(p, q)$ ballot sequence is a permutation $\beta = \beta_1 \beta_2 \ldots \beta_{p+q}$ of $p$ copies of the letter $X$ and $q$ copies of the letter $Y$ such that in any nonempty prefix $\beta_1 \beta_2 \ldots \beta_i$ the number of $X$’s is greater than the number of $Y$’s. Let
\[ \mathcal{B}_{p,q} = \{\beta \mid \beta \text{ is a } (p, q) \text{ ballot sequence}\}. \]
The following result is well known.

Theorem 4.2.1 ([And87],[Ber87]). For nonnegative integers $p > q$ we have
\[ \#\mathcal{B}_{p,q} = \frac{p-q}{p+q} \binom{p+q}{q}. \]

Note that if we let $p = d+1$ and $q = d$ then the previous result gives
\[ \#\mathcal{B}_{d+1,d} = \frac{1}{2d+1} \binom{2d+1}{d} = C_d \]
where $C_d$ is the $d$th Catalan number.

Our first proof that the $\mathfrak{p}(m, d)$ are ballot numbers will use the theory of finite differences. If $f(m)$ is a function of a nonnegative integer $m$ then its forward difference is the function $\Delta f$ defined by
\[ \Delta f(m) = f(m+1) - f(m). \]
For a fixed $d \in \mathbb{P}$, define the following polynomial in $m$ of degree $d-1$
\[ p_d(m) = \frac{m - 2d + 1}{(d-1)!} \prod_{i=2}^{d-1} (m - i). \]

Lemma 4.2.2. The polynomial $p_d(m)$ satisfies
\[ \Delta p_d(m) = p_{d-1}(m) \]
and
\[ p_d(2d+1) = C_d. \]

Proof. To prove the first equality, we compute
\begin{align*}
\Delta p_d(m) &= p_d(m+1) - p_d(m) \\
&= \frac{m - 2d + 2}{(d-1)!} \prod_{i=2}^{d-1} (m + 1 - i) - \frac{m - 2d + 1}{(d-1)!} \prod_{i=2}^{d-1} (m - i) \\
&= \frac{(m - 2d + 2)(m - 1) - (m - 2d + 1)(m - d + 1)}{(d-1)!} \prod_{i=2}^{d-2} (m - i) \\
&= \frac{(d-1)(m - 2d + 3)}{(d-1)!} \prod_{i=2}^{d-2} (m - i) \\
&= p_{d-1}(m).
\end{align*}
For the second equality, we have
\begin{align*}
p_d(2d+1) &= \frac{2}{(d-1)!} \prod_{i=2}^{d-1} (2d + 1 - i) \\
&= \frac{2d}{d!} \cdot \frac{(2d-1)!}{(d+1)!} \\
&= \frac{(2d)!}{d!\,(d+1)!} \\
&= C_d
\end{align*}
which finishes the proof.

Note that by the criterion in Theorem 2.0.4, $\mathfrak{p}(m, d)$ can only be nonzero if $m > 2d$. We thank Richard Stanley who, on being shown Lemma 4.2.2, pointed out that $\mathfrak{p}(m, d)$ is a ballot number.

Theorem 4.2.3. If $m, d \in \mathbb{P}$ with $m > 2d$ then
\[ \mathfrak{p}(m, d) = p_d(m). \]
Thus
\[ \mathfrak{p}(m, d) = \frac{m - 2d + 1}{m - 1} \binom{m-1}{d-1} = \#\mathcal{B}_{m-d,d-1}. \]

Proof. Induct on $d$ where the base case of $d = 1$ is trivial to verify. To finish the first claim, it suffices to use the previous lemma and show that both $\Delta \mathfrak{p}(m, d) = \mathfrak{p}(m, d-1)$ and $\mathfrak{p}(2d+1, d) = C_d$. But these were proved in [DNKPT18, Sections 2.2–2.3]. The first displayed equality now follows from simple manipulation of the definition of $p_d(m)$, while the second comes from Theorem 4.2.1.

We would like to give a bijective proof of the relationship between admissible pinnacle sets and ballot sequences from the previous theorem. Let
\[ \mathfrak{P}(m, d) = \{S \mid S \text{ admissible with } \max S = m \text{ and } \#S = d\} \]
so that $\#\mathfrak{P}(m, d) = \mathfrak{p}(m, d)$. For $m > 2d$, define a map $\eta : \mathcal{B}_{m-d,d-1} \to \mathfrak{P}(m, d)$ by sending ballot sequence $\beta = \beta_1 \beta_2 \ldots \beta_{m-1}$ to
\[ \eta(\beta) = \{i \mid \beta_i = Y\} \uplus \{m\}. \]
For example, if $m = 9$, $d = 3$ and $\beta = XXXYXXYX$ then $\eta(\beta) = \{4, 7\} \uplus \{9\} = \{4, 7, 9\}$.

Theorem 4.2.4. The map $\eta$ is a well-defined bijection.

Proof. We must first show that $\eta$ is well defined in that $\eta(\beta) \in \mathfrak{P}(m, d)$. Since $\beta \in \mathcal{B}_{m-d,d-1}$ we see that the set $\{i \mid \beta_i = Y\}$ is contained in $[m-1]$ and has cardinality $d-1$. It follows that $S = \eta(\beta)$ has maximum $m$ and cardinality $d$. There remains to show that $S = \{s_1 < s_2 < \ldots < s_d\}$ is admissible. By Theorem 2.0.4, it suffices to show that $s_i > 2i$ for all $i$. For $i = d$ this is the assumption $s_d = m > 2d$. For $i < d$, $s_i$ is the index of the $i$th $Y$ in $\beta$. Since $\beta$ is a ballot sequence, this $Y$ is preceded by $i$ copies of $Y$ (including itself) and at least $i+1$ copies of $X$. So $s_i \ge i + (i+1) = 2i+1$ which is what we wished to prove.

To show that $\eta$ is a bijection, we create its inverse. Given $S \in \mathfrak{P}(m, d)$ we define $\eta^{-1}(S) = \beta = \beta_1 \beta_2 \ldots$
$\beta_{m-1}$ by letting
\[ \beta_i = \begin{cases} X & \text{if } i \notin S, \\ Y & \text{if } i \in S. \end{cases} \]
The proof that $\eta^{-1}$ is well defined is similar to the one for $\eta$. And proving that the compositions of $\eta$ with $\eta^{-1}$ are identity maps is easy. So we are done.

4.3 Permutations with a given pinnacle set

Given an admissible set $S$, there does not seem to be an expression for $p_S(n)$, the number of permutations in $\mathfrak{S}_n$ with $S$ as pinnacle set, analogous to the one in Theorem 2.0.3 for peak sets. In [DNKPT18], the authors found expressions for $p_S(n)$ when $\#S \le 2$ as well as bounds for general $S$, and asked whether an exact formula could be given in the general case. Such an expression was given in [DLHH+21] as a summation. In this section we will give another sum which is asymptotically more efficient. In addition, this method can be extended to count $\#\mathcal{O}(S)$, the number of admissible orderings of $S$.

Since our sum will involve a significant amount of new notation, we will collect it here and then explain its relevance afterwards. Fix $n \in \mathbb{P}$. Suppose we have an admissible pinnacle set $S = \{s_1 < s_2 < \ldots < s_d\}$ for permutations in $\mathfrak{S}_n$. We use the convention $s_0 = 0$ and $s_{d+1} = n+1$ and let
\[ n_i = s_{i+1} - s_i - 1 \]
for $0 \le i \le d$. Let
\[ D = \{1_l, 1_r, 2_l, 2_r, \ldots, d_l, d_r\} \]
and give the following total order to $D$’s elements
\[ 1_l < 1_r < 2_l < 2_r < \ldots < d_l < d_r. \]
We call $i_l$ and $i_r$ the elements of rank $i$ in $D$. If $B \subseteq D$ then we will let
\[ b = \#B \]
and
\[ r_j = \text{the rank of the } j\text{th smallest element of } B \]
for $1 \le j \le b$. We also define
\[ b_i = \text{the number of elements in } B \text{ with rank at least } i. \]
Note that we always have $b_1 = b$ and $b_{d+1} = 0$ since $d$ is the largest rank. For example, if $d = 4$, then $D = \{1_l, 1_r, 2_l, 2_r, 3_l, 3_r, 4_l, 4_r\}$ and one possible $B$ might be $B = \{1_l, 3_l, 3_r, 4_r\}$ which has
\[ r_1 = 1,\ r_2 = 3,\ r_3 = 3,\ r_4 = 4 \quad \text{and} \quad b_1 = 4,\ b_2 = 3,\ b_3 = 3,\ b_4 = 1,\ b_5 = 0. \]
We can now state the first main result of this section.

Theorem 4.3.1. Given $n \in \mathbb{P}$ and admissible $S = \{s_1 < s_2 < \ldots < s_d\}$ we have
\[ p_S(n) = 2^{n-2d-1} \sum_{B \subseteq D:\ |B| \le d} (-1)^b (d-b)! \left( \prod_{i=0}^{b-1} (d + 1 - i - r_{b-i}) \right) \left( \prod_{i=0}^{d} (d + 1 - i - b_{i+1})^{n_i} \right). \]

To prove this, it will be convenient to convert the linear permutations we have been studying into cyclic ones in order to avoid considering boundary cases. Given a linear permutation $\pi = \pi_1 \pi_2 \ldots \pi_n$ the corresponding cyclic permutation is the set of permutations
\[ [\pi] = \{\pi_1 \pi_2 \ldots \pi_n,\ \pi_2 \ldots \pi_n \pi_1,\ \ldots,\ \pi_n \pi_1 \ldots \pi_{n-1}\}. \]
Intuitively, we think of $[\pi]$ as the result of arranging the elements of $\pi$ on a circle. Let $[\mathfrak{S}_n] = \{[\pi] \mid \pi \in \mathfrak{S}_n\}$. For example if $\pi = 1324$ then
\[ [\pi] = \{1324, 3241, 2413, 4132\}. \]
We are also using the bracket notation in $[n]$ where $n \in \mathbb{N}$ but this should not cause any confusion. Cyclic permutations are of interest in part because of their relation with pattern avoidance, standard Young tableaux, quasisymmetric functions, and other mathematical objects [AGRR20, Cal02, DLM+21a, DLM+21b, GLW18, GLW19].

We define the pinnacle set of $[\pi] = [\pi_1 \pi_2 \ldots \pi_n]$ to be
\[ \operatorname{Pin}[\pi] = \{\pi_i \mid \pi_{i-1} < \pi_i > \pi_{i+1} \text{ where subscripts are taken modulo } n\}. \]
Continuing our example from the last paragraph
\[ \operatorname{Pin}[1324] = \{3, 4\}. \]
Note in particular that $\operatorname{Pin}[12] = \{2\}$ and, more generally, $n \in \operatorname{Pin}[\pi]$ for any $[\pi] \in [\mathfrak{S}_n]$ where $n \ge 2$.

Lemma 4.3.2. For $n \in \mathbb{P}$, there is a bijection between linear permutations in $\mathfrak{S}_n$ with pinnacle set $S$ and cyclic permutations in $[\mathfrak{S}_{n+1}]$ with pinnacle set $S' = S \cup \{n+1\}$.

Proof. Given a linear $\pi$, append the element $n+1$ to the end of $\pi$ and take the corresponding equivalence class in $\mathfrak{S}_{n+1}$ to form an element of $[\mathfrak{S}_{n+1}]$. The map is clearly invertible and does not destroy or create any pinnacles for elements in $[n]$. Since $n+1 \ge 2$, we know that $n+1$ will become a pinnacle. Therefore the map has the desired properties concerning the pinnacle set.

Consider some admissible pinnacle set $S = \{s_1 < s_2 < \ldots < s_d\}$.
Given the above lemma, we may count the number of permutations in $\mathfrak{S}_n$ with pinnacle set $S$ by counting the number of cyclic permutations $[\pi] \in [\mathfrak{S}_{n+1}]$ with pinnacle set $S' = S \cup \{n+1\}$ where we let $s_{d+1} = n+1$. Therefore, much of what follows will concern cyclic permutations with pinnacle set $S'$.

A factor of a (cyclic) permutation is a subsequence of consecutive elements. We may attempt to construct a $[\pi]$ with pinnacle set $S'$ by first putting the elements of $S'$ in some cyclic order, and then placing all elements in $\overline{S'} = [n+1] - S'$ into either decreasing factors starting with some $s_i$, or into increasing factors ending with some $s_i$. Such a $[\pi]$ will then be completely determined by the increasing/decreasing factors that each element of $\overline{S'}$ falls into, and we will call every such assignment a placement. Note that it is possible for multiple placements to result in the same permutation since each vale (an element of $[\pi]$ smaller than the elements on either side) can be part of the factor on either side. For example, start with a desired pinnacle set $\{4, 5\}$ and place non-pinnacles between these elements to form the cyclic permutation $[\pi] = [14325]$. Then $[\pi]$ would be associated with a placement where the decreasing factor starting with 4 is 43 and the increasing factor ending with 5 is 25. But it would also be associated with a placement having these factors be 432 and 5, respectively. It is also possible, depending on the placement, that $[\pi]$ will not have pinnacle set $S'$ if no sufficiently small elements are placed between two pinnacles. In our example above, this could have happened if we had placed 1, 2 and 3 all in the increasing factor ending in 5, resulting in the cyclic permutation $[41235]$ in which only 5 is a pinnacle. It is true, however, that any $[\pi]$ so constructed will have a pinnacle set that is a subset of $S'$ since every non-pinnacle was placed so that its factor contains an $s_i$ which is the largest element.
For our arguments, we will focus on counting placements and then convert them into permutations later. Fix a cyclic ordering of the pinnacle indices and write it as $[\tau] = [\tau_1 \cdots \tau_{d+1}] \in [\mathfrak{S}_{d+1}]$. An example is shown in Figure 4.2 where $[\tau] = [7612354]$. Now given a placement consistent with this ordering, for every space between two adjacent elements in $[\tau]$ define the corresponding dale of this placement to be the set of all elements between the two corresponding pinnacles that are also smaller than both pinnacles. So in Figure 4.2 the dales are outlined by triangles with solid lines as sides. If $s_i$ is the smaller of the two pinnacles, then we say that the dale has rank $i$. Note that the rank is from the index of $s_i$ and not its actual value. We will further denote the rank as either $i_l$ or $i_r$ depending on whether the dale is to the left or right of the pinnacle $s_i$. In Figure 4.2 the dale ranks are given along the $x$-axis.

Define the dale rank set $D[\tau]$ to be the set of the dale ranks of $[\tau]$, and define the master dale rank set to be
\[
D = \{1_l, 1_r, 2_l, 2_r, \ldots, d_l, d_r\}
\]
so that $D \supseteq D[\tau]$ for all $[\tau]$. In Figure 4.2, we have that $D[\tau] = \{1_l, 1_r, 2_r, 3_r, 4_l, 4_r, 6_l\}$ while $D = \{1_l, 1_r, 2_l, 2_r, \ldots, 6_l, 6_r\}$. Note that, by our definitions, there will be no dales in the case where $d = 0$. Clearly $D[\tau]$ will be a subset of $D$ consisting of exactly $d+1$ elements if $d > 0$, and empty otherwise. We can derive further information about $D[\tau]$ if we want, such as how it will always contain both $1_l$ and $1_r$ if $d > 0$, how it will never contain both $d_l$ and $d_r$ if $d > 1$, and how $D[\tau]$ will never be able to have certain combinations of the higher ranked dales.

[Figure 4.2: Example of a pinnacle set ordering $[\tau] = [7612354]$ with corresponding dales. The pinnacles $s_1, \ldots, s_7$ are drawn at increasing heights, with $n_i$ elements in the band between $s_i$ and $s_{i+1}$; the dale ranks $6_l, 1_l, 1_r, 2_r, 3_r, 4_l, 4_r$ are listed along the $x$-axis.]
These facts are not necessary for proving our formula, although further analysis of them might help to improve its efficiency.

Lemma 4.3.3. For $n \in \mathbb{P}$, a given placement will correspond to a permutation $[\pi] \in [\mathfrak{S}_{n+1}]$ with pinnacle set $S'$ if and only if every dale is non-empty.

Proof. First, suppose $d = 0$. In this case, the lemma is trivial since there are no dales, and every placement will automatically result in only one pinnacle, namely $n+1$, as long as $n > 0$. Now suppose $d > 0$. Clearly if any dale of rank $i$ (whether left or right) is empty, then the pinnacle $s_i$ will have no smaller elements between itself and the higher pinnacle next to it, which will force $s_i$ to not be a pinnacle. On the other hand, if all dales have at least one element, then the space between any two pinnacles will always contain an element smaller than both, and all elements of $S'$ will in fact be pinnacles.

We can now enumerate all placements corresponding to a given cyclic ordering of the indices of the pinnacle set $S'$.

Lemma 4.3.4. Given an admissible pinnacle set $S'$, fix an order $[\tau]$ of the pinnacle indices. The total number of placements with order $[\tau]$ that will result in a permutation with pinnacle set $S'$ is given by
\[
\sum_{B \subseteq D[\tau]} (-1)^b \prod_{i=0}^{d} 2^{n_i} (d+1-i-b_{i+1})^{n_i}
\]
where $b$, $d$, the $b_i$, and the $n_i$ are defined above.

Proof. We will use the Principle of Inclusion and Exclusion or PIE. We let our universal set be all possible placements with no restrictions. We then wish to exclude any placement where at least one dale is empty. Therefore, if $B$ is some subset of the dales, we must be able to count the number of placements where all dales in $B$ (and possibly others) are empty.

First consider the case when $B = \emptyset$. There are $2(d+1)$ factors of which $2i$ only exist below $s_i$. So each of the $n_i$ non-pinnacles between $s_i$ and $s_{i+1}$ may be placed in any of the $2(d+1-i)$ factors that are long enough to extend above $s_i$. As an example, in Figure 4.2, if we look between the horizontal boundary lines for the elements counted by $n_2$ we see there are $10 = 2(6+1-2)$ such factors represented by the diagonal lines (solid or dotted) which intersect the region. For non-empty $B$, each dale of rank at least $i+1$ that we require to be empty will result in a loss of two additional factors, and so there are only $2(d+1-i-b_{i+1})$ choices. Therefore, for a given $B$, the total number of placements guaranteeing the dales in $B$ are empty is
\[
\prod_{i=0}^{d} 2^{n_i} (d+1-i-b_{i+1})^{n_i}.
\]
To use the PIE, we must also attach the sign $(-1)^{|B|} = (-1)^b$ to this term before summing. Therefore, given a fixed order $[\tau]$ of the pinnacle indices of $S'$, we have that the total number of placements that will result in a permutation with pinnacle set $S'$ is
\[
\sum_{B \subseteq D[\tau]} (-1)^b \prod_{i=0}^{d} 2^{n_i} (d+1-i-b_{i+1})^{n_i}.
\]
Finally, when $B = D[\tau]$ then $b_1 = \#B = d+1$. So we can ignore this term because the product has a factor of $d+1-b_1 = 0$.

The above formula must be summed over all possible $[\tau]$ to give a final count for the number of $[\pi]$ with $\operatorname{Pin}[\pi] = S'$. This results in a computationally expensive double sum. Also, note that in the above formula there may be multiple $B$ resulting in the same term. For example, $\{1_l, 2_r, 5_l\}$ is not the same as $\{1_r, 2_r, 5_l\}$ even though both produce the same $b_i$. We will take care of this redundancy when we optimize our formula below.

To fix the double sum problem, note that each $B$ in Lemma 4.3.4 is a subset of the master dale rank set $D$. We will fix some subset $B \subseteq D$ and count the number of orderings $[\tau]$ that will produce a $D[\tau]$ which can have $B$ as a subset. This will allow us to just sum over all subsets $B \subseteq D$ without having to keep track of $[\tau]$. Furthermore, we only have to sum over the subsets $B$ of cardinality at most $d$ since requiring more than $d$ dales to be empty is impossible for an admissible pinnacle set.

Lemma 4.3.5. Fix some $B \subseteq D$ with $|B| \le d$.
The number of orderings $[\tau]$ that will produce a $D[\tau]$ such that $B \subseteq D[\tau]$ is given by
\[
(d-b)! \prod_{i=0}^{b-1} (d+1-i-r_{b-i})
\]
where $b$, $d$, and the $r_i$ are defined as above.

Proof. We will start by viewing all $d+1$ pinnacles as separate and then adjoin them in pairs in such a way that the desired dales are formed. Here, "adjoining a pair of pinnacles" means requiring that they be adjacent in $[\tau]$. We start with the dale of rank $r_b$, the largest rank in $B$. In that case, the only way to generate such a dale is to order $s_{r_b}$ so that one of the $d+1-r_b$ higher pinnacles is directly to its left or right depending on whether the corresponding element of $B$ is a left or right rank, respectively. So select one such pinnacle and adjoin it to the appropriate side of $s_{r_b}$.

Next we will examine the dale in $B$ with the next highest rank, $r_{b-1}$. If $r_{b-1}$ is a smaller rank than $r_b$, we may once again select a taller pinnacle to place next to $s_{r_{b-1}}$, on either the left or right as necessary, in order to produce the desired dale. This time however, although there are $d+1-r_{b-1}$ pinnacles higher than $s_{r_{b-1}}$, one of them is unavailable since we have already adjoined two of the higher-ranked pinnacles together. More specifically, because of adjoining a higher pinnacle with $s_{r_b}$, we know that one taller pinnacle cannot be joined to its left and another cannot be joined to its right. So no matter whether $r_{b-1}$ corresponded to a left or right dale, there is one less option. Therefore, the number of ways to append a larger pinnacle is $d+1-r_{b-1}-1$. On the other hand, if $r_{b-1} = r_b$ then we need to adjoin a second pinnacle to $s_{r_b}$ on the side opposite the one used when considering $r_b$. Again, the pinnacle already adjoined to $s_{r_b}$ removes one option so the number of choices is $d+1-r_{b-1}-1$ as before. So in either case we have the same number of possibilities.
Similar considerations show that, in general, each $r_{b-i}$ results in $d+1-i-r_{b-i}$ choices for adjoining pinnacles. Note that for this argument we are using the fact that $b \le d$ since if $b = d+1$ then the string of pinnacles would wrap into a circle before creating the final dale.

Once all dales have been created by the above process, we only need to count the number of ways to join the resulting strings of pinnacles together. Since we have adjoined pinnacles together $b$ times, we have $d+1-b$ strings which we then must arrange in a circle. This can be done in $(d+1-b-1)! = (d-b)!$ ways. Therefore,
\[
(d-b)! \prod_{i=0}^{b-1} (d+1-i-r_{b-i})
\]
is the number of orderings $[\tau]$ that will allow for a given $B$ to be a subset of $D[\tau]$.

We are now in a position to prove Theorem 4.3.1 which we restate here for ease of reference.

Theorem 4.3.6. Given $n \in \mathbb{P}$ and admissible $S = \{s_1 < s_2 < \ldots < s_d\}$ we have
\[
p_S(n) = 2^{n-2d-1} \sum_{B \subseteq D:\ |B| \le d} (-1)^b (d-b)! \left( \prod_{i=0}^{b-1} (d+1-i-r_{b-i}) \right) \left( \prod_{i=0}^{d} (d+1-i-b_{i+1})^{n_i} \right).
\]

Proof. It is easy to verify the formula if $d = 0$, so we assume $d > 0$. From Lemma 4.3.2, the number of permutations $\pi \in \mathfrak{S}_n$ with pinnacle set $S$ equals the number of cyclic permutations $[\pi] \in [\mathfrak{S}_{n+1}]$ with pinnacle set $S' = S \cup \{n+1\}$. So we will count the latter. From Lemma 4.3.4, the number of placements which correspond to a cyclic permutation with pinnacle set $S'$ is given by
\[
\sum_{[\tau]} \sum_{B \subseteq D[\tau]} (-1)^b \prod_{i=0}^{d} 2^{n_i} (d+1-i-b_{i+1})^{n_i}
\]
where the outer sum is over all possible cyclic orderings $[\tau]$ of the index set of $S'$. We now wish to swap the summations so that the outer sum is over all $B \subseteq D$ with $|B| \le d$. We may restrict to size at most $d$ since any larger $B$ will either consist of a combination of dales that cannot exist, or will require all $d+1$ dales to be empty which is impossible because of the assumption that $d > 0$.
In order to interchange the summations we must multiply the term corresponding to each $B$ by the number of distinct cyclic orderings $[\tau]$ that could have generated it. This was counted in Lemma 4.3.5, and so we get the formula
\[
\sum_{B \subseteq D:\ |B| \le d} (-1)^b (d-b)! \left( \prod_{i=0}^{b-1} (d+1-i-r_{b-i}) \right) \left( \prod_{i=0}^{d} 2^{n_i} (d+1-i-b_{i+1})^{n_i} \right)
\]
for the number of placements.

Now we seek to turn the placements into permutations. Since all dales are guaranteed to be non-empty, we have that every permutation corresponding to one of these placements will have $d+1$ non-pinnacle elements that are part of both a decreasing factor and an increasing factor. This means that every such corresponding $[\pi]$ has been counted by $2^{d+1}$ placements. Dividing by this, and also pulling the common factors of two out from the second product (note that $\sum_{i=0}^{d} n_i = n - d$), we have
\[
\begin{aligned}
p_S(n) &= 2^{-d-1} \prod_{i=0}^{d} 2^{n_i} \sum_{B \subseteq D:\ |B| \le d} (-1)^b (d-b)! \left( \prod_{i=0}^{b-1} (d+1-i-r_{b-i}) \right) \left( \prod_{i=0}^{d} (d+1-i-b_{i+1})^{n_i} \right) \\
&= 2^{n-2d-1} \sum_{B \subseteq D:\ |B| \le d} (-1)^b (d-b)! \left( \prod_{i=0}^{b-1} (d+1-i-r_{b-i}) \right) \left( \prod_{i=0}^{d} (d+1-i-b_{i+1})^{n_i} \right),
\end{aligned}
\]
which is the formula we set out to prove.

In [DNKPT18], explicit formulas were given for $p_S(n)$ when $|S| \le 3$. These expressions follow easily from the previous result.

Corollary 4.3.7. We have the following values for $p_S(n)$.
(1) If $S = \emptyset$ then $p_S(n) = 2^{n-1}$.
(2) If $S = \{l\}$ where $3 \le l \le n$ then $p_S(n) = 2^{n-2}(2^{l-2} - 1)$.
(3) If $S = \{l, m\}$ where $l \ge 3$, $m \ge 5$, and $l < m \le n$, then
\[
p_S(n) = 2^{n+m-l-5}(3^{l-1} - 2^l + 1) - 2^{n-3}(2^{l-2} - 1).
\]

Proof. In each of the results we apply Theorem 4.3.6.

(1) When $d = 0$, the first product in Theorem 4.3.6 is always empty and the second always equals one. Therefore, everything reduces immediately to $p_S(n) = 2^{n-1}$, as desired.

(2) When $d = 1$ we have $n_0 = l-1$ and $n_1 = n-l$. Therefore, we have the following possibilities for $B$, and the corresponding terms in the summation:
• $B = \emptyset$: $2^{l-1}$
• $B = \{1_l\}$ or $\{1_r\}$: $-1$
which when substituted into the formula gives
\[
p_S(n) = 2^{n-3}(2^{l-1} - 2) = 2^{n-2}(2^{l-2} - 1).
\]
(3) When $d = 2$ we have $n_0 = l-1$, $n_1 = m-l-1$, and $n_2 = n-m$. Additionally, the first inner product will always zero out if $2_r, 2_l$ are both in $B$. Therefore, we have the following possibilities for $B$, and the corresponding terms in the summation:
• $B = \emptyset$: $(2)3^{l-1}2^{m-l-1}$
• $B = \{1_l\}$ or $\{1_r\}$: $(-2)2^{l-1}2^{m-l-1}$
• $B = \{2_l\}$ or $\{2_r\}$: $(-1)2^{l-1}$
• $B = \{1_l, 1_r\}$: $(2)2^{m-l-1}$
• $B = \{1_l, 2_r\}$ or $\{1_r, 2_r\}$ or $\{1_l, 2_l\}$ or $\{1_r, 2_l\}$: $1$.
When we substitute all these into the formula, we get
\[
\begin{aligned}
p_S(n) &= 2^{n-5}\left[(2)3^{l-1}2^{m-l-1} - (4)2^{l-1}2^{m-l-1} - (2)2^{l-1} + (2)2^{m-l-1} + 4\right] \\
&= 2^{n-5}\left[(2)3^{l-1}2^{m-l-1} - (4)2^{l-1}2^{m-l-1} + (2)2^{m-l-1}\right] - 2^{n-5}\left[(2)2^{l-1} - 4\right] \\
&= 2^{n+m-l-5}(3^{l-1} - 2^l + 1) - 2^{n-3}(2^{l-2} - 1)
\end{aligned}
\]
as desired.

We can make Theorem 4.3.6 more efficient by summing over certain weak compositions rather than subsets. A weak composition of $n \in \mathbb{N}$ is a sequence $\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_k]$ of nonnegative integers called parts such that $\sum_i \alpha_i = n$. In this case we write $\alpha \models n$ or $|\alpha| = n$ where $|\alpha| = \sum_i \alpha_i$. To $B \subseteq D$ we associate the composition $\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_d]$ where $\alpha_i$ is the number of dales in $B$ of rank $i$. To illustrate, for the example in Figure 4.2 the corresponding composition is $\alpha = [2, 1, 1, 2, 0, 1]$. Note that all the necessary parameters for $B$ can be read off of $\alpha$. In particular
\[
r_j = \min\{i \mid \alpha_1 + \alpha_2 + \cdots + \alpha_i \ge j\}
\quad\text{and}\quad
b_i = \alpha_i + \alpha_{i+1} + \cdots + \alpha_d.
\]
Note that $b = b_1 = |\alpha|$. Thus we will be able to sum over the following set
\[
C(d) = \{\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_d] \mid \alpha_i \in \{0, 1, 2\} \text{ for all } i \text{ and } |\alpha| \le d\}.
\]
We must find how many $B$ correspond to a given $\alpha$. If $\alpha_i = 0$ then $B$ contains no dales of rank $i$. If $\alpha_i = 2$ then $B$ contains both dales of rank $i$. So the only choice comes if $\alpha_i = 1$, in which case $B$ could contain either $i_l$ or $i_r$. Letting $o$ be the number of indices $i$ with $\alpha_i = 1$, we see that the number of $B$ represented by $\alpha$ is $2^o$. Thus we have proved the following result.

Corollary 4.3.8. Given $n \in \mathbb{P}$ and admissible $S = \{s_1 < s_2 < \ldots < s_d\}$ we have
\[
p_S(n) = 2^{n-2d-1} \sum_{\alpha \in C(d)} (-1)^b\, 2^o\, (d-b)! \left( \prod_{i=0}^{b-1} (d+1-i-r_{b-i}) \right) \left( \prod_{i=0}^{d} (d+1-i-b_{i+1})^{n_i} \right).
\]

In order to compare this formula to the one in [DLHH+21], we need to introduce some notation. The vale set of a permutation $\pi$ is
\[
\operatorname{Val} \pi = \{\pi_i \mid \pi_{i-1} > \pi_i < \pi_{i+1}\}.
\]
Call a pair $(S, T)$ $n$-admissible if there is a permutation $\pi \in \mathfrak{S}_n$ with $\operatorname{Pin} \pi = S$ and $\operatorname{Val} \pi = T$. Define
\[
\mathcal{V}_n(S) = \{T \mid (S, T) \text{ is } n\text{-admissible}\}.
\]

Theorem 4.3.9 ([DLHH+21]). Given $n \in \mathbb{P}$ and admissible $S$ with $\#S = d$ we have
\[
p_S(n) = 2^{n-2d-1} \sum_{T \in \mathcal{V}_n(S)} \prod_{s \in S} \binom{N_{ST}(s)}{2} \prod_{t \in [n] - (S \uplus T)} N_{ST}(t)
\]
where $S_i = \{s \in S \mid s < i\}$, $T_i = \{t \in T \mid t < i\}$, and $N_{ST}(i) = \#T_i - \#S_i$.

In order to estimate the number of terms in this sum, we need a formula for $\#\mathcal{V}_n(S)$. Let
\[
K(d) = \{\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_d] \models d \mid \alpha_1 + \alpha_2 + \cdots + \alpha_k \ge k \text{ for all } k \in [d]\}.
\]

Theorem 4.3.10 ([DLHH+21]). Given $n \in \mathbb{P}$ and admissible $S = \{s_1 < s_2 < \ldots < s_d\}$ we have
\[
\#\mathcal{V}_n(S) = \sum_{\alpha \in K(d)} \binom{n_0 - 1}{\alpha_1} \prod_{i=2}^{d} \binom{n_{i-1}}{\alpha_i}.
\]

Table 4.1: Run times in seconds compared when most $n_i$ are equal

S                                          DLHHIN                DLMSSS
{3, 5, 7, 9, 11, 13, 15, 17, 19, 21}       $9.2 \times 10^{-5}$  0.72
{3, 6, 9, 12, 15, 18, 21, 24, 27, 30}      0.11                  0.73
{3, 7, 11, 15, 19, 23, 27, 31, 35, 39}     9.5                   0.73
{3, 8, 13, 18, 23, 28, 33, 38, 43, 48}     210                   0.78

We can now compare the number of terms in the sums of Corollary 4.3.8 and Theorem 4.3.9. In the former we have $c(d) := \#C(d) \le 3^d$ terms, where the inequality comes from the fact that every $\alpha_i \in \{0, 1, 2\}$. In the latter, we have $v_n(S) := \#\mathcal{V}_n(S)$ terms, which depends on $n$ and $S$, and not just $d$, as seen in Theorem 4.3.10. If $n_0 \le 4$ and $n_i \le 3$ for $i \ge 1$ then each of the binomial coefficients in the sum is at most 3 and so $v_n(S)$ could be significantly smaller than $c(d)$. But if even one of the $n_i$ is large, then the inequality will be reversed. For example, suppose $n_0 \ge 2d+1$ and take $\alpha = [d, 0, 0, \ldots, 0] \in K(d)$. Then, by Stirling's approximation,
\[
v_n(S) \ge \binom{2d}{d} \sim \frac{4^d}{\sqrt{\pi d}}
\]
which will eventually be greater than $3^d$.
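As a sanity check on Corollary 4.3.8, the sum over $C(d)$ can be evaluated directly and compared against exhaustive enumeration for small $n$. The Python sketch below is only an illustration and is not the implementation that was timed for this chapter; the function names are hypothetical, $S$ is assumed to be admissible, and the gaps are taken as $n_i = s_{i+1} - s_i - 1$ with $s_0 = 0$ and $s_{d+1} = n + 1$, which agrees with the values of $n_0, n_1, n_2$ used in the proof of Corollary 4.3.7.

```python
from itertools import permutations, product
from math import factorial


def pinnacle_set(pi):
    """Pinnacle set of a linear permutation given as a tuple."""
    return frozenset(pi[i] for i in range(1, len(pi) - 1)
                     if pi[i - 1] < pi[i] > pi[i + 1])


def count_brute_force(S, n):
    """p_S(n) by listing all n! permutations (feasible only for small n)."""
    target = frozenset(S)
    return sum(pinnacle_set(pi) == target for pi in permutations(range(1, n + 1)))


def count_by_compositions(S, n):
    """p_S(n) via the sum over weak compositions alpha in C(d); S assumed admissible."""
    s = sorted(S)
    d = len(s)
    bounds = [0] + s + [n + 1]
    gaps = [bounds[i + 1] - bounds[i] - 1 for i in range(d + 1)]  # n_0, ..., n_d
    total = 0
    for alpha in product((0, 1, 2), repeat=d):  # alpha[k] is alpha_{k+1}
        b = sum(alpha)
        if b > d:
            continue
        # r_1 <= ... <= r_b: each rank repeated alpha_i times
        ranks = [i + 1 for i, a in enumerate(alpha) for _ in range(a)]
        # tails[i] = b_{i+1} = alpha_{i+1} + ... + alpha_d (and b_{d+1} = 0)
        tails = [sum(alpha[i:]) for i in range(d + 1)]
        term = (-1) ** b * 2 ** alpha.count(1) * factorial(d - b)
        for i in range(b):
            term *= d + 1 - i - ranks[b - 1 - i]       # factor with r_{b-i}
        for i in range(d + 1):
            term *= (d + 1 - i - tails[i]) ** gaps[i]  # factor with b_{i+1}, n_i
        total += term
    return 2 ** (n - 2 * d - 1) * total


if __name__ == "__main__":
    for S, n in [((), 4), ((3,), 4), ((3, 5), 5), ((4, 5), 6), ((3, 5, 7), 7)]:
        print(S, n, count_by_compositions(S, n), count_brute_force(S, n))
```

The loop visits all $3^d$ tuples and discards those with $|\alpha| > d$, so its cost depends on $d$ alone, while the brute-force count grows like $n!$. The enumeration is kept naive for readability; an optimized version would generate only the compositions in $C(d)$.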
So, for fixed $d$, there are only finitely many $n$ such that $v_n(S) \le c(d)$. Thus, in most cases, Corollary 4.3.8 will be more efficient. We should mention that Diaz-Lopez, Insko, and Nilsen [DLIN21] have come up with a refinement of the ideas in [DLHH+21] which permits the product of binomial coefficients in Theorem 4.3.10 to be replaced by 2𝑑.

The observations of the previous paragraph are borne out by actual computer computations. In Tables 4.1 and 4.2 we show the results of computing $p_S(1000)$ for various sets $S$ (first column) with constant $d$ by the algorithm in [DLHH+21] (second column) and our algorithm (third column). The run times are in seconds and are the average over 10 trials for each set using a 15-inch 2017 MacBook Pro with a 3.1 GHz Quad-Core Intel Core i7 processor. In Table 4.1 the $n_i$ for $0 < i < d$ are constant in each set, but allowed to increase as one goes down the table. As expected, the algorithm using vales starts out orders of magnitude faster than the one using dales but quickly becomes orders of magnitude slower, with the latter's times being virtually constant. Similar behaviour is shown in the two parts of Table 4.2 which keep all of the $n_i$ for $0 \le i < d$ constant except for one which is allowed to grow. Note the difference in growth rate of the vale algorithm between increasing $n_4$ (upper chart) and $n_0$ (lower chart).

Table 4.2: Run times in seconds compared when most $n_i$ are constant

Increase $n_4$ with other $n_i$ constant
S                        DLHHIN                DLMSSS
{3, 5, 7, 9, 11}         $2.9 \times 10^{-5}$  0.0014
{3, 5, 7, 9, 21}         $7.1 \times 10^{-5}$  0.0014
{3, 5, 7, 9, 31}         0.00012               0.0015
{3, 5, 7, 9, 41}         0.00017               0.0015

Increase $n_0$ with other $n_i$ constant
S                        DLHHIN                DLMSSS
{3, 5, 7, 9, 11}         $2.9 \times 10^{-5}$  0.0014
{13, 15, 17, 19, 21}     0.012                 0.0015
{23, 25, 27, 29, 31}     0.26                  0.0015
{33, 35, 37, 39, 41}     1.8                   0.0015

Another advantage to this approach is that it can be modified to count $\#\mathcal{O}(S)$, the number of admissible orderings of an admissible pinnacle set $S$.
First, if we fix $n > 0$ we have that Lemma 4.3.2 will again allow us to reduce to the case of cyclic orderings of the pinnacle set $S'$ for permutations in $\mathfrak{S}_{n+1}$. We now prove the following intermediate result.

Lemma 4.3.11. Consider a cyclic ordering $[\tau]$ with dale set $D[\tau]$ and corresponding $r_j$. The ordering is admissible if and only if
\[
j \le n_0 + n_1 + \cdots + n_{r_j - 1}
\]
for all $j \in [d+1]$.

Proof. Note that, by definition of the $n_i$ and $r_i$, the right hand side of the inequality is simply the number of non-pinnacles smaller than $s_{r_j}$, which are the only elements that can fill a dale of rank at most $r_j$. So if for any $j$ we have $j > n_0 + n_1 + \cdots + n_{r_j - 1}$, then, since there are at least $j$ dales having rank at most $r_j$, there would not be enough small non-pinnacle elements to fill them all. Therefore, any such ordering is not admissible. On the other hand, if we have that $j \le n_0 + n_1 + \cdots + n_{r_j - 1}$ for all $j$, then we may always fill all the dales by placing the smallest non-pinnacle in the lowest ranked dale, and proceeding upwards. The inequalities guarantee that we will always have enough non-pinnacles to do this at every step, and so we are done.

Since the problem is trivial if $d = 0$, we may also assume that $d > 0$. We also define, for the master dale rank set $D$,
\[
D' = D - \{1_l, 1_r\}
\]
and for any subset $B$
\[
\delta_B =
\begin{cases}
1 & \text{if } j \le n_0 + n_1 + \cdots + n_{r_j - 1} \text{ for all } j \in [b], \\
0 & \text{otherwise.}
\end{cases}
\]
With this notation, we can count admissible orderings.

Theorem 4.3.12. If $d \in \mathbb{P}$ and $S$ is admissible then
\[
\#\mathcal{O}(S) = \sum_{B \subseteq D':\ |B| = d-1} \delta_{B \cup \{1_l, 1_r\}} \prod_{i=0}^{d-2} (d+1-i-r_{d-1-i}).
\]

Proof. We first wish to sum over all possible orderings, partitioned by their dales. Since every dale set for $d > 0$ is guaranteed to have $d+1$ elements and contain $\{1_l, 1_r\}$, we may index the dales by taking $B \subseteq D'$ where $|B| = d-1$. We then consider the following summation
\[
\sum_{B \subseteq D':\ |B| = d-1} \prod_{i=0}^{d-2} (d+1-i-r_{d-1-i}).
\]
Clearly this sums over every possible dale set once, and the expression inside comes from Lemma 4.3.5, which counts the number of cyclic orderings of $S'$ which have dales containing those in $B$. However, due to the restrictions placed on the size of $B$ and the comments above, this expression will count those cyclic orderings of $S'$ which have dales equal to $B \cup \{1_l, 1_r\}$ instead of just a subset. Therefore, no ordering can be counted twice by two different $B$'s and so every ordering is accounted for exactly once in the above summation, making the total $d!$.

Finally, using Lemma 4.3.11, we may exclude from this sum precisely those orderings which are not admissible by writing it as
\[
\sum_{B \subseteq D':\ |B| = d-1} \delta_{B \cup \{1_l, 1_r\}} \prod_{i=0}^{d-2} (d+1-i-r_{d-1-i}).
\]
This completes the proof.

We may also rewrite our result in terms of compositions for a faster summation. Lemma 4.3.11 still holds as the $r_j$ are the same whether or not the dale sets are represented as compositions, but now we will need to make some new definitions. Let
\[
C'(d) = \{\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_{d-1}] \models d-1 \mid \alpha_i \in \{0, 1, 2\} \text{ for all } i\},
\]
and if $\alpha \models b$
\[
\delta_\alpha =
\begin{cases}
1 & \text{if } j \le n_0 + n_1 + \cdots + n_{r_j - 1} \text{ for all } j \in [b], \\
0 & \text{otherwise.}
\end{cases}
\]
Also, if $\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_{d-1}]$ then define
\[
2 \oplus \alpha = [2, \alpha_1, \alpha_2, \ldots, \alpha_{d-1}].
\]
The following result follows from Theorem 4.3.12 in much the same way that Corollary 4.3.8 followed from Theorem 4.3.6, so the proof is omitted.

Corollary 4.3.13. If $d \in \mathbb{P}$ and $S$ is admissible then
\[
\#\mathcal{O}(S) = \sum_{\alpha \in C'(d)} \delta_{2 \oplus \alpha}\, 2^o \prod_{i=0}^{d-2} (d+1-i-r_{d-1-i}).
\]

4.4 Open problems and concluding remarks

Others have also been working on finding a fast formula for computing the number of permutations with a given pinnacle set. Recently, Falque, Novelli, and Thibon [FNT21] have constructed an efficient recursion to compute $p_S(n)$.
This formula is a low degree polynomial in both $m$, the maximum of the pinnacle set $S$, and $d$, the cardinality of the set, and has complexity $O(md^2)$. While the result was originally stated in terms of $n$ instead of $m$, we can simplify in the following way. A permutation in $\mathfrak{S}_m$ with pinnacle set $S$ can be extended to a permutation in $\mathfrak{S}_{m+1}$ by placing $m+1$ at either the far left or the far right of the permutation. Any other way of inserting $m+1$ into the permutation would make $m+1$ a pinnacle. Recursively applying this procedure to $\pi \in \mathfrak{S}_m$ with the elements $\{m+1, m+2, \ldots, n\}$ will extend it to a permutation $\pi' \in \mathfrak{S}_n$. Since there were two possible positions to place each of the elements $\{m+1, m+2, \ldots, n\}$, we have $p_S(n) = 2^{n-m} p_S(m)$ and thus the result can be stated in terms of $m$. In addition, they provide a conjectured formula for the weighted sum introduced in [DNKPT18]:
\[
q_S(n) := \sum_{I \subset S} 2^{|I|} p_I(n). \tag{4.3}
\]

Following this work, Fang [Fan21] provided another recurrence to compute $p_S(n)$ with complexity $O(d^4 + d \log n)$. He also proved an expression for eq. (4.3) which is simpler than the earlier conjecture and which is very combinatorial in nature. Quinn Minnich has recently found a simpler proof of this result.

CHAPTER 5
BACKGROUND ON BACKBONE EXTRACTION

Bipartite or two-mode networks are composed of two types of nodes, which we call agents and artifacts, and edges between nodes of one type and nodes of the other type. They can be used to represent a wide range of phenomena and therefore are studied in a diverse range of disciplines. For example, natural selection unfolds as species (the agents) compete over sites (the artifacts), commerce is possible as traders exchange resources, scientific advances are reported as scholars write papers, and laws are adopted as legislators sponsor bills.
Although bipartite networks are useful in their own right, they can also be useful for inferring unipartite (i.e., one-mode) networks that would otherwise be difficult or impossible to measure directly. A bipartite projection transforms a bipartite network into a unipartite co-occurrence network in which agents are connected to the extent that they share artifacts. For example, competitive interaction networks can be inferred from species' co-occurrence in sites [Dia75], trade networks can be inferred from firm co-location [TCW02] or product co-exchange [SDCGS15], scholarly collaboration networks can be inferred from paper co-authorship [New01], and political alliance networks can be inferred from bill co-sponsorship [Nea20]. Throughout this thesis, we use these applications to offer concrete examples; however, the models we discuss are perfectly general and can be applied to derive unipartite backbones in a range of contexts [AABB11, Tol21, ZH05]. Indeed, in principle any unipartite network can be represented as the projection of some bipartite network [VFO20, GL04, NP03].

Despite their promise, bipartite projections (i.e., co-occurrence networks) are challenging to analyze because they are typically dense and weighted, and because the edge weights do not necessarily capture the strength of the relationship between nodes [Nea14]. In particular, when transforming a bipartite graph into a unipartite graph via projection, information about the artifacts responsible for edges between vertices is lost [LMDV08]. Specifically, one no longer knows which artifact(s) gave rise to a given edge and therefore no longer knows whether the artifact(s) are large or small (i.e., the column sums of the bipartite matrix). This is important because co-participation in small artifacts provides more information about the relationship between two vertices than co-participation in large artifacts [Nea14].
For example, observing two people attending the same small party provides more information about a potential social relationship between them than observing these individuals attending the same large gathering. Similarly, observing two legislators co-sponsoring the same unpopular bill (i.e., one that is co-sponsored by no one else) provides more information about a potential political relationship between them than observing these legislators co-sponsoring the same popular bill (i.e., one that is co-sponsored by many others also).

Bipartite projection also involves the loss of information about the individual vertices: one no longer knows how many artifacts a given vertex participated in (i.e., the row sums of the bipartite matrix). This information is important to consider because the scale of each edge weight in a bipartite projection is driven by the number of artifacts participated in by the two vertices it connects [Nea14]. For example, the number of events co-attended by two people who each attend many events will, on average, be larger than the number of events co-attended by two people who each attend few events. Similarly, the number of bills co-sponsored by two legislators who each sponsor many bills will, on average, be larger than the number of bills co-sponsored by two legislators who each sponsor few bills. Therefore, what counts as a 'large' or 'small' number of co-attendances or co-sponsorships depends in part on the total number of attendances or sponsorships of both members of a dyad. As we will see, the backbone extraction methods we consider cope with these challenges by controlling for the row and column sums of the bipartite matrix associated with the bipartite graph in question.

As a result of these challenges, it is often useful to analyze the backbone of a bipartite projection, which is an unweighted and typically sparser network that retains only the most 'important' edges.
Although well-known methods exist for extracting the backbone of weighted networks that are not bipartite projections [SBV09, Dia16], methods designed specifically for bipartite projections have recently been developed [Nea14, ZK11, SSDC+17, TML+11]. To begin, we define notation and language for discussing bipartite projections and backbones.

Throughout this chapter, we use the ecological case of Darwin's Finches to provide a concrete example [San00, Got00]. On his voyage to the Galapagos Islands on the H.M.S. Beagle, Darwin observed that only some species of finches lived on each island. These patterns can be represented as a bipartite network in which finch species (the agent nodes) are connected to the islands (the artifact nodes) where they are found [NN20].

A bipartite network can be represented as a binary matrix in which the agents are arrayed as rows, and the artifacts are arrayed as columns. We use $\mathbf{B}$ to denote a bipartite network's representation as a matrix, where $B_{ik} = 1$ if agent $i$ is connected to artifact $k$, and otherwise is 0. The sequence of row sums and the sequence of column sums of $\mathbf{B}$ are called the agent and artifact degree sequences, respectively. These sequences are among the bipartite network's most significant features and are known to have implications for bipartite projections and backbones [VFO20, DNS21, NDS21a]. In the ecological case, the agent degree sequence captures the number of islands where each species is found, while the artifact degree sequence captures the number of species found on each island.

The projection of a bipartite network is a weighted unipartite co-occurrence network in which a pair of agents is connected by an edge with a weight equal to their number of shared artifacts. For example, the bipartite projection of Darwin's species location network is a species co-occurrence network in which a pair of species is connected by an edge with a weight equal to the number of islands where they are both found.
We use $\mathbf{P}$ to denote the matrix representation of a bipartite projection, which is computed as $\mathbf{B}\mathbf{B}^T$, where $\mathbf{B}^T$ indicates the transpose of $\mathbf{B}$. In a projection $\mathbf{P}$, $P_{ij}$ indicates the number of artifacts $k$ to which both $i$ and $j$ are connected in $\mathbf{B}$. The diagonal entries of $\mathbf{P}$, $P_{ii}$, are equal to the agent degrees. Typically the backbone of $\mathbf{P}$ will discard these diagonal entries, though their values are used in deciding which other edges are deemed important.

As the reader may have inferred, bipartite networks and their weighted projections are equivalent to bipartite and weighted graphs. This equivalence helps in the visualization and analysis techniques in the network sciences. A graph $G$ is a set of objects called vertices, together with a set of 2-element subsets of the vertices which are called edges. An edge between vertices $i$ and $j$ can be denoted as $e = ij$. If there exists an edge $e = ij$ between vertices $i$ and $j$, we say that $i$ and $j$ are adjacent. We call a graph weighted if each edge has an associated numeric value, and unweighted otherwise. The weight of edge $e = ij$ is denoted $w(ij)$; in unweighted graphs, we set $w(ij) = 1$ for all present edges. The degree of vertex $i$ is the number of edges of the form $ij$ for some $j$. Graphs are often discussed by viewing their adjacency matrices $\mathbf{G}$, where $G_{ij} = w(ij)$. As mentioned above, the matrix representation of a bipartite network $\mathbf{B}$ is the graph's bipartite adjacency matrix, while the matrix $\mathbf{P}$ is the adjacency matrix of the weighted graph. See fig. 5.1 for an example of this connection.

[Figure 5.1: Bipartite and bipartite projection networks]

The backbone of a bipartite projection is a binary representation of $\mathbf{P}$ that contains only the most 'important' or 'significant' edges.
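To make the matrix definitions concrete, the following minimal Python sketch computes the degree sequences and the projection $\mathbf{P} = \mathbf{B}\mathbf{B}^T$ for a small bipartite matrix. The matrix is hypothetical toy data (three species on four islands), not the Darwin's Finches data set, and the helper name `project` is introduced only for illustration.

```python
def project(B):
    """Return P = B B^T for a binary agent-by-artifact matrix B given as a list of rows."""
    return [[sum(x * y for x, y in zip(row_i, row_j)) for row_j in B] for row_i in B]


# Hypothetical toy data: 3 agents (species, rows) and 4 artifacts (islands, columns).
B = [
    [1, 1, 0, 1],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
]

agent_degrees = [sum(row) for row in B]           # row sums of B
artifact_degrees = [sum(col) for col in zip(*B)]  # column sums of B
P = project(B)

if __name__ == "__main__":
    print("agent degrees:   ", agent_degrees)
    print("artifact degrees:", artifact_degrees)
    for row in P:
        print(row)
```

In the output, each diagonal entry $P_{ii}$ equals the corresponding agent degree, and each off-diagonal entry counts the islands shared by two species.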
For example, the backbone of a species co-occurrence network connects pairs of species if they are found on a significant number of the same islands, which might be interpreted as evidence that the two species do not compete for resources and perhaps are symbiotic. We use $\mathbf{P}'$ to denote the matrix representation of the backbone of $\mathbf{P}$. Because multiple methods exist for deciding when an edge is significant and thus should occur in the backbone, we use $\mathbf{P}'^{M}$ to denote a backbone extracted using method $M$.

Backbone extraction methods that were originally developed for non-projection weighted networks are often also applied to weighted bipartite projections. One simple method preserves an edge in the backbone if its weight in the projection exceeds some universal threshold $T$. However, when $T = 0$ is chosen (which is common), since each artifact of degree $d$ induces $d(d-1)/2$ edges in the backbone, this leads to a very dense backbone with a high clustering coefficient [LMDV08]. Here, density refers to the number of edges present in the network divided by the maximum possible number of edges. A network clustering coefficient measures how many 'triangles', three pairwise adjacent vertices, are present in the network compared to all triples. Backbones with high density and clustering coefficient may not elucidate any interesting information regarding the network. Using $T > 0$ can yield a sparser and less clustered backbone [DT05, Fon20, BR11], but the choice of a particular threshold value is arbitrary, and applying the same threshold to all edges yields backbones that overlook agents with low degree in the projection [SBV09]. More sophisticated methods, including the disparity filter [SBV09] and likelihood filter [Dia16], aim to overcome these limitations of the universal threshold method by using a different threshold for each edge based on a null model.
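The effect of the universal threshold can be seen on a very small example. The Python sketch below is an illustration under stated assumptions: the data are hypothetical, and the function names `project`, `universal_threshold_backbone`, and `density` are introduced here and are not taken from any package. One artifact is shared by all four agents, so with $T = 0$ it alone induces $4 \cdot 3 / 2 = 6$ edges and the backbone is complete.

```python
def project(B):
    """Return P = B B^T for a binary agent-by-artifact matrix B given as a list of rows."""
    return [[sum(x * y for x, y in zip(row_i, row_j)) for row_j in B] for row_i in B]


def universal_threshold_backbone(P, T=0):
    """Keep edge ij (i != j) exactly when its projection weight exceeds T."""
    n = len(P)
    return [[1 if i != j and P[i][j] > T else 0 for j in range(n)] for i in range(n)]


def density(A):
    """Number of edges divided by the maximum possible number of edges."""
    n = len(A)
    edges = sum(A[i][j] for i in range(n) for j in range(i + 1, n))
    return edges / (n * (n - 1) / 2)


# Hypothetical toy data: artifact 0 is shared by all four agents (degree 4),
# artifact 1 by agents 0 and 1, artifact 2 by agents 2 and 3.
B = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 1],
]
P = project(B)

if __name__ == "__main__":
    for T in (0, 1):
        backbone = universal_threshold_backbone(P, T)
        print("T =", T, "density =", density(backbone))
```

Raising the threshold to $T = 1$ keeps only the two pairs that share a second artifact, which illustrates both the sparsification and the arbitrariness of the cutoff discussed above.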
However, all methods that can be applied to non-projection weighted networks have the same shortcoming when applied to weighted bipartite projections: they ignore information about the artifacts [Nea14]. In the ecological case, the universal threshold, disparity filter, and likelihood filter methods all decide whether two species should be connected in the backbone only by examining how many islands they are both found on, but do not consider the characteristics of those islands, including how many other species are found there, or even how many islands there are. Therefore, although these methods are promising for extracting the backbone from non-projection weighted networks, different methods are required for extracting the backbone from a bipartite projection.

CHAPTER 6

BACKBONE MODELS AND THEIR PROBABILITY MASS FUNCTIONS

This chapter contains material from Neal, Domagalski, and Sagan [NDS21b]. All results in this chapter are from this manuscript unless otherwise noted.

6.1 Bipartite ensemble backbone models

Bipartite ensemble backbone models decide whether an edge’s observed weight $P_{ij}$ is significantly large, and thus whether a corresponding edge should be included in the backbone, in the following way. Let $\mathcal{B}$ be the set of all bipartite networks $\mathbf{B}^*$ having the same number of agents and artifacts as $\mathbf{B}$. In the ecological case, $\mathbf{B}^*$ might be viewed as representing a possible world containing the same species and islands, but in which the locations of species on islands are different, and likewise $\mathcal{B}$ is the set of all such possible worlds. We will create our ensembles by taking a subset $\mathcal{B}_M$ of $\mathcal{B}$ subject to certain constraints $M$ and imposing a probability distribution on it. In all our models except the SDSM, we impose the uniform probability distribution on $\mathcal{B}_M$, that is, each element of the ensemble is equally likely.
We will then extract the backbone from the projection of $\mathbf{B}$ by using the distribution of edge weights arising from projections of members of the ensemble under consideration. We use $P^*_{ij}$ to denote a random variable equal to $(\mathbf{B}^* \mathbf{B}^{*T})_{ij}$ for $\mathbf{B}^* \in \mathcal{B}_M$. That is, $P^*_{ij}$ is the number of artifacts shared by $i$ and $j$ in a bipartite network randomly drawn from $\mathcal{B}_M$. In the ecological case, $P^*_{ij}$ represents the number of islands that are home to both species $i$ and $j$ in a possible world, while the distribution of $P^*_{ij}$ is the distribution of the number of islands shared by species $i$ and $j$ in all possible worlds.

Decisions about which edges should appear in a backbone extracted at the two-tailed statistical significance level $\alpha$ are made by comparing $P_{ij}$ to $P^*_{ij}$:

\[
P'_{ij} =
\begin{cases}
1 & \text{if } \Pr(P^*_{ij} \ge P_{ij}) < \frac{\alpha}{2}, \\
0 & \text{otherwise.}
\end{cases}
\]

This test preserves an edge in the backbone if its weight in the observed projection is uncommonly large compared to its weight in projections of members of the ensemble. A two-tailed significance test is used because, in principle, an edge’s weight in the observed projection could be uncommonly larger or uncommonly smaller than its weight in projections of members of the ensemble. One can use the same principles to obtain a signed backbone by comparing $P_{ij}$ to $P^*_{ij}$ with

\[
P'_{ij} =
\begin{cases}
1 & \text{if } \Pr(P^*_{ij} \ge P_{ij}) < \frac{\alpha}{2}, \\
-1 & \text{if } \Pr(P^*_{ij} \le P_{ij}) < \frac{\alpha}{2}, \\
0 & \text{otherwise.}
\end{cases}
\]

In the ecological case, two species are connected in the backbone if their number of shared islands in the observed world is uncommonly large compared to their number of shared islands in all possible worlds.

There are many ways that $\mathcal{B}$ can be constrained [SUG18], with each set of constraints describing a different ensemble $\mathcal{B}_M$ and different ensemble backbone model; however, in this work we focus on five possibilities.
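The signed decision rule above can be sketched in a few lines of R. The function signed_backbone is a hypothetical helper written for this illustration, and the tail probabilities are invented; in practice they would come from one of the ensemble models.

```r
# Hypothetical helper: turn tail probabilities into a signed backbone.
# upper[i, j] is Pr(P*_ij >= P_ij) and lower[i, j] is Pr(P*_ij <= P_ij),
# both assumed to be supplied by some null model.
signed_backbone <- function(upper, lower, alpha = 0.05) {
  backbone <- matrix(0, nrow(upper), ncol(upper))
  backbone[upper < alpha / 2] <- 1    # uncommonly large weight
  backbone[lower < alpha / 2] <- -1   # uncommonly small weight
  diag(backbone) <- 0                 # diagonal entries are discarded
  backbone
}

# Made-up tail probabilities for three agents.
upper <- matrix(c(1.000, 0.010, 0.600,
                  0.010, 1.000, 0.990,
                  0.600, 0.990, 1.000), nrow = 3, byrow = TRUE)
lower <- matrix(c(1.000, 0.995, 0.500,
                  0.995, 1.000, 0.020,
                  0.500, 0.020, 1.000), nrow = 3, byrow = TRUE)

sb <- signed_backbone(upper, lower, alpha = 0.05)
sb
```

With $\alpha = 0.05$, agents 1 and 2 receive a positive edge, agents 2 and 3 a negative edge, and agents 1 and 3 no edge. Replacing the negative entries by zero gives the binary backbone of the first rule.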
We describe each of these models and their meaning in the context of Darwin’s species and islands, and derive their probability mass functions for the respective edge weight distributions. These probability mass functions of $P^*_{ij}$ are used by ensemble backbone models to evaluate the statistical significance of the weight of edge $P_{ij}$ in a bipartite projection. We use the following notation:

• Let $\mathbf{B}$ be an $m \times n$ bipartite matrix, with a vector of row sums $R = (r_1, \ldots, r_m)$, a vector of column sums $C = (c_1, \ldots, c_n)$, and $f$ cells containing a 1. So
\[
f = \sum_{i=1}^{m} r_i = \sum_{j=1}^{n} c_j.
\]

• Let $\mathcal{B}_M$ be the ensemble of all $m \times n$ matrices $\mathbf{B}^* = (B^*_{ij})$ that obey the constraints of the respective model. In all models, the probability distribution on $\mathcal{B}_M$ is uniform except in the stochastic case.

• Let $P^*_{ij}$ be a random variable equal to $(\mathbf{B}^* \mathbf{B}^{*T})_{ij}$ for all $\mathbf{B}^* \in \mathcal{B}_M$. Note that we have
\[
P^*_{ij} = B^*_{i1} B^*_{j1} + B^*_{i2} B^*_{j2} + \cdots + B^*_{in} B^*_{jn}. \tag{6.1}
\]

6.2 Fixed degree sequence model (FDSM)

In the fixed degree sequence model (FDSM) $\mathbf{B}^* \in \mathcal{B}_{\mathrm{FDSM}}$ are constrained to have the same agent and artifact degree sequences as $\mathbf{B}$. Adopting the FDSM implies, for example, that in all possible worlds a given species is found on exactly the same number of islands, and a given island is home to exactly the same number of species. The distribution of $P^*_{ij}$ arising from $\mathcal{B}_{\mathrm{FDSM}}$ is unknown, but can be approximated by uniformly sampling $\mathbf{B}^*$ from $\mathcal{B}_{\mathrm{FDSM}}$, constructing $\mathbf{P}^*$, and saving the values $P^*_{ij}$. In the studies below, we use 1000 samples of $\mathbf{B}^*$ generated using the ‘curveball’ algorithm, which is among the fastest methods to sample $\mathcal{B}_{\mathrm{FDSM}}$ uniformly at random [SNB+ 14, Car15]. The FDSM has been used to extract the backbone of bipartite projections of, for example, movies co-liked by viewers [ZK11] and conference panel co-participation by scholars [SR12, DL16].
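The sampling approach can be sketched in R as follows. This is a minimal illustration of a curveball-style trade and a Monte Carlo estimate of $\Pr(P^*_{ij} \ge P_{ij})$, not the implementation used in the studies: the function curveball_trade, the toy matrix, and the choice of 20 trades between successive samples are all assumptions made for the example.

```r
# One curveball-style trade (sketch): two random rows pool the columns that
# only one of them occupies and re-deal them, so that every row sum and
# every column sum is preserved.
curveball_trade <- function(B) {
  rows <- sample.int(nrow(B), 2)
  a <- which(B[rows[1], ] == 1)
  b <- which(B[rows[2], ] == 1)
  only_a <- setdiff(a, b)
  only_b <- setdiff(b, a)
  pool <- c(only_a, only_b)
  pool <- pool[sample.int(length(pool))]   # shuffle; also safe for short pools
  new_a <- pool[seq_along(only_a)]         # first row keeps its original count
  new_b <- setdiff(pool, new_a)
  B[rows[1], only_a] <- 0
  B[rows[1], new_a] <- 1
  B[rows[2], only_b] <- 0
  B[rows[2], new_b] <- 1
  B
}

set.seed(1)
# Illustrative bipartite matrix; not real data.
B <- matrix(c(1, 1, 0, 0, 1,
              1, 0, 1, 0, 0,
              0, 1, 1, 1, 0,
              0, 0, 1, 1, 1), nrow = 4, byrow = TRUE)
P <- B %*% t(B)

n_samples <- 1000
exceed <- matrix(0, nrow(B), nrow(B))
B_star <- B
for (s in seq_len(n_samples)) {
  # 20 trades between samples is an arbitrary choice for this sketch.
  for (step in 1:20) B_star <- curveball_trade(B_star)
  exceed <- exceed + ((B_star %*% t(B_star)) >= P)
}

# Monte Carlo estimate of Pr(P*_ij >= P_ij) for every pair of agents.
upper <- exceed / n_samples
```

Each sample requires one matrix multiplication, which is where the computational cost of this model comes from.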
In this paper, we use the FDSM as the reference model to which other ensemble models are compared because it fully controls for both degree sequences.

The primary limitation of the FDSM is its computational cost. First, constructing each $\mathbf{P}^*$ requires matrix multiplication, which must be performed repeatedly and has complexity $O(n^{2.37})$ for two $n \times n$ matrices using the fast Coppersmith-Winograd algorithm [CW90]. Second, computing $\Pr(P^*_{ij} \ge P_{ij})$ with sufficient precision to achieve a two-tailed familywise error rate of $\alpha$ requires at least
\[
\frac{0.5m^2 - 0.5m}{\alpha/2} + 1
\]
samples, where $m$ is the number of rows (i.e., agents) in $\mathbf{B}$ and $\mathbf{P}$. Thus, for example, extracting the backbone of a bipartite projection with 1000 agents at a family-wise error rate of 0.05 would require performing at least 20 million matrix multiplications. Therefore, the tightly-constrained FDSM is frequently impractical for backbone extraction. However, models that rely on ensembles with more relaxed constraints offer computationally-feasible alternatives.

6.3 Fixed fill model (FFM)

In the highly relaxed fixed fill model (FFM), $\mathbf{B}^* \in \mathcal{B}_{\mathrm{FFM}}$ are simply constrained to contain the same number of 1s as $\mathbf{B}$. Adopting the FFM implies, for example, that in all possible worlds only the total number of species-habitat pairs is fixed, but any given species may be found on a different number of islands and any given island may be home to a different number of species. The distribution of $P^*_{ij}$ arising from $\mathcal{B}_{\mathrm{FFM}}$ has not been described before. We derive it and call it a Jacobi distribution because it is related to Jacobi polynomials.

Let the fixed fill model constrain all $\mathbf{B}^* \in \mathcal{B}_{\mathrm{FFM}}$ to contain the same number of 1s (i.e. fill) as $\mathbf{B}$.

Theorem 6.3.1. Under the fixed fill model, the distribution of $P^*_{ij}$ for $i \neq j$ satisfies
\[
\Pr(P^*_{ij} = k) = \frac{\displaystyle \binom{n}{k} \sum_{r} 2^{n-k-r} \binom{n-k}{r} \binom{(m-2)n}{f-n-k+r}}{\displaystyle \binom{mn}{f}}. \tag{6.2}
\]

Proof. For the denominator we need to compute the cardinality $\#\mathcal{B}_{\mathrm{FFM}}$.
If $\mathbf{B}^* \in \mathcal{B}_{\mathrm{FFM}}$ then $\mathbf{B}^*$ has $mn$ entries of which $f$ must be chosen to be ones. So
\[
\#\mathcal{B}_{\mathrm{FFM}} = \binom{mn}{f}.
\]

For the numerator, suppose $P^*_{ij} = k$. We see from equation (6.1) that there are exactly $k$ columns $c$ where $B^*_{ic} = B^*_{jc} = 1$. There are $\binom{n}{k}$ ways to choose these columns. Now define the following parameters:

$p$ = number of columns $c$ where $B^*_{ic} = 1$ and $B^*_{jc} = 0$,
$q$ = number of columns $c$ where $B^*_{ic} = 0$ and $B^*_{jc} = 1$,
$r$ = number of columns $c$ where $B^*_{ic} = 0$ and $B^*_{jc} = 0$.

The number of ways to pick the columns counted by these parameters from the $n - k$ columns which do not contain ones in both rows is the trinomial coefficient $\binom{n-k}{p,q,r}$.

Now we have used $2k + p + q$ ones in rows $i$ and $j$. So there are $f - 2k - p - q$ left to distribute to the remaining $m - 2$ rows. And these rows have $(m-2)n$ entries. So the number of possibilities for these remaining ones is $\binom{(m-2)n}{f-2k-p-q}$.

Thus the total number of choices from this and the previous paragraph is
\begin{align*}
\binom{n}{k} \sum_{p+q+r=n-k} \binom{n-k}{p,q,r} \binom{(m-2)n}{f-2k-p-q}
&= \binom{n}{k} \sum_{p+q+r=n-k} \binom{n-k}{r} \binom{n-k-r}{p} \binom{(m-2)n}{f-n-k+r} \\
&= \binom{n}{k} \sum_{r} \binom{n-k}{r} \binom{(m-2)n}{f-n-k+r} \sum_{p} \binom{n-k-r}{p} \\
&= \binom{n}{k} \sum_{r} 2^{n-k-r} \binom{n-k}{r} \binom{(m-2)n}{f-n-k+r}
\end{align*}
as desired.

For even modestly large $\mathbf{B}$, computing equation (6.2) involves values larger than can be handled by some programs. In practice, we use logs to make these computations practical.

We now show that the sum in the numerator of this probability is related to the famous Jacobi orthogonal polynomials. This sum is a terminating hypergeometric series. Given a real number $a$ and a nonnegative integer $r$ the corresponding Pochhammer symbol or rising factorial is
\[
(a)_r = a(a+1)(a+2) \cdots (a+r-1).
\]
Note that if $a$ is an integer with $-r < a \le 0$ then $(a)_r = 0$ because the product contains 0 as a factor. Given real numbers $a_1, a_2, \ldots, a_p$ and $b_1, b_2, \ldots, b_q$ as well as a variable $z$, the corresponding hypergeometric series is
\[
{}_pF_q\!\left[\begin{matrix} a_1 & a_2 & \ldots & a_p \\ b_1 & b_2 & \ldots & b_q \end{matrix} \,;\, z\right]
= \sum_{r \ge 0} \frac{(a_1)_r (a_2)_r \cdots (a_p)_r}{(b_1)_r (b_2)_r \cdots (b_q)_r} \, \frac{z^r}{r!}.
\]
Note that if any of the $a_i$ are negative integers then, because of the remark above, this series will terminate and become a polynomial in $z$.

To convert a binomial coefficient into Pochhammer symbols, we write
\begin{align*}
\binom{n}{r} &= \frac{(n)(n-1) \cdots (n-r+1)}{r!} \\
&= \frac{(-1)^r (-n)(-n+1) \cdots (-n+r-1)}{(1)_r} \\
&= \frac{(-1)^r (-n)_r}{(1)_r}.
\end{align*}
The following identity will also be useful
\begin{align*}
(a)_{b+r} &= (a)(a+1) \cdots (a+b-1) \times (a+b)(a+b+1) \cdots (a+b+r-1) \\
&= (a)_b \, (a+b)_r.
\end{align*}

We now return to the sum in the numerator of equation (6.2). We will ignore the factor of $2^{n-k}$ since it is constant with respect to the sum and so can be pulled outside. For simplicity of calculation we will also use the substitutions $s = (m-2)n$, $t = f - n - k$. Thus we have
\begin{align*}
\sum_{r} 2^{-r} \binom{n-k}{r} \binom{(m-2)n}{f-n-k+r}
&= \sum_{r} \binom{n-k}{r} \binom{s}{t+r} (1/2)^r \\
&= \sum_{r} \frac{(-1)^r (k-n)_r}{(1)_r} \cdot \frac{(-1)^{t+r} (-s)_{t+r}}{(1)_{t+r}} \, (1/2)^r \\
&= (-1)^t \sum_{r} \frac{(k-n)_r \, (-s)_t \, (-s+t)_r \, (1/2)^r}{(1)_t \, (t+1)_r \, (1)_r} \\
&= \frac{(-1)^t (-s)_t}{(1)_t} \sum_{r} \frac{(k-n)_r \, (-s+t)_r}{(t+1)_r} \, \frac{(1/2)^r}{r!} \\
&= \binom{s}{t} \; {}_2F_1\!\left[\begin{matrix} k-n & -s+t \\ t+1 & \end{matrix} \,;\, \frac{1}{2}\right].
\end{align*}

We are indebted to Marko Petkovšek [personal communication] for pointing out that this ${}_2F_1$ is, up to a factor, a specialization of a Jacobi polynomial. Given a nonnegative integer $\ell$ and real numbers $\alpha, \beta$ the associated Jacobi polynomial is
\[
P_\ell^{(\alpha,\beta)}(z) = \binom{\alpha+\ell}{\ell} \; {}_2F_1\!\left[\begin{matrix} -\ell & \ell+\alpha+\beta+1 \\ \alpha+1 & \end{matrix} \,;\, \frac{1-z}{2}\right].
\]
To make these ${}_2F_1$ polynomials agree we can let $\ell = n - k$, $\alpha = t = f - n - k$, $\beta = -s + t - (\ell + \alpha + 1) = k - (m-1)n - 1$ and $z = 0$. With these substitutions we get
\[
\sum_{r} 2^{-r} \binom{n-k}{r} \binom{(m-2)n}{f-n-k+r}
= \frac{\displaystyle \binom{(m-2)n}{f-n-k}}{\displaystyle \binom{f-2k}{n-k}} \; P_{n-k}^{(f-n-k,\; k-(m-1)n-1)}(0).
\]

6.4 Fixed row model (FRM)

In the more constrained fixed row model (FRM), $\mathbf{B}^* \in \mathcal{B}_{\mathrm{FRM}}$ are constrained to have the same agent degree sequence as $\mathbf{B}$, but have unconstrained artifact degree sequences.
Adopting the FRM for backbone extraction implies, for example, that in all possible worlds a given species is found on the same number of islands, but a given island may be home to a different number of species. The distribution of $P^*_{ij}$ arising from $\mathcal{B}_{\mathrm{FRM}}$ is hypergeometric [TML+ 11, Nea13]. The FRM has been used to extract the backbone of bipartite projections of, for example, movies co-starring actors [TML+ 11], papers co-written by authors [TML+ 11], parties co-attended by women [Nea13], majority opinions joined by Supreme Court justices [Nea13], and microRNAs co-associated with diseases [CXW+ 18].

Let the fixed row model constrain all $\mathbf{B}^* \in \mathcal{B}_{\mathrm{FRM}}$ to have the same row sums as $\mathbf{B}$.

Theorem 6.4.1. Under the fixed row model, the distribution of $P^*_{ij}$ for $i \neq j$ is hypergeometric and satisfies
\[
\Pr(P^*_{ij} = k) = \frac{\displaystyle \binom{r_j}{k} \binom{n-r_j}{r_i-k}}{\displaystyle \binom{n}{r_i}}.
\]

Proof. The total number of ways to pick $r_i$ of the $n$ columns for ones in the $i$th row and $r_j$ of the $n$ columns for ones in the $j$th row is
\[
\binom{n}{r_i} \binom{n}{r_j} = \binom{n}{r_i} \frac{n!}{r_j! \, (n-r_j)!}. \tag{6.3}
\]
So that will go in the denominator of the desired probability.

For the numerator we follow the same line of reasoning as in the previous proof, where the parameters therein can be expressed as
\[
p = r_i - k, \qquad q = r_j - k, \qquad r = n - r_i - r_j + k.
\]
So we have a total of
\[
\binom{n}{k} \binom{n-k}{p,q,r} = \frac{n!}{k! \, (r_i-k)! \, (r_j-k)! \, (n-r_i-r_j+k)!} \tag{6.4}
\]
choices. Dividing equation (6.4) by (6.3) and cancelling $n!$ gives
\[
\Pr(P^*_{ij} = k)
= \frac{\displaystyle \frac{r_j!}{k! \, (r_j-k)!} \cdot \frac{(n-r_j)!}{(r_i-k)! \, (n-r_i-r_j+k)!}}{\displaystyle \binom{n}{r_i}}
= \frac{\displaystyle \binom{r_j}{k} \binom{n-r_j}{r_i-k}}{\displaystyle \binom{n}{r_i}}
\]
as desired.

6.5 Fixed column model (FCM)

In the closely related fixed column model (FCM), $\mathbf{B}^* \in \mathcal{B}_{\mathrm{FCM}}$ are constrained to have the same artifact degree sequence as $\mathbf{B}$, but have unconstrained agent degree sequences. Adopting the FCM for backbone extraction implies, for example, that in all possible worlds a given species may be found on a different number of islands, but a given island is home to the same number of species.
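Before deriving the FCM distribution, it may help to see how the two closed forms obtained so far can be evaluated numerically. The following R sketch computes equation (6.2) for the FFM on the log scale and the hypergeometric probability of Theorem 6.4.1 for the FRM using base R. The function names ffm_pmf and frm_pmf and the parameter values are assumptions made for illustration, not functions of any package.

```r
# Pr(P*_ij = k) under the FFM, equation (6.2), computed with logarithms so
# that large binomial coefficients do not overflow.
ffm_pmf <- function(k, m, n, f) {
  r <- 0:(n - k)
  log_terms <- lchoose(n, k) + (n - k - r) * log(2) + lchoose(n - k, r) +
    lchoose((m - 2) * n, f - n - k + r) - lchoose(m * n, f)
  mx <- max(log_terms)
  if (!is.finite(mx)) return(0)            # every term in the sum is zero
  exp(mx) * sum(exp(log_terms - mx))       # log-sum-exp for stability
}

# Pr(P*_ij = k) under the FRM, Theorem 6.4.1: hypergeometric with r_j
# "successes" among n columns and r_i draws.
frm_pmf <- function(k, r_i, r_j, n) dhyper(k, r_j, n - r_j, r_i)

# Illustrative sizes: m = 5 agents, n = 6 artifacts, f = 12 ones.
ffm_probs <- sapply(0:6, ffm_pmf, m = 5, n = 6, f = 12)
sum(ffm_probs)                             # a valid pmf sums to 1

# Illustrative row sums r_i = 2, r_j = 3 with n = 5 artifacts.
frm_pmf(1, r_i = 2, r_j = 3, n = 5)        # choose(3,1)*choose(2,1)/choose(5,2)

# Upper tail Pr(P*_ij >= k) for the FRM, as needed by the backbone test.
phyper(1 - 1, 3, 5 - 3, 2, lower.tail = FALSE)
```

Working on the log scale is one way to realize the remark in section 6.3 that logs make equation (6.2) practical for larger matrices.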
The distribution of $P^*_{ij}$ arising from $\mathcal{B}_{\mathrm{FCM}}$ has not been described before, but we derive it here to show it is Poisson-binomial.

Let the fixed column model constrain all $\mathbf{B}^* \in \mathcal{B}_{\mathrm{FCM}}$ to have the same column sums as $\mathbf{B}$. Let $X_1, \ldots, X_n$ be independent Bernoulli random variables. Let the probability of success for $X_i$ be $\Pr(X_i = 1) = p_i$. The random variable
\[
X = X_1 + \cdots + X_n \tag{6.5}
\]
is said to have the Poisson binomial distribution with parameters $p_1, \ldots, p_n$.

Theorem 6.5.1. Under the fixed column model, the distribution of $P^*_{ij}$ for $i \neq j$ is Poisson binomial with parameters
\[
p_1 = \frac{c_1(c_1-1)}{m(m-1)}, \quad p_2 = \frac{c_2(c_2-1)}{m(m-1)}, \quad \ldots, \quad p_n = \frac{c_n(c_n-1)}{m(m-1)}.
\]

Proof. The $B^*_{ik}$ are all either zero or one and are independent in different columns when only the column sums are fixed. So as $k$ varies, the products $B^*_{ik} B^*_{jk}$ are independent Bernoulli random variables. Comparing equations (6.1) and (6.5), we see that the distribution of $P^*_{ij}$ is Poisson binomial.

If column $k$ has column sum $c = c_k$ then all zero-one vectors with sum $c$ are equally likely for that column of $\mathbf{B}^*$. So there are $\binom{m}{c}$ possible $k$th columns. The number of ways to have a success is the number of possible columns which have ones in both positions $i$ and $j$ where $i \neq j$. So the number of choices is the number of ways to choose the remaining $c - 2$ ones in that column from the other $m - 2$ positions, that is, $\binom{m-2}{c-2}$. Thus
\[
p_k = \Pr(B^*_{ik} B^*_{jk} = 1) = \frac{\displaystyle \binom{m-2}{c-2}}{\displaystyle \binom{m}{c}} = \frac{c(c-1)}{m(m-1)}
\]
which finishes the demonstration.

6.6 Stochastic degree sequence model (SDSM)

Finally, the stochastic degree sequence model (SDSM) takes $\mathcal{B}_{\mathrm{SDSM}}$ to be all binary $m \times n$ matrices, but also gives a process for generating these matrices with different probabilities. Each $\mathbf{B}^*$ is generated by filling the cells $B^*_{ik}$ with a 0 or 1 depending on the outcome of an independent Bernoulli trial with probability $p^*_{ik}$. The distribution of the random variable $P^*_{ij}$ arising from $\mathcal{B}_{\mathrm{SDSM}}$ is Poisson-binomial with parameters which can be computed using the $p^*_{ik}$ [DNS21, LR16]. There are many ways to choose $p^*_{ik}$, but in the studies in chapter 8, we choose $p^*_{ik}$ so that it approximates $\Pr(B^*_{ik} = 1)$ for $\mathbf{B}^* \in \mathcal{B}_{\mathrm{FDSM}}$, with the goal of ensuring that the expected agent and artifact degree sequences of $\mathbf{B}^* \in \mathcal{B}_{\mathrm{SDSM}}$ match those of $\mathbf{B}$. Adopting such a version of SDSM implies, for example, that in each possible world a given species may be found on many or few islands and a given island may be home to many or few species, but the average number of islands on which a given species lives in all possible worlds and the average number of species that live on a given island in all possible worlds match these values in the observed world. The SDSM has been used to extract the backbone of bipartite projections of, for example, legislators co-sponsoring bills [Nea20, Nea14, SB20], zebrafish (Danio rerio) sharing operational taxonomic units [BDS+ 20], countries sharing exports [SDCGS15], and genes expressed in genesets [MLLS21].

In the stochastic degree sequence model, $\mathcal{B}_{\mathrm{SDSM}}$ consists of all binary $m \times n$ matrices. A method is then chosen to generate probabilities $p^*_{ik}$. Finally, matrices $\mathbf{B}^* \in \mathcal{B}_{\mathrm{SDSM}}$ are generated using these probabilities for independent Bernoulli trials, where $B^*_{ik}$ is filled with a one with probability $p^*_{ik}$ and zero otherwise.

Theorem 6.6.1. Under the stochastic degree sequence model, the distribution of $P^*_{ij}$ for $i \neq j$ is Poisson binomial with parameters
\[
p_1 = p^*_{i1} p^*_{j1}, \quad \ldots, \quad p_n = p^*_{in} p^*_{jn}.
\]

Proof. The fact that the distribution is Poisson binomial follows immediately from the independence assumption on the $B^*_{ik}$ and equation (6.1). Furthermore, the probability that the $k$th variable is one is
\[
p_k = \Pr(B^*_{ik} B^*_{jk} = 1) = \Pr(B^*_{ik} = 1) \Pr(B^*_{jk} = 1) = p^*_{ik} p^*_{jk}.
\]
So we are done.
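Both the FCM and the SDSM lead to a Poisson binomial distribution, whose probability mass function can be computed exactly by convolving one Bernoulli variable at a time. The R sketch below illustrates this under stated assumptions: the function names poisson_binomial_pmf, fcm_params, and sdsm_params are hypothetical, the inputs are invented, and other (possibly approximate) algorithms for this distribution exist.

```r
# Exact pmf of a Poisson binomial variable by repeated convolution.
# Returns the vector Pr(X = 0), ..., Pr(X = length(p)).
poisson_binomial_pmf <- function(p) {
  pmf <- 1
  for (pk in p) {
    # Either the k-th trial fails (count unchanged) or succeeds (count + 1).
    pmf <- c(pmf * (1 - pk), 0) + c(0, pmf * pk)
  }
  pmf
}

# FCM parameters from the column sums (Theorem 6.5.1).
fcm_params <- function(col_sums, m) col_sums * (col_sums - 1) / (m * (m - 1))

# SDSM parameters for agents i and j (Theorem 6.6.1), given an m x n matrix
# of cell probabilities p*_ik chosen by some method.
sdsm_params <- function(prob, i, j) prob[i, ] * prob[j, ]

# Illustrative FCM example: m = 4 agents, column sums 2, 3, 1, 4.
p_fcm <- fcm_params(c(2, 3, 1, 4), m = 4)    # 1/6, 1/2, 0, 1
pmf_fcm <- poisson_binomial_pmf(p_fcm)

# Upper tail Pr(P*_ij >= 3), as used by the backbone test.
k <- 3
upper_tail <- sum(pmf_fcm[(k + 1):length(pmf_fcm)])

# Illustrative SDSM example with made-up cell probabilities.
prob <- matrix(c(0.2, 0.5, 0.9,
                 0.4, 0.5, 0.1), nrow = 2, byrow = TRUE)
pmf_sdsm <- poisson_binomial_pmf(sdsm_params(prob, 1, 2))
```

In the FCM example a column with sum 1 can never be shared ($p_k = 0$) and a column with sum $m$ is always shared ($p_k = 1$), which the convolution handles without special cases.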
In the following chapter, we will implement these ensemble methods in the R package backbone.

CHAPTER 7

BACKBONE: AN R PACKAGE FOR EXTRACTING THE BACKBONE OF WEIGHTED GRAPHS

This chapter contains material from Domagalski, Neal, and Sagan [DNS21, NDS21a], and background from Neal, Domagalski, and Yan [NDY22]. Replication materials are available at https://www.github.com/domagal9/dissertation.

We now introduce the R package backbone that implements these five models: the fixed degree sequence model (FDSM), fixed fill model (FFM), fixed column model (FCM), fixed row model (FRM), and the stochastic degree sequence model (SDSM). The backbone package provides these methods in a common framework, making them both accessible and easy to use for scientists and researchers. It can be installed in R [R C18] from The Comprehensive R Archive Network (CRAN) via install.packages("backbone") and used with library(backbone) [DNS20]. Information regarding the CRAN distribution is found at https://CRAN.R-project.org/package=backbone. Additional materials relating to backbone including papers, presentations, workshop materials, and datasets are available at https://rbackbone.net.

7.1 Two Illuminating Data Sets

We illustrate the use of the R backbone package to extract the backbone of two networks: the first is a network of bill co-sponsorship relations among Senators in the 114th session of the United States Senate, the second is a network of world city firm co-locations amongst large cities in the year 2000. Both of these networks, legislative and spatial, are used as templates for network research in their corresponding fields.

7.1.1 Legislative Networks

For more than a decade, legislative networks have shed new light on understanding legislative behavior [Fow06a, Fow06b]. Although legislative networks clarify that governance is an interactive and interdependent process, they are most useful if they help us explain or predict key parts of this process.
The most consequential action a legislator can take is voting, and several studies have shown that a legislator’s position in a legislative network helps explain their voting behavior. For example, [Fow06a] found that US legislators were more likely to vote in favor of bills sponsored by well-connected legislators, even after controlling for shared party membership, and therefore that well-connected legislators were more effective at advancing their legislative agendas. Similarly, [RNH13] found that social ties among European legislators exacerbated ideological voting patterns: friendship increased the likelihood of political allies voting the same way, but decreased the likelihood of political adversaries voting the same way. [Fon20] offers one potential explanation for the network’s influence over voting behavior: “When legislators are called on to vote on a question that they do not understand, they take cues from experts who are nearby in the legislative network” (p. 270).

Although voting is particularly consequential, legislative networks have also been used to explain how the coalitions that shape voting outcomes change over time. For example, [Nea20] demonstrated that the US Congress has become substantially more partisan since 1973 with legislators increasingly collaborating only with members of the same party, and opposing members of the other party. However, [KMN16] and [AN20a] clarified that these coalitions are not strictly partisan and frequently include members from both parties.

Directly measuring legislative networks (e.g., simply asking legislators who they work with) is challenging because legislators are busy and may have motivations to conceal or misrepresent their true collaborations. As a result, most studies of legislative networks rely on more indirect measurements derived from bill sponsorship [e.g., Nea20], committee memberships [e.g., PMNW05], attendance at press events [e.g., DMSK15], and roll call votes [e.g., ALH+ 15a].
What do such indirectly measured legislative networks measure? Different source data provides information about different types of relations among legislators. For example, voting similarly in roll call votes provides information about ideological alignment, whereas sharing membership on a committee provides information about alignment on prioritized issues. The majority of legislative networks are derived from patterns of bill sponsorship, which also provides information about ideological and issue alignment, but more directly provides information about collaboration as legislators join together in lending their collective support to bills [Kir11, KK96].

All but the most popular legislative measures require collaboration to cultivate support and ensure their eventual passage. Past studies have identified many factors that influence when legislators choose to collaborate, consistently finding support for homophily [MSLC01]: similar legislators are more likely to collaborate [CP87]. In the context of legislative collaboration, homophily with respect to political party is known as partisanship, which when particularly intense leads to partisan polarization. Both research [e.g., Nea20, LCH06, MM13] and media reports [e.g., Ing15] confirm that polarization has become a hallmark of legislative relations in the US Congress, so observing party homophily in networks of legislative collaboration is expected.

To demonstrate how the backbone package works, we employ its use on a co-sponsorship network of the United States Senate during the 114th session.
Since both prior research [LCH06, Nea20, SB20, ALH+ 15b, AN20b] and media accounts [Dru16] of the current US political climate provide us with a priori expectations about what structure a properly extracted backbone should have, we expect positive relationships to form primarily between those in the same political party, and accordingly a relatively large modularity statistic computed from a partition of the nodes by political party. Modularity measures the strength of division within the network. Specifically, for a network $\mathbf{G}$ with vertex degree sequence $(d_1, \ldots, d_n)$, it is given by the quantity
\[
Q = \frac{1}{2 \sum \mathbf{G}} \sum_{i,j} \left[ G_{ij} - \frac{d_i d_j}{2 \sum \mathbf{G}} \right] \delta(c_i, c_j),
\]
where $c_i$ and $c_j$ represent the communities (in this case political party) that vertices $i$ and $j$ belong to, and $\delta(c_i, c_j)$ is the Kronecker delta function.

In visualizations of the extracted backbones, we depict Republican senators by red vertices, and both Democratic and Independent senators who are left-leaning and caucused with Democrats by blue vertices. Although we discuss signed backbones in the text, for visual clarity we only provide figures for binary backbones which contain positive edges. Positive relations of collaboration between two Republicans are depicted in red, between two Democrats are blue, and for all other pairs are purple. For an example, see fig. 7.1.

Figure 7.1: An example of an extracted backbone, with Democratic senators represented by blue vertices, and Republican senators represented by red vertices.

The data set consists of 100 senators and the 3589 bills that they have sponsored or co-sponsored in the 114th session of Congress [USG20]. This data takes the form of a bipartite network $\mathbf{B}$, where the agents are the senators (rows) and the artifacts are the bills (columns). Here, $B_{ik} = 1$ if senator $i$ sponsored or co-sponsored bill $k$, and otherwise is 0. Below we examine the data set. Notice that the row names correspond to each senator (including their party affiliation and the state they represent) and the column names refer to the bill number.
    > set.seed(19)
    > library(backbone)
    > senate <- read.csv("S114.csv", row.names = 1, header = TRUE)
    > senate <- as.matrix(senate)
    > dim(senate)
    [1]  100 3589
    > senate[1:5, 1:5]
                         sj9 sj8 sj7 sj6 sj5
    Alexander, L. (TN-R)   0   1   0   1   0
    Boxer, B. (CA-D)       0   0   0   0   1
    Cantwell, M. (WA-D)    0   0   0   0   1
    Carper, T. (DE-D)      0   0   0   0   1
    Cochran, T. (MS-R)     0   1   0   1   0

A weighted network $\mathbf{P}$ can be constructed from $\mathbf{B}$ via bipartite projection, where $\mathbf{P} = \mathbf{B}\mathbf{B}^T$ and $P_{ij}$ contains the number of bills that both senator $i$ and senator $j$ sponsored. Notice the network is now 100 rows by 100 columns.

    > G <- senate%*%t(senate)
    > dim(G)
    [1] 100 100
    > G[1:5, 1:2]
                         Alexander, L. (TN-R) Boxer, B. (CA-D)
    Alexander, L. (TN-R)                  141               10
    Boxer, B. (CA-D)                       10              303
    Cantwell, M. (WA-D)                    15               82
    Carper, T. (DE-D)                      12               55
    Cochran, T. (MS-R)                     40               25

The projected network $\mathbf{P}$ now indicates that Senator Lamar Alexander sponsored a total of 141 bills in the 114th session. Among these 141 bills, 10 were co-sponsored with Senator Barbara Boxer, and 15 were co-sponsored with Senator Maria Cantwell.

We can use the values of graph $\mathbf{P}$ to observe differences between those with similar or dissimilar ideology. Below, we compare the number of bills co-sponsored by two individuals with similar political ideology, Senators Cory Booker and Elizabeth Warren, versus those with dissimilar ideology, Senators Ted Cruz and Bernie Sanders. The results are consistent with the expectation that legislators sharing a similar ideology engage in more co-sponsorships.

    > G["Booker, C. (NJ-D)", "Warren, E. (MA-D)"]
    [1] 98
    > G["Cruz, T. (TX-R)", "Sanders, B. (VT-I)"]
    [1] 5

The differences in the number of bills co-sponsored prompts an important underlying question: how many bills do two senators have to co-sponsor before we would be justified in concluding they are political collaborators? Similarly, how few bills do they have to co-sponsor before we would be justified in concluding they are political opponents?
These questions are what the backbone package seeks to answer.

7.1.2 Spatial Networks

The second type of network we will examine with the backbone package is a spatial network. Bipartite projections appear in spatial analysis, where they can take two distinct forms depending on whether the agents or artifacts are spatial entities (i.e., locations).

In the locations-as-agents approach, a spatial bipartite projection is a network of locations, such that a pair of locations is connected to the extent that they share artifacts. Calling it the “interlocking world city network model,” this is the approach that [Tay01] proposed and which launched a wave of research on world city networks: major cities (the agents, which are locations) are connected to the extent that they house branch offices of the same advanced producer services firms (e.g., finance, accounting, consulting; the artifacts). It rests on the logic that offices of the same firm must communicate and interact with one another, and therefore that when two cities have an office of the same firm, there is likely interaction between them. Spatial networks adopting the locations-as-agents approach to measurement via bipartite projection are quite common at multiple spatial scales, and have been used to measure networks among urban locations connected by Twitter users [Poo18] and bus routes [LD20]; networks among cities connected by patents [BR17] and banking syndicates [PWK19]; and networks among countries connected by treaties [HBKM09], trade [SCS17], and corporate executives [HFC16].

In the locations-as-artifacts approach, a spatial bipartite projection is a network of agents (often people or other social actors), such that a pair of agents is connected to the extent that they share locations.
The locations-as-artifacts approach is less common in geography because the spatial units play only an instrumental role in the network, forging the links between agents, but do not appear in the bipartite projection network itself. However, it is common in sociological research, where the focus is on social networks emerging from spatial interactions. For example, [BCS+ 17] and [XCB20] use this approach to measure and study the social network among households in Los Angeles: households (the agents) are connected to the extent that they visit the same routine activity locations (e.g., school, work; the artifacts). This rests on the logic that places offer opportunities for casual encounters which lead to the formation of social bonds, and therefore when two households frequent the same places, they are more likely to interact with each other [Jac61]. [HKBH07] adopted a similar locations-as-artifacts approach to derive a ‘product space’ in which export products were connected to the extent that they were exported by the same countries. This follows the logic that “if [the production of] two goods...require similar institutions, infrastructure, physical factors, technology, or some combination thereof, they will tend to be produced [in the same location],” and therefore the spatial co-production of products indirectly captures their production technology similarity [HKBH07, p. 484].

There is an important link between these two approaches. When $\mathbf{B}$ is a bipartite network where the rows represent locations, then $\mathbf{B}\mathbf{B}^T$ will yield a locations-as-agents bipartite projection, while $\mathbf{B}^T\mathbf{B}$ will yield a locations-as-artifacts bipartite projection. Therefore, a single bipartite network can be studied from both perspectives. For example, although the world cities literature usually focuses on cities linked by sharing firms, some have simultaneously examined a network of firms linked by their co-location in cities [e.g., Nea08, VMND16].
Similarly, [SCS17] examined not only a network of countries linked by trading the same products, but also a network of products that are traded by the same countries.

The key advantage to measuring spatial networks using bipartite projections lies in the relative ease of data collection. For example, data about economic exchanges between cities may not be available from official government sources, and collecting such data directly is often impractical. However, data about where firms’ offices are located is readily available, usually on the firms’ own websites. Accordingly, bipartite projections offer a practical way for researchers to indirectly approximate a city-level economic network. Similarly, because social network analysis requires data from a population (not a sample) and is sensitive to missingness, it is often impractical to collect data on the social network among residents of a large city. However, data about the places residents visit or tweet about can be collected using routine surveys, remote sensing, and digital trace measures. Accordingly, bipartite projections also offer a practical way for researchers to indirectly approximate social networks in large geographic areas.

In the context of spatial analysis, the backbone package can be used for research adopting a locations-as-agents approach, to infer the spatial network among a set of locations from data on their shared characteristics. However, it can also be used for research adopting a locations-as-artifacts approach, to infer a social network among a set of actors from data on their shared locations. To illustrate backbone’s application in one specific spatial analytic context, we will demonstrate its use to examine the world city network and identify the most central cities in it.

The Globalization and World Cities (GaWC) “Data Set 11” was originally collected in 2000, and records the extent of 100 advanced producer services firms’ presence in each of 315 large cities [TCW02].
These data served as the foundation for one of the earliest and most comprehensive empirical studies of the world city network [Tay04], and as a template for a substantial body of empirical research conducted by those associated with the GaWC research network. Formally, the data set takes the form of a rectangular 315 × 100 bipartite matrix B, in which 𝐵𝑖𝑘 contains the ‘service value’ of firm 𝑘’s presence in city 𝑖. The service values form an ordinal scale intended to capture the importance or extent of a firm’s presence in a city, and range from 0 (no presence) to 5 (global headquarters), with a value of 2 representing a presence that provides “the ‘normal’ or ‘typical’ service level of the given firm in a city” [TCW02, p. 2370].

These publicly available data can be loaded into R directly from the GaWC website (as of July 2021) and converted to matrix form. This data set is also included in the replication materials.

> cities <- read.csv(file="https://www.lboro.ac.uk/gawc/datasets/da11.csv", header = TRUE, row.names = 1)
> cities <- as.matrix(cities)

The backbone package is designed for use with binary bipartite data, so for this illustration we transform the original ordinal B into a binary B′ such that

\[
B'_{ij} = \begin{cases} 1 & \text{if } B_{ij} \geq 3, \\ 0 & \text{if } B_{ij} \leq 2. \end{cases}
\]

This transformation can be achieved, and the cities that contain no firms with a larger-than-typical presence can be excluded, by typing:

> cities[cities <= 2] <- 0
> cities[cities >= 3] <- 1
> cities <- cities[rowSums(cities) != 0,]

This transformation allows us to focus only on firms that maintain a larger-than-typical presence in a given city, and only on the 196 cities that contain at least one such firm. For convenience, we use B to refer to this binary matrix in the remainder of this section.

Once the bipartite data has been loaded and transformed, it is possible to examine some of its features. For example, it is possible to look at the pattern of firms’ presence in cities.
> cities[114:117,8:11]
            Horwath KPMG Summit...Baker RSM
MELBOURNE         0    1              0   1
MEXICO CITY       0    1              0   0
MIAMI             1    1              0   1
MILAN             0    0              0   1

This command shows the portion of B that includes the 114th to 117th cities, and 8th to 11th firms. The output shows that while the accounting firms of KPMG and RSM maintained offices in several of these cities, Horwath and Summit International+Baker Tilley did not.

Two key characteristics of any bipartite data are the row sums and column sums. In these data, the row sums indicate the number of firms located in a city, while the column sums indicate the number of cities in which a firm maintains a presence.

> rowSums(cities)["AMSTERDAM"]
AMSTERDAM
       29
> rowSums(cities)["NEW YORK"]
NEW YORK
      74
> colSums(cities)["KPMG"]
KPMG
  76
> colSums(cities)["HSBC"]
HSBC
  43

For example, there are 74 firms that maintain a larger-than-typical presence in New York, but only 29 firms that maintain a larger-than-typical presence in Amsterdam. Likewise, KPMG maintains a larger-than-typical presence in 76 cities, while HSBC maintains a larger-than-typical presence in only 43 cities. Figure 7.2 illustrates these values for all cities and firms in these data. Specifically, Figure 7.2A shows that while most cities contain fewer than 20 firms, some cities contain many more firms. Similarly, Figure 7.2B shows that while most firms maintain a presence in fewer than 40 cities, some firms maintain a presence in many more cities.

Figure 7.2: The distribution of (A) row sums and (B) column sums in the GaWC Dataset 11.

The conventional “specification of the world city network” used in GaWC research involves computing a weighted bipartite projection P from the original bipartite data B [Tay01].

> P <- cities %*% t(cities)

Following this specification, the cities are treated as agents and the firms are treated as artifacts.
The resulting square matrix P is treated as a weighted world city network in which the strength of the connection between a pair of cities is measured by their number of co-located firms. For example, examining the matrix cell corresponding to the connection between Amsterdam and New York

> P["AMSTERDAM","NEW YORK"]
[1] 26

indicates that 26 firms maintain a presence in both cities, and might be interpreted as evidence that they interact economically.

Many analyses of the world city network focus on cities’ degree centrality, or what is sometimes called a city’s “global network centrality” (GNC). This value measures a city’s total number or strength of connections in the network, and is interpreted as an indicator of a city’s status or importance in the network.

> sort(rowSums(P), decreasing = TRUE)[1:5]
   LONDON  NEW YORK     PARIS HONG KONG SINGAPORE
     1496      1403      1043      1032       913

In these data, London and New York have the greatest centrality, occupying the top tier of the urban hierarchy as what GaWC research calls Alpha++ cities [BST99]. They are followed by a second tier of Alpha+ cities that include Paris, Hong Kong, and Singapore. This approach appears to successfully identify what nearly any scholar of globalization would regard as the cities “used by global capital as basing points in the spatial organization and articulation of production and markets” [Fri86, p. 71].

However, these values and this weighted spatial network are less informative than they might seem. The centrality values derived from this network are almost perfectly correlated with the number of firms located in each city (i.e. the row sums of B).

> cor(rowSums(P), rowSums(cities))
[1] 0.9767704

The high correlation indicates that this approach to identifying central cities in a world city network is actually just identifying cities that contain many firms.
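One way to see where this correlation comes from: for a binary B, a city's row sum in P = BB′ equals the sum, over the firms it hosts, of the number of cities in which each of those firms is present. The sketch below checks this identity in base R on a randomly generated toy matrix. The matrix, its dimensions, and the seed are illustrative assumptions and are not the GaWC data; the diagonal of P is included, as in the rowSums(P) computation above.

```r
set.seed(42)
# Hypothetical toy data: 6 cities (rows) by 10 firms (columns), binary presence.
B <- matrix(rbinom(60, size = 1, prob = 0.5), nrow = 6, ncol = 10)

# Weighted projection, computed the same way as P <- cities %*% t(cities).
P <- B %*% t(B)

# Degree centrality in the weighted projection (diagonal included).
centrality <- rowSums(P)

# For each city, add up the column sums of the firms it hosts.
via_firms <- as.vector(B %*% colSums(B))

# The two computations agree entry by entry.
all.equal(as.vector(centrality), via_firms)
```

Because every firm a city hosts contributes a positive amount to that city's centrality, cities with more firms tend to accumulate larger values regardless of which firms they host.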
This occurs because measuring a world city network using a weighted bipartite projection of firm locations guarantees that cities with many firms will have stronger connections and larger centrality values [Nea12]. If world city researchers were simply interested in finding cities with many firms, there are much simpler ways to achieve this (e.g., counting a city’s number of firms).

In practice, world city researchers are interested in something more nuanced: studying cities that are central in a network of economic interactions. The challenge is that although firm co-location may provide information about which cities interact economically, firm co-location is not the same as economic interaction. The backbone package can be used to make inferences about which cities are engaged in economic interaction based on firm co-location patterns. Specifically, it can be used to estimate whether the number of firms co-located in two cities is large enough to warrant concluding that the two cities are engaged in meaningful economic interaction. The backbone of the world city network is a binary network in which pairs of cities are connected only if their number of co-located firms suggests they are engaged in meaningful economic interaction, and therefore provides a simplified and potentially more focused depiction of the world city network.

We’ll now examine how the backbone package’s functionality provides insights on both the spatial and legislative networks described.

7.2 Universal Threshold universal()

The simplest approach to backbone extraction applies a single threshold value 𝑇 to all edges. As mentioned previously, 𝑇 = 0 is often used, which leads to very dense and highly clustered backbones. While we do not recommend using a universal threshold method, it is included in the backbone package for comparison purposes.
The universal() function allows the user to extract a backbone using a single threshold 𝑇, or to extract a signed backbone by selecting upper and lower thresholds 𝑇⁺ and 𝑇⁻. For both the senate and the world cities data sets, we’ll use the universal() function to compute a backbone with a single threshold of 0. Thus in the legislative network, if two senators have co-sponsored one or more bills, there will be an edge between them. Similarly, any number of firm co-locations is interpreted as evidence of economic interaction between a pair of cities. Notice that our backbone graph is represented by a square adjacency matrix with 0-1 entries.

> universalbb <- universal(senate, upper = 0, bipartite = TRUE)
> universalbb$backbone[1:5, 1:2]
                     Alexander, L. (TN-R) Boxer, B. (CA-D)
Alexander, L. (TN-R)                    0                1
Boxer, B. (CA-D)                        1                0
Cantwell, M. (WA-D)                     1                1
Carper, T. (DE-D)                       1                1
Cochran, T. (MS-R)                      1                1

Figure 7.3: The positive backbone of the US Senate co-sponsorship network with edges retained between two senators if they sponsored at least 1 bill together.

The density of a network is the number of edges in the network, divided by the number of possible edges in the network. Plotting this backbone using the igraph package [CN06] reveals that it is extremely dense, as only 1 pair of senators out of the total 4950 unique pairs has not sponsored at least one bill together (see fig. 7.3). Accordingly, this universal threshold backbone is uninformative about the underlying structure of the network. Moreover, partitioning this backbone into two groups by political party yields a modularity near zero, which indicates that this backbone does not reflect the partisan polarization known to exist in the US Senate.

We see a similar density problem occur in the world cities network.
> universal0 <- universal(cities, upper = 0, bipartite = TRUE)
> table(universal0$backbone)

    0     1
21506 16910
> mean(universal0$backbone)
[1] 0.4401812
> sort(rowSums(universal0$backbone), decreasing = TRUE)[1:5]
     LONDON    NEW YORK       PARIS   HONG KONG LOS ANGELES
        191         185         175         171         171
> cor(rowSums(universal0$backbone), rowSums(cities))
[1] 0.7407175

A backbone extracted using 𝑇 = 0 is quite dense (44% of possible inter-city connections are present) because it treats even small numbers of firm co-locations as evidence of economic interaction between cities. As a result, the most central cities are still obviously large cities that contain many firms, and indeed, cities’ centrality in this network remains highly correlated (𝑟 = 0.74) with their total number of firms.

A sparser network containing fewer inter-city connections can be obtained using a higher (i.e. more stringent) threshold that retains only particularly strong connections [e.g., DT05]. For example, the universal() function can be used to extract a backbone where 𝑇 = 25, and therefore only cities with more than 25 co-located firms are counted as connected:

> universal25 <- universal(cities, upper = 25, bipartite = TRUE)
> mean(universal25$backbone)
[1] 0.001665973
> sort(rowSums(universal25$backbone), decreasing = TRUE)[1:5]
   LONDON  NEW YORK HONG KONG     PARIS   CHICAGO
       15        12         5         5         3
> cor(rowSums(universal25$backbone), rowSums(cities))
[1] 0.8381523

The backbone extracted with this more stringent universal threshold is indeed much less dense (only about 0.17% of possible edges are present). However, it still remains focused on the largest cities, whose centrality is highly correlated (𝑟 = 0.84) with the total number of firms.

These approaches involve an arbitrarily selected threshold; however, the universal() function can also be used to apply a universal threshold that is based on characteristics of the weighted bipartite projection P.
For example, it is possible to extract a backbone in which two cities are connected if their number of co-located firms is more than two standard deviations above the average.

> universal.meansd <- universal(cities, upper = function(x)mean(x)+2*sd(x), bipartite = TRUE)
> mean(universal.meansd$backbone)
[1] 0.03092461
> sort(rowSums(universal.meansd$backbone), decreasing = TRUE)[1:5]
   LONDON  NEW YORK HONG KONG     PARIS SINGAPORE
       64        61        51        49        42
> cor(rowSums(universal.meansd$backbone), rowSums(cities))
[1] 0.9655334

This backbone is also lower density (3% of possible edges are present), but once again it focuses only on large cities, whose centrality is nearly identical to their total number of firms (𝑟 = 0.97).

To create a signed backbone, we can apply both an upper and lower threshold value. The following code will return a backbone where the positive edges indicate that two senators co-sponsored more than 1 standard deviation above the mean number of co-sponsored bills, and negative edges indicate that two senators co-sponsored fewer than 1 standard deviation below the mean number of co-sponsored bills. The graph of the positive edges of this backbone can be seen in fig. 7.4.

> universalbb2 <- universal(senate,
      upper = function(x) mean(x)+sd(x),
      lower = function(x) mean(x)-sd(x),
      bipartite = TRUE)

Figure 7.4: The positive backbone of the US Senate co-sponsorship network with edges retained between two senators if they sponsored more bills together than one standard deviation above the mean.

The resulting graph in fig. 7.4 is much less dense than when using an upper threshold of 0. Additionally, the polarized structure of the Senate by political party is visible, and is confirmed by a larger modularity (𝑄 = 0.277). However, it still does not necessarily reveal the underlying structure of the network among legislators. In this case, “the application of a threshold to the global weight distribution...belittles nodes with a small [degree],” resulting in a backbone that preserves edges only among legislators who sponsor many bills, and treating legislators who sponsor few bills as isolates [SBV09, p. 6484].
Similarly, in the world cities network, the universal threshold backbone extraction does not take into account variations in the number of firms located in each city. By not controlling for these variations (which are substantial in these data, see Figure 7.2A) when deciding whether two cities are connected, it privileges cities that contain many firms. In these data, because there are large variations in the number of firms located in each city that must be controlled for, a universal threshold backbone is not appropriate. To obtain meaningfully sparse graphs that do not ignore the multi-scalar character of node degrees, we must allow the threshold to vary for different edges.

To improve our backbone results, we move to methods of bipartite projection backbones that rely on a distinct threshold value for each pair of vertices.

Extracting a null model backbone: backbone.extract()

Instead of using a universal threshold to determine a backbone, the backbone package incorporates the five different ensemble methods previously mentioned in chapter 6: FFM, FRM, FCM, SDSM, and FDSM. These models 𝑀 do control for variation in the row and column degree sequences of B∗ ∈ B 𝑀 . To use these methods in backbone, one first calls an ensemble model function (fixedfill(), fixedrow(), fixedcol(), sdsm(), or fdsm()), which finds the probability of observing an edge with the observed weight in a corresponding null model and returns an object of class ‘backbone.’ This object contains the following: a positive matrix whose (𝑖, 𝑗) entry is the probability that $G^*_{ij}$ is equal to or above the corresponding entry in 𝐺; a negative matrix whose (𝑖, 𝑗) entry is the probability that $G^*_{ij}$ is equal to or below the corresponding entry in 𝐺; and summary, a data frame summarizing the inputted matrix and model, including the class, model name, number of rows, and number of columns.
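To make the meaning of the positive and negative matrices concrete, the following is a minimal Monte Carlo sketch in base R. It is not how the backbone package computes these probabilities; it only approximates them by simulation for a null model that holds the number of 1s in B fixed (in the spirit of the FFM). The toy matrix, the number of simulations, and all object names are illustrative assumptions.

```r
set.seed(1)
# Hypothetical toy bipartite matrix: 5 agents (rows) by 8 artifacts (columns).
B <- matrix(rbinom(40, size = 1, prob = 0.4), nrow = 5, ncol = 8)
G <- B %*% t(B)                       # observed weighted projection

n_sim <- 1000
positive <- matrix(0, nrow(G), ncol(G))
negative <- matrix(0, nrow(G), ncol(G))

for (s in seq_len(n_sim)) {
  # Random matrix with the same number of 1s as B, placed uniformly at random.
  B_star <- matrix(sample(B), nrow = nrow(B))
  G_star <- B_star %*% t(B_star)
  positive <- positive + (G_star >= G)  # count draws at or above the observed weight
  negative <- negative + (G_star <= G)  # count draws at or below the observed weight
}

positive <- positive / n_sim          # estimated Pr(G*_ij >= G_ij)
negative <- negative / n_sim          # estimated Pr(G*_ij <= G_ij)

# Self-pairs are not edges of the projection, so blank out the diagonal.
diag(positive) <- NA
diag(negative) <- NA
```

Because a simulated weight equal to the observed weight is counted in both matrices, the two entries for a given pair sum to at least 1.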
This ‘backbone’ object is then supplied to backbone.extract(), which performs the hypothesis test for a given significance value and returns a backbone graph. The user can input bipartite graph objects of class ‘matrix’, ‘sparseMatrix’, ‘Matrix’, ‘igraph’, ‘network’, and ‘edgelist’ (a matrix of two columns), and can choose the type of backbone returned by specifying the desired class in backbone.extract().

The backbone.extract() function allows the user to input the backbone class object and obtain either a signed or positive backbone. It has six arguments: matrix, signed, alpha, class, narrative, and fwer. The matrix argument takes a backbone object generated by fixedfill(), fixedrow(), fixedcol(), sdsm(), or fdsm(), and the function returns a backbone graph of class = class using a two-tailed significance test with significance value 𝛼 = alpha. If the signed parameter is set to TRUE then a signed backbone is returned; if it is set to FALSE then a positive backbone is returned. If the narrative parameter is set to TRUE then suggested narrative text for a manuscript, including possible citations, is displayed.

Extracting the backbone of a bipartite projection involves conducting an independent statistical test on ℓ = 𝑚(𝑚 − 1)/2 edges in the projection, where 𝑚 is the number of vertices in the bipartite projection. Because each of these tests is independent, this can inflate the familywise error rate beyond the desired alpha. The fwer parameter offers two ways to correct for this: the classical Bonferroni correction is applied when fwer = "bonferroni", and the more powerful Holm-Bonferroni correction is applied when fwer = "holm" [Hol79].

7.3 Fixed fill model fixedfill()

The fixedfill() function will apply the fixed fill ensemble model to the bipartite network.
Due to the large binomial coefficients in the probability distribution, this model as currently implemented in backbone v1.5.0 is infeasible on large networks like the Senate data set. However, we can still apply it to the world cities network and do so below. Regardless, as we’ll see in chapter 8, FFM is not the recommended model for bipartite backbone extraction when there is concern regarding the degree sequences.

> fixedprobs <- fixedfill(cities)
> fixedbb <- backbone.extract(fixedprobs)

In this null model, the number of edges in the bipartite network is held constant; that is, our observed world cities network is compared to all other possible networks with the same density. Specifically, in this instance the total number of firm presences across all cities remains fixed, but the number of cities per firm and the number of firms per city may vary.

Notice above that we’ve applied the backbone.extract() function after choosing the fixedfill() function, which determined the ensemble method. Under the default settings, backbone.extract() has extracted a positive backbone under an alpha value of 0.05. Since all statistical tests are two-tailed tests, an edge is retained in the cities network if the probability of two cities having at least the observed number of co-located firms is less than 0.025, i.e., if the observed weight falls in the upper tail of the Jacobi distribution.

> mean(fixedbb)
[1] 0.07418784
> sort(rowSums(fixedbb), decreasing = TRUE)[1:5]
   LONDON  NEW YORK     PARIS HONG KONG     TOKYO
       94        87        76        71        71
> cor(rowSums(fixedbb), rowSums(cities))
[1] 0.9293961

This FFM backbone network has a low density but again provides information focused around the largest cities. The centrality is highly correlated with the number of firms (𝑟 = 0.93). Instead of this model, which compares a bipartite B with other networks of the same density, we’ll now apply the remaining models, which are based upon the degree sequences.
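The two-tailed decision rule and the familywise error corrections described in this section can be sketched in a few lines of base R. This is an illustration under stated assumptions rather than the package's internal code: the matrix positive of upper-tail probabilities is a hypothetical 4 × 4 example, and comparing p.adjust() output for the unique pairs against alpha / 2 is one plausible way to apply the Bonferroni and Holm-Bonferroni corrections.

```r
alpha <- 0.05

# Hypothetical symmetric matrix of upper-tail probabilities Pr(G*_ij >= G_ij).
positive <- matrix(c(NA,    0.001, 0.300, 0.020,
                     0.001, NA,    0.005, 0.600,
                     0.300, 0.005, NA,    0.004,
                     0.020, 0.600, 0.004, NA),
                   nrow = 4, byrow = TRUE)

# Two-tailed test, upper tail: keep an edge when its probability is below alpha / 2.
backbone <- (positive < alpha / 2) * 1
diag(backbone) <- 0

# Familywise error correction over the m(m - 1)/2 unique pairs (assumed approach).
p_pairs <- positive[upper.tri(positive)]
keep_bonferroni <- p.adjust(p_pairs, method = "bonferroni") < alpha / 2
keep_holm       <- p.adjust(p_pairs, method = "holm") < alpha / 2

sum(backbone) / 2      # edges retained without correction
sum(keep_bonferroni)   # edges retained under the Bonferroni correction
sum(keep_holm)         # edges retained under the Holm-Bonferroni correction
```

In this toy example the Holm-Bonferroni correction retains at least as many edges as the Bonferroni correction, which is what is meant by calling it the more powerful of the two.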
7.4 Fixed row model fixedrow()

To apply the fixed row model to a bipartite graph, one uses the fixedrow() function. The FRM is also often called the hypergeometric model, as it estimates a hypergeometric probability distribution for each pair of nodes in the network. As an example,

> rowprobs <- fixedrow(senate)
> rowbb <- backbone.extract(rowprobs, alpha = .01)

We can now examine how this method has changed the appearance of our network, focusing only on the positive edges of the signed backbone in fig. 7.5. We can see that the FRM has reduced the density of our network and that we begin to see some of the two-party structure that is inherent in the United States Senate. The known polarized structure is also apparent, which is reflected in this network’s modularity (𝑄 = 0.215).

Figure 7.5: The positive backbone of the US Senate co-sponsorship network under the fixed row model.

Specifically, for our example, the fixed row function will fix the number of bills that each senator sponsors, while allowing each bill to be sponsored by a varying number of senators. The function will compute the probability that each pair of senators co-sponsors at least (or at most) the observed number of bills when the bills each senator sponsors are chosen at random.

Similarly, we can see how the fixed row model affects the world cities network.

> rowprobs2 <- fixedrow(cities)
> rowbb2 <- backbone.extract(rowprobs2, alpha = .1)
> mean(rowbb2)
[1] 0.09225323
> sort(rowSums(rowbb2), decreasing = TRUE)[1:5]
INDIANAPOLIS     PORTLAND    MELBOURNE         LYON     AUCKLAND
          60           54           52           49           44
> cor(rowSums(rowbb2), rowSums(cities))
[1] 0.3039028

First, this backbone is less dense than the 𝑇 = 0 universal threshold backbone, but denser than the 25-threshold or mean-threshold backbones, containing 9.2% of possible edges. That is, this model does reduce the complexity of the original network, but still preserves many intercity connections. Second, and perhaps more notably, because the FRM controls for the number of firms in each city when deciding which intercity connections to keep, it does not simply focus on cities that are large and contain many firms.
Indeed, while the most central cities are major financial centers, they are not the obvious ones typically highlighted in world cities research. Moreover, cities’ centrality and total firm count are only modestly correlated (𝑟 = 0.30), indicating that cities’ centrality in this network provides information that is distinct from what could have been learned from simply counting their number of firms.

Although the FRM does control for the number of firms in each city (i.e. the row sums of B∗ ∈ B 𝐹 𝑅𝑀 ), it does not control for the number of cities where each firm maintains a presence (i.e. the column sums of B∗ ∈ B 𝐹 𝑅𝑀 ). However, there is substantial variation in the number of cities where each firm maintains a presence (see Figure 7.2B), and not controlling for this variation can distort decisions about whether a particular city pair’s number of co-located firms is significant. For example, if Firm X maintains a presence in every city, then observing that it is co-located in Amsterdam and New York is trivial. In contrast, if Firm Y maintains a presence in only two cities, then observing that it is co-located in Amsterdam and New York is quite noteworthy. Because these data contain not only large variations in the number of firms in each city (see Figure 7.2A) but also large variations in the number of cities where each firm maintains a presence (see Figure 7.2B), the FRM is not appropriate. More generally, an FRM backbone and the fixedrow() function are appropriate only when there is variation in the row sums of B, but limited variation in the column sums of B.

Figure 7.6: The positive backbone of the US Senate co-sponsorship network under the fixed column model.

7.5 Fixed column model fixedcol()

The fixed column model can be applied through the fixedcol() function. In this scenario, the fixed column function fixes the number of senators that sponsor each bill, while allowing each senator to sponsor a varying number of bills.
> colprobs<- fixedcol(senate) > colbb <- backbone.extract(colprobs, alpha = .01) We can now examine how the fixed column model (also called Poisson binomial) has changed the appearance of our co-sponsorship network, again examining the positive edges in fig. 7.6. We can see that the fixed column function has again reduced the density of our network and the two party structure is more apparent. The known polarized structure is reflected in this network’s even higher modularity (𝑄 = 0.424). We mentioned the FRM is not a good choice for the world cities network because of the substantial variation in the column sums. Here, the FCM would control for this variation in number 102 of cities where each firm maintains a presence, but introduces a similar problem in that the row sums are now also not controlled for. The high correlation with total number of firms exemplifies this issue. > colprobs2 <- fixedcol(cities) > colbb2 <- backbone.extract(colprobs2, alpha = 0.1, signed = FALSE) > mean(colbb2) [1] 0.07418784 > sort(rowSums(colbb2), decreasing = TRUE)[1:5] LONDON NEW YORK PARIS HONG KONG TOKYO 94 87 76 71 71 > cor(rowSums(fixedcol_bb2), rowSums(cities)) [1] 0.9293961 We’ll now attempt to approach our ‘gold-standard’ model, where we compare our observed data set to all other bipartite networks with the exact same degree sequences. The backbone package provides two ways to do this, SDSM where the degree sequences are approximately fixed and the probability mass function is known, and FDSM where the probability mass function is unknown and thus the distribution is constructed through sampling. 7.6 Stochastic degree sequence model sdsm() When describing the Stochastic degree sequence model in chapter 6, we choose probabilities ∗ so that it approximates Pr(𝐵∗ = 1) for B∗ ∈ B 𝑆𝐷𝑆𝑀 . Here we use the Bipartite Configuration 𝑝𝑖𝑘 𝑖𝑘 Model or BiCM to compute those probabilities for the Poisson binomial distribution, which is used in the SDSM. 
In chapter 8, we will demonstrate why BiCM is the right choice for computing these probabilities.

In the context of the Senate co-sponsorship matrix, the stochastic degree sequence model compares our observed values to a distribution where each senator sponsors roughly the same number of bills as in the observed data, and each bill is sponsored by roughly the same number of senators as in the observed data. Also demonstrated is the ‘narrative’ parameter, which prints information about the backbone network and the citations for the model used.

> sdsm <- sdsm(senate)
> sdsmbb <- backbone.extract(sdsm, narrative = TRUE, alpha = .01)
Suggested manuscript text and citations:
From a bipartite graph containing 100 agents and 3589 artifacts, we obtained the weighted bipartite projection, then extracted its binary backbone using the backbone package (Domagalski, Neal, & Sagan, 2021). Edges were retained in the backbone if their weights were statistically significant (alpha = 0.01) by comparison to a null Stochastic Degree Sequence Model (SDSM; Neal, 2014).
Domagalski, R., Neal, Z. P., and Sagan, B. (2021). backbone: An R Package for Backbone Extraction of Weighted Graphs. PLoS ONE. https://doi.org/10.1371/journal.pone.0244363
Neal, Z. P. (2014). The backbone of bipartite projections: Inferring relationships from co-authorship, co-sponsorship, co-attendance and other co-behaviors. Social Networks, 39, 84-97. https://doi.org/10.1016/j.socnet.2014.06.001

Fig. 7.7 shows more of the partisan structure that is thought to be present in the US Senate, and this visualization provides more information than the extremely dense graph found using a universal threshold. Moreover, the known polarized structure of the US Senate is particularly evident, and is confirmed by the much larger modularity (𝑄 = 0.471).

Figure 7.7: The positive backbone of the US Senate co-sponsorship network under the stochastic degree sequence model.

Before examining the entire SDSM world cities backbone, consider how it determines whether the number of co-located firms is statistically significant for a single city pair. In fig. 7.8, three of our ensemble models are drawn. The blue curve shows the number of firms that would be co-located in Amsterdam and New York if all firms located in cities randomly, but on average the number of firms in each city did not change and on average the number of cities where each firm maintains a presence did not change. The SDSM distribution is wider and flatter than the FDSM distribution, but has nearly the same midpoint. These differences arise because the SDSM distribution is an approximation of the more targeted FDSM distribution. As an approximation with a wider distribution, the SDSM is less statistically powerful; therefore we use a more liberal threshold of statistical significance so that it will more closely mirror the FDSM. The 26 co-located firms actually observed in Amsterdam and New York fall in the middle of the SDSM distribution, which indicates that this value is about what might be expected even under random conditions (i.e., it is not statistically significant). Therefore, the SDSM backbone does not include a link between Amsterdam and New York.

Figure 7.8: Null weight distributions generated using the backbone package from GaWC Dataset 11.

> sdsm2 <- sdsm(cities)
> sdsmbb2 <- sdsm(sdsm2)
> mean(sdsmbb2)
[1] 0.01973136
> sort(rowSums(sdsmbb2), decreasing = TRUE)[1:5]
 KANSAS CITY    CHARLOTTE     RICHMOND INDIANAPOLIS     BORDEAUX
          24           21           20           18           17
> cor(rowSums(sdsmbb2), rowSums(cities))
[1] -0.1062661

The SDSM backbone is a sparse network in which medium-sized regional centers are the most central cities, and cities’ centrality and total firm count are uncorrelated (𝑟 = −0.11).

7.7 Fixed degree sequence model fdsm()

As mentioned in the previous chapter, the fixed degree sequence model first samples random bipartite networks $\mathbf{B}^* \in \mathcal{B}^{\mathrm{FDSM}}$ that preserve both degree sequences using the curveball algorithm [SUG18]. These bipartite graphs $\mathbf{B}^*$ are then projected to obtain a random weighted bipartite projection $\mathbf{P}^* = \mathbf{B}^* \mathbf{B}^{*\top}$.
The sampling and projection steps are repeated a number of times to sample the space of possible $P^*_{ij}$. At each iteration, we compare $P_{ij}$ to the value of $P^*_{ij}$ and keep a record of how often it was above, below, or equal to the generated value. The fdsm() function returns a backbone object containing a matrix positive, which holds the proportion of times $P^*_{ij}$ is equal to or above the corresponding entry in P, and a matrix negative, which holds the proportion of times $P^*_{ij}$ is equal to or below the corresponding entry in P. This differs from the previous ensemble methods, where the exact probability mass function is known and a probability can be given.

The fdsm() function can also save each value of $P^*_{ij}$ for a given 𝑖, 𝑗. This is useful for visualizing an example of the empirical null edge weight distribution generated by the model. The values 𝑖, 𝑗 correspond to the row and column indices of a cell in the projected matrix and can be input as either numeric values or strings containing the row names. These values are returned in the list dyad_values.

Using the fixed degree sequence model on the Senate data set allows us to compare our observed values to a distribution where each senator sponsors exactly the same number of bills as observed and each bill is sponsored by exactly the same number of senators as observed. We can find the backbone using the fixed degree sequence model as follows:

> fdsm <- fdsm(senate, trials = 1000, dyad = c("Booker, C. (NJ-D)", "Warren, E. (MA-D)"))

The dyad_values output is a list of the $P^*_{ij}$ values for each of the 1000 trials, where 𝑖 = “Booker, C. (NJ-D)” and 𝑗 = “Warren, E. (MA-D)”.
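The sampling, projection, and tallying steps described above can be sketched in base R. This is only an illustration under stated assumptions, not the package’s implementation: simple degree-preserving “checkerboard” swaps stand in for the curveball algorithm, and the matrix B, the number of trials, the number of swaps per trial, and the helper swap_once() are all hypothetical.

```r
# Minimal FDSM-style sketch. Assumption: checkerboard swaps replace the
# curveball algorithm; B, trials, and swap_once() are illustrative only.
set.seed(1)

# Hypothetical 6 x 8 bipartite matrix (rows = agents, columns = artifacts)
B <- matrix(rbinom(48, 1, 0.4), nrow = 6)

# Attempt one swap: pick two rows and two columns; if the 2 x 2 submatrix is a
# checkerboard (1 0 / 0 1 or 0 1 / 1 0), flip it. Row and column sums are unchanged.
swap_once <- function(M) {
  r <- sample(nrow(M), 2)
  k <- sample(ncol(M), 2)
  s <- M[r, k]
  if (s[1, 1] == s[2, 2] && s[1, 2] == s[2, 1] && s[1, 1] != s[1, 2]) {
    M[r, k] <- 1 - s
  }
  M
}

P <- B %*% t(B)                          # observed projection
trials <- 200
upper <- matrix(0, nrow(P), ncol(P))     # times P* was equal to or above P
lower <- matrix(0, nrow(P), ncol(P))     # times P* was equal to or below P

Bstar <- B
for (i in seq_len(trials)) {
  for (j in seq_len(100)) Bstar <- swap_once(Bstar)   # randomize, keeping degrees
  Pstar <- Bstar %*% t(Bstar)                         # random projection
  upper <- upper + (Pstar >= P)
  lower <- lower + (Pstar <= P)
}

positive <- upper / trials   # proportion of trials with P* >= P
negative <- lower / trials   # proportion of trials with P* <= P
```

Because ties are counted in both tallies, the two proportions for a given pair sum to at least one; a small value in positive indicates an observed weight that is rarely matched or exceeded under the null.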
The values in dyad_values correspond to the number of bills Senators Booker and Warren would be expected to co-sponsor when we create a random bipartite graph with the curveball algorithm where: (a) the number of bills sponsored by Senator Booker, by Senator Warren, and by all other senators was fixed, and (b) the number of senators sponsoring each bill was fixed. We can compare their actual number of co-sponsorships, 98, to what is generated under our null model. We can view a histogram of the expected co-sponsorships generated in each of the 1000 trials as follows (see fig. 7.9):

> hist(fdsm$dyad_values, freq = FALSE, xlab = "Number of Co-Sponsorships")
> lines(density(fdsm$dyad_values))
> fdsmbb <- backbone.extract(fdsm, alpha = 0.01, signed = TRUE)

Figure 7.9: A histogram of the expected co-sponsorships between Senators Cory Booker and Elizabeth Warren under the fixed degree sequence model (1000 samples; x-axis: expected number of co-sponsorships under FDSM, y-axis: density). A positive edge between Booker and Warren would be preserved in the FDSM backbone because their actual number of co-sponsorships (98) is statistically significantly larger than expected.

The FDSM backbone, based on 1000 Monte Carlo samples, requires approximately 81 seconds to extract. Using the fixed degree sequence model allows us to see, in fig. 7.10, more of the partisan structure we assume to be present in the United States Senate. This expected partisan structure is confirmed by the backbone’s high modularity (𝑄 = 0.468).

Figure 7.10: The positive backbone of the US Senate co-sponsorship network under the fixed degree sequence model.

The spatial network backbone extracted using FDSM is noticeably different from the other networks extracted using FFM, FRM, FCM, and SDSM.

> fdsm2 <- fdsm(cities, trials = 10000)
> fdsmbb2 <- backbone.extract(fdsm2, alpha = 0.1, signed = FALSE)
> mean(fdsmbb2)
[1] 0.02207414
> sort(rowSums(fdsmbb2), decreasing = TRUE)[1:5]
 KANSAS CITY    CHARLOTTE INDIANAPOLIS     RICHMOND     BORDEAUX
          24           21           20           20           17
> cor(rowSums(fdsmbb2), rowSums(cities))
[1] -0.001015871
> cor(as.vector(fdsmbb2), as.vector(sdsmbb2))
[1] 0.9315762

First, it has a very low density, containing only 2.2% of possible edges. Second, the cities with the highest centrality are medium-sized regional centers. Moreover, cities’ centrality and total firm count are uncorrelated (𝑟 = −0.001), indicating that the FDSM backbone is detecting interaction patterns unrelated to a city’s number of firms. Importantly, the patterns of intercity links in the SDSM and FDSM backbones are highly correlated (𝑟 = 0.93).

The original bipartite firm location data are known to contain substantial variation both in the number of firms in each city (see Figure 7.2A) and in the number of cities where each firm maintains a presence (see Figure 7.2B). Because the FDSM controls for variation in these two characteristics, it is an appropriate model to use for backbone extraction in this case. Using it yields a world city network backbone that contains only those intercity links that are not simply the product of these characteristics. That is, the FDSM backbone allows world city researchers to look beyond these characteristics to identify pairs of cities with unexpectedly large numbers of firm co-locations, which are potentially indicative of unexpectedly strong economic interaction. More generally, the FDSM and the fdsm() function are appropriate when there is variation in both the row sums of B and the column sums of B, which is likely to occur in most empirical bipartite data.
However, although FDSM may often be the most suitable model for empirical data, its simulation-based approach can be impractically slow when applied to bipartite data containing many agents and artifacts. As we’ll see in chapter 8, in such cases the SDSM is the recommended alternative. Additionally, we’ll investigate the relationship between the alpha values used in SDSM and those used in FDSM. In the future, the backbone R package will also be home to additional backbone extraction methods, adding functionality for weighted networks that are not bipartite projections.

CHAPTER 8
COMPARING MODELS FOR BACKBONE EXTRACTION

All results in this chapter are from Neal, Domagalski, and Sagan [NDS21b]. Replication materials are available at https://www.github.com/domagal9/dissertation.

In this chapter we compare the different bipartite ensemble backbone models. We begin by examining different methods for choosing the cell-filling probabilities in SDSM. As mentioned in chapter 7, this study concludes that the Bipartite Configuration Model is the best choice for these values. Having fixed a definition of SDSM, we study its statistical power as compared to the FDSM backbones, again using the world cities network for this analysis. Following this comparison, we evaluate each of the five different models under varying degree distributions, examining their speed, accuracy, similarity, and community detection. The culmination of these studies allows us to recommend that, in general, SDSM is the correct backbone extraction method to use for most bipartite projections.

8.1 Study 1: Choosing cell-filling probabilities for the SDSM

The SDSM requires choosing $p^*_{ik}$, which we want to approximate $\Pr(B^*_{ik} = 1)$ for $\mathbf{B}^* \in \mathcal{B}^{\mathrm{FDSM}}$. There are three types of methods that might be used for doing so: arithmetic, general linear models, and entropy maximization.
First, we can choose $p^*_{ik} = (r_i \times c_k)/f$, where $r_i$ is the sum of entries in row $i$ of B, $c_k$ is the sum of entries in column $k$ of B, and $f$ is the sum of all entries in B. When $p^*_{ik}$ falls outside the [0, 1] range, it is truncated toward 0 or 1, respectively [Got00]. We call this method RCF because the value is chosen based on a row sum, a column sum, and the number of entries of B that are filled with a one.

Second, an estimate can be obtained by fitting a general linear model of the form

$$B_{ik} = \beta_0 + \beta_1 r_i + \beta_2 c_k + \epsilon,$$

or

$$B_{ik} = \beta_0 + \beta_1 r_i + \beta_2 c_k + \beta_3 r_i c_k + \epsilon,$$

where the $\beta$’s are estimated coefficients and $\epsilon$ is an error term. If the model is treated as a linear regression and the coefficients are estimated using ordinary least squares, then the predicted value of $B_{ik}$ is chosen for $p^*_{ik}$, either truncating values outside the required [0, 1] range (linear probability model; LPM) or transforming them into the required range using a linear discriminant model (LDM) [AWvH20]. If the model is treated as a logistic regression and the coefficients are estimated using maximum likelihood, then the predicted probability that $B_{ik} = 1$ is chosen for $p^*_{ik}$. In prior work, the logistic regression approach has used a scobit or logit link function, with or without an interaction term ($\beta_3$) [Nea14, SB20, Nea20].

Finally, an estimate can be obtained by entropy maximization methods, including the polytope method (Poly) [DNS21, NDY22] or the bipartite configuration model (BiCM) [SSDC+17]. In this study, we evaluate the accuracy and speed of these methods for choosing $p^*_{ik}$ that approximate $\Pr(B^*_{ik} = 1)$ for $\mathbf{B}^* \in \mathcal{B}^{\mathrm{FDSM}}$.

8.1.1 Methods

To evaluate accuracy, we begin by enumerating all the members of a small $\mathcal{B}^{\mathrm{FDSM}}$. For example, given an agent degree sequence of [1, 1, 2] and an artifact degree sequence of [1, 1, 2], $\mathcal{B}^{\mathrm{FDSM}}$ contains 5 members (see Table 8.1A).
Second, from this complete enumeration, we compute the probabilities we wish $p^*_{ik}$ to approximate (i.e., $\Pr(B^*_{ik} = 1)$ for $\mathbf{B}^* \in \mathcal{B}^{\mathrm{FDSM}}$, see Table 8.1B). Third, we compute $p^*_{ik}$ using each of nine methods (see Table 8.1C for values obtained using the BiCM method). Finally, we quantify the accuracy with which $p^*_{ik}$ approximates the desired probabilities using the mean absolute difference over all $i, k$. In the example shown in Table 8.1, BiCM’s accuracy for these degree sequences is 0.028. That is, on average $p^*_{ik}$ chosen using BiCM deviates from the desired probabilities by 0.028.

Because evaluating accuracy in this way requires enumerating all members of $\mathcal{B}^{\mathrm{FDSM}}$, it is possible only for short degree sequences that define $\mathcal{B}^{\mathrm{FDSM}}$ with small cardinality. We focus on degree sequences ranging in length from 2 to 5, which define 384 unique $\mathcal{B}^{\mathrm{FDSM}}$ ranging in cardinality from 4 to 2040.

After identifying each method’s accuracy, we evaluate the computational running time of the four most accurate methods by using them to choose $p^*_{ik}$ for bipartite graphs defined by up to 3162 agents and up to 3162 artifacts, and thus requiring up to 10,000,000 probabilities to be chosen.

Table 8.1: SDSM probabilities given agent and artifact degree sequences [1,1,2]

(A) Members of $\mathcal{B}^{\mathrm{FDSM}}$
1 0 0     0 0 1     0 0 1     0 0 1     0 1 0
0 0 1     1 0 0     0 0 1     0 1 0     0 0 1
0 1 1     0 1 1     1 1 0     1 0 1     1 0 1

(B) Desired probabilities     (C) $p^*_{ik}$ computed using BiCM
0.2  0.2  0.6                 0.216  0.216  0.568
0.2  0.2  0.6                 0.216  0.216  0.568
0.6  0.6  0.8                 0.568  0.568  0.863

8.1.2 Results

Figure 8.1A shows the accuracy of each method’s computation of $p^*_{ik}$. Each gray line plots the accuracy of each method for a single $\mathcal{B}^{\mathrm{FDSM}}$, while the red line plots the mean accuracy of each method over all 384 $\mathcal{B}^{\mathrm{FDSM}}$.
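The worked example in Table 8.1 can be reproduced by brute force. The following base R sketch enumerates every 3 × 3 binary matrix, keeps those with row and column sums [1, 1, 2], and computes the desired probabilities and the mean absolute difference. The BiCM values are copied from Table 8.1C rather than recomputed, and the RCF comparison is added only for illustration; its accuracy figure is not reported in the text.

```r
# Enumerate all 3 x 3 binary matrices and keep those whose row sums and
# column sums both equal the degree sequence [1, 1, 2].
deg <- c(1, 1, 2)
cells <- as.matrix(expand.grid(rep(list(0:1), 9)))   # 512 candidate fillings
members <- list()
for (i in seq_len(nrow(cells))) {
  M <- matrix(cells[i, ], nrow = 3)
  if (all(rowSums(M) == deg) && all(colSums(M) == deg)) {
    members[[length(members) + 1]] <- M
  }
}
length(members)   # 5 members, as in Table 8.1A

# Desired probabilities: share of members with a 1 in each cell (Table 8.1B)
desired <- Reduce(`+`, members) / length(members)

# RCF estimate: (row sum x column sum) / fill, truncated to [0, 1]
rcf <- pmin(pmax(outer(deg, deg) / sum(deg), 0), 1)

# BiCM values copied from Table 8.1C (not recomputed here)
bicm <- matrix(c(0.216, 0.216, 0.568,
                 0.216, 0.216, 0.568,
                 0.568, 0.568, 0.863), nrow = 3, byrow = TRUE)

mean(abs(bicm - desired))   # about 0.028, matching the text
mean(abs(rcf - desired))    # about 0.089 for this example
```

Brute-force enumeration grows as 2 to the power of the number of cells, which is why this style of evaluation is feasible only for very short degree sequences.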
We find that choosing $p^*_{ik}$ using a logistic regression with an interaction term (i.e., Scobit-I and Logit-I) is on average least accurate [Nea14, Nea20], while choosing $p^*_{ik}$ using entropy maximization (i.e., BiCM and Poly) is on average most accurate [DNS21, SDCGS15].

Figure 8.1B shows the number of seconds required to compute $p^*_{ik}$ using a 2.3 GHz Intel i7 processor. Among the two most accurate methods, BiCM is several orders of magnitude faster than Poly. When computing more than $10^4$ probabilities, BiCM is also faster than the two slightly less accurate Logit and LDM methods. In the largest case we evaluated, computing $10^7$ probabilities, BiCM took only about 0.3 seconds. Therefore, we use BiCM for choosing $p^*_{ik}$ when extracting SDSM backbones in the remaining studies because it is both the most accurate and the fastest. Previous versions of the R package backbone included different methods for determining these probabilities. However, based on these results, the sdsm() function now uses the BiCM method.

Figure 8.1: (A) Accuracy and (B) speed computing $p^*_{ik}$ using different methods. Panel A plots accuracy (mean absolute difference) by probability estimation method (Logit-I, Scobit-I, RCF, LPM, Scobit, LDM, Logit, Poly, BiCM) for each $\mathcal{B}^{\mathrm{FDSM}}$ and for the mean over all 384; panel B plots seconds to estimate probabilities against the number of probabilities to estimate for Logit, LDM, Poly, and BiCM.

8.2 Study 2: Statistical power of SDSM

Ensemble backbone models require the specification of a statistical significance level 𝛼, which determines how uncommonly large an observed edge weight $P_{ij}$ must be when compared to edge weights $P^*_{ij}$ arising from an ensemble in order for a corresponding edge to be included in the backbone. For a given model, smaller values of 𝛼 represent more stringent criteria for retaining edges, and therefore yield sparser backbones.
Although FDSM and SDSM define their respective ensembles by constraining both agent and artifact degree sequences, and thus aim to yield similar backbones, a given 𝛼 does not necessarily represent the same level of stringency in these two models. Because the SDSM allows variation in the degree sequences of $\mathbf{B}^* \in \mathcal{B}^{\mathrm{SDSM}}$, the distribution of $P^*_{ij}$ is wider. These wider distributions mean that the SDSM provides a more conservative test of edge weight significance than FDSM, or equivalently that the SDSM has less statistical power to detect significant edges than FDSM.

A concrete example serves to illustrate this difference. As in chapter 7, we study the world city network using a bipartite projection where two cities are linked to the extent that firms maintain locations in both cities. Recall that the Globalization and World Cities (GaWC) data set takes the form of a bipartite network recording the presence or absence of 100 firms (artifacts) in 196 cities (agents) in the year 2000 [TCW02, NDS21a]. In this bipartite network, the agent degrees are right-tailed because most cities contain only a few firms, while a few cities such as New York contain many (see fig. 7.2). Likewise, the artifact degrees are also right-tailed because most firms maintain locations in only a few cities, while a few firms such as the accounting firm KPMG maintain locations in many.

Figure 8.2A illustrates the distribution of the Milan-Paris edge weight in projections arising from the $\mathcal{B}^{\mathrm{FDSM}}$ and $\mathcal{B}^{\mathrm{SDSM}}$ of which the observed bipartite network is a member (i.e., the random variable $P^*_{ij}$). These distributions allow a researcher to decide whether Milan and Paris’s observed number of co-located firms is significantly large, and therefore whether Milan and Paris should be connected in a world city network backbone.
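To make the SDSM null distribution of a single edge weight concrete, the sketch below assumes that the cells of $\mathbf{B}^*$ are filled independently with probabilities $p^*_{ik}$ (chapter 6 gives the formal definition). Under that assumption, $P^*_{ij} = \sum_k B^*_{ik} B^*_{jk}$ is a sum of independent Bernoulli variables with success probabilities $p^*_{ik} p^*_{jk}$, which is a Poisson binomial variable. The probabilities, the observed weight, and the helper pb_pmf() are hypothetical and are not taken from the GaWC data or the backbone package.

```r
# Exact Poisson binomial pmf by repeated convolution.
# Returns a vector whose (w + 1)-th entry is Pr(X = w), w = 0, ..., length(q).
pb_pmf <- function(q) {
  pmf <- 1
  for (qk in q) pmf <- c(pmf * (1 - qk), 0) + c(0, pmf * qk)
  pmf
}

# Hypothetical cell-filling probabilities for agents i and j over 6 artifacts
p_i <- c(0.9, 0.8, 0.5, 0.3, 0.2, 0.1)
p_j <- c(0.7, 0.9, 0.4, 0.2, 0.2, 0.3)

pmf <- pb_pmf(p_i * p_j)    # null distribution of the edge weight P*_ij
observed <- 4               # hypothetical observed edge weight P_ij

# Upper-tail probability Pr(P*_ij >= observed)
upper_p <- sum(pmf[(observed + 1):length(pmf)])
```

Because the probability mass function is available exactly, no sampling is needed, which is the source of the SDSM’s speed advantage over the FDSM.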
The SDSM distribution is wider than the FDSM distribution, which has implications for whether the Milan-Paris edge will be included in a backbone extracted at a given significance level using each model. In the observed data, there are 26 firms co-located in Milan and Paris (i.e., $P_{ij} = 26$). The probability of observing the same or larger edge weight in projections from the FDSM ensemble is 0.0033, which is less than 0.05/2, and therefore a Milan-Paris edge is deemed significant by the FDSM and is included in the FDSM backbone extracted at 𝛼 = 0.05. In contrast, the probability of observing the same or larger edge weight in projections from the SDSM ensemble is 0.0275, which is not less than 0.05/2, and therefore a Milan-Paris edge is not deemed significant by the SDSM and is not included in the SDSM backbone extracted at 𝛼 = 0.05.

For a given level of significance 𝛼, this difference in statistical power leads the SDSM backbone to be sparser than the FDSM backbone (density = 0.004 vs. 0.012), and means that these two backbones are dissimilar (Jaccard = 0.36). In this study, we investigate SDSM’s statistical power relative to FDSM, and specifically whether extracting an SDSM backbone using a more liberal (i.e., larger) 𝛼 makes it more similar to an FDSM backbone extracted at 𝛼 = 0.05.

8.2.1 Methods

To evaluate SDSM’s statistical power and the effect of significance levels on the similarity of SDSM and FDSM backbones, we first extracted the FDSM backbone from the GaWC bipartite network at 𝛼 = 0.05. We then extracted several SDSM backbones from the GaWC bipartite network at 0.01 ≤ 𝛼 ≤ 0.3 in 0.001 increments, each time computing the Jaccard index (𝐽) to measure the similarity between the SDSM and FDSM backbones. The Jaccard index is the ratio of the number of edges that the backbones $\mathbf{P}'_{\mathrm{SDSM}}$ and $\mathbf{P}'_{\mathrm{FDSM}}$ have in common to the total number of edges that appear in either backbone.
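The retention rule in the Milan-Paris example and the Jaccard index can both be sketched in a few lines of base R. The helpers keep_edge() and jaccard() and the two small backbones are hypothetical illustrations; the p-values 0.0033 and 0.0275 come from the example above, and 0.12 is used simply as one example of a more liberal significance level.

```r
# Retention rule from the Milan-Paris example: an edge is kept when its
# upper-tail p-value is below alpha / 2.
keep_edge <- function(p_upper, alpha) p_upper < alpha / 2

keep_edge(0.0033, 0.05)   # FDSM p-value at alpha = 0.05: TRUE
keep_edge(0.0275, 0.05)   # SDSM p-value at alpha = 0.05: FALSE
keep_edge(0.0275, 0.12)   # SDSM p-value at a more liberal alpha: TRUE

# Jaccard index of two binary, symmetric backbones:
# shared edges divided by edges present in either backbone.
jaccard <- function(A, B) {
  a <- A[upper.tri(A)] == 1
  b <- B[upper.tri(B)] == 1
  sum(a & b) / sum(a | b)
}

# Two hypothetical 4-node backbones
A <- matrix(0, 4, 4); A[1, 2] <- A[2, 1] <- 1; A[3, 4] <- A[4, 3] <- 1
B <- matrix(0, 4, 4); B[1, 2] <- B[2, 1] <- 1; B[1, 3] <- B[3, 1] <- 1
jaccard(A, B)   # 1 shared edge out of 3 distinct edges
```

Only the upper triangle is compared so that each undirected edge is counted once and the diagonal is ignored.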
After comparing SDSM and FDSM backbones extracted from the empirical GaWC bipartite network, we repeat this process using 100 synthetic bipartite networks with the same dimensions (196 × 100), density (0.08), and right-tailed agent and artifact degree distributions.

8.2.2 Results

The green line in Figure 8.2B shows the Jaccard similarity between an FDSM backbone extracted from the empirical GaWC network at 𝛼 = 0.05 and SDSM backbones extracted at the significance levels shown on the x-axis. We find that an SDSM backbone achieves its maximum similarity to the FDSM backbone (𝐽 = 0.81) when it is extracted using the more liberal significance level of 𝛼 = 0.12. Returning to the example in Figure 8.2A, using this more liberal significance level would result in the Milan-Paris edge being deemed significant and included in the SDSM backbone because its SDSM p-value satisfies 0.0275 < 0.12/2. Because this more liberal significance level results in the inclusion of additional edges, the new SDSM backbone extracted at 𝛼 = 0.12 has a density (0.01) that is closer to that of the FDSM backbone extracted at 𝛼 = 0.05 (0.012).

The purple line in Figure 8.2B shows the mean Jaccard similarity between an FDSM backbone extracted using 𝛼 = 0.05 and SDSM backbones extracted using 0.01 ≤ 𝛼 ≤ 0.3 from 100 bipartite networks generated to resemble the empirical GaWC network. The shaded purple region shows the 10th and 90th percentiles of the Jaccard similarities of these backbones. We find that these synthetic networks behave similarly to the empirical network. Specifically, SDSM and FDSM backbones extracted from a low-density 196 × 100 bipartite network with right-tailed degree distributions achieve a maximum similarity of 0.49 < 𝐽 < 0.76 when the FDSM backbone is extracted using 𝛼 = 0.05 and the SDSM backbone is extracted using 𝛼 = 0.14.
This is promising because it suggests that, given the characteristics of an empirical bipartite network, it may be possible to select a significance level for extracting a computationally efficient SDSM backbone that closely resembles a computationally infeasible FDSM backbone.

Figure 8.2: Statistical power of SDSM. (A) Distribution of weights for the Paris-Milan edge in projections derived from FDSM and SDSM ensembles (x-axis: firms co-located in Milan and Paris, $P^*_{ij}$, with the observed $P_{ij} = 26$ marked; y-axis: probability). (B) Similarity of an FDSM backbone extracted at 𝛼 = 0.05 to SDSM backbones extracted at various 𝛼 from an empirical bipartite network (green line) and from 100 synthetic bipartite networks (purple line = mean, purple region = 10th to 90th percentile); the x-axis is the SDSM significance level (𝛼) and the y-axis is the Jaccard similarity of $\mathbf{P}'_{\mathrm{SDSM}}$ and $\mathbf{P}'_{\mathrm{FDSM}}$.

8.3 Study 3: Backbone equivalence under varying degree distributions

Agent and artifact degree distributions are a key feature of a bipartite network, and are known to have implications for bipartite projections [VFO20, DNS21, NDS21a]. The FDSM is particularly appealing because it allows decisions about the significance of edges in a projection to be conditioned on both bipartite degree sequences, thereby taking into account these important features. However, because the computational requirements of the FDSM make it impractical for extracting the backbone from most bipartite projections, it is often necessary to use a different backbone model. In this study, we evaluate the equivalence of an FDSM backbone and backbones extracted using more computationally efficient models. We perform this comparison for backbones extracted from bipartite networks characterized by five types of degree distributions: right-tailed, left-tailed, normal, constant, and uniform.
For the sake of concreteness, in this section we use the example of a bipartite network in which authors (agents) are linked to the papers they have written (artifacts). The projection of such a network yields a co-authorship network in which the edge weight between a pair of authors indicates their number of co-authored papers [New01]. These edge weight values will depend heavily on the distribution of papers written by authors (i.e., the agent degree sequence), and on the distribution of authors on each paper (i.e., the artifact degree sequence). Different degree distributions describe different kinds of scholarly environments, as shown in Table 8.2.

Table 8.2: Bipartite degree distributions, with examples in the context of a scholarly authorship bipartite network

Right-tailed ∼ 𝛽(1, 10)
  Authors (agents): Most write some papers, but a few are prolific (most departments).
  Papers (artifacts): Most papers are sole-authored, but some are written by large teams (e.g., sociology).

Left-tailed ∼ 𝛽(10, 1)
  Authors (agents): Most are prolific, but some are inactive (elite departments).
  Papers (artifacts): Most papers are written by large teams, but some are sole-authored (e.g., physics).

Uniform ∼ 𝛽(1, 1)
  Authors (agents): There is substantial diversity in scholarly output (e.g., interdisciplinary departments).
  Papers (artifacts): There is substantial diversity in the size of authorship teams (e.g., an entire university).

Constant ∼ 𝛽(10000, 10000)
  Authors (agents): There are strong norms about how many papers an author should have (e.g., for performance evaluations).
  Papers (artifacts): There are strong norms about how many authors a paper should have (e.g., a senior author and a junior author).

Normal ∼ 𝛽(10, 10)
  Authors (agents): Scholarly output varies around some typical level.
  Papers (artifacts): Authorship teams vary around some typical size.

The choice of a backbone model affects whether these distributions are considered, and in this example affects whether decisions about the significance of two authors’ number of co-authored papers consider the scholarly environment.
The FDSM compares their observed number of co-authored papers to the number that might be observed in alternative realizations of the same environment, while other backbone models relax the extent to which the environment is held constant.

8.3.1 Methods

We evaluate similarities among the backbones extracted using different models by comparing backbones extracted from synthetic 100 × 100 bipartite networks with a density of 0.1, and with a combination of the agent and artifact degree distributions shown in Table 8.2. Following our example, these synthetic bipartite networks might represent a college of 100 faculty who collectively wrote 100 papers, in a particular type of scholarly environment where each individual had a 10% chance of being an author on each paper. After generating a bipartite network with a given size, density, and degree distributions, we extract five different backbones from the generated bipartite network, using the fixed fill model, fixed row model, fixed column model, stochastic degree sequence model, and fixed degree sequence model; in all cases we use 𝛼 = 0.05. We compute the similarity of the first four backbones to the FDSM backbone using a Jaccard index, repeating this process 100 times for each of the 25 possible combinations of agent and artifact degree distributions.

8.3.2 Results

The heatmaps in Figure 8.3 illustrate the similarity between an FDSM backbone and a backbone extracted using an alternative model. The rows of each heatmap correspond to different agent degree distributions, and the columns correspond to different artifact degree distributions, in the synthetic bipartite networks from which the backbones were extracted. The lightest patches identify conditions under which a given backbone model yields a backbone that is similar to what would be obtained using the computationally costly FDSM, while darker patches identify conditions under which these two backbones differ.
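To give a feel for the five degree-distribution shapes in Table 8.2, the following base R sketch draws 100 values from each Beta distribution and summarizes them. It is an illustration only: this section does not describe how such draws are converted into degree sequences of a 100 × 100 network with density 0.1, so the sketch stops at the shapes. The seed and the helper skew() are hypothetical.

```r
set.seed(42)

# Beta shape parameters from Table 8.2
shapes <- list(right    = c(1, 10),
               left     = c(10, 1),
               uniform  = c(1, 1),
               constant = c(10000, 10000),
               normal   = c(10, 10))

# 100 draws per distribution, one per agent (or artifact)
draws <- lapply(shapes, function(s) rbeta(100, s[1], s[2]))

# Sample skewness: positive for a right tail, negative for a left tail
skew <- function(x) mean((x - mean(x))^3) / sd(x)^3

round(sapply(draws, mean), 3)   # location of each distribution
round(sapply(draws, sd), 3)     # spread; near zero for the "constant" case
round(sapply(draws, skew), 2)   # direction of the tail
```

The very large shape parameters in the constant case concentrate nearly all mass at 0.5, which is why that distribution behaves as if every agent or artifact had the same degree.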
We find that when agent degrees are constant (i.e., every agent has the same degree) and artifact degrees are constant or left-tailed, all backbone models yield the same backbone as FDSM (Mean 𝐽 = 1). However, beyond this special case, which is likely to be rare in empirical data, similarity to FDSM-extracted backbones varies. As expected, the similarity of backbones extracted using FRM and FDSM depends primarily on the distribution of artifact degrees, not agent degrees (see Figure 8.3B). For example, for any agent degree distribution, these two models yield very different backbones when artifact degrees follow a right-tailed distribution (Mean 𝐽 = 0.186), but very similar backbones when artifact degrees follow a normal distribution (Mean 𝐽 = 0.863). This occurs because both models exactly control for agent degrees; however, FDSM also controls for artifact degrees, while FRM does not.

[Figure 8.3: four heatmap panels, (A) Fixed Fill Model, (B) Fixed Row Model, (C) Fixed Column Model, (D) Stochastic Degree Sequence Model. Rows: agent degree distribution; columns: artifact degree distribution (right-tailed, uniform, constant, left-tailed, normal). Color scale: Jaccard index from 0.00 to 1.00.]
Figure 8.3: Jaccard similarity of a backbone extracted at 𝛼 = 0.05 using the Fixed Degree Sequence Model and a backbone extracted using (A) the Fixed Fill Model, (B) Fixed Row Model, (C) Fixed Column Model, (D) Stochastic Degree Sequence Model. Each cell represents the mean over 100 instances of a 100 × 100 bipartite network with given agent and artifact degree distributions.

A similar but rotated pattern emerges when considering the FCM: the similarity of backbones extracted using FCM and FDSM depends primarily on the distribution of agent degrees, not artifact degrees (see Figure 8.3C). For any artifact degree distribution, these two models yield very different backbones when agent degrees follow a right-tailed or uniform distribution (Mean 𝐽 = 0.084), but more similar backbones when agent degrees follow a left-tailed distribution or are constant (Mean 𝐽 = 0.617). This occurs because both models exactly control for artifact degrees; however, FDSM also controls for agent degrees, while FCM does not. However, there is a notable exception to this general pattern: when artifact degrees follow a uniform distribution, FCM and FDSM always yield different backbones (Mean 𝐽 = 0.151).

The conditions under which the FFM yields FDSM-similar backbones occur at the intersection of the conditions under which the FRM and FCM both yield FDSM-equivalent backbones (see Figure 8.3A). When artifact degrees follow a right-tailed distribution and/or the agent degrees follow a right-tailed or uniform distribution, then FFM and FDSM backbones differ (Mean 𝐽 = 0.1). In contrast, for other combinations of degree distributions, FFM and FDSM backbones are more similar (Mean 𝐽 = 0.724).

Finally, as expected based on the findings from study 2, we observe that the SDSM generally yields different backbones than FDSM when both are extracted at 𝛼 = 0.05 (see Figure 8.3D). Specifically, except in the narrow case where agent degrees are constant and artifact degrees are constant or left-tailed (Mean 𝐽 = 1), SDSM and FDSM backbones exhibit only modest similarity (Mean 𝐽 = 0.314). This lack of similarity or equivalence occurs because SDSM offers a less statistically powerful (or more conservative) test of edges’ statistical significance than FDSM, and therefore retains fewer edges in the backbone.
However, findings from study 2 also suggested that careful selection of the significance level used for extracting an SDSM backbone can yield results more similar to FDSM. To explore this possibility, we expanded the analysis reported in Figure 8.3D by extracting SDSM backbones at different significance levels. We find that when a suitably more liberal (i.e., larger) significance level 𝛼 is used to extract an SDSM backbone, the resulting SDSM backbone is very similar to an FDSM backbone extracted at 𝛼 = 0.05 (see Figure 8.4A). Specifically, for backbones extracted from bipartite networks with any agent or artifact degree distributions, these two backbones tend to be nearly equivalent (Mean 𝐽 = 0.865). This suggests that in principle the fast SDSM can be used to obtain a close approximation of a computationally infeasible FDSM backbone from any bipartite network.

In practice, using SDSM to obtain an FDSM-like backbone requires selecting an 𝛼 value for the SDSM that corresponds to 𝛼 = 0.05 in the FDSM. We observe that there are three distinct values of such an ‘optimal’ 𝛼 that depend on agent and artifact degree distributions (see Figure 8.4B). First, when agent degrees are constant, a value only slightly higher than 0.05 (Mean = 0.062, SD = 0.021) achieves the best approximation of an FDSM backbone.
Second, when artifact degrees are constant, a value roughly double the conventional level (Mean = 0.09, SD = 0.022) achieves the best approximation of an FDSM backbone. Finally, when neither agent nor artifact degrees are constant, which is likely in most empirical bipartite networks, a value roughly 2.5 times larger than the conventional level (Mean = 0.13, SD = 0.014) achieves the best approximation of an FDSM backbone. Although further work is needed to facilitate the a priori selection of an 𝛼 that allows an SDSM backbone to closely approximate an FDSM backbone, these results suggest that under the most common circumstances (i.e., when there is variation in degrees) 𝛼 ≈ 0.13 may be appropriate.

[Figure 8.4: two heatmap panels, (A) Similarity of SDSM (𝛼 = optimal) and FDSM (𝛼 = 0.05), color scale: Jaccard index from 0.00 to 1.00; (B) SDSM 𝛼 to maximize similarity with FDSM (𝛼 = 0.05), color scale: optimal 𝛼 from 0.050 to 0.150. Rows: agent degree distribution; columns: artifact degree distribution.]
Figure 8.4: (A) An SDSM backbone extracted at the optimal significance level is very similar to the corresponding FDSM backbone extracted at 𝛼 = 0.05, where (B) the optimal level is the statistical significance level 𝛼 that maximizes this similarity for the given agent and artifact degree distributions.

8.4 Study 4: Recovery of community structure

Studies 1-3 examine the backbones extracted from synthetic random bipartite networks; however, empirical bipartite networks are generally not random, but instead have a clustered or blocked structure. In this study, we evaluate the extent to which backbones extracted using different models reflect a known community structure that is encoded in the bipartite data from which they are extracted [CWW18].
As shown in chapter 7, SDSM and FDSM backbones extracted from a bipartite network representing bill co-sponsorship in the 114th session of the US Senate more clearly captured the known partisan community structure than an FRM backbone [DNS21]. For the sake of concreteness, we use this legislative network context as an example in this section, but we extend this prior work by considering a broader range of backbone models, and by examining their ability to recover community structures from bipartite data containing varying levels of evidence for this structure.

8.4.1 Methods

We investigate the ability of backbones to recover a known community structure in three steps. First, we simulate a 200 × 1000 bipartite network with a density of 0.1 and right-tailed agent and artifact degree distributions. We focus on a bipartite network with more artifacts than agents to ensure that these data contain sufficient information to encode potential community memberships. We focus on a bipartite network with right-tailed degree distributions because they are common in many empirical unipartite [BC19] and bipartite networks [Nea20, NDS21a, AABB11]. Similar to the Senate data set we examined in chapter 7, this synthetic bipartite network could represent a legislative body composed of 200 legislators casting votes on 1000 bills, where any given legislator had a 10% chance of voting in favor of any given bill. The right-tailed degree distributions capture the fact that most legislators vote in favor of only a few bills, and that most bills receive the support of only a few legislators, which is typical of legislative bodies. The backbone of a projection of such a bipartite network would represent a network of collaboration or ideological alignment among legislators [Nea20].

Second, we incorporate evidence of communities in this bipartite network by randomly assigning each agent and each artifact to one of two groups.
We then perform checkerboard swaps, which preserve the degree distributions, until a given fraction of edges 𝑊 are within-group, connecting an agent and artifact from the same group [GSPA07]. Figure 8.5A provides graphical depictions of the matrices describing synthetic bipartite networks at two values of 𝑊. In each plot, the rows represent agents assigned to group A or B, the columns represent artifacts assigned to group A or B, and a cell is shaded black if the row agent is connected to the column artifact. When 𝑊 = 0.5, agents in a given group are equally likely to associate with artifacts in either group, placing ≈ 0.5 of the edges (i.e., shaded cells) in the diagonal blocks and ≈ 0.5 of the edges in the off-diagonal blocks. In contrast, when 𝑊 = 0.8, agents in a given group are much more likely to associate with artifacts from their own group than artifacts in the other group, placing ≈ 0.8 of the edges in the diagonal blocks and ≈ 0.2 of the edges in the off-diagonal blocks. Returning to our example, the groups could represent political parties: each legislator belongs to one of two parties (i.e., there are conservative and liberal legislators), and each bill advances the agenda of one of these parties (i.e., there are conservative and liberal bills). When 𝑊 = 0.5, a conservative legislator is equally likely to vote for conservative and liberal bills, while when 𝑊 = 0.8, a conservative legislator is four times more likely to vote for a conservative bill than a liberal bill.

Finally, we extract a backbone from the bipartite network using a given model and compute the backbone’s modularity 𝑄 with respect to the agents’ group assignments [NG04]. If a backbone model is able to recover the community structure from evidence in the bipartite network, then we expect a positive association between 𝑊 and 𝑄.
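The second step above can be sketched in Python as follows. This is an illustrative sketch, not the code used in the study: the text specifies only that checkerboard swaps are performed until a fraction 𝑊 of edges is within-group, so the greedy rule used here (keep a random swap only if it does not lower 𝑊) is our assumption, and all function names and the toy matrix are hypothetical.

```python
import random


def within_group_fraction(B, agent_group, artifact_group):
    """Fraction W of edges joining an agent and an artifact in the same group."""
    edges = within = 0
    for i, row in enumerate(B):
        for k, cell in enumerate(row):
            if cell:
                edges += 1
                if agent_group[i] == artifact_group[k]:
                    within += 1
    return within / edges if edges else 0.0


def checkerboard_swap(B, i, j, k, l):
    """Flip rows i, j and columns k, l if they form a checkerboard.

    A checkerboard is [[1, 0], [0, 1]] or [[0, 1], [1, 0]]; flipping one into
    the other leaves every row sum and column sum unchanged.
    """
    a, b, c, d = B[i][k], B[i][l], B[j][k], B[j][l]
    if a == d and b == c and a != b:
        B[i][k], B[i][l] = b, a
        B[j][k], B[j][l] = d, c
        return True
    return False


def increase_block_structure(B, agent_group, artifact_group, target_w, rng,
                             max_tries=10_000):
    """Assumed greedy scheme: keep random swaps that do not lower W."""
    w = within_group_fraction(B, agent_group, artifact_group)
    for _ in range(max_tries):
        if w >= target_w:
            break
        i, j = rng.sample(range(len(B)), 2)
        k, l = rng.sample(range(len(B[0])), 2)
        if checkerboard_swap(B, i, j, k, l):
            new_w = within_group_fraction(B, agent_group, artifact_group)
            if new_w < w:
                checkerboard_swap(B, i, j, k, l)  # undo: flipping twice restores B
            else:
                w = new_w
    return w


# Hypothetical toy network: 4 agents and 4 artifacts, two groups each.
rng = random.Random(0)
agent_group = [0, 0, 1, 1]
artifact_group = [0, 0, 1, 1]
B = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
]
rows_before = [sum(row) for row in B]
cols_before = [sum(col) for col in zip(*B)]
w_before = within_group_fraction(B, agent_group, artifact_group)
w_after = increase_block_structure(B, agent_group, artifact_group, 0.75, rng)
```

Because each accepted swap only rearranges a 2 × 2 checkerboard, the agent and artifact degree sequences after the procedure equal `rows_before` and `cols_before`, which is the property the text relies on.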
In the legislative example, if legislators are bipartisan in their voting patterns (i.e., 𝑊 = 0.5), then legislators should not be clustered by party in the backbone (i.e., 𝑄 ≈ 0). In contrast, if legislators are strongly partisan in their voting patterns (i.e., 𝑊 = 0.8), then legislators should be clustered by party in the backbone (i.e., 𝑄 ≈ 0.5). We repeat these three steps 10 times for 0.5 ≤ 𝑊 ≤ 0.8 in 0.05 increments. When evaluating the SDSM backbone, we consider both a backbone extracted using the conventional significance level of 𝛼 = 0.05 and one extracted at the more liberal 𝛼 = 0.13, which study 3 suggests yields a backbone similar to FDSM.

8.4.2 Results

Figure 8.5B shows the modularity (y-axis; with respect to known community memberships) of backbones extracted using different models from bipartite networks with different fractions of within-community edges (x-axis). All six lines increase monotonically, confirming that all backbone models yield backbones that can recover a known community structure. However, there is notable variation among the models. As evidence of community structure grows stronger in the bipartite network, the modularity of backbones extracted using the FFM and FCM slowly increases, but even when the evidence of such a structure is quite strong (i.e., when 𝑊 = 0.8) they only achieve average values of 𝑄 = 0.15 and 0.18, respectively. Backbones extracted using the FRM display a similar pattern, but achieve a higher average modularity (𝑄 = 0.39) when 𝑊 is large.

[Figure 8.5: (A) matrix plots of synthetic bipartite networks at 𝑊 = 0.5 and 𝑊 = 0.8, with agents (rows) and artifacts (columns) arranged by Group A and Group B; (B) line plot of modularity (𝑄, from −0.1 to 0.5) against the fraction of within-group edges (𝑊, from 0.50 to 0.80) for six backbone models: FDSM, SDSM (𝛼 = 0.13), SDSM (𝛼 = 0.05), FRM, FCM, and FFM; (C) two backbone network plots.]
Figure 8.5: (A) Synthetic bipartite networks with varying levels of block structure, from which (B) backbones extracted using different models exhibit varying modularity. (C) When 65% of bipartite edges are within-block, a backbone extracted using FDSM shows a clear group structure (top) while a backbone extracted using FCM does not (bottom).

In contrast, backbones extracted using FDSM and SDSM are virtually indistinguishable in their ability to recover the known community structure, and do so very well. As evidence of a community structure grows stronger in the bipartite network, the modularity of backbones extracted using these models rapidly increases. When the evidence of community structure is strong (i.e., when 𝑊 = 0.8), these backbones have very high modularity (mean 𝑄 = 0.49). However, even when there is only modest evidence of community structure in the bipartite network (e.g., when 𝑊 = 0.65), these backbones are still able to identify the community structure and have a distinctively high modularity (mean 𝑄 = 0.37).

Figure 8.5C illustrates the difference between two backbone models’ abilities to recover a known community structure, when evidence of that structure is modest in the bipartite data from which the backbone is extracted (𝑊 = 0.65). In both plots, agents from group A (e.g., conservatives, in the legislative example) are colored red, while agents from group B (e.g., liberals, in the legislative example) are colored blue. The FDSM-extracted backbone clearly places agents from different groups in separate clusters. In contrast, the FCM-extracted backbone is unable to distinguish this group structure and fails to cluster agents according to their known group memberships. These findings suggest that although all backbone models can yield backbones that recover a known community structure, SDSM and FDSM backbones are able to detect this structure more clearly and from a weaker signal.
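The modularity values reported above measure how strongly a backbone's edges fall within the known groups. As a minimal sketch, assuming the standard Newman-Girvan form for an unweighted, undirected graph [NG04] (the fraction of within-group edges minus the fraction expected from the degrees alone), it can be computed as follows. The function name `modularity` and the toy graph are hypothetical, and we do not claim this matches the exact implementation used to produce Figure 8.5B.

```python
def modularity(n_nodes, edges, groups):
    """Newman-Girvan modularity Q of an undirected, unweighted graph."""
    m = len(edges)
    if m == 0:
        return 0.0
    degree = [0] * n_nodes
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Observed fraction of edges whose endpoints share a group.
    observed = sum(1 for u, v in edges if groups[u] == groups[v]) / m
    # Expected fraction under random wiring with the same degrees:
    # the sum over groups of (total group degree / 2m) squared.
    group_degree = {}
    for node, group in enumerate(groups):
        group_degree[group] = group_degree.get(group, 0) + degree[node]
    expected = sum((d / (2 * m)) ** 2 for d in group_degree.values())
    return observed - expected


# Hypothetical backbone: two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
groups = ["A", "A", "A", "B", "B", "B"]

q = modularity(6, edges, groups)
```

In this toy graph six of the seven edges are within-group and each group holds half of the total degree, so 𝑄 = 6/7 − 1/2. Placing every node in one group gives 𝑄 = 0, which corresponds to a backbone with no recoverable group structure.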
8.5 Recommendations for Backbone Selection

Bipartite networks can be used to represent a wide range of phenomena in the social and natural worlds including interspecies competition, global trade, scientific advances, and legislative deliberation. Likewise, projections of bipartite networks, which take the form of co-occurrence networks, can be useful for inferring unipartite networks that would otherwise be difficult to measure directly. Several models have been proposed for extracting the backbone of bipartite projections, and thus for making such inferences, including the fixed fill model (FFM), fixed row model (FRM), fixed column model (FCM), fixed degree sequence model (FDSM), and stochastic degree sequence model (SDSM). We introduced each of these models and found their probability mass functions, where known, in chapter 6. To facilitate their use, in chapter 7 we described the R package backbone, in which each model is implemented. We then systematically compared these models in terms of their relative accuracy, speed, statistical power, similarity, and ability to recover a known community structure in chapter 8.

In study 1, we examined several methods for choosing the probabilities necessary for applying the stochastic degree sequence model (SDSM), finding that the bipartite configuration model (BiCM) is both the fastest and most accurate. In study 2, we examined the statistical power of the SDSM relative to the fixed degree sequence model (FDSM), finding that the SDSM can be viewed as a statistically less powerful (or more conservative) variant of the FDSM. In study 3, we examined the similarity of an FDSM-extracted backbone to backbones extracted using other models, finding that the SDSM and FDSM extract very similar backbones from bipartite networks with a wide range of possible degree distributions when an appropriate significance level 𝛼 is chosen.
Finally, in study 4, we examined the ability of backbones extracted using different models to recover a known community structure, finding that although all models can recover the structure, SDSM and FDSM can detect a community structure more clearly and from a weaker signal.

Based on these findings, and with the goal of offering researchers some guidance in extracting the backbones of bipartite projections, we offer three recommendations. First, we recommend the stochastic degree sequence model (SDSM) for extracting the backbones of bipartite projections because it is fast, controls for both agent and artifact degree sequences, and yields modular backbones when the bipartite data contain even modest evidence of within-community clustering. Second, when the SDSM is used, we recommend that the cell-filling probabilities $p^*_{ik}$ be chosen using the Bipartite Configuration Model (BiCM) because it is faster and more accurate than any other currently available method. Third, when an FDSM backbone extracted at the 𝛼 = 0.05 significance level is desired but computationally infeasible, we recommend extracting an SDSM backbone at the 𝛼 = 0.13 significance level, which we observe yields a very similar backbone when there is variation in the agent and artifact degree sequences. The models and options necessary to adopt these recommendations are implemented in the backbone package for R [DNS21].

These findings and recommendations must be viewed in light of the fact that, due to the computational requirements of the FDSM and of extracting a large number of backbones across the four studies, these studies have relied on small synthetic bipartite networks ranging in size from 3 × 3 (study 1) to 200 × 1000 (study 4). However, in practice bipartite networks may be several orders of magnitude larger.
For example, a bipartite network used to infer collaborations in the US House of Representatives includes 435 agents (representatives) and over 6000 artifacts (bills) [Nea20, NDY22], while a bipartite network used to infer movie recommendations includes 17,770 agents (films) and nearly 500,000 artifacts (viewers) [ZK11]. Future research should explore whether these findings extend to backbones extracted from such large bipartite networks.

Limitations of existing backbone models also point to directions for future research. First, using the FDSM will generally be computationally infeasible in practice because the distribution of $P^*_{ij}$ arising from $\mathcal{B}^{\mathrm{FDSM}}$ must be estimated via numerical simulation. Identifying this distribution’s probability mass function, which is known for the other ensembles (as discussed in chapter 6), would facilitate the use of this otherwise attractive model; however, this is a well-studied problem and so is probably very hard to solve. Second, all the ensemble models we have considered impose constraints on the degree sequences, but other types of constraints may also be useful. For example, in some contexts it may be necessary to constrain all members of an ensemble to contain a 0 in a particular cell (e.g., to represent that an author was not alive to co-author a paper, or a legislator was not present to co-sponsor a bill). These limitations and future directions notwithstanding, the results presented above provide a starting point for further development of backbone models, and provide applied researchers with some practical guidance on model selection.

BIBLIOGRAPHY

[AABB11] Yong-Yeol Ahn, Sebastian E Ahnert, James P Bagrow, and Albert-László Barabási. Flavor network and the principles of food pairing. Scientific Reports, 1(1):1–7, 2011.
[AGRR20] Ron M. Adin, Ira M. Gessel, Victor Reiner, and Yuval Roichman. Cyclic quasi-symmetric functions. Sém. Lothar. Combin., 82B:Art. 67, 12, 2020.
[ALH+15a] C.
Andris, D. Lee, M. J. Hamilton, M. Martino, C. E. Gunning, and J. A. Selden. The rise of partisanship and super-cooperators in the US House of Representatives. PloS One, 10:e0123507, 2015.
[ALH+15b] Clio Andris, David Lee, Marcus J Hamilton, Mauro Martino, Christian E Gunning, and John Armistead Selden. The rise of partisanship and super-cooperators in the US House of Representatives. PloS One, 10(4):e0123507, 2015.
[AN20a] S. Aref and Z. P. Neal. Detecting coalitions by optimally partitioning signed networks of political collaboration. Scientific Reports, 10:1506, 2020.
[AN20b] Samin Aref and Zachary P. Neal. Detecting coalitions by optimally partitioning signed networks of political collaboration. Scientific Reports, 10(1):1–10, 2020.
[And87] Désiré André. Solution directe du problème résolu par M. Bertrand. CR Acad. Sci. Paris, 105(436):7, 1887.
[AWvH20] Paul Allison, R. A. Williams, and P. von Hippel. Better predicted probabilities from linear probability models with applications to multiple imputation. 2020 Stata Conference 1, Stata Users Group, 2020.
[BBMD+02] Cyril Banderier, Mireille Bousquet-Mélou, Alain Denise, Philippe Flajolet, Danièle Gardy, and Dominique Gouyou-Beauchamps. Generating functions for generating trees. Discrete Math., 246:29–55, 2002. Formal power series and algebraic combinatorics (Barcelona, 1999).
[BBPS15] Sara Billey, Krzysztof Burdzy, Soumik Pal, and Bruce E. Sagan. On meteors, earthworms and WIMPs. Ann. Appl. Probab., 25(4):1729–1779, 2015.
[BBS13] Sara Billey, Krzysztof Burdzy, and Bruce E. Sagan. Permutations with given peak set. J. Integer Seq., 16(6):Article 13.6.1, 18, 2013.
[BC19] Anna D Broido and Aaron Clauset. Scale-free networks are rare. Nature Communications, 10(1):1–10, 2019.
[BCS+17] Christopher R Browning, Catherine A Calder, Brian Soller, Aubrey L Jackson, and Jonathan Dirlam. Ecological networks and neighborhood social organization. American Journal of Sociology, 122(6):1939–1988, 2017.
[BDS+20] Amanda N Buerger, David T Dillon, Jordan Schmidt, Tao Yang, Jasenka Zubcevic, Christopher J Martyniuk, and Joseph H Bisesi Jr. Gastrointestinal dysbiosis following diethylhexyl phthalate exposure in zebrafish (Danio rerio): Altered microbial diversity, functionality, and network connectivity. Environmental Pollution, 265:114496, 2020.
[Ber87] J. Bertrand. Solution d’un problème. CR Acad. Sci. Paris, 105:369, 1887.
[BFT16] Sara Billey, Matthew Fahrbach, and Alan Talmage. Coefficients and roots of peak polynomials. Exp. Math., 25(2):165–175, 2016.
[BM03] Mireille Bousquet-Mélou. Four classes of pattern-avoiding permutations under one roof: generating trees with two labels. Electron. J. Combin., 9(2):Research paper 19, 31, 2002/03. Permutation patterns (Otago, 2003).
[Bón04] Miklós Bóna. Combinatorics of permutations. Discrete Mathematics and its Applications (Boca Raton). Chapman & Hall/CRC, Boca Raton, FL, 2004.
[BR11] K. A. Bratton and S. M. Rouse. Networks in the legislative arena: How group dynamics affect cosponsorship. Legislative Studies Quarterly, 36:423–460, 2011.
[BR17] Pierre-Alexandre Balland and David Rigby. The geography of complex knowledge. Economic Geography, 93(1):1–23, 2017.
[BS00] Eric Babson and Einar Steingrímsson. Generalized permutation patterns and a classification of the Mahonian statistics. Sém. Lothar. Combin., 44:Art. B44b, 18, 2000.
[BST99] Jonathan V Beaverstock, Richard G Smith, and Peter J Taylor. A roster of world cities. Cities, 16(6):445–458, 1999.
[Cal02] David Callan. Pattern avoidance in circular permutations. Preprint arXiv:0210014, 2002.
[Car15] C. J. Carstens. Proof of uniform sampling of binary matrices with fixed row sums and column sums for the fast curveball algorithm. Physical Review E, 91(4), Apr 2015.
[CGHK78] F. R. K. Chung, R. L. Graham, V. E. Hoggatt, Jr., and M. Kleiman. The number of Baxter permutations. J. Combin. Theory Ser. A, 24(3):382–394, 1978.
[CN06] Gabor Csardi and Tamas Nepusz.
The igraph software package for complex network research, 2006.
[CP87] G. A. Caldeira and S. C. Patterson. Political friendship in the legislature. The Journal of Politics, 49:953–975, 1987.
[CVDLO+17] Francis Castro-Velez, Alexander Diaz-Lopez, Rosa Orellana, José Pastrana, and Rita Zevallos. The number of permutations with the same peak set for signed permutations. J. Comb., 8(4):631–652, 2017.
[CW90] Don Coppersmith and Shmuel Winograd. Matrix multiplication via arithmetic progressions. Journal of Symbolic Computation, 9:251–280, 1990.
[CWW18] Tristan JB Cann, Iain S Weaver, and Hywel TP Williams. Is it correct to project and detect? Assessing performance of community detection on unipartite projections of bipartite networks. In International Conference on Complex Networks and their Applications, pages 267–279. Springer, 2018.
[CXW+18] Xing Chen, Di Xie, Lei Wang, Qi Zhao, Zhu-Hong You, and Hongsheng Liu. BNPMDA: Bipartite network projection for miRNA–disease association prediction. Bioinformatics, 34(18):3178–3186, 2018.
[DDJ+12] Theodore Dokos, Tim Dwyer, Bryan P. Johnson, Bruce E. Sagan, and Kimberly Selsor. Permutation patterns and statistics. Discrete Math., 312(18):2760–2775, 2012.
[Dia75] Jared M Diamond. Assembly of species communities. In M. L. Cody and J. M. Diamond, editors, Ecology and evolution of communities, pages 342–444. Harvard University Press, 1975.
[Dia16] Navid Dianati. Unwinding the hairball graph: Pruning algorithms for weighted complex networks. Physical Review E, 93(1):012304, 2016.
[DL16] Ben Derudder and Xingjian Liu. How international is the annual meeting of the Association of American Geographers? A social network analysis perspective. Environment and Planning A, 48(2):309–329, 2016.
[DLHH+21] Alexander Diaz-Lopez, Pamela E. Harris, Isabella Huang, Erik Insko, and Lars Nilsen. A formula for enumerating permutations with a fixed pinnacle set. Discrete Math., 344(6):112375, 2021.
[DLHIO17a] Alexander Diaz-Lopez, Pamela E.
Harris, Erik Insko, and Mohamed Omar. A proof of the peak polynomial positivity. Sém. Lothar. Combin., 78B:Art. 6, 9, 2017.
[DLHIO17b] Alexander Diaz-Lopez, Pamela E. Harris, Erik Insko, and Mohamed Omar. A proof of the peak polynomial positivity conjecture. J. Combin. Theory Ser. A, 149:21–29, 2017.
[DLHIPL17] Alexander Diaz-Lopez, Pamela E. Harris, Erik Insko, and Darleen Perez-Lavin. Peak sets of classical Coxeter groups. Involve, 10(2):263–290, 2017.
[DLIN21] Alexander Diaz-Lopez, Erik Insko, and Lars Nilsen. Pinnacle ordering. In preparation, 2021.
[DLM+21a] Rachel Domagalski, Jinting Liang, Quinn Minnich, Bruce E. Sagan, Jamie Schmidt, and Alexander Sietsema. Cyclic pattern containment and avoidance. arXiv:2106.02534 [math], Jun 2021. arXiv: 2106.02534.
[DLM+21b] Rachel Domagalski, Jinting Liang, Quinn Minnich, Bruce E. Sagan, Jamie Schmidt, and Alexander Sietsema. Cyclic shuffle compatibility. arXiv:2106.10182 [math], Jun 2021. arXiv: 2106.10182.
[DLM+21c] Rachel Domagalski, Jinting Liang, Quinn Minnich, Bruce E. Sagan, Jamie Schmidt, and Alexander Sietsema. Pinnacle set properties. arXiv:2105.10388 [math], May 2021. arXiv: 2105.10388.
[DMSK15] B. A. Desmarais, V. G. Moscardelli, B. F. Schaffner, and M. S. Kowal. Measuring legislative collaboration: The Senate press events network. Social Networks, 40:43–54, 2015.
[DNKPT18] Robert Davis, Sarah A. Nelson, T. Kyle Petersen, and Bridget E. Tenner. The pinnacle set of a permutation. Discrete Math., 341(11):3249–3270, 2018.
[DNS20] Rachel Domagalski, Zachary P. Neal, and Bruce Sagan. backbone: Extracts the Backbone from Weighted Graphs, 2020. R package version 1.2.0.
[DNS21] Rachel Domagalski, Zachary P Neal, and Bruce Sagan. Backbone: An R package for extracting the backbone of bipartite projections. PloS One, 16(1):e0244363, 2021.
[Dru16] L. Drutman. American politics has reached peak polarization, 2016.
[DT05] Ben Derudder and Peter Taylor. The cliquishness of world cities.
Global Networks, 5(1):71–91, 2005.
[ES35] P. Erdős and G. Szekeres. A combinatorial problem in geometry. Compositio Math., 2:463–470, 1935.
[ES21] Sergi Elizalde and Bruce Sagan. Consecutive patterns in circular permutations. 2021.
[Fan21] Wenjie Fang. Efficient recurrence for the enumeration of permutations with fixed pinnacle set. arXiv:2106.09147 [math], Jun 2021. arXiv: 2106.09147.
[FNT21] Justine Falque, Jean-Christophe Novelli, and Jean-Yves Thibon. Pinnacle sets revisited. arXiv:2106.05248 [math], Jun 2021. arXiv: 2106.05248.
[Fon20] Christian Fong. Expertise, networks, and interpersonal influence in Congress. The Journal of Politics, 82(1):269–284, 2020.
[Fow06a] J. H. Fowler. Connecting the Congress: A study of cosponsorship networks. Political Analysis, 14:456–487, 2006.
[Fow06b] J. H. Fowler. Legislative cosponsorship networks in the US House and Senate. Social Networks, 28:454–465, 2006.
[Fri86] John Friedmann. The world city hypothesis. Development and Change, 17(1):69–83, 1986.
[GL04] Jean-Loup Guillaume and Matthieu Latapy. Bipartite structure of all complex networks. Information Processing Letters, 90(5):215–221, 2004.
[GLW18] Daniel Gray, Charles Lanning, and Hua Wang. Pattern containment in circular permutations. Integers, 18B:Paper No. A4, 13, 2018.
[GLW19] Daniel Gray, Charles Lanning, and Hua Wang. Patterns in colored circular permutations. Involve, 12(1):157–169, 2019.
[Got00] Nicholas J Gotelli. Null model analysis of species co-occurrence patterns. Ecology, 81(9):2606–2621, 2000.
[GSPA07] Roger Guimera, Marta Sales-Pardo, and Luís A Nunes Amaral. Module identification in bipartite and directed networks. Physical Review E, 76(3):036102, 2007.
[HBKM09] Emilie M Hafner-Burton, Miles Kahler, and Alexander H Montgomery. Network analysis for international relations. International Organization, pages 559–592, 2009.
[HFC16] Eelke M Heemskerk, Meindert Fennema, and William K Carroll.
The global corporate elite after the financial crisis: evidence from the transnational network of interlocking directorates. Global Networks, 16(1):68–88, 2016.
[HKBH07] César A Hidalgo, Bailey Klinger, A-L Barabási, and Ricardo Hausmann. The product space conditions the development of nations. Science, 317(5837):482–487, 2007.
[Hol79] Sture Holm. A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, pages 65–70, 1979.
[Ing15] Christopher Ingraham. A stunning visualization of our divided Congress. Washington Post, Apr 2015.
[Jac61] Jane Jacobs. The death and life of great American cities. Random House, 1961.
[Kir11] J. H. Kirkland. The relational determinants of legislative outcomes: Strong and weak ties between legislators. The Journal of Politics, 73:887–898, 2011.
[KK96] Daniel Kessler and Keith Krehbiel. Dynamics of cosponsorship. The American Political Science Review, 90(3):555–566, 1996.
[KMN16] G. Koger, S. Masket, and H. Noel. No disciplined army: American political parties as networks. In J. N. Victor, A. H. Montgomery, and M. Lubell, editors, The Oxford Handbook of Political Networks, chapter 18, pages 453–470. Oxford University Press, Oxford, 2016.
[Kre00] Darla Kremer. Permutations with forbidden subsequences and a generalized Schröder number. Discrete Math., 218(1-3):121–130, 2000.
[LCH06] Geoffrey C Layman, Thomas M Carsey, and Juliana Menasce Horowitz. Party polarization in American politics: Characteristics, causes, and consequences. Annu. Rev. Polit. Sci., 9:83–110, 2006.
[LD20] Chengliang Liu and Dezhong Duan. Spatial inequality of bus transit dependence on urban streets and its relationships with socioeconomic intensities: A tale of two megacities in China. Journal of Transport Geography, 86:102768, 2020.
[LMDV08] Matthieu Latapy, Clémence Magnien, and Nathalie Del Vecchio. Basic notions for the analysis of large two-mode networks. Social Networks, 30(1):31–48, 2008.
[LR16] J. Liebig and A. Rao.
Fast extraction of the backbone of projected bipartite networks to aid community detection. Europhysics Letters, 113(2):28003, 2016. [MLLS21] Federico Marini, Annekathrin Ludt, Jan Linke, and Konstantin Strauch. Gene- tonic: an r/bioconductor package for streamlining the interpretation of rna-seq data. bioRxiv, 2021. [MM13] J. Moody and P. J. Mucha. Portrait of political party polarization. Network Science, 1:119–121, 2013. [MSLC01] M. McPherson, L. Smith-Lovin, and J. M. Cook. Birds of a feather: Homophily in social networks. Annual Review of Sociology, 27:415–444, 2001. [NDS21a] Z. P. Neal, R. Domagalski, and B. Sagan. Analysis of spatial networks from bipartite projections using the R backbone package. Geographical Analysis, 2021. [NDS21b] Zachary P. Neal, Rachel Domagalski, and Bruce Sagan. Comparing models for extracting the backbone of bipartite projections. arXiv:2105.13396 [cs, stat], Jun 2021. arXiv: 2105.13396. [NDY22] Zachary P Neal, Rachel Domagalski, and Xiaoqin Yan. Homophily in collaborations among us house representatives, 1981–2018. Social Networks, 68:97–106, 2022. [Nea08] Zachary P. Neal. The duality of world cities and firms: comparing networks, hierarchies, and inequalities in the global economy. Global Networks, 8(1):94–115, 2008. [Nea12] Zachary P. Neal. Structural determinism in the interlocking world city network. Geographical Analysis, 44(2):162–170, 2012. [Nea13] Zachary P. Neal. Identifying statistically significant edges in one-mode projections. Social Network Analysis and Mining, 3(4):915–924, Dec 2013. [Nea14] Zachary P. Neal. The backbone of bipartite projections: Inferring relationships from co-authorship, co-sponsorship, co-attendance and other co-behaviors. Social Networks, 39:84–97, Oct 2014. [Nea20] Zachary P. Neal. A sign of the times? Weak and strong polarization in the us congress, 1973–2016. Social Networks, 60:103–112, 2020. 135 [New01] Mark EJ Newman. Scientific collaboration networks. I. 
Network construction and fundamental results. Physical Review E, 64(1):016131, 2001. [NG04] Mark EJ Newman and Michelle Girvan. Finding and evaluating community structure in networks. Physical review E, 69(2):026113, 2004. [NN20] Zachary P. Neal and Jennifer W Neal. Out of bounds? The boundary specification problem for centrality in psychological networks, Aug 2020. [NP03] Mark EJ Newman and Juyong Park. Why social networks are different from other types of networks. Physical review E, 68(3):036122, 2003. [PMNW05] M. A. Porter, P. J. Mucha, M. E. Newman, and C. M. Warmbrand. A network analysis of committees in the us house of representatives. Proceedings of the National Academy of Sciences, 102:7057–7062, 2005. [Poo18] Ate Poorthuis. How to draw a neighborhood? the potential of big data, regionaliza- tion, and community detection for understanding the heterogeneous nature of urban neighborhoods. Geographical Analysis, 50(2):182–203, 2018. [PWK19] Vladimír Pažitka, Dariusz Wójcik, and Eric Knight. Critiquing construct validity in world city network research: Moving from office location networks to inter- organizational projects in the modeling of intercity business flows. Geographical Analysis, 2019. [R C18] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2018. [RNH13] N. Ringe, Victor J. N., and Gross J. H. Keeping your friends close and your enemies closer? information networks in legislative politics. British Journal of Political Science, 43:601–628, 2013. [RT] Irena Rusu and Bridget E. Tenner. Admissible pinnacle orderings. Preprint arXiv:2001.08185. [Rus20] Irena Rusu. Sorting permutations with fixed Pinnacle set. Electron. J. Combin., 27(3):Paper No. 3.23, 21, 2020. [Sag20] Bruce E. Sagan. Combinatorics: The art of counting, volume 210 of Graduate Studies in Mathematics. American Mathematical Society, Providence, RI, 2020. [San00] James G Sanderson. Testing ecological patterns. 
American Scientist, 88(4):332, 2000. [SB20] David Schoch and Ulrik Brandes. Legislators’ roll-call voting behavior increasingly corresponds to intervals in the political spectrum. Scientific Reports, 10(1):1–9, 2020. 136 [SBV09] M Ángeles Serrano, Marián Boguná, and Alessandro Vespignani. Extracting the multiscale backbone of complex weighted networks. Proceedings of the National Academy of Sciences, 106(16):6483–6488, 2009. [SCS17] Mika J Straka, Guido Caldarelli, and Fabio Saracco. Grand canonical validation of the bipartite international trade network. Physical Review E, 96(2):022306, 2017. [SDCGS15] Fabio Saracco, Riccardo Di Clemente, Andrea Gabrielli, and Tiziano Squartini. Randomizing bipartite networks: the case of the world trade web. Scientific Reports, 5(1):1–18, 2015. [SNB+ 14] Giovanni Strona, Domenico Nappo, Francesco Boccacci, Simone Fattorini, and Jesus San-Miguel-Ayanz. A fast and unbiased procedure to randomize ecological binary matrices with fixed row and column totals. Nature Communications, 5:4114, Jun 2014. [SR12] Christian Stegbauer and Alexander Rausch. How international are international congresses? Connections, 32(1):1–11, 2012. [SS85] Rodica Simion and Frank W. Schmidt. Restricted permutations. European J. Combin., 6(4):383–406, 1985. [SSDC+ 17] Fabio Saracco, Mika J Straka, Riccardo Di Clemente, Andrea Gabrielli, Guido Caldarelli, and Tiziano Squartini. Inferring monopartite projections of bipartite networks: An entropy-based approach. New Journal of Physics, 19(5):053022, 2017. [Sta97] Richard P. Stanley. Enumerative Combinatorics. Vol. 1, volume 49 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge, 1997. With a foreword by Gian-Carlo Rota, Corrected reprint of the 1986 original. [Sta99] Richard P. Stanley. Enumerative Combinatorics. Vol. 2, volume 62 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge, 1999. 
With a foreword by Gian-Carlo Rota and appendix 1 by Sergey Fomin. [Ste97] John R. Stembridge. Enriched 𝑃-partitions. Trans. Amer. Math. Soc., 349(2):763– 788, 1997. [SUG18] Giovanni Strona, Werner Ulrich, and Nicholas J. Gotelli. Bi-dimensional null model analysis of presence-absence binary matrices. Ecology, 99(1):103–115, 2018. [Tay01] Peter J Taylor. Specification of the world city network. Geographical analysis, 33(2):181–194, 2001. [Tay04] Peter J Taylor. World city network: a global urban analysis. Routledge, 2004. [TCW02] Peter J Taylor, Gilda Catalano, and David RF Walker. Measurement of the world city network. Urban Studies, 39(13):2367–2376, 2002. 137 [TML+ 11] Michele Tumminello, Salvatore Miccichè, Fabrizio Lillo, Jyrki Piilo, and Rosario N. Mantegna. Statistically validated networks in bipartite complex systems. PLoS One, 6(3):e17994, Mar 2011. [Tol21] Jeff Tollefson. Tracking QAnon: How Trump turned conspiracy-theory research upside down. Nature, 2021. [USG20] USGPO. govinfo – Bulk Data - Bill Status. United States Government Publishing Office (GPO), 2020. [VFO20] Demival Vasques Filho and Dion R. J. O’Neale. Transitivity and degree assortativity explained: The bipartite structure of social networks. Phys. Rev. E, 101:052305, May 2020. [VMND16] Michiel Van Meeteren, Zachary P. Neal, and Ben Derudder. Disentangling ag- glomeration and network externalities: A conceptual typology. Papers in Regional Science, 95(1):61–80, 2016. [Wes95] Julian West. Generating trees and the Catalan and Schröder numbers. Discrete Math., 146(1-3):247–262, 1995. [Wes96] Julian West. Generating trees and forbidden subsequences. In Proceedings of the 6th Conference on Formal Power Series and Algebraic Combinatorics (New Brunswick, NJ, 1994), volume 157, pages 363–374, 1996. [XCB20] Wenna Xi, Catherine A Calder, and Christopher R Browning. Beyond activity space: Detecting communities in ecological networks. Annals of the American Association of Geographers, pages 1–20, 2020. 
[ZH05] Bin Zhang and Steve Horvath. A general framework for weighted gene co-expression network analysis. Statistical Applications in Genetics and Molecular Biology, 4(1), 2005. [ZK11] Katharina Anna Zweig and Michael Kaufmann. A systematic approach to the one-mode projection of bipartite graphs. Social Network Analysis and Mining, 1(3):187–218, Jul 2011. 138