TEACHERS IN SOCIAL MEDIA: A DATA SCIENCE PERSPECTIVE
                                By
                           Hamid Karimi
                        A DISSERTATION
                            Submitted to
                    Michigan State University
            in partial fulfillment of the requirements
                         for the degree of
           Computer Science – Doctor of Philosophy
                               2021


                                             ABSTRACT
              TEACHERS IN SOCIAL MEDIA: A DATA SCIENCE PERSPECTIVE
                                                  By
                                            Hamid Karimi
Social media has become an integral part of human life in the 21st century. The number of social
media users was estimated to be around 3.6 billion individuals in 2020. Social media platforms
(e.g., Facebook) have facilitated interpersonal communication, diffusion of information, the creation
of groups and communities, to name a few. As far as education systems are concerned, online
social media has transformed and connected traditional social networks within the schoolhouse to a
broader and expanded world outside. In such an expanded virtual space, teachers engage in various
activities within their communities, e.g., exchanging instructional resources, seeking new teaching
methods, engaging in online discussions. Therefore, given the importance of teachers in social
media and its tremendous impact on PK-12 education, in this dissertation, we investigate teachers
in social media from a data science perspective. Our investigation in this direction is essentially
an interdisciplinary endeavor bridging modern data science and education. In particular, we have
made three contributions, as briefly discussed in the following.
    Current teachers in social media studies suffice to a small number of surveyed teachers while
thousands of other teachers are on social media. This hinders us from conducting large-scale data-
driven studies pertinent to teachers in social media. Aiming to overcome this challenge and further
facilitate data-driven studies related to teachers in social media, we propose a novel method that
automatically identifies teachers on Pinterest, an image-based social media popular among teachers.
In this framework, we formulate the teacher identification problem as a positive unlabelled (PU)
learning where positive samples are surveyed teachers, and unlabelled samples are their online
friends. Using our framework, we build the largest dataset of teachers on Pinterest.
    With this dataset at our disposal, we perform an exploratory analysis of teachers on Pinterest
while considering their genders. Our analysis incorporates two crucial aspects of teachers in social


media. First, we investigate various online activities of male and female teachers, e.g., topics and
sources of their curated resources, the professional language employed to describe their resources.
Second, we investigate male and female teachers in the context of the social network (the graph)
they belong to, e.g., investigating structural centrality, gender homophily. Our analysis and findings
in this part of the dissertation can serve as a valuable reference for many entities concerned with
teachers’ gender, e.g., principals, state, and federal agencies.
     Finally, in the third part of the dissertation, we shed light on the diffusion of teacher-curated
resources on Pinterest. First, we introduce three measures to characterize the diffusion process.
Then, we investigate these three measures while considering two crucial characteristics of a re-
source, e.g., the topic and the source. Ultimately, we investigate how teacher attributes (e.g., the
number of friends) affect the diffusion of their resources. The conducted diffusion analysis is the
first of its kind and offers a deeper understating of the complex mechanism driving the diffusion of
resources curated by teachers on Pinterest.


To my wife, daughter, parents, siblings, and entire family for their love and support.
                                          iv


                                   ACKNOWLEDGEMENTS
First and foremost, I would like to thank Dr. Jiliang Tang, my Ph.D. advisor, for his support
and encouragement during my Ph.D. He helped me with numerous skills and provided me with
countless professional and academic opportunities, e.g., writing a research paper, writing a grant
proposal, polishing a novel idea, mentoring undergrad and grad students, job interview skills,
conducting interdisciplinary research, managing a research lab to just a few. I feel honored and
lucky to have been his Ph.D. student. He tirelessly helped me to be a better scholar and prepared
me for my future job.
    I want to extend my gratitude to my other Ph.D. committee members: Dr. Pang-Ning Tan, Dr.
Arun Ross, Dr. Kenneth Frank, and Dr. Kaitlin Torphy, for their insightful comments and feedback.
I had a chance to collaborate with Dr. Tan’s lab in 2018 on a joint project about compromised
account detection. This collaboration was through Dr. Courtland VanDam – the Ph.D. student of
Dr. Tan at the time- whom I am also thankful for our fruitful collaboration. Dr. Tan’s expertise
in data mining has been a great source of help both in our joint project and this dissertation. Dr.
Ross has had an instrumental role in improving the quality of this dissertation by providing me
with great comments. His invaluable expertise has been constructive and inspiring. I met Dr.
Kenneth Frank and Dr. Kaitlin Torphy through the Teachers in Social Media (TISM) project– an
interdisciplinary project founded by Dr. Torphy wherein I have been leading computational efforts
since Fall 2018. Dr. Frank has been an extraordinary mentor for me. He taught me how to think
like a social scientist. He helped me hone my data science skills to answer educational research
questions. I have continuously utilized the precious expertise of Dr. Frank on social network
analysis and education science during my Ph.D. and especially in this dissertation. His endless
support during my Ph.D. has had a significant impact on my professional and academic growth,
for which I am always grateful. Finally, I am thankful for all the support I have received from Dr.
Torphy. Throughout our close collaboration in the TISM project, I have learned a lot from her. Her
support has prepared me for being better equipped as an interdisciplinary researcher, especially for
                                                v


applying data science techniques to education.
    I joined the Data Science and Engineering (DSE) Lab at the end of the Summer 2017 semester.
During my Ph.D., I have had the pleasure and fortune of having supportive and encouraging
friends and colleagues from the DSE lab. I thank all the fantastic members of the DSE lab. In
particular, I want to thank my dear friend, Dr. Tyler Derr, who has been highly supportive during
our collaborations in the DSE lab. Moreover, I want to thank several outstanding undergraduate
students that I have had a chance to mentor, including Liyang Ye, Aaron Brookhouse, and Xochitl
Weiss. Finally, I am thankful to all my collaborators from outside the DSE Lab. In particular, I
want to express my special gratitude to the leading social scientist, Dr. H. Russell Bernard, who
has been an incredible mentor for me.
    What I am today owed to my parents: my dear and kind mom, Farast Hossein Panahi, and
my wonderful late father, Esmaeil Karimi. I am eternally grateful for their unconditional love and
support. Also, I am thankful for the love and support I have received from my incredible wife,
Nazanin Donyapour, who is not only a wife but also a dear friend. Moreover, I am thankful to my
older brother, Omid Karimi, for his support during the years of my education. Furthermore, I am
also thankful for the support from my in-laws, especially my late mother-in-law, whose kindness
was immeasurable. Finally, again, I would like to thank my entire family, especially my amazing
and stunning siblings.
                                                 vi


                                 TABLE OF CONTENTS
LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     x
LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   xi
CHAPTER 1 INTRODUCTION . . . . . . . . . . . .            . . . . . . . . . . . . . . . . . . .  1
   1.1 Contributions . . . . . . . . . . . . . . . . . .  . . . . . . . . . . . . . . . . . . .  4
       1.1.1 Automatic Teacher Identification . . . .     . . . . . . . . . . . . . . . . . . .  4
       1.1.2 Teacher Gender Analysis . . . . . . . .      . . . . . . . . . . . . . . . . . . .  4
       1.1.3 Diffusion of Teacher-curated Resources       . . . . . . . . . . . . . . . . . . .  5
   1.2 Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  6
CHAPTER 2 FOUNDATIONS AND PRELIMINARIES                     . . . . . . . . . . . . . . . . . .  7
   2.1 Current Studies on Teachers in Social Media . . .    . . . . . . . . . . . . . . . . . .  7
       2.1.1 Facebook-driven Studies . . . . . . . . .      . . . . . . . . . . . . . . . . . .  7
       2.1.2 Twitter-driven Studies . . . . . . . . . . .   . . . . . . . . . . . . . . . . . .  9
       2.1.3 Pinterest-driven Studies . . . . . . . . . .   . . . . . . . . . . . . . . . . . . 12
   2.2 Data Collection . . . . . . . . . . . . . . . . . .  . . . . . . . . . . . . . . . . . . 14
       2.2.1 Pinterest Data Description . . . . . . . .     . . . . . . . . . . . . . . . . . . 14
       2.2.2 The Surveyed Teachers . . . . . . . . . .      . . . . . . . . . . . . . . . . . . 18
       2.2.3 Network of Teachers . . . . . . . . . . .      . . . . . . . . . . . . . . . . . . 19
CHAPTER 3 AUTOMATIC TEACHER IDENTIFICATION . . . .                      . . . . . . . . . . . . 22
   3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
   3.2 Problem Statement . . . . . . . . . . . . . . . . . . . . . .    . . . . . . . . . . . . 24
   3.3 The Proposed Framework (PUTeacher) . . . . . . . . . . .         . . . . . . . . . . . . 25
       3.3.1 Unsupervised Representation Learning . . . . . . . .       . . . . . . . . . . . . 26
       3.3.2 Automatic User Labeling . . . . . . . . . . . . . . .      . . . . . . . . . . . . 27
       3.3.3 User Classification . . . . . . . . . . . . . . . . . .    . . . . . . . . . . . . 28
   3.4 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . .  . . . . . . . . . . . . 28
       3.4.1 Experimental Settings . . . . . . . . . . . . . . . . .    . . . . . . . . . . . . 29
               3.4.1.1 Dataset . . . . . . . . . . . . . . . . . . .    . . . . . . . . . . . . 29
               3.4.1.2 Input Features . . . . . . . . . . . . . . .     . . . . . . . . . . . . 30
               3.4.1.3 Hyperparameter Tuning . . . . . . . . . .        . . . . . . . . . . . . 31
       3.4.2 Verification of Two-stage PU Learning Assumptions          . . . . . . . . . . . . 31
       3.4.3 Separability . . . . . . . . . . . . . . . . . . . . . .   . . . . . . . . . . . . 32
       3.4.4 Smoothness . . . . . . . . . . . . . . . . . . . . . .     . . . . . . . . . . . . 33
       3.4.5 Baseline Comparison . . . . . . . . . . . . . . . . .      . . . . . . . . . . . . 34
   3.5 Resiliency Analysis . . . . . . . . . . . . . . . . . . . . . .  . . . . . . . . . . . . 36
       3.5.1 Input Perturbation . . . . . . . . . . . . . . . . . . .   . . . . . . . . . . . . 36
       3.5.2 Imbalance in Number of Pins . . . . . . . . . . . . .      . . . . . . . . . . . . 38
       3.5.3 Teacher Filtering Parameter Analysis . . . . . . . . .     . . . . . . . . . . . . 40
                                              vii


      3.5.4   Applying PUTeacher to Unlabelled Users . . . . . . . . . . . . . . . . .        . 42
      3.5.5   State Representativeness . . . . . . . . . . . . . . . . . . . . . . . . . .    . 46
              3.5.5.1 The U.S. State Distribution . . . . . . . . . . . . . . . . . . . .     . 46
              3.5.5.2 The U.S. State Generalization of PUTeacher . . . . . . . . . .          . 47
              3.5.5.3 The U.S. State Distribution of Automatically Identified Teachers        . 48
CHAPTER 4 GENDER ANALYSIS OF TEACHERS ON SOCIAL MEDIA                           . . . . . . . . 50
  4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  . . . . . . . . 50
  4.2 Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
      4.2.1 Employing Automatic Teacher Identification . . . . . . . . .        . . . . . . . . 53
      4.2.2 Gender Identification . . . . . . . . . . . . . . . . . . . . .     . . . . . . . . 55
      4.2.3 Privacy Concerns . . . . . . . . . . . . . . . . . . . . . . .      . . . . . . . . 57
  4.3 Online Activity Analysis . . . . . . . . . . . . . . . . . . . . . . .    . . . . . . . . 58
      4.3.1 Resource Curation Rate . . . . . . . . . . . . . . . . . . . .      . . . . . . . . 58
      4.3.2 Topic of Pins . . . . . . . . . . . . . . . . . . . . . . . . .     . . . . . . . . 60
              4.3.2.1 Top Topics . . . . . . . . . . . . . . . . . . . . .      . . . . . . . . 60
              4.3.2.2 Topic Entropy . . . . . . . . . . . . . . . . . . .       . . . . . . . . 63
              4.3.2.3 Topic Oscillation . . . . . . . . . . . . . . . . . .     . . . . . . . . 67
      4.3.3 Domain of Pins . . . . . . . . . . . . . . . . . . . . . . . .      . . . . . . . . 71
              4.3.3.1 Top Domains . . . . . . . . . . . . . . . . . . . .       . . . . . . . . 72
              4.3.3.2 Domain Entropy . . . . . . . . . . . . . . . . . .        . . . . . . . . 76
              4.3.3.3 Domain Oscillation . . . . . . . . . . . . . . . . .      . . . . . . . . 80
      4.3.4 Language of Resources . . . . . . . . . . . . . . . . . . . .       . . . . . . . . 85
      4.3.5 Resource Curation Over Time . . . . . . . . . . . . . . . .         . . . . . . . . 88
              4.3.5.1 Days of the Week . . . . . . . . . . . . . . . . . .      . . . . . . . . 89
              4.3.5.2 Months of the Year . . . . . . . . . . . . . . . . .      . . . . . . . . 90
              4.3.5.3 Days of the Month . . . . . . . . . . . . . . . . .       . . . . . . . . 92
  4.4 Social Network Analysis . . . . . . . . . . . . . . . . . . . . . . .     . . . . . . . . 94
      4.4.1 Distribution of Connections . . . . . . . . . . . . . . . . .       . . . . . . . . 95
      4.4.2 Centrality . . . . . . . . . . . . . . . . . . . . . . . . . . .    . . . . . . . . 98
      4.4.3 Gender Homophily . . . . . . . . . . . . . . . . . . . . . .        . . . . . . . . 99
CHAPTER 5 DIFFUSION OF TEACHER-CURATED RESOURCES ON SOCIAL MEDIA104
  5.1 Dataset: Diffusion Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
  5.2 Characterizing Diffusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
      5.2.1 Volume . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
      5.2.2 Virality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
      5.2.3 Velocity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
  5.3 Diffusion Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
      5.3.1 Distribution of Diffusion Measures . . . . . . . . . . . . . . . . . . . . . . 109
      5.3.2 Resource Attributes and Diffusion Measures . . . . . . . . . . . . . . . . . 112
              5.3.2.1 Topic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
              5.3.2.2 Domain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
      5.3.3 Teacher Attributes and Diffusion . . . . . . . . . . . . . . . . . . . . . . . 117
                                              viii


CHAPTER 6 CONCLUSION AND FUTURE DIRECTIONS . . . . . . . . . . . . . . . . 123
   6.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
   6.2 Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
APPENDIX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
                                              ix


                                        LIST OF TABLES
Table 2.1: Pin-related fields in our dataset. . . . . . . . . . . . . . . . . . . . . . . . . . . 16
Table 2.2: Board-related fields in our dataset. . . . . . . . . . . . . . . . . . . . . . . . . . 17
Table 2.3: User-related fields in our dataset. . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Table 2.4: Some of the statistics of the Pinterest network. . . . . . . . . . . . . . . . . . . . 20
Table 3.1: PU learning terminology and its equivalents in our automatic teacher identifi-
           cation task. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Table 3.2: Samples used in training, evaluating, and testing PUTeacher’s components.
           Ann: Annotated, Auto: Automatically identified, Surv: Surveyed . . . . . . . . 30
Table 3.3: Comparing PUTeacher with baseline methods. . . . . . . . . . . . . . . . . . . 35
Table 4.1: Basic statistics of our constructed dataset of male and female teachers. . . . . . . 57
Table 5.1: Some statistics of the introduced diffusion measures of the constructed diffu-
           sion trees. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
Table 5.2: Regression analysis results of predicting volume using teacher attributes. . . . . 119
Table 5.3: Regression analysis results of predicting the virality using teacher attributes.    . . 119
Table 5.4: Regression analysis results of predicting the average re-pin time using teacher
           attributes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
Table 5.5: Regression analysis results of predicting the first re-pin time using teacher
           attributes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
                                                  x


                                        LIST OF FIGURES
Figure 1.1: An overview of the research contributions presented in this dissertation. . . . .        3
Figure 2.1: An example of Pinterest newsfeed. . . . . . . . . . . . . . . . . . . . . . . . . 15
Figure 2.2: An example of a pin and its original source on the web. . . . . . . . . . . . . . 15
Figure 2.3: An example of a Pinterest user’s page. . . . . . . . . . . . . . . . . . . . . . . 16
Figure 2.4: The distribution of grade levels of the surveyed teachers. . . . . . . . . . . . . . 19
Figure 2.5: The number of surveyed teachers across five U.S. states. . . . . . . . . . . . . . 20
Figure 2.6: The CCDF of the degrees for the surveyed teachers and their online friends .
             x-axes are in log scale. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Figure 2.7: The CCDF of the number of pins for the surveyed teachers and their online
             friends. x-axis is in log scale. . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Figure 3.1: An illustration of the proposed method for automatic teacher identification
             (PUTeacher). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Figure 3.2: t-SNE visualization of teacher and non-teacher embeddings for the verification
             of the separability assumption. . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Figure 3.3: The fitted regression line between pairwise embedding distances and teacher
             scores differences for the verification of the smoothness assumption. . . . . . . 33
Figure 3.4: Perturbing the input features using the Gaussian noise. . . . . . . . . . . . . . . 36
Figure 3.5: The CCDF of the number of pin for unlabelled users. x-axis is in log scale. . . . 38
Figure 3.6: The ROC curves of training PUTeacher on four ranges of the number of pins.
             Numbers in the parentheses are AUC scores. . . . . . . . . . . . . . . . . . . . 39
Figure 3.7: Sensitivity analysis of the hyperparamter 𝛼. . . . . . . . . . . . . . . . . . . . 40
Figure 3.8: Sensitivity analysis of the hyperparamter 𝛽. . . . . . . . . . . . . . . . . . . . 41
Figure 3.9: The top 10 topics of pins of unlabelled users classified by PUTeacher. . . . . . 42
Figure 3.10: The top 10 words of pin descriptions of unlabelled users classified by PUTeacher. 43
                                                   xi


Figure 3.11: The top 10 domains of pins of unlabelled users classified by PUTeacher. . . . . 44
Figure 3.12: The distribution of the U.S. states for users in our dataset. . . . . . . . . . . . . 46
Figure 3.13: The ROC curves of PUTeacher’s performance for three levels of state repre-
             sentativeness. Numbers in the parentheses are AUC scores. . . . . . . . . . . . 47
Figure 3.14: The distribution of the U.S. states of automatically identified teachers. . . . . . 48
Figure 4.1: An overall illustration of our proposed automatic teacher identification ap-
             proach (PUTeacher) presented in Chapter 3. . . . . . . . . . . . . . . . . . . . 53
Figure 4.2: The number of identified teachers for different values of threshold 𝜏. . . . . . . 54
Figure 4.3: The CCDF of the number of pins and boards. x-axes are in log scale. . . . . . . 58
Figure 4.4: The CCDF of re-pins and non-repins. x-axes are in log scale. . . . . . . . . . . 59
Figure 4.5: The average proportion of topics for male and female teachers. . . . . . . . . . 61
Figure 4.6: Average proportion of topics of non-repins for male and female teachers. . . . . 62
Figure 4.7: The CCDF of the topic entropy (Eq. 4.1). . . . . . . . . . . . . . . . . . . . . . 63
Figure 4.8: The topic entropy based on the number of pins. . . . . . . . . . . . . . . . . . . 64
Figure 4.9: The topic entropy for male teachers across three distinct ranges of the numbers
             of pins. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
Figure 4.10: The topic entropy for female teachers across three distinct ranges of the
             numbers of pins. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
Figure 4.11: The CCDF of the topic oscillation (Eq. 4.2). . . . . . . . . . . . . . . . . . . . 68
Figure 4.12: The topic oscillation based on the number of pins. . . . . . . . . . . . . . . . . 68
Figure 4.13: The topic oscillation for male teachers across three distinct ranges of the
             numbers of pins. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
Figure 4.14: The topic oscillation for female teachers across three distinct ranges of the
             numbers of pins. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
Figure 4.15: A summary of the topic entropy and the topic oscillation for male and female
             teachers (values are median in ranges). . . . . . . . . . . . . . . . . . . . . . . 71
                                                  xii


Figure 4.16: The top 20 domains of pins for male teachers. . . . . . . . . . . . . . . . . . . 73
Figure 4.17: The distribution of the top 20 domains for male teachers across topics (𝐷𝑇 𝑚 ). . 74
Figure 4.18: The top 20 domains of pins for female teachers. . . . . . . . . . . . . . . . . . 75
Figure 4.19: The distribution of the top 20 domains for female teachers across topics (𝐷𝑇 𝑓 ).     76
Figure 4.20: An example of an educational pin curated from youtube.com. . . . . . . . . . . 77
Figure 4.21: The CCDF of the domain entropy (Eq. 4.3). . . . . . . . . . . . . . . . . . . . 78
Figure 4.22: The domain entropy based on the number of pins. . . . . . . . . . . . . . . . . 78
Figure 4.23: The domain entropy for male teachers across three distinct ranges of the
             numbers of pins. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
Figure 4.24: The domain entropy for female teachers across three distinct ranges of the
             numbers of pins. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
Figure 4.25: The CCDF of the domain oscillation (Eq. 4.4).       . . . . . . . . . . . . . . . . . 81
Figure 4.26: The domain oscillation based on the number of pins. . . . . . . . . . . . . . . . 81
Figure 4.27: The domain oscillation for male teachers across three distinct ranges of the
             numbers of pins. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
Figure 4.28: The domain oscillation for female teachers across three distinct ranges of the
             numbers of pins. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
Figure 4.29: A summary of the domain entropy and the domain oscillation for male and
             female teachers (values are median in ranges). . . . . . . . . . . . . . . . . . . 85
Figure 4.30: The top 30 words of pin descriptions for male and female teachers. . . . . . . . 85
Figure 4.31: The top 30 words of board names for male and female teachers. . . . . . . . . . 86
Figure 4.32: Similarity of the top-k pin-related word lists using Rank-biased Overlap (RBO).       87
Figure 4.33: The average percentage of pin curations on each day of the week for male and
             female teachers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
Figure 4.34: The average percentage of board curations on each day of the week for male
             and female teachers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
                                                 xiii


Figure 4.35: The average percentage of pin curations in each month of the year for male
             and female teachers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
Figure 4.36: The average percentage of board curations in each month of the year for male
             and female teachers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
Figure 4.37: The average percentage of pin curations in each day of the month for male
             and female teachers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
Figure 4.38: The average percentage of board curations in each day of the month for male
             and female teachers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
Figure 4.39: The CCDF of number of connections for male and female teachers. x-axes
             are in log-scale. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
Figure 4.40: The CCDF of number of connections based on their types for each gender
             group separately. x-axes are in log-scale. . . . . . . . . . . . . . . . . . . . . . 96
Figure 4.41: Regression plots of the number and the number of followees. . . . . . . . . . . 97
Figure 4.42: The CCDF of the reciprocity for male and female teachers. . . . . . . . . . . . 97
Figure 4.43: The CCDF of centrality measures for male and female teachers. x-axes are
             in log scale. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
Figure 4.44: Dyad types. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
Figure 4.45: Gender homophily in dyadic relationships. . . . . . . . . . . . . . . . . . . . . 102
Figure 4.46: Triad types. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
Figure 4.47: Gender homophily in triadic relationships. . . . . . . . . . . . . . . . . . . . . 103
Figure 5.1: An example of diffusion tree. . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
Figure 5.2: Three structurally different trees with the same volume but different virality
             values. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
Figure 5.3: The CCDF of the volume and virality. x-axes are in log scale. . . . . . . . . . . 110
Figure 5.4: The CCDF of the velocity measures. x-axes are in log scale. . . . . . . . . . . . 111
Figure 5.5: The average volume per topic. . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
Figure 5.6: The average virality per topic. . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
                                                 xiv


Figure 5.7: The median of the average re-pein time per topic. . . . . . . . . . . . . . . . . 114
Figure 5.8: The median of the first re-pin time per topic. . . . . . . . . . . . . . . . . . . . 115
Figure 5.9: The median of the velocity measures for the top topics. . . . . . . . . . . . . . 115
Figure 5.10: The average of the volume and virality for the top 10 domains of teacher-
             curated resources. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
Figure 5.11: The median of the velocity measures for the top 10 domains of teacher-curated
             resources. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
Figure 5.12: A showcase of a popular pin from moffattgirls.blogspot.com adopted by 936
             other users. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
Figure .1:   The flowchart of the annotation procedure. . . . . . . . . . . . . . . . . . . . . 128
Figure .2:   An example of self-description and website URL in a Pinterest’s account. . . . . 128
                                                  xv


                                              CHAPTER 1
                                           INTRODUCTION
Social media has become an integral part of human life in the 21st century. The number of social
media users in 2020 was estimated to be around 3.6 billion individuals [1]. Social media platforms
(e.g., Facebook) have facilitated interpersonal communication, diffusion of information, the cre-
ation of groups and communities, to name a few. As far as education systems are concerned, online
social media has transformed and connected traditional social networks within the schoolhouse to
a broader and expanded world outside [2]. Thanks to advancements in communication, educa-
tors have access to ample online instructional resources curated and shared across social media
platforms. In such an expanded virtual space, teachers engage in various activities within their
community, e.g., exchanging instructional resources, seeking new teaching methods, and engaging
in online discussions [3, 4, 5, 6, 7, 8, 9, 10, 11]. Students use social media as well–for example,
to supplement educational materials and interact with others [12, 13, 14, 15, 16, 17]. Further-
more, educational policymakers take advantage of social media to infer public opinion about new
policies [18]. In addition, parents seek out resources within social media to supplement their
children with educational materials [19]. Hence, today’s education, specially PK-12 education, is
closely intertwined with online social media and entails various entities. Nevertheless, the essen-
tial entities who play a critical role in bridging education and social media are teachers. Next, we
provide several reasons behind the importance of teachers in social media and why it deserves our
investigation.
     • While we might deem social media merely a communication tool, it is far beyond that. In
       a broader sense, social media is the reflection of who we are as humans, and it increasingly
       plays a critical role in shaping our identity [20]. Moreover, this digital identity is not limited
       to our personal interests and inherent beliefs; it encompasses the professional aspect of our
       life as well [21]. In other words, a significant portion of many people’s professional life
                                                    1


  is reflected on social media platforms. As far as teachers are concerned, this reflection
  involves performing various professional activities, e.g., curriculum development and lesson
  planning [22, 23, 24, 25, 4, 26, 27, 28]. Furthermore, as explained in [29], today’s teachers
  conceptualize their professional identity through online social media. Thus, teaching as a
  career is no longer confined to the physical world, and its development has a huge presence
  in online social media. In summary, as much as we care about teachers and their profession
  (i.e., teaching), we need to care about their online social media presence as well.
• One of the primary motivations of teachers to turn to online social media is to supplement
  their instructional and educational resources. In the classroom, many teachers encounter
  needing additional educational resources to improve their students’ learning. However,
  traditional educational resource curation (e.g., asking a colleague) is time-consuming and
  not scalable [30]. In contrast, seeking out educational resources from other teachers in online
  social media is easy. Specifically, the diffusion of online resources can be rapid within the
  same day, and teachers may integrate resources into their classroom practices quickly and
  conveniently. It is worth mentioning that during the COVID-19 pandemic, online educational
  resources curated by teachers have become specially essential [31]. Moreover, while teachers
  often find a minimal voice in school decisions, they are provided with tremendous flexibility
  and diversity in the portfolio of resources they can obtain from online social media. Hence,
  social media acts as a reliable support group for teachers and aids in the widespread diffusion
  of educational resources used in various classroom activities.
• Compared to the traditional data for studying teachers (i.e., interviews or surveys), the social
  media data offers several significant benefits. First, it is directly related to their educational
  classroom practices, while in surveys/interviews, we essentially create a proxy to tap into
  teachers’ pedagogical efforts. Second, we can have real-time access to a large amount of
  teacher-related data in social media. Third, the data from social media is without the response
  bias (incorrect responses from participants in surveys/interviews [32]) and the observer
                                               2


       bias (inaccuracy or subjectivity in recording the responses [33]). Ultimately, compared to
       surveys/interviews, the social media data of teachers offer a wide variety of formats, e.g.,
       images, videos, texts. Thus, the social media data of teachers offers a great potential to study
       teachers in the current digital age.
We refer the reader interested in further discussion about why we need to care about teachers in
social media to the engaging article by Frank and Torphy [34].
                                          Automatic
                                                                    Ø PU learning formulation
                                            Teacher                 Ø Scalable dataset of teachers
                                        Identification
    Teachers in Social                Teacher Gender                Ø Online activity analysis
      Media: A Data                                                 Ø Social network analysis
                                            Analysis
   Science Perspective
                                         Diffusion of               Ø Constructing diffusion trees
                                                                    Ø Characterizing diffusion
                                         Information                Ø Teacher/resource attributes & diffusion
        Figure 1.1: An overview of the research contributions presented in this dissertation.
    Therefore, given the importance of teachers in social media and its tremendous impact on
PK-12 education, in this dissertation, we investigate teachers in social media from a data science
perspective. Our investigation in this direction is essentially an interdisciplinary endeavor bridging
modern data science and education. A unique characteristic of this dissertation is that we have
incorporated real social media data from thousands of teachers and their online friends in our
investigations. Moreover, we have overcome significant technical challenges, proposed novel
machine learning, and data mining algorithms, and performed various novel analyses about teachers
in social media. An overview of the dissertation research contributions is summarized in Figure 1.1.
Next, we introduce our major contributions and the addressed challenges.
                                                   3


1.1    Contributions
1.1.1   Automatic Teacher Identification
In response to the importance of teachers in social media, recent years have witnessed a rapid
increase in studies assessing teachers in social media and its impact on the quality of education.
These studies, however, suffice to a small number of surveyed teachers (at most a few hundred) while
there are plenty of other teachers in social media. For instance, previous studies showed that more
than 75% of American teachers use Pinterest to seek lessons and educational materials [35, 36].
Hence, using only a small number of teachers is not representative of the large population of
teachers present in social media. This underrepresentation hinders us from conducting large-scale
data-driven studies pertinent to teachers in social media. For instance, if we intend to study the
diffusion of information among teachers, using only a small number of teachers, we cannot fully
characterize this diffusion since we need to have a broader picture of the online network that embeds
teachers. Aiming to overcome this challenge and further facilitate data-driven studies related to
teachers in social media (including those presented in this dissertation), we propose a framework
that automatically identifies teachers on Pinterest, an image-based social media popular among
teachers. In this framework, we formulate the teacher identification problem as a positive unlabeled
(PU) learning [37] where the positive samples are some surveyed (labeled) teachers, and unlabeled
samples are some of their online friends. Using our framework, we build the largest dataset of
teachers on Pinterest. We believe our proposed method has great potential in advancing research
on teachers in social media.
1.1.2   Teacher Gender Analysis
For decades, there have been numerous studies in educational literature investigating the role of
teacher’s gender and how it affects the quality of education and particularly students’ success [38,
39, 40, 41, 42, 42]. The primary motivation behind these studies is that the academic environment
created by teachers largely influences the way students see themselves as learners/students. In
                                                   4


this regard, some argue that the behavioral differences in male vs. female teachers are what
indeed matter, e.g., the way teachers manage the classroom or prepare materials [43]. However,
while identifying the behavioral differences between male and female teachers has been investigated
before [44, 41], no study to date juxtaposes male and female teachers in social media and investigates
their behavioral differences (and similarities). Given the significance of social media in shaping
teachers’ professional lives, we believe analyzing teachers’ online behavior while considering their
gender is essential. Unfortunately, an obstacle to conducting such a study has been the availability of
a large dataset of teachers in social media. However, thanks to our automatic teacher identification
framework, we have overcome this obstacle where we have built a rich dataset of teachers in social
media. Hence, given this dataset at our disposal, we perform an exploratory analysis of male and
female teachers on Pinterest. Our study incorporates two crucial aspects of teachers. First, we
investigate various online activities of male and female teachers, such as topics of their curated
resources. The motivation for this type of analysis is to understand male and female teachers through
the lens of their resource curation process. Second, we investigate male and female teachers in the
context of their social network (the graph). The performed social network analysis complements
online activity analysis by examining how teachers are connected in a social network. Notably, we
look into the critical notion of homophily (i.e., the tendency of similar individuals to connect in
a network) and substantiate gender homophily among teachers. Our analysis and findings in this
part of the dissertation can serve as a valuable reference for many entities concerned with teachers’
gender, e.g., educational scientists, policymakers, principals, state and federal governments.
1.1.3   Diffusion of Teacher-curated Resources
As mentioned before, previous studies have reported that the diffusion of information from social
media to the classroom can be potentially very fast [4, 5, 34, 45]. This is encouraging as teachers can
quickly implement these resources in their pedagogical practices. Nevertheless, our understanding
of how resources curated by teachers diffuse across the network and what factors affect this
diffusion is still slim. Again, a barrier to such understating has been the availability of a large
                                                   5


dataset of teachers in social media, including curated resources and their diffusion dynamics. We
overcome this obstacle by constructing the diffusion trees for more than one million resources
curated by teachers. Another major challenge in investigating the diffusion of information is
how to characterize the diffusion. To address this challenge, we introduce three crucial measures
which consider different aspects of the diffusion process. The first measure is volume, i.e., the
number of users who have received and saved a resource. The second measure is virality which
captures the structural virality of a resource based on its diffusion tree. Finally, the third metric
includes velocity measured by calculating (a) the first time a resource is re-pinned and (b) the
average time difference between consecutive re-pins.1 Using the introduced diffusion measures, we
investigate how different attributes of resources (e.g., their topics) affect the diffusion. Moreover,
through several regression analyses, we determine how teacher attributes (e.g., the number of their
followers) affect the diffusion. Our investigation in this part of the dissertation is the first of its
kind and offers a deeper understating of the complex mechanism driving the diffusion of resources
curated by teachers.
1.2      Organization
    The remainder of this dissertation is organized as follows. First, in Chapter 2, we introduce the
preliminaries, including current studies on teachers in social media and the data collection process.
In Chapter 3, we present our proposed approach for automatic teacher identification. We verify the
working of this approach on a large set of carefully constructed sets of teacher and non-teacher users
on Pinterest. Chapter 4 is devoted to analyzing the online behavior of male and female teachers.
We conduct two types of analysis, namely online activity analysis (Section 4.3) and social network
analysis (Section 4.4). In Chapter 5, we present our study on the diffusion of resources curated
by the teachers. Finally, Chapter 6 concludes the dissertation and offers promising future research
directions.
    1 In Pinterest, share is called re-pin.
                                                   6


                                             CHAPTER 2
                             FOUNDATIONS AND PRELIMINARIES
In this chapter, we present the foundations and preliminaries necessary for the rest of the dissertation.
In Section 2.1, we review current studies related to teachers in social media, and in Section 2.2, we
present the data collection process.
2.1     Current Studies on Teachers in Social Media
    Social media platforms are ubiquitous and have transformed almost every aspect of our lives.
As far as the education system is concerned, many teachers use online social media platforms (e.g.,
Facebook, Twitter, Pinterest) for educational engagement. Hence, during the past few years, there
has been a growing number of studies focusing on why and how teachers use online social media.
In this section, we review some of these studies. As for particular social media platforms, most
studies have focused on Facebook, Twitter, and recently Pinterest since these platforms are the ones
predominately used by teachers [46]. Therefore, we review the notable studies whose social media
platform is Facebook, Twitter, or Pinterest.
2.1.1    Facebook-driven Studies
Rutherford [47] conducted one of the earliest studies on how teachers use Facebook for professional
career development. They used a mixture of qualitative and quantitative analysis and investigated
384 users who actively participated in the Ontario teacher Facebook group. They found out that
majority of discussions were practical and related to teacher professional development. Cinkara
and Arslan [48] found similar results for EFL (English as a Foreign Language) teachers leveraging
Facebook groups for professional career development. Aiming at determining the professional
development practices, Ranieri et al. [49] investigated five Italian Facebook groups used by 1,1170
teachers. They thoroughly analyzed the dynamics of these groups and their members by taking
advantage of social capital theory [50]. Moreover, they inspected membership motivations based
                                                   7


on the type of the professional group (i.e., generic and thematic) and how the dynamic of social
capital differs in these groups. In general, they indicated that Facebook assists in improving
professional development. Bicen and Uzunboylu [51] investigated the usefulness of Facebook in
education. They set up an online learning environment for 71 teachers on Facebook where teachers
could do various activities such as sharing pedagogical videos and participating in discussions with
students and other teachers. Based on their results, participating teachers responded positively to the
incorporation of Facebook in teaching, which ultimately helped improve students’ learning. Authors
in [52] investigated views of students and teachers on introducing information and communications
technology outside the classroom, namely Google and Facebook discussion groups. They gathered
survey and interview data from 283 teachers and students after the deployment of Google and
Facebook groups. Their findings demonstrated students’ positive attitude toward these technologies,
while some teachers were reluctant to integrate them in their educational activities, mostly due to
time affordability. Sumuer et al. [53] looked into the habits related to how teachers use Facebook
(N=616). Similar to other Facebook-related studies, they showed professional usage of Facebook.
However, interestingly, they recognized that many teachers have privacy concerns, such as sharing
their personal information with students and parents. Similar privacy concerns were identified
by [54]. In an interesting study, Forkosh-Baruch et al. [55] investigated willing-to-connect (via
Facebook) for 160 Israeli teachers and 587 students based on various personal attributes e.g.,
age. They showed that willing-to-connect teachers were younger than not-willing-to-connect ones.
However, it turned out that willing-to-connect students were older than not-willing-to-connect
ones. In a similar study, Asterhan and Rosenberg [56] focused on student-teacher communication
on Facebook (N=198) and pointed out the different challenges facing teachers on Facebook, e.g.,
privacy concerns. Additionally, their results showed that teachers utilize Facebook for instructional
and psycho-pedagogical communications. Following the study in [56], Schwarz and Caduri [57]
performed an in-depth analysis of the interaction logs of five teachers with students to characterize
their communications. Although one teacher merely practiced the transmission of knowledge
(which has been shown to be an ineffective teaching style [58, 59]), the other teachers help foster
                                                  8


positive educational practices, namely social learning, autonomy, and active engagement. In
somewhat a unique study, Robson [29] shed light on the conceptualization of the professional
identity of teachers by conducting interviews with 20 teachers who were using Facebook. They
concluded that social media allows teachers to express themselves and form their ideal professional
image. Blonder and Rap [60] investigated how Technological Pedagogical Content Knowledge
(TPACK) and the self-efficacy beliefs of 12 high school chemistry teachers changed after introducing
several chemistry-related Facebook groups. Their study showed that TPACK improved among
teachers, and they developed TPACK skills geared more specifically toward teaching chemistry.
Ab Rashid [61] analyzed Facebook timelines of 34 high school teachers using thematic, and
discourse analysis approaches. Their investigation showed that teachers receive support from their
peers through conversations reflected on their Facebook timeline, which eventually help them in
their teaching. In conclusion, most studies have demonstrated that Facebook is primarily used for
professional development by teachers, which consequently can improve the quality of education.
Despite this, some teachers are reluctant to use Facebook due to privacy concerns.
2.1.2   Twitter-driven Studies
Cano [62] experimented with introducing Twitter as a learning and teaching tool in three Spanish
high schools across three subjects: Spanish language, social science, and natural science (15
teachers and 280 students). Their results revealed improvement in students’ grades after introducing
Twitter. In a similar study, Van Vooren and Bess [63] investigated the relationship between the use
of Twitter and the students’ success (N=86 students). Their findings suggested that students who
received support from their teacher via Twitter performed significantly better in standardized tests
compared to their counterparts who received no Twitter-related support. Similarly, Noble et al.
[64] indicated that teacher-student interactions on Twitter led to improved student learning and
reinforced the trust between students and teachers. Wesely [65] investigated the role of Twitter in
the professional development of 9 world language teachers. They monitored teachers participating
in #edchat on Twitter for more than a year and tried to determined how these teachers develop
                                                  9


communities of practice which “[are] groups of people who share a concern or a passion for
something they do and learn how to do it better as they interact regularly” [66]. They showed that
Twitter helps improve teacher career development. Following on the work of [65], Britt and Paulus
[67] performed a qualitative analysis of interviews with eight teachers who participated in #edchat
on Twitter. Their findings indicated that #edchat is an effective community of practice reinforcing
teacher professional development. Moreover, [68] found similar results for 30 identified influential
educators (teachers) on Twitter. In their prominent studies Carpenter and Krutka [69, 70] conducted
a thorough analysis of how and why teachers leverage Twitter for professional career development.
Their findings indicated that Twitter plays an essential role in the professional development of
teachers. In particular, the individualization offered by Twitter is a major advantage compared to
traditional professional development approaches. Moreover, their results suggested that Twitter
is of big help in combating teacher isolation where they received valuable support from their
peers online. Visser et al. [71] surveyed 324 active educators on Twitter about their experience of
using Twitter for professional purposes. Their findings indicated that teachers emphasized more
on professional usage of Twitter than personal. Through a qualitative analysis, they showed that
the surveyed teachers had created a positive socialized knowledge community accommodating
interpersonal communication and fostering collaboration. A noteworthy aspect of this study is its
teacher recruitment (sampling) process. Unlike most studies wherein teachers are pre-determined
(e.g., through contacting school or district officials), they employed a snowball sampling strategy
where they broadcast the link to their survey and asked teachers to participate themselves and further
invite their friends to participate. Our proposed teacher identification method in Chapter 3 follows
a similar strategy where we start from some initial teachers and attempt to expand (or snowball)
the sample size, i.e., identify more teachers. However, since our purpose is to incorporate teachers’
social media data, we focus on automating the teacher identification instead of directly asking for
the teacher’s participation. Trust et al. [72] performed a similar snowball sampling on multiple
platforms (e.g., Twitter, Facebook, Google+) to survey teachers on the effectiveness of online
platforms in professional development. Their results suggested that participating teachers have
                                                  10


found social media platforms to be supportive of their professional growth. Also, teachers reported
improvement in their student learning after utilizing online social media. Studies in [73, 74]
reported on the critical role of Twitter in teacher professional development as well as participating
teachers’ perception regarding Twitter. In particular, participants in [73] revealed that Twitter had
offered them a sense of belonging and community, which is even stronger than what their physical
workplace would deliver. Rosenberg et al. [75] conducted an interesting data-driven analysis of
teachers on Twitter through the lens of the affinity space framework [76], a physical or virtual
space revolving around a certain topic wherein people utilize a medium to interact with each other
about that topic. The authors used state educational Twitter hashtags (SETHs) to define the affinity
space on Twitter. SETHs are educational state-level Twitter hashtags developed by educators to
participate in the educational discussions, e.g., #miched for Michigan or #nebedchat for Nebraska.
They collected more than 500k tweets over six months, covering 68,552 unique Twitter users. Then,
to answer who is participating in these affinity spaces, they manually identified 500 Twitter profiles
belonging to educators (e.g., teacher, administrator). They also determined how active educators are
in each state and further characterized their tweet timing behavior, e.g., percentage of tweets per day
of the week. They found out that SETHs are effective spaces for teacher professional development.
Rehm and Notten [6] leveraged social capital theory [50] and performed a qualitative study of
4,196 Twitter users participating in #EDchatDE, a hashtag developed for educational conversations
in Germany. They found out that teachers’ social capital would increase through participating
in #EDchatDE. In conclusion, similar to Facebook, Twitter is used for professional development.
Moreover, according to reviewed studies, Twitter offers a more interactive environment where
teachers can participate in education-related discussions around certain topics. Compared to
Facebook, the open and public nature of Twitter makes teachers less concerned with privacy issues.
Eventually, the interested reader can refer to [77] to know more about how and why educators
utilize Twitter.
                                                  11


2.1.3   Pinterest-driven Studies
Pinterest is an image-based personalized social media platform that draws 150 million active users
per month. American teachers frequently use Pinterest as a common social media platform and
virtual resource pool for professional purposes [4, 34, 78, 79, 80]. According to a national survey
conducted by RAND Corporation, the majority of elementary and secondary teachers in the U.S.
turn to Pinterest in response to recent national education reform (e.g., Common Core State Standards
Reform [81]) or their instructional needs [82]. Frank et al. [25] analyzed the role of social networks
in providing emerging beneficial opportunities for education. They argued that social networks
outside schools, especially online ones like Pinterest, have a great potential to distribute knowledge
and expertise among teachers equally. Hence, given the importance of Pinterest in education, there
has been a growing number of studies investigating teachers on Pinterest.1 Next, we review notable
studies in this area.
    Through an interview with eight teachers, Carpenter et al. [80] conducted a qualitative analysis
on how teachers use Pinterest. They recruited teachers via a snowball sampling on Twitter, where
they asked Twitter users to participate in an interview. Theoretically, they based their study on 1)
Pinterest as an affinity space wherein teachers share common interests, and 2) teachers on Pinterest
as teacherpreneurs who strive and spend time to impact beyond their classroom while not necessary
to do so via traditional means, e.g., administrative roles. Although their sample size is small, their
findings are interesting. They identified seven themes on how and why teachers use Pinterest.
Notably, participants described Pinterest as a content curation tool. More specifically, teachers
perceived Pinterest as an organizer or binder of the resources they encounter on the web or create
themselves. This unique property of Pinterest has been demonstrated in previous (non-educational)
studies [83, 84, 85, 86, 87, 88]. In fact, Pinterest as a social curation tool is what makes it very
appealing among teachers [4]. Through a qualitative study of 117 teachers, Schroeder et al. [89]
showed that teachers primarily utilize Pinterest to look for educational resources according to their
classroom needs. They surveyed two types of teachers: preservice teachers (PSTs) and in-service
    1 The number of Pinterest-driven studies is smaller than Twitter and Facebook.
                                                   12


teachers (ISTs). Although both types sought specific instructional materials on Pinterest, the PTSs
were more interested in “cute” and “fun” materials since, according to [89], PSTs have more time
to implement these resources. Interestingly, our analysis shows that the word “fun” to be among
the top words used by teachers to describe their pins– See Sections 3.5.4 and 4.3.4. Torphy
et al. [79] performed a thorough analysis of teacherpreneurial behaviors of teachers on Pinterest.
They characterized the source of 140,287 resources (pins) curated by 197 teachers on Pinterest.
Their findings indicated that educational blogs were the predominant source of resources. Also,
teacher-to-teacher market websites (notably teacherspayteachers.com) constituted a considerable
portion of pins’ sources. Our findings in Chapter 4 are in line with that of [79], where we also
discovered that the predominant source of pins in our dataset is educational websites. Moreover,
they found out that a significant portion of pins (82.8%) are monetized. This and the widespread
diffusion of educational resources curated by teacherpreneurial educators signify that we face a
new and decentralized open market of educational resources, which Torphy and Drake [4] named
the “Fifth Estate within the digital age”. Hu et al. [78] examined teachers’ curation mechanism of
mathematical resources on Pinterest. They characterized the mathematical pins and showed that they
usually have low cognitive demand (difficulty). Sawyer et al. [90] found similar results regarding
the cognitive demand levels of mathematical pins on Pinterest. Additionally, their work illustrated
that socialized knowledge communities formed by mathematics teachers assist them in locating
resources relevant to teaching mathematics. In our previous study [11], we also characterized
mathematical resources shared by elementary school teachers on Pinterest and further proposed
a method to predict the cognitive demand of resources. Hu et al. [26] performed an interesting
analysis of how mathematical resources are curated. They identified three types of curation, namely
self-directed, incidental, and socialized. One of the crucial features of this study is that they shed
light on how online educational resources acquired from Pinterest are enacted in the classroom. Liu
et al. [45] examined the diffusion of educational resources on Pinterest. They collected the Pinterest
resource curation process for 34 early career teachers (ECTs) from three Midwestern states. They
only studied the diffusion of resources for an ECT and their colleagues on Pinterest, i.e., those who
                                                  13


work with the ECT in the same school and had been nominated by them as close colleagues. Their
results indicated that Pinterest act as a bridge between weakly connected teachers within the same
school. Similarly, in Chapter 5, we analyze the diffusion of teacher-curated resources on Pinterest.
    In conclusion, similar to Twitter and Facebook, teachers leverage Pinterest for professional
purposes, which has improved their teaching according to the reviewed studies. However, unlike
Twitter and Facebook, Pinterest is perceived as a social curation platform rather than a direct means
for teacher-teacher or teacher-student communications. Moreover, perhaps the less politicized and
polarized nature of Pinterest has contributed to its widespread usage by teachers. The interested
reader can refer to [4, 28, 34, 46, 91, 92] for more details about other teachers in social media
studies, especially those concerned with Pinterest. Eventually, it is worth mentioning our previous
study [10] where we offered a roadmap on how to incorporate online social media in educational
research, especially teachers in social media.
2.2    Data Collection
    In this part, we explain the data collection process—first, describing the Pinterest data we
acquired for each user. Then, we discuss the surveyed teachers and finally describe the constructed
network of teachers.
2.2.1   Pinterest Data Description
Within Pinterest, users may encounter a personalized newsfeed of resources from various topics
such as education and sports. Figure 2.1 illustrates an example Pinterest newsfeed. Each resource
in Pinterest is called a pin. As demonstrated in Figure 2.2, each pin includes several pieces of
information, including image, description, title, source, domain, comments, and board. By clicking
on the source’s URL, one will be redirected to the original website where the pin comes from. A
user saves a pin in a board which is essentially a user-created directory holding pins with a similar
topic (e.g., "My Math Pins" in Figure 2.2). Figure 2.3 shows an example of a Pinterest user’s page,
including the curated boards.
                                                   14


                             Figure 2.1: An example of Pinterest newsfeed.
                      Title Description   Board
                                                                        Original Source
                                                 Domain
                               Original Pinner
         Image      Comments
                Figure 2.2: An example of a pin and its original source on the web.
   We used API (application programming interface) provided by Pinterest and obtained data
about Pinterest users, users’ pins, and users’ boards. Table 2.1 shows the pin-related information
we retrieved for each pin. The API provided us with some crucial information. In particular, we
have the topic of a pin which is a pre-defined topic (category) assigned by Pinterest. In our dataset,
                                                  15


                         Figure 2.3: An example of a Pinterest user’s page.
there are 34 topics such as education, sports, food_drink. Moreover, similar to other social media
platforms (e.g., Facebook) where a user’s post is further shared, a pin can be a re-pin from another
pin. Luckily, we also have access to the re-pin information, including the parent pin (i.e., the
previous re-pin) and the original pin (i.e., the initial pin from which all re-pins have occurred). We
can capture the diffusion process of the pins in the network from this information, which is used in
our investigation in Chapter 5. Furthermore, Table 2.2 shows the fields related to a board. Through
the board ID in Table 2.1 and ID in Table 2.2, one can find the corresponding board of a pin.
                             Table 2.1: Pin-related fields in our dataset.
         Field                       Description                                Example
                            A unique Pinterest-generated
           ID                                                          “713539134715179880”
                                 identifier of the pin
          title             User-generated title of the pin            Subtraction Tables Chart
                                                                           Convenient, useful
      description       User-generated description of the pin
                                                                             learning tools ..
        domain             Domain of the original source             www.teachercreated.com
                                 A pre-defined topic
         topic                                                                  education
                                  assigned to a pin
       created at     Time and date that the pin has been saved       Sun, 23 Jul 2017 01:00:56
     image URL          URL of the image saved on Pinterest             https://i.pinimg.com/...
      parent pin               ID of the successor pin                 “320951910950109285”
     original pin               ID of the original pin                 “597430706814302354”
    original pinner      ID of the original pinner of the pin          “144115394234286193”
                            A unique Pinterest-generated
       board ID                                                        “255227572571702183”
                                identifier of the board
                                                   16


                            Table 2.2: Board-related fields in our dataset.
           Field                      Description                            Example
             ID         A unique Pinterest-generated identifier       “255227572571702183”
           name            User-generated title of the board                My Math Pins
         created at        Time and date of board creation         Mon, 08 Aug 2016 07:22:18
     number of pins           Number of pins in the board                       287
    In addition to information about pins and boards in a user’s account, we obtained information
about each user, as shown in Table 2.3. In particular, we have the user-declared gender, which
is used to distinguish male and female teachers in our investigation presented in Chapter 4. In
addition, a Pinterest user can add a short description about themselves and a link to their website,
both of which are available in our dataset and will be used in our analysis in subsequent Chapters.
                             Table 2.3: User-related fields in our dataset.
            Field                     Description                            Example
          username        A unique user-generated username                karimihamid65
            name            First and last names of the user              Hamid Karimi
                                   Time and date that
          joined at                                                Tue, 19 June 2016 23:17:11
                              the user has joined Pinterest
                                  User-declared gender
           gender                                                              Male
                              Male, Female, or Unspecified
                                A short self-introduction
      self-description                                              I am a 2nd grade teacher ...
                                visible in a user’s profile
        user website         User-declared website or blog            www.hamidkarimi.com
                                Country where the user
           country                                                              US
                         has logged in during their last session
    Remark. We used the Pinterest API to collect our data in November 2019. Unfortunately, since
then, the Pinterest API has been the subject of a couple of changes, and consequently, some of the
information we collected might not be provided any longer.
                                                   17


2.2.2    The Surveyed Teachers
Our efforts in this dissertation are part of an interdisciplinary project named Teachers in Social
Media Project.2 Founded by Dr. Kaitlin Torphy,3 this project considers the intersection of the
cloud to class, the nature of resources within virtual resource pools, and the implications for equity
as educational spaces grow. Much of the work coming out of the Teachers in Social Media project
concerns instructional and educational resources shared on Pinterest. Hence, as a part of this
project, we have surveyed various American teachers whose information is used as the basis of our
data collection in this dissertation. More specifically, the surveyed teachers used in this dissertation
are sampled from three sets described as follows.
    Set 1: SEMI (Study of Elementary Mathematics Instruction). This sample includes 340
ECTs (early career teachers) from 75 schools in 31 districts across four mid-west states, including
Ohio, Michigan, Illinois, and Indiana. ECT is defined as a teacher in the first four years of their
teaching career.
    Set 2: OER (Open Educational Resources). 100 Michigan teachers were sampled from two
rural pilot districts utilizing open educational resources. Teachers were identified and sampled
across K-12 grade levels and eight schools.
    Set 3: Texas Teachers. Finally, we selected a random sample of 100 Texas teachers from a
non-CCSS (common core state standard) state. Teachers are from 16 schools across 16 distinct
districts.
    In total, we have 540 teachers across five states, 48 districts, and 99 schools. Figure 2.5 shows
the number of teachers in each of the five states. Among the surveyed teachers, 428 are females,
13 males, and 99 unspecified. Figure 2.4 shows the distribution of the grade levels for the surveyed
teachers. For a teacher teaching multiple grades, we consider the highest grade they teach.4 More
than 84% of teachers are teaching grades K to 6.
    2 https://www.teachersinsocialmedia.com/
    3 https://torphyka.wixsite.com/kaitlintorphy
    4 Twelves  teachers were teaching multiple grade levels.
                                                  18


             Grade 1                                                                               103
       Kindergarten                                                                         89
             Grade 2                                                          69
             Grade 3                                                       66
             Grade 4                                                    59
             Grade 5                                                 56
         Unspecified                                37
             Grade 6                12
                Math              10
              English           8
      Social studies           7
             Science          6
          Pre-school         5
               pred-K        5
                  Art       4
                Music      3
  Physical Education     1
                      0                20            40                60           80         100
                                                               Count
                        Figure 2.4: The distribution of grade levels of the surveyed teachers.
2.2.3      Network of Teachers
In addition to the data of the surveyed teachers, we acquired the data of all their online friends, i.e.,
their followers and followees. A user’s follower is another user who follows that user, and similarly,
a followee of a user is whom the user follows. Then, we constructed the network (graph) between
all users. More formally, let 𝐺 = (𝑉, 𝐸) represent our directed Pinterest network where 𝑉 denotes
the set of nodes (i.e., Pinterest users) and 𝐸 denotes the set of edges (connections). Here, an edge
𝑒 : (𝑢, 𝑣) indicates that user 𝑢 follows user 𝑣. Some of the statistics of the network are shown in
Table 2.4. Our network has 83,768 users and millions of edges. To the best of our knowledge,
this is the largest network of teachers on social media. Furthermore, while the surveyed teachers
reside in five U.S. states, by including their online friends, we have ended up with a single network
(a connected graph).5 This indicates the presence of small-world property where on average, two
nodes (users) have a small distance from each other [93].
     Figure 2.6 demonstrates the complementary cumulative distribution (CCDF) of degree distri-
    5 Technically, our network is weakly connected while its undirected version is strongly connected.
                                                         19


     Indiana                                                                          182
   Michigan                                                               145
      Illinois                                              111
       Texas                                           100
         Ohio 2
               0      25         50        75        100        125       150     175
                                                 Count
                 Figure 2.5: The number of surveyed teachers across five U.S. states.
                      Table 2.4: Some of the statistics of the Pinterest network.
                                 The surveyed teachers             540
                             Followers+Followees (friends)       83,228
                                   Total nodes (users)           83,768
                               Total edges (connections)        5,868,122
                                    Average degree               131.58
                                   Average in-degree              65.79
                                  Average out-degree              65.79
butions for the surveyed teachers and all nodes where in-degree, out-degree, degree (in & out)
distributions are shown in Figures 2.6a, 2.6b, 2.6c, respectively. Similar to other (online) social
networks, the distributions follow a power-law distribution [94] where most of the nodes have a
small (in/out)-degree and a tiny percentage have high degrees. Also, it seems the surveyed teachers
have followed and are being followed more than their online friends. Figure 2.7 demonstrates the
CCDF of the number of pins for the surveyed teachers and their online friends. The number of pins
follows a power-law distribution.
    Remark. Note that an online friend of a surveyed teacher can be a teacher or non-teacher.
                                                  20


            1.0                                                                      1.0                                                     1.0
                                                Online friends                                                    Online friends                                       Online friends
                                                Surveyed teachers                                                 Surveyed teachers                                    Surveyed teachers
            0.8                                                                      0.8                                                     0.8
            0.6                                                                      0.6                                                     0.6
   P(X>x)                                                                       P(X>x)                                                  P(X>x)
            0.4                                                                      0.4                                                     0.4
            0.2                                                                      0.2                                                     0.2
            0.0
                  10
                       0
                           10
                                1
                                       10
                                            2
                                                    10
                                                         3
                                                                 10
                                                                      4
                                                                                     0.0 100      101         102        103                 0.0 100      101     102      103      104
                                    In-degree (X)                                                       Out-degree (X)                                          Degree (X)
                           (a) In-degree                                                         (b) Out-degree                                        (c) Degree (in+out)
Figure 2.6: The CCDF of the degrees for the surveyed teachers and their online friends . x-axes
are in log scale.
                                                                          1.0
                                                                                                                    Online friends
                                                                                                                    Surveyed teachers
                                                                          0.8
                                                                          0.6
                                                                  P(X>x)
                                                                          0.4
                                                                          0.2
                                                                          0.0
                                                                                100        101     102    103      104            105
                                                                                                  Number of pins (X)
Figure 2.7: The CCDF of the number of pins for the surveyed teachers and their online friends.
x-axis is in log scale.
We use this fact in the next section to identify more teachers among online friends of current the
surveyed teachers.
                                                                                                           21


                                            CHAPTER 3
                         AUTOMATIC TEACHER IDENTIFICATION
3.1    Introduction
    In Chapter 1, we discussed the importance of teachers in social media and its significant impact
on education. In response to this importance, there have been many studies investigating teachers
in social media in the past few years, a review of which was presented in Chapter 2. These
studies have made significant progress in illuminating the potential of teachers in social media
and the extra benefit it brings to education. Nevertheless, they suffer from a major limitation:
they base their analysis on a limited number of surveyed/interviewed teachers. More specifically,
they survey/interview a small number of teachers, and then they acquire their online social media
data, i.e., a bottom-up data collection from surveyed/interviewed teachers to their online data.
This causes two drawbacks. First, the study’s outcome may not be statistically significant as the
number of teachers is small. Second, they cannot harness the power of modern data-driven machine
learning algorithms in their analysis since these algorithms usually require a sufficient amount of
data. Hence, having an efficient mechanism to identify more teachers is crucial to advance the
research on teachers in social media. An immediate option is to sample more teachers via surveys
or interviews. However, surveying/interviewing is usually labor-intensive, costly, and non-scalable.
Thus, we need a method that can identify teachers in social media automatically. Essentially, we
need to have a binary classifier that for a given user reliably predicts whether they are a teacher or
not.
    While being very beneficial, the automatic identification of teachers in social media faces a
significant technical challenge. For a binary classifier, we need to have training data from both
teachers and non-teachers so we can train a supervised learning model classifying users to teachers
and non-teachers. However, we do not have access to non-teacher users in practice since only a
small set of surveyed/interviewed teachers are available. Given this, we cannot simply set up a
                                                  22


supervised model. To solve this challenge, we formulate the automatic teacher identification as a
positive unlabelled (PU) learning task. As far as our dataset is concerned, positive samples are
the surveyed teachers described in Section 2.2.2, and unlabelled samples are other users connected
to the surveyed teachers1, i.e., they follow or are followed by the surveyed teachers as described
in Section 2.2.3. Note that an unlabelled user can be either a teacher or a non-teacher. PU
learning has gained popularity in machine learning literature as its setting arises naturally in many
applications such as automatic diagnosis [95], marketing [96], remote sensing [97]. Likewise, our
conceptualization of the automatic teacher identification problem as a PU learning task reflects this
problem’s most practical and natural setting. Specifically, we train an efficient teacher identification
classifier from a limited number of teachers and many readily available unlabelled users. Next, we
briefly explain our proposed approach.
    This chapter proposes a PU framework to identify teachers on Pinterest automatically. We call
our framework PUTeacher, which entails three components. In the Unsupervised Representation
Learning component, we develop a deep neural autoencoder to learn a salient and compact rep-
resentation for both positive (teacher) and unlabelled users. This component’s main advantage is
that it can encode the underlying semantic of the entire training data (positive plus unlabelled)
without requiring ground truth labels. In the second component, called Automatic User Labeling,
we propose a method that utilizes the learned representations from the first component and then
automatically marks unlabelled samples as potentially non-teachers or teachers, i.e., in the PU
learning terminology, finding reliable negative samples and additional reliable positive samples.
In the last component, User Classification, we utilize the marked samples as well as original pos-
itive samples and perform a binary classification to predict the class of unlabelled users (teacher
or non-teacher). We conduct extensive experiments and show the effectiveness of our proposed
framework. In summary, our contributions are as follow:
     • We formulate the teacher identification problem as a PU learning task reflecting this problem’s
        realistic and practical setting.
    1 In this chapter, we use terms sample and user interchangeably.
                                                  23


     • We propose an effective PU learning method for teacher identification, which can reliably
       identify thousands of teachers on Pinterest.
     The rest of this chapter is organized as follows. First, in Section 3.2, we formally define
the problem, followed by presenting PUTeacher in Section 3.3. Next, Section 3.4 includes the
experimental results and discussions. Finally, in Section 3.5, we perform an extensive resiliency
analysis of PUTeacher to ensure its working.
3.2     Problem Statement
     Let X = {𝑥1 , 𝑥2 , · · · , 𝑥 𝑛 } represent a dataset of 𝑛 online social media users where 𝑥𝑖 ∈ R𝑑 and
𝑑 is the dimension of feature inputs representing 𝑥𝑖 . Suppose the random variable 𝑦 = {+1, −1}
represents the label of a sample in X where +1 indicates the sample is a teacher and −1 otherwise.
Further, let X consist of two distinct sets of 𝑙 positively labelled users and 𝑛 − 𝑙 unlabelled users, i.e.,
X 𝑝 = {𝑥 1 , 𝑥2 , · · · , 𝑥 𝑙 } and X 𝑢 = {𝑥 𝑙+1 , 𝑥 𝑙+2 , · · · , 𝑥 𝑛 }, respectively. For convenience, let |X 𝑢 | = 𝑚
i.e., 𝑚 = 𝑛 − 𝑙. Following the PU learning setting, ∀𝑥𝑖 ∈ X 𝑝 , 𝑦𝑖 = +1 while 𝑦 𝑗 for 𝑥 𝑗 ∈ X 𝑢 is
unknown.
     Now, given the notations listed above, we seek to utilize X 𝑝 and X 𝑢 to learn a model 𝑓𝜃 (𝑥)
having parameters 𝜃 such that it can predict the label for an unseen user in X 𝑢 .
Table 3.1: PU learning terminology and its equivalents in our automatic teacher identification task.
                       PU learning                      Automatic teacher identification task
                    Positive samples                                          Teachers
                   Negative samples                                         Non-teachers
                 Unlabelled samples               Followers and followees of the survyed teachers
             Reliable positive samples                          Automatically marked teachers
            Reliable negative samples                      Automatically marked non-teachers
             Original positive samples                                The surveyed teachers
     Table 3.1 demonstrates a mapping between PU learning terminology and its equivalent in our
automatic teacher identification task.
                                                                24


3.3     The Proposed Framework (PUTeacher)
    An overview of the proposed framework, PUTeacher, is demonstrated in Figure 3.1. Our
framework falls into the category of two-stage PU learning, in which we first try to identify reliable
negative and positive samples and then utilize them to train a supervised learning model. An
important assumption in the two-stage PU learning is the smoothness property, which asserts
that if two samples 𝑥𝑖 and 𝑥 𝑗 are similar, the probabilities 𝑃(𝑦 = +1|𝑥𝑖 ) and 𝑃(𝑦 = +1|𝑥 𝑗 )
are close [37]. The smoothness property has been leveraged in various two-stage PU learning
algorithms [98, 99, 100]. Assuming this property, we can identify reliable negative samples
as those far away from all labeled samples. The key to this assumption is to determine the
similarity between the two samples. To this end, we need to have an effective method to encode
the input data, which is what the Unsupervised Representation Learning component is tasked
for. Then, in the Automatic User Labeling component, we utilize these representations, and
based on the smoothness assumption, we propose a novel method to identify reliable negative and
positive samples from the unlabelled data. In other words, Automatic User Labeling component
automatically marks penitential non-teachers (reliable negative samples) and potential teachers
(reliable positive samples). These two components will be presented in Sections 3.3.1 and 3.3.2,
respectively.
    Using the identified reliable negative and positive samples and the original positive labeled
samples (i.e., the surveyed teachers), we can transform the PU learning into a supervised learning
task, which is what we will perform in the User Classification component. In the two-stage
PU learning, this supervised learning is predicated on another important assumption known as
separability, under which it is assumed that two classes (i.e., teachers and non-teachers) are
naturally separated [37]. In other words, theoretically, there should exist a ‘perfect’ classifier that
distinguishes positive samples from negative ones. In Section 3.4, we empirically demonstrate
that both the smoothness and separability assumptions hold, and hence our proposed framework is
justified.
                                                 25


 Unsupervised Representation          Automatic User Labeling                             User Classification
          Learning                                                                 Reliable Negatives (Non-Teachers)
                                       …          Unlabeled (Xu)
                                                                                                    ...
                                                                      …
   x                                                                                                                                   Non-Teacher
                                                                                                                        Classifier
                               xl+1                       xj                xl+1
              h                                                       slj
                                      s 1j
                                                   s 2j        s 3j
                                x1           x2                x3     …      xl         Reliable Negatives (Teachers)
                                        Surveyed Teachers                          Xp                  ...
                               𝑇𝑇𝑆!! = 𝑁𝑜𝑟𝑚𝑎𝑙𝑖𝑧𝑒(. 𝑠#$ )                                                                                 Teacher
                                                                      ∀!"
Figure 3.1: An illustration of the proposed method for automatic teacher identification
(PUTeacher).
3.3.1   Unsupervised Representation Learning
We need to extract salient and semantically informative features of the input data. Such features
are crucial for subsequent components of our framework. Unfortunately, we cannot use the labels
to represent the input data since we only have labels for a single class (i.e., teachers), and most of
the data is simply unlabelled. Hence, we train an autoencoder model to extract meaningful features
from the input data without using the labels. The autoencoder takes an input sample, encodes it
into a lower-dimensional hidden representation (embedding), and eventually decodes the hidden
representation to an output, aiming to reconstruct the input. By doing so, we force the autoencoder
to learn a condensed hidden representation that retains meaningful information about the input data.
Moreover, another benefit of autoencoders is reducing the dimensionality while reconstructing the
input sample. Autoencoders are widely used in representation learning and have shown tremendous
performance in various applications [101, 102].
   Let 𝐸 Θ (.) and 𝐷 Ω (.) denote the encoder and decoder with parameters Θ and Ω, respectively.
Then, we optimize the following loss function:
                                                  1 Õ
                               L = argmin                  k𝑥 − 𝐷 Ω (𝐸 Θ (𝑥) k 22                                                    (3.1)
                                              Θ,Ω 𝑚
                                                    ∀𝑥∈X 𝑢
where k.k denotes L2 norm (euclidean distance) of a vector. For convenience, let ℎ𝑖 = 𝐸 Θ (𝑥𝑖 )
denote the hidden representation for a sample 𝑥𝑖 . Note that, we train the autoencoder on the
                                                                            26


unlabelled samples.
3.3.2     Automatic User Labeling
Based on the smoothness assumption, similar samples have close probabilities of being positive.
Hence, we leverage this assumption and attempt to mark samples in the unlabelled data. Let 𝑠𝑖 𝑗
denote the similarity between user 𝑥 𝑗 ∈ X 𝑢 and 𝑥𝑖 ∈ X 𝑝 :
                                                              2
                                                  (− ℎ𝑖 −ℎ 𝑗 /2𝜎 2 )
                                         𝑠𝑖 𝑗 = 𝑒             2                                      (3.2)
𝑠𝑖 𝑗 is essentially RBF (radial basis function) kernel determining the similarity between two feature
vectors.2 Then, we sum up the similarity between 𝑥 𝑗 and all positive samples (denoted as 𝑆𝑥 𝑗 ):
                                                 Õ
                                     𝑆𝑥 𝑗 =             𝑠𝑖 𝑗    ∀𝑥 𝑗 ∈ X 𝑢                           (3.3)
                                               ∀𝑥𝑖 ∈X 𝑝
where 𝜎 is a free hyperparameter. We normalize the similarity score in Eq. 3.3 to range [0, 1] and
get Teacher Tendency Score (𝑇𝑇 𝑆𝑥 𝑗 ) for each unlabelled user:
                                     𝑆𝑥 𝑗 − 𝑚𝑖𝑛({𝑆𝑥 𝑘 : 𝑥 𝑘 ∈ X 𝑢 })
                𝑇𝑇 𝑆𝑥 𝑗 =                                                          ∀𝑥 𝑗 ∈ X 𝑢        (3.4)
                          𝑚𝑎𝑥({𝑆𝑥 𝑘 : 𝑥 𝑘 ∈ X 𝑢 }) − 𝑚𝑖𝑛({𝑆𝑥 𝑘 : 𝑥 𝑘 ∈ X 𝑢 })
𝑇𝑇 𝑆𝑥 𝑗 encodes the similarity between an unlabelled user and all positive users i.e., the surveyed
teachers. Now, based on the smoothness assumption, the closer 𝑇𝑇 𝑆𝑥 𝑗 is to 1, the higher chance
for 𝑥 𝑗 to be a positive user (a teacher). Similarly, the closer 𝑇𝑇 𝑆𝑥 𝑗 is to 0, the higher chance for 𝑥 𝑗
to be a negative user (a non-teacher). Hence, let 𝑦˜ 𝑥 𝑗 denote the automatic label (the pseudo-label)
assigned to an unlabelled user:
                                                +1
                                               
                                               
                                                          if 𝑇𝑇 𝑆𝑥 𝑗 > 𝛼
                                     𝑦˜ 𝑥 𝑗 =                                                        (3.5)
                                                −1
                                               
                                                           if 𝑇𝑇 𝑆𝑥 𝑗 < 𝛽
                                               
     2 We  found RBF kernel performing better empirically, but one can utilize other similarity
measures, e.g., cosine similarity.
                                                         27


where 𝛼 and 𝛽 are two hyperparameters controlling the level of sensitivity for automatically marked
teachers and non-teachers, respectively. Usually, 𝛼 needs to be close to 1 (e.g., 0.9) and 𝛽 close
to 0 (e.g., 0.05). In addition, by adjusting 𝛼 and 𝛽, we can control the number of newly identified
teachers and non-teachers, respectively.3 Also, one might wonder that instead of the summation in
Eq. 3.3, we could take the average of similarities and consider that as 𝑇𝑇 𝑆𝑥 𝑗 . However, we found
that the average of similarities for RBF is usually small, and thus flexibly selecting proper values
for 𝛼 and 𝛽 is difficult.
3.3.3   User Classification
Using the Automatic User Labeling, we create two sets of reliable negative (non-teacher) and reliable
positive (teacher) samples denoted as R 𝑛 = {𝑥𝑖 ∈ X 𝑢 , 𝑦˜ 𝑖 = −1} and R 𝑝 = {𝑥𝑖 ∈ X 𝑢 , 𝑦˜ 𝑖 = +1},
respectively. For the classification, we develop a deep feedforward neural network 𝑓𝜃 (.) trained
on representations of samples in R 𝑛 and R 𝑝 . The output of 𝑓𝜃 (.) is the probability distribution of
being teacher and non-teacher i.e., 𝑓𝜃 (ℎ𝑖 ) = [𝑦𝑖𝑡 , 𝑦𝑖𝑛𝑡 ], where 𝑦𝑖𝑡 (𝑦𝑖𝑛𝑡 ) is the probability of being a
teacher and a non-teacher, respectively. Note that 0 ≤ 𝑦𝑖𝑡 , 𝑦𝑖𝑛𝑡 ≤ 1 and 𝑦𝑖𝑡 + 𝑦𝑖𝑛𝑡 = 1. Then, we use
the backpropagation and the cross entropy loss function (Eq. 3.6) to optimize the neural network.
                                    Õ
                            L𝑐 = −      𝑦˜ 𝑖 × 𝑙𝑜𝑔(𝑦𝑖𝑡 ) + (1 − 𝑦˜ 𝑖 ) × 𝑙𝑜𝑔(𝑦𝑖𝑛𝑡 )                     (3.6)
                                    ∀𝑥𝑖
Note that, the input to neural network is the representations learned in the first component of
PUTeacher described in Section 3.3.1.
3.4    Experiments
    To verify the effectiveness of the proposed method, we conduct some experiments. In Sec-
tion 3.4.1, we explain the experimental settings. In Section 3.4.2, we verify the data assumptions
    3 One can set 𝛽 = 1− 𝛼. However, we prefer to keep 𝛼 and 𝛽 independent for flexibility purposes.
However, their ranges should not overlap.
                                                   28


discussed in Section 3.3 i.e., the smoothness and separability assumptions. Ultimately, we compare
the performance of PUTeacher with several baselines in Section 3.4.5.
3.4.1   Experimental Settings
In this part, we present the experimental settings, including the dataset, input features, and hyper-
parameter tuning.
3.4.1.1   Dataset
Our users include 540 surveyed teachers and their followers and followees as described in Sec-
tion 2.2.1. In addition to the surveyed teachers, we manually annotated 3,058 teachers and 2,079
non-teachers. The annotation procedure is described in the Appendix. Hence, the number of unla-
belled users is 78,091 i.e., |X 𝑛 | = 78, 091. Table 3.2 demonstrates the specific data used to train,
evaluate, and test PUTeacher’s components. To train the Unsupervised Representation Learning
component described in Section 3.3.1, we only used the unlabelled users. The validation set of
this component, utilized for hyperparameter tuning, is 5-fold cross-validation on its training set.
To train the User Classification component, we used 3038 teachers and 3038 non-teachers. The
teachers consist of 1519 annotated teachers and 1519 automatically identified teachers acquired
from the Automatic User Labeling. All 3038 non-teachers are automatically identified. To tune
this component’s hyperparameter, we used 3-fold cross-validation on its training set. Eventually,
to evaluate the performance of PUTeacher, we created a test set for the User Classification com-
ponent. This set consists of 1539 annotated teachers, 540 surveyed teachers, and 2079 annotated
non-teachers, i.e., in total, 2079 teachers (1539 + 540) and 2079 non-teachers. Note that teacher
and non-teacher samples used in the final evaluation of PUTeacher have ground truth labels. This
helps reinforce the reliability of the test set. Moreover, using the entire 540 surveyed teachers
as part of the test set further strengthens its reliability. Note that the Automatic User Labeling
component utilizes the unlabelled users and the surveyed teachers to mark users, and thus, unlike
the other two components, it does not entail any learning process.
                                                  29


Table 3.2: Samples used in training, evaluating, and testing PUTeacher’s components. Ann:
Annotated, Auto: Automatically identified, Surv: Surveyed
                Component                Split                       Samples
               Unsupervised
                                         Train               78,091 unlabelled users
          Representation Learning
               Unsupervised
                                       Validation             5-fold cross validation
          Representation Learning
               Unsupervised
                                          Test                           –
          Representation Learning
                                                    3038 teachers: 1519 Ann + 1519 Auto
             User Classification         Train
                                                             3038 Auto non-teachers
             User Classification       Validation             3-fold cross validation
                                                     2079 teachers: 1539 Ann + 540 Surv
             User Classification          Test
                                                             2079 Ann non-teachers
3.4.1.2   Input Features
We used the following input features to represent each user.
    Topic. As mentioned in Section 2.2.1, each pin belongs to one of 34 general topics (categories)
pre-defined by Pinterest, e.g., food, fashion, education. Hence, this feature vector has 34 values
corresponding to 34 existing topics. Each element of the vector is the number of the user’s pins in
a topic divided by the total number of their pins (i.e., the input is normalized).
    Domain. To represent the domain features, we extracted the top 200 domains in the entire
dataset. Then for each user, we created a vector of size 200. Each element of the vector holds the
number of pins whose domain is the corresponding domain in the top 200 domains. To normalize
the vector, we divide it by the total number of the user’s pins.
    Description. We extracted the top 100 words used in the description of pins shared by a user.
Then we represented each word using a pre-trained word embedding model known as fastText,
which includes one million word vectors trained on Wikipedia 2017 [103] (a vector of size 300
represents each word). We took the average of word embeddings for the top 100 words. Moreover,
we included an weighted average of the top 100 word embeddings where weights are frequencies
of the words in the pin descriptions of the user. Hence, the dimension of this feature vector is 600.
Note that, before acquiring word ebmeddings, we used NLTK package [104] and pre-processed pin
                                                  30


descriptions, e.g., removed punctuations and stopwords (e.g., ‘!’, ‘the’), stemmed the tokens (e.g.,
‘education’ to ‘educ’).
3.4.1.3   Hyperparameter Tuning
The encoder and decoder of the Unsupervised Representation Learning are two-layer, fully con-
nected neural networks. There is one hyperparameter associated with this autoencoder, namely
the dimension of the hidden representation. To tune this hyperparameter, we performed 5-fold
cross-validation on the unlabelled samples and evaluated the dimensions {10, 20, 30, 40, 50}. The
dimension size 20 yielded the best performance based on the L2 loss in Eq. 3.1. The Automatic
User Labeling has two crucial hyperparameters, namely 𝛼 and 𝛽 in Eq. 3.5. We did not tune
these two hyperparameters since we treat them as flexible variables to be set by the practitioner of
PUTeacher. Despite this, the selection criteria for these two hyperparameters are that 𝛼 should
be close to 1 and 𝛽 close to 0. Hence, we set 𝛼 to 0.9 and 𝛽 to 0.05. The low value of 𝛽 ensures
identifying reliable non-teacher users, which are crucial for the User Classification component. In
Section 3.5.3, we will explain how different values of 𝛼 and 𝛽 compete and affect the performance
of PUTeacher. We also set 𝜎 = 12 in Eq. 3.3. Eventually, for the User Classification component,
we developed a two-layer fully neural network connected. To tune the dimension of the hidden
layer of this network, we performed 3-fold cross-validation on its training set.
3.4.2   Verification of Two-stage PU Learning Assumptions
As mentioned before, two crucial assumptions in the two-stage PU learning are separability and
smoothness. To verify these assumptions, we trained a supervised multi-layer neural network on the
input features of 3,598 labeled teachers (3,058 annotated + 540 surveyed) and 2,079 non-teachers.
We randomly selected 70% of the data for the training and 30% for the test. We call this classifier,
data-assumption-verification-classifier. Note that data-assumption-verification-classifier is merely
used for the verification of the above assumptions and is distinct from our proposed automatic teacher
identification approach, i.e., PUTeacher.
                                                  31


3.4.3   Separability
Figure 3.2: t-SNE visualization of teacher and non-teacher embeddings for the verification of the
separability assumption.
    Data-assumption-verification-classifier achieved a very high performance of 0.95 for AUC
(Area Under Curve) and 0.92 for F1-score. Also, we used t-SNE (t-distributed stochastic neighbor
embedding) [105] and visualized the learned representations for test teachers and non-teachers, as
demonstrated in Figure 3.2. As it can be observed, the two classes are perfectly separated. Hence,
we can conclude that the data separability assumption holds for our dataset.
    The question is, what does this separability mean in the context of teachers in social media?
It means that teachers are using online social media (here Pinterest) in such a way that we can
distinguish them from other users. More specifically, as far as topics, domains, and descriptions
of pins are concerned, teachers’ online activity makes them identifiable from other Pinterest users.
This is in line with previous studies [77, 23, 24, 4, 34] showing that teachers leverage social
media for their specific professional needs. However, our findings corroborate this in a large-scale
data-driven manner.
                                                 32


3.4.4   Smoothness
To verify the smoothness property, we calculate the Pearson correlation between two variables,
namely 𝑑 (𝐹𝑥𝑖 , 𝐹𝑥 𝑗 ) and |𝑠𝑥𝑡 𝑖 − 𝑠𝑥𝑡 𝑗 | for all pairwise 𝑥𝑖 and 𝑥 𝑗 in the test set. 𝐹𝑥𝑖 denotes the
final embedding from data-assumption-verification-classifier (i.e., the output before the last linear
layer), 𝑠𝑥𝑡 𝑖 denotes the score of being a teacher (i.e., the output of the last linear layer before the
softmax), and 𝑑 is the Euclidean distance. The correlation is 0.88 with a p-value of 1.0×10−8 . This
high positive correlation between these two variables indicates a high degree of smoothness since
samples that are mapped to the same regions (thus having a small distance in the embedding space)
belong to the same class (thus a small difference in scores). Finally, the high correlation between
embedding level distances and teacher score differences is visually demonstrated in Figure 3.3 by
drawing a fitted linear regression between these two variables.
   Based on the above analysis regarding the verification of the separability and smoothness
assumptions, we can conclude that the two-stage PU learning is suitable for the automatic teacher
identification problem, and thus, the design of PUTeacher is justified.
                                             14
                                             12
                      Embedding level distance
                                             10
                                                 8
                                                 6
                                                 4
                                                 2
                                                 0
                                                     0         2           4        6
                                                         Teacher score difference
Figure 3.3: The fitted regression line between pairwise embedding distances and teacher scores
differences for the verification of the smoothness assumption.
                                                                   33


3.4.5   Baseline Comparison
We compare the performance of PUTeacher with the following baseline methods.
    • ElkaNoto [97]. This method is based on training a non-traditional classifier to predict
      whether a sample is labeled. Then, it utilizes the SCAR (selected completely at random)
      assumption and adjusts the classifier to a traditional one, i.e., predicting the label of a sample.
    • WElkaNoto [97]. It is similar to Elkanoto, except it assigns weights to training samples.
    • BaggingPU [95]. This method is based on bootstrap aggregation. It repeatedly trains
      classifiers to identify positive examples in the unlabelled set and eventually takes the average
      of these classifiers to distinguish positive samples from negative ones.
    • nPU [106]. This method proposes a convex formulation while canceling the bias introduced
      when one attempts to separate unlabelled data from positive data.
    • nnPU [107]. The authors of this paper proposed a method to minimize the risk while reducing
      the bias and overfitting of flexible models in unbiased risk estimation.
    • ProbTagging [108]. This method is a recent two-stage PU learning approach that identifies
      reliable negatives and positives and trains multiple ordinary supervised classifiers. Their
      tagging process is based on the k-Nearest Neighbor (kNN) in the input space. Due to the
      high-dimensionality of the input space in our dataset, however, kNN was not very effective.
      Hence, we trained this method based on representations learned in the first component of
      PUTeacher.
    • Supervised. For this method, we trained a supervised neural network classifier. Its
      test set is the same with PUTeacher as described in Section 3.4.1.1. For teachers in its
      training set, we used the same 1,519 annotated teachers used in the User Classification
      component of PUTeacher. Additionally, we selected 1,519 users from the unlabelled set
      whose unsupervised representations learned from the first component of PUTeacher is
                                                   34


        closest to the representations of annotated non-teachers. Evidently, this method is not a PU
        learning approach as it uses both labeled teachers and non-teachers. However, it acts as a
        yardstick for other methods and informs us of the problem’s upper bound performance.
    For the baselines from literature, we used their publicly available codes. We tuned all methods’
hyperparameters based on 3-fold cross-validation. Each method, including PUTeacher, was run
five times, and the average performance on the test set is reported. The performance metrics are
AUC and F1-score. We implemented our method using the PyTorch package [109]. Table 3.3
shows the results. In addition to reporting the performance on the entire test set, we exclusively
report the performance against the surveyed teachers in terms of the recall. The reason to do this
is the following. Our model uses data from social media (Pinterest) to perform automatic teacher
identification. Given this, we need to ensure that the model is generalizable to other types of
stratified data, i.e., the surveyed teachers. We make the following observations based on these
results.
                       Table 3.3: Comparing PUTeacher with baseline methods.
                                        Entire Test Set         The Surveyed Teachers
                  Method             AUC           F1-score              Recall
                Supervised       0.95 ± 0.008 0.92 ± 0.004            0.96 ± 0.001
                 ElkaNoto        0.82 ± 0.006 0.80 ± 0.002            0.83 ± 0.005
                 WElkaNoto       0.79 ± 0.008 0.79 ± 0.007            0.86 ± 0.002
                 BaggingPU       0.90 ± 0.008 0.87 ± 0.004            0.90 ± 0.008
                     nPU          0.88 ± 0.01    0.87 ± 0.01          0.87 ± 0.01
                    nnPU          0.90 ± 0.01    0.89 ± 0.01          0.91 ± 0.03
               ProbTagging       0.91 ± 0.002 0.90 ± 0.007            0.93 ± 0.007
                 PUTeacher       0.93 ± 0.001 0.91 ± 0.001            0.96 ± 0.003
    ElkaNoto and WElkaNoto achieved low performance. This is primarily due to their simplistic
dependence on a non-traditional classifier and then adjusting it based on the probability of a positive
sample being labeled, which is hard to estimate in practice. BaggingPU alleviates this problem
by training a bag of classifiers and thus has outperformed ElkaNoto and WElkaNoto. nPU and
nnPU achieved a relatively good performance. However, their dependence on the class prior has
made it hard for them to obtain high performance. ProbTagging achieved a good performance.
                                                   35


However, in addition to being costly, their employed k-NN is not suitable for high-dimensional
data. PUTeacher outperformed all baselines, and its performance is very close to the supervised
classifier. Regarding the surveyed teachers, we can observe that PUTeacher has achieved a very
high recall, has outperformed all baselines methods, and is on par with the Supervised method.
This indicates that PUTeacher is generalizable to other types of stratified data.
3.5          Resiliency Analysis
        In the previous section, we demonstrated that PUTeacher offers excellent performance in
classifying users as teachers and non-teachers. To further ensure the robustness of PUTeacher,
in this section, we perform several important resiliency experiments. These experiments look into
different aspects of PUTeacher (e.g., its input and output) and enable us to ascertain its resiliency.
The resiliency analysis presented in this section is particularly crucial since identified teachers from
PUTeacher are the basis of the studies in the following two chapters of the dissertation.
3.5.1        Input Perturbation
      0.92
      0.90
      0.88
      0.86
AUC
      0.84
      0.82       Topic
                 Domain
                 Description
      0.80       Topic + Domain
                 Topic + Description
                 Domain + Description
      0.78       Topic + Domain + Description
          0.00                  0.02            0.04                0.06        0.08           0.10
                                                       Noise ( 2)
                       Figure 3.4: Perturbing the input features using the Gaussian noise.
                                                        36


    The first analysis includes testing the robustness of PUTeacher against noise. To this end, we
added noise to the input data and inspected the performance of the model. More specifically, we
perturbed the three input features, i.e., topic, domain, and description, by adding the Gaussian noise
from a normal distribution N (𝜇, 𝜎 2 ) where 𝜇 is the mean and 𝜎 2 is the standard deviation of the
distribution. For this analysis, we set 𝜇 = 0 and considered 𝜎 2 in range [0.001, 0.1]. That is, we
changed the magnitude of the noise. Note that we only perturbed the input data of the test set since
our goal is to determine the robustness of the trained PUTeacher when faced with predicting the
class of unseen noisy instances. Given the uncontrolled and prone-to-noise nature of social media,
encountering such instances is likely in practice. Figure 3.4 demonstrates the results where the
x-axis is the amount of noise (𝜎 2 ), and the y-axis is the AUC on the perturbed test set. We ran
each experiment ten times. Also, as shown in this figure, we considered all combinations of feature
types. We make the following observations based on the results presented in Figure 3.4.
     • Even when all features are perturbed and the noise is as high as 0.1, PUTeacher manages
       to deliver a good performance in terms of the AUC score. This indicates that our proposed
       model, to a large degree, is robust against the noise and can be used as a reliable model to
       identify teachers.
     • Three cases wherein the descriptions have been kept intact are consistently above the other
       ones at each noise level. This shows that among feature types, the description is the most
       robust against the noise. The main reason is that while non-teachers can curate resources from
       similar domains and similar topics with teachers, the specific vocabulary used by teachers to
       describe their pins is somehow unique.
     • The robustness of the topic and domain against noise exhibits a similar pattern. This means
       that these two feature types have some similarities. This seems logical since there are specific
       domains tied to particular topics, e.g., teacherspayteachers.com to education. Despite this,
       both feature types are essential since when both are perturbed, the performance has dropped
       significantly– See the red plot in Figure 3.4.
                                                    37


3.5.2     Imbalance in Number of Pins
As far as the number of curated pins is concerned, not all users on Pinterest are equally active. As
was illustrated in Figure 2.7, the distribution of the number of pins follows a power-law distribution
where most of the users have a small number of pins while a tiny portion of users has a massive
number of pins. Similarly, the distribution of the number of pins for unlabelled users follows
a power-law distribution as demonstrated in Figure 3.5. Furthermore, the first component of
PUTeacher learns representations from this skewed set of users in terms of the number of pins.
Given the importance of the first component of PUTeacher, we need to investigate whether the
imbalance in the number of pins of unlabelled users influences the performance of PUTeacher?
                                   1.0
                                   0.8
                                   0.6
                          P(X>x)
                                   0.4
                                   0.2
                                   0.0
                                         100   101   102       103         104   105
                                                      Number of pins (X)
        Figure 3.5: The CCDF of the number of pin for unlabelled users. x-axis is in log scale.
   To answer this question, we trained four distinct versions of PUTeacher. In each version, we
trained PUTeacher only on unlabelled users whose number of pins is in a certain range. These
ranges include [1, 500], [501, 5000], [5001, 20000], and [20001, 287762]4, which cover 23319,
36749, 13253, and 3785 number of users, respectively. We call these ranges low, medium, high, and
very high, respectively, signifying the number of pins they include. Note that the only difference
in the four versions of PUTeacher is their unlabelled users. Other parts of the framework are kept
intact. Figure 3.6 shows the ROC curves of these four models. According to this figure, we can
   4 The   maximum number of pins for unlabelled users is 287,762.
                                                           38


                              1.0
                              0.8
                              0.6
         True positive rate
                              0.4
                              0.2
                                                                                        Low (0.85)
                                                                                        Medium (0.85)
                                                                                        High (0.93)
                              0.0                                                       Very High (0.93)
                                    0.0   0.2   0.4                         0.6   0.8             1.0
                                                      False positive rate
Figure 3.6: The ROC curves of training PUTeacher on four ranges of the number of pins. Numbers
in the parentheses are AUC scores.
observe that training on unlabelled users in the low and medium ranges leads to decreasing the
performance. In contracts, ranges high and very high deliver a better performance. Hence, we can
state that users with a larger number of pins have a larger impact on the performance of PUTeacher.
   Based on the above observation, we can assert that the only way that the number of pins causes
a problem is when most unlabelled users have a low or medium number of pins. Nevertheless, this
did not occur in our dataset since our collected unlabelled users consisted of a diverse set of users
in terms of their number of pins. So the question is, what can one do in circumstances when most
users have a relatively low number of pins, i.e., in ranges low and medium as described above?
Here, we briefly mention two possible ways to address this and leave exploring more advanced
approaches for the future. First, one can collect more unlabelled users. Note that unlabelled
data from social media is significantly cheaper and easier to acquire than conducting surveys or
annotating samples. For instance, using only 540 surveyed teachers, we easily collected thousands
of their online friends as unlabelled users. Second, sample generation, e.g., using generative
adversarial networks [110, 111], is another worthwhile direction. Using a generative method, for
instance, one can synthesize pins for a user based on their pin attributes’ distributions.
                                                           39


3.5.3     Teacher Filtering Parameter Analysis
                            (a)                                              (b)
                         Figure 3.7: Sensitivity analysis of the hyperparamter 𝛼.
     As described in Section 3.3.2, there are two important hyperparameters in the Automatic User
Labeling component of PUTeacher, namely 𝛼 and 𝛽. In this part, we perform a sensitivity analysis
of these two hyperparameters. Figures 3.7 and 3.8 show the sensitivity analysis for 𝛼 and 𝛽,
respectively. For each analysis, we keep one hyperparameter fixed and change the other one. While
changing a hyperparameter, we report two measures: the AUC score of PUTeacher (Figures 3.7a
and 3.8a) as well as the number of automatically labelled teachers and non-teachers (Figures 3.7b
and 3.8b, respectively).
     To assess the effect of 𝛼, we set 𝛽 = 0.03. First, from Figure 3.7a, we can observe that for a
proper 𝛼, our framework delivers a perfect performance, which indicates that the Automatic User
Labelling component effectively identifies reliable teachers from unlabelled data. Moreover, when
𝛼 is too small or too large, the performance drops. The reason for the former is that the Automatic
User Labeling mistakenly marks many non-teachers as teachers. In other words, the teacher labeling
filter is not restrictive enough. However, the latter case (i.e., when 𝛼 is too large) makes the teacher
                                                    40


labeling filter too restrictive, and thus the framework marks only a small number of teachers–
See Figure 3.7b. Consequently, this makes it hard for the final component of PUTeacher, User
Classification, to learn an effective classifier. Finally, we can observe from Figure 3.7b that when
𝛼 increases, the number of automatically labelled teachers decreases.
                        (a)                                                  (b)
                       Figure 3.8: Sensitivity analysis of the hyperparamter 𝛽.
    To assess the effect of 𝛽, we set 𝛼 = 0.95. We can observe from Figure 3.8b that when 𝛽 is small,
the performance is high since automatically labelled users enjoy high reliability. However, when 𝛽
is becoming larger, the performance drops. Additionally, an interesting phenomenon occurs when
𝛽 is set to a tiny number (less than 0.01): the performance drops significantly. The reason is that the
number of identified non-teachers becomes very small, as shown in Figure 3.8b. Consequently, the
User Classification component cannot properly learn to distinguish the two classes. Finally, one
can observe in Figure 3.8b that by increasing 𝛽, the number of automatically labelled non-teachers
increases.
                                                   41


3.5.4          Applying PUTeacher to Unlabelled Users
            health_fitness
                     kids
              hair_beauty
          womens_fashion
          holidays_events
  Topic            quotes
              home_decor
               food_drink
                diy_crafts
                education
                         0.0            0.1                 0.2                      0.3            0.4
                                                                  Proportion (%)
                                                    (a) predicted-as-teachers
                weddings
                  animals
              hair_beauty
                    travel
                   quotes
  Topic                art
              home_decor
                diy_crafts
          womens_fashion
               food_drink
                        0.000   0.025     0.050     0.075         0.100          0.125     0.150   0.175   0.200
                                                                  Proportion (%)
                                                  (b) predicted-as-non-teachers
               Figure 3.9: The top 10 topics of pins of unlabelled users classified by PUTeacher.
                                                                    42


             day
             fun
       classroom
            ideas
            make
Word         love
        students
             kids
            great
             free
               0.000   0.005    0.010         0.015        0.020       0.025           0.030
                                             Proportion (%)
                                       (a) predicted-as-teachers
          perfect
            cake
             one
           dress
             free
   Word   chicken
            make
            easy
             love
           recipe
               0.000   0.005   0.010        0.015       0.020      0.025       0.030       0.035
                                              Proportion (%)
                                  (b) predicted-as-non-teachers
Figure 3.10: The top 10 words of pin descriptions of unlabelled users classified by PUTeacher.
                                                  43


                         bloglovin.com
           options-trading-mastery.com
                           indulgy.com
                        instagram.com
                         buzzfeed.com
     Domain                  flickr.com
                          youtube.com
                              etsy.com
                      Uploaded by user
               teacherspayteachers.com
                                     0.00   0.01          0.02           0.03       0.04         0.05      0.06          0.07          0.08
                                                                                Proportion (%)
                                                                    (a) predicted-as-teachers
                         polyvore.com
       spaceshipsandlaserbeams.com
                 hellopeacefulmind.com
                         buzzfeed.com
                   mymommystyle.com
  Domain                  youtube.com
                        instagram.com
                             flickr.com
                              etsy.com
                      Uploaded by user
                                     0.00          0.02           0.04           0.06               0.08          0.10          0.12
                                                                                Proportion (%)
                                                                 (b) predicted-as-non-teachers
                 Figure 3.11: The top 10 domains of pins of unlabelled users classified by PUTeacher.
As mentioned before, the main goal of PUTeacher is to predict the class of unlabelled users, which
reflects the practical scenario of how PUTeacher would be used. Hence, we used PUTeacher
and predicted the class of unlabelled users (i.e., X 𝑢 ).5 Now, we qualitatively assess two sets of
predicted users: predicted-as-teachers and predicted-as-non-teachers. To make this assessment
comparable, we shuffled predicted users and randomly select 5,000 predicted-as-teachers and 5,000
predicted-as-non-teachers. We retrieved the top 10 topics, 10 words (from pin descriptions), and 10
domains for predicted-as-teachers and predicted-as-non-teachers, as demonstrated in Figures 3.9,
3.10, and 3.11, respectively. We make the following observations based on these results.
              • We can observe from Figure 3.9a that, as expected, education is the predominant topic of
                 pins for predicted-as-teachers while it is not even in the list of top 10 topics for predicted-as-
                 non-teachers.
       5 Recall                that we do not have ground truth labels for these users.
                                                                                44


 • As far as the top 10 words are concerned, we can observe an interesting pattern. Almost
   all 10 words of pins belonging to predicted-as-teachers are somehow related to education
   and teaching, e.g., kids, students, classroom. In contrast, words associated with pins of
   predicted-as-non-teachers are related to other areas such as cooking– Note some words like
   recipe, chicken, and cake in Figure 3.10b.
 • The word ‘free’ is the first and sixth frequently used word for predicted-as-teachers and
   predicted-as-non-teachers, respectively. We believe the main reason is as the following.
   First, Pinterest is widely used for business and marketing purposes. Given this, sometimes
   users mention the word free to indicate that their shared resources are free of charge explicitly,
   e.g., free lessons on how to do photography in nature. Hence, they probably mention the word
   free to help propagate their resources. In particular, teachers mention free in pin descriptions
   to attract the attention of their fellow teachers to their curated resources.
 • The word ‘ideas’ is among the top 10 frequently used terms for pin descriptions of predicted-
   as-teachers. We believe such a significant emphasis of teachers on ‘ideas’ in their curated
   resources speaks to the distinct pattern employed by teachers in leveraging social media,
   in particular Pinterest, where they look for novel and innovative teaching ideas to supple-
   ment their educational resources. Such resources might be otherwise unavailable in their
   curriculum-based resources [4, 28, 34, 46, 91, 92].
 • Another noteworthy word for predicted-as-teachers is ‘fun’. We think the main reason for
   this word is that teachers tend to make their curated educational resources more engaging,
   e.g., Fun counting coins games for first grade and second grade students.6
 • Ultimately, regrading the top 10 domains of pins, it is interesting to observe that teachers-
   payteachers.com is the top domain for predicted-as-teachers. teacherspayteachers.com is
   an online marketplace connecting millions of teachers and containing more than 5 million
   educational resources.
6 https://www.pinterest.com/pin/27443878969323415/
                                                45


    From these observations, we can conclude that 1) PUTeacher can reliably identify teachers
when it is applied to unlabelled (unseen) samples on Pinterest, and 2) the way that teachers
leverage Pinterest for resource curation makes them outstanding among other Pinterest users, i.e.,
non-teachers.
3.5.5     State Representativeness
In this part, we shed light on the resiliency of PUTeacher from the perspective of the U.S. states of
users in our dataset. First, we present the distribution of the U.S. states of users in our dataset. Then,
we specify whether overrepresentation by certain states affects the generalization of PUTeacher to
users from underrepresented states. Eventually, we present the distribution of the U.S. states for
teachers automatically identified by PUTeacher.
3.5.5.1       The U.S. State Distribution
              Indiana                                                                                    1281
                 Texas                                                                              1188
            Michigan                                                                   790
            California                                                      610
                Illinois                                            502
            New York                                            392
               Florida                                     334
                   Ohio                                229
       North Carolina                              197
         Washington                               191
              Georgia                            189
           Tennessee                          159
            Colorado                          158
              Arizona                     119
              Virginia                    117
               Kansas                    108
        Pennsylvania                    106
           Wisconsin                    104
           Minnesota                    103
       South Carolina                   102
             Missouri                  98
              Oregon                   96
           Oklahoma                   88
          New Jersey                  88
            Kentucky                 83
                   Utah              82
       Massachusetts                 81
             Alabama                78
            Louisiana               74
            Maryland               67
                   Iowa            66
              Nevada             48
         Connecticut            44
           Mississippi          39
               Hawaii          30
                 Idaho        29
                 Maine        27
        West Virginia         27
            Nebraska          24
      New Hampshire          22
         New Mexico          21
             Montana         17
            Delaware        14
             Vermont        13
          Puerto Rico       13
            Arkansas        12
            Wyoming        11
               Alaska      10
        South Dakota       10
        North Dakota       9
        Rhode Island       7
     Washington, D.C.      6
                         0                      200            400       600          800  1000   1200
                                                                                Count
                         Figure 3.12: The distribution of the U.S. states for users in our dataset.
                                                                        46


12,302 users in our dataset (around 14%) have shared their locations. We processed the strings of
shared locations and managed to extract the U.S. states for 8,313 users. The state distribution for
these users is shown in Figure 3.12. As can be seen in this Figure, the majority of teachers come
from Indiana, Texas, and Michigan (around 40% combined). The main reason is that our surveyed
teachers were sampled from four Midwest states (including Michigan, Indiana, Illinois, and Ohio)
and Texas– See Figure 2.5. Consequently, those who are friends with these teachers probably come
from the same states. Recall from Section 2.2 that other users in our dataset are Pinterest followers
and followees (i.e., friends) of the surveyed teachers.
3.5.5.2   The U.S. State Generalization of PUTeacher
                               1.0
                               0.8
          True positive rate
                               0.6
                               0.4
                               0.2
                                                                                         Low (0.96)
                                                                                         High (0.91)
                               0.0                                                       Middle (0.92)
                                     0.0   0.2   0.4                         0.6   0.8           1.0
                                                       False positive rate
Figure 3.13: The ROC curves of PUTeacher’s performance for three levels of state representative-
ness. Numbers in the parentheses are AUC scores.
   Now the question is, does this state overrepresentation affect the performance of PUTeacher
when it is used to predict the class of teachers in underrepresented states? To answer this question,
we assessed the performance of PUTeacher on the test set users for three levels of the U.S. state
representativeness, namely over-representative, middle-representative, and under-representative
states. Following the top-down order of states in the y-axis of Figure 3.12, over-representative
states include Indiana, Texas, and Michigan. Middle-representative states are from California to
                                                           47


South Carolina. Ultimately, under-representative states include the rest of the states from Missouri
to Washington D.C. Figure 3.13 demonstrates the results of this experiment. Interestingly, we can
observe that despite the overrepresentation by several states, PUTeacher delivers an excellent AUC
score for middle-representative and under-representative states. Surprisingly, the performance for
under-representative states is even better than the other two cases. From this experiment, we can
conclude that PUTeacher is entirely robust against the underrepresentation in terms of the U.S.
states of users (including teachers).
3.5.5.3          The U.S. State Distribution of Automatically Identified Teachers
               Texas                                                                              224
            Indiana                                                                  158
          Michigan                                                           119
          California                                                  99
             Florida                                             75
              Illinois                                        64
                 Ohio                                      59
            Georgia                                      56
     North Carolina                                    53
          New York                                  49
         Wisconsin                              33
         Tennessee                             31
          Colorado                             31
        New Jersey                            30
       Washington                          25
         Minnesota                         25
     South Carolina                        25
            Arizona                      22
            Virginia                     22
      Pennsylvania                      21
           Alabama                      21
         Oklahoma                       21
     Massachusetts                     20
           Missouri                    20
                 Iowa                 19
          Maryland                   18
          Louisiana                  18
             Kansas                  18
            Nevada                  16
       Connecticut                  16
          Kentucky                 15
            Oregon                14
          Arkansas               13
          Nebraska             9
                 Utah          9
               Maine           9
         Mississippi           9
           Vermont            8
    New Hampshire             8
       New Mexico            7
               Idaho       5
          Delaware        4
          Wyoming         4
      West Virginia       4
      North Dakota        4
             Hawaii 3
      South Dakota 3
      Rhode Island       3
           Montana 3
   Washington, D.C. 2
             Alaska 1
                       0                           50               100          150     200
                                                                         Count
           Figure 3.14: The distribution of the U.S. states of automatically identified teachers.
    Finally, it is worthwhile to look into the U.S. state distribution of automatically identified
teachers. To this end, similar to Section 3.5.4, we applied PUTeacher to all unlabelled users
and considered a user as a teacher if their probability of being a teacher is larger than 0.9 i.e.,
according to the notation in Section 3.3.3, 𝑦𝑖𝑡 > 0.9. From those predicted as teachers, we managed
to extract the U.S. states for 1,769 teachers. Figure 3.14 shows the U.S. state distribution of these
                                                                      48


1,769 automatically identified teachers. Similar to the U.S. state distribution of the entire dataset
demonstrated in Figure 3.12, there are high numbers of teachers from Indiana, Michigan, and
Texas. Again, this is because most of the surveyed teachers come from these three states, and
consequently, their online teacher friends likely come from the same states. Despite this, we can
observe that PUTeacher has been able to identify teachers from all states, including Washington
D.C. This indicates the effectiveness of our proposed method and the widespread usage of Pinterest
by teachers.
                                                 49


                                              CHAPTER 4
                  GENDER ANALYSIS OF TEACHERS ON SOCIAL MEDIA
4.1      Introduction
     Despite being a controversial topic, it is well known that there is a gap between male and female
students when it comes to educational achievements [38]. Some studies attempting to determine
this gap has focused on the role of teacher’s gender. The general motivation behind this focus is
that the academic environment created by teachers can significantly influence the way students see
themselves as learners/students. Then, arguably, the teacher’s gender has a significant role in the
dynamic of this environment. For instance, students who have been discouraged from participation
in classroom activities based on their gender have shown to be uniquely disadvantaged [112].
Moreover, some studies have explicitly investigated the difference in male and female teachers’
treatment of boy and girl students [39, 40, 41, 42]. Although the evidence for the relationship
between same-gender teachers and improvement in students’ achievement is arguable [39], some
still find same-gender teachers educationally relevant. The main reason is that same-gender teachers
can affect engagement via perpetuating the role model effect and stereotype threat [42]. Regarding
the latter, several studies have shown that male teachers face societal prejudice and judgment for
violating gender stereotypes [113] such as the fear of being accused of inappropriate contact with
students [114, 115] or being labeled as “weird”, “gay”, or “weak” [116]. Given this, perhaps it is no
surprise to know that more than 75% of teachers in the U.S. are females (this number is around 82%
for elementary school teachers) [117]. A similar trend persists in most other countries, especially
in the Western nations [118]. In addition, it is worth mentioning that there have been efforts to
recruit more male teachers due to the lack of male role models for boy students, otherwise knows as
the decline of masculinity [119, 120]. Finally, the discussion about the impact of teachers’ gender
on the quality of education has extended beyond the academic literature, where teacher’s gender
among their other demographic attributes (e.g., race) have been the focus of many discussions
                                                    50


among policymakers, parents, students, and other educational stakeholders.
    From the brief discussion above, we can infer that the current studies concerning the gender
of teachers assume either: a) the teacher’s gender has an innate value, e.g., the role model effect
or b) the behavioral difference in male vs. female teachers is what indeed matters, e.g., the way
teachers manage the classroom or prepare materials [43]. The former is beyond the scope of this
dissertation, and thus this chapter focuses on the latter, where we attempt to determine differences
and similarities in male and female teachers’ behavior. However, unlike current studies, we focus
on analyzing the behavior of male and female teachers through the lens of online social media
where, as discussed in Chapter 1, nowadays plays a significant role in teachers’ professional career
development and is reshaping the entire teaching profession. Moreover, teacher gender analysis
using online social media data compared to traditional educational data (e.g., surveys or interviews)
have several advantages such as the fast and accessible data, less selection bias, and larger sample
size (refer to Chapter 1).
    Hence, in this study, we perform an exploratory analysis of male and females teachers on
Pinterest. We focus only on teachers in the U.S. One of the main challenges to perform such a study
is having a representative sample of male and female teachers whereby we mean the percentage of
male and female teachers should be as close as possible to that of the general population of teachers
in the U.S, i.e., 76.5% and 23.5% for female and male teachers (89% and 11% for elementary school
teachers), respectively [117]. Moreover, to effectively perform a quantitative data-driven analysis,
the sample size should be sufficiently large. Using only our surveyed teachers does not satisfy these
two criteria. First, the percentage of male and female teachers in the surveyed teachers is 2.95%
and 97.05%, respectively, which significantly differs from that of the teacher population in the U.S.
Second, using only 540 teachers on Pinterest might bring into question the statistical significance
of any quantitative analysis. Thanks to the automatic teacher identification framework proposed in
Chapter 3, however, we can address these challenges. As will be presented in Section 4.2, using
this framework, we can automatically identify thousands of teachers on Pinterest while their gender
distribution matches that of the U.S. population of teachers. Using this dataset at our disposal, we
                                                 51


study male and female teachers on Pinterest from two crucial aspects, as briefly discussed in the
following.
    First, we investigate various online activities of male and female teachers, e.g., topics of pins,
domains of pins, the number of boards. The motivation for this type of analysis is to understand
male and female teachers through their resource curation process. Second, we investigate male
and female teachers in the context of the social network (graph) they belong to, e.g., comparing
the centrality of male and female teachers, determining gender homophily. The performed social
network analysis complements online activity analysis by examining how male and female teachers
are connected in a social network.
    The novel analysis presented in this chapter sheds light on an unexplored area of research –male
and female teachers on social media– which we believe has a great potential in fostering further
research as it illuminates an important part of teachers’ professional life in the information age (i.e.,
social media). From a practical perspective, our analysis and findings in this chapter can serve as
a useful reference for many entities concerned with teachers’ gender, e.g., educational scientists,
policymakers, principals, state-level and national-level institutes. Moreover, as will be presented,
some of the findings of this chapter are generalizable to all teachers regardless of their gender. In
summary, our contributions in this chapter are as follow:
     • Using our previously developed automatic teacher identification framework (Chapter 3), we
       build a large and representative sample of male and female teachers on Pinterest.
     • To the best of our knowledge, this is the first data-driven study that analyses the teachers in
       social media concerning their gender.
     • Our analysis considers two crucial aspects of teachers in social media and further informs us
       how teacher’s gender plays a role in these aspects.
    The rest of this chapter is organized as follows. First, in Section 4.2, we explain how we set
up the dataset. Afterward, the online activity analysis is presented in Section 4.3, followed by the
social network analysis in Section 4.4.
                                                  52


4.2     Dataset
    In this section, we present how we constructed our dataset of male and female teachers. First, we
discuss how we used our automatic teacher identification framework to incorporate more teachers
in our teacher gender investigation. Afterward, we explain how we specified the gender of teachers,
and finally, discuss the privacy concerns and taken measures to address these concerns.
4.2.1    Employing Automatic Teacher Identification
             Pinterest Network
                                                                                  Teacher
                                   T                                                          (p)
      T
                                                                  Automatic
                      ?                                 ?          Teacher
                                          ?                     Identification
                                                                                            (1-p)
          T                                                                     Non-teacher
                                ?
    T      Labelled/Surveyed Teacher               ?   Unknown User
Figure 4.1: An overall illustration of our proposed automatic teacher identification approach
(PUTeacher) presented in Chapter 3.
    As mentioned in Section 4.1, two major data-related challenges facing our exploratory analysis
of male and female teachers on Pinterest are 1) an insufficient number of teachers and 2) representa-
tiveness of teachers regarding their gender distribution. Using only our surveyed teachers does not
resolve these challenges. While surveying more teachers seems like an immediate option, needless
to say, that surveying is time-consuming, costly, and cumbersome. Hence, it is highly beneficial to
devise a method that can automatically identify teachers on Pinterest. Fortunately, in Chapter 3, we
proposed such an approach. For reference, we have included an overall illustration of our teacher
                                                  53


identification framework (PUTeacher) in Figure 4.1. To recall, as the input, PUTeacher takes the
data of an unlabelled user, i.e., a Pinterest user connected to the surveyed teachers. As the output,
it yields the probability that the user is a teacher, which is denoted as 𝑝 in Figure 4.1. Obviously,
1 − 𝑝 would be the probability of being a non-teacher.
                                            22000
   Number of Automatically Identified Teachers
                                            20000
                                            18000
                                            16000                                                                 (0.9, 15978)
                                            14000
                                            12000
                                            10000
                                                      0.5            0.6           0.7           0.8             0.9             1.0
                                                                                     Threshold τ
                                                 Figure 4.2: The number of identified teachers for different values of threshold 𝜏.
           Since the output of our automatic teacher identification approach is a probability distribution,
we can set a threshold 𝜏 to specify the classification outcome, where if 𝑝 > 𝜏, the user will be
considered a teacher and otherwise a non-teacher. In Figure 4.2, we have plotted the number of
automatically identified teachers when 𝜏 changes from 0.5 to 1. As shown in this figure, even being
as conservative as 𝜏 = 0.9, we can identify around 16,000 teachers on Pinterest, which significantly
enlarges the sample size and thus addresses the first challenge mentioned above. In this study, we
set the threshold to 0.9 to ensure high reliability in included users.
                                                                                        54


    Remark. One might speculate that there might be some non-teachers among automatically
identified teachers. While this can be possible, we believe it does not drastically affect the subsequent
analyses in this chapter due to the following reasons. First, our rigorous evaluation in Chapter 3
revealed that the error in our method is very small. In particular, in Section 3.5, we performed
a thorough resiliency analysis of PUTeacher and ensure it is a robust and reliable approach for
automatic teacher identification on Pinterest. Second, the impact of a small number of incorrectly
identified users will be “smoothed out" by a large number of correctly identified teachers and thus
would not harm the generalizability of our results.
4.2.2   Gender Identification
Now, we have access to a large set of teachers on Pinterest; we need to a) identify their gender and
b) keep only teachers who reside in the U.S. To do so, we perform the following three steps one by
one.
    Step 1. As mentioned in Chapter 2, the Pinterest API provided us with the self-declared gender
of users– See Table 2.3. We only kept users whose recorded genders are specified (i.e., “Male” or
“Female”) and excluded “Unspeccified” ones.
    Step 2. To further ensure that genders recorded in our dataset are correct, we utilized a secure
and reliable commercial tool named Gender API.1 For a given first name, Gender API determines
its gender. It additionally provides an accuracy value in the range [0, 1] specifying the certainty
in the gender determination. Therefore, we passed all the first names of users from Step 1 to
this program. Afterward, we applied two filtering operations. First, we excluded users whose
corresponding accuracy acquired from Gender API is less than 0.8. Second, we kept only users
whose genders from Pinterest API and Gender API match, i.e., both are male or female.
    Step 3. As mentioned before, we need to restrict our analysis to teachers in the U.S. To this
end, we included teachers whose field of country is “US” –See Table 2.3.
    1 https://gender-api.com/
                                                  55


    We believe Step 1 and Step 2 combined offer a robust and reliable way of the gender specification
of users. Moreover, Step 3 ensures including only teachers in the U.S. We also excluded users
who had less than 20 pins in their accounts since they were very inactive on Pinterest. Table 4.1
demonstrates some basic statistics about our dataset after the above steps. Given this dataset of male
and female teachers, we can assert that the second challenge (i.e., the gender representativeness)
has been alleviated drastically. More specifically, compared to the surveyed teachers (2.95% male
and 97.05% female), our new set of teachers (88% female and 12% male) is significantly more
similar to the overall gender distribution of teachers in the U.S. (76.5% female and 23.5% male).
Furthermore, the percentages of males and females in our dataset, i.e., 88% female and 12% male,
perfectly match percentages of American elementary school teachers, i.e., 89% female and 11%
male. This is particularly important since it has been shown that most teachers on Pinterest are
elementary school teachers [117]. It is worth mentioning that the overrepresentation of female
teachers on Pinterest is driven by two major factors: a) as mentioned before, the majority of
teachers in the U.S. are female [117], and b) most of Pinterest users are female (around 77% [121]).
In fact, Pinterest has been referred to as ‘feminine’ social media [122, 123, 124].
    In addition to the number of users, Table 4.1 shows the statistics of several basic attributes about
male and female teachers, e.g., the number of pins, the number of boards. In the remainder of
this chapter, we will provide a rigorous analysis of these attributes. Finally, we emphasize that our
constructed dataset of male and female teachers is the largest dataset of teachers in social media,
with available gender information.
                                                  56


         Table 4.1: Basic statistics of our constructed dataset of male and female teachers.
                                    Female Teachers       Male Teachers           Total
                   #Users              11,675 (88%)          1,592 (12%)         13,267
                    #Pins           67,705,475 (84%)     13,026,307 (16%)     80,731,782
                  #Boards             762,669 (88%)        102,986 (12%)        865,655
                #Followees            975,775 (82%)        209,165 (18%)       11,84,940
           #Followees (unique)         61,016 (70%)         25,940 (30%)         86,956
                #Followers            738,326 (70%)        308,403 (30%)       1,046,729
            #Followers (unique)        69,288 (68%)         33,036 (32%)        102,324
                 #Friends            1,714,101 (82%)       517,568 (18%)       2,231,669
             #Friends (unique)         81,813 (67%)         40,120 (32%)        121,933
4.2.3   Privacy Concerns
Dealing with the demographic information of human subjects (acquired either through surveys or
online social media) is not without privacy concerns. Nonetheless, we have been fully aware of
these concerns and ensured they are appropriately addressed, as explained in the following.
    • Pinterest data is publicly available, and we used authorized Pinterest API to acquire this data.
      Notwithstanding this, only authorized individuals have had access to this data.
    • Regarding using the Gender API tool, we only submitted users’ first names to this tool. In
      addition, we carefully reviewed the privacy statement of this tool2 and ensured it is in line
      with our guidelines.
    • Although we could have proceeded with using the Gender API tool to determine the gender
      of users whose value in our dataset is “Unspecified”, we respected their decision in disclosing
      their gender and thus excluded those users in the analysis of this Chapter.
    • Sharing our dataset with the scientific community can further advance the research in PK-12
      education in general and teachers in social media in particular. Nevertheless, to protect the
      privacy of individuals, this sharing will only be possible upon proper communication and
      further institutional approval.
   2 https://gender-api.com/en/privacy-policy
                                                  57


4.3        Online Activity Analysis
   In this section, we present a set of investigations related to the online activities of male and
female teachers. These activities are primarily associated with pins and boards curated by teachers
on Pinterest. We aim to delineate similarities and differences in how male and female teachers
perform their online activities. In Section 4.3.1, we investigate the pin and board curation rate
of male and female teachers. Sections 4.3.2 and 4.3.3 investigate the topics and domains of pins
curated by male and female teachers, respectively. Afterward, in Section 4.3.4, look into the
language used by male and female teachers to describe their pins and name their boards. Finally,
in Section 4.3.5, we investigate how male and female teachers interact with Pinterest over the time.
4.3.1          Resource Curation Rate
           1.0                                                      1.0
                                         Female Teachers                                       Female Teachers
                                         Male Teachers                                         Male Teachers
           0.8                                                      0.8
           0.6                                                      0.6
      P(X>x)                                                   P(X> )
           0.4                                                      0.4
           0.2                                                      0.2
           0.0                                                      0.0 −2
                 10−2      10−1      100      101       102            10        10−1       100        101
                        Normalized n mber of pins (X)                        Normalized number of boards (X)
                             (a) Pinning rate                                  (b) Board curation rate
               Figure 4.3: The CCDF of the number of pins and boards. x-axes are in log scale.
   As its name suggests, Pinterest is all about pins. Also, as mentioned before, pins on Pinterest
are organized in boards. Hence, our first online activity analysis is concerned with the number
of pins and boards generated by male and female teachers. Figure 4.3a and Figure 4.3b show the
cumulative distribution function (CCDF) of the number of pins and boards, respectively. Since not
                                                              58


all teachers have joined Pinterest at the same time, we need to consider the duration of an account.
To this end, we divided the number of pins and boards by the number of days and weeks from the
time a teacher has joined Pinterest to their last pin and board curation time, respectively. The reason
for normalizing the number of pins and boards by different scales (days vs. weeks) is to account for
the faster rate of pin curation than board curation. As shown in Figure 4.3a, the pinning rates for
males and females are very similar, while for very active users (the tail of the distribution), male
teachers tend to generate more pins. We also found almost identical distributions of the numbers
of boards for male and female teachers (Figure 4.3b). This is in line with [125], which investigated
the board creation rate of the general population of male and female Pinterest users and showed
that their distributions are very similar.
     1.0                                                        1.0
                                                                                           Female Teachers
                                                                                           Male Teachers
     0.8                                                        0.8
     0.6                                                        0.6
P(X>x)                                                     P(X>x)
     0.4                                                        0.4
     0.2                                                        0.2
                 Female Teachers
                 Male Teachers
     0.0                                                        0.0
                 10−2          100                   102                        10−2          100          102
             Normalized number of re   -pins   (X)                       Normalized number of non-repins (X)
                   (a) Rate of re-pins                                        (b) Rate of non-repins
                  Figure 4.4: The CCDF of re-pins and non-repins. x-axes are in log scale.
         A pin on a user’s account can be either a re-pin of someone else’s pin or an original pin curated
by the same user, which we call non-repin. Note that non-repin does not necessarily mean the user
has created the pin’s content (e.g., image). Instead, it simply means the user has not obtained it
from another Pinterest user. To determine how male and female teachers behave regarding the
re-pinning, Figure 4.4a and Figure 4.4b illustrate the CCDF of the number of re-pins and non-
repins, respectively. In terms of re-pinning, we can observe from Figure 4.4a that both male and
                                                                    59


female teachers behave very similarly. However, as far as the number of non-repins is concerned,
we can observe from Figure 4.3 that male teachers tend to curate more non-repins (original pins)
than female ones. To put it from a different perspective, female teachers on Pinterest are more
“receptive” to sharing others’ resources than male teachers. This is in line with [83] wherein the
authors showed that female users tend to participate more in re-pinning than male users.
4.3.2   Topic of Pins
As discussed in Chapter 2, previous studies have approached teachers in social media from several
different theoretical frameworks. One of these theories, which is pertinent to our investigation in
this part of the dissertation, is affinity space [76] which has been the basis of several teachers in
social media studies [69, 70, 75, 126]. In the context of teachers in social media, affinity space
means teachers leverage social media to interact with each other about certain topic(s) relevant
to their profession. Hence, motivated by the affinity space theory, in this part, we perform an
in-depth investigation of teachers’ topics of interest in their resource curation process. Since the
main focus of this Chapter is the gender analysis of teachers, we conduct our investigation and
present the findings while considering teachers’ genders. As far as topics are concerned, luckily,
Pinterest supports a set of pre-defined topics covering a wide range of categories such as art,
animal, travel, education. A pin can belong to any of the existing 33 topics or ‘others’, i.e., in total,
we have 34 topics.3 Since these topics essentially encode the inherent users’ preference, previous
Pinterest-based studies have incorporated them into their user behavior analysis [86, 83, 125].
4.3.2.1    Top Topics
Figure 4.5 demonstrates the average percentage of each topic for male and female teachers. We
make the following observations based on this figure.
    3 Both the supported topics and their numbers have slightly changed during the past few years.
Hence, the topics in our dataset might be different from those in previous studies or future ones.
                                                   60


                      education
                     food_drink
                      diy_crafts
                   home_decor
             womens_fashion
                         quotes
               holidays_events
                   hair_beauty
                      weddings
                             kids
                 health_fitness
                          humor
                           travel
                      gardening
                        animals
                              art
                  photography
Topic                     others
            film_music_books
                         tattoos
                          sports
                         history
                     celebrities
                            geek
                       outdoors
                       products
                        for_dad
                   architecture
                science_nature
            cars_motorcycles
                          design
                    technology                                                                                 Male Teachers
                 mens_fashion
                                                                                                               Female Teachers
        illustrations_posters
                                    0            10            20                30         40            50
                                                                         Proportion (%)
                                    Figure 4.5: The average proportion of topics for male and female teachers.
 • A crucial pattern that is immediately noticeable is that education is the predominant topic for
          both gender groups. Regarding our discussion about the affinity space, this signifies that Pinterest
          acts as a proper affinity space for teachers since they have leveraged Pinterest as a medium to
          focus on a specific topic relevant to their profession, i.e., education.
 • Although the predominant topic for both gender groups is education, the percentage of education
          is higher for male teachers (55.57%) than female teachers (43.87%). This indicates that male
          teachers have focused more on educational resources while female teachers have explored other
          types of pins as well. This can be further inferred by looking at the other top topics, e.g.,
          food_drink, diy_crafts, wherein female teachers have higher contributions than male teachers. In
          the subsequent analysis of this part, we dig deeper into this difference.
 • The top four topics cover 78.21% and 83.02% of pins for female and male teachers, respectively.
          These skewed distributions indicate the existence of the power-law phenomenon in the topic
          distribution. Previous studies have observed a similar pattern regarding the topic distribution on
          Pinterest [83, 86, 125].
                                                                       61


 • Excluding education, male and female teachers’ preferences in some other topics match what
         previous studies have identified. For instance, food_drink and diy_crafts are of the high interest
         for both male and female Pinterest users [86, 125].
                          others                                                                    Male Teachers
                        animals
                                                                                                    Female Teachers
                   architecture
                              art
            cars_motorcycles
                     celebrities
                          design
                      diy_crafts
                      education
            film_music_books
                     food_drink
                      gardening
                            geek
                   hair_beauty
                 health_fitness
                         history
Topic          holidays_events
                   home_decor
                          humor
        illustrations_posters
                             kids
                 mens_fashion
                       outdoors
                  photography
                       products
                         quotes
                science_nature
                          sports
                         tattoos
                    technology
                           travel
                      weddings
             womens_fashion
                                    0     10               20                    30            40
                                                                Proportion (%)
                    Figure 4.6: Average proportion of topics of non-repins for male and female teachers.
             To deepen our understating of the distribution of topics, we computed the contribution of male
and female teachers in each topic where we included only non-repins. The result is shown in
Figure 4.6. Consistent with what was presented in Section 4.3 and the previous studies on the
behavior of male and female Pinterest users [83, 84], for all topics, male teachers are more active
in pinning than re-pinning. Moreover, both male and female teachers have made more non-repins
in topics for which the overall number of pins is low, e.g., illustrations_posters. In other words,
teachers curate the primary resources of their interest (i.e., educational pins) by taking them from
other teachers and other Pinterest users. However, they locate pins related to their secondary
interests (i.e., non-educational pins) on the web or directly upload them from their device.
                                                            62


4.3.2.2   Topic Entropy
For the chart demonstrated in Figure 4.5, we combined all pins curated by a gender group and
then calculated the proportion of each topic. This reveals the overall behavior of male and female
teachers regarding topics. Nevertheless, it does not inform us how an individual teacher behaves
with respect to the topics of the pins they have curated. Therefore, to acquire such information,
similar to previous studies [86, 83, 125], we utilize the notion of topic specialization. A completely
topic-specialized user merely sticks to a single topic while a less topic-specialized user contributes
to various topics. To quantify the topic specialization, we introduce the topic entropy. Suppose we
have 𝑘 topics T = [𝑡1 , 𝑡2 · · · 𝑡 𝑘 ] (𝑘 = 34 in our dataset). Further, for a given user, let 𝑝 𝑡𝑖 denote the
fraction of their whose topic is 𝑡𝑖 . Then, the topic entropy (𝑇 𝐸) of a user 𝑢 is defined as follows:
                                                       𝑚   Õ
                                         𝑇 𝐸 (𝑢) = −     ×   𝑝 𝑡𝑖 × 𝑙𝑛( 𝑝 𝑡𝑖 )                          (4.1)
                                                       𝑘
                                                             ∀𝑡𝑖 ∈T
where 𝑚 is the number of topics wherein the user has at least one pin. The term 𝑚𝑘 smooths out
the effect of the natural logarithm in the entropy formulation by accounting for the total number
of topics. 𝑇 𝐸 is in the range [0, 𝑙𝑛(𝑘)] and the closer 𝑇 𝐸 is to 0 (𝑙𝑛(𝑘)) the more (less) topic-
specialized a user is.
                                     1.0
                                                                      Female Teachers
                                                                      Male Teachers
                                     0.8
                                     0.6
                                P(X>x)
                                     0.4
                                     0.2
                                     0.0
                                           0.0   0.5     1.0      1.5     2.0   2.5
                                                        Topic Entropy (X)
                         Figure 4.7: The CCDF of the topic entropy (Eq. 4.1).
                                                             63


     Figure 4.7 shows the CCDF of the topic entropy for male and female teachers. As it can
be observed from this figure, female teachers exhibit a smaller degree of topic specialization.
Ottoni et al. [125] and Chang et al. [86] discovered a similar finding regarding female Pinterest
users. Moreover, several psychological/medical studies have demonstrated that men are more
focused than women while women are better at multi-tasking than men [127, 128]. Perhaps our
finding regarding the difference in the topic specialization between male and female teachers (and
even males and females on social media) can be linked to these psychological/medical studies.
Nevertheless, this connection needs further rigorous analysis, and we leave it for future work.
       (a) Male & female teachers          (b) Female teachers               (c) Male teachers
                      Figure 4.8: The topic entropy based on the number of pins.
     One might wonder whether the topic specialization for a user is related to the number of their
pins or not. For instance, we might expect a higher degree of topic specialization for a user with
more pins since potentially more topics could be covered. Hence, we investigated the relationship
between the topic entropy and the number of pins, as demonstrated in Figure 4.8. Figure 4.8a shows
a case where we combined the data of male and female teachers. To exclude the gender from the
relationship between the topic entropy and the number of pins, we investigated this relationship for
each gender group separately. Specifically, Figures 4.8b and 4.8c show the topic entropy vs. the
number of pins for female teachers and male teachers, respectively. For each chart, we also included
a fitted regression line between the topic entropy and the number of pins and the Pearson correlation
between these two variables. For the male and female teachers combined shown in Figure 4.8a, we
can observe that as the number of pins increases, the topic entropy exhibits moderate growth, and
the fitted line enjoys a positive slope. Also, the correlation is 0.28 with the p-value of almost zero,
                                                   64


which is statistically significant.4 For female teachers, the trend is similar to that of Figure 4.8a,
where we observe a positive correlation (c=0.38) between the topic entropy and the number of
pins. Also, this correlation is statistically significant (p=0.0). Nevertheless, the story is different
for male teachers. According to Figure 4.8c, there is no correlation between the topic entropy and
the number of pins for male teachers. However, since the p-value is large (p=0.24), we cannot
categorically assert no correlation. This motivated us to dig deeper into the relationship between
the topic entropy and the number of pins. To this end, we discretized the number of pins into
three distinct ranges for both male and female teachers. Afterward, we determined the relationship
between the topic entropy and the number of pins in each range. Next, we present the results of
these experiments.
        (a) Range [20, 1461]                          (b) Range [1462, 5762]            (c) Range [5762, 244333]
                                   2.5
                                   2.0
                       Topic Entropy
                                   1.5
                                   1.0
                                   0.5
                                   0.0
                                         [20, 1461]           [1462, 5762]     [5772, 244333]
                                                           Pin number ranges
                    (d) Boxplots of the topic entropy for each range of the number of pins
Figure 4.9: The topic entropy for male teachers across three distinct ranges of the numbers of pins.
   4 Hereafter,   we consider p < 0.05 as statistically significant.
                                                                65


        (a) Range [20, 1716]                        (b) Range [1716, 5067]            (c) Range [5068, 173103]
                                 2.5
                                 2.0
                     Topic Entropy
                                 1.5
                                 1.0
                                 0.5
                                 0.0
                                       [20, 1716]           [1716, 5067]     [5068, 173103]
                                                         Pin number ranges
                  (d) Boxplots of the topic entropy for each range of the number of pins
Figure 4.10: The topic entropy for female teachers across three distinct ranges of the numbers of
pins.
   Figure 4.9 illustrates the results for male teachers.5 Here, we can observe an interesting pattern:
in the low range of the number of pins, i.e., in Figure Figure 4.9a, when the number of pins
increases, male teachers tend to try their hand in different topics, thus increasing the topic entropy.
However, for the middle range, shown in Figure 4.9b, the correlation, while still being positive,
drops significantly. Eventually, for the high range of the number of pins demonstrated in Figure 4.9c,
the correlation becomes negative, which indicates that male teachers tend to become more topic-
specialized. Hence, overall, we can conclude that the more prolific a male teacher becomes, the
more he focuses on specific topics, i.e., becoming more topic-specialized. This can be observed
from the boxplots in Figure 4.9d as well.
   5 The first ranges start from 20 since we filtered out those teachers with less than 20 pins in their
accounts– See Section 4.2.
                                                              66


    We conducted the same experiment for female teachers and the results are shown in Figure 4.10.
Similar to male teachers, by moving from the lowest range of the number of pins (Figure 4.10a) to
the highest range of the number of pins (Figure 4.10c), the correlation between the topic entropy
and the number of pins drops significantly. However, unlike the male teachers’ case, the correlation
always remains positive regardless of the number of pins. This shows that prolific female teachers,
similar to prolific male teachers, tend to focus on specific topics (being more topic-specialized)
while, compared to males, the extent of this specialization is consistently smaller.
4.3.2.3    Topic Oscillation
Teachers might exhibit topic variation in the sequence of their pins over time. Capturing this vari-
ation can help us understand how much teachers stay on-topic while curating resources. However,
the topic entropy (Eq. 4.1) cannot capture this variation since it calculates the entropy in a set
of pins without considering the sequential order between them. More specifically, a user can be
topic-specialized (i.e., focusing on certain topics) while still frequently drifting in these topics. To
fix the idea, consider a simple example where a user has curated 8 pins having this sequence of
topics [𝑐 1 , 𝑐 2 , 𝑐 1 , 𝑐 2 , 𝑐 1 , 𝑐 2 , 𝑐 1 , 𝑐 2 ]. Further, suppose the total number of topics is 10 (𝑘 = 10). 𝑇 𝐸
for this user is 0.13. Compared to the maximum value of the topic entropy, i.e., 2.30,6 this is a low
value for the topic entropy of a user. However, while this user is topic-specialized to a large degree,
they have varied from one topic to another in every two consecutive pins. Therefore, to capture
the variation in topics of pins, we propose the topic oscillation. Let 𝑆 = [𝑐 1 , 𝑐 2 · · · 𝑐 𝑛 ] denote the
chronologically ordered sequence of topics of pins for a given user where 𝑐𝑖 ∈ T . Then, the topic
oscillation (𝑇𝑂) of a user is defined as follows:
                                                                      𝑛−1
                                                                1
                                                                          1 (𝑐𝑖 = 𝑐𝑖+1 )
                                                                      Õ
                                                  𝑇𝑂 (𝑢) =          ×                                              (4.2)
                                                              𝑛−1
                                                                      𝑖=1
    6 𝑙𝑛(10)   = 2.30.
                                                                    67


where 1 is the indicator function.7 The minimum value of 𝑇𝑂 (.) is 0 and it occurs when the user
has pinned resources from a single topic. The maximum value of the 𝑇𝑂 (.) is 1 and it occurs when
the topic has changed for every two consecutive pins.
                                         1.0
                                                                            Female Teachers
                                                                            Male Teachers
                                         0.8
                                         0.6
                                P(X>x)
                                         0.4
                                         0.2
                                         0.0
                                               0.0   0.2       0.4        0.6      0.8    1.0
                                                           Topic oscillation (X)
                     Figure 4.11: The CCDF of the topic oscillation (Eq. 4.2).
   Figure 4.11 shows the CCDF of the topic oscillation for male and female teachers. As it can be
observed from this figure, female teachers exhibit a smaller degree of the topic oscillation. In other
words, overall, compared to female teachers, male teachers stay more on-topic while curating pins.
   (a) Male & female teachers                        (b) Female teachers                        (c) Male teachers
                  Figure 4.12: The topic oscillation based on the number of pins.
   Does having a high or a low number of pins in a user’s account is correlated with their topic
oscillation? To answer this, similar to the topic entropy, we investigated the relationship between
the topic oscillation and the number of pins, as shown in Figure 4.12. When we combined the data
   7 https://en.wikipedia.org/wiki/Indicator_function
                                                                  68


        (a) Range [20, 1461]                         (b) Range [1462, 5762]            (c) Range [5762, 244333]
                                  1.0
                                  0.8
                   Topic Oscillation
                                  0.6
                                  0.4
                                  0.2
                                  0.0
                                        [20, 1461]           [1462, 5762]     [5772, 244333]
                                                          Pin number ranges
                 (d) Boxplots of the topic oscillation for each range of the number of pins
Figure 4.13: The topic oscillation for male teachers across three distinct ranges of the numbers of
pins.
of male and female teachers (i.e., Figure 4.12a), there is almost no correlation between the topic
oscillation and the number of pins. However, the p-value is relatively high, and thus the Pearson
correlation in Figure 4.12a is not statistically significant. A closer look at Figures 4.12b and 4.12c
reveals why the combined data of male and female teachers exhibit no correlation, and the p-value
is high. The reason is that the data of female teachers and male teachers contains two distinct
patterns in terms of the correlation between the topic oscillation and the number of pins where
the former, in general, shows almost no correlation while the latter is associated with a negative
correlation. Therefore, to understated the behavior of male and female teachers regarding the topic
oscillation, we investigated the correlation between the topic oscillation and the number of pins in
the three ranges similar to what we performed for the topic entropy in Section 4.3.2.2.
                                                               69


        (a) Range [20, 1716]                         (b) Range [1716, 5067]            (c) Range [5068, 173103]
                                  0.8
                                  0.6
                   Topic Oscillation
                                  0.4
                                  0.2
                                  0.0
                                        [20, 1716]           [1716, 5067]     [5068, 173103]
                                                          Pin number ranges
                 (d) Boxplots of the topic oscillation for each range of the number of pins
Figure 4.14: The topic oscillation for female teachers across three distinct ranges of the numbers
of pins.
   Figure 4.13 shows the correlation between the topic oscillation and the number of pins for male
teachers in the three ranges of the number of pins. Here we observe a similar pattern with the
topic entropy. For the low range of the number of pins illustrated in Figure 4.13a, the correlation
is positive. For the next range (Figure 4.13b), the correlation decreases, yet it remains positive.
However, the correlation becomes negative for the last range (Figure 4.13c). Hence, based on the
results shown in Figure 4.13, we can conclude that industrious male teachers are more likely to stay
on-topic in their pinning endeavor.
   The correlation between the topic oscillation and the number of pins for female teachers in the
three ranges is shown in Figure 4.14. The correlation is positive for the first two ranges, while
for the last one, it becomes negative. Also, the correlation decreases as we move from a range to
                                                               70


the adjacent one. Thus, as far as female teachers are concerned, they exhibit two distinct patterns
regarding the topic entropy and the topic oscillation. While they accumulate a larger number of
pins, they tend to become less topic-specialized, whereas they manage to stay on-topic.
                                   Male Teachers Female Teachers                                       Male Teachers Female Teachers
                                                                                                                                       1.0
                                                                    2.5
                                       0.40           0.64                                                 0.28           0.35
                          Low                                                                 Low                                      0.8
                                                                    2.0
        Range of pin numbers                                                Range of pin numbers
                                                                                                                                       0.6
                                                                    1.5
                                       1.00           1.09                                                 0.39           0.42
                          Medium                                    1.0
                                                                                              Medium                                   0.4
                                                                                                                                       0.2
                                                                    0.5
                                       1.03           1.38                                                 0.33           0.43
                          High                                                                High
                                                                    0.0                                                                0.0
                               (a) The topic entropy in the three ranges                  (b) The topic oscillation in the three ranges
Figure 4.15: A summary of the topic entropy and the topic oscillation for male and female teachers
(values are median in ranges).
   Finally, Figure 4.15a summarizes the topic entropy and the topic oscillation values for the
three defined ranges of the number of pins. We have labeled the three ranges as low, medium,
and high, signifying their relative coverage of the number of pins. In summary, we can conclude
that, compared to female teachers, male teachers are more topic-specialized and tend to stay more
on-topic in their pinning endeavor.
4.3.3                     Domain of Pins
As demonstrated in Section 2.2.1, a Pinterest user can pin (save) an image/video from virtually
anywhere on the web in their account. Because of this feature, Pinterest has been referred to as a
social curation website [83, 84, 85, 86, 87, 88]. The social curation nature of Pinterest has made it
very appealing to teachers [4, 80]. Moreover, since the source of a resource can be practically any
place on the web, it is very important in the context of teachers in social media. More specifically,
                                                                           71


the source informs us whom teachers turn to for obtaining educational resources. Then, through
this knowledge, we can characterize the educational resources curated by teachers and assess their
quality which ultimately paves the way to determine the teacher’s quality [79]. Additionally, not
only for teachers but also for the general Pinterest users, the sources of pins play an essential role
in understating the behavior of users [125, 129, 86]. This is because the sources of pins essentially
embed crucial information about the user’s preference and pinning behavior. Hence, in this part
of the dissertation, we investigate the sources of pins. While doing so, we go one step further and
incorporate the gender of teachers in our investigation. Pinterest records the source of a pin, a URL
(Uniform Resource Locator) of the image/video– See Figure 2.2. This URL includes the domain
of the source, e.g., teacherspayteachers.com. Therefore, we investigate the sources of pins via their
domains.
4.3.3.1    Top Domains
Figures 4.16 and 4.18 demonstrate the percentage of each of the top 20 domains of pins for male
and female teachers, respectively. In addition, we calculated the distribution of the top domains
across topics, as shown in Figure 4.17 for male teachers and Figure 4.19 for female teachers. More
specifically, we created the domain-topic matrices 𝐷𝑇 𝑚 and 𝐷𝑇 𝑓 for male teachers and female
                                             𝑚 or 𝐷𝑇     𝑓
teachers, respectively. Then, an entry 𝐷𝑇(𝑑,𝑡)          (𝑑,𝑡)
                                                               represents the percentage of pins whose
domain is 𝑑 and topic is 𝑡 where 𝑑 is a domain from the 20 top domains and 𝑡 is a topic from 34
                                           𝑚
existing topics i.e., T . For instance, 𝐷𝑇(𝑒𝑡𝑠𝑦.𝑐𝑜𝑚,𝑑𝑖𝑦_𝑐𝑟𝑎      in Figure 4.17 is 75.2, which means the
                                                            𝑓 𝑡)
topic of 75.2% of male teachers’ pins coming from etsy.com is diy_craft. We make the following
observations from these figures.
• First of all, we can observe that the predominant domain for male and female teachers is teachers-
   payteachers.com. This seems reasonable since, as mentioned before, teacherspayteachers.com
   is the largest marketplace of online educational resources and is very popular among American
   educators. In fact, teacherspayteachers.com is colloquially considered ebay.com for educational
   resources.
                                                   72


           bookunitsteacher.com      0.26
                   bloglovin.com     0.29
  missgiraffesclass.blogspot.com     0.3
 upperelementarysnapshots.com        0.34
               notconsumed.com       0.35
   hojosteachingadventures.com       0.36
        blog.maketaketeach.com       0.37
                    buzzfeed.com      0.39
                 igamemom.com         0.39
  fernsmithsclassroomideas.com        0.4
                  instagram.com       0.41
                        flickr.com    0.49
         theintentionalmom.com         0.58
       mrswillskindergarten.com        0.6
    options-trading-mastery.com         0.73
        sharingkindergarten.com          0.83
                         etsy.com         1.03
                     youtube.com           1.24
                Uploaded by user                3.59
       teacherspayteachers.com                                                                  21.77
                                   0                 5      10             15            20
                                                            Proportion (%)
                       Figure 4.16: The top 20 domains of pins for male teachers.
• Although the predominant domain for both gender groups is teacherspayteachers.com, its per-
  centage is significantly higher for male teachers (21.77%) than for female teachers (7.88%).
  Similar to what we observed for topics of pins in Section 4.3.2, this indicates that male teachers
  have focused more on certain domains while female teachers have diversified their attention on
  different domains.
• Following on the previous point, we can observe from Figure 4.19 that although male teachers
  have curated resources from various domains, their attention has been fairly concentrated on
  educational content from these domains. On the contrary, female teachers have explored curating
  other types of resources from their domains of interest. For instance, 81.7% of pins from
  polyvore.com curated by female teachers are related to women’s fashion. In other words, the
  domain-topic matrix 𝐷𝑇 𝑚 shown in Figure 4.17 is more sparse than 𝐷𝑇 𝑓 in Figure 4.19. As a
  simple measure of sparsity, we calculated the number of zero elements in each matrix divided
  by 680, the number of entries in a matrix, since 20 domains × 34 topics = 680. The sparsity for
  𝐷𝑇 𝑚 is 0.62, while its value for 𝐷𝑇 𝑓 is 0.47.
                                                       73


                                                                                                                     Topics
                                                     animals
                                                     architecture
                                                     art
                                                     cars_motorcycles
                                                     celebrities
                                                     design
                                                     diy_crafts
                                                     education
                                                     film_music_books
                                                     food_drink
                                                     for_dad
                                                     gardening
                                                     geek
                                                     hair_beauty
                                                     health_fitness
                                                     history
                                                     holidays_events
                                                     home_decor
                                                     humor
                                                     illustrations_posters
                                                     kids
                                                     mens_fashion
                                                     others
                                                     outdoors
                                                     photography
                                                     products
                                                     quotes
                                                     science_nature
                                                     sports
                                                     tattoos
                                                     technology
                                                     travel
                                                     weddings
                                                     womens_fashion
                                                                                                                                                                                                 100
                       teacherspayteachers.com       0.0 0.0 0.0 0.0 0.0 0.0 0.3 99.3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
                                Uploaded by user     0.3 0.0 0.5 0.0 0.1 0.0 2.4 89.9 0.3 0.8 0.0 0.1 0.1 0.1 0.7 0.1 1.0 0.4 0.9 0.0 0.3 0.0 0.5 0.0 0.0 0.2 0.8 0.0 0.1 0.0 0.0 0.1 0.0 0.3
                                     youtube.com     0.2 0.0 0.4 0.0 0.0 0.0 0.8 95.3 1.2 0.2 0.0 0.0 0.0 0.1 0.1 0.1 0.1 0.0 0.2 0.0 0.5 0.0 0.3 0.0 0.1 0.0 0.2 0.0 0.0 0.0 0.0 0.0 0.0 0.0
                                         etsy.com    0.1 0.0 2.5 0.0 0.0 0.5 75.215.1 0.1 0.1 0.0 0.0 0.0 0.0 0.0 0.0 1.3 0.7 0.2 0.0 0.5 0.0 0.3 0.0 0.0 1.6 0.7 0.0 0.3 0.0 0.0 0.1 0.4 0.3
                                                                                                                                                                                                 80
                        sharingkindergarten.com      0.0 0.0 0.0 0.0 0.0 0.0 0.4 99.4 0.0 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0
                    options-trading-mastery.com      0.0 0.0 1.3 0.4 0.0 0.0 16.756.5 0.4 2.1 0.0 0.0 0.4 0.4 0.8 0.0 0.0 1.3 0.0 0.0 0.4 0.0 0.0 0.0 0.4 8.4 7.9 0.4 0.0 0.0 2.1 0.0 0.0 0.0
                       mrswillskindergarten.com      0.0 0.0 0.0 0.0 0.0 0.0 0.4 99.3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
                         theintentionalmom.com       0.3 0.0 0.0 0.0 0.0 0.0 38.2 6.5 0.0 4.7 0.0 0.0 0.0 0.0 0.7 0.0 0.1 2.8 0.0 0.0 43.5 0.0 0.1 0.1 0.0 0.0 2.9 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Top 20 domains
                                                                                                                                                                                                 60
                                        flickr.com   1.2 0.0 1.1 0.0 0.2 0.5 13.575.5 0.5 1.1 0.0 1.1 0.0 0.2 0.0 0.2 0.9 1.1 0.2 0.0 0.2 0.0 1.2 0.0 0.0 0.0 0.6 0.0 0.0 0.0 0.2 0.6 0.0 0.2
                                  instagram.com      0.0 0.0 0.0 0.0 0.0 0.0 1.2 91.4 0.1 0.3 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.0 6.1 0.0 0.0 0.0 0.1 0.0 0.0 0.0 0.6 0.0 0.0 0.0 0.0 0.0 0.0 0.0
                  fernsmithsclassroomideas.com       0.0 0.0 0.0 0.0 0.0 0.0 1.1 98.1 0.0 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.0 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.3 0.0 0.0 0.0 0.0 0.0 0.0 0.0
                                 igamemom.com        0.0 0.0 0.0 0.0 0.0 0.0 3.5 91.9 0.0 0.2 0.0 0.0 0.0 0.0 0.0 0.0 0.7 0.0 0.0 0.0 3.4 0.0 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.2 0.0 0.0
                                                                                                                                                                                                 40
                                    buzzfeed.com     0.6 0.0 0.2 0.0 0.0 0.0 7.4 78.3 2.0 2.6 0.0 0.2 0.1 0.2 0.1 0.1 2.1 1.2 1.7 0.0 0.5 0.0 0.7 0.2 0.1 0.0 1.1 0.0 0.0 0.0 0.0 0.5 0.2 0.0
                        blog.maketaketeach.com       0.0 0.0 0.0 0.0 0.0 0.0 0.1 99.8 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
                   hojosteachingadventures.com       0.0 0.0 0.0 0.0 0.0 0.0 1.1 98.0 0.0 0.7 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
                               notconsumed.com       0.0 0.0 0.0 0.0 0.0 0.0 13.028.0 0.7 3.4 0.0 0.0 0.0 0.0 1.0 0.0 3.6 0.3 0.0 0.0 41.7 0.0 0.0 0.0 0.0 0.0 8.0 0.0 0.0 0.0 0.0 0.2 0.0 0.0
                                                                                                                                                                                                 20
                 upperelementarysnapshots.com        0.0 0.0 0.0 0.0 0.0 0.0 0.0 99.7 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.0 0.0 0.0 0.2 0.0 0.0 0.0 0.0 0.0 0.0 0.0
                  missgiraffesclass.blogspot.com     0.0 0.0 0.0 0.0 0.0 0.0 0.1 99.8 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
                                   bloglovin.com     0.0 0.0 0.3 0.0 0.0 0.0 2.3 95.4 0.0 0.4 0.0 0.1 0.0 0.0 0.0 0.0 0.3 0.1 0.2 0.0 0.1 0.0 0.2 0.0 0.1 0.0 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.2
                           bookunitsteacher.com      0.0 0.0 0.0 0.0 0.0 0.0 0.0 99.3 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.3 0.0 0.0 0.0 0.0 0.0 0.0 0.3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
                                                                                                                                                                                                 0
                   Figure 4.17: The distribution of the top 20 domains for male teachers across topics (𝐷𝑇 𝑚 ).
    • Despite the difference in sparsity between 𝐷𝑇 𝑚 and 𝐷𝑇 𝑓 , for both gender groups, the pre-
                 dominant topic of pins coming from the top 20 domains is still education– See the column
                 corresponding to education in 𝐷𝑇 𝑚 (Figure 4.17) and 𝐷𝑇 𝑓 (Figure 4.19). While some of these
                 domains are evidently educational websites such as missgiraffesclass.blogspot.com, both male
                 and female teachers have curated a significant number of educational resources from the general-
                 purpose sources as well e.g., youtube.com, amazon.com. This indicates that, overall, both gender
                 groups seek educational materials from most sources they encounter. Figure 4.20 illustrates an
                 example of an educational pin curated from youtube.com.
    • A relatively large portion of pins for both gender groups does not have a domain from the web
                 as the user has directly uploaded them, namely 3.59% for male teachers and 5.72% for female
                 teachers.8 To put these numbers in perspective, we processed the data of non-teachers in our
                 dataset9 and discovered that there is a tiny percentage (0.23%) of pins whose domain is “Uploaded
                   8 Technically,
                    the domain for these pins in our dataset is “Uploaded by user”–See Table 2.1.
                   9 For
           non-teachers, we used a similar procedure described in Section 4.2 and considered a user
   as non-teacher if 𝑝 < 0.05.
                                                                                                                74


                squidoo.com      0.17
  notimeforflashcards.com        0.17
                   houzz.com     0.18
    classroomfreebies.com        0.2
    moneysavingmom.com            0.21
           someecards.com         0.21
                     bhg.com      0.22
               polyvore.com       0.23
    teachersnotebook.com           0.25
              facebook.com         0.25
                amazon.com         0.26
              bloglovin.com         0.31
                 indulgy.com         0.37
             instagram.com             0.49
              buzzfeed.com               0.61
                    flickr.com                0.89
               youtube.com                         1.35
                     etsy.com                             2.09
          Uploaded by user                                                         5.72
 teacherspayteachers.com                                                                        7.88
                               0               1        2       3     4          5     6 7       8
                                                                  Proportion (%)
                          Figure 4.18: The top 20 domains of pins for female teachers.
   by user”. Furthermore, results presented in Figures 4.17 and 4.19 reveal that the topic of most
   of these pins is education (89.9% and 80.4% for male and female teachers, respectively). Given
   this, we can state that both male and female teachers actively curate educational resources not
   only by acquiring them from online sources but by directly creating and sharing them with their
   peers.
     Consistent with Torphy et al. [79], our findings indicate that teachers frequently turn to their
fellow teachers online for educational resources and professional materials. We mentioned “fellow
teachers” because online educational resources are mainly prepared by other teachers/educators.
For example, resources from teacherspayteachers.com are primarily curated by teachers themselves.
However, compared to Torphy et al. [79], our investigation has three distinct differences. First, we
performed the domain analysis of teachers’ online resources in a significantly larger scale fashion:
ours includes 80,731,782 pins from 13,267 teachers (male and female teachers combined) while
[79] included 140,287 pins from 197 teachers. Second, they found out that educator blogs were
the predominant sources of pins, followed by “Teacher-to-Teacher Consumption Markets (TTM)”
websites. However, according to our findings, TTMs, especially teacherspayteachers.com are the
                                                               75


                                                                                                                  Topics
                                                 animals
                                                 architecture
                                                 art
                                                 cars_motorcycles
                                                 celebrities
                                                 design
                                                 diy_crafts
                                                 education
                                                 film_music_books
                                                 food_drink
                                                 for_dad
                                                 gardening
                                                 geek
                                                 hair_beauty
                                                 health_fitness
                                                 history
                                                 holidays_events
                                                 home_decor
                                                 humor
                                                 illustrations_posters
                                                 kids
                                                 mens_fashion
                                                 others
                                                 outdoors
                                                 photography
                                                 products
                                                 quotes
                                                 science_nature
                                                 sports
                                                 tattoos
                                                 technology
                                                 travel
                                                 weddings
                                                 womens_fashion
                                                                                                                                                                                              100
                 teacherspayteachers.com         0.0 0.0 0.0 0.0 0.0 0.0 0.4 99.2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
                          Uploaded by user       0.5 0.0 1.1 0.0 0.1 0.1 5.7 80.4 0.4 1.5 0.0 0.2 0.1 0.7 0.3 0.0 1.2 1.0 1.2 0.0 0.6 0.0 0.5 0.0 0.3 0.1 2.5 0.0 0.1 0.1 0.0 0.4 0.4 0.6
                                     etsy.com    0.4 0.0 2.3 0.0 0.0 0.6 38.923.5 0.6 0.7 0.0 0.3 0.2 0.2 0.2 0.2 5.0 5.0 0.4 0.0 7.2 0.0 0.6 0.2 0.4 0.6 5.2 0.0 1.2 0.0 0.0 0.3 2.7 3.2
                               youtube.com       0.1 0.0 0.4 0.0 0.0 0.0 1.2 95.4 0.7 0.1 0.0 0.0 0.0 0.1 0.2 0.1 0.1 0.0 0.4 0.0 0.1 0.0 0.5 0.0 0.1 0.0 0.1 0.0 0.1 0.0 0.0 0.1 0.0 0.0
                                                                                                                                                                                              80
                                    flickr.com   0.2 0.2 2.0 0.0 0.0 0.1 13.372.5 0.2 2.2 0.0 2.7 0.0 0.2 0.0 0.1 2.0 1.6 0.3 0.0 0.2 0.0 0.3 0.0 0.6 0.0 0.6 0.0 0.0 0.1 0.0 0.5 0.1 0.0
                              buzzfeed.com       0.8 0.0 0.2 0.0 0.0 0.0 8.1 71.9 1.0 4.2 0.0 0.6 0.0 0.7 0.2 0.1 1.9 3.4 2.7 0.0 1.5 0.0 0.6 0.2 0.1 0.0 0.5 0.0 0.0 0.1 0.0 0.4 0.7 0.1
                             instagram.com       0.0 0.0 0.1 0.0 0.0 0.0 3.6 90.2 0.3 0.1 0.0 0.1 0.0 0.1 0.0 0.0 0.5 1.1 0.8 0.0 0.2 0.0 0.5 0.0 0.0 0.4 1.1 0.0 0.0 0.0 0.0 0.0 0.0 0.8
                                 indulgy.com     0.1 0.0 0.1 0.0 0.0 0.0 5.8 88.4 0.0 0.6 0.0 0.1 0.0 0.3 0.0 0.0 0.6 1.0 0.0 0.0 0.5 0.0 0.6 0.0 0.9 0.0 0.1 0.0 0.0 0.0 0.0 0.0 0.1 0.6
                                                                                                                                                                                              60
Top 20 domains
                              bloglovin.com      0.0 0.0 0.8 0.0 0.0 0.0 2.8 94.5 0.0 0.4 0.0 0.0 0.0 0.0 0.0 0.0 0.3 0.1 0.1 0.0 0.1 0.0 0.5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.4
                                amazon.com       0.2 0.0 0.1 0.0 0.0 0.1 2.1 81.3 7.1 0.6 0.0 0.1 0.1 0.2 0.1 0.2 0.6 0.4 0.1 0.0 5.9 0.0 0.2 0.1 0.0 0.1 0.1 0.0 0.0 0.0 0.0 0.0 0.1 0.2
                              facebook.com       0.4 0.0 0.0 0.0 0.0 0.1 3.8 85.8 0.9 0.4 0.0 0.0 0.0 0.5 0.3 0.0 1.1 0.9 1.2 0.0 0.3 0.0 0.4 0.0 0.1 0.3 1.7 0.0 0.0 0.0 0.0 0.2 0.0 1.7
                    teachersnotebook.com         0.0 0.0 0.0 0.0 0.0 0.0 1.1 98.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.2 0.0 0.0 0.0 0.0 0.0 0.5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
                                                                                                                                                                                              40
                               polyvore.com      0.0 0.0 0.0 0.0 0.0 0.0 0.3 12.9 0.0 0.0 0.0 0.0 0.2 2.8 0.0 0.2 0.0 0.3 0.0 0.0 0.2 0.0 0.0 0.0 0.2 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.3 81.7
                                     bhg.com     0.1 0.0 0.0 0.0 0.0 0.0 27.2 0.7 0.0 26.1 0.0 4.9 0.0 0.0 0.1 0.0 10.328.7 0.0 0.0 1.1 0.0 0.4 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.3 0.0
                           someecards.com        0.0 0.0 0.0 0.0 0.0 0.0 0.0 55.6 0.6 0.0 0.0 0.0 0.0 0.0 0.3 0.0 0.0 0.0 41.4 0.0 0.6 0.0 0.3 0.0 0.0 0.0 0.0 0.0 1.4 0.0 0.0 0.0 0.0 0.0
                    moneysavingmom.com           0.0 0.0 0.0 0.0 0.0 0.0 34.3 8.3 0.6 41.4 0.0 0.0 0.0 0.0 2.0 0.0 0.2 4.0 0.0 0.0 5.6 0.0 0.1 0.0 0.0 0.0 2.3 0.0 0.0 0.0 0.0 0.3 0.0 0.7
                                                                                                                                                                                              20
                    classroomfreebies.com        0.0 0.0 0.0 0.0 0.0 0.0 2.0 97.2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.2 0.0 0.0 0.0 0.0 0.0 0.5 0.0 0.0 0.0 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0
                                   houzz.com     0.0 0.0 0.0 0.0 0.0 0.3 2.0 0.6 0.0 0.0 0.0 6.7 0.0 0.0 0.0 0.0 0.0 87.4 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 2.3 0.0 0.0 0.0 0.0 0.0 0.6 0.0
                  notimeforflashcards.com        0.0 0.0 0.0 0.0 0.0 0.0 15.075.7 0.2 0.1 0.0 0.0 0.0 0.0 0.0 0.0 4.1 0.2 0.0 0.0 4.0 0.0 0.4 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
                                squidoo.com      0.2 0.0 0.1 0.0 0.0 0.0 11.182.1 0.5 1.5 0.0 0.3 0.0 0.0 0.5 0.4 2.1 0.1 0.2 0.0 0.1 0.0 0.2 0.0 0.0 0.0 0.3 0.0 0.1 0.0 0.0 0.1 0.0 0.1
                                                                                                                                                                                              0
                  Figure 4.19: The distribution of the top 20 domains for female teachers across topics (𝐷𝑇 𝑓 ).
 predominant sources of pins. According to [79], “educator blogs include independent websites
 created by individuals or groups of teachers who openly reflect and share their professional values,”
 e.g., missgiraffesclass.blogspot.com. Finally, we incorporated the gender of teachers in our domain
 analysis and demonstrated male and female differences and similarities regarding the sources of
 their pins.
 4.3.3.2                       Domain Entropy
 Similar to the topic entropy in Section 4.3.2, we also investigated the domain specialization. A
 completely domain-specialized user gets their pins from a single domain while a less domain-
 specialized user tries out different domains.10 Let D = [𝑑1 , 𝑑2 · · · 𝑑 𝑘 ] denote the set of 𝑘 domains.
 In the dataset used in this chapter, combining the domains of pins of male and female teachers, in
 total, we have 46, 377 unique domains (i.e., k= 46, 377). Further, for a given user, let 𝑝 𝑑𝑖 denote
                 10 The
          term “domain-specialized” is used in the context of this dissertation and the domains of
 teacher-curated resources on Pinterest. Therefore, it should not be confused with the term domain
 expert (https://en.wikipedia.org/wiki/Subject-matter_expert).
                                                                                                                 76


             Figure 4.20: An example of an educational pin curated from youtube.com.
the fraction of pins whose domain is 𝑑𝑖 . Then, we define the domain entropy (𝐷𝐸) as follows:
                                                Õ
                                  𝐷𝐸 (𝑢) = −         𝑝 𝑑𝑖 × 𝑙𝑛( 𝑝 𝑑𝑖 )                         (4.3)
                                              ∀𝑑𝑖 ∈D
Compared to the topic entropy in Eq. 4.1, here we do not account for the number of domains from
which a user has at least one pin, i.e., an equivalent of the term 𝑚𝑘 in Eq. 4.1 is omitted from the
definition of the domain entropy. The reason is we have many unique domains (specifically 46, 377),
and such a term would be almost zero and consequently would zero out the domain entropy.
    Figure 4.21 shows the CCDF of the domain entropy for male and female teachers. As it can
be observed from this figure, female teachers exhibit a smaller degree of domain specialization.
Combined with what we found out in Section 4.3.2, male teachers are more specialized in both
topic and domain. Since the range of existing domains is significantly larger than existing topics
(46, 377 versus 34), the domain entropy has a larger scale than the topic entropy. More specifically,
the maximum value for the domain entropy is 𝑙𝑛(46377) = 10.74 while for the topic entropy the
maximum value is 𝑙𝑛(34) = 3.52.
    Similar to Section 4.3.2, here we attempt to determine the relationship between the number of
pins and the domain entropy. Figure 4.22 demonstrates the plots of the domain entropy versus the
                                                 77


                                   1.0
                                   0.8
                                   0.6
                              P(X>x)
                                   0.4
                                   0.2
                                           Female Teachers
                                           Male Teachers
                                   0.0 0       2         4         6   8
                                                  Domain Entropy (X)
                     Figure 4.21: The CCDF of the domain entropy (Eq. 4.3).
      (a) Male & female teachers             (b) Female teachers             (c) Male teachers
                  Figure 4.22: The domain entropy based on the number of pins.
number of pins with fitted regression lines for three cases: male and female teachers combined
(Figure 4.22a), female teachers only (Figure 4.22b), and male teachers only (Figure 4.22c). The
overall trend shown in Figure 4.22a shows that the more pins a user has, the higher the domain
entropy is. Nevertheless, this trend is driven by two opposite forces from the data of male and
female teachers. More specifically, for female teachers, the higher number of pins results in the
higher domain entropy, whereas for the male teachers, the domain entropy decreases when the
number of pins increases. This can be further confirmed by the positive (negative) correlation as
well as the positive (negative) slope for the fitted regression lines in Figures 4.22b and Figures 4.22c,
respectively.
   The distinct patterns that emerged from the data of male and female teachers regarding the
relationship between the domain entropy and the number of pins motivated us to dig deeper into
                                                        78


        (a) Range [20, 1461]                      (b) Range [1462, 5762]            (c) Range [5762, 244333]
                                 8
                                 6
                    Domain Entropy
                                 4
                                 2
                                 0
                                     [20, 1461]           [1462, 5762]     [5772, 244333]
                                                       Pin number ranges
                 (d) Boxplots of the domain entropy for each range of the number of pins
Figure 4.23: The domain entropy for male teachers across three distinct ranges of the numbers of
pins.
this relationship, similar to what we performed in Section 4.3.2. To this end, we investigated this
relationship in three ranges of the number of pins. Figure 4.23 and 4.24 show the results for male
and female teachers, respectively. For male teachers, we can observe a similar pattern with what we
presented in Section 4.3.2. At the first range, demonstrated in Figure 4.23a, the correlation between
the number of pins and the domain entropy is positive and relatively large. This correlation remains
positive yet drops significantly in the middle range (Figure 4.23b) and eventually becomes negative
in the last range (Figure 4.23c). Hence, the more prolific male teachers become, the more they
tend not to explore more domains, i.e., they prefer to focus on a smaller set of domains. Regarding
the changes from a range to the next one, female teachers exhibit a similar behaviour where the
correlation from the first range (Figure 4.24a) to the second range (Figure 4.24b) and eventually
                                                            79


          (a) Range [20, 1716]                      (b) Range [1716, 5067]            (c) Range [5068, 173103]
                                   8
                                   6
                      Domain Entropy
                                   4
                                   2
                                   0
                                       [20, 1716]           [1716, 5067]     [5068, 173103]
                                                         Pin number ranges
                  (d) Boxplots of the domain entropy for each range of the number of pins
Figure 4.24: The domain entropy for female teachers across three distinct ranges of the numbers of
pins.
to the third one (Figure 4.24c) decreases monotonically. Nevertheless, compared to male teachers,
there exists a major difference: at each range, the magnitude of the correlation is significantly
higher for female teachers, i.e., 0.8 vs. 0.47, 0.33 vs. 0.16, and −0.03 vs. −0.36, respectively.
4.3.3.3   Domain Oscillation
Although teachers might focus on certain domains (i.e., being domain-specialized), they might
frequently switch from one domain to another. The domain entropy cannot capture this variation
based on the same reasoning discussed for the topic entropy. Hence, similar to the topic oscillation
defined in Eq. 4.2, we define the domain oscillation. Suppose 𝑆 = [𝑎 1 , 𝑎 2 · · · 𝑎 𝑛 ] denote the
chronologically ordered sequence of domains of pins for a given user where 𝑎𝑖 ∈ D. Then the
                                                              80


                                  1.0
                                  0.8
                                  0.6
                             P(X>x)
                                  0.4
                                  0.2
                                            Female Teachers
                                            Male Teachers
                                  0.0 0.0     0.2     0.4      0.6       0.8   1.0
                                                  Domain Oscillation (X)
                    Figure 4.25: The CCDF of the domain oscillation (Eq. 4.4).
domain oscillation (𝐷𝑂) is defined as follows:
                                                             𝑛−1
                                                1
                                                       1 (𝑎𝑖 = 𝑎𝑖+1 )
                                                     Õ
                                      𝐷𝑂 (𝑢) =     ×                                                     (4.4)
                                               𝑛−1
                                                               𝑖=1
where 1 is the indicator function. The minimum value of 𝐷𝑂 (.) is 0 and it occurs when the user
has pinned resources from a single domain. The maximum value of the 𝐷𝑂 (.) is 1 and it occurs
when the domain has changed for every two consecutive pins.
   Figure 4.25 demonstrates the CCDF of the domain oscillation for male and female teachers. As
shown in this figure, female teachers exhibit a smaller degree of domain specialization, i.e., overall,
they are less domain-specialized.
     (a) Male & female teachers               (b) Female teachers                    (c) Male teachers
                Figure 4.26: The domain oscillation based on the number of pins.
   Again, we attempt to look into the domain oscillation for male and female teachers while con-
                                                          81


sidering the number of pins. In other words, does the number of pins is related to the domain
oscillation? The same as before, we performed this investigation for three cases as shown in Fig-
ure 4.26: male and female teachers combined (Figure 4.26a), female teachers only (Figure 4.26b),
male teachers only (Figure 4.26c). Interestingly, for all three cases, the domain oscillation nega-
tively correlates with the number of pins. In other words, the more pins a teacher has, the less they
oscillate from a domain to another. Here, unlike the domain entropy, male and female teachers
show similar behavior.
        (a) Range [20, 1461]                         (b) Range [1462, 5762]          (c) Range [5762, 244333]
                                  1.0
                                  0.8
                 Domain Oscillation
                                  0.6
                                  0.4
                                  0.2
                                  0.0
                                        [20, 1461]           [1462, 5762]     [5772, 244333]
                                                          Pin number ranges
               (d) Boxplots of the domain oscillation for each range of the number of pins
Figure 4.27: The domain oscillation for male teachers across three distinct ranges of the numbers
of pins.
   To deepen our understating of the relationship between the domain oscillation and the number
of pins, again, we discretized the number of pins into three equal-size ranges for male and female
teachers and then inspected the relationship. Figure 4.27 shows the results for male teachers. There
                                                               82


is no correlation between the domain oscillation and the number of pins for the first two ranges
(i.e., Figures 4.27a and 4.27b). Nevertheless, the p-values for these two ranges are large, and thus
they are not statistically significant. The main reason that the correlations in these two ranges are
not statistically significant is the presence of a relatively large number of outliers. The outliers are
visually visible in Figures 4.27a and 4.27b, which are the data points whose domain oscillation is
less than 0.8. Later, we will discuss more about the observed phenomenon in these two ranges
for male teachers (i.e., the results shown in Figures 4.27a and 4.27b) and how to make sense of
it. However, the story for the third range, shown in Figure 4.27c, is different, where we observe
a moderate negative correlation (−0.36), which is also statistically significant. Hence, for prolific
male teachers, the more they pin, the less they tend to vary in the domains of the pins.
        (a) Range [20, 1716]                          (b) Range [1716, 5067]          (c) Range [5068, 173103]
                                   1.0
                                   0.8
                  Domain Oscillation
                                   0.6
                                   0.4
                                   0.2
                                   0.0
                                         [20, 1716]           [1716, 5067]     [5068, 173103]
                                                           Pin number ranges
               (d) Boxplots of the domain oscillation for each range of the number of pins
Figure 4.28: The domain oscillation for female teachers across three distinct ranges of the numbers
of pins.
                                                                83


     Figure 4.28 shows the results of the correlation between the number of pins and the domains
oscillation across the three defined ranges of the numbers of pins for female teachers. The first
range (Figure 4.28a) shows a small positive correlation between the domain oscillation and the
number of pins. This correlation becomes almost zero in the second range (Figure 4.28b). Hence,
if the number of pins for female teachers is not very large (i.e., the first two ranges in Figure 4.28),
the more they pin does not have an impact on their behavior regarding the domain oscillation.
For the last range, we observe a similar pattern with that of males teachers: a moderate negative
correlation between the domain oscillation and the number of pins. Hence, similarly, for prolific
female teachers, the more they pin resources, the less they tend to vary in the domains of pins.
     Referring back to the results demonstrated in Figure 4.27, we can discern an interesting similarity
between the plots in the first two ranges for male teachers and female teachers i.e., Figure 4.27a
with Figure 4.28a and Figure 4.27b with Figure 4.28b. It seems the data in male-related ranges
are the sparse versions of female-related ranges, where for male teachers, we only have fewer data
points while its pattern is similar to female teachers. This fewer number of data points consequently
made the p-value high. Furthermore, in the first two ranges for female teachers, more data points
helped obtain statistically significant correlations. Hence, for the first two ranges, male and female
teacher data distributions are very similar, and we can generalize our findings for female teachers
to male teachers. Based on the above discussion, it is safe to state that when the number of pins for
male teachers is not very large (i.e., the first two ranges in Figure 4.27), the more they pin does not
have an impact on their behavior regarding the domain oscillation.
     Figure 4.29 summarizes the domain entropy and the domain oscillation values for the three
defined ranges of the number of pins. We have labeled the three ranges as low, medium, and high,
signifying their relative coverage of the number of pins. In summary, we can conclude that male
teachers are more domain-specialized than female teachers and tend to vary less in sources of pins
they curate.
                                                    84


                                      Male Teachers Female Teachers                                                        Male Teachers Female Teachers
                                                                                        9                                                                                          1.0
                                                                                        8
                                              4.88                        5.57                                                  0.92                    0.94
                             Low                                                        7
                                                                                                                  Low                                                              0.8
           Range of pin numbers                                                                 Range of pin numbers
                                                                                        6
                                                                                                                                                                                   0.6
                                                                                        5
                                              6.52                        6.63                                                  0.93                    0.94
                             Medium                                                                               Medium
                                                                                        4
                                                                                                                                                                                   0.4
                                                                                        3
                                                                                        2
                                                                                                                                                                                   0.2
                                              6.64                        7.20                                                  0.88                    0.93
                             High                                                       1                         High
                                                                                        0                                                                                          0.0
                                  (a) The domain entropy in three ranges                                           (b) The domain oscillation in three ranges
Figure 4.29: A summary of the domain entropy and the domain oscillation for male and female
teachers (values are median in ranges).
        chart                                                                                              cute
       grade                                                                                              color
       freebi                                                                                                like
       teach                                                                                                  diy
          easi                                                                                           grade
        color                                                                                         christma
     chicken                                                                                           printabl
      school                                                                                              write
      project                                                                                                get
     teacher                                                                                              book
        math                                                                                            school
           get                                                                                         teacher
  classroom                                                                                               word
        book                                                                                                way
          way                                                                                               one
          one                                                                                                fun
          day                                                                                             recip
         read                                                                                              read
           fun                                                                                      classroom
        word                                                                                                easi
           kid                                                                                              day
        activ                                                                                             activ
       make                                                                                            student
        write                                                                                               free
         love                                                                                             great
          free                                                                                                kid
         idea                                                                                                use
        great                                                                                              love
           use                                                                                           make
     student                                                                                               idea
             0.00                      0.05          0.10               0.15     0.20   0.25                           0.00   0.25   0.50   0.75    1.00      1.25   1.50   1.75         2.00
                                                            Proportion (%)                                                                         Proportion (%)
                                               (a) Male teachers                                                                        (b) Female teachers
                                  Figure 4.30: The top 30 words of pin descriptions for male and female teachers.
4.3.4                        Language of Resources
To further investigate pins and boards curated by male and female teachers, we now look into the top
words associated with pins and boards. For pins, we acquired the top words from pin descriptions
and for boards from board names. For both of these textual inputs, we used the NLTK package [104]
                                                                                               85


       languag                                                    holiday
           word                                                        fun
            food                                                      year
          social                                                     word
           recip                                                    decor
           studi                                                  teacher
         lesson                                                       stuff
      christma                                                      organ
             fun                                                     activ
           book                                                       love
           write                                                   scienc
 kindergarten                                                          gift
           craft                                                        kid
          home                                                       write
         scienc                                                       food
              kid                                                   teach
           educ                                                      book
              art                                                    parti
            read                                                        art
        teacher                                                      recip
              tpt                                                    craft
           activ                                                 christma
          teach                                                     grade
        resourc                                                     home
             day                                                     math
         school                                                       read
          math                                                     school
          grade                                                        day
    classroom                                                  classroom
            idea                                                      idea
                0.00    0.05      0.10            0.15 0.20               0.0 0.2 0.4   0.6         0.8    1.0 1.2 1.4
                                   Proportion (%)                                           Proportion (%)
                             (a) Male teachers                                    (b) Female teachers
                     Figure 4.31: The top 30 words of board names for male and female teachers.
and performed appropriate pre-processing, e.g., removing punctuations and stop words (e.g., ‘!’,
‘the’), stemming the tokens (e.g., ‘education’ to ‘educ’). As a result, figure 4.30 demonstrates the
top 30 words associated with pins curated by male and female teachers. Similarly, Figure 4.31
shows the top 30 words associated with board names for male and female teachers.
         According to Figures 4.30 and 4.31, almost all the top words associated with pin descriptions
and board names curated by both male and female teachers are related to education and teaching
e.g., student, classroom, grade, school, math. This signifies that both male and female teachers
predominately leverage Pinterest to curate resources related to their teaching profession. In other
words, in line with previous studies [28, 5, 34], teachers utilize Pinterest for professional purposes.
However, our study in this chapter is the first one corroborating this fact for both male and female
teachers. Furthermore, similar to our findings in Section 3.5.4, some words like ‘fun’ and ‘idea’ are
among the top words in the board names and pin descriptions of both gender groups. Again, this
demonstrates the distinct way that teachers leverage social media for education-related purposes.
         A closer look at Figure 4.30 reveals that 25 out of the top 30 pin-related words are common
between males and females. Regarding board-related words, 22 out of the top 30 words are common
between the two gender groups. These two observations indicate that male and female teachers
                                                            86


                                                                                                                           0.7
                           0.6
                                                                                                                           0.6
                           0.5
   Similarity in ranking (r)                                                                       Similarity in ranking (r)
                                                                                                                           0.5
                           0.4
                           0.3                                                                                             0.4
                           0.2
                                                                                                                           0.3
                           0.1
                                                                                                                           0.2
                                 0   25   50   75        100        125     150   175   200                                      20     40               60         80   100
                                                Size of top word list (k)                                                               Size of top word list (k)
                                          (a) Pin-related words                                                                   (b) Board-related words
  Figure 4.32: Similarity of the top-k pin-related word lists using Rank-biased Overlap (RBO).
employ a very similar professional vocabulary in their pin and board curation activities. However,
the order of the top pin-related (and board-related) words for male teachers is different from that of
female teachers. Therefore, to more rigorously compare the ranked list of the top words for male and
female teachers, we used Rank-biased Overlap (RBO) [130]. RBO takes two ranked lists as the input
and returns a numeric value (0 ≤ 𝑟 ≤ 1), indicating the similarity between the two lists. The closer
𝑟 is to 1, the more similar the two ranked lists are. Compared to traditional ranked list comparison
methods like Kendall tau11, the main advantage of RBO is that it can handle non-conjointness, i.e.,
the items in the two ranked lists do not necessarily need to overlap (for more detail about RBO,
refer to [130]). We calculated the similarity between the two ranked lists of words while varying
the length of lists. Figures 4.32a and 4.32b demonstrate the result of this experiment for the top
words of pins and boards, respectively. When the size of lists is small, the similarity between the
male-related ranked list and the female-related ranked list is small. This seems reasonable since
for a small list, the two list are very different as can be seen in Figures 4.30 and 4.31. Nevertheless,
once we expand the ranked lists, the similarity increases. This indicates that, overall, male and
female teachers act similarly in terms of the language of their curated resources.
  11 https://en.wikipedia.org/wiki/Kendall_rank_correlation_coefficient
                                                                                              87


4.3.5   Resource Curation Over Time
Another aspect of teachers in social media is concerned with when teachers participate in online
activities. Studying time helps us understand how teachers interact with space outside of their class-
room. This is particularly important since teachers spend a considerable amount of time seeking out
educational materials for their pedagogical needs, e.g., up to 12 hours, according to [131]. Given
this, characterizing when teachers interact with social media has been investigated in the literature.
For instance, Rosenberg et al. [75] considered the conversations around state-level education-related
hashtags on Twitter as affinity spaces and further answered when teachers/educators participate in
these spaces. They analyzed, for example, the percentage of participants per day of the week, which
helped understand the unique ways that their sampled teachers are using Twitter for professional
purposes. We have performed similar analyses, as will be explained in this part. In another relevant
study, Greenhalgh and Koehler [126] characterized the timing around #educattentats, a hashtag
about Paris terrorist attacks that occurred in November 2015. #educattentats attempted to organize
teachers on how to discuss the attacks with their students.
    Our investigation in this part has several unique characteristics. First, the social media data in the
previous studies are associated with a specific scenario or project that gathers teachers, e.g., specific
hashtags [126, 75]. Nevertheless, our data reflects the entire timeline of thousands of teachers on
Pinterest from the day they joined until November 2019. Given this, we have a better picture of
teachers’ interaction with social media. Second, most previous studies looked into the interaction
of teachers with social media through inter-teacher online conversations, e.g., tweeting about a
specific topic or posting in a specific teacher-related Facebook group. This has its own merits e.g.,
helping to understand teacher professional development process [47, 46, 48, 49]. However, we
believe approaching the temporal analysis of the interaction of teachers with social media through
the lens of their curated resources is closely related to teachers’ classroom pedagogical activities
and, thus, educationally carries more weight. Third, for the first time, we have incorporated the
gender of teachers in the temporal analysis of teachers’ interaction with social media.
                                                   88


                           17.5       Male Teachers
                                      Female Teachers
                           15.0
  Proportion of Pins (%)
                           12.5
                           10.0
                            7.5
                            5.0
                            2.5
                            0.0
                                   Mon          Tue   Wed      Thu        Fri   Sat   Sun
                                                            Day of Week
Figure 4.33: The average percentage of pin curations on each day of the week for male and female
teachers.
4.3.5.1                      Days of the Week
Figure 4.33 demonstrates the average percentage of pins curated on each day of the week for male
and female teachers. Similarly, Figure 4.34 demonstrates the average percentage of boards curated
on each day of the week for male and female teachers. Regarding the gender of teachers, there
is no significant difference between male and female teachers, neither for pins nor for boards.
However, during the weekends, female teachers tend to become slightly more active in pin and
board curation than male teachers. Moreover, interestingly, distributions of pin and board curations
are very similar. This suggests that teachers have a specific weekly schedule to utilize social
media for resource curation. Another crucial observation from Figures 4.33 and 4.34 is that
during weekends, teachers (both males and females) are more active. Considering being busy at
                                                               89


                             17.5       Male Teachers
                                        Female Teachers
                             15.0
  Proportion of Boards (%)
                             12.5
                             10.0
                              7.5
                              5.0
                              2.5
                              0.0
                                     Mon       Tue    Wed       Thu     Fri   Sat   Sun
                                                            Day of Week
Figure 4.34: The average percentage of board curations on each day of the week for male and
female teachers.
school during weekdays, it seems logical that teachers allocate more time on their weekends to use
Pinterest. This finding is especially outstanding since the higher activity of teachers on Pinterest
during the weekends is in contrast with the overall usage pattern of Pinterest, where weekends have
the least amount of traffic [132]. Similarly, Rosenberg et al. [75] found a distinct weekly pattern
of tweeting for teachers on Twitter, which happened to be at odds with the overall Twitter usage
pattern.
4.3.5.2                        Months of the Year
Continuing the temporal analysis of pin and board curations, in this part, we look into months of the
year (i.e., January, February, · · · , December). Figure 4.35 demonstrates the average percentage of
                                                               90


                                                                      Male Teachers
                             10
                                                                      Female Teachers
                             8
    Proportion of Pins (%)
                             6
                             4
                             2
                             0
                                  Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
                                                    Month of Year
Figure 4.35: The average percentage of pin curations in each month of the year for male and female
teachers.
pin curations in each month of the year for male and female teachers. Similarly, Figure 4.36 shows
this for board curations. According to these figures, there is no significant difference between male
and female teachers concerning the number of curations in months of the year. An interesting
observation is the higher curation activity in the summer months, namely June, July, and August.
In this period of the year, teachers have more free time, and they can curate more educational
materials and thus prepare themselves for the next teaching semester in August/September. Another
pattern is a relatively high degree of board curations in January. Note that boards are essentially
organizational folders representing teachers’ professional perspectives regarding pins worth saving
and sharing [79]. Given this, perhaps at the beginning of the year, teachers start to create more
boards to collect resources for the rest of the year.
                                                       91


                                                                          Male Teachers
                               10                                         Female Teachers
                               8
    Proportion of Boards (%)
                               6
                               4
                               2
                               0
                                    Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
                                                      Month of Year
Figure 4.36: The average percentage of board curations in each month of the year for male and
female teachers.
4.3.5.3                        Days of the Month
Finally, in this part, we investigate the curation pattern for each day of the month, i.e., 1, 2, · · · 31.
Figure 4.37 demonstrates the average percentage of pin curations on each day of the month for male
and female teachers. Similarly, Figure 4.38 shows this for board curations. Since in the Gregorian
calendar, the months are either 28, 29, 30, or 31 days long, there is less amount of activity for
days 29, 30, and 31. For the rest of the days, we can observe that, on average, teachers perform
resource curation across all days without any significant variation. However, the numbers for male
teachers exhibit more variation than for female teachers. We believe the reason is the artifact of the
underrepresentation of male teachers, where we might not have enough data to populate each day
as much as we do for female teachers.
                                                         92


                           3.4
                           3.2
                           3.0
      Proportion of Pins (%)
                           2.8
                           2.6
                           2.4
                           2.2
                           2.0       Male Teachers
                                     Female Teachers
                                 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
                                                                        Day of Month
Figure 4.37: The average percentage of pin curations in each day of the month for male and female
teachers.
   In conclusion, our large-scale temporal investigations in Sections 4.3.5.1, 4.3.5.2, and 4.3.5.3
indicates three crucial findings:
    • Teachers are committed to using social media for educational purposes. This commitment
      is especially notable since our analysis showed that they spend time from their leisure to
      interact with social media (here Pinterest), e.g., over the weekends.
    • Teachers’ usage of social media is persistent and perpetual, as demonstrated in previous
      studies as well [4, 92].
    • Teachers shape their unique pattern of using social media compatible with their teaching
      schedule and the school-year calendar.
                                                                           93


                             3.75
                             3.50
                             3.25
      Proportion of Boards (%)
                             3.00
                             2.75
                             2.50
                             2.25
                                        Male Teachers
                             2.00       Female Teachers
                                    01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
                                                                           Day of Month
Figure 4.38: The average percentage of board curations in each day of the month for male and
female teachers.
4.4               Social Network Analysis
   In Chapter 1, we discussed the importance of teachers in social media and how they improve
education. In particular, online social media helps teachers share information and resources. These
resources can be diffused rapidly to them classroom. What has made this rapid diffusion possible
is the network formed among teachers. Within this network, teachers form ties (connections) and
communities (or socialized knowledge communities [78, 133]) to exchange knowledge. Not only
online networks but also traditional school-level teacher networks have been shown to provide
educational opportunities [25, 134]. Hence, to better understand teachers in social media, we
need to investigate the network that embeds teachers. To this end, social network analysis offers
us analytical approaches to study the network. Social network analysis is widely used in various
areas [102, 135, 136, 137, 138]. In particular, it has been used an analytical framework to study
teachers in social media [6, 139, 140, 141, 8, 142].
   In this section, we utilize social network analysis and study our Pinterest network from multiple
                                                                              94


perspectives. While performing the network/graph analysis, we consider the gender of teachers
as well. In Section 4.4.1, we look into the distribution of online connections for male and female
teachers. Then, in Section 4.4.1, we investigate centrality measures. Eventually, In Section 4.4.3,
we investigate the gender homophily among male and female teachers.
4.4.1                 Distribution of Connections
          1.0                                                                      1.0                                                                     1.0
                                                        Female Teachers                                                         Female Teachers                                                       Female Teachers
                                                        Male Teachers                                                           Male Teachers                                                         Male Teachers
          0.8                                                                      0.8                                                                     0.8
          0.6                                                                      0.6                                                                     0.6
 P(X>x)                                                                   P(X>x)                                                                  P(X>x)
          0.4                                                                      0.4                                                                     0.4
          0.2                                                                      0.2                                                                     0.2
          0.0                                                                      0.0                                                                     0.0
                100     101              102             103                             100    101              102             103                             100   101           102             103          104
                              Number of followers (X)                                                 Number of followees (X)                                                Number of friends (X)
                       (a) #Followers                                                          (b) #Followees                                               (c) #Followers + #Followees
Figure 4.39: The CCDF of number of connections for male and female teachers. x-axes are in
log-scale.
            Figures 4.39a, 4.39b, 4.39c demonstrate the CCDF of the number of followers, the number of
followees, and the total number of friends (i.e., the followers and followees combined), respectively.
As it can be observed, when the number of followers and the number of followees are small, the
distributions are almost identical for male and female teachers. Nevertheless, the probability of
having many followers/followees is slightly more for male teachers than females. Also, in general,
male teachers tend to have more friends.
            To go deeper into the distribution of the number of connections, Figure 4.40 shows the CCDF
of the number of followers and the number of followees for each gender group separately. First,
regarding the distributions of the number of followers vs. the number of followees, we can observe
from Figure 4.40b that, in general, female teachers have more followees than followers. According to
the CCDF curve, it means: ∀ 𝑛 𝑃(# 𝑓 𝑜𝑙𝑙𝑜𝑤𝑒𝑟 𝑠 > 𝑛) ≤ 𝑃(# 𝑓 𝑜𝑙𝑙𝑜𝑤𝑒𝑒𝑠 > 𝑛). We face a different
                                                                                                            95


            1.0                                                                      1.0
                                                           Followees                                                              Followees
                                                           Followers                                                              Followers
            0.8                                                                      0.8
            0.6                                                                      0.6
   P(X>x)                                                                   P(X>x)
            0.4                                                                      0.4
            0.2                                                                      0.2
            0.0                                                                      0.0
                  100   101          102             103                                   100   101          102           103          104
                         Number of connections (X)                                                 Number of connections (X)
                        (a) Male teachers                                                        (b) Female teachers
Figure 4.40: The CCDF of number of connections based on their types for each gender group
separately. x-axes are in log-scale.
scenario for male teachers demonstrated in Figure 4.40a. We see the same pattern when the numbers
of followers and followeess are low: male teachers have more followees than followers. This
technically means ∀ 𝑛 ∈ [1, 𝑛1 ] 𝑃(# 𝑓 𝑜𝑙𝑙𝑜𝑤𝑒𝑟 𝑠 > 𝑛) < 𝑃(# 𝑓 𝑜𝑙𝑙𝑜𝑤𝑒𝑒𝑠 > 𝑛) where 𝑛1 in our data
is 126. However, when the number of followers and followees increases, the chance of having more
followers is higher than followees, i.e., ∀ 𝑛 > 𝑛1                             𝑃(# 𝑓 𝑜𝑙𝑙𝑜𝑤𝑒𝑟 𝑠 > 𝑛) > 𝑃(# 𝑓 𝑜𝑙𝑙𝑜𝑤𝑒𝑒𝑠 > 𝑛).
Hence, we can conclude that when male teachers expand their networking, they can have more
people follow them.
   Is there any correlation between the number of followers and followees? Furthermore, how do
male teachers and female teachers differ regarding that correlation? To answer these questions,
Figure 4.41 illustrates the regression plot between the number of followers and the number of
followees for male and female teachers. For both gender groups, the correlation is positive, not
very large, however. This means that more number of followees leads to more number followers.12
This correlation is higher for male teachers than for female ones. Another question is, does the
  12 The reason we mentioned the causality from followee to follower stems from the usual online
user behavior where they follow more people to get more followers, not the other way around.
                                                                       96


                  (a) Male teachers                                          (b) Female teachers
            Figure 4.41: Regression plots of the number and the number of followees.
increase in the number of followers come from those who have already been followed? More
specifically, are friendships reciprocal in a way that if a user (teacher) follows someone, that person
follows the user back? To answer this question, we define reciprocity for a user as follows:
                                   1.0
                                                                  Female Teachers
                                                                  Male Teachers
                                   0.8
                                   0.6
                          P(X>x)
                                   0.4
                                   0.2
                                   0.0
                                         0.0   0.2    0.4      0.6     0.8     1.0
                                                     Reciprocity (X)
             Figure 4.42: The CCDF of the reciprocity for male and female teachers.
                                                         97


                                                    |𝐹 𝐿 (𝑢) ∩ 𝐹𝑂 (𝑢)|
                                 𝑟𝑒𝑐𝑖 𝑝𝑟𝑜𝑐𝑖𝑡𝑦(𝑢) =                                                (4.5)
                                                          |𝐹𝑂 (𝑢)|
where 𝐹 𝐿(𝑢) denotes the list of followers for user 𝑢 and 𝐹𝑂 (𝑢) denotes the list of followees.
Essentially, 𝑟𝑒𝑐𝑖 𝑝𝑟𝑜𝑐𝑖𝑡𝑦(𝑢) determines out of those whom 𝑢 follows what fraction have followed
them back. Note that 𝑟𝑒𝑐𝑖 𝑝𝑟𝑜𝑐𝑖𝑡𝑦(𝑢) ∈ [0, 1]. Figure 4.42 demonstrates the CCDF of reciprocity
index for male and female teachers. We can observe that, in general, male and female teachers have
a close reciprocity value. The reciprocity of female teachers is slightly higher, though (the average
values of reciprocity for male and female teachers are 0.24 and 0.27, respectively). A previous
study on some general users on Pinterest found similar results regarding the reciprocity of males
and females [125]. Furthermore, to put the reciprocity of male and female teachers in perspective,
the average value of reciprocity for non-teachers in our dataset is 0.35, which is not very high either.
Hence, overall, both male and female teachers do not have a high reciprocity value. We speculate
this is related to the social curation nature of Pinterest, where users might not feel “obligated” to
follow back since the formation of friendship in order to receive information is not as crucial as
other social media platforms such as Facebook and Twitter. We will leave further investigation
about the exact reasons behind the relatively low reciprocity on Pinterest, especially for teachers,
in the future.
4.4.2    Centrality
Centrality is one of the most important notions in social network analysis. It essentially assigns
a node a number based on their position in the network. Centrality has many applications in
various networks [143, 144, 145]. As will be discussed in the next chapter, centrality significantly
influences the diffusion of information. Hence, as part of our investigation of male and female
teachers on Pinterest, it is essential to compare their centrality. To this end, we computed three
notable measures of centrality, namely eigenvector centrality, closeness centrality, and betweenness
centrality. Eigenvector centrality assigns a high score to a node if connected to other high central
nodes. Closeness centrality is the average length of the shortest path between the node and all other
                                                  98


            1.0                                              1.0                                             1.0
                                                                                                                                        Female Teachers
                                                                                                                                        Male Teachers
            0.8                                              0.8                                             0.8
            0.6                                              0.6                                             0.6
   P(X>x)                                               P(X>x)                                          P(X>x)
            0.4                                              0.4                                             0.4
            0.2                                              0.2                                             0.2
                      Female Teachers                                 Female Teachers
                      Male Teachers                                   Male Teachers
            0.0                                              0.0 −5                                          0.0
                    10−36 10−29 10−22 10−15 10−8 10−1           10     10−4    10−3    10−2      10−1               10−10 10−8 10−6 10−4             10−2
                         Eigenvector centrality (X)                      Closeness centralit (X)                        Betweenness centrality (X)
                  (a) Eigenvector centrality                       (b) Closeness centrality                      (c) Betweenness centrality
Figure 4.43: The CCDF of centrality measures for male and female teachers. x-axes are in log
scale.
nodes in the graph. Thus the more central a node is, the closer it is to all other nodes. Finally,
betweenness centrality quantifies the number of times a node acts as a bridge along the shortest
path between two nodes. Figure 4.43 demonstrates the CCDF of these three centrality measures
for male and female teachers. As it can be observed from this figure, both gender groups enjoy
very similar values of centrality. However, the betweenness centrality is slightly higher for male
teachers. Regardless of the gender, we can observe that centrality measures follow a power-law
distribution where most nodes have low centrality while a small percentage has massive scores. In
conclusion, based on our findings, the positional importance of teachers on Pinterest is not driven
by their gender.
4.4.3              Gender Homophily
Homophily is a notion that individuals with similar personal or social traits tend to have a relation-
ship with each other [146], as eloquently put in the famous proverb “birds of a feather flock together”.
Homophily-driven relationships and interactions based on different attributes such as race, gender,
religion, and education level have long been identified and studies in the sociology literature [147].
With the advent of online social media platforms and the formation of friendships/ties on these plat-
forms, homophilic behaviors have been identified and measures on these networks [148, 149, 150].
Some studies have pointed out positive impact of homophily e.g., improving coordination [151],
                                                                                 99


enhancing tolerance and cooperation [152], formation of social norms [153], better diffusion of
information [154, 155]. However, it has been shown that homophily causes negative effects, e.g.,
political polarization [150], reducing diversity and negatively impacting minorities [156, 157].
     Regarding teachers in social media, as explained by Frank et al. [25], social media networks
provide a great potential to reduce differences among teachers by diffusion of information and
exchange of social capital. Nevertheless, because of the homophilic behavior of teachers (like
other human beings), such potential can be disrupted [25]. To devise effective measures to prevent
or at least mitigate this disruption, the first and essential task is to understand and characterize
homophily among teachers in social media. Therefore, in this part of the dissertation, we analyze
gender homophily among our identified teachers on Pinterest .
     We perform our homophily analysis through the lens of dyads and triads in the network. A dyad
is a pairwise relationship between two individuals, which is the basic structure within a network and
the core of any “intersubjective relationship” [158]. A triad or triangle is the relationship between
three individuals, which acts as the building block of social order and society [159]. Dyadic and
triadic relationships play an essential role in classic sociology [160]. Hence, studying gender
homophily through dyadic and triadic relationships offers a better picture of homophilic structure
in the network.
     To characterize dyads, we recognize seven types of such relationships as demonstrated in
Figure 4.44. Each circle denotes either a male or female teacher. We have three types of male-
female relationships: a male follows a female, a female follows a male, both follow each other
(reciprocal), which, respectively, are denoted as Type 1, Type 2, and Type 3 in Figure 4.44. These
three types are non-homophilic relationships since connections have been established between
the opposite genders. Type 4 denotes a female teacher follows another female teacher and Type 5
represents a bidirectional relationship between two female teachers. These two types are homophilic
relationships since they involve only the same gender. Similarly, we define Type 6 and Type 7 for
male-only relationships as illustrated in Figure 4.44.
     Figure 4.45a demonstrates the proportion of each type in our dataset. Since the number of
                                                  100


                               Non-homophilic
          Type 1                     Type 2                     Type 3
      M             F             F            M            M              F
                      Homophilic
          Type 4                      Type 5
                                                                        M      Male Teacher
       F            F              F             F
                                                                        F      Female Teacher
                                                                               Half-connected
                     Homophilic                                                Reciprocal
          Type 6                      Type 7
      M            M               M            M
                                     Figure 4.44: Dyad types.
male teachers and female teachers is not equal, we normalized each type and divided it by the total
number of such types that could potentially exist. More specifically, the number of relationships of
Type 1, Type 2, and Type 3 are divided by 11675
                                                  1592
                                             1    × 1 since we have 11675 female teachers and
1592 male teachers–See Table 4.1. The number of relationships of Type 4 and Type 5 are divided
by 11675                                                                1592
         
      2 . Finally, for Type 6 and Type 7, the normalization factor is 2 . As it can be observed
from Figure 4.45a, Type 6 has the highest value. One might wonder that this is due to the artifact of
the lower normalization factor for Type 6 since the number of male teachers is significantly smaller
than female teachers. Nevertheless, Type 7 has the same factor while its proportion is smaller than
Type 6. The high value for the percentage of dyads of Type 6 indicates a high degree of homophily
between male teachers. We speculate this behavior is due to the low representation of male teachers
                                                101


                     0.0200
                     0.0175                                                                                         0.010
                     0.0150
                                                                                                                    0.008
 Normalized proportion                                                                          Normalized proportion
                     0.0125
                                                                                                                    0.006
                     0.0100
                     0.0075                                                                                         0.004
                     0.0050
                                                                                                                    0.002
                     0.0025
                     0.0000                                                                                         0.000
                               Type 1   Type 2   Type 3     Type 4 Type 5   Type 6   Type 7                                 Non-homophilic       Homophilic (female)   Homophilic (male)
                                                          Dyad types
                             (a) Proportion of different types of dyads                                                 (b) Homophilic and non-homophilic dyads
                                                    Figure 4.45: Gender homophily in dyadic relationships.
on Pinterest, which encourages them to seek each other actively on this platform. Figure 4.45b
aggregates the results in Figure 4.45a based on homophilic and non-homophilic relationships. We
can observe that homophilic relationships prevail over non-homophilic ones—nonetheless, there is
a considerable percentage of non-homophilic relationships between male and female teachers.
                             Type 1                                    Type 2                                               Type 3                                     Type 4
                         M                   F                    M                     F                               F                    F                    M                        M
                                M                                           F                                                   F                                           M
                                                                                Figure 4.46: Triad types.
                         In addition to dyads, we also investigated triads. Similarly, we identified four types of triads as
shown in Figure 4.46. Type 1 denotes a relationship between two male teachers and a female one.
Type 2 denotes a relationship between two female teachers and a male teacher. Type 3 and Type 4
are all-female and all-male relationships, respectively. We normalized the number of occurrences
of Type 1, Type 2, Type 3, and Type 4 through dividing them by 11675    × 1592
                                                                                11675
                                                                                        × 1592
                                                                                              
                                                                   1        2 ,    2        1 ,
 11675       1592
   3 , and      3 , respectively. Figure 4.47a demonstrates the proportion of different types of
                                                                                              102


                              1e−6                                                                            1e−6
                       2.00                                                                            2.00
                       1.75                                                                            1.75
                       1.50                                                                            1.50
   Normalized proportion                                                           Normalized proportion
                       1.25                                                                            1.25
                       1.00                                                                            1.00
                       0.75                                                                            0.75
                       0.50                                                                            0.50
                       0.25                                                                            0.25
                       0.00                                                                            0.00
                                     Type 1   Type 2           Type 3   Type 4                                  Non-homophilic   Homophilic (female)   Homophilic (male)
                                                    Triad types
                           (a) Proportion of different types of triads.                                    (b) Homophilic vs. homophilic in triads.
                                                 Figure 4.47: Gender homophily in triadic relationships.
triads in our network. As can be observed from this figure, all-male triadic relationships are the
most common type of relationship. Interestingly, the proportion of Type 1 is higher than Type
2, which means while two males establish homophilic relationships, their higher-order structure
is less homophilic. Figure 4.47b shows the proportion of homophilic vs non-homophilic triadic
relationships. Based on this figure, homophilic relationships prevail over non-homophilic ones.
           In conclusion, our empirical evaluation shows that both in dyadic and triadic relationships,
homophily between teachers exists. Nevertheless, non-homophilic relationships have also been
established. Knowing the fact that more male teachers are being introduced in K-12 education [119,
120], it would be interesting to investigate whether the same homophilic patterns persist or not in
the future.
                                                                                 103


                                              CHAPTER 5
        DIFFUSION OF TEACHER-CURATED RESOURCES ON SOCIAL MEDIA
As mentioned before, teachers’ primary motivation to join online social media, especially Pinterest,
is to curate resources for their pedagogical activities. These resources come from other teachers
who have joined social media and established ties with their peers. These ties promote diffusion
of information on social media and help teachers access an extensive collection of educational
resources [45]. Given this, what makes social media very efficient for educational resource curation,
unlike, say, asking a colleague, is the fast and widespread diffusion of resources across the network.
Moreover, the fast and widespread diffusion of resources allow teachers to cross the traditional
school-level boundaries and facilitate large-scale collaboration. Hence, it is of great importance
to study the diffusion of teacher-curated resources on social media. Nevertheless, two challenges
need to be addressed. The first challenge is concerned with the data. We need to construct
the entire diffusion process of a sufficiently large sample of resources so we can perform an
effective data-driven study. This process should entail several key elements about the diffusion,
including the teacher who has initially curated a resource (i.e., the producer), other users who have
further re-pinned (adopted) the resource, and the time when the resource has been re-pinned by
a user. Essentially, we need to construct the diffusion tree of a resource, as will be explained in
Section 5.1. Figure 5.1 illustrates an example of diffusion tree. Unfortunately, a large-scale dataset
of diffusion trees is missing in current studies. Hence, to fill this gap, we construct the diffusion
trees of more than 1 million teacher-curated resources on Pinterest. The second challenge is how
we can characterize the diffusion process. More specifically, we need to have some measures
quantifying the dynamics of diffusion. To this end, we introduce three crucial measures about the
diffusion of resources on Pinterest. These measures consider several key aspects of the diffusion of
information on Pinterest, namely the number of users who have received a teacher-curated resource,
the popularity of a resource, and how fast a resource has been diffused. Using these measures, we
perform a large-scale analysis of the diffusion of teacher-curated resources and answer two crucial
                                                  104


research questions: a) do different resources (e.g., in terms of their topics) affect the diffusion?,
and b) how teacher attributes (e.g., the number of followers) affect the diffusion?
    The remainder of this chapter is organized as follows. First, in Section 5.1, we discuss the
constructed dataset of diffusion trees. Then, in Section 5.2, we introduce three measures to
characterize the diffusion. Finally, in Section 5.3, we present the results of our diffusion analysis
and answer the two research questions.
                                          u2
                                           Sun, 14 Jun 2015 05:22:45
                                          u3                                 u6
                                          Wed, 17 Jun 2015 22:37:44        Thu, 18 Jun 2015 11:22:03
      u1
     Sat, 13 Jun 2015 00:05:36
                                          u4
                                          Sat, 13 Jun 2015 14:47:40
                                          u5
                                         Sat, 13 Jun 2015 05:16:19
                               Figure 5.1: An example of diffusion tree.
                                                     105


5.1     Dataset: Diffusion Trees
    As mentioned above, to investigate the diffusion of curated resources on Pinterest, we need
to construct diffusion trees of resources. A diffusion tree is a tree representing the cascade of
information among users. Formally, we define a diffusion tree as a directed graph 𝑇 = (𝑈, 𝐸, 𝑝, 𝑟),
where 𝑈 is the set of users participating in the diffusion, 𝐸 is the set of directed edges between users
𝑈, 𝑝 is the pin being diffused among 𝑈, and 𝑟 is the root of the tree– a teacher who has initially
curated pin 𝑝. Each edge 𝑒 = (𝑢𝑖 , 𝑢 𝑗 ) ∈ 𝐸 indicates that user 𝑢𝑖 ∈ 𝑈 has re-pinned (received) pin
𝑝 from user 𝑢 𝑗 ∈ 𝑈. Figure 5.1 demonstrates an example of diffusion tree. In this Figure, user 𝑢 1
(the root) has curated a resource, which has been diffused in the network and further re-pinned by
users 𝑢 2 , 𝑢 3 , 𝑢 4 , 𝑢 5 , and 𝑢 6 .1 In this example, we have also shown the curation time below each
node. As it can be seen, this pin has been diffused very fast, which signifies the power of social
media in the rapid diffusion of (educational) resources.
    We created diffusion trees for 1,162,983 unique pins curated by our identified teachers.2 To
create an edge, we used the parent pin field in our dataset, which holds the previous pin (refer to
Table 2.1). If a pin does not have the parent pin, it means it is in the root of the tree, e.g., the one
curated by user 𝑢 1 in Figure 5.1. Note that the content of a pin (i.e., image/video, description, etc.)
does not change when it gets diffused. However, Pinterest assigns a unique identifier to each pin
once a user re-pins it. Using these identifiers, we could trace back pins and construct diffusion
trees. Moreover, we created trees for all types of resources curated by teachers, either educational
or non-educational. We did this for two reasons. First, through incorporating non-educational
pins, we can effectively contextualize how educational resources, compared to non-educational
ones, are diffused. Second, in addition to investigating the diffusion of teacher-curated resources
on Pinterest, the overarching goal of this dissertation is the behavior analysis of teachers in social
media. Therefore, we believe investigating the diffusion of all types of teacher-curated resources
is contributing to this goal. Finally, it is worth mentioning that our dataset of diffusion trees is the
    1 Names     of the users have been redacted for privacy purposes.
    2 Identified    teachers are the same teachers used in Chapter 4.
                                                         106


largest dataset of teacher-curated resources on Pinterest, which can foster future research on the
diffusion of resources on social media.
5.2    Characterizing Diffusion
    Now we have diffusion trees; we need to characterize them based on some measures. These
measures should signify what previous studies have emphasized about the diffusion of educational
resources on social media, particularly Pinterest, namely a) educational resources are diffused in
a large scale manner among teachers, and b) the diffusion of educational resources is fast [4, 133,
45, 78]. We adopted these measures from Han et al. [161]. They introduced them to study the
diffusion of information on Pinterest.
5.2.1   Volume
The first measure is the volume (𝑉 𝐿), which is defined as the number of nodes in a diffusion tree:
                                              𝑉 𝐿(𝑇) = |𝑈|                                         (5.1)
For instance, the the volume of the tree in Figure 5.1 is 6. Despite its simplicity, the volume
has a significant implication since it informs us how widely a piece of information has been
diffused. In particular, the number of users who have received a piece of information is used in
the popularity prediction/assessment of information on social media, e.g., the number of retweets
on Twitter [162, 163]. Pertinent to this chapter, by determining the volume, we can ascertain how
much other users/teachers are interested in a teacher-curated resource.
5.2.2   Virality
The volume, while still being important, reports the number of individuals who have adopted
a resource. Nevertheless, depending on the structure of a diffusion tree, this adoption can take
different forms. To fix the idea, Figure 5.2 demonstrates three distinct diffusion trees, all having
volume 8. In 𝑇1 , there is a broadcast from the root to other nodes where only the root has participated
                                                  107


                     T1                               T2                                    T3
            VI (T1) = 0.875                        VI (T2) = 1.26                          VI (T3) = 1.5
  Figure 5.2: Three structurally different trees with the same volume but different virality values.
in the information propagation. However, in 𝑇2 , the resource has been relayed by different nodes
where more nodes have participated in the diffusion. 𝑇3 is an extreme scenario where we see a
chain-wise ‘deep’ tree, and the message has been passed on consecutively. Distinguishing between
diffusion scenarios based on their tree structure informs us about the virality and penetration of
a message across the network [161]. Furthermore, such distinction is important in the context of
teachers in social media as we can specify how others have responded to a teacher-curated resource.
To this end, we define the virality (𝑉 𝐼) of a diffusion tree:
                                                 2               Õ
                              𝑉 𝐼 (𝑇) =                                   𝑑 (𝑢𝑖 , 𝑢 𝑗 )                  (5.2)
                                         (|𝑈|) × (|𝑈| − 1)
                                                              ∀𝑢𝑖 ,𝑢 𝑗 ∈𝑈
where 𝑑 (𝑢𝑖 , 𝑢 𝑗 ) is the shortest distance between two users 𝑢𝑖 and 𝑢 𝑗 in the diffusion tree 𝑇. The
sum of shortest distances between nodes in a graph is known as Wiener Index [164, 165]. Han et al.
[161] used a similar metric for the virality. The term (|𝑈|)×(|𝑈|−1)  2        normalizes the Wiener Index.
5.2.3   Velocity
In addition to the number of people who have received a resource and how viral the resource has
become, the speed of the diffusion is also important. In particular, previous studies have pointed
out that the fast diffusion of educational resources on social media and then to the classroom is
what makes online social media very appealing to teachers [4, 91, 166]. Hence, the third diffusion
                                                    108


measure is about the velocity (or speed) of diffusion. To this end, we introduce two metrics. The
first metric is the average re-pin time, which calculates the average time between two re-pins in the
diffusion tree. The average re-pin time (𝐴𝑅𝑇) for a diffusion tree is defined as follows:
                                                 1     Õ
                                  𝐴𝑅𝑇 (𝑇) =                 𝑢 𝑗 (𝑡) − 𝑢𝑖 (𝑡)                         (5.3)
                                              |𝑈| − 1
                                                      ∀𝑒∈𝑇
where (𝑢𝑖 , 𝑢 𝑗 ) is an edge in the diffusion tree and 𝑢𝑖 (𝑡) (𝑢 𝑗 (𝑡)) denotes the re-pin time by user 𝑢𝑖
(𝑢 𝑗 ). Note that, in Eq. 5.3, we have subtracted 𝑢𝑖 (𝑡) from 𝑢 𝑗 (𝑡) since user 𝑢𝑖 has received the pin
earlier. Given the fast diffusion of information on social media, we use an hour as the scale of
time. 𝐴𝑅𝑇 for the example tree demonstrated in Figure 5.1 is 46.2 hours. Sometimes a resource
can continue to get further diffused for a long time (say months), and thus makes 𝐴𝑅𝑇 (𝑇) large.
Therefore, to better capture the velocity of diffusion, we additionally define the first re-pin time
(𝐹 𝑅𝑇). It is the amount of time from the initial curation of a pin to when someone re-pins it for
the first time:
                             𝐹 𝑅𝑇 (𝑇) = 𝑚𝑖𝑛{𝑢𝑖 (𝑡) − 𝑟 (𝑡)} 𝑠.𝑡. (𝑟, 𝑢𝑖 ) ∈ 𝐸                        (5.4)
where 𝑟 (𝑡) denotes the time that the root has curated the pin. 𝐹 𝑅𝑇 for the example tree in Figure 5.1
is 5.16 hours.
5.3      Diffusion Analysis
     In this section, we analyze the constructed diffusion trees. First, in Section 5.3.1, we present
some statistics about diffusion measures. Then, in Section 5.3.2, we present the results of our
investigation regarding how different types of resources are diffused. Finally, in Section 5.3.3, we
analyze the relationship between the introduced diffusion measures and some teacher attributes.
5.3.1    Distribution of Diffusion Measures
In this part, we look into the statistics and distributions of the three diffusion measures. Figure 5.3
demonstrates the CCDF of the volume and vitality for all resources. We can observe that both the
                                                    109


volume and virality follow a power-law distribution where most resources have low volume and
vitality, and a small percentage has very high values for these two measures. In addition, Table 5.1
shows some statistics about the virality, volume, and velocity measures. According to this table,
only the top 0.1% of diffused resources have the volume and virality larger than 174 hours and 5.99
hours, respectively. This means that there are a handful of resources curated by teachers that have
become extremely popular. This is in line with previous studies on the virality and popularity of
information on social media, where they have shown that some information becomes significantly
viral across the network [167, 168, 161]. Finally, on average, around five people have adopted each
teacher-curated resource on Pinterest.
           1.0                                                    1.0
           0.8                                                    0.8
           0.6                                                    0.6
  P(X>x)                                                 P(X>x)
           0.4                                                    0.4
           0.2                                                    0.2
           0.0                                                    0.0
                      101           102        103                      100                          101
                            Volume (X)                                                Virality (X)
                      (a) Volume (Eq. 5.1)                                     (b) Virality (Eq. 5.2)
                 Figure 5.3: The CCDF of the volume and virality. x-axes are in log scale.
Table 5.1: Some statistics of the introduced diffusion measures of the constructed diffusion trees.
           Measure     Min         Max        Mean     Median               Std      top 0.1%              top 0.01%
           Volume        2         1,129        5.4      2                 13.58       > 174                  > 434
           Virality      1         29.72       1.33      1                  0.54      > 5.99                 > 11.45
            ART       0.0012      2,159.4     192.4     35.8               317.3    > 1,950.7               > 2,113.2
            FRT       0.0008     65,655.0    1,814.4    12.5              4,975.0   > 45,020.7             > 56,960.9
                                                       110


           1.0                                                       1.0
           0.8                                                       0.8
           0.6                                                       0.6
  P(X>x)                                                    P(X>x)
           0.4                                                       0.4
           0.2                                                       0.2
           0.0                                                       0.0
                 10−3 10−2 10−1 100 101 102         103                    10−2       100         102     104
                            Avg re-pin time (X)                                   First re-pin time (X)
                    (a) Average re-pin time (Eq. 5.3)                      (b) Fist re-pin time (Eq. 5.4)
                      Figure 5.4: The CCDF of the velocity measures. x-axes are in log scale.
     Figure 5.4 shows the CCDF of the velocity measures: the average re-pin time (𝐴𝑅𝑇) in
Figure 5.4a and the first re-pin time (𝐹 𝑅𝑇) in Figure 5.4b. Unlike the volume and virality, neither
of the velocity measures follows a power-law distribution. Moreover, there is a significant difference
between the mean and median for 𝐴𝑅𝑇 and 𝐹 𝑅𝑇. Specifically, while pins have a small median of
the average re-pin and the first re-pin times (35.56 and 12.56 hours, respectively), their means have
been skewed because of some outliers.
     From the results presented in Figures 5.3 and 5.4 and Table 5.1, we can conclude that teacher-
curated resources diffuse rapidly and are received by a significant number of other users on Pinterest,
including other teachers. While this finding has been reported before [4, 133, 45, 78], this is the first
study that confirms it through a large-scale data-driven assessment. In the next part, we perform
a more detailed evaluation of the diffusion of teacher-curated resources based on the resources’
attributes.
                                                          111


5.3.2         Resource Attributes and Diffusion Measures
In this part, we analyse the diffusion measures based on two crucial attributes of pins, namely the
topic and the domain.
             home_decor
                education
                diy_crafts
                       kids
         holidays_events
               food_drink
       film_music_books
                   quotes
            photography
        womens_fashion
             hair_beauty
                    humor
           health_fitness
              technology
                     travel
               celebrities
Topic              history
                 outdoors
                      geek
                        art
               gardening
                weddings
                   tattoos
                  animals
                    design
                 products
                    sports
           mens_fashion
             architecture
          science_nature
   illustrations_posters
       cars_motorcycles
                  for_dad
                              0   1           2              3            4            5             6
                                                           Volume
                                  Figure 5.5: The average volume per topic.
5.3.2.1           Topic
Figure 5.5 shows the average value of the volume per topic. As shown in this figure, the pins whose
topic is education have the highest volume where on average, each of such pins is adopted by six
users on Pinterest. Interestingly, kids is the second topic in terms of volume. This can partially be
explained by the similarity of this topic with education and being attractive to teachers, especially
for pre-kindergarten or homeschooling-specific materials. Regarding the other topics, they all have
low volumes, mostly below 4. Also, since the predominant topic is education and there is a small
amount of data for the other topics, they exhibit high standard errors.
        Figure 5.6 shows the average value of virality per topic. Similar to the volume, education has the
highest virality. This signifies the high penetration of teacher-curated educational resources across
                                                     112


             home_decor
                education
                diy_crafts
                       kids
         holidays_events
               food_drink
       film_music_books
                   quotes
            photography
        womens_fashion
             hair_beauty
                    humor
           health_fitness
              technology
                     travel
               celebrities
Topic              history
                 outdoors
                      geek
                        art
               gardening
                weddings
                   tattoos
                  animals
                    design
                 products
                    sports
           mens_fashion
             architecture
          science_nature
   illustrations_posters
       cars_motorcycles
                  for_dad
                          0.0   0.2           0.4        0.6              0.8       1.0   1.2     1.4
                                                               Virality
                                      Figure 5.6: The average virality per topic.
Pinterest. The average value of virality for the topic kids is relatively high. Moreover, comparing
the volume and virality values in Figure 5.6 and Figure 5.5, we can infer that the high volume does
not necessarily mean high virality. For instance, the pins whose topic is quotes, on overage, have
relatively high virality while their volume is not very high.
        Figures 5.7 and 5.8 demonstrate the median of the average re-pin time and the first re-pin time,
respectively. Here, we used the median to plot the charts since, as mentioned in Section 5.3.1, the
values of 𝐴𝑅𝑇 and 𝐹 𝑅𝑇 of our constructed diffusion trees have some outliers. Furthermore, there
is a small amount of data for certain topics, and consequently, their velocity measures are very
skewed, as can be observed from Figures 5.7 and 5.8. Hence, for clarity purposes, in Figure 5.9,
we have included the median of the velocity measures for topics whose pin proportion is at least
10%. We make two major observations based on the results shown for the velocity measures.
First, education has a short average re-pin time and first re-pin time. In particular, the median of
the first re-pin time for education is only 12 hours. This means a teacher-curated resource takes
approximately only half a day to be adopted by another user on Pinterest, which signifies the fast
                                                         113


                 education
                 diy_crafts
                        kids
          holidays_events
                food_drink
              home_decor
             photography
         womens_fashion
              hair_beauty
                    quotes
                     humor
        film_music_books
                      travel
            health_fitness
                    history
                         art
 Topic            outdoors
                gardening
                   animals
                 weddings
                  products
                       geek
            mens_fashion
               technology
                     design
                    tattoos
    illustrations_posters
                     sports
        cars_motorcycles
           science_nature
                celebrities
              architecture
                               0        200       400        600                800        1000   1200   1400
                                                                         Avg re-pin time
                                   Figure 5.7: The median of the average re-pein time per topic.
diffusion of educational materials across Pinterest. Second, compared to the first re-pin time, the
average re-pin time is generally longer. We believe this occurs because a user quickly saves a pin
curated by the root, and then the pin is spread across the network at a lesser pace. There are a few
exceptions, however, e.g., travel, art. We speculate this is due to the special nature of these topics
to teachers, where their pins might take some time to attract someone’s attention, while once they
do, they eventually get diffused.
5.3.2.2          Domain
In this part, we investigate the diffusion of teacher-curated resources based on their domains. For
this analysis, we only included the top 10 domains utilized by teachers. Figure 5.10 shows the
average volume and virality values for the top 10 domains of teacher-curated resources. Figure 5.11
shows the median of the average re-pin time and first re-pin time for the same top 10 domains. We
make the following observations based on these results.
         Except for youtube.com and Uploaded by User (directly uploaded from the user’s device), the rest
of the domains are specifically related to education, e.g., moffattgirls.blogspot.com. Interestingly,
                                                                   114


                    home_decor
                       education
                       diy_crafts
                              kids
                holidays_events
                      food_drink
              film_music_books
                          quotes
                   photography
               womens_fashion
                    hair_beauty
                           humor
                  health_fitness
                     technology
                            travel
                      celebrities
 Topic                    history
                        outdoors
                             geek
                               art
                      gardening
                       weddings
                          tattoos
                         animals
                           design
                        products
                           sports
                  mens_fashion
                    architecture
                 science_nature
          illustrations_posters
              cars_motorcycles
                         for_dad
                                     0           1000                       2000        3000                   4000              5000            6000                       7000         8000
                                                                                                               First re-pin time
                                               Figure 5.8: The median of the first re-pin time per topic.
                                                                                                                  home_decor
              education
                                                                                                                    education
              diy_crafts
                                                                                                                    diy_crafts
                    kids
                                                                                                                          kids
         holidays_events
              food_drink                                                                                       holidays_events
            home_decor                                                                                              food_drink
 Topic                                                                                                 Topic
         womens_fashion                                                                                   film_music_books
                 quotes                                                                                                quotes
                 humor                                                                                         womens_fashion
    film_music_books                                                                                                   humor
                  travel                                                                                                travel
                     art                                                                                                   art
                           0         20         40                     60          80          100                               0   20    40    60             80          100    120    140
                                                     Avg re-pin time                                                                                    First re-pin time
                                     (a) Average re-pin time                                                                              (b) First re-pin time
                                          Figure 5.9: The median of the velocity measures for the top topics.
we can observe that pins from moffattgirls.blogspot.com have the highest volume. This blog is run
by a former elementary school teacher who exclusively produces educational materials. Our further
investigation reveals that this teacher is a prolific and influential user on teacherspayteachers.com.3
Hence, it is no surprise that her educational materials are of interest to many others. Moreover, the
pins from this domain have high virality and are diffused very fast. We have showcased an example
pin from moffattgirls.blogspot.com, which has been re-pinned by 936 other users on Pinterest.
          3 https://www.teacherspayteachers.com/Store/The-Moffatt-Girls
                                                                                                     115


             themeasuredmom.com                                                                                                   themeasuredmom.com
           moffattgirls.blogspot.com                                                                                          moffattgirls.blogspot.com
           teacherspayteachers.com                                                                                            teacherspayteachers.com
           notimeforflashcards.com                                                                                               notimeforflashcards.com
             classroomfreebies.com                                                                                                classroomfreebies.com
 Domain                                                                                                              Domain
                    pre-kpages.com                                                                                                       pre-kpages.com
                  Uploaded by user                                                                                                      Uploaded by user
           blog.maketaketeach.com                                                                                                blog.maketaketeach.com
                       youtube.com                                                                                                           youtube.com
     fantasticfunandlearning.com                                                                                         fantasticfunandlearning.com
                                       0   1        2        3        4             5     6        7        8                                            0.0      0.2        0.4        0.6            0.8        1.0        1.2        1.4
                                                                          Volume                                                                                                                  Virality
                               (a) The average volume per domain                                                                                    (b) The average volume per domain
Figure 5.10: The average of the volume and virality for the top 10 domains of teacher-curated
resources.
             themeasuredmom.com                                                                                                     themeasuredmom.com
           moffattgirls.blogspot.com                                                                                              moffattgirls.blogspot.com
           teacherspayteachers.com                                                                                                teacherspayteachers.com
            notimeforflashcards.com                                                                                                notimeforflashcards.com
             classroomfreebies.com                                                                                                  classroomfreebies.com
  Domain                                                                                                                Domain
                  Uploaded by user                                                                                                         pre-kpages.com
            blog.maketaketeach.com                                                                                                       Uploaded by user
                    pre-kpages.com                                                                                                blog.maketaketeach.com
      fantasticfunandlearning.com                                                                                                             youtube.com
                       youtube.com                                                                                            fantasticfunandlearning.com
                                       0       10       20       30                  40       50       60       70                                            0         10         20                   30              40         50
                                                                      Avg re-pin time                                                                                                         First re-pin time
          (a) Median of the average re-pin time per domain                                                                (b) The median of the first re-pin time per domain
Figure 5.11: The median of the velocity measures for the top 10 domains of teacher-curated
resources.
These inspiring and active teachers signify the power of teachers in social media and how they can
impact their fellow teachers in this digital age.
            Materials from teacherspayteachers.com also have high volume and virality, which indicates
the popularity of educational materials from this source. Interestingly, the velocity measures for
pins from teacherspayteachers.com have a short first re-pin time while relatively longer average
re-pin time. This is due to the artifact that pins from this popular source keep being diffused across
Pinterest for a long time; thus, the average re-pin time becomes long.
            Another observation is the long first re-pin time for pins whose domain is Uploaded by User.
We believe this is because of the following reason. Since these pins have essentially no domain
(i.e., they do not come from a website on the Internet), other users might hesitate to adopt them
quickly, e.g., due to the lack of trust. Nevertheless, once they accumulate some initial popularity,
                                                                                                                116


Figure 5.12: A showcase of a popular pin from moffattgirls.blogspot.com adopted by 936 other
users.
they become more popular and diffuse across the network.
5.3.3   Teacher Attributes and Diffusion
In Section 5.3.2, we demonstrated that attributes of a teacher-curated resource, namely its topic
and domain, are related to its diffusion. In addition to the resource itself, a resource producer (here
a teacher) can also affect the diffusion process [169]. For instance, there is a rich literature on
identifying influential spreaders in social media based on the spreader’s attributes [170, 171, 172].
Hence, in this part, we investigate whether teacher attributes are related to the diffusion of the
resources they curate. To this end, we consider ten teacher-related attributes and inspect their
relationship to diffusion measures as explained in the following.
    We consider the number of pins and the number of boards for the first two attributes, respectively.
The reason to include these two attributes is to assess whether a teacher’s activity level leads to
the widespread and fast diffusion of their materials. For the next attribute, we consider the number
of followers. The reason to include this attribute is that the resources of a teacher who has more
followers might enjoy a higher chance of being disseminated in the network. Following this
argument, we also include the number of followees and the total number of friends (i.e., followers
                                                 117


and followees combined) to discover how these two attributes affect the diffusion measures. In
Section 4.4.1, we defined reciprocity, which captures the proportion of bidirectional connections
of a user. Here, we include reciprocity as a teacher’s attribute to investigate its relationship to
diffusion. The idea is to investigate whether having a stronger connection between a teacher
and their online friends, measured via reciprocity, affects the diffusion. Moreover, we include
three crucial centrality attributes, namely eigenvector, betweenness, and closeness centrality. As
mentioned before, centrality attributes calculate the structural importance (or influence) of a node
in a network. The question is, do resources of the more central teachers have a higher chance to be
adopted by other users and perhaps at a faster rate? The final attribute includes the local clustering
coefficient, which quantifies how close a user’s neighbors are to a complete graph (a clique).
Including the local clustering coefficient is particularly important since previous studies [25, 173]
have shown that cliques in school-level teacher networks can lead to a better diffusion of information.
    To investigate the relationship between teacher attributes and diffusion measures, we perform
four regression analyses. In each analysis, teacher attributes are the independent variables used
to predict the corresponding diffusion measure, i.e., the dependent variable. Thus, we attempt to
determine how much information in each teacher’s attribute can explain a diffusion measure. Note
that we only investigate teachers who are the roots of the diffusion trees since we aim to determine
the attributes of the producers of pins, not others who further do the re-pinning. Given this, a teacher
can be associated with more than one diffusion tree as the root. Hence, to perform a teacher-level
analysis, we aggregated values of each diffusion measure for all diffusion trees associate with a
teacher. For the volume and virality, we computed the mean values. For the velocity measures, we
computed the median since it offers a better estimation than the mean, as discussed in Section 5.3.2.
Finally, we used the statsmodels package [174] in Python and fit ordinary least squares (OLS) for
each regression analysis.
    Tables 5.2, 5.3, 5.4, and 5.5 show the results of the regression analysis for the volume, the
virality, the average re-pin time, and the first re-pin time, respectively. Each table has four columns.
The first column is the coefficient of each teacher attribute in the regression analysis. The larger
                                                    118


        Table 5.2: Regression analysis results of predicting volume using teacher attributes.
                   Attribute             Coefficient Std error         t          P > |t|
                     #Pins                4.449e-07 2.44e-06 0.182                0.855
                   #Boards                 -0.0011       0.000     -3.049         0.002
                  #Followers               -0.0005       0.000     -3.590         0.000
                  #Followees                0.0003       0.000      2.539         0.011
                   #Friends                -0.0003     9.35e-05 -2.976            0.003
                  Reciprocity               0.2359       0.084      2.816         0.005
            Eigenvector Centrality         54.8268      12.913      4.246         0.000
            Betweenness Centrality         28.2263      75.308      0.375         0.708
             Closeness Centrality           7.4408       0.190     39.110         0.000
         Local Clustering Coefficient       1.7113       0.283      6.039         0.000
                                         Mean squared error: 2.19 Adj. R-squared: 0.539
      Table 5.3: Regression analysis results of predicting the virality using teacher attributes.
                   Attribute             Coefficient Std error          t         P > |t|
                     #Pins               -2.419e-06 2.29e-07 -10.574               0.000
                   #Boards                 -0.0001     3.31e-05 -3.861             0.000
                  #Followers             -1.318e-05 1.41e-05 -0.936                0.349
                  #Followees             -3.879e-05 9.63e-06 -4.029                0.000
                   #Friends              -5.197e-05 8.77e-06 -5.923                0.000
                  Reciprocity               0.0894       0.008      11.379         0.000
            Eigenvector Centrality          2.5898       1.211       2.138         0.033
            Betweenness Centrality         24.0530       7.065       3.405         0.001
             Closeness Centrality           3.5286       0.018     197.698         0.000
         Local Clustering Coefficient       0.6131       0.027      23.064         0.000
                                         Mean squared error: 0.04 Adj. R-squared: 0.965
the magnitude of the coefficient, the more impact the attribute has on the corresponding predicted
diffusion measure. The second column shows the standard error of the coefficient. The third
column is the t-value retrieved from a t-test. It essentially calculates the difference between the
predicted value of a diffusion measure using an independent variable and the actual value of the
measure. The last column is the p-value of the t-test, which is a measurement of how likely a
coefficient measured through a regression model is by chance. In the last row of each table, we
have included two major pieces of information. The first one is the mean squared error, which
measures the difference between the actual values of a diffusion measure and the predicted values.
The second one is the adjusted R-squared, which measures how much of the independent variables
                                                 119


Table 5.4: Regression analysis results of predicting the average re-pin time using teacher attributes.
                    Attribute             Coefficient Std error         t            P > |t|
                      #Pins                  -0.0015       0.000    -4.077           0.000
                    #Boards                  0.1079        0.061     1.781           0.075
                   #Followers                0.0096        0.022     0.440           0.660
                  #Followees                 0.0020        0.016     0.127           0.899
                    #Friends                 0.0116        0.014     0.822           0.411
                  Reciprocity              -143.8068      17.347    -8.290           0.000
            Eigenvector Centrality        -8090.1645 1884.586 -4.293                 0.000
            Betweenness Centrality         8936.4525 1.1e+04         0.811           0.417
              Closeness Centrality          607.2976      36.660    16.566           0.000
         Local Clustering Coefficient       410.1702      62.894     6.522           0.000
                                          Mean squared error: 85667.98 Adj. R-squared: 0.263
  Table 5.5: Regression analysis results of predicting the first re-pin time using teacher attributes.
                  Attribute             Coefficient Std error         t              P > |t|
                    #Pins                  0.0308        0.005    -6.450             0.000
                  #Boards                  1.3398        0.691     1.940             0.052
                 #Followers                0.3589        0.294     1.223             0.222
                #Followees                -0.3620        0.201    -1.804             0.071
                  #Friends                -0.0031        0.183    -0.017             0.986
                Reciprocity             -1005.0319 163.868 -6.133                    0.000
          Eigenvector Centrality        -8.951e+04 2.53e+04 -3.544                   0.000
          Betweenness Centrality        1.404e+05 1.47e+05 0.953                     0.340
           Closeness Centrality         85375.8527 372.131 14.446                    0.000
      Local Clustering Coefficient       4871.5493      554.231    8.790             0.000
                                        Mean squared error: 18323220.30 Adj. R-squared: 0.143
( i.e., teacher attributes) is explained by changes in a dependent variable (i.e., a diffusion measure).
We make the following observations based on the results presented in these four tables.
     • The adjusted R-squared is high for the volume and virality. Nevertheless, its value is very low
         for the velocity measures. This means teacher attributes can explain the volume and virality
         while failing to do so for the velocity measures. This can also be inferred by looking at the
         mean squared errors whose values are low for the volume and virality while they are high for
         the velocity measures.
     • As far as the number of pins is concerned, their coefficients are generally low for all measures.
                                                     120


  This indicates that the diffusion of pins is not related to the high activity rate of their producer.
  This seems logical since merely saving more pins and creating more boards on Pinterest does
  not guarantee that these resources will be diffused widely. The only exception is the number
  of boards for the first re-pin time, for which the coefficient is positive and relatively large.
  We think the reason is that Pinterest users can independently follow a board without even
  following its curator. Hence, the more boards a user has, the higher chance someone can
  quickly re-pin from any of these board. However, based on our results, such rapid adoption
  has not necessarily led to the pin’s popularity (high volume) and virality.
• The coefficients of the number of connections (i.e., number of followers, followees, and
  friends) are very low. While the coefficient of the number of followers is relatively high for
  the first re-pin time, it is not statistically significant– See the P>|t| column. We think the low
  value of coefficients of the number of connections can be explained by the fact that Pinterest
  is essentially a social curation website where users can re-pin resources from others without
  following them.
• Reciprocity has a low coefficient for the virality while it is relatively high for the volume.
  A teacher with high reciprocity has a strong relationship with their online friends. Conse-
  quently, their friends trustfully re-pin their resources. Nevertheless, virality is a complicated
  measure that cannot be explained adequately by reciprocity. Moreover, as far as the velocity
  measures are concerned, reciprocity coefficients have a large magnitude and negative sign.
  The explanation behind this requires further investigation.
• The most critical finding of this part is the relationship between the centrality metrics and the
  volume and virality. Except for betweenness centrality for the volume, the centrality metrics
  can perfectly explain the virality and volume. The reason is that centrality metrics take into
  account the network’s structure, which plays an essential role in how information is diffused.
  For instance, a high eigenvector centrality means a teacher is connected to other high central
  users. Therefore, after these central neighbors re-pin a resource, there is a bigger chance
                                                 121


      the resource will be widely diffused because of their own high structural influence. Also,
      closeness and betweenness centrality have to do with the shortest paths in the graph, which
      play an essential role in the diffusion of information [175].
    • As mentioned before, the local clustering coefficient is an important factor in the diffusion
      of information in school-level teacher networks [25, 173]. Moreover, our results in the part
      of the dissertation indicate that this attribute is also crucial in the diffusion of information
      across the network of teachers on Pinterest.
    Based on the above observations, we can conclude that teacher attributes significantly affect
the volume and virality of their curated resources. In particular, the network-level structural
characteristics of a teacher play an essential role in determining the volume and virality of their
resources. In contrast, how fast these resources are diffused cannot be adequately determined by
teacher attributes.
                                                  122


                                             CHAPTER 6
                          CONCLUSION AND FUTURE DIRECTIONS
In this chapter, we provide a summary of our research results and further present promising future
research directions.
6.1     Summary
    In this dissertation, we proposed novel research in the three primary directions of teachers in
social media: automatic teacher identification, teacher gender analysis, and diffusion of teacher-
curated resources. Next, we summarize our contributions in each direction one by one.
    To supplement their students’ educational needs and improve their teaching quality, many
teachers turn to online social media platforms where an enormous number of educational resources
have been curated. Such resources are precious materials for teachers and students, especially with
the COVID-19 pandemic affecting traditional education. Hence, for the past few years, teachers
in social media have been the subject of many educational studies. Despite the progress in this
line of research, one of the major obstacles is the limited number of teachers being investigated.
More specifically, the current studies usually suffice to at most a few hundred surveyed/interviewed
teachers. However, to offer a better picture of teachers in online social media and enable modern data
science approaches to find meaningful patterns in the teacher-related data, we need to identify more
teachers. Thus, in the third chapter of this dissertation, we proposed a framework to automatically
identify teachers on Pinterest-an image-based social media popular among teachers. For the first
time, our framework formulated the teacher identification as a positive unlabelled learning task
where positive samples are a small set of surveyed teachers, and unlabelled samples are their
friends on Pinterest. We performed extensive experiments on a real dataset of teachers on Pinterest
and showed that our framework outperforms strong baselines. Moreover, using our framework,
we reliably identified thousands of other teachers on Pinterest. Finally, we believe our proposed
framework can improve the quality of many research endeavors concerned with studying teachers
                                                   123


in social media.
    Some studies in the education literature have shown that the gender of teachers affects their
behavior [43, 176, 177, 178]. For instance, male and female teachers may differ in the way
they structure their classroom, selecting topics and examples in their pedagogical practices [43].
However, while such behavioral differences between males and females have been investigated
before, there is a lack of study on how male and female teachers behave on social media. Such
investigation is crucial since online social media is now an integral part of the teacher’s professional
career development. Perhaps, one reason for the lack of such study had been the unavailability
of a rich and large-scale dataset of teachers on social media. Nevertheless, we addressed this
issue in the third chapter and identified a large dataset of teachers on Pinterest. Hence, we used
this dataset in the fourth chapter and performed a thorough exploratory analysis of male and
female teachers on Pinterest. We performed our study in two main parts: online activity analysis
of male and female teachers and their social network analysis. In the first part, we discovered
that male and female teachers curate similar types of resources and mainly utilize Pinterest for
educational purposes. Moreover, we performed a thorough investigation on the topic and domain
of the resources curated by teachers using several novel measures: the topic/domain entropy and
the topic/domain oscillation. As a result, we found out that male teachers are more focused on their
resource curation process while females are more receptive to exploring non-educational content.
We also identified the unique patterns that male and female teachers use social media in terms of the
time of their access. In the social network analysis part, we found out that male teachers tend to have
more connections and more actively follow other users on social media. Moreover, male and female
teachers showed having very similar structural centrality scores. Eventually, we investigated gender
homophily and identified some homophilic behavior, primarily by male teachers. In conclusion,
as far as teachers’ professional activities are concerned, based on our findings, males and females
behave very similarly and see social media as a means to advance their careers.
    Previous studies have referred to the widespread and fast diffusion of online educational re-
sources on social media [4, 133, 45, 78]. Nevertheless, they have shown this using qualitative
                                                  124


analysis (interviews/surveys with teachers) or anecdotal reports. Therefore, we recognized a need
to investigate the diffusion of resources on social media through a data-driven investigation. Thanks
to our proposed method in Chapter 3, we had access to a large set of teachers and their diffused
resources on Pinterest. Given this, in the fifth chapter, we performed an analysis of the diffusion
of teacher-curated resources on Pinterest. First, we built a large set of diffusion trees of these
resources on Pinterest. Then, we defined three crucial measures to characterize the diffusion pro-
cess, namely, i.e., volume, virality, and velocity. Our analysis of the diffusion of teacher-curated
resources showed that educational materials are disseminated across the network widely and in
a rapid manner. Eventually, we performed several regression analyses to determine what teacher
attributes affect the diffusion process. Our study showed that the structural attributes significantly
impact the diffusion of teacher-curated resources on Pinterest. We believe our large-scale data-
driven study in this chapter of the dissertation has deepened our understanding of the diffusion of
teacher-curated resources materials on Pinterest and can foster further research.
6.2     Future Directions
    In this section, we present several possible future directions across the major areas of teachers
in social media and how data science can help these critical directions.
     • Closing the loop in the automatic teacher identification. In the first chapter, we proposed a
       method to identify teachers on Pinterest automatically. We showed that by using this method,
       we could answer interesting research questions. However, it would be valuable to conduct
       surveys/interviews with these newly identified teachers. The main reason to do this is that
       we can benefit from both worlds: direct data of teachers in online social media as well as
       obtaining detailed and controlled information via surveys/interviews. In fact, educational
       researchers have recommended using both types of data to reach better conclusions [34]. It
       is worth mentioning that conducting surveys/interviews with automatically identified teach-
       ers is related to snowball sampling, where researchers recruit their samples (teachers) via
       broadcasting an invitation on social media [72]. Snowball sampling, however, has a low
                                                  125


  response rate; while using our method, we can precisely control whom we should contact for
  the survey/interview.
• Teachers in multiple social media platforms. Most current social media studies, including
  this dissertation, have investigated teachers in a single platform. However, teachers, like other
  people, use different platforms and probably for different purposes. Therefore, it would be
  interesting to examine teachers on multiple social media platforms. For instance, it would
  be interesting how information diffuses in different social media platforms, e.g., Pinterest
  vs. Twitter. One of the challenges for this direction is identifying the same teachers on
  multiple social media platforms. A preliminary solution to this challenge is to use network
  alignment [138], where the same nodes across the two networks are mapped together.
• Quality assessment of online educational materials. As shown in this dissertation, on-
  line educational resources are widely diffused across social media and are adopted by many
  teachers and used in classroom activities. Given this, an important research direction is to
  characterize the content of these educational resources and assess their quality. Moreover,
  with the proliferation of online misinformation [179, 180], assessing the quality of online
  educational material is particularly important. Moreover, as we press onward to utilize ma-
  chine learning models for developing practical educational applications, e.g., recommending
  educational materials to teachers and students, quality assurance of the disseminated on-
  line materials is critical. In particular, machine learning algorithms have been shown to be
  vulnerable to biased and low-quality content [181, 182].
• Unifying theoretical frameworks. Teachers in social media have been studied from the
  perspective of several theories, e.g., the affinity space, the community of practice, the pro-
  fessional learning network [46]. Despite this, there is no conclusive framework that could
  delineate what social media is to teachers? We believe empirical evaluation using a large-
  scale data-driven analysis, similar to this dissertation, can be a great help to demystify the
  core notion of social media to teachers.
                                              126


APPENDIX
   127


                                      THE ANNOTATION PROCEDURE
As mentioned in Chapter 3, we annotated some users to teachers and non-teachers. In this appendix,
we describe the annotation procedure. Figure .1 illustrates the flowchart of this procedure. In the
following, we describe each component of the procedure.
                                                       Teacher-related
                                                         Keywords
                                                                        Coder
                                                        Specifying              Labeling
                                                         Potential              Teacher
                                                         Teachers               Accounts
                                          Specifying
        Start                              Existing
                Self-Descriptions                                                          End
                                         Occupations
                                                                        Coder
                                           List of      Specifying              Labeling
                                        Occupations      Potential             Non-teacher
                                                       Non-teachers             Accounts
                            Figure .1: The flowchart of the annotation procedure.
A.1      Specifying Existing Occupations
                               Website                 Self-description
        Figure .2: An example of self-description and website URL in a Pinterest’s account.
    On Pinterest, users can include a maximum of 160 characters about themselves. This textual
information, which we call self-description, is the basis of our annotation procedure. In addition,
users are allowed to include an URL to their website, which is visible in their profile. Figure .2
illustrates an example of self-description and website address in an account. We acquired these
two pieces of information through the provided Pinterest API (refer to Section 2.2). In our dataset,
                                                     128


19,588 users had included the self-description (around 21% of users). In the following, we explain
how we used self-descriptions in our annotation procedure.
    Sometimes users mention their occupation in their self-description, e.g., “I am an accountant
living in ...”. Self-declared occupations are trustworthy information that can help us determine
the person’s occupation effectively. To this end, first, we acquired a list of 965 common human
occupations from an online repository.1 This popular repository maintains lists of words related to
different entities, e.g., travel, sport, religion. Our investigation discovered that their maintained list
of occupations contains typical occupations in our modern society. Second, we performed keyword
matching and determined those occupations that appeared in the self-descriptions of users. We
found the match for 102 occupations, e.g., designer, singer, therapist.
A.2     Specifying Potential Teachers
    After determining the existing occupations in self-descriptions, we specified potential teach-
ers. The process is as follows. We marked a user as a potential teacher if two conditions were
met: 1) the self-description contained several teacher-related keywords, and 2) the self-description
did not contain any of the existing occupations except ‘teacher’ and ‘instructor’. The reason for
enforcing the first condition was to comprehensively consider those accounts that potentially be-
long to teachers. Moreover, the purpose of the second condition was to exclude those accounts
that mentioned other occupations, e.g., designer. For the first condition, we used the follow-
ing keywords: ‘teacher’, ‘teachers’, ‘teach’, ‘teaching’, ‘teaches’, ‘math’, ‘educator’, ‘instructor’,
‘kindergarten’, ‘grade’, ‘school’, ‘classroom’, and ‘teacherspayteachers’ and ‘tpt’. The selection
of the keywords is based on the common words that appeared in the self-descriptions of the
surveyed teachers. Notably, we included ‘teacherspayteachers’, and ‘tpt’ since active (Ameri-
can) teachers usually mention their teacherspayteachers.com account on Pinterest, e.g., “visit my
store at TeachersPayTeachers http://www.teacherspayteachers.com/Store/X” or “find me on TPT:
https://www.teacherspayteachers.com/Store/X”. Two occupations were exempt from the second
    1 https://github.com/dariusk/corpora/blob/master/data/humans/occupations.
json
                                                    129


condition, namely ‘teacher’ and ‘instructor’, since they are obviously related to teachers. After
applying these two conditions, we ended up with a list of 3,624 potential teachers. Two human
coders manually processed this list and determined the final labels, as will be explained later.
A.3       Specifying Potential Non-teachers
     To specify potential non-teacher accounts, we did the opposite of what we performed for
potential teachers. More specifically, two conditions had to be met to mark an account as a
potential non-teacher: 1) if one of the existing occupations appeared in the self-description except
for ‘teacher’ and ‘instructor’, and 2) none of the previously mentioned teacher-related keywords
(i.e., ‘teacher’, ‘teachers’, ‘teach’, etc.) were in the self-description. The primary purpose of these
two conditions was to enhance the confidence in marking accounts as potential non-teachers. As
a result, we marked a list of 2,503 users as potential non-teachers. Similar to potential teachers,
two human coders manually processed this list and determined the final labels, as will be explained
next.
A.4       Labeling Potential Teachers and Non-Teachers
     After preparing the lists of potential teachers and non-teachers, two coders manually labeled
the users. The first coder is the author of this dissertation and the second coder is a senior
computer science undergraduate student. The categories for labeling were teacher, non-teacher,
and uncertain. The first source of information for labeling was reading the self-description and
looking for definite cues about the person’s occupation. For instance, in self-descriptions, “I am a
middle school teacher, I welcome creative ideas and pins!” and “Wedding | Portrait | Real Estate
photographer in Lansing and serving all surrounding areas”, the occupations are teacher and
photographer, respectively. The second source of information was the included website of a user if
any, e.g., hamidkarimi.com in Figure .2. Therefore, the coders were allowed to refer to the user’s
website to infer their occupation, e.g., reading the ‘About’ page on the website. This was particularly
useful for potential teachers since teachers often mention their websites or blogs. The third source
                                                     130


of information was other social media accounts mentioned in the self-descriptions, if any, e.g.,
Instagram or Facebook accounts. Therefore, the coders were allowed to visit a user’s other social
media accounts to determine the label. The fourth source of information was resources curated by
a user (i.e., pins and boards). For instance, boards such as ‘back to school’, ‘for the classroom’,
‘second-grade math’ in a user’s account indicate that the person’s occupation is teacher/educator.
Eventually, the coders were allowed to use other external sources such as Google search to specify
the label of an account.
    The coders performed their labeling independently. Regarding labeling the list of potential
teachers, both coders agreed on 3,508 users to be teachers. For the list of potential non-teachers,
the agreement on the non-teachers was 2,079. Those users for which the coders’ labels did not
agree were added back to the unlabeled users.
                                                131


BIBLIOGRAPHY
     132


                                       BIBLIOGRAPHY
[1]  Statista. Number of social network users worldwide from 2017 to 2025 (in bil-
     lions).     https://www.statista.com/statistics/278414/number-of-worldwide-social-network-
     users/, 2020.
[2]  Barry Wellman. Computer networks as social networks. Science, 293(5537):2031–2034,
     2001.
[3]  Tiffany A Pempek, Yevdokiya A Yermolayeva, and Sandra L Calvert. College students’
     social networking experiences on facebook. Journal of applied developmental psychology,
     30(3):227–238, 2009.
[4]  Kaitlin Torphy and Corey Drake. Educators meet the fifth estate: The role of social media
     in teacher training. Teachers College Record, 121(14):1–26, 2019.
[5]  Kaitlin Torphy, Yuqing Liu, Sihua Hu, and Zixi Chen. Sources of professional support:
     Patterns of teachers’ curation of instructional resources in social media. American Journal
     of Education, 127(1):13–47, 2020.
[6]  Martin Rehm and Ad Notten. Twitter as an informal learning space for teachers!? the role
     of social capital in twitter conversations among teachers. Teaching and Teacher Education,
     60:215–223, 2016.
[7]  Fernando Rosell-Aguilar. Twitter: A professional development and community of practice
     tool for teachers. Journal of Interactive Media in Education, 1, 2018.
[8]  Hamid Karimi, Kaitlin T Torphy, Tyler Derr, Kenneth A Frank, and Jiliang Tang. Un-
     derstanding and promoting teacher connections in online social media: A case study on
     pinterest. In 2020 IEEE International Conference on Teaching, Assessment, and Learning
     for Engineering (TALE), pages 536–541. IEEE, 2020.
[9]  Hamid Karimi, Kaitlin T Torphy, Tyler Derr, Kenneth A Frank, and Jiliang Tang. Character-
     izing teacher connections in online social media: A case study on pinterest. In Proceedings
     of the Seventh ACM Conference on Learning@ Scale, pages 249–252, 2020.
[10] Hamid Karimi, Tyler Derr, Kaitlin Torphy, Kenneth Frank, and Jiliang Tang. A roadmap for
     incorporating online social media in educational research. Teachers College Record Year
     Book, (14), 2019.
[11] Kaitlin Torphy, Hamid Karimi, Sihua Hu, Frank Kenneth, and Jiliang Tang. Educational
     research in the 21st century: Leveraging big data to explore teachers’ professional behavior
     and educational resources accessed within pinterest. The Elementary School Journal, 2021.
[12] Lauren M Bagdy, Vanessa P Dennen, Stacey A Rutledge, Jerrica T Rowlett, and Shannon
     Burnick. Teens and social media: A case study of high school students’ informal learning
                                                133


     practices and trajectories. In Proceedings of the 9th International Conference on Social
     Media and Society, pages 241–245, 2018.
[13] Christine Greenhow, Beth Robelia, and Joan E Hughes. Learning, teaching, and scholar-
     ship in a digital age: Web 2.0 and classroom research: What path should we take now?
     Educational researcher, 38(4):246–259, 2009.
[14] Stacey Rutledge, Vanessa Dennen, and Lauren Bagdy. Exploring adolescent social media
     use in a high school: Tweeting teens in a bell schedule world. Teachers College Record, 121
     (14):1–30, 2019.
[15] Vimala Balakrishnan and Chin Lay Gan. Students’ learning styles and their effects on the
     use of social media technology for learning. Telematics and Informatics, 33(3):808–821,
     2016.
[16] R Arteaga Sánchez, Virginia Cortĳo, and Uzma Javed. Students’ perceptions of facebook
     for academic purposes. Computers & Education, 70:138–149, 2014.
[17] Hamid Karimi, Tyler Derr, Jiangtao Huang, and Jiliang Tang. Online academic course
     performance prediction using relational graph convolutional neural network. International
     Educational Data Mining Society, 2020.
[18] Alan Daly, Yi-Hwa Liou, Miguel Del Fresno, Martin Rehm, and Peter Bjorklund Jr. Edu-
     cational leadership in the twitterverse: Social media, social networks, and the new social
     continuum. Teachers College Record, 121(14):1–20, 2019.
[19] Maeve Duggan, Amanda Lenhart, Cliff Lampe, and Nicole B Ellison. Parents and social
     media. Pew Research Center, 16, 2015.
[20] Uğur Gündüz. The effect of social media on identity construction. Mediterranean Journal
     of Social Sciences, 8(5):85–85, 2017.
[21] Cam Escoffery, Melissa Kenzig, Christel Hyden, and Kristen Hernandez. Capitalizing on
     social media for career development. Health promotion practice, 19(1):11–15, 2018.
[22] Jeffrey Carpenter. Preservice teachers’ microblogging: Professional development via twitter.
     Contemporary Issues in Technology and Teacher Education, 15(2):209–234, 2015.
[23] Jeffrey P Carpenter and Daniel G Krutka. Engagement through microblogging: Educator
     professional development via twitter. Professional development in education, 41(4):707–
     728, 2015.
[24] Catharyn C Shelton and Leanna M Archambault. Who are online teacherpreneurs and what
     do they do? a survey of content creators on teacherspayteachers. com. Journal of Research
     on Technology in Education, 51(4):398–414, 2019.
[25] Kenneth Frank, Yun-jia Lo, Kaitlin Torphy, and Jihyun Kim. Social networks and educational
     opportunity. In Handbook of the Sociology of Education in the 21st Century, pages 297–316.
     Springer, 2018.
                                              134


[26] Sihua Hu, Kaitlin T Torphy, Kim Evert, and John L Lane. From cloud to classroom:
     Mathematics teachers’ planning and enactment of resources accessed within virtual spaces.
     Teachers College Record, 122(6):n6, 2020.
[27] John Lane, Brian Boggs, Zixi Chen, and Kaitlin Torphy. Conceptualizing virtual instructional
     resource enactment in an era of greater centralization, specification of quality instructional
     practices, and proliferation of instructional resources. Teachers College Record, 121(14):
     1–36, 2019.
[28] Kenneth Frank, Diana Brandon, Alan Daly, Christine Greenhow, Sihua Hu, Martin Rehm,
     and Kaitlin Torphy. Welcome to cloud2class: social media in education. Teachers College
     Record, 121(14):1–12, 2019.
[29] James Robson. Performance, structure and ideal identity: Reconceptualising teachers’
     engagement in online social spaces. British Journal of Educational Technology, 49(3):
     439–450, 2018. doi: https://doi.org/10.1111/bjet.12551.
[30] Madeline Will. Looking for more support, new teachers turn to online communities. Edu-
     cation Week, 2016.
[31] Ying Zhao, Yong Guo, Yu Xiao, Ranke Zhu, Wei Sun, Weiyong Huang, Deyi Liang, Liuying
     Tang, Fan Zhang, Dongsheng Zhu, et al. The effects of online homeschooling on children,
     parents, and teachers of grades 1–9 during the covid-19 pandemic. Medical Science Monitor:
     International Medical Journal of Experimental and Clinical Research, 26:e925591–1, 2020.
[32] Delroy L Paulhus. Measurement and control of response bias. 1991.
[33] Kamal Mahtani, Elizabeth A Spencer, Jon Brassey, and Carl Heneghan. Catalogue of bias:
     observer bias. BMJ evidence-based medicine, 23(1):23, 2018.
[34] Kenneth Frank and Kaitlin Torphy. Social media, who cares? a dialogue between a millennial
     and a curmudgeon. Teachers College Record, 121(14):1–24, 2019.
[35] Mark LaVenia.        The state of the instructional materials market: 2019 report.
     www.edreports.org/resources/article/2019-state-of-the-market-report, 2020.
[36] V. Darleen Opfer, Julia H. Kaufman, and Lindsey E. Thompson. Implementation of K-
     12 state standards for mathematics and English Language Arts and literacy: Findings
     from the American Teacher Panel. RAND Corporation, Santa Monica, CA, 2016. doi:
     10.7249/RR1529-1.
[37] Jessa Bekker and Jesse Davis. Learning from positive and unlabeled data: A survey. Machine
     Learning, 109(4):719–760, 2020.
[38] Daniel Voyer and Susan D Voyer. Gender differences in scholastic achievement: a meta-
     analysis. Psychological bulletin, 140(4):1174, 2014.
                                               135


[39] Marcus A. Winters, Robert C. Haight, Thomas T. Swaim, and Katarzyna A. Pickering. The
     effect of same-gender teacher assignment on student achievement in the elementary and
     secondary grades: Evidence from panel data. Economics of Education Review, 34:69–75,
     2013. ISSN 0272-7757. doi: https://doi.org/10.1016/j.econedurev.2013.01.007.
[40] Alison Kelly. Gender differences in teacher–pupil interactions: a meta-analytic review.
     Research in education, 39(1):1–23, 1988.
[41] Jere Brophy. Interactions of male and female students with male and female teachers. Gender
     influences in classroom interaction, pages 115–142, 1985.
[42] Thomas S Dee. Teachers and the gender gaps in student achievement. Journal of Human
     resources, 42(3):528–554, 2007.
[43] Dario Sansone. Why does teacher gender matter? Economics of Education Review, 61:
     9–18, 2017.
[44] Jim Duffy, Kelly Warren, and Margaret Walsh. Classroom interactions: Gender of teacher,
     gender of student, and classroom subject. Sex roles, 45(9):579–593, 2001.
[45] Yuqing Liu, Kaitlin T Torphy, Sihua Hu, Jiliang Tang, and Zixi Chen. Examining the virtual
     diffusion of educational resources across teachers’ social networks over time. Teachers
     College Record, 122(6):n6, 2020.
[46] Christine Greenhow, Sarah M Galvin, Diana L Brandon, and Emilia Askari. A decade of
     research on k-12 teaching and teacher learning with social media: Insights on the state of
     the field. Teachers College Record, 122(6):n6, 2020.
[47] Camille Rutherford. Facebook as a source of informal teacher professional development.
     2010.
[48] Emrah Cinkara and Fadime Yalçin Arslan. Content analysis of a facebook group as a form
     of mentoring for efl teachers. English Language Teaching, 10(3):40–53, 2017.
[49] Maria Ranieri, Stefania Manca, and Antonio Fini. Why (and how) do teachers engage in
     social networks? an exploratory study of professional use of f acebook and its implications
     for lifelong learning. British journal of educational technology, 43(5):754–769, 2012.
[50] Rene Dubos. Social capital: Theory and research. Routledge, 2017.
[51] Hüseyin Bicen and Hüseyin Uzunboylu. The use of social networking sites in education: A
     case study of facebook. J. UCS, 19(5):658–671, 2013.
[52] Andrea K Veira, Coreen J Leacock, and S Joel Warrican. Learning outside the walls of the
     classroom: Engaging the digital natives. Australasian Journal of Educational Technology,
     30(2), 2014.
[53] Evren Sumuer, Sezin Esfer, and Soner Yildirim. Teachers’ facebook use: their use habits,
     intensity, self-disclosure, privacy settings, and activities on facebook. Educational Studies,
     40(5):537–553, 2014.
                                                136


[54] Fu Wen Kuo, Wen Cheng, and Shu Ching Yang. A study of friending willingness on snss:
     Secondary school teachers’ perspectives. Computers & Education, 108:30–42, 2017.
[55] Alona Forkosh-Baruch, Arnon Hershkovitz, and Rebecca P Ang. Teacher-student relation-
     ship and sns-mediated communication: Perceptions of both role-players. Interdisciplinary
     Journal of e-Skills and Lifelong Learning, 11:273–289, 2015.
[56] Christa S.C. Asterhan and Hananel Rosenberg. The promise, reality and dilemmas of sec-
     ondary school teacher–student interactions in facebook: The teacher perspective. Computers
     & Education, 85:134–148, 2015. ISSN 0360-1315. doi: https://doi.org/10.1016/j.compedu.
     2015.02.003.
[57] Baruch Schwarz and Galit Caduri. Novelties in the use of social networks by leading
     teachers in their classes. Computers & Education, 102:35–51, 2016. ISSN 0360-1315. doi:
     https://doi.org/10.1016/j.compedu.2016.07.002.
[58] Shari Tishman, Eileen Jay, and David N Perkins. Teaching thinking dispositions: From
     transmission to enculturation. Theory into practice, 32(3):147–153, 1993.
[59] C Matt Seimears, Emily Graves, M Gail Schroyer, and John Staver. How constructivist-based
     teaching influences students learning science. In The Educational Forum, volume 76, pages
     265–271. Taylor & Francis, 2012.
[60] Ron Blonder and Shelley Rap. I like facebook: Exploring israeli high school chemistry
     teachers’ tpack and self-efficacy beliefs. Education and Information Technologies, 22(2):
     697–724, 2017.
[61] Radzuwan Ab Rashid. Dialogic reflection for professional development through conversa-
     tions on a social networking site. Reflective Practice, 19(1):105–117, 2018.
[62] Esteban Vázquez Cano. Mobile learning with twitter to improve linguistic competence at
     secondary schools. New Educational Review, 29(3):134–147, 2012.
[63] Carol Van Vooren and Corey Bess. Teacher tweets improve achievement for eighth grade
     science students. Journal of Education, Informatics & Cybernetics, 11(1), 2013.
[64] Anna Noble, Patrick McQuillan, and Josh Littenberg-Tobias. “a lifelong classroom”: Social
     studies educators’ engagement with professional learning networks on twitter. Journal of
     Technology and Teacher Education, 24(2):187–213, 2016.
[65] Pamela M Wesely. Investigating the community of practice of world language educators on
     twitter. Journal of teacher education, 64(4):305–318, 2013.
[66] Etienne Wenger. Communities of practice: Learning, meaning, and identity. Cambridge
     university press, 1999.
[67] Virginia G Britt and Trena Paulus. “beyond the four walls of my building”: A case study of#
     edchat as a community of practice. American Journal of Distance Education, 30(1):48–59,
     2016.
                                               137


[68] Kathryn Holmes, Greg Preston, Kylie Shaw, and Rachel Buchanan. " follow" me: Networked
     professional learning for teachers. Australian Journal of Teacher Education, 38(12):n12,
     2013.
[69] Jeffrey P Carpenter and Daniel G Krutka. How and why educators use twitter: A survey of
     the field. Journal of research on technology in education, 46(4):414–434, 2014.
[70] Jeffrey P Carpenter and Daniel G Krutka. Engagement through microblogging: Educator
     professional development via twitter. Professional development in education, 41(4):707–
     728, 2015.
[71] Ryan D Visser, Lea Calvert Evering, and David E Barrett. # twitterforteachers: The impli-
     cations of twitter as a self-directed professional development tool for k–12 teachers. Journal
     of Research on Technology in Education, 46(4):396–413, 2014.
[72] Torrey Trust, Daniel G Krutka, and Jeffrey Paul Carpenter. “together we are better”: Profes-
     sional learning networks for teachers. Computers & education, 102:15–34, 2016.
[73] Kerry Davis. Teachers’ perceptions of twitter for professional development. Disability and
     rehabilitation, 37(17):1551–1558, 2015.
[74] Carrie R Ross, Robert M Maninger, Kimberly N LaPrairie, and Sam Sullivan. The use
     of twitter in the creation of educational professional learning opportunities. Administrative
     Issues Journal, 5(1):6, 2015.
[75] Joshua M Rosenberg, Spencer P Greenhalgh, Matthew J Koehler, Erica R Hamilton, and Mete
     Akcaoglu. An investigation of state educational twitter hashtags (seths) as affinity spaces.
     E-Learning and Digital Media, 13(1-2):24–44, 2016. doi: 10.1177/2042753016672351.
[76] James Paul Gee. Situated language and learning: A critique of traditional schooling.
     routledge, 2012.
[77] Jeffrey P Carpenter and Daniel G Krutka. How and why educators use twitter: A survey of
     the field. Journal of research on technology in education, 46(4):414–434, 2014.
[78] Sihua Hu, Kaitlin T Torphy, Amanda Opperman, Kimberly Jansen, and Yun-Jia Lo. What
     do teachers share within socialized knowledge communities: A case of pinterest. Journal of
     Professional Capital and Community, 2018.
[79] Kaitlin Torphy, Sihua Hu, Yuqing Liu, and Zixi Chen. Teachers turning to teachers: teacher-
     preneurial behaviors in social media. American Journal of Education, 127(1):49–76, 2020.
[80] Jeffrey Carpenter, Amanda Cassaday, and Stefania Monti. Exploring how and why educators
     use pinterest. In Society for Information Technology & Teacher Education International
     Conference, pages 2222–2229. Association for the Advancement of Computing in Education
     (AACE), 2018.
[81] John S Kendall. Understanding common core state standards. ASCD, 2011.
                                                138


[82] V. Darleen Opfer, Julia H. Kaufman, and Lindsey E. Thompson. Implementation of K-
     12 State Standards for Mathematics and English Language Arts and Literacy: Findings
     from the American Teacher Panel. RAND Corporation, Santa Monica, CA, 2016. doi:
     10.7249/RR1529-1.
[83] Jinyoung Han, Daejin Choi, A-Young Choi, Jiwon Choi, Taejoong Chung, Ted Taekyoung
     Kwon, Jong-Youn Rha, and Chen-Nee Chuah. Sharing topics in pinterest: understanding
     content creation and diffusion behaviors. In Proceedings of the 2015 ACM on Conference
     on Online Social Networks, pages 245–255, 2015.
[84] Eric Gilbert, Saeideh Bakhshi, Shuo Chang, and Loren Terveen. " i need to try this"? a
     statistical overview of pinterest. In Proceedings of the SIGCHI conference on human factors
     in computing systems, pages 2427–2436, 2013.
[85] Jinyoung Han, Daejin Choi, Byung-Gon Chun, Ted Kwon, Hyun-chul Kim, and Yanghee
     Choi. Collecting, organizing, and sharing pins in pinterest: interest-driven or social-driven?
     ACM SIGMETRICS Performance Evaluation Review, 42(1):15–27, 2014.
[86] Shuo Chang, Vikas Kumar, Eric Gilbert, and Loren G Terveen. Specialization, homophily,
     and gender in a social curation site: Findings from pinterest. In Proceedings of the 17th ACM
     conference on Computer supported cooperative work & social computing, pages 674–686,
     2014.
[87] Daehoon Kim, Jae-Gil Lee, and Byung Suk Lee. Topical influence modeling via topic-level
     interests and interactions on social curation services. In 2016 IEEE 32nd International
     Conference on Data Engineering (ICDE), pages 13–24. IEEE, 2016.
[88] Regina Kasakowskĳ, Thomas Kasakowskĳ, and Kaja Fietkiewicz. Pinterest: A unicorn
     among social media? an investigation of the platform’s quality and specifications. In ECSM
     2020 8th European Conference on Social Media, page 399. Academic Conferences and
     publishing limited, 2020.
[89] Stephanie Schroeder, Rachelle Curcio, and Lisa Lundgren. Expanding the learning network:
     How teachers use pinterest. Journal of Research on Technology in Education, 51(2):166–186,
     2019. doi: 10.1080/15391523.2019.1573354.
[90] Amanda Sawyer, Lara Dick, Emily Shapiro, and Tabitha Wismer. The top 500 mathematics
     pins: An analysis of elementary mathematics activities on pinterest. Journal of Technology
     and Teacher Education, 27(2):235–263, 2019.
[91] Kaitlin T Torphy, Diana L Brandon, Alan J Daly, Kenneth A Frank, Christine Greenhow,
     S Hua, and Martin Rehm. Social media, education, and digital democratization. Teachers
     College Record, 122(6):1–7, 2020.
[92] Christine Greenhow, Sarah Galvin, Emilia Askari, and Diana Brandon. # cloud2class: The
     disruption and reorganization of educational resources with social media. American Journal
     of Education, 127(1):1–11, 2020.
                                                139


[93] Duncan J Watts and Steven H Strogatz. Collective dynamics of ‘small-world’networks.
      nature, 393(6684):440–442, 1998.
[94] Lev Muchnik, Sen Pei, Lucas C Parra, Saulo DS Reis, José S Andrade Jr, Shlomo Havlin,
      and Hernán A Makse. Origins of power-law degree distribution in the heterogeneity of
      human activity in social networks. Scientific reports, 3(1):1–8, 2013.
[95] Fantine Mordelet and J-P Vert. A bagging svm to learn from positive and unlabeled examples.
      Pattern Recognition Letters, 37:201–209, 2014.
[96] Jinfeng Yi, Cho-Jui Hsieh, Kush Varshney, Lĳun Zhang, and Yao Li. Scalable demand-aware
      recommendation. arXiv preprint arXiv:1702.06347, 2017.
[97] Charles Elkan and Keith Noto. Learning classifiers from only positive and unlabeled data.
      In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery
      and data mining, pages 213–220, 2008.
[98] Fengxiang He, Tongliang Liu, Geoffrey I Webb, and Dacheng Tao. Instance-dependent pu
      learning by bayesian optimal relabeling. arXiv preprint arXiv:1808.02180, 2018.
[99] Hwanjo Yu, Jiawei Han, and Kevin Chen-Chuan Chang. Pebl: positive example based
      learning for web page classification using svm. In Proceedings of the eighth ACM SIGKDD
      international conference on Knowledge discovery and data mining, pages 239–248, 2002.
[100] Dino Ienco and Ruggero G Pensa. Positive and unlabeled learning in categorical data.
      Neurocomputing, 196:113–124, 2016.
[101] Michael Tschannen, Olivier Bachem, and Mario Lucic. Recent advances in autoencoder-
      based representation learning. arXiv preprint arXiv:1812.05069, 2018.
[102] Courtland VanDam, Pang-Ning Tan, Jiliang Tang, and Hamid Karimi. Cadet: A multi-view
      learning framework for compromised account detection on twitter. In 2018 IEEE/ACM
      International Conference on Advances in Social Networks Analysis and Mining (ASONAM),
      pages 471–478. IEEE, 2018.
[103] Tomas Mikolov, Edouard Grave, Piotr Bojanowski, Christian Puhrsch, and Armand Joulin.
      Advances in pre-training distributed word representations. In Proceedings of the Interna-
      tional Conference on Language Resources and Evaluation (LREC 2018), 2018.
[104] Steven Bird, Ewan Klein, and Edward Loper. Natural language processing with Python:
      analyzing text with the natural language toolkit. " O’Reilly Media, Inc.", 2009.
[105] Laurens Van der Maaten and Geoffrey Hinton. Visualizing data using t-sne. Journal of
      machine learning research, 9(11), 2008.
[106] Marthinus Du Plessis, Gang Niu, and Masashi Sugiyama. Convex formulation for learning
      from positive and unlabeled data. In International conference on machine learning, pages
      1386–1394. PMLR, 2015.
                                               140


[107] Ryuichi Kiryo, Gang Niu, Marthinus C du Plessis, and Masashi Sugiyama. Positive-
      unlabeled learning with non-negative risk estimator. arXiv preprint arXiv:1703.00593,
      2017.
[108] Liwei Jiang, Dan Li, Qisheng Wang, Shuai Wang, and Songtao Wang. Improving posi-
      tive unlabeled learning: Practical aul estimation and new training method for extremely
      imbalanced data sets. arXiv preprint arXiv:2004.09820, 2020.
[109] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan,
      Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. Pytorch: An imperative
      style, high-performance deep learning library. Advances in neural information processing
      systems, 32:8026–8037, 2019.
[110] Antonia Creswell, Tom White, Vincent Dumoulin, Kai Arulkumaran, Biswa Sengupta, and
      Anil A Bharath. Generative adversarial networks: An overview. IEEE Signal Processing
      Magazine, 35(1):53–65, 2018.
[111] Hamid Karimi, Tyler Derr, and Jiliang Tang. Characterizing the decision boundary of deep
      neural networks. arXiv preprint arXiv:1912.11460, 2019.
[112] Kathleen Elizabeth Kaump Truitt. The relationship between elementary school administra-
      tors’ and teachers’ perceptions of the influence of male teachers and schools’ male student
      achievement growth in english language arts. 2019.
[113] Yena Kim and Allyson J Weseley. The effect of teacher gender and gendered traits on
      perceptions of elementary school teachers. Journal of Research in Education, 27(1):114–
      133, 2017.
[114] Margaret H Cooney and Mark T Bittner. Men in early childhood education: Their emergent
      issues. Early Childhood Education Journal, 29(2):77–82, 2001.
[115] Shaaista Moosa and Deevia Bhana. Men teaching young children:“you can never be too sure
      what their intentions might be”. Oxford Review of Education, 46(2):169–184, 2020.
[116] Bryan G Nelson. The importance of men teachers: And reasons why there are so few. a
      survey of members of naeyc. 2002.
[117] National Center for Education Statistics.             Teacher characteristics and trends.
      https://nces.ed.gov/fastfacts/display.asp?id=28, 2020. Accessed: 2021-03-20.
[118] Organisation for Economic Co-operation and Development. Distribution of teachers by age
      and gender. https://stats.oecd.org/Index.aspx?DataSetCode=EAG_PERS_SHARE_AGE,
      2018. Accessed: 2021-02-20.
[119] Simon Brownhill. ‘build me a male role model!’a critical exploration of the perceived
      qualities/characteristics of men in the early years (0–8) in england. Gender and Education,
      26(3):246–261, 2014.
[120] Kevin McGrath and Mark Sinclair. More male primary-school teachers? social benefits for
      boys and girls. Gender and Education, 25(5):531–547, 2013.
                                                141


[121] Hootsuite DataReportal and We Are Social. Distribution of pinterest users worldwide as of
      january 2021, by gender. 2021. Accessed: April 08, 2021.
[122] Emily S Johnson. Feminism, Self-presentation, and Pinterest: The Labor of Wedding
      Planning. Lexington Books, 2020.
[123] Elana Levine. Cupcakes, pinterest, and ladyporn: Feminized popular culture in the early
      twenty-first century. University of Illinois Press, 2015.
[124] Amanda Friz and Robert W Gehl. Pinning the feminine user: gender scripts in pinterest’s
      sign-up interface. Media, Culture & Society, 38(5):686–703, 2016.
[125] Raphael Ottoni, Joao Paulo Pesce, Diego Las Casas, Geraldo Franciscani Jr, Wagner Meira Jr,
      Ponnurangam Kumaraguru, and Virgilio Almeida. Ladies first: Analyzing gender roles and
      behaviors in pinterest. In Proceedings of the International AAAI Conference on Web and
      Social Media, volume 7, 2013.
[126] Spencer P Greenhalgh and Matthew J Koehler. 28 days later: Twitter hashtags as “just in
      time” teacher professional development. TechTrends, 61(3):273–281, 2017.
[127] Paul T Costa Jr, Antonio Terracciano, and Robert R McCrae. Gender differences in person-
      ality traits across cultures: robust and surprising findings. Journal of personality and social
      psychology, 81(2):322, 2001.
[128] Madhura Ingalhalikar, Alex Smith, Drew Parker, Theodore D Satterthwaite, Mark A Elliott,
      Kosha Ruparel, Hakon Hakonarson, Raquel E Gur, Ruben C Gur, and Ragini Verma. Sex
      differences in the structural connectome of the human brain. Proceedings of the National
      Academy of Sciences, 111(2):823–828, 2014.
[129] Sudip Mittal, Neha Gupta, Prateek Dewan, and Ponnurangam Kumaraguru. The pin-bang
      theory: Discovering the pinterest world. arXiv preprint arXiv:1307.4952, 2013.
[130] William Webber, Alistair Moffat, and Justin Zobel. A similarity measure for indefinite
      rankings. ACM Transactions on Information Systems (TOIS), 28(4):1–38, 2010.
[131] J Nguyen. A teacher’s new best friend: Amazon inspire. Edudemic: Connecting, 2016.
[132] Hailley Griffis. Marketing: Common words, popular times, plus 4 experiments to try. 2021.
      Accessed: May 12, 2021.
[133] K Torphy and S Hu. Social media in education: Curation within and outside the schoolhouse.
      Handbook on Social Media Analytics: Advances and Applications.
[134] William R Penuel, Margaret Riel, Ann Krause, and Kenneth A Frank. Analyzing teachers’
      professional interactions in a school as social capital: A social network approach. Teachers
      college record, 111(1):124–163, 2009.
[135] Hamid Karimi, Courtland VanDam, Liyang Ye, and Jiliang Tang. End-to-end compromised
      account detection. In 2018 IEEE/ACM International Conference on Advances in Social
      Networks Analysis and Mining (ASONAM), pages 314–321. IEEE, 2018.
                                                 142


[136] Aaron Brookhouse, Tyler Derr, Hamid Karimi, H Russell Bernard, and Jiliang Tang. Road
      to the white house: Analyzing the relations between mainstream and social media during
      the us presidential primaries. arXiv preprint arXiv:2009.09307, 2020.
[137] Hamid Karimi, Tyler Derr, Aaron Brookhouse, and Jiliang Tang. Multi-factor congres-
      sional vote prediction. In Proceedings of the 2019 IEEE/ACM International Conference on
      Advances in Social Networks Analysis and Mining, pages 266–273, 2019.
[138] Tyler Derr, Hamid Karimi, Xiaorui Liu, Jiejun Xu, and Jiliang Tang. Deep adversarial
      network alignment. arXiv preprint arXiv:1902.10307, 2019.
[139] Elizabeth Homan. The shifting spaces of teacher relationships: Complementary methods in
      examinations of teachers’ digital practices. Journal of Technology and Teacher Education,
      22(3):311–331, 2014.
[140] Fei Gao and Lan Li. Examining a one-hour synchronous chat in a microblogging-based
      professional development community. British Journal of Educational Technology, 48(2):
      332–347, 2017.
[141] Stephen J Aguilar, Joshua Rosenberg, Spencer Greenhalgh, Tim Fütterer, Alex Lishinski,
      and Christian Fischer. A different experience in a different moment? teachers’ social media
      use before and during the covid-19 pandemic. 2021.
[142] Hamid Karimi, Tyler Derr, Kaitlin T Torphy, Kenneth A Frank, and Jiliang Tang. Towards
      improving sample representativeness of teachers on online social media: A case study on
      pinterest. In International Conference on Artificial Intelligence in Education, pages 130–134.
      Springer, 2020.
[143] Martĳn P van den Heuvel and Olaf Sporns. Network hubs in the human brain. Trends in
      cognitive sciences, 17(12):683–696, 2013.
[144] Tracy C Russo and Joy Koesten. Prestige, centrality, and learning: A social network analysis
      of an online class. Communication Education, 54(3):254–261, 2005.
[145] Stephen P Borgatti. Centrality and network flow. Social networks, 27(1):55–71, 2005.
[146] Miller McPherson, Lynn Smith-Lovin, and James M Cook. Birds of a feather: Homophily
      in social networks. Annual review of sociology, 27(1):415–444, 2001.
[147] Paul F Lazarsfeld, Robert K Merton, et al. Friendship as a social process: A substantive and
      methodological analysis. Freedom and control in modern society, 18(1):18–66, 1954.
[148] Luca Maria Aiello, Alain Barrat, Rossano Schifanella, Ciro Cattuto, Benjamin Markines, and
      Filippo Menczer. Friendship prediction and homophily in social media. ACM Transactions
      on the Web (TWEB), 6(2):1–33, 2012.
[149] Halil Bisgin, Nitin Agarwal, and Xiaowei Xu. A study of homophily on social media. World
      Wide Web, 15(2):213–232, 2012.
[150] Luis Abreu and Doh-Shin Jeon. Homophily in social media and news polarization. 2020.
                                               143


[151] Gabriele Chierchia and Giorgio Coricelli. The impact of perceived similarity on tacit
      coordination: propensity for matching and aversion to decoupling choices. Frontiers in
      behavioral neuroscience, 9:202, 2015.
[152] Noah P Mark. Culture and competition: Homophily and distancing explanations for cultural
      niches. American sociological review, pages 319–345, 2003.
[153] Damon Centola, Robb Willer, and Michael Macy. The emperor’s dilemma: A computational
      model of self-enforcing norms. American Journal of Sociology, 110(4):1009–1040, 2005.
[154] Munmun De Choudhury, Hari Sundaram, Ajita John, Doree Duncan Seligmann, and Aisling
      Kelliher. " birds of a feather": Does user homophily impact information diffusion in social
      media? arXiv preprint arXiv:1006.1702, 2010.
[155] Mustafa Yavaş and Gönenç Yücel. Impact of homophily on diffusion dynamics over social
      networks. Social Science Computer Review, 32(3):354–372, 2014.
[156] Fariba Karimi, Mathieu Génois, Claudia Wagner, Philipp Singer, and Markus Strohmaier.
      Homophily influences ranking of minorities in social networks. Scientific reports, 8(1):1–12,
      2018.
[157] Damon Centola, Juan Carlos Gonzalez-Avella, Victor M Eguiluz, and Maxi San Miguel.
      Homophily, cultural drift, and the co-evolution of cultural groups. Journal of Conflict
      Resolution, 51(6):905–929, 2007.
[158] David Laniado, Yana Volkovich, Karolin Kappler, and Andreas Kaltenbrunner. Gender
      homophily in online dyadic and triadic relationships. EPJ Data Science, 5:1–23, 2016.
[159] Thomas Bedorf. Dimensionen des Dritten: sozialphilosophische Modelle zwischen Ethis-
      chem und Politischem. Wilhelm Fink, 2003.
[160] George Herbert Mead. Mind, self and society, volume 111. Chicago University of Chicago
      Press., 1934.
[161] Jinyoung Han, Daejin Choi, Jungseock Joo, and Chen-Nee Chuah. Predicting popular and
      viral image cascades in pinterest. In Proceedings of the International AAAI Conference on
      Web and Social Media, volume 11, 2017.
[162] Thi Bich Ngoc Hoang and Josiane Mothe. Predicting information diffusion on twitter–
      analysis of predictive features. Journal of computational science, 28:257–264, 2018.
[163] Bo Wu and Haiying Shen. Analyzing and predicting news popularity on twitter. International
      Journal of Information Management, 35(6):702–711, 2015.
[164] Andrey A Dobrynin, Roger Entringer, and Ivan Gutman. Wiener index of trees: theory and
      applications. Acta Applicandae Mathematica, 66(3):211–249, 2001.
[165] Harry Wiener. Structural determination of paraffin boiling points. Journal of the American
      chemical society, 69(1):17–20, 1947.
                                                144


[166] Christine Greenhow, Sarah M Galvin, and K Bret Staudt Willet. What should be the role of
      social media in education? Policy Insights from the Behavioral and Brain Sciences, 6(2):
      178–185, 2019.
[167] Lilian Weng, Filippo Menczer, and Yong-Yeol Ahn. Virality prediction and community
      structure in social networks. Scientific reports, 3(1):1–6, 2013.
[168] Piia Varis and Jan Blommaert. Conviviality and collectives on social media: Virality,
      memes, and new social structures. Multilingual Margins: A journal of multilingualism from
      the periphery, 2(1):31–31, 2015.
[169] Adrien Guille, Hakim Hacid, Cecile Favre, and Djamel A Zighed. Information diffusion in
      online social networks: A survey. ACM Sigmod Record, 42(2):17–28, 2013.
[170] Qian Li, Tao Zhou, Linyuan Lü, and Duanbing Chen. Identifying influential spreaders by
      weighted leaderrank. Physica A: Statistical Mechanics and its Applications, 404:47–55,
      2014.
[171] Ling-ling Ma, Chuang Ma, Hai-Feng Zhang, and Bing-Hong Wang. Identifying influential
      spreaders in complex networks based on gravity formula. Physica A: Statistical Mechanics
      and its Applications, 451:205–212, 2016.
[172] Frank Bauer and Joseph T Lizier. Identifying influential spreaders and efficiently estimating
      infection numbers in epidemic models: A walk counting approach. EPL (Europhysics
      Letters), 99(6):68007, 2012.
[173] William Penuel, Kenneth Frank, Min Sun, Chong Kim, and Corinne Singleton. The organi-
      zation as a filter of institutional diffusion. Teachers college record, 115(1):1–33, 2013.
[174] Skipper Seabold and Josef Perktold. statsmodels: Econometric and statistical modeling with
      python. In 9th Python in Science Conference, 2010.
[175] Anastasia Mochalova and Alexandros Nanopoulos. On the role of centrality in information
      diffusion in social networks. 2013.
[176] Mary E Yepez. An observation of gender-specific teacher behavior in the esl classroom. Sex
      Roles, 30(1):121–133, 1994.
[177] Kelly Jones, Cay Evans, Ronald Byrd, and Kathleen Campbell. Gender equity training and
      teacher behavior. Journal of Instructional Psychology, 27(3):173–173, 2000.
[178] Heather Antecol, Ozkan Eren, and Serkan Ozbeklik. The effect of teacher gender on student
      achievement in primary school: Evidence from a randomized experiment. 2012.
[179] Hamid Karimi and Jiliang Tang. Learning hierarchical discourse-level structure for fake
      news detection. arXiv preprint arXiv:1903.07389, 2019.
[180] Hamid Karimi, Proteek Roy, Sari Saba-Sadiya, and Jiliang Tang. Multi-source multi-class
      fake news detection. In Proceedings of the 27th international conference on computational
      linguistics, pages 1546–1557, 2018.
                                                   145


[181] Haochen Liu, Wei Jin, Hamid Karimi, Zitao Liu, and Jiliang Tang. The authors mat-
      ter: Understanding and mitigating implicit bias in deep text classification. arXiv preprint
      arXiv:2105.02778, 2021.
[182] Haochen Liu, Yiqi Wang, Wenqi Fan, Xiaorui Liu, Yaxin Li, Shaili Jain, Anil K Jain, and Jil-
      iang Tang. Trustworthy ai: A computational perspective. arXiv preprint arXiv:2107.06641,
      2021.
                                            146