THE EFFECTS OF PARTICIPATION AND FEEDBACK RECEIVED ON THE LENGTH OF
TIME MEMBERS IN ONLINE COMMUNITIES REMAIN ACTIVE
By
Chandan Sarkar

A DISSERTATION
Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of
Media and Information Studies—Doctor of Philosophy

2013

ABSTRACT
THE EFFECTS OF PARTICIPATION AND FEEDBACK RECEIVED ON THE LENGTH
OF TIME MEMBERS IN ONLINE COMMUNITIES REMAIN ACTIVE
By
Chandan Sarkar
Online communities support extensive interactions among their members. Membership in
most of these communities is voluntary, content supplied by other members is typically a
primary attractant to new members, and barriers to admission and exit are minimal (Lampe,
2009; Lampe, 2010). For a community to thrive, it is necessary that members remain active in
the community and continue to interact with others. Given that sustaining a solid base of active
long-term members is critical to the sustainability of an online community, it is important that
factors that contribute to the length of active membership are identified. Addressing certain
limitations of prior studies, this dissertation examines key factors such as rate of participation,
rate of feedback received, early participation and early feedback received that may influence the
length of time members stay active in a community.
A mixed method approach that included server log analyses for two online communities,
Everything2 and Sploder, and qualitative interviews with members of Everything2, was used to
study how these factors are related to how long members remain active in a community. A Cox
proportional hazard rate model and a Granger causality test were employed to analyze the server
log data.
The results suggest that certain types of early participation (first post submitted in
Sploder and first post and first message submitted in Everything2) and certain types of early
feedback received (deletion of post in Sploder and first positive and negative vote and deletion of
first post in Everything2) are significant predictors of how long a member remains active in
Sploder and Everything2. A member’s average rate of participation (writeups, votes given, and
messages sent) in Everything2 is positively correlated with length of active membership, but not
in Sploder. The rate of feedback received is not significantly correlated with length of active
membership in either community.
It is well-known that correlational evidence is not dispositive proof of a causal link.
Therefore, the relationships between the dependent variable and the independent variables
identified by the Cox Proportional Hazard Rate model are further examined using a Granger
causality test, with which time series data can be employed for a more rigorous test of causality.
The results showed no causality between rate of participation and the length of time a member
remains active in a community.
Findings from the quantitative studies are expanded on, based on interviews with long-term
members in the community. These results show that the factors contributing to length of
active membership may vary among online communities. While some results may generalize to
other communities if the communities are similar enough, not all results do generalize. The
findings also suggest that early negative feedback has a strong negative impact on how long a
member will remain active in an online community, as deletion of a member’s first post was
significantly negatively correlated with length of active membership in both Everything2 and
Sploder. The implications of these results for the design of online communities are discussed.

For my wife, Jessica Donald Sarkar, and my parents, Sankar & Ajita Sarkar

ACKNOWLEDGEMENTS

This dissertation is the culmination of four and a half years of work, which would not
have been possible without the help and support of my family, advisors, committee members,
colleagues, friends, and classmates.
I am indebted to my advisor, Dr. Steve Wildman. Without his brilliance, experience,
support and guidance I would not have been able to reach this final destination. Thank you once again
for all the support, guidance, the time you invested in me and your willingness to always guide
me.
I would like to also thank my committee members, Dr. Mark Levy, Dr. Steve Lacy and
Dr. Susan Wyche, for their support, guidance and encouragement in this process.
I would also like to thank Dr. Cliff Lampe from the University of Michigan and Dr. Kurt
DeMaagd for advising me during my PhD life. Their expressions helped me to think through
this. I would like to thank Dr. Rick Wash and my colleague, Yvette Wohn, for their help in
developing this idea. I would like to thank Everything2 and Sploder administrators and users for
allowing me to use their data for this dissertation. Finally, I would like to thank the College of
Communication Arts and Sciences of Michigan State University for providing resources and
support for this work.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
CHAPTER 1
An Introduction to Online Communities
1.1 Research Goals
1.2 Everything2 and Sploder as Sites for this Study
1.2.1 Everything2
1.2.2 Sploder
1.3 Why Did I Study Two Communities Instead of One?
1.4 Approaches
1.5 Dissertation Outline
1.6 Contributions
1.7 Chapter Summary
CHAPTER 2
Literature Review on Factors that may Affect Active Membership
2.1 Membership in Online Communities
2.2 Participation in Online Communities
2.3 Early Participation in Online Communities
2.4 Feedback in Online Communities
2.5 Early Feedback in Online Communities
2.6 Use of Social Science Theory in this Dissertation
2.7 Chapter Summary
CHAPTER 3
Examining Length of Active Membership for Everything2: A Quantitative Study
3.1 Overview of Everything2
3.2 Data and Operationalization of Variables in Everything2
3.3 The Everything2 Data-Set
3.4 Data and Measures
3.5 Examining Length of Active Membership Using a Hazard Rate Model
3.5.1 How is Length of Active Membership Related to the Rate of Participation? (RQ1)
3.5.2 How is Length of Active Membership Related to Early Participation? (RQ2)
3.5.3 How is Length of Active Membership Related to Rate of Feedback Received? (RQ3)
3.5.4 How is Length of Active Membership Related to Early Feedback Received? (RQ4)
3.6 Examining Causal Links for the Everything2 Community (Granger Causality Tests)
3.7 How is Length of Active Membership Affected by the Rate of Participation? (RQ1)
3.7.1 Members’ Participation Affecting Length of Membership
3.7.2 Concluding Remarks
3.8 How is Length of Active Membership Affected by Rate of Feedback? (RQ3)
3.8.1 Feedback Received from Others Affecting Length of Membership
3.8.2 Concluding Remarks
3.9 Granger Causality Tests in Regards to Early Participation and Early Feedback Received from Others on the Content
3.10 Chapter Summary
CHAPTER 4
Examining Length of Active Membership for Sploder: A Quantitative Study
4.1 Overview of Sploder
4.2 Data and Operationalization of Variables in Sploder
4.3 The Sploder Data-Set
4.4 Data and Measures
4.5 Examining Length of Active Membership Using a Hazard Rate Model
4.5.1 How is Length of Active Membership Related to the Rate of Participation? (RQ1)
4.5.2 How is Length of Active Membership Related to Early Participation? (RQ2)
4.5.3 How is Length of Active Membership Related to Rate of Feedback Received? (RQ3)
4.5.4 How is Length of Active Membership Related to Early Feedback Received? (RQ4)
4.6 Examining Causal Links in the Sploder Community (Granger Causality Tests)
4.7 How is Length of Active Membership Affected by the Rate of Participation? (RQ1)
4.7.1 Members' Participation Affecting Length of Membership
4.7.2 Concluding Remarks
4.8 How is Length of Active Membership Affected by Rate of Feedback Received? (RQ3)
4.8.1 Feedback Members Received from Others Affecting Length of Membership
4.8.2 Concluding Remarks
4.9 Granger Causality Tests in Regards to Early Participation and Early Feedback Received from Others on the Content
4.10 Chapter Summary
CHAPTER 5
A Qualitative Study of Everything2
5.1 Recruitment of Everything2 Participants
5.2 Interview Protocols
5.3 Coding
5.4 Members’ Participation
5.5 Factors that Reduced Members’ Participation
1. Deletion of Writeups
2. Downvotes
3. The Evolution of the “Wiki-era”
4. Life-Changing Events
5.6 Leaving the Community
5.7 Chapter Summary
5.8 Concluding Remarks
CHAPTER 6
Discussion of Findings
6.1 Overview of Results from All Three Studies
6.2 Implications for Practice
6.3 Limitations
6.4 Future Research
6.5 Conclusions
APPENDICES
Appendix A: Results from a Cox Proportional Hazard Rate Model with a Cutoff Period of Two Months (Sixty Days) for Everything2
Appendix B: Variance Inflation Factor Analysis for Everything2
Appendix C: Survival Function at Mean of Covariates in Everything2
Appendix D: Model Fit In Terms of a Chi-square Difference for the Hazard Rate in Everything2
Appendix E: Statistical Tests to Determine a Lag Order
Appendix F: Results from a Lag Order Test for Granger Causality in Everything2 Pertaining to Length of Active Membership and Participation
Appendix G: Results from a Lag Order Test for Granger Causality in Everything2 Pertaining to Length of Active Membership and Feedback Received
Appendix H: Results from a Cox Proportional Hazard Rate Model with a Cutoff Period of Two Months (Sixty Days) for Sploder
Appendix I: Variance Inflation Factor Analysis for Sploder
Appendix J: Survival Function at Mean of Covariates
Appendix K: Model Fit In terms of a Chi-square Difference for the Hazard Rate in Sploder
Appendix L: Results from a Lag Order Test for Granger Causality in Sploder Pertaining to Length of Active Membership and Participation
Appendix M: Results from a Lag Order Test for Granger Causality in Sploder Pertaining to Length of Active Membership and Feedback Received
Appendix N: Interview Questions
REFERENCES

LIST OF TABLES

Table 1-1: Similarities and Differences Between Everything2 and Sploder at a Glance
Table 3-1: definition of participation and feedback variables in Everything2
Table 3-2: descriptive statistics for participation and feedback factors in Everything2
Table 3-3: hazard rate model results on participation and feedback factors and length of active membership (*p <.001)
Table 3-4: Granger causality results whether members’ rate of participation causes their length of active membership (*p < .001)
Table 3-5: Granger causality results whether rate of feedback received causes length of active membership (*p < .001)
Table 4-1: description of participation and feedback variables in Sploder
Table 4-2: descriptive statistics for participation and feedback received in Sploder
Table 4-3: hazard rate model results for participation and feedback factors and length of active membership (*p <.001)
Table 4-4: Granger causality Wald results whether members’ rate of participation causes their length of active membership (*p < .001)
Table 4-5: Granger causality results whether rate of feedback causes length of active membership (*p < .001)
Table 5-1: Everything2 long term interview participants
Table 5-2: key themes identified based on coding
Table A-1: descriptive statistics for the variables with a two month cut off time in Everything2
Table A-2: Omnibus Tests of Model Coefficients for Everything2
Table A-3: Hazard rate model results for participation and feedback factors and length of active membership in Everything2 (*p <.001)
Table A-4: results from a Variance Inflation Factor (VIF) analysis for Everything2
Table A-5: Omnibus Tests of Model Coefficients
Table A-6: lag order results for Granger causality test on length of active membership and participation in Everything2
Table A-7: results from a lag order test for Granger causality on length of active membership and feedback in Everything2
Table A-8: descriptive statistics for the variables with a two month cut off time in Sploder
Table A-9: Omnibus Tests of Model Coefficients for Sploder
Table A-10: Hazard rate model results on participation and feedback factors and length of active membership in Sploder (*p <.001)
Table A-11: results from a Variance Inflation Factor (VIF) analysis for Sploder
Table A-12: Omnibus Tests of Model Coefficients
Table A-13: lag order diagnosis for a Granger Causality Test
Table A-14: lag order diagnosis for a Granger Causality Test

LIST OF FIGURES

Figure 1.1: a sample Everything2 page as it appears to members
Figure 1.2a: the Sploder website as it appears to members
Figure 1.2b: the Sploder game creation interface
Figure 1.2c: the Sploder public games page
Figure 1.2d: the Sploder community discussion forum
Figure A-1: Survival graph for members of Everything2
Figure A-2: Survival graph for members of Sploder

CHAPTER 1
An Introduction to Online Communities
Scholars have been studying communities for a long time. It is still an open question
what attributes make a community sustainable over time. A community can be defined as a
group of people who share some common attributes such as values, beliefs, expectations, and
locality (Zhang, 2012). Every community has its own set of values, beliefs and expectations;
some of these attributes may overlap across communities. People may join and participate
in one or more of these communities when their expectations, values and beliefs match
those of the community. A community is often viewed as a self-contained unit, consisting of a
loose collection of individuals, which is continuously shifting due to changes in individual and
group behaviors (Wenger, 2001; Delanty, 2010; Durkheim, 1960).
Communities can be classified into two broad categories: offline communities and online
communities. In offline communities, members communicate primarily face to face with other
members, where they share their experiences, interests, and convictions and interact with each other
(Bender, 1978; Etzioni and Etzioni, 1999).¹ An Amazon Indian tribe is an example of an offline
community.

¹ In general, a member can be any person who joins and participates in a community.
This joining can take many forms, depending on the individual community. Some offline
communities require registration (e.g., a gym), some only require that people show up (e.g., some
churches), while others are based solely on locality (members are part of the neighborhood that
they live in simply by living there, regardless of all other factors). To be a part of an offline
community, one must live in the vicinity of that community, whereas an online community
comprises members from many different localities.

An online community can be defined as a network in which members communicate with
each other using interactive online tools such as email, discussion boards, online chat, and video.
An online community is a collection of voluntary members whose primary purpose is to
reinforce the collective welfare of its members; members share their interests, experiences,
and convictions and interact with other members primarily using an online medium. Even though
membership is voluntary in nature, most online communities require users to register to become
members in order to be a part of the community.
In online communities, members participate in various activities such as political
deliberation, maintenance of relationships, information seeking, and social collaboration
(Horrigan, 2007). Though the topics, activities, and audiences of these communities vary, it is
important to understand that many of these communities support extensive, voluntary online
interactions among members where barriers to admission and exit are minimal (Lampe, 2009;
Lampe, 2010; Kraut, 2012). Membership in online communities is based on social connections,
common interests, and members’ beliefs and expectations about the community.
An online community allows its members to share content through its interface, where
content could be an article, a comment, a picture or a video. It is members' contributions of
content that is referred to as participation² in the literature. Online communities that are solely
dependent on the participation of members to generate content are referred to as content-based
communities in the literature (Bowes, 2002; Ren, 2007). Most of today's online communities are
content-based communities. Content generated by individual members provides value to other
members and to the community in general; this is necessary for a community to thrive over time
(Koh, 2007).³ According to a study conducted by Toral (2009) on Linux port development
communities, the success of online communities is dependent on participation by long-term
members. Though the motivation for users to become members in a community varies, the basic
fact remains that participation is important for the success of online communities (Lampe, 2010;
Kraut, 2012; Burke, 2009; Ren, 2011).

² It is crucial to note that when a member submits posts, votes on a post, or sends
messages to others, all of these contributions are considered participation in the research
literature. Some studies include feedback as part of participation; however,
this study considers members' participation and feedback as two different entities. It is also
important to note that, throughout this dissertation, whenever feedback is mentioned it means
feedback received from other members. Whenever early participation or rate of participation is
mentioned, this refers to all participation variables as a group (discussion articles submitted,
votes given, comments sent). Likewise, whenever early feedback received or rate of feedback
received is mentioned, this refers to all feedback variables as a group (messages received, votes
received, cools received, and deletion of discussion articles).

³ This value is both social and economic. It provides social value because the more
members providing content, and the more content provided in general, the more the members are
able to interact with each other and the larger the community grows. This also could lead to a
long-lasting, thriving community. This in turn provides economic value for some communities
because larger communities can gain more attention from advertisers (e.g., Facebook, Sploder).

While many online communities are successful, the success of individual communities
varies widely (Kraut & Resnick, 2012). Online communities find it challenging to consistently
attract a steady stream of contributing members (Richardson, 2010). Communities that fail to
retain members turn into “ghost towns” (communities with few or no members) (Phang, 2009).
Even popular online communities find it challenging to retain members for a long period of time.
An example is the community associated with the massively multiplayer online game, World of
Warcraft, where a study by Williams (2006) showed that forty-six percent of members leave the
community within a month after they join. For MovieLens, an online community about movie
recommendations, half of new members cease being active within 18 days of the day they join
(Ren, 2010). Although a long tail distribution of continued activity is common in these online
communities, it is essential to understand that for online communities to survive over the long
haul, members must be retained over time.
An indicator of retention is membership length, which is the length of time a user has
been a member of a particular online community (Preece, 2001). There are costs associated with
participation in an online community. Some of these costs come when a member first joins, and
others during activities such as learning the community’s software, getting acquainted with
community norms, making an effort to be noticed by others or integrating socially. Additionally,
there is a switching cost if members choose to switch from one community to another.
When members switch from one community to another, it is unlikely that they will
be able to share content or communicate with members from the old community. These costs are
what make it important to the individual member to remain in a community, as the time and
effort put into the previous community no longer generate value if they leave (Kraut, 2012).
Research has shown that potential members often explore an online community before
they join, a practice called lurking, to learn about the benefits of the community before making a
commitment (Nonnecke and Preece, 2000; Nonnecke and Preece, 2001; Nonnecke and Preece,
2003). Members put their time and effort into becoming familiar with the community, both
before and immediately after joining. These costs, when incurred after joining, affect whether
members decide to continue to use the community (Arkes, 1985; Arkes, 2000). If we can
determine what makes a member actively participate and continue to use a site, we may be able
to develop metrics to predict the sustainability of an online community.
Online communities are multifaceted in nature; studies of online communities are often
conducted on certain attributes, such as members’ participation and the feedback they receive
from other members on the content. This dissertation focuses primarily on how the length of
active membership is related to members’ participation and to the feedback they receive on
content they contribute.⁴ Online communities are networks composed of diverse sets of
members, including members who have met each other face to face or expect to meet each other
at some point in the future, and members who do not expect to meet each other face to face at all.
Because there are diverse sets of members with different expectations and levels of participation,
sustaining long-term membership is a challenge. In this dissertation, I examine how different
types of participation and feedback received are related to how long members remain active in
two different online communities.

⁴ Active membership is the length of time that community members remain active
(length of active membership). Active members are members who have registered in a
community, and still log in to the community to use services offered by the community. A
member is considered active as long as they log in, whether or not they contribute by submitting
posts, votes, or messages.
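The definition of active membership given in footnote 4 can be illustrated with a minimal sketch. It assumes that length of active membership is measured from the registration date to the last recorded login; the function name, its arguments, and the choice of days as the unit are hypothetical, and the operationalization actually used in Chapters 3 and 4 may differ (for example, by applying a cutoff period).

```python
from datetime import datetime


def active_membership_days(registered_at, login_times):
    """Days between registration and the last recorded login.

    Assumption for illustration: a member counts as active up to the
    most recent login, whether or not they posted, voted, or messaged.
    """
    # Ignore any timestamps that precede registration.
    logins = [t for t in login_times if t >= registered_at]
    if not logins:
        return 0  # never logged in after registering
    return (max(logins) - registered_at).days


# Hypothetical member: registered on 1 January 2010, last login on 2 March 2010.
print(active_membership_days(
    datetime(2010, 1, 1),
    [datetime(2010, 1, 11), datetime(2010, 3, 2)],
))
```

Contributions play no role in this calculation, which mirrors the point that a member is considered active as long as they log in.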
1.1 Research Goals
This dissertation examines how the participation of members in two different online
communities is associated with how long they remain active in their communities and how
feedback received from other members on content is related to their length of active
membership. It examines the causal nature of links between members’ participation and their
length of active membership, and between feedback received from others and length of active
membership. The goal is to understand the various factors that are related to length of active
membership in online communities. A better understanding of these factors may inform future
research. Findings of this study may also have meaningful implications for those designing
online communities.
1.2 Everything2 and Sploder as Sites for this Study
The following section describes the two communities, Everything2 and Sploder, that are
being examined to answer questions that address the research goals for this dissertation. It also
describes some of the tools that members in these two communities use for collaboration and to
generate content.

1.2.1 Everything2
Everything2 (http://www.everything2.com) is an online peer-production community (a
community which supports collaboration among members to create better quality products)
similar to Wikipedia, but with an emphasis on creative writing, which started in 1998 as a spinoff
of the popular news and discussion site Slashdot. Everything2 is a compelling community for
this research because it has existed for more than 10 years. It contains a heterogeneous set of
members and has records for a very large collection of posts, known as writeups.
Everything2 is structured around writeups, which can be about any topic and in any form
of writing—e.g., fiction, non-fiction, personal experience and poetry. Each writeup is written by
a single author, who is given credit for the content. Users must register and log in to post
writeups; Everything2 refers to these registered users as members. Membership and registration
in Everything2 are free. Once registered, members are never deleted from Everything2 for lack of
activity, including failure to log back in.
Members of the Everything2 community can rate other members’ writeups by submitting
votes, which can be positive (upvote) or negative (downvote) with values of +1 or -1. Members
can only vote once on any piece of content (writeup). Positive votes add to the reputation score
(experience points referred to as XP) of the author, and negative votes subtract from the
reputation score of the content author. Members with sufficient reputation in the community can
also rate writeups with a cool, which is a type of super-vote (+20). Cools make the content more
visible in the community. In Everything2, only members with 2300 or more XP are allowed to
submit cools on a writeup.
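The voting rules described above can be summarized in a short sketch. This is an illustration only, not Everything2's actual implementation: the class and method names are hypothetical, and it assumes that a cool adds its +20 value to the author's XP in the same way that an ordinary vote adds +1 or -1.

```python
from dataclasses import dataclass, field

COOL_VALUE = 20      # a cool is described as a +20 super-vote
COOL_MIN_XP = 2300   # XP required before a member may submit a cool


@dataclass
class Member:
    name: str
    xp: int = 0


@dataclass
class Writeup:
    author: Member
    voters: set = field(default_factory=set)

    def vote(self, voter: Member, value: int) -> None:
        """Record an upvote (+1) or downvote (-1) against the author's XP."""
        if value not in (1, -1):
            raise ValueError("a vote must be +1 or -1")
        if voter.name in self.voters:
            raise ValueError("members can vote only once per writeup")
        self.voters.add(voter.name)
        self.author.xp += value

    def cool(self, voter: Member) -> None:
        """Record a cool; only members with enough XP may submit one."""
        if voter.xp < COOL_MIN_XP:
            raise PermissionError("cools require 2300 or more XP")
        # Assumption: the +20 is credited to the author's XP.
        self.author.xp += COOL_VALUE


author = Member("author")
writeup = Writeup(author)
writeup.vote(Member("reader_a"), 1)
writeup.vote(Member("reader_b"), -1)
writeup.cool(Member("veteran", xp=2300))
print(author.xp)
```

In this sketch a second vote by the same member, or a cool from a member below the XP threshold, is rejected with an exception, which corresponds to the one-vote rule and the XP requirement for cools.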
Writeups with negative ratings are sometimes deleted from the site at the discretion of
content editors, a group of volunteers that serve as administrators for the community, if they
think the content does not meet the community’s standards. Other reasons for deletions of
writeups include copyright violations, overly short contributions, and failure to submit posts in
English. Although deletions are made solely by the editors, prior research on deleted content in
Everything2 found that deletions are strongly correlated with negative votes (Sarkar,
2012). While anyone can view the content of Everything2, only users that became members by
creating an account can log in to the community, contribute new content, provide
feedback, or communicate with other members. Because of these restrictions, spam is rare on
Everything2. Communication within the community can take place through the site’s messaging
feature, which is very similar to email in that it allows members to send a private message to
others. Everything2 allows the messages to be tracked for each individual member.
Everything2 is a suitable case to study for this research for the following reasons: 1)
Everything2 is a decade-old site, with a diverse set of members of varying membership length
and a large volume of articles, 2) the long history of the site and its features has created a rich set
of content for analysis, 3) Everything2 is a valid representation of a content-based online
community because content is solely generated by members, 4) Everything2 site administrators
have granted me access to nine years, six months of server log data, which provides an
opportunity to examine factors that might influence the length of time members stay active over
a substantial period of time.

Figure 1.1: a sample Everything2 page as it appears to members - this figure
illustrates a sample page of Everything2. The top of the figure has the Everything2 logo with a
search feature to search for a writeup on a specific topic. The left of the figure shows the title
(removed) of the specific writeup that is contained on the page, the screen name of the member
(removed) that created the writeup, the date the writeup was created, and the text of the writeup.
The right of the figure is the individual administration section where members can log out,
change settings, or receive help. Below this box is a box for messages. For interpretation of
the references to color in this and all other figures, the reader is referred to the electronic version
of this dissertation.
1.2.2 Sploder
Sploder (http://www.sploder.info/) is an informal, game-making, peer production
community for adolescents and young adults, where membership and registration are free. Sploder
allows its members to build their own games by using a set of visual tools built in Adobe Flash.
They can then submit these games through a web browser interface for other members to play.
Sploder contains a heterogeneous set of members, and a very large collection of games.
Figures 1.2a, 1.2b, 1.2c and 1.2d show how the Sploder interface appears to members.
Members can share their games with other members in Sploder (Figure 1.2a); they can customize
their design using the Sploder interface (Figure 1.2b); and once members are satisfied with their
creations (games), they can make their games public in Sploder (Figure 1.2c). Sploder also
allows members to discuss game-related topics on a discussion forum page (Figure 1.2d).

Figure 1.2a: the Sploder website as it appears to members – this figure demonstrates how the
home page of Sploder appears. The top of the figure contains the Sploder logo as well as a log
in/sign up link and links to the different pages in the Sploder website. The most popular games
on the site are listed below the heading with reasons to join the community (sharing your games,
playing games, and voting) given on the right.

Figure 1.2b: the Sploder game creation interface – this figure shows the page that allows
members to create their own games. The top gives options to start creating a new game, load a
game, save a game, test a game, and publish a game. The center shows the screen that members
use to create the game, with buttons to the left that allow them to add different features in the
game.

Figure 1.2c: the Sploder public games page – this figure shows the page that lists the different
games in the community. The top of the page shows the member that created the games on this
page (name removed). There are options at the top to go to a comprehensive list of games or to a
saved list of favorite games. The bottom of the figure is a list of games that have been created by
this member. Each game has the date created, name of the game (removed), name of the game
creator (removed), rating of the game (out of five stars), how many votes the game received, how
many views the game received, and how many comments the game received.

Figure 1.2d: the Sploder community discussion forum – this figure shows how the discussion
forum appears to members. This is the section of the community studied in this dissertation. It
lists each discussion article that has been submitted (name removed), the member that submitted
it (name removed), how many comments it has received, which member posted the last comment
(name removed) and when the last comment was posted.
In the forum, members are allowed to submit posts on a topic. A post is an article-like
entry called a discussion article in Sploder. Discussion articles can be of any length. Only
registered members in Sploder are allowed to submit games and post in the forum.

Footnote 5: It is important to note that the Sploder discussion forum is a community on its own.
Membership in the discussion forum requires separate registration and login. Membership can
overlap between the original game site and the discussion forum, but being a member of one does
not imply membership in the other. In this study, I analyzed the data from the Sploder discussion
forum only, because I gained access to the forum data only.

Once a member submits a discussion article, other members can provide feedback on the
discussion article through whispers and comments. The feedback on a discussion article could be
from either a Sploder editor or a regular member. Whispers are one-on-one private
communications between an author and a member or an author and an editor. Members can also
submit votes (which can only be positive) on a discussion article.
The community is managed and administered by a group of volunteer editors. These
volunteer editors are appointed by the community manager (Sploder’s owner). Editors can delete
unwanted comments or discussion articles from the forum at their discretion if they think that a
comment or article does not meet community standards.
Sploder is a reasonable online community to study for the following reasons: 1) Sploder
is about five years old and has a heterogeneous set of members and a large collection of
discussions related to games; 2) the discussion forum is fairly large, with nearly 4,000 members
participating on a regular basis; and 3) Sploder administrators have granted access to two years
and seven months of server log data.
1.3 Why Did I Study Two Communities Instead of One?
It is often argued that online community studies lack generalizability; findings for one
online community often do not resemble findings for another community (Gupta, 2004). There
are various reasons why scholars in the past have studied a single community. It is challenging to
study multiple online communities at the same time because the interfaces and designs of these
communities are complex in nature and they may vary substantially. Different communities are
also often constructed to serve different purposes.
Despite differences among online communities, most online communities support a peer
production process and support an environment for collaboration, where members can provide
support and feedback to other members. Most of these communities (Wikipedia, Everything2,
Sploder) are managed by members who volunteer their time; these members can be active
members of the community and they are expected to know the community’s norms and
regulations. Although these communities are designed to serve different purposes, for many of
these communities there is still a large overlap of features, such as voting on other members’
posts and sending messages. The various features within these communities’ interfaces also often
support members’ interactions in a similar fashion, and make possible similar types of
interactions.
Similarities                                | Differences
--------------------------------------------+------------------------------------------------------------
Peer production communities.                | Everything2: anyone can participate.
                                            | Sploder: community for young adolescents.
Members can provide feedback/comments and   | Everything2: centered on creative writing.
vote on others’ posts.                      | Sploder: Flash-based game design community.
Administered by volunteers (editors).       | Everything2: in existence for more than 10 years.
                                            | Sploder: in existence for 5 years.
Provide a messaging facility for members.   | Everything2: facilitates all types of creative writing.
                                            | Sploder: facilitates all types of game-related discussions.

Table 1.1: Similarities and Differences Between Everything2 and Sploder at a Glance
Two different communities, Everything2 and Sploder, are selected for this study to
facilitate comparisons that will give a better sense of the extent to which findings for one
community might generalize to others.
1.4 Approaches
This dissertation employs a mix of qualitative and quantitative methods to study how
members’ participation and feedback received from other members influence the length of time
they stay active in these communities.

(1) I identify a set of attributes that may be related to length of active membership based on
literature reviews from various fields such as human-computer interaction (HCI) and social
psychology.
(2) Two different quantitative approaches (empirical methods) are used to examine how two factors,
participation and feedback received, are related to length of active membership for two different
communities.
(3) A qualitative study is also conducted by analyzing interviews with long term members of one
of the communities to further understand factors that may be associated with length of active
membership.
(4) The findings from the analyses are then used to identify practical implications for designing
and maintaining online communities.
1.5 Dissertation Outline
In Chapter 1, I identify important attributes of communities that may contribute to the
sustainability of online communities. Based on these attributes, I elaborate on the importance of
long-term active membership. I also state the research goals of this dissertation topic. I describe
the two communities I am studying, Everything2.com and Sploder.com, to provide context for
later chapters.
Chapter 2 reviews the existing literature on membership in online communities. Based on
the literature reviewed, factors that may influence length of active membership are identified. I
use the results from the current literature and the gaps in that literature to generate high-level
constructs (factors such as members' participation and feedback received from others) that could
affect length of active membership.
Chapter 3 describes the quantitative analyses conducted to gain insights on how length of
active membership is related to different aspects of participation and feedback received from
others in the Everything2 online community. Two different types of analyses are used to test the
research hypotheses: a Cox proportional hazard rate model and a Granger causality test.
The hazard rate model is used to test whether there is a statistically significant correlation between
members’ participation and their length of active membership, or between the feedback they received
from others and their length of active membership. Previous studies have pointed to correlations
between factors such as participation and feedback received and length of active
membership as evidence of causal links between these factors and length of active membership.
Because these relationships have plausible non-causal explanations, findings of correlation are
not satisfactory evidence of causal connections. In this dissertation, in addition to correlational
evidence, I employ a Granger causality test to determine more rigorously
whether these correlations are consequences of causation.
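For readers unfamiliar with the two tests named above, their textbook forms can be sketched as follows. This is a generic illustration, not the specification estimated in this dissertation: the covariates $x_{i1}, \dots, x_{ip}$ stand in for participation and feedback measures, and the series names $y_t$, $x_t$ and the lag length $m$ are assumptions introduced only for this sketch.

```latex
% Cox proportional hazards model: hazard that member i stops being active at time t,
% given a baseline hazard h_0(t) and covariates x_{i1}, ..., x_{ip} (assumed names)
h_i(t) = h_0(t)\,\exp\!\left(\beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip}\right)

% Granger causality test: x is said to Granger-cause y if lagged values of x
% improve the prediction of y beyond y's own lagged values
y_t = \alpha_0 + \sum_{k=1}^{m} \alpha_k\, y_{t-k} + \sum_{k=1}^{m} \gamma_k\, x_{t-k} + \varepsilon_t,
\qquad H_0:\ \gamma_1 = \gamma_2 = \dots = \gamma_m = 0
```

In the first equation, $\exp(\beta_j)$ is the hazard ratio for a one-unit increase in covariate $j$; a ratio below 1 corresponds to a lower hazard of leaving and therefore a longer expected active membership. In the second, rejecting $H_0$ indicates that past values of $x$ help predict $y$ beyond $y$'s own history, which is the sense of causality that the test assesses.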
Chapter 4 describes the quantitative analyses conducted to gain insights on how length of
active membership is related to different aspects of participation and feedback received from
others in the Sploder online community. Again, a Cox proportional hazard rate model (a form
of survival analysis) is used to test whether there is a statistically significant correlation between
members’ participation and their length of active membership, or between the feedback they received
from others and their length of active membership. A Granger causality test is further used as a more
rigorous test for causality.
Chapter 5 presents the findings from interviews with long-term members from
Everything2. Data from the interviews is analyzed to gain further insights about factors that may
affect length of active membership.
Chapter 6 summarizes the research and its contributions and provides recommendations
for the design of future online communities. Limitations of the study are also discussed.

1.6 Contributions
This dissertation makes the following contributions:
1. The dissertation’s findings should be useful to individuals and organizations who
design and maintain online communities. Understanding how participation and feedback factors
may affect length of membership in online communities could inform the design of tools that
may reduce the burden of peer production editors in managing the community.
2. To the best of my knowledge, previous studies have examined only correlational evidence.
Correlation is not dispositive proof of causality, and the observed relationships also have
plausible non-causal explanations. Therefore,
this dissertation applies a more rigorous statistical test for determining whether causal links exist
between members’ participation and length of active membership and between feedback
members receive and length of active membership.
3. It addresses questions about the generality of findings in the literature on membership
in online communities by studying two communities to examine whether the results from the two
communities are similar to each other.
4. It also studies members’ activities for a period of time longer than two years (and
nearly 10 years for one of the communities studied) to examine whether the variables constructed
have an effect on length of active membership. Because earlier studies examined online
community members for shorter periods of time, the findings of this study shed light on the
extent to which relationships found by previous studies persist over longer periods of time.
5. It identifies possible key factors that may influence length of active membership in
online communities, as proposed by the prior research literature. It examines how these key factors
are related to length of active membership with more rigorous statistical tests. It also considers
rate of participation and rate of feedback received, rather than the measures of total
participation and feedback received used in prior studies, because members who are active in the
community for a longer period of time will most likely have higher measures of total participation
and feedback received simply by virtue of having been members of the community longer.
1.7 Chapter Summary
In this chapter, I have used the term ‘online communities’ to describe technology-mediated
social interactions. The term online community is often used in a broad sense to describe social
network sites, social media, and social computing sites. It was not my intention to provide a
canonical definition for online communities. Rather, I have considered some of the salient
characteristics of communities such as 1) use of socio-technical features built into the design, 2)
interactions of members in the community, and 3) value of and dependence on user-generated
content. I used these characteristics to define an online community for this study. In this chapter,
I discussed why length of active membership is important for an online community; based on
this, I also provided the overall research goals for this dissertation.

CHAPTER 2
Literature Review on Factors that may Affect Active Membership

Online communities have been observed to suffer very different fates. Some
communities, such as Facebook, grow, thrive and seem to endure (relative to the short amount of
time it has been possible to have an online community). Other online communities never gain
any traction and just disappear, and still others seem to thrive briefly and then decline (e.g.,
MySpace). A study conducted by Deloitte Development LLC reported that even though 60% of
businesses invest time and money into building their own online communities to understand their
customers’ needs, 35% have fewer than one hundred members (Moran, 2008). Researchers have
sought to identify factors that contribute to the sustainability of online communities to determine
why some online communities thrive while others do not.
The research reviewed in this chapter suggests that three key factors contribute to the
sustainability of an online community: the length of time members stay active in the
community, the ways in which members engage with the community and the frequency with
which they do so, and how other members respond to them when they participate. For an online
community to thrive, it is necessary that members remain active in the community and continue
to interact with each other for a non-negligible amount of time. A community cannot be built
solely with people who use it once. Further, if all new members cease being active shortly after
joining, the community will eventually exhaust the supply of potential new members and the
community will cease to exist. To the extent an online community attracts members by offering
the associational benefits of an online community, it must provide an experience sufficiently
compelling for them to continue using the service and participating with other members for some
time beyond the first time they use the service. Research to date identifies members’ own
participation in communities and the feedback they receive from other members as factors that
may affect length of active membership.

Footnote 6: In this dissertation, an online community is defined as a network in which members
communicate with each other using interactive online tools such as email, discussion boards,
online chat, and video. The characteristics of an online community include: 1) communication
between members of a site, 2) reliance on user-generated content where content is solely
generated by the registered members, and 3) the use of online tools for the purpose of
communication and information sharing.

Footnote 7: A member is any person that has registered with a community. Members have to log in
after registering to participate. Active members are members who have registered in a
community, and still log in to the community to use services offered by the community. A
member is considered active as long as they log in, whether or not they contribute by submitting
posts, votes, messages, or other content.

This chapter reviews the relevant literature and identifies gaps in the literature and
limitations of prior research. The subsequent chapters present my research which addresses those
gaps and limitations.
2.1 Membership in Online Communities
Studies have found that keeping members actively engaged in an online community over
the long term is a challenge, even for the most popular online communities. For example, 60% of
members who create a Wikipedia account never log back in after their initial (first) login to
submit or edit an article (Panciera, 2009). More than half of the developers who registered to
participate in the Perl open-source development community never returned after posting their
first message (Ducheneaut, 2005). Given that sustaining a solid base of active long-term
members is critical to the sustainability of an online community, and assuming that we want to
help these communities thrive, it is important that we identify factors that contribute to the length
of active membership in online communities. The factors identified below are members'
participation and the feedback they receive from others.

Footnote 8: It is crucial to note that when a member submits posts, submits votes on a
post, or sends messages to others, all of these contributions are termed participation in
the literature. When a member receives votes on their post or receives messages from
others, these interactions are referred to as feedback. Some studies include feedback as
part of participation; however, this study considers members' participation and feedback (received)
as two different things. It is also important to note that whenever early participation or rate of
participation is mentioned, this refers to all participation variables as a group (discussion articles
submitted, votes given, comments sent). Likewise, whenever early feedback received or rate of
feedback received is mentioned, this refers to all feedback variables as a group (messages
received, votes received, cools received, and deletion of discussion articles submitted).

2.2 Participation in Online Communities
Previous studies reported that unequal participation is a common phenomenon in online
communities. For example, in a study of Usenet communities, Whittaker et al. (1998) reported
that 2.9% of members within the community contributed nearly 25% of the messages, and
around 27% of the messages were contributed by members who posted only once. Studies of the
P2P file-sharing community, Gnutella, reported similar findings. Eytan’s (2000) study found that
25% of Gnutella community members contributed 98% of files, and that 66% of members
contributed almost nothing. A later study reported that nearly 85% of its members contributed
nothing to the Gnutella community (Hughes, 2005). Based on these two studies, it appears that
the longer the Gnutella community exists, the smaller the percentage of members that contribute
to the community.
Another study that examined unequal participation was conducted by researchers from
IBM. They reported that within an internal community maintained by IBM, one percent of the
members were super contributors; most of the posts came from them. 66% of members were
moderate contributors, and 33% were peripheral contributors who contributed almost nothing
(Stewart, 2010). Similarly, two other studies reported that 4% of developers contributed 88% of
the new code in the open source Apache community (Mockus, 2002), and only 58% of new
members to a Usenet group posted a second time (Arguello, 2006). Panciera (2009) further
reported that the modal number of edits on Wikipedia is one per person. All of these studies have
reported that participation inequality is prevalent in online communities (Mockus, 2002; Arguello,
2006; Brothers, 1992; Nielsen, 2006; Stewart, 2010; Eytan, 2000). This also suggests that the
rate (frequency of participation) at which each individual member participates in a community
varies among members.

Footnote 9: The length of active membership refers to the amount of time a registered member is
active in a community. Active refers to any member that has continued to log in to a community.

Scholars have studied whether members in a community will continue to participate
based on their total participation in the first few weeks (Panciera, 2009; Burke, 2009). However,
they did not account for members’ rate of participation, which may change over time. Are
members who have a higher rate of participation more or less likely to have longer active
memberships than those who have a lower rate of participation? And related to that, how is the
rate of members' participation related to their length of active membership? It is not only new
members but also long-term members who may stop participating, and this could affect the
sustainability of an online community. It is likely that users who are members of a community
for a longer period will have a higher total participation (cumulative total for each type of
participation such as posts and votes) than members who are in the community for a short period
of time. For example, someone who is a member of a community for five years will have a
higher total participation than a member who is in a community for five months if their rates of
participation are the same. Thus, rate of participation is an appropriate construct to use in order
to predict length of active membership.

Footnote 10: Rate of participation was computed by dividing members’ total amount of
participation (i.e., total posts submitted, messages sent, or votes submitted) by their total length of
active membership.

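The distinction between total participation and rate of participation can be illustrated with a minimal sketch. It assumes that length of active membership is measured in days from registration to last login, which is one plausible reading of the definition used here; the function names and the two example members are hypothetical and do not come from the server logs.

```python
from datetime import date


def active_days(registered: date, last_login: date) -> int:
    """Length of active membership in days (assumed: registration to last login)."""
    days = (last_login - registered).days
    if days < 0:
        raise ValueError("last_login precedes registered")
    return days


def participation_rate(total_contributions: int, membership_days: int) -> float:
    """Contributions (e.g., posts) per day of active membership."""
    if membership_days <= 0:
        raise ValueError("membership length must be positive")
    return total_contributions / membership_days


# Two hypothetical members who post at the same rate but differ in tenure.
long_term = {"posts": 1825, "days": 1825}   # roughly five years
short_term = {"posts": 150, "days": 150}    # roughly five months

rate_long = participation_rate(long_term["posts"], long_term["days"])
rate_short = participation_rate(short_term["posts"], short_term["days"])
```

Both hypothetical members have the same rate (one post per day), yet the long-term member's total is more than ten times larger, which is why totals alone cannot separate how intensively members participate from how long they have been members.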
It is important to note that none of the above studies explicitly examined the length of active
membership in a community (Panciera, 2009; Burke, 2009; Lampe, 2005). However, there is
one study that reported some correlational evidence between length of active membership and
members' total participation (Wang, 2012). The study found that messages submitted by members
that contain emotional content (e.g., expressing understanding, encouragement, affirmation,
sympathy, or caring to others) were positively correlated with how long members remained in a
community, and messages submitted by members that contain informational content (e.g.,
giving advice, referrals or knowledge) were negatively correlated with how long members
remained in the community (Wang, 2012). The study did not consider how members'
participation changed over time. Further, neither Wang’s study nor any other previous study
explicitly tests for causal links between members' length of membership and rate of participation.
To address this gap in the literature, I examine the relationship between members' length of
membership and rate of participation beyond simple correlational tests.
Finally, most prior research on members' activities in online communities examined
participation over a short time period, ranging from a minimum of three months (Burke, 2009) to
a maximum of sixteen months (Churchill, 2004). An exception to this is Wang's (2012) study, which
contained data for nine years, three months. Considering that, except for Wang's
study, all the studies I am aware of looked at data for a maximum of sixteen months, I
choose to examine the relationship between members' length of active membership and rate of
participation for two online communities (Everything2 and Sploder) over two years. I examine
length of active membership in Everything2 for a period of nine years, six months and length of
active membership in Sploder for a period of two years, seven months. Looking at members'
activities over a longer period of time may reveal interesting patterns that were not identified in
the literature to this point. The advantage of examining a longer period of time is that it could
reduce the limitations inherent in the shorter observation periods of other studies. This could
assist in identifying potential unmeasured causes among variables and may help establish
causality between them.
Examining the effect of participation, more precisely the rate of participation, on
members’ length of active membership may provide useful insights regarding the sustainability
of online communities. Based on this information, my first research question is:
RQ1: How is length of active membership related to the rate of participation of individual
members in a community?
For this RQ, I am examining both a correlational and a causal relationship between length
of active membership and members' rate of participation in a community.
2.3 Early Participation in Online Communities
Members' early participation might be a significant predictor of their length of active
membership. Studies have reported that early participation is a strong predictor of whether a new
member will continue using a site (community) (Burke, 2009; Joyce, 2006; Lampe, 2005;
Panciera, 2010). These studies examined how members' early participation was related
(positively correlated) to their use of the site for the first few months (maximum 3 months). In
this dissertation, I examine how members’ early participation is related to their length of active
membership for two different communities over two years.

I define early participation as whether a member posts a first article in a community,
whether a member posts a first message in a community, and whether a member submits a first
vote on the content. Each of these is considered early participation. Early participation can
also include members' second post, members' second vote on a post, members’ second message
submitted and so on. However, due to the complexity of the server log data, I have only used
first post, first message, and first vote on a post as measures of early participation.
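The three early participation measures defined above are binary indicators, and a minimal sketch of how such indicators could be derived from an event log follows. The log layout (pairs of member id and event type), the event type names, and the example members are all assumptions made for illustration; the actual Everything2 and Sploder server logs are structured differently and are not reproduced here.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical event type names; the real server logs may label these differently.
EARLY_EVENTS = ("post", "message", "vote")


def early_participation(events: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, int]]:
    """Return, per member id, a 0/1 indicator for first post, first message, and first vote."""
    indicators: Dict[str, Dict[str, int]] = {}
    for member_id, event_type in events:
        # Every member seen in the log gets a row, starting with all indicators at 0.
        row = indicators.setdefault(member_id, {e: 0 for e in EARLY_EVENTS})
        if event_type in row:
            row[event_type] = 1  # repeated events of the same type leave the flag at 1
    return indicators


log = [
    ("alice", "post"),
    ("alice", "vote"),
    ("alice", "post"),
    ("bob", "login"),
    ("bob", "message"),
    ("carol", "login"),
]
flags = early_participation(log)
```

In this sketch a member who only logs in (the hypothetical "carol") still receives a row of zeros, so members who never post, message, or vote remain in the data rather than being dropped.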
My second research question is:
RQ2: How is length of active membership related to early participation of individual members in
a community?
2.4 Feedback in Online Communities
Another factor that may affect members’ retention is feedback they receive from other
members on content they contribute (Burke, 2009; Lampe, 2005). In an online community,
feedback gives authors an idea of how their posts are received by the community, and whether
the submitted posts need further editing. Feedback can be a comment or vote on a post, or it can
be a direct message from other members to an author.
Studies have reported that feedback may influence members' continued use of a
community. For example, Zhang and Zhu (2006) reported that feedback in the form of edits from
other contributors (editors) reduced the author’s incentive to contribute in Wikipedia; Halfaker,
Kittur, and Riedl (2011) found that new members’ continued use of Wikipedia was negatively
associated with feedback in the form of editorial reverts (corrective edits) on their posts. A
study by Choi et al. (2010) reported that a positive correlation exists between new
members’ continued use of Wikipedia and negative feedback in the form of editorial reverts
(corrective edits) from the editors. The results from Choi's study differ from Zhang's study and
Halfaker's study. Feedback that members receive can be either positive or negative, and it can be
positively or negatively associated with members’ future use of a community. For example, Choi
et al. (2010) found a positive correlation between new members’ number of edits for Wikipedia
articles and negative feedback from editors.
Lampe (2005) reported that feedback in terms of receiving a rating on the first post
(binary) is negatively associated with posting a second time. However, whether the rating was
positive or negative feedback on the first post was not explicitly reported in this study. Sarkar
(2012) found that positive feedback on the content is positively correlated with members' length
of active membership. The results from these studies may indicate that the sustainability of an
online community is associated with the feedback members received in some form from other
members. To the best of my knowledge, no study has yet tested for causal links between length of
active membership and character of feedback. Therefore, my third research question is:
RQ3: How is length of active membership related to rate of feedback individual members
receive on the content in a community?
For this RQ, I am examining both a correlational and a causal relationship between length of
active membership and rate of feedback received from others on the content.
2.5 Early Feedback in Online communities
Previous research has found that initial feedback received on an initial post is a predictor
of whether or not new members will continue to use an online community (Lampe, 2005; Ren,
2007). For example, Joyce (2006) reported that when new members received responses from
others on their first post they were more likely to post a second time, compared with new
members who did not receive a response.

Footnote 11: Choi studied whether the number of edits was correlated with negative feedback in
terms of editorial reverts.

The initial feedback received from others on a member's first post could determine
whether they are likely to post in the future (Joyce, 2006; Lampe, 2005; Burke, 2009). Results
from these studies suggest that initial feedback received on the content is a strong predictor of
whether a new member will continue using a site (community). These studies examined how
members' initial feedback received was related to their use of the site for the first few months
(maximum 3 months).
In this dissertation, I examine how members’ early feedback received is related to their
length of active membership for two different communities for over two years. Early feedback
refers to members' first vote received on a post and first comment received on a post. In this
dissertation, I am considering multiple instances of feedback received (e.g., first vote received on
a post and first comment received on a post), whereas previous studies have used only one
instance of feedback received (e.g., first comment received on a post). Because of this, I have
used the term early feedback received rather than initial feedback received. It is important to note
that early feedback can also include members' second vote on a post, second comment on a post and so on.
Due to the complexity of the server log data, I have only used first vote on a post and first
comment on a post as early feedback received.
Thus, my last research question is:
RQ4: How is the length of active membership related to the early feedback individual members
receive on their content in a community?
Length of active membership in online communities may also be examined through the lens of
social science theories. The next section discusses theories that could explain members’ length of
active membership in a community.
2.6 Use of Social Science Theory in this Dissertation

Many different types of social science theories which were originally developed for
offline communities have been used to explain members' behavior in online communities.
However, most of the literature reviewed in this chapter does not describe itself as based on any
specific theory (Wang, 2012; Zhang, 2006; Choi, 2010; Priedhorsky, 2007; Panciera, 2009;
Hughes, 2005; Arguello, 2006; Mockus, 2002). A reason for this could be that most of these
theories were developed for offline communities, an exception being the hyperpersonal model of
Walther (2007). Compared to offline communities, online communities are constrained in
various ways. A fundamental constraint is how the rules and norms governing participation and
membership are developed in online communities compared to offline communities. In online
communities, rules are designed by entrepreneurs and designers of different communities,
whereas in offline communities, rules evolve endogenously over time as members interact with
each other. It is important to note that, for an online community, rules may evolve; however, any
change in the rules still needs the approval of the entrepreneurs and designers before it is
implemented.
Even though most studies have not mentioned using a specific theory (or theoretical
framework) when they examined members' behavior in online communities, a few studies did
use a theory. For example, Joyce (2006) used the theory of commitment and socialization to groups.
The theory explains how individuals and groups change over time; based on the theory, Joyce
examined the communication exchange between members in an online community. Farzan
(2011) used bond-based identity theory to explain the factors that may contribute to interpersonal
relationships among members in a game community. Neither the theory of commitment and
socialization to groups nor bond-based identity theory explains length of active membership in online
communities. These theories rely on the fact that bonds are an important part of members'
interaction, yet these bonds are weak in an online community because members
participate from different locations, membership is purely voluntary in nature, and the true
identity of the members can be anonymous (Moreland, 2001; Poblocki, 2001).
Another theory, used by Lampe (2005) and Burke (2009) to explain new members' behavior in an
online community, is social learning theory. The theory states that in a social situation
people observe others and learn how to act based on the observation. The theory was initially
applied to children observing and learning from adults. In both online and offline communities,
however, people do not always learn how to act by observing others. This is especially true
of adults or people who are already established in a community, because they already have an
established behavioral pattern and are less likely to change. The theory can therefore explain new
members' behavior in an online community with more success than it can explain long-term active
members' behavior (Burke, 2009).
Though none of these theories fully explains members' behavior in online communities,
Burke (2009) and Lampe (2005) used social learning theory and Joyce (2006) used theory of
commitment and socialization to groups to examine how initial feedback received on a post (first
feedback received on a post) could predict members' continued use of a site (that is the chance of
posting a second time). This dissertation builds on the theoretical constructs borrowed from
these studies. Hence, I used social learning theory and theory of commitment and socialization
to groups (to a certain extent) to generate my participation variables (such as rate of posts, rate of
votes given, rate of messages sent, first post submitted, first votes given) and feedback received
variables (such as first vote received on a post, first comment received on a post, rate of votes
received and rate of comments received).

It is important to note that the variables extracted for the quantitative studies in Chapter 3 and
Chapter 4 are derived from these theoretical constructs (theories) only in a broad sense. I have not
used these theoretical constructs to identify the specific variables incorporated in the study (first
vote received, first post submitted, etc.); rather, I used them to generate high-level constructs
such as members' participation and feedback received from others.
2.7 Chapter Summary
The long-term viability of online communities, which is associated with members' length of active
membership, depends on several factors. The important question is what these factors
are and how they can be generalized across communities. In this chapter, I have reported
on the existing literature in terms of two key factors that could affect members' length of active
membership: members' participation and the feedback they receive from others on their content. I
have illustrated some of the gaps in the existing literature and proposed four research questions
to address them.

CHAPTER 3

Examining Length of Active Membership for Everything2: A
Quantitative Study
Online communities support extensive interactions among members with minimal
barriers to admission and exit (Lampe, 2009; Lampe, 2010; Kraut, 2010). Membership in these
communities is voluntary. Most communities are content-based communities, where content is
generated by individual members. Because their content is generated by members, the success of
online communities is predominantly dependent on long-term active membership (Toral, 2009;
Lampe, 2010; Kraut, 2010; Burke, 2009; Ren, 2011). For a community to thrive, it is necessary
that members remain active in the community and continue to interact with each other.
Given that sustaining a solid base of active long-term members is critical to the
sustainability of an online community, it is important that factors that contribute to the length of
active membership are identified. Research to date has identified two key factors that may
contribute to the length of time members remain active in a community (length of active
membership): the ways and frequency with which members engage with the community
(members' participation) (Burke, 2009; Arguello, 2006; Neilson, 2006) and how other members
respond to them when they participate (feedback received from others) (Zhang, 2006; Halfaker,
2011; Choi et al., 2010; Lampe, 2005; Kraut, 2007; Joyce, 2006). Studies of participation found
that members' own participation may be predictive of whether they remain active in a
community (Burke, 2009; Arguello, 2006; Neilson, 2006). Other studies have reported that
feedback received from other members may influence members' continued use of a community
(Joyce, 2006; Lampe, 2005).
While these studies have added to the knowledge of what may contribute to the
sustainability of online communities, a few key points need to be addressed to
solidify our understanding of this process. In general, studies have not looked into these points,
and more than one study should look into each to confirm the findings of other studies. Addressing
all of these key points in a single study may also add to the understanding of what factors
increase the amount of time a member remains active in a community.[12]

1. Studies to date have not explicitly sought to identify factors that influence length of
active membership in a community.
2. Studies to date also have not addressed the implications of unequal participation and
feedback received over time. They have only looked into total participation and
feedback received over time, but such participation and feedback received could vary.
Rate of participation may be a better indicator of members’ activity level, as members
who have been active for a longer period of time will most likely have a larger total
participation.
3. Prior research has reported findings for analyses of data from online communities
covering members' activities over short periods of time, ranging from three months
(Burke, 2009) to sixteen months (Churchill, 2004). Because members may remain
active participants in online communities for many years, studies using data collected
over much shorter participation periods may not accurately generalize to time spans
that extend much beyond the lengths of the periods studied.

4. All prior research has relied on correlational evidence when reporting how
participation and feedback received are related to members' continued activity in a
community. While the correlational findings have been interpreted as evidence of
causation, there are also plausible non-causal explanations for these correlational
relations, and it is generally understood that correlational evidence is not dispositive
proof of a causal link. No previous study I am aware of explicitly tests for causal
links between length of active membership and members' participation or between
length of active membership and feedback received from others.
5. Prior research has studied only one community at a time and has not tested to
what extent the findings can be generalized to different communities. This is the
reason why Chapter 3 and Chapter 4 of this dissertation examine two separate
communities.

[12] In general, studies do not take these five points into account. This does not mean that
no study has considered them, though no single study I am aware of has taken all five into
consideration. For example, Wang's (2012) study reported correlational evidence on the
relationship between length of active membership and members' participation using nine years
and three months' worth of data, though the study did not examine causal links or take into
account changes in the rate of participation. Cheshire's (2008) study used rate of participation,
but the study did not examine length of active membership or causal links and only studied data
covering a period of seven months.

This chapter addresses these gaps in the research literature by using data preserved on
server logs for Everything2, an online community that allows its registered members to interact
with other members, to statistically examine the effects of characteristics of their own
participation and feedback received from other members on the length of time that community
members remain active (length of active membership).[13] The chapter first describes the server
log data and the operationalization of variables for this study. It examines the length of active
membership through the lens of a Cox proportional hazard rate model, a statistical technique (a
type of survival analysis) that is used to examine the influence of certain explanatory variables
(independent variables) on the length of time from registering that a member remains active in a
community. Hazard rate models are used to examine factors that may influence the amount of
time that elapses before a discrete event occurs, such as an individual catching a disease or
adopting a new product. In this dissertation, the event is cessation of active membership. Previous
studies have used hazard rate models to examine correlations between members' participation
and their continued use of a community (Wang, 2012; Yang, 2010; Farzan, 2011). A hazard model
identifies correlations, however, and it is well known that correlational evidence is not
dispositive proof of causation; for the relationships examined here, there are plausible
non-causal explanations for the identified correlations. This dissertation therefore also uses a
Granger causality test to test more rigorously for causal links between members' participation
and length of active membership and between feedback received and length of active
membership.

[13] In this dissertation, a member is any person who has registered with a community.
Members have to log in after registering to participate. Active members are members who have
registered in a community and still log in to use services offered by the community. For this
study, a member is considered active as long as they log in, whether or not they contribute by
submitting posts, votes, or messages.
An online community is defined as a network in which members communicate with each
other using interactive online tools such as email, discussion boards, online chat, and video. The
characteristics of an online community include: 1) communication between members of a site, 2)
reliance on user-generated content where content is solely generated by the registered members,
and 3) the use of online tools for the purpose of communication and information sharing.

3.1 Overview of Everything2
Everything2 (http://www.everything2.com) is an online peer-production community (a
community where members work in collaboration to create better quality products) similar to
Wikipedia, but with an emphasis on creative writing. Content in the community is solely
generated by members. Everything2 contains a heterogeneous set of members and a very large
collection of posts, known as writeups.[14] Writeups can be about any topic and in any form of
writing such as fiction, non-fiction, personal experience, or poetry. Each writeup is written by
one author, who retains complete ownership. However, the authors can, and often do, take
advice from other community members to improve the quality of their writeups.

[14] Everything2 is a global community. A recent visitors' profile, found on thatweb.com,
showed that visitors to Everything2 come from many different countries. Though the
largest share of visits (40.8%) comes from the United States, visitors also come from India, United
Kingdom, Philippines, Canada, Australia, etc.

Members of the Everything2 community can rate other members' writeups by submitting
votes, which can be positive (upvote, +1) or negative (downvote, -1). Members can only vote
once on a piece of content. Positive votes add to the experience points (commonly referred to as
XP) of the author, and negative votes subtract from the experience points of the content author.
Members can gain XP when they submit a writeup or submit a cool. Members with sufficient
points in the community can also rate writeups with a cool, which is a type of super-vote (+20).
Cools make the content more visible in the community than regular votes. In Everything2, only
members with 2300 or more XP are allowed to submit cools on a writeup.
Writeups with negative ratings are sometimes deleted at the discretion of the content
editors, a group of volunteers that serve as administrators for the community, if they think the
content does not meet the community's standards. Everything2 supports a messaging feature for
its members, which is very similar to email in that it allows Everything2 members to send private
messages to other members; Everything2 allows the messages to be tracked for each individual
member in order to provide better services to the community member. Messages are also tracked
so that administrators can view and monitor them if needed. Members have to register with the
community, which is free, and log in to be able to submit writeups or interact with other
members. Once registered, members are never evicted from Everything2's membership, even
though they may have been inactive for a long time.


3.2 Data and Operationalization of Variables in Everything2
I examine the length of active membership for Everything2 members, where length of
active membership is operationalized as the number of days from the date a member’s account is
created to the member’s last login date, both of which are recorded on the server log. Prior
literature suggests that members’ participation and feedback received from other members
should contribute to length of active membership (Burke, 2009; Joyce, 2007; Lampe, 2005). To
study factors that may affect members’ length of active membership, I have constructed the
following measures of participation: rate of participation, which is the total count of each type of
participation (writeups submitted, votes given, and messages sent) divided by the total length of
active membership, and early participation, which includes a member's first writeup, first upvote
submitted, first downvote submitted and first message sent. Similarly, to examine the effects of
different types of feedback received on length of active membership, I have constructed the
following measures of feedback received: rate of feedback received, which is the total count of
each type of feedback received (votes received, messages received, and cools received) divided
by the total length of active membership, together with the fraction of a member's writeups that
were deleted; and early feedback received, which includes a cool received on a member's first
writeup, upvote received on a first writeup, downvote received on a first writeup, and deletion
of a first writeup. These measures
have been constructed from data preserved on Everything2 server logs.[15] The goal is to examine
whether members' participation (rate of participation and early participation) and feedback
received from others (rate of feedback received and early feedback received) on their content are
related to their length of active membership.

[15] Throughout this dissertation, when feedback is mentioned, it always refers to feedback
a member received from other members. Participation always refers to the participation by a
member whose length of membership is being examined. Some forms of feedback (especially
votes) can be participation for others. If a member submits writeups, votes, or messages, it is
termed participation, and if the same member (based on user_id) receives votes or messages
from others, it is termed feedback. Participation and feedback variables are measure id(s) (or
proxy ids) with records of activities and contributions preserved on server logs. It is also
important to note that whenever early participation or rate of participation is mentioned, this
refers to all participation variables as a group (discussion articles submitted, votes given,
comments sent). Likewise, whenever early feedback received or rate of feedback received is
mentioned, this refers to all feedback variables as a group (messages received, votes received,
cools received, and deletion of discussion articles).

I first use a hazard rate model to estimate the effects of participation and feedback
received on length of active membership. I then employ a Granger causality test to examine the
extent to which the correlational evidence generated by the hazard rate model does in fact reflect
causality between the dependent variable and the independent variables in the model.
3.3 The Everything2 Data-Set
I collected information for all Everything2 members (100,682) who created an account
from November 11, 1999 to May 25, 2009. Table 3.1 gives the definitions for the participation
and feedback variables employed for this study of the Everything2 community. The data set
contains timestamps for members' last login and members' account creation dates and times.[16] It
also contains total counts since the first login for each type of participation variable and total
counts for each type of feedback variable, which I have used to calculate the rate of participation
and rate of feedback received. The data set contains timestamps for participation activities (such
as posts submitted, votes given and messages sent) for each member, and for each member,
timestamps for feedback received (such as votes received, messages received, or deletion of a
writeup).

[16] The data I gained access to did not contain other login information, only the last
login. While other logins may have been captured, I did not receive that information.

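The operationalization of length of active membership given in Section 3.2 (the number of days from account creation to last login) can be illustrated with a minimal sketch. The function name and the example timestamps are hypothetical, and measuring fractional days is an assumption made here for illustration; the sketch does not reproduce the processing actually applied to the server logs.

```python
from datetime import datetime

SECONDS_PER_DAY = 86_400


def length_of_active_membership(created_at: datetime, last_login: datetime) -> float:
    """Days from account creation to last login (fractional days are an assumption)."""
    return (last_login - created_at).total_seconds() / SECONDS_PER_DAY


# Hypothetical member: both timestamps are invented for illustration only.
created = datetime(2001, 3, 1, 12, 0, 0)
last_seen = datetime(2001, 3, 11, 18, 0, 0)
days_active = length_of_active_membership(created, last_seen)
print(days_active)
```

Because only the last login is available, this measure says nothing about how often the member logged in between the two timestamps.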

| Factors | Variables | Description |
|---|---|---|
| | Length of Active Membership | Amount of time a member has been active (last login date - account creation date) in the community |
| Rate of Participation | Rate of Writeups | The total count of writeups a member submitted divided by the amount of time the member was active in the community |
| | Rate of Messages sent | The total count of messages a member sent divided by the amount of time the member was active in the community |
| | Rate of Votes given | The total count of votes a member submitted divided by the amount of time the member was active in the community |
| Rate of Feedback Received | Rate of Messages received | The total count of messages a member received divided by the amount of time the member was active in the community |
| | Rate of Votes received | The total count of votes a member received divided by the amount of time the member was active in the community |
| | Rate of Cools received | The total count of cools a member received divided by the amount of time the member was active in the community |
| | Fraction of deleted Writeups | The total deleted writeups divided by the total amount submitted |
| Early Participation | First Writeup submitted or not | Whether or not a member submitted a first post (writeup) |
| | First Message submitted or not | Whether or not a member sent a first message to another member |
| | First Downvote submitted or not | Whether or not a member submitted a first downvote |
| | First Upvote submitted or not | Whether or not a member submitted a first upvote |
| Early Feedback Received | First Upvote received on First Writeup | Whether or not a member received a first upvote on first writeup |
| | First Downvote received on First Writeup | Whether or not a member received a first downvote on first writeup |
| | First Cool received on First Writeup | Whether or not a member received a first cool on first writeup |
| | Deletion of First Writeup | Whether or not a member's first writeup was deleted |
| Control | Familiarization Time | The amount of time between a member creating an account and posting a first writeup |

Table 3.1: definition of participation and feedback variables in Everything2

Participation variables include rate of writeups submitted, rate of messages sent, and rate
of votes given. With everything else equal, users who are members of a community for a longer
period will have a higher total participation (the sum of each type of participation, such as the
sum of posts or the sum of votes) than members who are in the community for a shorter period of
time (Wang, 2012). I used rate as a measure to account for the effect of unequal lengths of
participation. For each of these variables, rate was calculated by dividing a member's total count
for the variable by the member's length of active membership.[17] For example, a member's rate of
writeups is calculated by dividing the member's total writeups submitted by the member's length
of active membership.

[17] It is important to note that the server log data I have access to does not contain total
upvotes given, total downvotes given, total upvotes received, or total downvotes received for
individual members. Rather, it contains total votes given and total votes received. It also contains
early feedback (such as first upvote received on first writeup, first downvote received on first
writeup) and members' early participation activities (such as first writeup submitted or not, first
upvote submitted or not, first downvote submitted or not). It does not contain timestamps for
every individual participation activity or every individual feedback received. The rate of votes
given was measured for individual members by the sum of upvotes and downvotes given divided
by total length of active membership.
Also, Everything2 did not automatically log a member out if a member closed the
browser without signing out. In this instance, the server logs did not capture the next login. This
is not accounted for in the analysis as this information is not recorded in the server log.

Participation activities for members vary from community to community. In
Everything2, these participation activities (features) are broken down into a relatively small
number of discrete types of activities that are explicitly named and supported by the community,
such as submitting a writeup, submitting a vote, and submitting a message. If members use these
features, they may become more engaged in the community and stay active longer.
Four early participation variables were used to examine the effects of early participation
on length of active membership. The variables include: 1) whether or not a member submitted a
first writeup, 2) whether or not a member sent a first message, 3) whether or not a member
submitted a first upvote, and 4) whether or not a member submitted a first downvote. Early
participation variables are binary.
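The rate and early participation variables described above can be sketched as follows. This is a minimal illustration, not the code used for the study: the field names are hypothetical (the server-log schema is not given here), and deriving the binary flags from whether a total count is greater than zero is an assumption made for the sketch. First upvote and first downvote flags are omitted because the logs hold only combined vote totals.

```python
from dataclasses import dataclass


@dataclass
class MemberTotals:
    # Hypothetical field names; the real server-log columns are not specified.
    writeups: int
    messages_sent: int
    votes_given: int
    days_active: float  # length of active membership, in days


def rate(total: int, days_active: float) -> float:
    """Total count divided by length of active membership."""
    if days_active <= 0:
        raise ValueError("length of active membership must be positive")
    return total / days_active


def participation_variables(m: MemberTotals) -> dict:
    return {
        "rate_writeups": rate(m.writeups, m.days_active),
        "rate_messages_sent": rate(m.messages_sent, m.days_active),
        "rate_votes_given": rate(m.votes_given, m.days_active),
        # Binary early participation flags (assumed: any activity implies a first one).
        "first_writeup_submitted": int(m.writeups > 0),
        "first_message_submitted": int(m.messages_sent > 0),
    }


# Invented example member, for illustration only.
example = MemberTotals(writeups=20, messages_sent=0, votes_given=100, days_active=200.0)
example_vars = participation_variables(example)
print(example_vars)
```

Dividing by length of active membership is what lets a member active for 200 days be compared with one active for 2,000 days, which a raw total would not allow.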
Both general feedback variables and early feedback variables were constructed from the
server logs. General feedback variables include rate of messages received, rate of votes received,
and rate of cools received. In the analysis, I also included the fraction of writeups deleted as a
general feedback variable. The fraction of writeups deleted was computed by dividing total
deleted writeups by total writeups.[18]

[18] Fraction of deleted writeups is also included as a control variable. While not a rate, it
can be multiplied by rate of writeups submitted to derive a rate for deleted writeups. For
convenience of exposition, I will list fraction of deleted writeups as one of the rate of feedback
received variables for the remainder of this chapter.

I used the first cool received on first writeup, the first upvote received on first writeup,
the deletion of first writeup, and the first downvote received on first writeup as early feedback
variables. Early feedback received from others can be either positive or negative. To assess the
effect of early positive feedback received from others on the length of active membership, I used
a) whether or not a first cool was received on first writeup and b) whether or not a first upvote
was received on first writeup. Similarly, to assess the effect of early negative feedback received
from others on the length of active membership, I used a) whether or not a first downvote was
received on first writeup and b) whether or not a member's first writeup was deleted (deletion of
first writeup). Early feedback variables are binary.
In addition to these variables, I have also included familiarization time (the time between
creation of the account and the first writeup) as a control variable for this analysis. When
members join a community, it takes a certain amount of time to become familiar with the
community before they submit their first post (writeup). Familiarization time provides a measure
(a proxy) of how long a member takes to become familiar enough with both the use of features
in a community and norms of participation in a community to submit their first post.
3.4 Data and Measures
First, as a diagnostic test, I ran a missing value analysis test (MVA test) on the data. The
MVA test detected both missing values and outliers in the data. The test used a grid search
approach to detect missing values. It also used the range (Mean - 2*SD, Mean + 2*SD), where
SD is the standard deviation, to detect outliers. The analysis found 200 rows with missing values
or outliers in the data. These 200 rows were removed from the sample.[19]

[19] Additionally, I also produced a P-P plot (normal probability plot) as a check on the MVA
test. A probability plot plots values against a straight line and shows the deviation of points
from that line. The P-P plot also reported 200 missing values.

Next, members (registered users) whose length of active membership was less than a day
were removed from the dataset. These registered members created an account in the community
but did not log in after a day. It is important to note that these registered members who did not
log in after a day also did not post anything. These excluded members had zero active usage. This
reduced the sample size to 39,904 unique members.[20]

[20] In this chapter, I have explained the results for members whose length of active
membership is over a day. The cut-off point in this case is a day. Please note, I also conducted a
separate analysis excluding members who did not log in after 60 days from their account creation;
these members have a length of active membership of less than 60 days. I reported these results
in Appendix A. The results for members whose length of membership is over a period of 60 days
(N=21,909) are similar to those for members whose length of membership is over a day. The cut-off
period of 60 days is used as a proof of concept. This technique in data-mining is known as
sensitivity analysis (Yang, 2010).

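The cleaning steps in this section (dropping rows with missing values, dropping values outside Mean ± 2*SD, and removing members active for less than a day) can be sketched as below. This is only an illustration under stated assumptions: the function names and the sample rows are invented, the sample standard deviation is assumed, and the filter is applied to a single column, whereas the MVA test in the study was run in a statistics package whose exact settings are not described here.

```python
from statistics import mean, stdev


def drop_missing(rows, column):
    """Remove rows where the column is missing (None)."""
    return [r for r in rows if r.get(column) is not None]


def drop_outliers(rows, column, k=2.0):
    """Keep rows whose value lies within mean +/- k * SD (sample SD assumed)."""
    values = [r[column] for r in rows]
    centre, spread = mean(values), stdev(values)
    low, high = centre - k * spread, centre + k * spread
    return [r for r in rows if low <= r[column] <= high]


def drop_short_memberships(rows, column, min_days=1.0):
    """Remove members whose length of active membership is below the cut-off."""
    return [r for r in rows if r[column] >= min_days]


# Invented sample data, for illustration only.
raw = [{"days_active": v} for v in [0.2, 3, 5, 4, 6, 5, 4, 3, 5, 4, 500, None]]
step1 = drop_missing(raw, "days_active")
step2 = drop_outliers(step1, "days_active")
cleaned = drop_short_memberships(step2, "days_active", min_days=1.0)
print([r["days_active"] for r in cleaned])
```

Re-running the last step with `min_days=60.0` corresponds to the 60-day cut-off used for the sensitivity analysis.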

I checked for multicollinearity associated with length of active membership, members'
participation, and feedback received from others using a Variance Inflation Factor (VIF)
analysis. A VIF value of 5 or above usually indicates multicollinearity among the variables. I
found VIF values between 1.007 and 2.609 for all variables. The VIF test for multicollinearity
confirms that all of the individual participation variables and all individual feedback variables
that have been used in the analysis are not collinear with each other. This means that no
participation variable is collinear with any other participation variable, nor is any participation
variable collinear with any feedback variable. No feedback variable is collinear with any other
feedback variable. The VIF values also showed that length of active membership is not collinear
with any participation or any feedback received variables.[21] (Please refer to Appendix B for
these results.)
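As an illustration of the VIF check described above, the following sketch computes variance inflation factors as the diagonal of the inverse correlation matrix, which is a standard equivalent of 1/(1 - R^2) from regressing each variable on the others. The variable names and data are invented, and the study itself presumably used a statistics package rather than code like this.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: variables are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF per variable: diagonal of the inverse correlation matrix."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inverse = invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}


# Invented data for two hypothetical variables (their correlation is 0.8).
data = {"rate_writeups": [1, 2, 3, 4, 5], "rate_votes_given": [2, 1, 4, 3, 5]}
vifs = vif(data)
flagged = [name for name, value in vifs.items() if value >= 5]
print(vifs, flagged)
```

With the threshold of 5 mentioned in the text, neither invented variable would be flagged even though the two are fairly strongly correlated.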
The average length of active membership for members in the Everything2 community is
442 days. Table 3.2 presents descriptive statistics for the variables.
| Variable | N | Min | Max | Mean | S.D. |
|---|---|---|---|---|---|
| Length of Membership | 39904 | 1 | 3451.28 | 442.44 | 726.19 |
| Rate of Writeups | 39904 | 0 | 7.0 | .040 | .235 |
| Rate of Messages sent | 39904 | 0 | 11.0 | .022 | .211 |
| Rate of Votes given | 39904 | 0 | 20.5 | .083 | .632 |
| Rate of Messages received | 39904 | 0 | 10.4 | .002 | .055 |
| Rate of Votes received | 39904 | 0 | 10.0 | .004 | .113 |
| Rate of Cools received | 39904 | 0 | 3.3 | .270 | .307 |
| Fraction of deleted Writeups | 2662 | .30 | 1.0 | .598 | .274 |
| First Writeup submitted or not | 39904 | 0 | 1.0 | .370 | .483 |
| First Message submitted or not | 39904 | 0 | 1.0 | .340 | .472 |
| First Downvote submitted or not | 39904 | 0 | 1.0 | .330 | .469 |
| First Upvote submitted or not | 39904 | 0 | 1.0 | .890 | .317 |
| First Upvote received on First Writeup | 39904 | 0 | 1.0 | .105 | .306 |
| First Downvote received on First Writeup | 39904 | 0 | 1.0 | .835 | .370 |
| Deletion of First Writeup | 39904 | 0 | 1.0 | .200 | .399 |
| First Cool received on First Writeup | 39904 | 0 | 1.0 | .220 | .414 |
| Familiarization Time | 39904 | 0 | 5.0 | .340 | .786 |

Table 3.2: descriptive statistics for participation and feedback factors in Everything2

[21] This also suggests members' participation and feedback received from others on the
content they contribute represent two different constructs.


3.5 Examining Length of Active Membership Using a Hazard Rate Model
I used a Cox proportional hazard rate model to examine how participation and feedback
factors are associated with length of active membership in the Everything2 community. A Cox
proportional hazard rate model is a statistical technique (commonly referred to as survival
analysis) for analyzing time to an event (Cox, 1972; Cox, 1984). The hazard represents a
specific event and is often interpreted in terms of survival (Wang, 2012). In this study, the event
occurs when an active member ceases being active in the community. The hazard rate is the
probability of the event occurring in a specific period of time, given that it has not occurred
before then. The coefficients of the model are estimated in terms of a member's hazard ratio,
which is the probability of an event occurring in a specific period of time compared to the
corresponding probability for the control (Smith, 2003; Therneau, 2000).
The control is a hypothetical group of constructed members whose values are assigned (Smith,
2003; Therneau, 2000). The values assigned to the control group differ for binary and continuous
variables. For example, for all rate of participation and all rate of
feedback received variables, the assigned values for the control group are the average rate of
participation and the average rate of feedback received per day. For early participation and
early feedback, the control is the group of members who did not participate (submit a first
writeup, submit a first vote) and who did not receive early feedback (first upvote received, first
downvote received) from others. Since this is a ratio, the comparison between the two groups is
performed by dividing the first group by the control. The first group is actually multiple groups,
each of which is compared to the control. For example, one group consists of members with a
one-unit increase in the variable value compared with the control; another group consists of
members with a two-unit increase in the variable value compared with the control. The software
accounts for all of these groups and their variances and produces a single hazard ratio. A hazard
ratio is interpreted based on whether it is greater than or less than 1.000. If it is greater than
1.000, the probability of the event occurring is higher than for the control. If it is less than
1.000, the probability of the event occurring is lower than for the control. If it is 1.000, the
probability of the event occurring is the same for both groups (there is no difference in survival
between the control and the first group).
I used a Cox proportional hazard rate model compared to other statistical models because
identifying the actual point in time that a member becomes inactive is a challenge. Long-term
inactivity does not preclude members from coming back; such interrupted inactivity could add
bias to the results. It is possible that members who have not logged in for the last six months may
log back in. Standard statistical regression models (such as Ordinary Least Squares regression and
Logistic regression) do not accurately estimate time to an event (Wang, 2012; Smith, 2003). A
Cox proportional hazard rate model is used in this case to predict future events or the failure of
an event, such as when an active member becomes inactive and vice versa.
A Cox proportional hazard rate model can estimate the hazard ratio of an event as a
function of multiple explanatory variables (independent variables, commonly referred to as
covariates in the model). This type of model is often used in disease contagion studies, with the
state measured being health status (individual has or has not caught the disease by time t) (Smith,
2003). A Cox proportional hazard rate model can be represented as:

\[
h(t) = h_0(t)\,\exp\Big(\sum_{i} b_i z_i\Big)
\]

where h(t) denotes the hazard at time t (the instantaneous risk that a member becomes inactive,
which determines the length of active membership) given the explanatory variables (zi)
(such as rate of writeups, rate of messages given, rate of votes received, first writeup submitted,
first upvote received, etc., for each individual member). bi is the coefficient for explanatory
variable zi. The term h0(t) is called the baseline hazard for the model. A baseline hazard is the
hazard when all independent variable values are equal to zero.
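The model can be sketched numerically as follows. This is a minimal illustration, not the
software procedure used in the study: the function name `cox_hazard`, the baseline hazard value,
and the coefficients are all hypothetical. It shows that the hazard for a member equals the
baseline hazard when all covariates are zero, and that exp(b) is the hazard ratio for a one-unit
increase in a covariate.

```python
import math


def cox_hazard(baseline_hazard, coefficients, covariates):
    """h(t) = h0(t) * exp(sum_i b_i * z_i) for one member at one time point."""
    linear_predictor = sum(b * z for b, z in zip(coefficients, covariates))
    return baseline_hazard * math.exp(linear_predictor)


# Hypothetical values, for illustration only.
h0 = 0.01                      # assumed baseline hazard at some time t
b = [math.log(2.0), -0.5]      # assumed coefficients; exp(b[0]) is a hazard ratio of 2
z_control = [0.0, 0.0]         # all covariates zero, so the hazard is the baseline hazard
z_member = [1.0, 0.0]          # one-unit increase in the first covariate

ratio = cox_hazard(h0, b, z_member) / cox_hazard(h0, b, z_control)

if __name__ == "__main__":
    print(cox_hazard(h0, b, z_control), ratio)
```

Because the baseline hazard cancels in the ratio, the hazard ratio does not depend on the shape
chosen for h0(t).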
In this case, I used a Gompertz distribution, which is a commonly used statistical
distribution for proportional hazard rate models. A Gompertz distribution is a density function [22]
that can take many different shapes, as it is a flexible distribution.
The dependent variable is length of active membership, which is measured in days. Rates
of participation (rate of writeups submitted, rate of votes submitted, and rate of messages sent)
and rates of feedback received from others (rate of votes received, rate of messages received, rate
of cools received and fraction of deleted writeups), as well as early participation (first vote
submitted or not, first upvote submitted or not, first downvote submitted or not, first message
submitted or not and first deletion of a writeup) and early feedback received from others (first
upvote received or not, first downvote received or not and first cool received or not), are used as
explanatory variables. [23] The EXP(B) column in Table 3.3 gives the estimated coefficient values
for the explanatory variables in terms of hazard ratio, which tells us whether an explanatory
variable in the model is related to the probability of members remaining active in the community.

[22] A probability density function is a function that represents the likelihood that a continuous
random variable takes a specific value. The probability that the variable falls within a given
range of values is given by the integral of this density over that range.
Factors                    Variables                                  Exp(B)   SE      Sig.
Rate of Participation      Rate of Writeups                           1.733*   .060    .000
                           Rate of Messages sent                      2.715*   .083    .000
                           Rate of Votes given                        1.850*   .051    .000
Rate of Feedback Received  Rate of Messages received                  .001     2.789   .010
                           Rate of Votes received                     .993     .646    .992
                           Rate of Cools received                     1.030    .112    .790
                           Fraction of deleted Writeups               .912     .092    .313
Early Participation        First Writeup submitted or not             1.331*   .072    .000
                           First Message submitted or not             .377*    .084    .000
                           First Downvote submitted or not            1.745    .579    .336
                           First Upvote submitted or not              1.174    .081    .048
Early Feedback Received    First Upvote received on First Writeup     1.678*   .133    .000
                           First Downvote received on First Writeup   1.266*   .073    .001
                           First Cool received on First Writeup       4.814    .917    .087
                           Deletion of First Writeup                  .420*    .082    .000
Control                    Familiarization Time                       .916     .034    .009
Table 3.3: hazard rate model results on participation and feedback factors and
length of active membership (*p <.001)

[23] This is just a brief summary of the Cox proportional hazard rate model. For more
information please refer to
http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-cox-regression.pdf
https://mywebspace.wisc.edu/jmullahy/web/basu%20manning%20mullahy.pdf


Due to the large sample size, statistical significance is considered at the p=.001
significance level only (Jensen 2007). Statistical significance should be examined along with the
effect size as indicated by the hazard ratio. This is the same for all of the variables.
3.5.1 How is Length of Active Membership Related to the Rate of Participation? (RQ1)
The coefficient estimates for the explanatory variables and their associated significance
levels from the Cox proportional hazard rate model, where the dependent variable is length of
active membership, are reported in Table 3.3. The study examines how three different
explanatory variables that address RQ1, rate of writeups, rate of messages sent, and rate of votes
given, are related to length of active membership. The model also included a binary status
variable, censored, which was constructed using each member's last login date and last posting
date. The censored variable flags inactive members who may become active again in the future. If
the difference between the last login date and the last posting date is less than sixty days, the
member is considered non-censored. [24]
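The sixty-day rule can be sketched as follows. This is only an illustration of the rule as
described in the text: the function name `is_censored` and the example dates are hypothetical,
and treating a gap of exactly sixty days as censored is an assumption of this sketch.

```python
from datetime import date

CENSOR_THRESHOLD_DAYS = 60  # sixty-day threshold described in the text


def is_censored(last_login, last_post, threshold_days=CENSOR_THRESHOLD_DAYS):
    """True when the gap between last login and last post is not less than the threshold."""
    gap_days = (last_login - last_post).days
    # Assumption: a gap of exactly `threshold_days` counts as censored.
    return gap_days >= threshold_days


if __name__ == "__main__":
    # Hypothetical members, for illustration only.
    print(is_censored(date(2012, 6, 1), date(2012, 1, 1)))   # long gap
    print(is_censored(date(2012, 1, 20), date(2012, 1, 1)))  # short gap
```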

[24] In Everything2, if the time between members' last login and last post exceeds 60
consecutive days, they are not likely to post again and thus they are considered censored. 70%
of members did not submit a post if the censoring time is over sixty days. This implies that 70% of
members were considered active and 30% censored at the beginning of the analysis. The
model treats that 30% as censored, which means they are still included and may come back in
the future. The model estimates the probability of them coming back (internally), reclassifies
them (if necessary) and reports a hazard ratio. All of this is done internally in the software using
algorithms. Previous studies have used 70% as a cutoff point for censoring using survival
analysis (Yang, 2010). In online community research, a 70% cutoff is an accepted cutoff. I use
survival analysis, specifically a Cox proportional hazard rate model, to account for this censoring
of the variables.

In this study, the coefficients from the Cox proportional hazard model are interpreted in
terms of the probability of members remaining active (survival) rather than becoming inactive.
The hazard ratio for the rate of writeups is 1.733, which suggests that a unit increase in writeups
per day (rate of writeups) will increase the probability of members remaining active in the
community by 73.3% {(1.733-1)*100%}; this value is significant at the p =.001 level. [25]
Similarly, the hazard ratio for the rate of messages sent is 2.715, which suggests that a unit
increase in messages sent per day (rate of messages sent) will increase the probability of
members remaining active in the community by 171.5% {(2.715-1)*100%}; this value is
significant at the p =.001 level. The hazard ratio for the rate of votes given is 1.850, which
suggests that a unit increase in votes given per day (rate of votes given) will increase the
probability of members remaining active in the community by 85.0% {(1.850-1)*100%}; the
value is significant at the p =.001 level. These results show that the rate of participation
variables and length of active membership in the community are positively correlated. One
possible explanation could be that as members' participation increases, their involvement in the
community increases and since they invest more time, they have more interest in continuing their
active membership. Another reason could be that some members, who by their nature derive
more pleasure from online communities, are naturally motivated to participate more and to stay
active longer, making members’ rate of participation and length of active membership
determined by members’ personality type.
3.5.2 How is Length of Active Membership Related to Early Participation? (RQ2)
Four different explanatory variables that address RQ2, first writeup submitted or not,
first message submitted or not, first downvote submitted or not, and first upvote submitted or not,
were included in the model. Table 3.3 reports the regression coefficients for early participation
factors. The hazard ratio for first writeup submitted or not is 1.331, which suggests that
submitting a first writeup will increase the probability of members remaining active in the
community by 33.1% {(1.331-1)*100%}; the value is significant at the p =.001 level. [26] Perhaps
submitting their first writeup indicates that they are interested in becoming involved in the
community since it requires more effort on the part of the member to submit a writeup than other
forms of participation. Another reason could be that members who derive more pleasure from
online communities are more likely to submit a first writeup and also more likely to remain
active because of the pleasure derived from submitting writeups. The hazard ratio for first
message submitted or not is .377, which suggests that submitting a first message will reduce the
probability of members remaining active in the community by 62.3% {-(.377-1)*100%}; the
value is significant at the p =.001 level. One possible explanation could be that members' first
message was a reaction to negative comments or downvotes they received on their initial
writeups, which might discourage further participation in the community. [27] Future research
should look into this more deeply.
The hazard ratio for first downvote submitted or not is 1.745, which suggests that
submitting a first downvote will increase the probability of members remaining active in the
community by 74.5% {(1.745-1)*100%}; the value is NOT significant at the p =.001 level.
Similarly, the hazard ratio for first upvote submitted or not is 1.174, which suggests that
submitting a first upvote will increase the probability of members remaining active in the
community by 17.4% {(1.174-1)*100%}; the value is NOT statistically significant at the p =.001
level. These findings suggest that members' early participation has some impact on their
length of active membership, as their first upvote and first downvote did not show any statistical
significance but their first writeup and first message did. Perhaps because submitting a first
writeup and a first message requires more effort on the part of the member than submitting upvotes
and downvotes, members are more concerned about these forms of early participation.

[25] This is a comparison between a group of members whose rate of participation (i.e.,
writeups, votes, comments) increases by one unit and the control group of members
whose rate of participation is the average. The unit increase
implies that an explanatory variable (continuous in nature, such as rate of writeups) increases by
one standard deviation per day.
[26] This is a comparison between a group of members who engaged in early participation
(i.e., first writeup, first message, first vote) and the control group of members who did
not.
[27] I have speculated on a possible reason based on the results from the hazard rate model.
3.5.3 How is Length of Active Membership Related to Rate of Feedback Received? (RQ3)
Four different explanatory variables that address RQ3, rate of messages received, rate of
votes received, rate of cools received, and fraction of deleted writeups, were included in the
model. Table 3.3 reports the regression coefficients for rate of feedback received.
The hazard ratio for the rate of messages received is .001, which suggests that a unit
increase in messages received per day (rate of messages received) will reduce the probability of
members remaining active in the community by 99.9% {-(.001-1)*100%}; the value is NOT
significant at the p =.001 level. [28] The hazard ratio for the rate of votes received is .993, which
suggests that a unit increase in votes received per day (rate of votes received) will reduce the
probability of members remaining active in the community by .7% {-(.993-1)*100%}; the value
is NOT significant at the p =.001 level. The negative correlation between rate of votes received
and length of active membership is perhaps due to the fact that a downvote has more of an
impact than an upvote on members' desire to participate, and for reasons explained in footnote
14, rate of votes combines downvotes and upvotes. The hazard ratio for the rate of cools received
is 1.030, which suggests that a unit increase in cools received per day will increase the
probability of members remaining active in the community by 3% {(1.030-1)*100%}; the value
is NOT significant at the p =.001 level. The hazard ratio for the fraction of deleted writeups is
.912, which suggests that a unit increase in the fraction of deleted writeups will reduce the
probability of members remaining active in the community by 8.8% {-(.912-1)*100%}; the value
is NOT significant at the p =.001 level.
The results show that none of the feedback variables are significantly correlated with
length of active membership. Perhaps it is members' rate of participation and not the rate of
feedback received from others on the content that was more important to individual members in
choosing to continue using a community. Rate of participation appears to be more important
when members decide whether to remain active in this online community than rate of feedback
received on the content submitted.

[28] This is a comparison between a group of members whose rate of feedback received
from others (i.e., votes, messages, cools) increases by one unit and the control group of
members whose rate of feedback received from others is the average.
3.5.4 How is Length of Active Membership Related to Early Feedback Received? (RQ4)
Four different explanatory variables that address RQ4, first cool received on first writeup,
first upvote received on first writeup, deletion of first writeup, and first downvote received on
first writeup, were included in the model. Table 3.3 reports how the regression coefficients for
early feedback received are related to length of active membership.
The hazard ratio for first cool received on first writeup is 4.814, which suggests that
receiving a first cool on first writeup will increase the probability of members remaining active
in the community by 381.4% {(4.814-1)*100%}; the value is NOT significant at the p=.001
level. [29] The hazard ratio for first upvote received on first writeup is 1.678, which suggests that
receiving a first upvote on first writeup will increase the probability of members remaining
active in the community by 67.8% {(1.678-1)*100%}; the value is significant at the p =.001
level. It could be that members placed a high value on receiving an upvote. Maybe receiving a
cool is a rare event, so not enough members receive a cool to make it significant. Upvotes could
occur more frequently because members do not have to reach the same status to submit an
upvote as to submit a cool. Alternatively, if members are introduced to the community by other
members, they are more likely to receive an upvote on the first writeup from their friends and
likely to remain longer because they value the connections to their friends.
The hazard ratio for first downvote received on first writeup is 1.266, which suggests
that receiving a first downvote on first writeup will increase the probability of members
remaining active in the community by 26.6% {(1.266-1)*100%}; the value is significant at the p
=.001 level. Perhaps getting recognition on a first writeup is enough to encourage members
to remain active, even if it is negative. Previous studies have reported that some forms of feedback
encourage members to post a second time (Lampe, 2005; Joyce, 2006). Future research should
look into this in more depth. The hazard ratio for deletion of first writeup is .420, which suggests
that receiving a first deletion on a writeup will reduce the probability of members remaining
active in the community by 58% {-(.420-1)*100%}; the value is significant at the p =.001 level.
One possible explanation could be that only editors can delete a writeup, so this is stronger
negative feedback than a downvote because members may place more value on feedback from
editors than feedback from other members. These results suggest that early feedback received
from others has significant impact on length of active membership, as three of the variables show
significance. Based on these results, members appear to place more emphasis on the feedback
they receive from their first writeup than on feedback received from subsequent writeups.

[29] This is a comparison between a group of members who received early feedback (i.e.,
first upvote on a writeup, first downvote on a writeup, first message, and first cool on a writeup)
and the control group of members who did not receive early feedback.
Even though receiving a first cool was not significant, the higher coefficient for receiving a
first cool could suggest that members' contributions are of high quality and are valued by
members with higher XP. It can be expected that members who are producing higher quality
posts are likely to stay active in the community. Hence, first cool received on a first writeup can
be an indicator to administrators of whether a member is likely to remain active.
Familiarization time was used as a control in the model to account for how long a
member takes to start participating in a community. The hazard ratio for familiarization time is
.916, which suggests that a unit increase in familiarization time will reduce the probability of
members remaining active in the community by 8.4% {-(.916-1)*100%}; the value is NOT
significant at the p =.001 level. This suggests that the longer a member’s familiarization time, the
shorter their length of active membership in a community. One possible explanation could be
that members that take longer to post have a harder time understanding the community and find
it more difficult to use when they start using the different features. This could make them less
likely to remain active. On the other hand, it could be that members who take longer to start
participating just have less interest in the community to begin with and for this reason they are
likely to cease being active earlier.
Please refer to Appendix C for a length of active membership survival graph and to
Appendix D for more information about the model’s fit.
3.6 Examining Causal Links for the Everything2 Community (Granger Causality Tests)
Because there are plausible non-causal explanations for the statistically significant
relationships revealed by the Cox hazard regression, I also employ a Granger causality test to
more rigorously test for causality. It is important to note that if the results from a Granger
causality test are not statistically significant, we can rule out any possible causal links among
variables. However, if the results from a Granger causality test are statistically significant, the
evidence for a causal relationship is only stronger.

The intuition behind a Granger causality test is that if changes in one variable cause
changes in a second variable, then the value of the first variable in any given period should be
correlated with the value of the second variable in a subsequent period or periods (Granger,
1969). [30] A Granger causality test provides stronger evidence for or against causality than the
statistical significance of simple regression coefficients. The temporal evidence for causality is
derived from time series data. [31] In this study, the model relies on the prediction that length of
active membership in period N will be correlated with participation in the community during
periods N-1, N-2, etc. A Granger causality test simultaneously tests for each direction in which a
causal relationship might run between two variables. In this study, I examine whether members'
participation is causally linked to their length of active membership and whether the length of
active membership is causally linked to members’ participation.
3.7 How is Length of Active Membership Affected by the Rate of Participation? (RQ1)
3.7.1 Members’ Participation Affecting Length of Membership
A Granger causality test examines the relationship between two variables using lag
orders, where a lag order is the number of measurement periods for explanatory variables that is
included in the model. A period could be of any length, such as a day, a month, or a year. For
this study, a single lag order is a period of six months. I used the statistical software package
STATA to determine the period of lag order and conduct the Granger causality test. STATA
selected six months as the appropriate unit of time for measuring lags based on its assessment of
the data. [32]

[30] A Granger causality test establishes whether a causal link among variables exists based
on correlations among variables (current instance and previous instances among
variables) from time series data.
[31] I am unable to ascertain true causality, as I do not have information on why the
members actually left. Using a time series model, I am attempting to ascertain possible causes
(by moving backward in time).

Using the server timestamps from the logs in Everything2, I derived lag orders for
explanatory variables (rate of writeups, rate of votes given, rate of messages) and the dependent
variable (length of active membership). I examined the possible causality between explanatory
variables and the dependent variable. For example, the server log contains timestamps for
writeups, timestamps for votes, and timestamps for messages. From these timestamps, I
constructed lagged values for variables and examined the possible causal links among variables.
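One way such lagged values could be built from event timestamps is sketched below. This is an
assumed procedure for illustration, not the one used in STATA: the function names are
hypothetical, periods are counted in whole calendar months from a member's start date, and each
period is six months long as in this study.

```python
from collections import Counter
from datetime import datetime


def period_index(start, timestamp, months_per_period=6):
    """Index of the period (0, 1, 2, ...) that a timestamp falls into."""
    months = (timestamp.year - start.year) * 12 + (timestamp.month - start.month)
    return months // months_per_period


def counts_per_period(start, timestamps, n_periods):
    """Number of events (e.g., writeups) in each consecutive period."""
    counts = Counter(period_index(start, ts) for ts in timestamps)
    return [counts.get(i, 0) for i in range(n_periods)]


def lagged(series, lag):
    """Pairs (value at period t, value at period t - lag)."""
    return [(series[t], series[t - lag]) for t in range(lag, len(series))]


if __name__ == "__main__":
    # Hypothetical member, for illustration only.
    start = datetime(2010, 1, 15)
    writeups = [datetime(2010, 2, 1), datetime(2010, 5, 30),
                datetime(2010, 8, 1), datetime(2011, 3, 3)]
    series = counts_per_period(start, writeups, n_periods=3)
    print(series, lagged(series, lag=1))
```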
For a Granger test, time series data, a collection of observations made sequentially in
time, is decomposed into stationary trends and residuals (often known as random shocks). A
time series is called stationary if certain statistical properties (mean, standard deviation, and
autocorrelation) of the time series are constant. In a dynamic world, no trend will be stationary
forever. However, if a time series contains stable trends during the observation period of time,
the time series is termed trend stationary.
The Granger test, which utilizes a time series model, can be represented (in terms of
participation and length of active membership) with the following equation:
\[
\mathrm{MembershipLength}(t) = \sum_{j=1}^{p} C_{1j}\,\mathrm{Participation}(t-j)
  + \sum_{j=1}^{p} C_{2j}\,\mathrm{MembershipLength}(t-j) + U_{1t},
\]

where t is the current time, p is the maximum number of lagged observations included in the
model (the model order), j is the lag order and can take any value from 1 through maximum lag
order p, MembershipLength is the length of active membership, Participation is the rate of
participation, C2j is the coefficient of Membership Length for lag order j, C1j is the coefficient of
Participation for lag order j, and U1t is the model’s residuals (shocks) at the time t.

[32] A six month period for a lag order was selected based on the assessment of the data
when all dependent and explanatory variables are taken into account.
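The logic of the lagged regression can be sketched in plain Python. This is not the STATA
procedure used in the study, which reports a chi-square (Wald) statistic from a VAR. The sketch
swaps in the closely related F-test that compares a regression on the effect's own lags with a
regression that also includes lags of the candidate cause. The function names, the added
intercept, and the simulated data are all assumptions made for illustration.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def rss(rows, target):
    """Residual sum of squares of an ordinary least squares fit (normal equations)."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((y - sum(bi * ri for bi, ri in zip(beta, r))) ** 2
               for r, y in zip(rows, target))


def granger_f(cause, effect, p):
    """F statistic for H0: the p lags of `cause` add nothing to predicting `effect`."""
    n = len(effect)
    target = effect[p:]
    # Restricted model: intercept plus the effect's own p lags.
    own = [[1.0] + [effect[t - j] for j in range(1, p + 1)] for t in range(p, n)]
    # Unrestricted model: additionally includes p lags of the candidate cause.
    full = [row + [cause[t - j] for j in range(1, p + 1)]
            for row, t in zip(own, range(p, n))]
    rss_r, rss_u = rss(own, target), rss(full, target)
    dof = len(target) - len(full[0])
    return ((rss_r - rss_u) / p) / (rss_u / dof)


# Simulated data in which x drives y with a one-period delay (illustration only).
random.seed(0)
x = [random.gauss(0, 1) for _ in range(200)]
y = [0.0] + [0.8 * x[t - 1] + 0.1 * random.gauss(0, 1) for t in range(1, 200)]
f_xy = granger_f(x, y, p=1)  # does x help predict y?
f_yx = granger_f(y, x, p=1)  # does y help predict x?
```

In the simulated data the statistic for the direction from x to y is far larger than the
statistic for the reverse direction, which mirrors how a Granger test examines each direction of
a possible causal relationship.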
To select the number of lags (lag order) to consider for a time series model, I employed
five lag order selection statistics (tests) reported by STATA. I used the dependent variable and
explanatory variables to find the correct lag order for the model. The five tests, the Likelihood
Ratio test (LR), Akaike Information Criterion (AIC), Hannan-Quinn Information Criterion (HQIC),
Schwarz's Bayesian Information Criterion (SBIC), and Akaike's Final Prediction Error (FPE),
were used to test for the appropriate lag order. (Please refer to Appendix E for more information
about the five tests.) A maximum lag order of 2 was suggested by LR, FPE, and AIC, whereas
a lag order of 0 was suggested by the HQIC and SBIC criteria (refer to Appendix F). A lag
order of 2, based on what the majority of the tests suggested (three of the five), was selected
for the Granger causality test. [33] AIC, SBIC, HQIC, LR, and FPE were run
for all variables, including the dependent variable, which is the length of active membership. The
algorithms applied complex statistical tests/models (in an FPE test, the expected variance of the
error is measured when an autoregressive time series is fitted against another time series of
similar covariance structure) and suggested a lag order. It is important to note that, for this
study, a lag order of 2 is a period of 12 months. Members who had been members for less than
a year were still included in the Granger causality test, which used data on a member only
for the period the member was active.

[33] The approach I used to select a lag order is often referred to as a majority vote approach in
the field of information science, and more precisely, in data mining and in machine learning.
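The majority vote over the five selection statistics can be sketched as follows. The suggested
lag orders are the ones reported in the text; the function name `majority_lag` is hypothetical,
and ties (which do not occur here) would be resolved by whichever value appears first.

```python
from collections import Counter

# Lag orders suggested by each statistic, as reported in the text.
suggested = {"LR": 2, "FPE": 2, "AIC": 2, "HQIC": 0, "SBIC": 0}


def majority_lag(suggestions):
    """Lag order proposed by the largest number of selection statistics."""
    counts = Counter(suggestions.values())
    lag, votes = counts.most_common(1)[0]
    return lag, votes


lag, votes = majority_lag(suggested)
months = lag * 6  # one lag period is six months in this study

if __name__ == "__main__":
    print(lag, votes, months)
```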

The Granger causality test was conducted using a Vector Auto Regression (VAR). [34]
Table 3.4 reports the results for a Granger causality test between rate of participation factors and
length of active membership. The results showed that higher rates of participation do NOT cause
members to stay active longer because the chi-square values were not statistically significant at
the p=.001 level.

Dependent Variable     Explanatory Variables    Chi-square   Probability
Length of Membership   Rate of Writeups         .219         0.896
Length of Membership   Rate of Messages sent    .230         0.891
Length of Membership   Rate of Votes given      .191         0.909
Length of Membership   ALL                      .690         0.995
Table 3.4: Granger causality results whether members’ rate of participation
causes their length of active membership (*p < .001)
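As a rough consistency check on the single-variable rows of Table 3.4, the probability for a
chi-square statistic can be recomputed by hand. The sketch assumes that each single-variable
test has 2 degrees of freedom (one per lag at the selected lag order of 2). That is an
assumption of this sketch, not something stated in the text. For 2 degrees of freedom the
upper-tail probability has the closed form exp(-x/2).

```python
import math


def chi2_sf_df2(statistic):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-statistic / 2.0)


# (chi-square, reported probability) for the single-variable rows of Table 3.4.
reported = {
    "Rate of Writeups": (0.219, 0.896),
    "Rate of Messages sent": (0.230, 0.891),
    "Rate of Votes given": (0.191, 0.909),
}

recomputed = {name: round(chi2_sf_df2(stat), 3) for name, (stat, _) in reported.items()}

if __name__ == "__main__":
    for name, (stat, prob) in reported.items():
        print(name, stat, prob, recomputed[name])
```

Under that assumption the recomputed probabilities agree with the reported ones to three
decimal places, all far above the p=.001 threshold.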

[34] A Granger causality test can be derived using a VAR model. A VAR model can be
represented as a time series consisting of two variables (x and y), where yp (value of variable y
for time p) can be represented in terms of its past values and past values for the variable x. If x
Granger causes y, some or all lagged x values will have non-zero coefficients. A Vector Auto
Regression (VAR) is a statistical regression model. In a VAR model, multiple time series are
used to estimate linear dependencies among variables. Each variable can be considered evolving
from its own lags, and lags from other variables. In a VAR equation, a set of variables is used.
Each variable is represented as a linear function of v lags of itself and of all of the remaining
variables in the equation. An error term is also included. A first order VAR(1) for n variables
collected in an nx1 vector yt can be represented as yt = b(0) + b1*y(t-1) + q(t), where the element
q(t) is the error term, which can be represented as the iid normal (a diagonal matrix); b(0) is an
nx1 vector which represents a constant term in the equation. A VAR(1) model should satisfy the
following matrix equations E(v,v')= W and E(Vt,Vt-j)=0, where W is a positive semi-definite matrix
containing error terms in nxn dimensions and E(Vt,Vt-j)=0 indicates that every error term in the
equation has a mean of zero. The dependencies among variables are represented by the matrix b1
and the contemporaneous dependence is determined by the term q(t). The results from a Granger
causality test (null hypothesis is supported or not) can be determined (based on the chi-square
values and the associated probability values) from a Wald test. This is just a brief summary of a
Granger causality test and a VAR model.
For more information please refer to
http://academic.reed.edu/economics/parker/s13/312/tschapters/S13_Ch_5.pdf

Due to the large sample size, statistical significance is considered at the p=.001 significance
level only and should be examined along with the effect size (Jensen 2007).
3.7.2 Concluding Remarks
The Granger causality tests showed that a higher rate of participation does NOT cause
community members to remain active longer. A hazard rate model showed that correlation does
exist between all participation variables (rate of writeups, rate of messages given, and rate of
votes given) and length of active membership. Even though length of active membership and
members' rates of participation are correlated, the Granger causality tests showed no causality.
3.8 How is Length of Active Membership Affected by Rate of Feedback? (RQ3)
3.8.1 Feedback Received from Others Affecting Length of Membership
Using the server timestamps from the logs, I constructed lagged values for explanatory
variables (rate of votes received, rate of messages received, rate of cools received, fraction of
deleted writeups) and the dependent variable (length of active membership), and tested for
causality between the explanatory variables and the dependent variable.
A Granger causality test between feedback received from others on member supplied
content and length of active membership can be conducted with the following equation:
\[
\mathrm{MembershipLength}(t) = \sum_{j=1}^{p} C_{5j}\,\mathrm{Feedback}(t-j)
  + \sum_{j=1}^{p} C_{6j}\,\mathrm{MembershipLength}(t-j) + U_{3t},
\]

where t is the current time, p is the maximum number of lagged observations included in the
model (the model order), j is the lag order and can take any value from 1 through maximum lag
order p, MembershipLength is the length of active membership, Feedback is the rate of feedback
received from others, C6j is the coefficient of Membership Length for lag order j, C5j is the
coefficient of Feedback for lag order j, and U3t is the model’s residuals (shocks) at the time t.
To select the number of lags (lag order) for the time series model, I employed five lag
order selection statistics (test) reported by STATA. Five tests, the Likelihood Ratio test (LR),
Akaike Information Criteria (AIC), Hannan-Quinn Information Criteria (HQIC), Schwarz'
Bayesian Information Criterion (SBIC), and Akaike's Final Prediction Error (FPE) were used.
LR, FPE, and AIC suggested a maximum lag order of 1, whereas HQIC and SBIC information
criteria suggested a lag order of 0. A lag order of 1, as suggested by the majority of the tests, was
selected for the Granger causality test. (Refer to Appendix G). The tests were run for all
variables.
A Granger causality test was conducted with a Vector Auto Regression (VAR). Please
refer to Table 3.5 to review the results from the Granger causality test. The Granger causality test
results showed that feedback received from others does NOT cause length of active membership
as the chi-square values were not significant at the p=.001 level for any of the explanatory
variables.

Dependent Variable     Explanatory Variable            Chi-square   Probability
Length of Membership   Rate of Messages received       1.342        0.511
Length of Membership   Rate of Votes received          1.509        0.219
Length of Membership   Rate of Cools received          1.882        0.628
Length of Membership   Fraction of deleted Writeups    1.314        0.390
Length of Membership   ALL                             4.704        0.582
Table 3.5: Granger causality results whether rate of feedback received causes
length of active membership (*p < .001)

It is important to note that the Cox proportional hazard rate results did not show
significant correlation between rate of feedback received and length of active membership.
Based on the Cox proportional hazard rate results, causality between rate of feedback received
and length of active membership can be ruled out. However, a Granger causality test was still
conducted. The results from a Granger causality test can be viewed as additional validation of the
results of the Cox proportional hazard rate model.
3.8.2 Concluding Remarks
The Granger causality tests showed that more feedback received from others does NOT
cause community members to remain active longer. A hazard rate model showed that no
significant correlation (at the p=.001 level; the probability values for the corresponding chi-square
statistics are greater than 1%) exists between feedback members received from others and their length of
active membership. The Granger Causality test supports the no causality interpretation of the
hazard model results.
3.9 Granger Causality Tests in Regards to Early Participation and Early Feedback
Received from Others on the Content
It is important to note that I could not conduct a Granger Causality Test of whether
members' early participation influences their lengths of active membership (RQ2) or whether
early feedback received from others influences length of active membership (RQ4). A Granger
causality test examines whether changes in one variable may cause changes in a second variable
using past values of both variables. In this dissertation, I used members' first participation and
first feedback received from others as measures of early participation and early feedback
received from others. Because there is at most one event recorded for each early participation
variable and each early feedback received variable, there are no prior observations on which a
Granger causality test could be based.
3.10 Chapter Summary
Several possible factors that may contribute to length of active membership, which may
in turn contribute to the viability of online communities, were identified and tested in this
chapter. These factors were tested for the online community, Everything2, to determine whether
or not they are related to length of active membership using two rigorous statistical tests, a Cox
proportional hazard rate model and a Granger causality test. First, a Cox proportional hazard
rate model was introduced, and then the results from the test were presented. A hazard rate
model tests for any correlation evidence between two variables. The results showed that all three
rate of participation variables (rate of writeups submitted, rate of votes given and rate of
messages sent), one early participation variable (first message submitted or not) and one early
feedback variable (first cool received on a first writeup) were correlated with length of active
membership. A Granger causality test was then conducted and the results from the test were
presented. Even though the hazard rate model found correlational evidence between rate of
participation variables and length of active membership, the stronger Granger causality test
found no evidence for causal relationships, suggesting that in this case correlation was not
evidence of causation. It also found no evidence for causal relationships between rate of
feedback received variables and length of active membership.

CHAPTER 4
Examining Length of Active Membership for Sploder: A Quantitative
Study
Chapter 3 reviewed the research literature on online communities that is most relevant to
a study of factors that influence the amounts of time members in these communities remain
active. From the perspective of determining what factors influence length of active membership,
it identified five limitations of the earlier work: (1) that earlier research was not focused
explicitly on factors influencing length of active membership; (2) that earlier work looked at the
effects of total levels of participation and amounts of feedback received over time but did not
consider the implications of variation among users in rates of participation and feedback received
or variation in the length of time over which different individuals' total participation and feedback
received measures were calculated; (3) that the lengths of time covered by almost all prior
studies were short relative to the length of time members might remain active in their
communities; (4) that prior research treated measures of correlation between dependent and
independent variables as evidence of causation even though there were plausible non-causal
explanations for the observed relationships among variables; and (5) that the focus of previous
studies on single online communities limited the claims that might be made for the generality of
their findings.
The empirical study of the online community Everything2 presented in Chapter 3
addressed the first four of these limitations. This chapter replicates the study of Everything2 for
a second online community, Sploder. In addition to providing an additional study that addresses
the first four limitations of the earlier literature, replicating the Everything2 study design for a
second online community makes possible a more direct and meaningful cross-study comparison
of findings than was possible before and allows us to draw stronger conclusions about the extent
to which empirical findings for one online community generalize to other online communities.
The chapter first describes the Sploder server log data made available for this research
and the operationalization of variables for this study. It then examines length of active
membership through the lens of a Cox proportional hazard rate model, a statistical technique (a
type of survival analysis) that I use to examine the influence of explanatory variables
(independent variables) on the length of time from registering that a member remains active in a
community. Hazard rate models are used to examine factors that may influence the amount of
time that elapses before a discrete event, such as an individual catching a disease or adopting a
new product, occurs. For this chapter’s study, as for that of Chapter 3, the event is cessation of
active membership. Previous studies have used hazard rate models to examine correlations
between members' participation and their continued use of a community (Wang, 2012; Yang,
2010; Farzan, 2011). Because a hazard model identifies correlational evidence, and it is
well-known that correlational evidence is not dispositive proof of causation, and in the case of the
relationships examined here, there are plausible non-causal explanations for the identified
correlations, this dissertation also uses a Granger causality test to more rigorously test for causal
links between members’ participation and length of active membership and between feedback
received and length of active membership.
4.1 Overview of Sploder
Sploder (http://www.sploder.com/) is an informal, peer production community (a
community where members work together to create a better quality product) for adolescents and
young adults that allows its members to build their own games, and then share them with other
members. For users to build games and interact with others, they first have to register as
members. Membership is free and once registered, members are never evicted, even though they
may have been inactive for a long time. Sploder allows its members to join the Sploder
discussion forum to discuss game-related topics, where they submit posts on a topic, which is an
article-like entry called a discussion article.[35] This study focuses only on members of the
Sploder discussion forum.
Other members can provide feedback on a discussion article through whispers and
comments, or simply vote on the discussion article. Whispers are private communications
between an author and a member or an author and an editor; comments are public
communications made on a discussion article. Votes in Sploder are only positive; the community
does not allow negative votes. The community is managed and administered by a group of volunteer
editors who are appointed by the community manager (owner). Editors can delete comments or
discussion articles from the forum if they feel they do not meet community standards.
4.2 Data and Operationalization of Variables in Sploder
I examine length of active membership in Sploder where length of active membership is
operationalized as the number of days from the date a member’s account is created to the
member’s last login date, both of which are recorded on the server log. Prior literature suggests
that members’ participation and feedback received from other members should contribute to
length of active membership. To study factors that may affect members’ length of active
membership, I have constructed the following measures of participation. Rate of participation
measures are: discussion articles submitted, votes given, whispers sent, and comments given per
unit of time. Early participation measures are: first discussion article submitted, first vote
submitted, first whisper submitted, and first comment submitted. Similarly, to examine the
effects of different types of feedback received on length of active membership, I have
constructed rate and early measures of feedback received. Rate of feedback received
measures are: votes received, whispers received, comments received, and fraction of discussion
articles deleted. Measures of early feedback received are: first vote received on first discussion
article, first comment received on a discussion article, and deletion of first discussion article. These
measures have been constructed from data preserved on Sploder server logs.[36] The goal is to
examine whether members’ participation (rate of participation and early participation) and
feedback received from others (rate of feedback received and early feedback received) on their
content are related to their length of membership.
[35] It is important to note that the Sploder discussion forum is a community on its own.
Membership in the discussion forum requires separate registration and login. Membership can
overlap between the original game site and the discussion forum. Being a member of one does
not imply membership in the other. In this study, I analyzed the data from the Sploder discussion
forum only.
I first use a Cox proportional hazard rate model to estimate the effects of participation
and feedback received on length of active membership. Then I use a Granger causality test to
examine the extent to which the correlational evidence generated by the hazard rate model does
in fact reflect causality between the dependent variable and the independent variables in the
model.
[36] Throughout this study, when feedback is mentioned it always refers to feedback a
member receives from other members. Participation always refers to the participation of the
member whose length of active membership is being examined. Some forms of feedback
(especially votes) can be participation for others. If a member submits discussion articles, votes,
comments or whispers, it is termed as participation, and if the same member (based on userid)
receives votes or comments from others, it is termed as feedback. It is also important to note,
whenever early participation or rate of participation is mentioned, this refers to all participation
variables as a group (discussion articles submitted, votes given, comments sent). Likewise,
whenever early feedback received or rate of feedback received is mentioned, this refers to all
feedback variables as a group (messages received, votes received, cools received, and deletion of
discussion articles).
4.3 The Sploder Data-Set
The server log contains information on all (5,373) members who created an account from
February 15, 2008 to September 23, 2010 in the Sploder community. Table 4.1 gives the
definitions for the participation and feedback variables employed for this study. The data set
contains time stamps for members' last login and members' account creation dates and times.[37] It
also contains total counts from the first login for each type of participation variable and total
counts for each type of feedback variable, which I have used to calculate rates of participation
and rates of feedback received. The data set contains timestamps for participation activities
recorded for each member (such as discussion article submitted, vote sent, whisper given, or
comment given) and, for each member, timestamps for feedback received (such as a vote
received, a comment received, a whisper received, or deletion of a post).[38]
[37] The data I had access to did not contain other login information, only the last login.
While other logins may have been captured, I did not receive that information.
[38] The server logs I gained access to only contain early feedback (such as first vote
submitted on first discussion article, first comment submitted on first discussion article) and
early participation of members (such as first discussion article submitted or not, first vote
submitted or not, first comment submitted or not) and total counts of feedback and participation.
They do not contain timestamps for every individual participation activity or every individual
feedback received.
Also, Sploder did not automatically log a member out if the member closed the
browser without signing out. In this instance, the server logs did not capture the next login. This
is not accounted for in the analysis as this information is not recorded in the server log.
Factors | Variables | Description
 | Length of Active Membership | Amount of time a member has been active (last login date - account creation date) in the community
Rate of Participation | Rate of Discussion Articles Submitted | The total count of discussion articles a member submitted divided by the amount of time the member was active in the community
Rate of Participation | Rate of Comments sent | The total count of comments a member sent divided by the amount of time the member was active in the community
Rate of Participation | Rate of Votes given | The total count of votes a member submitted divided by the amount of time the member was active in the community
Rate of Participation | Rate of Whispers given | The total count of whispers a member submitted divided by the amount of time the member was active in the community
Rate of Feedback Received | Rate of Comments received | The total count of comments a member received divided by the amount of time the member was active in the community
Rate of Feedback Received | Rate of Votes received | The total count of votes a member received divided by the amount of time the member was active in the community
Rate of Feedback Received | Rate of Whispers received | The total count of whispers a member received divided by the amount of time the member was active in the community
Rate of Feedback Received | Fraction of deleted Discussion articles | The total deleted discussion articles divided by the total amount submitted
Early Participation | First Discussion Article submitted or not | Whether or not a member has submitted a first post (discussion article)
Early Participation | First Comment sent or not | Whether or not a member sent a first comment
Early Participation | First Whisper submitted or not | Whether or not a member submitted a first whisper
Early Participation | First vote submitted or not | Whether or not a member submitted a first vote or not
Early Feedback Received | First Vote received on First Discussion article | Whether or not a member received a first vote on first discussion article
Early Feedback Received | First Comment received on First Discussion article | Whether or not a member received a first whisper on first discussion article
Early Feedback Received | First Deletion of a Discussion article | Whether or not a member received a first deletion on a discussion article
Control | Familiarization Time | The amount of time between a member creating an account and posting a first discussion article
Table 4.1: description of participation and feedback variables in Sploder

I used rate of discussion articles submitted, rate of comments sent, rate of votes given and
rate of whispers given as participation variables. With everything else equal, users who are
members of a community for a longer period will have higher total participation counts (the sum of
each type of participation measure, such as sum of posts, sum of votes and sum of comments)
than members who are in the community for a shorter period of time (Wang, 2012). I used rate as
a measure because of the analytical questions raised by unequal lengths of participation as
discussed above. Rates were computed by dividing a member’s total count for a variable by the
member’s total length of active membership. For example, rate of discussion articles is
calculated by dividing members' total discussion articles submitted by their length of active
membership.
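The rate calculation described above can be sketched as follows. This is a minimal illustration rather than the code used for the analysis; the function names and the example member (30 discussion articles over 120 days) are hypothetical.

```python
from datetime import datetime

SECONDS_PER_DAY = 86400.0


def active_days(account_created, last_login):
    """Length of active membership in days: last login minus account creation."""
    return (last_login - account_created).total_seconds() / SECONDS_PER_DAY


def rate(total_count, days_active):
    """Total count for a variable divided by the length of active membership."""
    if days_active <= 0:
        raise ValueError("length of active membership must be positive")
    return total_count / days_active


# Hypothetical member: account created 1 March 2008, last login 29 June 2008.
created = datetime(2008, 3, 1)
last_login = datetime(2008, 6, 29)
days = active_days(created, last_login)
print(days, rate(30, days))
```

Dividing by the length of active membership puts long-standing and short-lived members on the same per-day scale, which is the reason given above for using rates instead of totals.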
Participation activities for members vary from community to community. In Sploder,
these participation activities (features) are broken down into a relatively small number of discrete
types of activities that are explicitly named and supported by the community, such as submitting
a discussion article, submitting a vote, submitting a comment and submitting a whisper. Prior
research suggests that if members use these features, they may become more engaged in the
community and stay active longer.
I used four early participation variables to examine the effects of early participation on
length of active membership. They were whether or not a member submitted a first discussion
article, whether or not a member submitted a first comment, whether or not a member submitted
a first vote, and whether or not a member submitted a first whisper. Early participation variables
are binary.
Two types of feedback variables were constructed from the server logs: general feedback
variables and early feedback variables. General feedback variables include rate of comments
received, rate of votes received, rate of whispers received and fraction of deleted discussion
articles. The fraction of deleted discussion articles was computed by dividing total deleted
discussion articles by total discussion articles.[39]
Three different variables were used to examine how length of active membership is
related to early feedback received from others on the content. The early positive feedback
variable was whether or not a member received a first vote on a first discussion article. Sploder
does not support negative votes on members' posts, but I included deletion of first discussion
article as an early negative feedback variable. I have also included whether or not a member
received a first comment on a first discussion article.[40] These variables are all binary.

Familiarization time—time between creation of an account and posting the first
discussion article —was included as a control variable in the analyses. When members join a
community, it takes a certain amount of time to become familiar with the community before they
submit their first post (discussion article). Familiarization time provides a measure of how long a
member takes to become familiar enough with both the use of the site’s features and norms of
participation in a community to become an active participant in the community.
[39] Fraction of deleted discussion articles is also included as a control variable. While not
a rate, it can be multiplied by rate of discussion articles submitted to derive a rate of deleted
discussion articles. For exposition convenience, I will list fraction of deleted discussion articles as
one of the rate of feedback received variables for the remainder of this chapter.
[40] First comment received on first discussion article can be positive or negative. Since
the content of a comment is not analyzed in this study, it is unknown whether comments are
positive or negative. Nevertheless, receiving a first comment on the discussion article from
others may encourage members to continue using the community as they received some form of
early feedback. Moreover, from an administrator’s point of view, knowing whether comments
will increase or decrease active membership can be an indicator where they should invest time to
grow or nurture the community.
4.4 Data and Measures
First, I ran a diagnostic test on the data. I ran a missing value analysis test (MVA test).
The MVA test detected both missing values and outliers in the data. The test used a grid search
approach to detect missing values. It also used the range (Mean + 2*SD, Mean - 2*SD, where
SD is the standard deviation) to detect outliers. The analysis found 23 rows with missing values
or outliers in the data. These 23 rows were removed from the sample.[41]
Members whose length of active membership was less than a day (based on the Unix
timestamp) in the community were removed from the dataset. This reduced the sample size to
1,982 unique members.[42] The removed members created an account in the community but
did not login after a day. It is important to note that registered members who did not login after a
day also did not post anything.
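The screening steps described above can be sketched as follows. This is only an illustration under stated assumptions: the actual MVA procedure was run in a statistics package, the function name `screen_members` and the record layout are hypothetical, and applying the mean plus or minus two standard deviations range one variable at a time is an assumption, since the text does not say how the range was applied across variables.

```python
import statistics


def screen_members(members, fields):
    """Drop rows with missing values, rows outside mean +/- 2 SD, and rows active < 1 day."""
    # Step 1: remove rows with a missing value in any screened field.
    complete = [m for m in members if all(m[f] is not None for f in fields)]
    # Step 2: remove rows outside mean +/- 2 SD for each screened field (assumed per field).
    kept = complete
    for f in fields:
        values = [m[f] for m in complete]
        mean, sd = statistics.mean(values), statistics.stdev(values)
        kept = [m for m in kept if mean - 2 * sd <= m[f] <= mean + 2 * sd]
    # Step 3: remove members whose length of active membership is under one day.
    return [m for m in kept if m["active_days"] >= 1.0]


# Hypothetical records, not taken from the Sploder logs.
members = [{"id": i, "active_days": 10.0, "rate": 1.0} for i in range(9)]
members.append({"id": 9, "active_days": 10.0, "rate": 100.0})   # outlier
members.append({"id": 10, "active_days": 10.0, "rate": None})   # missing value
members.append({"id": 11, "active_days": 0.5, "rate": 1.0})     # active under a day
print([m["id"] for m in screen_members(members, ["rate"])])
```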
[41] I also produced a P-P plot (normal probability plot) to validate the MVA
test. A probability plot reports values against a straight line and shows deviation of points from
the straight line. The P-P plot also reported 23 missing values.
[42] I also conducted a separate analysis excluding members who did not login after 60
days from their account creation; these members have a length of active membership of less than
60 days. I report these results in Appendix H. The results for members whose length of
membership is over a period of 60 days (N=971) are similar to those for members whose length of
membership is over a day. The cut-off period of 60 days is used as a proof of concept. This
technique in data mining is known as sensitivity analysis (Yang, 2010).
I checked for multicollinearity among length of active membership, members’
participation, and feedback received from others using a Variance Inflation Factor (VIF)
analysis. A VIF value of 5 or above usually indicates multicollinearity among the variables. The VIF
test results showed that first vote received on first discussion article and first comment submitted or
not have values higher than 5, indicating that they are collinear. These two variables were
removed from the analysis.[43] After removing these two variables, I reran the VIF test. I found
VIF values between 1.004 and 1.504 for the remaining variables. The VIF test for
multicollinearity confirms that all of the individual participation variables and all individual feedback
variables that have been used in the analysis are NOT collinear with each other. This means that
no participation variable is collinear with any other participation variable, nor is any participation
variable collinear with any feedback variable. No feedback variable is collinear with any other
feedback variable. Also, the VIF values showed that length of active membership is not collinear
with any participation or any feedback received variables.[44] (Please see Appendix I for these
results.) Table 4.2 presents descriptive statistics for the variables.
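For readers unfamiliar with the VIF, the calculation can be sketched as follows: each variable is regressed on all of the others, and its VIF is 1 / (1 - R^2) from that regression. This is a minimal sketch, not the routine used to produce the values above; the function names and the three short example columns are hypothetical, and the code assumes that no column is an exact linear combination of the others.

```python
def r_squared(rows, y):
    """R^2 from a least-squares fit of y on the columns in rows (intercept supplied by caller)."""
    k = len(rows[0])
    # Normal equations (X'X) b = X'y as an augmented matrix, solved by Gauss-Jordan elimination.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * v for r, v in zip(rows, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [u - f * w for u, w in zip(a[r], a[c])]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    mean_y = sum(y) / len(y)
    ss_res = sum((v - sum(b * x for b, x in zip(beta, r))) ** 2 for r, v in zip(rows, y))
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - ss_res / ss_tot


def vif(columns):
    """Variance inflation factor for each column: 1 / (1 - R^2) against all other columns."""
    n = len(columns[0])
    result = []
    for j, target in enumerate(columns):
        others = [c for i, c in enumerate(columns) if i != j]
        rows = [[1.0] + [c[t] for c in others] for t in range(n)]
        r2 = r_squared(rows, target)
        result.append(float("inf") if r2 >= 1.0 else 1.0 / (1.0 - r2))
    return result


# Hypothetical columns: x3 is nearly a copy of x1, so both should show a high VIF.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
x3 = [a + d for a, d in zip(x1, [0.2, 0.0, -0.2, 0.1, -0.1, 0.0, 0.2, -0.2])]
print(vif([x1, x2]), vif([x1, x2, x3]))
```

Under the rule of thumb quoted above, a variable whose VIF reaches 5 would be flagged as collinear.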
Variables                                          | N    | Min | Max    | Mean    | S.D.
Length of active membership                        | 1982 | 1   | 951.52 | 152.571 | 207.174
Rate of Discussion Article                         | 1982 | 0   | 9.51   | .322    | .685
Rate of Comments sent                              | 1982 | 0   | 23.44  | 1.715   | 3.682
Rate of Votes given                                | 1982 | 0   | 20.85  | .908    | 1.607
Rate of Whisper given                              | 1982 | 0   | 7.00   | .257    | .685
Rate of Comments Received                          | 1982 | 0   | 5.00   | 2.461   | 1.584
Rate of Votes Received                             | 1982 | 0   | 11.35  | 6.187   | 3.088
Rate of Whisper Received                           | 1982 | 0   | 1.50   | .749    | .475
Fraction of Deleted Discussion Article             | 1982 | 0   | 35.0   | .143    | 1.207
First Comment received on First Discussion Article | 1982 | 0   | 1.00   | .228    | .419
Deletion of First Discussion Article               | 1982 | 0   | 1.00   | .228    | .419
First Discussion Article submitted or not          | 1982 | 0   | 1.00   | .712    | .452
First Vote submitted or not                        | 1982 | 0   | 1.00   | .997    | .050
First Whisper submitted or not                     | 1982 | 0   | 1.00   | .960    | .194
Familiarization Time                               | 1982 | 0   | 950.15 | 238.074 | 245.275
Table 4.2: descriptive statistics for participation and feedback received in Sploder
[43] Both variables were dropped because it was difficult to determine whether first vote
received on first discussion article or first comment submitted was the more important factor
to consider.
[44] This also suggests members’ participation and feedback received from others on the
content they contribute represent two different constructs.
The average length of active membership in the Sploder community is 152 days.
4.5 Examining Length of Active Membership Using a Hazard Rate Model
I used a Cox proportional hazard rate model to examine how participation and feedback
factors are associated with length of active membership in the Sploder community. A Cox
proportional hazard rate model is a statistical technique (commonly referred to as survival
analysis) for analyzing time to an event (Cox, 1972; Cox, 1984). The hazard represents a
specific event and is often interpreted in terms of survival (Wang, 2012). In this study, the event
occurs when an active member becomes inactive in the community. The hazard rate is the
probability of the event occurring within a specific amount of time from the moment
observations on whether the event occurs or not begin (in this case from the time a member joins
an online community). The coefficients of the model are estimated in terms of a member's
hazard ratio, which is the probability of an event occurring in a specific period of time compared
to the probability of the control (Smith, 2003; Therneau, 2000). The control is a constructed
(hypothetical) group of members whose values are assigned (Smith, 2003; Therneau, 2000). Values
for the control group are assigned differently for binary and continuous variables.
For example, for all rate of participation variables and all rate of feedback received
variables, the assigned values for the control group are the average rate of participation and the
average rate of feedback received per day. For early participation and feedback, the control is the
group of members who did not participate (submit a first discussion article, submit a first vote)
and who did not receive early feedback (first vote received, first comment received) from others.
The comparison between the two groups is performed by dividing the first group by the control.
The first group is actually multiple groups, each group being compared to the control. For
example, one group is a group of members with a one-unit increase in the variable value compared
with the control; another group is a group of members with a two-unit increase in the variable
value compared with the control. The software accounts for all of these groups and reports a
single hazard ratio; the associated variances are handled internally by the software. A
hazard ratio is interpreted based on whether the ratio is greater than or less than 1.000. If it is
greater than 1.000, then the probability of the event occurring increases compared to the control.
If it is less than 1.000, then the probability of the event occurring reduces compared to the
control. If it is 1.000, then the probability of the event occurring (no difference in survival
between the control and the first group) is the same for both groups.
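The interpretation rule above can be made concrete with a short sketch. The function names are hypothetical. The arithmetic shown is only the standard relationship between a model coefficient B, its hazard ratio Exp(B), and the percentage difference from the control, illustrated with two of the hazard ratios reported later in Table 4.3 (1.017 and .906).

```python
import math


def hazard_ratio(coefficient):
    """Hazard ratio Exp(B) for a model coefficient B."""
    return math.exp(coefficient)


def percent_change(ratio):
    """Percentage difference from the control implied by a hazard ratio."""
    return (ratio - 1.0) * 100.0


def describe(ratio):
    """Direction of the difference relative to the control group."""
    if ratio > 1.0:
        return "higher than control"
    if ratio < 1.0:
        return "lower than control"
    return "same as control"


for ratio in (1.017, 0.906, 1.0):
    print(ratio, round(percent_change(ratio), 1), describe(ratio))
```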
I used a Cox proportional hazard rate model instead of other statistical models because
identifying the actual point in time that a member becomes inactive is challenging. Long-term
inactivity does not preclude members from coming back and such interrupted inactivity could
add bias to the results. For example, it is possible that a member who did not login for the last six
months may still log back in. Standard statistical regression models (such as Ordinary Least
Squares regression and logistic regression) do not accurately estimate this time to an event
(Wang, 2012; Smith, 2003). A Cox proportional hazard rate model is used to predict future
events or the failure of an event to occur, which in this study is when an active member becomes
inactive and stops logging in.
A Cox proportional hazard rate model can estimate the hazard ratio of an event as a
function of multiple explanatory variables (independent variables, commonly referred to as
covariates in the model). This type of model is often used in disease contagion studies, with the
state measured being health status (individual has or has not caught the disease by time t) (Smith,
2003). A Cox proportional hazard rate model can be represented as:

h(t) = h0(t) * exp(b1*z1 + b2*z2 + ... + bk*zk),

where h(t) denotes the hazard at time t (the risk that an active member becomes inactive at time t)
given the explanatory variables (zi) (such as rate of writeups, rate of messages given, rate of votes
received, first writeup submitted, first upvote received, etc., for each individual member). bi is the
coefficient for explanatory variable zi. The term h0(t) is called the baseline hazard for the model. A
baseline hazard is the hazard when all independent variable values are equal to zero.
In this case, I used a Gompertz distribution, which is a commonly used statistical
distribution for proportional hazard rate models. A Gompertz distribution is a density function[45]
that can take many different shapes, as it is a flexible distribution.
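To make the model form concrete, the sketch below evaluates a proportional hazard with a Gompertz baseline, written in its usual form h0(t) = scale * exp(shape * t). The scale, shape, coefficient and covariate values are hypothetical and chosen only for illustration; they are not estimates from the Sploder data, and the function names are not taken from any statistics package.

```python
import math


def gompertz_baseline(t, scale, shape):
    """Gompertz baseline hazard h0(t) = scale * exp(shape * t)."""
    return scale * math.exp(shape * t)


def hazard(t, coefficients, covariates, scale, shape):
    """Proportional hazard h(t) = h0(t) * exp(sum of b_i * z_i)."""
    linear = sum(b * z for b, z in zip(coefficients, covariates))
    return gompertz_baseline(t, scale, shape) * math.exp(linear)


# Hypothetical values: two covariates, evaluated at day 30.
b = [0.5, -0.2]
baseline = hazard(30.0, b, [0.0, 0.0], scale=0.01, shape=0.001)
one_unit = hazard(30.0, b, [1.0, 0.0], scale=0.01, shape=0.001)
print(baseline, one_unit, one_unit / baseline)
```

The ratio of the two hazards equals exp(0.5) at every value of t, which is the proportionality property: a covariate shifts the hazard by a constant multiple of the baseline.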

The dependent variable is length of active membership, which is measured in days.
Members' rate of participation (rate of discussion articles submitted, rate of votes given, rate of
whispers given and rate of comments sent) and rate of feedback received from others (rate of
votes received, rate of comments received, rate of whispers received and fraction of deleted
discussion articles), as well as early participation (first vote submitted or not, first discussion
article submitted or not, first comment submitted or not, and first whisper submitted or not) and
early feedback received from others (first vote received or not, first comment received or not and
first deletion of a discussion article), are used as explanatory variables.[46] The Exp(B) column in
Table 4.3 gives the estimated coefficient values for the explanatory variable in terms of a hazard
ratio, which tells us whether an explanatory variable in the model is related to the probability of
members remaining active in the community. Due to the large sample size, statistical
significance is considered at the p=.001 significance level only for all variables (Jensen, 2007).
Statistical significance should be examined along with the effect size as indicated by the hazard
ratio.
[45] A probability density function is a function that represents the likelihood that a continuous
random variable takes a specific value. The probability that the variable falls within a given range
of values is given by the integral of the density over that range.
Factors                   | Variables                                          | Exp(B) | SE   | Sig.
Rate of Participation     | Rate of Discussion Article                         | 1.017  | .039 | .670
Rate of Participation     | Rate of Comments sent                              | 1.012  | .008 | .127
Rate of Participation     | Rate of Votes given                                | .972   | .019 | .134
Rate of Participation     | Rate of Whisper given                              | .972   | .045 | .537
Rate of Feedback Received | Rate of Comments received                          | .999   | .016 | .971
Rate of Feedback Received | Rate of Votes received                             | 1.005  | .008 | .548
Rate of Feedback Received | Rate of Whisper received                           | 1.067  | .054 | .231
Rate of Feedback Received | Fraction of deleted Discussion Articles            | 1.000  | .043 | .963
Early Feedback Received   | First Comment received on First Discussion Article | .906   | .071 | .164
Early Feedback Received   | Deletion of First Discussion Article               | .071*  | .196 | .000
Early Participation       | First Discussion Article submitted or not          | .713*  | .071 | .000
Early Participation       | First Whisper submitted or not                     | 3.084  | .590 | .057
Early Participation       | First Vote submitted or not                        | 1.231  | .503 | .679
                          | Familiarization Time                               | .999*  | .000 | .000
Table 4.3: hazard rate model results for participation and feedback factors and
length of active membership (*p <.001)

[46] This is just a brief summary of the Cox proportional hazard rate model. For more
information please refer to
http://core.ecu.edu/ofe/StatisticsResearch/Survival%20Analysis%20Using%20SPSS.pdf
http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-cox-regression.pdf
and https://mywebspace.wisc.edu/jmullahy/web/basu%20manning%20mullahy.pdf
4.5.1 How is Length of Active Membership Related to the Rate of Participation? (RQ1)
The coefficient estimates for the explanatory variables and their associated significance
levels from the Cox proportional hazard rate test, where the dependent variable is length of
active membership, are reported in Table 4.3. The study examines how four different
explanatory variables that address RQ1, rate of discussion articles, rate of whispers sent, rate of
comments sent and rate of votes given, are related to length of active membership. In the model,
a binary status variable, “censored,” was also included. Censored was constructed using last
login date and last posting date. Censored is the probability of an inactive member becoming
active in the future. If the difference between last login and last posting date is less than sixty
days, the member is considered non-censored.

47
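The censoring rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure actually run for this study: the function name `is_censored`, the parameter `cutoff_days`, and the example dates are hypothetical, and treating a gap exactly equal to the cutoff as censored is an assumption. The main text states a sixty-day cutoff while the accompanying footnote describes a ninety-day cutoff for Sploder, so the cutoff is left as a parameter.

```python
from datetime import date


def is_censored(last_login: date, last_post: date, cutoff_days: int) -> bool:
    """Return True when the gap between last post and last login reaches the cutoff.

    Follows the rule as stated in the text: a gap of fewer than
    `cutoff_days` days marks the member as non-censored. Treating a gap
    exactly equal to the cutoff as censored is an assumption of this sketch.
    """
    gap_days = (last_login - last_post).days
    return gap_days >= cutoff_days


# Hypothetical members: (last login, last post).
members = {
    "member_a": (date(2012, 6, 30), date(2012, 6, 1)),  # 29-day gap
    "member_b": (date(2012, 12, 1), date(2012, 6, 1)),  # 183-day gap
}

for name, (login, post) in members.items():
    print(name, is_censored(login, post, cutoff_days=60))
```

With the sixty-day cutoff, the first hypothetical member is flagged as non-censored and the second as censored; passing `cutoff_days=90` applies the footnote's Sploder threshold instead.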

In this study, the coefficients from the Cox proportional hazard model are interpreted in
terms of the probability of members remaining active (survival) rather than becoming inactive. The
hazard ratio for the rate of discussion articles is 1.017, which suggests that a unit increase in
discussion article posts per day (rate of discussion articles) will increase the probability of
members remaining active in the community by 1.7% {(1.017-1)*100%}; the value is NOT
47 In Sploder, if the time between a member's last login and last post exceeds 90
consecutive days, the member is not likely to post again and is thus considered censored. 85% of
members did not submit another post if their censoring time was over ninety days. This implies
that 85% of members were considered active and 15% censored at the beginning of the analysis. The
model treated these 15% as censored, which means they are still included and may come back in
the future. The model estimates the probability of them coming back (internally), reclassifies
them (if necessary) and reports a hazard ratio. All of this is done internally in the software using
algorithms. I use survival analysis, specifically a Cox proportional hazard rate model, to account
for this censoring of the variables. It is important to note that I used a higher cutoff in Sploder
(compared to Everything2) due to the type of community it is. Members are usually part of two different
communities in Sploder, so they are probably still involved in the games-making component of
Sploder while they are not on the discussion forum. It makes sense to give them more time since
they are still active in the Sploder gaming community. I also performed a sensitivity analysis
with an alternative cutoff of 70%. The results obtained showed no major differences from those
with the 85% cutoff reported in this chapter.

significant at the p = .001 level. Similarly, the hazard ratio for the rate of comments sent is 1.012,
which suggests that a unit increase in comments sent per day (rate of comments sent) will
increase the probability of members remaining active in the community by 1.2% {(1.012-1)*100%}; the value is NOT significant at the p = .001 level.

48

The hazard ratio for the rate of votes given is .972, which suggests that a unit increase in votes
given per day (rate of votes given) will reduce the probability of members remaining active in the
community by 2.8% {-(.972-1)*100%}; the value is NOT significant at the p = .001 level. The hazard ratio for the rate
of whispers given is .972, which suggests that a unit increase in whispers given per day (rate of
whispers given) will reduce the probability of members remaining active in the community by
2.8% {-(.972-1)*100%}; this value is NOT significant at the p =.001 level. This suggests that the
rate of participation is NOT statistically significantly related to length of active membership. A
possible explanation for this could be that Sploder is a gaming community for young
adolescents. It is possible that young adolescents are more invested in designing and playing
games than in forum discussions. It could also be possible that members visit the discussion
forum to get answers or ideas for creating games or features. Thus they may login to read, but
may not actively post.
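The percentage figures quoted in this section follow directly from the Exp(B) column of Table 4.3. The sketch below reproduces that arithmetic for the four rate-of-participation variables. The function name `percent_change` is hypothetical, and the sign convention (values above 1 read as an increase in the probability of remaining active) simply mirrors the interpretation adopted in this chapter.

```python
def percent_change(hazard_ratio: float) -> float:
    """Convert Exp(B) into the percentage change (Exp(B) - 1) * 100%."""
    return round((hazard_ratio - 1.0) * 100.0, 1)


# Exp(B) values for the rate-of-participation variables in Table 4.3.
exp_b = {
    "rate of discussion articles": 1.017,
    "rate of comments sent": 1.012,
    "rate of votes given": 0.972,
    "rate of whispers given": 0.972,
}

for variable, ratio in exp_b.items():
    change = percent_change(ratio)
    direction = "increase" if change > 0 else "decrease"
    print(f"{variable}: {abs(change)}% {direction}")
```

The same helper applies to the binary early participation and early feedback variables, where the comparison is between members who did and did not experience the event.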

48 This is a comparison between a group of members whose rate of participation, i.e.,
discussion articles, votes, whispers, and comments, increases by one unit and members
of the control group whose rate of participation remains at the average value for the community.
For an explanatory variable, the interpretation of a unit increase is that the explanatory variable
(continuous in nature, such as rate of discussion articles) increases by one standard deviation per
day.

4.5.2 How is Length of Active Membership Related to Early Participation? (RQ2)
Four different explanatory variables that respond to RQ2, first discussion article
submitted or not, first comment submitted or not, first whisper submitted or not, and first vote
submitted or not were included in the model. Table 4.3 reports the regression coefficients for
early participation factors.
The hazard ratio for first discussion article submitted or not is .713, which suggests that
submitting a first discussion article (first discussion article submitted or not) will reduce the
probability of members remaining active in the community by 28.7% {-(.713-1)*100%}; the
value is significant at the p =.001 level.

49

The negative relationship between a first discussion article submitted and the length of active
membership could be due to the type of members that
are associated with the Sploder community. Sploder is a gaming community; members are
invested in submitting games, and it is possible that if members' games are not well received,
they may submit a discussion article indicating their displeasure before they stop using the site. It
could also be that they post a question looking for help in creating their game. Once this
question is answered sufficiently they may become inactive in the forum.
The hazard ratio for first whisper submitted or not is 3.084, which suggests that
submitting a first whisper (first whisper submitted or not) will increase the probability of
members remaining active in the community by 208.4% {(3.084-1)*100%}; the value is NOT
significant at the p =.001 level. One possible explanation could be that, since whispers are one-on-one communications, members use whispers to make social connections in the community.

49 This is a comparison between a group of members who engaged in early participation,
i.e., first discussion article, first comment, first vote, and first whisper, and the control
group of members who, because they did not submit anything counted as participation, have
early participation measures of zero.

Submitting their first whisper could be an indication that members are becoming more involved
in the community, and more involved members may stay active longer. The hazard ratio for first
vote submitted or not is 1.231, which suggests that submitting a first vote (first vote submitted or
not) will increase the probability of a member remaining active in the community by 23.1%
{(1.231-1)*100%}; the value is NOT significant at the p =.001 level.
4.5.3 How is Length of Active Membership Related to Rate of Feedback Received? (RQ3)
Four different explanatory variables that respond to RQ3, rate of comments received, rate
of votes received, rate of whispers received, and fraction of deleted discussion articles, were
included in the model. Table 4.3 reports the regression coefficients for rate of feedback received.
The hazard ratio for the rate of comments received is .999, which suggests that a
unit increase in comments received per day (rate of comments received) will reduce the
probability of members remaining active in the community by .1% {-(.999-1)*100%}; the value
is NOT significant at the p =.001 level.

50

The hazard ratio for the rate of votes received is 1.005,
which suggests that a unit increase in votes received per day (rate of votes received) will increase
the probability of members remaining active in the community by .5% {(1.005-1)*100%}; the
value is NOT significant at the p =.001 level. The hazard ratio for the rate of whispers received is
1.067, which suggests that a unit increase in whispers received per day (rate of whispers
received) will increase the probability of members remaining active in the community by 6.7%
{(1.067-1)*100%}; the value is NOT significant at the p =.001 level. The hazard ratio for the
fraction of deleted discussion articles is 1.000, which suggests that a unit increase in fraction of
50 This is a comparison between a group of members whose rate of feedback received
from others, i.e., votes, comments, whispers, and deletion of articles, increases by one unit
and the control group of members whose rate of feedback received from others remains
the same.

deleted discussion articles (fraction of deleted discussion articles) has no effect on how long
members remain active in the community; the value is NOT significant at the p =.001 level. This
suggests that the rate of feedback received from others is NOT significantly correlated with
length of active membership. One possible explanation could be that members joined the
community primarily to read posts and gain ideas from others’ submissions, so they may not be
concerned about feedback received from other members on their own discussion articles.
4.5.4 How is Length of Active Membership Related to Early Feedback Received? (RQ4)
Three different explanatory variables that respond to RQ4, first votes received on first
discussion article, first whispers received on first discussion article, and deletion of first
discussion article, were included in the model. Table 4.3 reports the regression coefficients for
early feedback received.
The hazard ratio for deletion of first discussion article is .071, which suggests that
receiving a deletion of first discussion article will reduce the probability of members remaining
active in the community by 92.9% {-(.071-1)*100%}; the value is significant at the p =.001
51

level.

There are at least two potential explanations for this. One could be that because only editors

can delete discussion articles, members may be discouraged from remaining in the community as
they may put more emphasis on feedback from authority figures. Another explanation could be
that new members are more sensitive to harsh criticism than members who are more established.
The hazard ratio for first comment received on a first discussion article is .906, which suggests
that receiving a first comment on a first discussion article will reduce the probability of a
member remaining active in the community by 9.4% {-(.906-1)*100%}; the value is NOT

51 This is a comparison between a group of members who received early feedback, i.e.,
first vote on a discussion article, first comment, first whisper, and first deletion of a discussion
article, and the control group of members who did not receive early feedback.

significant at the p =.001 level. This suggests that the majority of the early feedback received
from others on the content has no significant impact on length of active membership. The limited
impact of early feedback on length of active membership may be due to the fact that members joined the
community to understand specific game-related topics. They were more interested in gaining this
information than in what other members thought of their posts.
Familiarization time was measured as the time between creation of the account and the
first discussion article. This variable was included as a control in the model. Members need to
settle down in a community and become familiar with the socio-technical system and norms
associated with a community before they begin actively contributing to the site; this time is
referred to as familiarization time. The hazard ratio for familiarization time is .999, which
suggests that a unit increase in familiarization time will reduce the probability of members
remaining active in the community by .1% {-(.999-1)*100%}; the value is significant at the p =.001 level.
This suggests that the longer the familiarization time a member has, the shorter their length of
active membership in a community. One possible explanation could be that members who take
longer to post have a harder time understanding the community and find it more difficult to use
when they start using the different features. This may make them less likely to remain active. On
the other hand, it could be that members who take longer to start participating just have less
interest in the community to begin with and for this reason they are likely to cease being active
earlier.
Please also refer to Appendix J for a length of active membership survival graph and
refer to Appendix K for more information about the model fit.
4.6 Examining Causal Links in the Sploder Community (Granger Causality Tests)

Because there are plausible non-causal explanations for the statistically significant
relationships revealed by the Cox hazard regression, I also employ a Granger causality test as a more
rigorous test for causality. It is important to note that if the results from a Granger causality test
are not statistically significant, we can rule out any possible causal links among variables.
However, if the results from a Granger causality test are statistically significant, the evidence for
a causal relationship is merely stronger.
The intuition behind a Granger causality test is that if changes in one variable cause
changes in a second variable, then the value of the first variable in any given period should be
correlated with the value of the second variable in a subsequent period or periods (Granger,
52

1969).

A Granger causality test provides stronger evidence for or against causality than the

statistical significance of simple regression coefficients. The temporal evidence for causality is
derived from time series data.

53

In this study, the model relies on the prediction that length of

active membership in period N will be correlated with participation in the community during
periods N-1, N-2, etc. A Granger causality test simultaneously tests for each direction in which a
causal relationship might run between two variables. In this study, I examine whether members'
participation is causally linked to their length of active membership and whether the length of
active membership is causally linked to members’ participation.

52 A Granger causality test establishes whether a causal link among variables exists based
on simultaneous correlations among variables (current instance and previous instances among
variables) from time series data.
53 I am unable to ascertain true causality, as I do not have information on why the
members actually left. Using a time series model, I am attempting to ascertain possible causes
(by moving backward in time) by establishing causality in terms of statistics.

It is important to note that the Cox proportional hazard rate results did not show
significant correlation between rate of participation and length of active membership or between
rate of feedback received and length of active membership. Based on the Cox proportional
hazard rate results, causality between rate of participation and length of active membership and
rate of feedback received and length of active membership can be ruled out. However, a Granger
causality test was still conducted. The results for a Granger causality test can be viewed as an
additional validation for the results of the Cox proportional hazard rate model.
4.7 How is Length of Active Membership Affected by the Rate of Participation? (RQ1)
4.7.1 Members' Participation Affecting Length of Membership
A Granger causality test examines the relationship between two variables using lag
orders, where a lag order is the number of measurement periods for explanatory variables that is
included in the model. A period could be of any length, such as a day, a month, or a year. For
this study, a single lag order is a period of six months. I used the statistical software package
STATA to determine the period of lag order and conduct the Granger causality test. STATA
selected six months as the appropriate unit of time for measuring lags based on its assessment of
the data.

54

Using the server timestamps from the logs in Everything2, I derived lag orders for

explanatory variables (rate of writeups, rate of votes given, rate of messages) and the dependent
variable (length of active membership). I examined the possible causality between explanatory
variables and the dependent variable. For example, the server log contains timestamps for
writeups, timestamps for votes, and timestamps for messages. From these timestamps, I
constructed lagged values for variables and examined the possible causal links among variables.
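The construction of lagged values from timestamps can be illustrated as follows. This is a minimal sketch under stated assumptions, not the processing pipeline used for the study: the helper names (`period_index`, `counts_per_period`, `lagged_rows`), the example dates, and the use of raw event counts per six-month period (rather than the rate variables defined earlier) are all hypothetical.

```python
from datetime import date


def period_index(start: date, when: date, months_per_period: int = 6) -> int:
    """Index of the period containing `when`, counted in calendar months
    from `start` (the day of the month is ignored in this sketch)."""
    months = (when.year - start.year) * 12 + (when.month - start.month)
    return months // months_per_period


def counts_per_period(start, events, n_periods, months_per_period=6):
    """Count timestamped events falling in each consecutive period."""
    counts = [0] * n_periods
    for when in events:
        idx = period_index(start, when, months_per_period)
        if 0 <= idx < n_periods:  # ignore events outside the observed window
            counts[idx] += 1
    return counts


def lagged_rows(series, max_lag=2):
    """Pair each value with its `max_lag` preceding values (lag 1 first)."""
    return [
        (series[t], [series[t - j] for j in range(1, max_lag + 1)])
        for t in range(max_lag, len(series))
    ]


# Hypothetical posting timestamps for one member who joined on 2010-01-01.
joined = date(2010, 1, 1)
posts = [
    date(2010, 2, 3), date(2010, 5, 9), date(2010, 8, 1),
    date(2011, 3, 15), date(2011, 4, 2), date(2011, 11, 30),
]

counts = counts_per_period(joined, posts, n_periods=4)
print(counts)               # [2, 1, 2, 1]
print(lagged_rows(counts))  # [(2, [1, 2]), (1, [2, 1])]
```

Each row pairs the value in period N with the values in periods N-1 and N-2, which is the shape of data a lag order of 2 requires.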

54 A six-month period for a lag order was selected based on the assessment of the data
when all dependent and explanatory variables are taken into account.

For a Granger test, time series data, a collection of observations made sequentially in
time, is decomposed into stationary trends and residuals (often known as random shocks). A
time series is called stationary if statistical properties (mean, standard deviation, and
autocorrelation) of the time series are constant. In a dynamic world, no trend will be stationary
forever. However, if a time series contains stable trends during the observation period of time,
the time series is termed trend stationary.
A Granger test, which utilizes a time series model, can be represented (in terms of
participation and length of active membership) with the following equation:
\[
\text{MembershipLength}(t) = \sum_{j=1}^{p} C_{1j}\,\text{Participation}(t-j) + \sum_{j=1}^{p} C_{2j}\,\text{MembershipLength}(t-j) + U_{1t},
\]

where t is the current time, p is the maximum number of lagged observations included in the
model (the model order), j is the lag order and can take any value from 1 through maximum lag
order p, MembershipLength is the length of active membership, Participation is the rate of
participation, C2j is the coefficient of Membership Length for lag order j, C1j is the coefficient of
Participation for lag order j, and U1t is the model’s residuals (shocks) at the time t.
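Because the test is run in both directions, the equation for length of active membership has a mirror image in which participation is the dependent variable. The formulation below is a sketch under stated assumptions: the symbols C_{3j}, C_{4j}, and U_{2t} are not defined in this chapter and are introduced here only by analogy with the coefficient numbering used in its equations, and the null hypotheses are the standard Granger formulation rather than a statement taken from the study.

```latex
\begin{align*}
\text{Participation}(t) &= \sum_{j=1}^{p} C_{3j}\,\text{Participation}(t-j)
  + \sum_{j=1}^{p} C_{4j}\,\text{MembershipLength}(t-j) + U_{2t} \\
H_0^{(1)} &: C_{11} = C_{12} = \dots = C_{1p} = 0
  \quad \text{(participation does not Granger-cause membership length)} \\
H_0^{(2)} &: C_{41} = C_{42} = \dots = C_{4p} = 0
  \quad \text{(membership length does not Granger-cause participation)}
\end{align*}
```

Failing to reject a null hypothesis means the lagged values of the other variable add no statistically detectable predictive power beyond the variable's own lags.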
To select the number of lags (lag order) for a time series model, I employed five lag order
selection statistics (tests) reported by STATA. I used the dependent variable and explanatory
variables to find the correct lag order for the model. Five tests, Likelihood Ratio test (LR),
Akaike Information Criteria (AIC), Hannan-Quinn Information Criteria (HQIC), Schwarz'
Bayesian Information Criterion (SBIC), Akaike's Final Prediction Error (FPE), were used to test
for the appropriate lag order. (Please refer to Appendix E for more information about the five
tests.) A maximum lag order of 2 was suggested by the FPE, AIC, HQIC and LR information
criteria, whereas SBIC information criteria suggested a maximum lagged order of 0 (refer to
Appendix-L). A lag order of 2, based on what the majority of the tests suggested (three of the
five information criteria), was selected for the Granger causality test.

55

LR, AIC, SBIC, FPE, and HQIC were run for all rate of participation variables, including the dependent variable,
which is the length of active membership. The algorithms applied complex statistical tests (in a
FPE test, the expected variance of the error is measured when an Auto regressive time series is
fitted against another time series of similar co-variance structure) and suggested a lag order. It is
important to note, for this study, 2 lag orders is a period of 12 months. In this study, if members
had been members for less than a year they were still included (in the Granger causality test), and
it only used data on a member for the period the member was active.
The Granger causality test was conducted using a Vector Auto Regression (VAR).

56

Table 4.4 reports the results for a Granger causality test between rate of participation factors and

55 The approach I used to select a lag order is often referred to as the majority vote approach in
the field of information science, and more precisely, in data mining and in machine learning.
56 A Granger causality test can be derived using a VAR framework. A VAR model can
be represented as a time series consisting of two variables (x and y), where yp (value of variable y
for time p) can be represented in terms of its past values and past values for the variable x. If x
Granger causes y, some or all lagged x values will have non zero coefficients. A Vector Auto
Regression (VAR) is a statistical regression model. In a VAR model, multiple time series are
used to estimate linear dependencies among variables. Each variable can be considered evolving
from its own lags, and lags from other variables. In a VAR equation, a set of variables is used.
Each variable is represented as a linear function of v lags of itself and of all of the remaining
variables in the equation. An error term is also included. A first order VAR(1) for n variables
collected in nx1 vector yt can be represented as yt= b(0) +b1y(t-1)+ q(t) where the element qt is the
error term, which can be represented as the iid normal (a diagonal matrix); b(0) is nx1 vector
which represents a constant term in the equation. A VAR(1) model should satisfy the following
matrix equations E(v,v')= W and E(Vt,Vt-j)=0, where W is a positive semi-definite matrix
containing error terms in nxn dimensions and E(Vt,Vt-j)=0 indicates that every error term in the
equation has a mean of zero. The dependencies among variables are represented by the matrix b1
and the contemporaneous dependence is determined by the term qt. The results from a Granger
causality test (null hypothesis is supported or not) can be determined (based on the chi-square

length of active membership. The results showed that a higher rate of participation does NOT
cause members to stay active because the chi-square values were not statistically significant at
the p=.001 level. Due to the large sample size, statistical significance should be examined along
with the effect size, and only at the p=.001 significance level (Jensen 2007).
Dependent Variable     Explanatory Variables         Chi-square   Probability
Length of Membership   Rate of Discussion Articles   1.680        0.432
Length of Membership   Rate of Comments sent         1.638        0.441
Length of Membership   Rate of Votes given           3.183        0.204
Length of Membership   Rate of Whisper given         0.831        0.660
Length of Membership   ALL                           6.713        0.568
Table 4.4: Granger causality Wald results whether members’ rate of participation causes their
length of active membership (*p < .001)
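As a consistency check on Table 4.4, the probability column can be recomputed from the chi-square column. The sketch below assumes 2 degrees of freedom for each single-variable test (one per lag at a lag order of 2) and 8 for the joint ALL row (four variables times two lags); these degrees of freedom are not reported in the table and are an assumption. It uses the closed-form chi-square upper-tail probability that holds for even degrees of freedom, so no statistics library is needed. The function name `chi2_sf_even` is hypothetical.

```python
import math


def chi2_sf_even(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with even df.

    For df = 2m the survival function has the closed form
    exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only applies to positive even df")
    half = x / 2.0
    terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * terms


# (chi-square, assumed df, probability reported in Table 4.4)
rows = {
    "Rate of Discussion Articles": (1.680, 2, 0.432),
    "Rate of Comments sent": (1.638, 2, 0.441),
    "Rate of Votes given": (3.183, 2, 0.204),
    "Rate of Whisper given": (0.831, 2, 0.660),
    "ALL": (6.713, 8, 0.568),
}

for name, (chi_square, df, reported) in rows.items():
    computed = chi2_sf_even(chi_square, df)
    print(f"{name}: computed {computed:.3f}, reported {reported:.3f}")
```

Under these assumed degrees of freedom the recomputed values agree with the reported probabilities to three decimal places, and all of them are far above the .001 threshold used in this chapter.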
4.7.2 Concluding Remarks
The Granger causality tests showed that more participation does NOT cause
community members to remain active longer. A hazard rate model showed that NO correlation
(at the p=.001 level; probability values for the corresponding chi-square are greater than 0.1%)
exists between members’ participation and their length of active membership. The Granger
causality test supports the no causality interpretation of the hazard model results.
4.8 How is Length of Active Membership Affected by Rate of Feedback Received? (RQ3)
4.8.1 Feedback Members Received from Others Affecting Length of Membership
Using the server timestamps from the logs, I constructed lagged values for explanatory
variables (rate of votes received, rate of comments received, rate of whispers received, fraction
of deleted discussion articles) and the dependent variable (length of active membership), and
tested for causality between explanatory variables and the dependent variable.

56 (continued) values and the associated probability values) from a Wald test. This is just a brief summary of a
Granger causality test and a VAR model. For more information, please refer to
http://academic.reed.edu/economics/parker/s13/312/tschapters/S13_Ch_5.pdf

A Granger causality test between feedback received from others on member supplied
content and length of active membership can be conducted with the following equation:
\[
\text{MembershipLength}(t) = \sum_{j=1}^{p} C_{5j}\,\text{Feedback}(t-j) + \sum_{j=1}^{p} C_{6j}\,\text{MembershipLength}(t-j) + U_{3t},
\]

where t is the current time, p is the maximum number of lagged observations included in the
model (the model order), j is the lag order and can take any value from 1 through maximum lag
order p, MembershipLength is the length of active membership, Feedback is the rate of feedback
received from others, C6j is the coefficient of Membership Length for lag order j, C5j is the
coefficient of Feedback for lag order j, and U3t is the model’s residuals (shocks) at the time t.
To select the number of lags (lag order) for the time series model, I employed five lag
order selection statistics reported by STATA. Five tests, the Likelihood Ratio test (LR), Akaike
Information Criteria (AIC), Hannan-Quinn Information Criteria (HQIC), Schwarz' Bayesian
Information Criterion (SBIC), and Akaike's Final Prediction Error (FPE) were used. LR, FPE,
and AIC suggested a maximum lag order of 2, whereas HQIC and SBIC information criteria
suggested a lag order of 1. A lag order of 2, as suggested by the majority of the tests, was
selected for the Granger causality test. (Please refer to Appendix M). The tests were run for all
variables, including the dependent variable, which is the length of active membership. The
algorithms applied complex statistical tests/models (in a FPE test, the expected variance of the
error is measured when an Auto regressive time series is fitted against another time series of
similar co-variance structure) and based on the majority of the test results, a lag order of 2 was
selected.
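The majority rule used to settle on a lag order can be written out explicitly. The sketch below tallies the lag orders suggested by the five selection statistics for the feedback model, as reported in the paragraph above. The function name `majority_lag` and the decision to raise an error on a tie are assumptions of this illustration and do not describe anything STATA does.

```python
from collections import Counter


def majority_lag(suggested: dict) -> int:
    """Return the lag order suggested by the largest number of criteria."""
    ranked = Counter(suggested.values()).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        raise ValueError("tie between lag orders; no majority")
    return ranked[0][0]


# Lag orders suggested for the feedback model, as reported in the text.
feedback_suggestions = {"LR": 2, "FPE": 2, "AIC": 2, "HQIC": 1, "SBIC": 1}

print(majority_lag(feedback_suggestions))  # 3 of the 5 criteria suggest 2
```

Raising an error on a tie keeps the choice explicit; an analyst might instead prefer a specific criterion as a tie-breaker.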
A Granger causality test was conducted through a Vector Auto Regression (VAR). The
Granger causality test results showed that feedback received from others does NOT cause length
of active membership as the chi-square values were not significant at the p=.001 level
(probability values for the corresponding chi-square are greater than 0.1%). Please refer to Table
4.5 to review the results from the Granger causality test.
Dependent Variable     Explanatory Variables                    Chi-square   Probability
Length of Membership   Rate of Comments received                 .816        0.665
Length of Membership   Rate of Votes received                    .512        0.774
Length of Membership   Rate of Whisper received                  .261        0.877
Length of Membership   Fraction of deleted Discussion Article   1.951        0.377
Length of Membership   ALL                                      3.436        0.904
Table 4.5: Granger causality results whether rate of feedback causes
length of active membership (*p < .001)
4.8.2 Concluding Remarks
The Granger causality tests showed that more feedback received from others does NOT
cause community members to remain active longer. A hazard rate model showed that no
significant correlation (at the p=.001 level; probability values for the corresponding chi-square are
greater than 0.1%) exists between feedback members received from others and their length of
active membership. The Granger causality test supports the no causality interpretation of the
hazard model results.
4.9 Granger Causality Tests in Regards to Early Participation and Early Feedback
Received from Others on the Content
It is important to note that I could not conduct a Granger Causality Test of whether
members' early participation influences their lengths of active membership (RQ2), and whether
early feedback received from others influences length of active membership (RQ4). A Granger
causality test examines whether changes in one variable may cause changes in a second variable
using past values of both variables. In this dissertation, I used members' first participation and
first feedback received from others as measures of early participation and early feedback
received from others. Since there is at most only one event recorded for each early participation
variable and each early feedback received variable, there are no prior observations.
4.10 Chapter summary
Several possible factors that may contribute to length of active membership, which may
in turn contribute to the viability of online communities, were identified and tested in this
chapter. These factors were tested for the online community, Sploder, to determine whether or
not they are related to length of active membership using two rigorous statistical tests, a Cox
proportional hazard rate model and a Granger causality test. First, a Cox proportional hazard
rate model was introduced, and then the results from the test were presented. A hazard rate
model tests for any correlation evidence between two variables. The results found that only two
early participation variables (first discussion article submitted or not and first whisper submitted
or not) were correlated with length of active membership. A Granger causality test was then
introduced, and the results from the test were presented. A Granger causality test is used to
determine if there is any possible causal link between two different variables. It found no
evidence for a causal relationship between rate of participation and length of active membership
or between rate of feedback received and length of active membership.

CHAPTER 5
A Qualitative Study of Everything2
This chapter reports on interviews with participants from the Everything2 online
community. This chapter does not include any results from Sploder. Perhaps because Sploder is a
community of young adolescents and it is challenging to gain approval to interview young
adolescents, only one long-term Sploder member agreed to be interviewed. Qualitative analyses
will allow us to capture some of the insights behind members’ participation over time such as
what contributes to their interest in remaining active in the community.
The interviews were conducted as a part of a previous study.

58

57

The interviewees were long-term members (members who have been active for over a year) from the community. The
findings for this study were obtained by analyzing data collected for an earlier study of
Everything2. The chapter further explores the reasons why the length of active membership in
online communities varies among members. It elaborates on why some members leave while
other members remain active and some members decrease their activity in a community. Gaining
insights from interviews may help to explain some of the findings from the earlier chapter
(Chapter 3) or add new insights to the results from the earlier chapter that are associated with
members' length of active membership.
It is important to note that a qualitative study is not meant to capture statistical power or
statistical significance. In a qualitative study, the variation among individual cases is what
57 In this chapter, whenever the term 'member' is used, it refers to members in general
of the Everything2 community who are registered users of the community. When the term
'participant' is used, it refers to the specific Everything2 members who took part in the
interviews. Members (registered users) log in to the site to access different services of the
community.
58 The data was collected in 2010 for a separate project. I was a part of the research team.

matters. For example, this study has pointed out some quotes of participants regarding the
reasons that they use Everything2. The quotes highlight different underlying reasons for why
individual members remain active in the community or why some members leave the
community.
5.1 Recruitment of Everything2 Participants
To acquire a deeper understanding of how length of active membership in online
communities is associated with their participation and feedback received on their content,
59

interview data from thirty-one long-term Everything2 members was analyzed.

Everything2 members selected for the interviews were active participants for over a year.
Using this criterion, potential participants were identified for the interview. A snowball sample
recruitment technique was used to recruit participants. Initial participants selected for the
interview were asked to recommend other long-term members of the community who might be
willing to participate. Please refer to Table 5.1 for a summary of participants' total years in the
community and the gender mix for the participants.
Snowball sampling is a technique often used in qualitative research to gain access to a
population that is otherwise difficult to identify and contact (Heckathorn, 1997; Heckathorn, 2002;
Goodman, 1961). Initial participants were selected using email addresses obtained from the
server logs and through personal contacts of research team members. In the recruiting email,
potential interview participants were asked to provide a phone number and a date and time at
which they could be contacted for the interview. If no response was received, one reminder email
was sent to potential participants. Once the recruitment reached a point where there was not enough
variation in responses from the interview participants, several additional criteria were used to
recruit interview participants likely to provide more diversity in their responses.

[59] It is crucial to note that when a user submits posts, votes on a post, or sends messages to
others, all of these content contributions to a community are termed participation. When a user
receives votes on their posts or messages from others, all of these interactions are referred to
as feedback.

Participant   Sex      Total participation years (Everything2)
P 1           Female   10
P 2           Male     6
P 3           Female   7
P 4           Male     10
P 5           Male     9
P 6           Male     10
P 7           Female   10
P 8           Female   2
P 9           Male     10
P 10          Female   10
P 11          Female   10
P 12          Female   8
P 13          Male     10
P 14          Male     6
P 15          Female   10
P 16          Female   10
P 17          Male     10
P 18          Male     9
P 19          Male     10
P 20          Male     10
P 21          Female   10
P 22          Male     10
P 23          Male     10
P 24          Female   7
P 25          Female   4
P 26          Male     3
P 27          Female   6
P 28          Male     10
P 29          Male     10
P 30          Female   10
P 31          Male     8

Table 5.1: Everything2 long-term interview participants

Additional participants were therefore recruited based on different types of participation
characteristics, which included: members who sent private messages to other members but, to a
large extent, did not post in the forum; members who messaged or voted for other members rather
than posting articles in the forum; and members who did not log in for months after posting their
articles. Recruiting a variety of different types of members helped address a possible homophily
bias in the Everything2 sample (Heckathorn, 1997; Kuzel, 1999). A homophily bias occurs
when a sample’s participants are too similar to each other; this is common with snowball
samples because people tend to become friends with others who are like themselves
(Heckathorn, 1997). It is important to note that the sampling method was designed to recruit a
diverse set of participants in order to capture a varied range of responses from the interview
sample. In total, thirty-one participants from Everything2 were recruited for the
interviews.
5.2 Interview Protocols
In-depth interviews with the participants were conducted using an interview protocol (see
Appendix N). The protocol was created using King and Horrocks’ (2010) interview guide. The
guide helped with the design of the questionnaire by providing the research team
with the tools to frame questions and discern how broad a question should be. It further
emphasized the importance of avoiding presuppositions (assuming the answer before asking the
question) in the questions and of considering how a question could change during the course of the
study. After the protocol was created, it was pilot tested with two participants to explore
whether the questionnaire needed to be modified. Where modifications were required, the
questionnaire was refined based on the responses received from the pilot interviews.
Semi-structured interviews were then conducted to collect the data. In a semi-structured
interview, a set of possible themes is explored while allowing a free-flowing, natural
progression of conversation with the interviewees.[60]

The interview questionnaire focused on the participation lifecycle of members in an
online community. The study concentrated on 1) participants in the community who were still
participating (active) at the time of the interview, and 2) participants who were inactive
(members who no longer posted or logged in to the community) at the time of the interview.

[60] Because the interviews were semi-structured, some of the interview responses deviated
from the original questions.

The data was collected through telephone interviews, with each interview lasting sixty to
ninety minutes. Using semi-structured interviews, active participants were asked about whether
their amount of contributions (participation) in the community changed over time, the reasons for
any change in participation, and how their contributions (posts) were received in the community.
In addition to these questions, participants who had ceased to participate (inactive) were also
asked why they decided to leave the community (stop their participation).
Interviews focused on exploring members' lifespans in the Everything2 community,
members’ rates of participation (how often they participate) and any changes to their rates of
participation.[61] The interviews were audio recorded and transcribed using the software Atlas.ti.
After transcription, the interview data was coded.
5.3 Coding
An exploratory analysis was conducted to answer generic questions about members'
changes in participation, particularly whether they remained active or became inactive.
Participants’ rationales for their participation and the perceptions that influenced a change in
their usage of the community, such as life constraints and close ties in their network, were
coded. Participants' types of usage, such as reading, personal relations and posting behaviors, were
also coded. An iterative approach was taken to the coding.[62]

An interpretive data analysis approach was used, in which data was systematically
analyzed to identify major categories. Participants' responses pertaining to these categories from
the data were summarized (Strauss, 1998). The categories were reviewed in terms of the original
transcripts and relevant quotes were pulled from the categories. Themes were identified and
summarized based on the overarching patterns. The themes were then listed in a spreadsheet.

[61] It is important to note that the underlying assumptions (interpretations) for rate of
participation and changes to the rate of participation are consistent with the interpretation of
the quantitative data.

[62] An iterative coding process is an approach in which qualitative data is categorized by
reviewing the responses multiple times.

The iterative coding process made it possible to examine how themes were similar and different
across participants and to identify patterns in the data. In order to construct a story line (that
explores length of active membership), codes were removed during the coding process, which
allowed me to concentrate on specific themes. Using this coding technique, a data matrix was
created and participants’ quotes were entered into it.[63] Please refer to Table 5.2 for the major
themes identified from the coding.

Theme                          Description
Posting Rationale              Reasons why posts were made. This includes social reasons and
                               wanting to post higher quality material.
Reduction in Use               Different precipitant factors that included a change in the usage
                               of members. The factors include:
                               a) life constraints
                               b) deletion of posts
                               c) downvotes
                               d) the evolution of ‘wiki-era’
Members' use of the community  How the community is used such as personal relations and postings.
Leaving the community          Different reasons members gave for leaving a community.

Table 5.2: Key themes identified based on coding

It is important to note that the codes covered commonalities and unique cases among
participants. Participants’ responses were grouped into categories and emerging themes were
identified from the data and reported in this chapter.

[63] A comparative method, reading and rereading themes, guided the identification of
final themes. After the interviews were read through once to identify themes, the related
literature was reread and themes were formulated. These themes are presented in this study.

5.4 Members’ Participation
Members have used the Everything2 online community for a variety of reasons.
Members viewed the online community as a place to develop skills for themselves and others.
For example, one participant mentioned using others’ feedback to improve their writing skills:

    I used Everything2 in terms of my own writing basically to hone my technical
    writing skills... (P 12)

The level of participation among members varied over time. Most participants
mentioned that they were submitting fewer posts at the time of the interview than they
had previously. Various reasons were given for reductions in participation over
time, such as becoming busy in their lives or wanting to improve the quality of their posts.
A couple of participants stated:

    I decreased the amount of posting because I wanted to post higher quality
    content. Started reading more, wasn’t posting, when I became busy with life. I
    decreased my posting around 2002 when I started grad school. (P 3)

    I got wrapped up in work and the thing is like as that happened I started going on
    the site less and less ... (P 12)

Participants mentioned that even though they had reduced the frequency of their posts
over the years, they remained active in the community to keep in touch with others through their
posts. They posted so that their friends could read and enjoy their writeups. This desire to keep
in touch with friends in the community encouraged them to continue participating for a longer
period of time, as shown by this participant’s comment:

    Some people I considered as friends I can only contact through the site... That's
    why I posted the writeup about my father's death and it’s probably the only way I
    would have to connect with them. (P 1)

Participants expressed appreciation for the messaging feature in Everything2, which
allows members to send and receive private messages.
Participants indicated that they would check in (log in to the community) even after they had
reduced their activity, sometimes even if they were no longer posting, just to keep in touch with
friends through the messaging center.

    So if I wasn't messaging people about their writing as an editor, I was
    messaging them to catch up. (P 20)

    I still log in today and the only purpose is literally to check my inbox. Because I
    definitely do keep in touch with some people that way to this day. (P 7)

One participant mentioned that giving and receiving feedback in the form of messages is
an important part of being active in the community.

    I think that message feedback is crucial because it is the glue that holds all this
    together. It helps us to become better. (P 8)

5.5 Factors that Reduced Members' Participation
There are four main factors that participants consistently identified as accounting for their
reduced participation in the community: deletion of writeups, downvotes, the 'wiki-era',
and life-changing events.
1. Deletion of Writeups
Participants identified deletions of their writeups by editors as one of the reasons for their
reduced participation. Everything2 did not promote a clear editorial policy for the deletion of
writeups; content editors deleted writeups at their own discretion (Sarkar et al., 2012). In some
cases, content editors cited a preference for higher quality posts as the reason for deleting
writeups. Content editors also deleted members' posts without clear standards, guidance or
reasons, which discouraged members from participating, and they did not inform members when
their posts were deleted.

    What precipitated the trickling off of my nodding [posting] was not only the
    massive purging of various things I had put into the database… And at some point
    someone decided these all had to go…and I was never informed until I went
    looking for something and realized it missing but weeks and weeks of other work
    was also missing and there was no one I could appeal to I thought…this is a good
    sign that this site is no longer a great place to spend my spare time because
    someone here doesn’t put much value on how I spend it. (P11)

    I became editor for a short period of time in the site. I felt it was a mistake, I was
    not ready. I deleted posts without explanation; I was not mature enough to do that
    job. (P 16)

    I was still active but I was, you know what, you guys you know what, delete as
    many of my writeups as you want, I’m not going to participate in this f*** absurd
    town hall show fest… (P 4)

2. Downvotes
Everything2 supports a voting system that allows members to submit a negative vote
(a downvote) on other members’ writeups. Participants mentioned that receiving downvotes
discouraged them from participating further. For example, P17 reported that downvotes by
others on his posts discouraged him from participating in the community.

    I thought that was the absolute worst thing that could happen was to get
    downvoted. (P17)

3. The Evolution of the “Wiki-era”
The “Wiki-era” is the time when sites like Wikipedia, an online encyclopedia, were
designed (starting in 2001). Some members attempted to make Everything2 more like Wikipedia.
As Everything2 became more like Wikipedia in the way it was run, members found it less
rewarding to participate. They felt that Everything2 was about writing of any kind, not about
generating fact-oriented entries. Participants felt discouraged by the way Everything2 started
imitating competing “Wiki-era” sites. A couple of participants expressed these views in their
interviews:

    Back then we had a pretty gross divergence between growing us into something
    that would eventually have competed with Wikipedia ...what Wikipedia grew into
    which is fine, whatever the community wanted eventually it's from the ground up,
    the entire site is more run by the writers and the coders than anybody really
    upstairs. But the social framework of the site is gonna be provided by the noders
    [authors]. (P 21)

    I felt kind of pushed off the site at that point (referring to the wiki-era) and I only
    came back a few times with something that I apparently thought was important.
    One Writeup worthy...there were editors who thought that what I had to say was
    just not worthy of being on the site I thought well, I guess I’M NOT going to
    contribute to this site. (P 1)

4. Life-Changing Events
Participants mentioned that life-changing events changed their rate of participation in the
Everything2 community. Life-changing events include starting a job, getting married, having
kids, graduating from high school, and getting into a college program. Life-changing events led
members to spend less time in the community than they had spent at the beginning
of their membership. A couple of participants stated:

    I didn’t quite have the time to take classes and after classes to keep up with the
    site as much as when I was younger...I was also hired by the university run
    student newspaper to design pages for it. (P 19)

    I still check e2 periodically and I don’t visit the site very often mainly because I
    have almost no personal time anymore. I’m teaching full time, I’m taking grad
    school classes, I’m married, my husband has two children that we have every
    other weekend and every summer and I’ve found that we have almost no time to
    sit back and reflect back on anything which is the main motivation for me to. I’ve
    found ironically that I wrote more when I had less of a “life” and now that I have
    a life I have no time to sit and reflect and write about it. (P 16)

5.6 Leaving the Community
Despite reductions in their activity levels, many members still remained active in the
community. However, some members left the community, becoming inactive. Participants
mentioned various reasons for leaving the community. Some participants stated that they left
because their friends had also left.

    We lost a lot of good people and a lot of the people that I was virtual friends with
    decided to go. And as much as I enjoyed the writing, the longer I was there the
    more it was about the community to me, the more it was about the people. And
    when the people that I knew left, I didn’t really see any need to continue...when
    the people left, that was kind of the last thing for me, well there’s really no point
    in sticking around now...When those people started to leave, that’s when I chose
    to make an exit. (P 14)

    Some users might have left because of all the political arguments on the site. I
    believe some users were disenfranchised because they were not interested in the
    goofier aspects of the site, and tried to remain writing only factual or ended up
    leaving. I’m not sure if it's received that way because people like what I'm doing,
    or if it's positive because everybody knows my name. (P 30)

Other reasons for leaving the community are similar to reasons participants mentioned for
reducing their participation level.
This shows that members do leave the Everything2 community. Everything2 is a
content-based community that relies solely on its members to generate the content. This means it
is crucial that members remain active in the community for the community to survive. This
supports the claim at the beginning of this dissertation that understanding why some members
remain active in the community while others leave is important to the viability of online
communities.
5.7 Chapter Summary
This chapter has examined interview data from members of Everything2. It reported
reasons offered by different members for why they chose to continue participating in the
community, why they chose to decrease their participation, and why they became inactive and
effectively left the community. Factors that discouraged Everything2 members from
participating were deletions of writeups, downvotes, imitation of other sites, and life-changing
events. The findings from this study provide insights about factors that lead to change in
members' participation and their reasons for leaving the community.

5.8 Concluding Remarks
The insights gained from this interview-based study could not all be accounted for in the
quantitative study (Chapters 3) due to limitations in the data sets. Server logs cannot capture
certain factors, such as life-changing events, that affect participation. Some findings did help to
clarify some of the results in the quantitative study. For example, in the quantitative study for
Everything2, it was found that the rate of writeups posted and the rate of messages sent are
significantly correlated with members’ length of active membership. The interviews for the
qualitative study found that members used writeups and the messaging center to stay in touch
with friends in the community. This could explain why the rate of writeups and the rate of
messages are significantly correlated with length of active membership. Some of the factors
raised in the qualitative study were not shown to be significantly correlated with the length of
active membership in the quantitative study. For example, some participants mentioned deleted
writeups as a reason for reducing their participation. Although deletion of the first writeup was
statistically significant, the fraction of writeups deleted was found not to be significantly correlated
with length of active membership. Perhaps some members are less sensitive to negative feedback
and the interview sample may have included such members from the community.
I conclude this chapter by discussing differences in the findings reported here from those
of an interview study conducted by Velasquez et al. (2013). This discussion is necessary because
Velasquez et al. analyzed responses from the same set of interviews that generated the responses
reported and discussed in this chapter. Readers familiar with both studies might reasonably ask
why they report different findings. Velasquez et al. (2013) examined patterns of participation for
latent members of the Everything2 community (members of a community familiar with its norms
who are not actively participating at the time of the interview) and their motivations to
participate. They found that most members move in stages through a range of membership roles,
such as reader, contributor, collaborator, and leader. Only someone in one of the latter three stages
of participation (contributors, collaborators, and leaders) can be classified as latent if they become
inactive. They also reported that members' motivation to participate remains constant over time.
The focus of the study of interview responses reported in this chapter was, by contrast, on factors
that contributed to cessation of active participation by community members and how these
affected lengths of active membership. For this reason, the responses of interest and interview
quotes reported in this study are different from those that were the primary foci of the Velasquez
et al. (2013) study. It is important to note, however, that while this qualitative study was
conducted to examine factors affecting length of active membership, many of the themes (e.g.,
reasons for reduced participation) that emerged as important in this study are similar to those
reported in Velasquez et al. (2013).

CHAPTER 6:
Discussion of Findings
Without a sustaining base of long-term active members, communities are unlikely to
survive for the long haul. If online communities can identify factors that contribute to increased
length of active membership, they could modify their designs and strategies to take these factors
into account. This dissertation first reviewed the research literature to find candidates for these
factors. Two key factors that may affect length of active membership were identified: attributes
of members' own participation in their online communities and feedback they receive from other
members. The dissertation pointed out some of the gaps in the existing literature and
addresses those gaps in the current study. It also examined two different online communities,
each with its own types of members and goals (Everything2 is about writing; Sploder, a
discussion community, is about games), to see if findings for one community also apply to the
other community, as both communities support environments for collaboration and interaction
among members with similar affordances. This comparison addresses the problem of determining the
extent to which results of studies of other online communities, all of which focus on single
communities, might hold for online communities in general.
A mixed methods approach was used to examine how participation and feedback
received are related to length of active membership in each community. The approaches included
both quantitative and qualitative studies. The quantitative analysis of each community employed
two rigorous statistical tests. A survival analysis using the Cox proportional hazard rate model
was employed first to examine how participation and feedback factors are associated with length
of active membership.
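
To make the survival-analysis step concrete, the following is a minimal sketch of how a Cox proportional hazards coefficient can be estimated for a single covariate. It is an illustration under stated assumptions, not the analysis pipeline used in this dissertation: the toy data, the variable names (durations, events, post_rate), and the ternary-search optimizer are all hypothetical, and the sketch assumes no tied event times.

```python
import math

# Hypothetical toy data (not from the dissertation): months each member stayed
# active, whether inactivity was observed (1) or the record was censored (0),
# and an assumed covariate, posts submitted per month.
durations = [2, 3, 5, 8, 12, 15, 20, 24]
events = [1, 1, 1, 1, 1, 0, 1, 0]
post_rate = [0.5, 2.0, 1.0, 0.8, 3.0, 2.5, 1.5, 4.0]


def partial_log_likelihood(beta, durations, events, x):
    """Cox partial log-likelihood for one covariate, assuming no tied event times."""
    total = 0.0
    for t_i, e_i, x_i in zip(durations, events, x):
        if e_i == 0:
            continue  # censored members only appear in other members' risk sets
        # Risk set: members still active just before time t_i.
        risk = sum(math.exp(beta * x_j) for t_j, x_j in zip(durations, x) if t_j >= t_i)
        total += beta * x_i - math.log(risk)
    return total


def fit_beta(durations, events, x, lo=-5.0, hi=5.0, iterations=80):
    """Maximize the concave partial log-likelihood by ternary search on [lo, hi]."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        ll1 = partial_log_likelihood(m1, durations, events, x)
        ll2 = partial_log_likelihood(m2, durations, events, x)
        if ll1 < ll2:
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2.0


beta_hat = fit_beta(durations, events, post_rate)
hazard_ratio = math.exp(beta_hat)
print(f"beta = {beta_hat:.3f}, hazard ratio = {hazard_ratio:.3f}")
```

In this toy data set, the estimated hazard ratio falls below 1, which would be read as a higher posting rate being associated with a lower risk of becoming inactive at any given time. An association of this kind says nothing by itself about causality.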

Prior literature has reported correlational evidence that measures of members'
participation in online communities are linked to the length of time they remain active in these
communities. Because correlational evidence is not dispositive proof of causality and there are
plausible non-causative explanations for the relationships found, this dissertation also used a
Granger causality test as a more rigorous and stringent test of the possibility that length of active
membership is causally linked to several types of participation and feedback received measures.
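
The logic of a Granger causality test can be sketched as a comparison of two regressions: one that predicts a series from its own past, and one that also includes the past of a second series. The example below is a simplified, bivariate, one-lag illustration and not the multivariate specification used in this dissertation. The series names (feedback, posts), the synthetic data, and the helper functions are hypothetical. The synthetic data is deliberately constructed so that one series leads the other, which is not a finding of this study.

```python
import random


def solve(a, b):
    """Solve the linear system a * coef = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - known) / m[r][r]
    return coef


def residual_sum_of_squares(rows, targets):
    """Fit ordinary least squares via the normal equations and return the RSS."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    coef = solve(xtx, xty)
    return sum((t - sum(c * v for c, v in zip(coef, r))) ** 2 for r, t in zip(rows, targets))


def granger_f_statistic(cause, effect):
    """F statistic for 'cause Granger-causes effect' using a single lag."""
    targets = effect[1:]
    restricted = [[1.0, effect[t - 1]] for t in range(1, len(effect))]
    unrestricted = [[1.0, effect[t - 1], cause[t - 1]] for t in range(1, len(effect))]
    rss_r = residual_sum_of_squares(restricted, targets)
    rss_u = residual_sum_of_squares(unrestricted, targets)
    n, k, q = len(targets), 3, 1  # observations, unrestricted parameters, restrictions
    return ((rss_r - rss_u) / q) / (rss_u / (n - k))


# Synthetic weekly series: posts are built from the previous week's feedback plus noise.
rng = random.Random(0)
feedback = [rng.random() for _ in range(60)]
posts = [0.0] + [0.9 * feedback[t - 1] + 0.1 * rng.random() for t in range(1, 60)]

f_feedback_to_posts = granger_f_statistic(feedback, posts)
f_posts_to_feedback = granger_f_statistic(posts, feedback)
print(f_feedback_to_posts, f_posts_to_feedback)
```

Each F statistic would then be compared against the F distribution with (1, n - 3) degrees of freedom. A large value indicates that the lagged second series improves prediction beyond what the first series' own history provides, which is the sense of "causality" the test captures.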
The qualitative component is an analysis of interviews with 31 long-term members of
Everything2 who responded to questions related to changes in their levels of participation over
time. Analysis of the interview data provided additional insights and helped further contextualize
and explain some of the relationships identified in the previous two chapters.
6.1 Overview of Results from All Three Studies
When new members join an online community, they face the challenge of deciding
whether to remain in the community. For an online community, it is important that they remain
active because their engagement with the community provides social and economic value to the
community as a whole. The empirical findings from the Everything2 and Sploder studies
reported in Chapters 3 and 4 show that the factors contributing to longer-term membership may
vary among communities. The Cox proportional hazard rate model showed that all rate of
participation variables were strongly correlated with length of active membership in
Everything2. In the Everything2 community, rate of write-ups submitted, rate of messages sent,
and rate of votes given are positively correlated at statistically significant levels with length of
active membership. However, for the Sploder community, no rate of participation variable is
strongly correlated with length of active membership. Perhaps members of Everything2
considered their own participation more valuable than members of Sploder.

The findings therefore show that rate of participation is not a good predictor of length of
active membership for all online communities. For some communities like Everything2,
members’ rate of participation could be a good predictor of length of active membership, while
for other communities like Sploder, this is not the case. The hazard model also showed that the
rates for individual feedback variables, such as rate of messages received, rate of votes received,
rate of cools received, rate of comments received, and rate of whispers received are NOT
strongly correlated with length of active membership for either community.
The results from a Cox proportional hazard model showed that members' early
participation in terms of first message submitted or not and first writeup submitted or not
(Everything2), and first discussion submitted or not (Sploder), are all correlated with length of
active membership at statistically significant levels. All other types of early participation
variables (first comment submitted or not, first upvote submitted or not, etc.) were NOT strongly
correlated with length of active membership. This suggests that members in online communities
prefer different types of early participation, possibly based on their motivations for participation.
Perhaps members’ first message (Everything2) was a response to negative messages from other
members early on, which might explain a negative correlation between sending a message and
length of active membership for this community. Perhaps submitting their first writeup indicates
that they are interested in becoming more involved in the community since it requires more
effort on the part of the member to submit a writeup than other forms of participation. Future
research should examine this possibility in more depth. Submitting a first discussion article
(Sploder) reduced the likelihood of members remaining active in the community. Sploder is a
game making community, meaning members are probably more invested in making and playing
games than in posting discussion articles in the forum. Perhaps when they sign up on the
discussion forum they are looking for answers to a specific question in designing their game.
Once this question is answered, they may stop logging into the community.
This dissertation generated mixed results regarding the relationship between early
feedback received and length of active membership. For Everything2, the first upvote received
on a first writeup, first downvote received on a first writeup, and deletion of first writeup were
significantly correlated with length of active membership. Receiving an upvote or a downvote on
the first writeup increased the likelihood of a longer length of membership. This suggests that an
upvote encouraged members to remain active in the community longer. This could also mean
that positive feedback encouraged members to continue using the site early in their membership.
A downvote may increase the length of membership because it shows that other members have
noticed the new member, and perhaps just getting noticed by another member is enough to
encourage the new member to remain active, even if the feedback is negative. Deletion of a first
writeup was negatively correlated with length of active membership. This could be because
deleting a first writeup is viewed as negative feedback from the editors, whose feedback new
members may take more seriously than feedback received from ordinary members. Several
participants in the qualitative study expressed displeasure over their writeups being deleted, which
is consistent with the statistically significant effect found for deletion of the first writeup.
The analysis for Sploder also showed that deletion of the first discussion article is
significantly negatively correlated with length of active membership. New members may
put more emphasis on the deletion of a discussion article than on the rest of the early feedback
variables because only editors can delete a discussion article. For both Everything2 and Sploder,
fraction of deleted posts (a negative feedback cue) was NOT significantly correlated with the
length of membership.
One early participation variable (first discussion article submitted) and one early
feedback variable (deletion of first discussion article) were significantly correlated with length of
active membership in Sploder. No rate of participation or rate of feedback received variables
showed statistical significance for Sploder.
This dissertation used a Granger causality test (which uses a multivariate time series
model) to examine whether the dissertation’s participation measures and feedback received
measures are simply correlated with length of membership or if there is causality attributable to
these factors. The results from this test showed that there is no solid evidence for causality
between members’ participation and length of active membership in either direction in
Everything2. Members’ participation in Everything2 is NOT causally linked to their length of
active membership. The results from the Granger causality test showed that receiving feedback
from other members does NOT influence their length of active membership, which supports the
results from the hazard rate model. For the Sploder community, results from the Granger
causality test showed that participation and length of active membership and feedback and length
of active membership are NOT causally linked with each other, further validating the findings
from the hazard rate analysis.
Previous research on online communities found that feedback received from others was
positively correlated with continued participation in a community (Joyce, 2006; Lampe, 2005;
Burke, 2009; Panciera, 2009). However, these studies examined the relationship
between feedback and continued participation over relatively short periods of time (the first
three to sixteen months after a member joined). In contrast, this dissertation found that rate of
feedback received from others was not statistically significantly correlated with a longer length
of active membership when behavior was tracked for over two years for the two communities
studied. This suggests that the short-term effects of feedback on continued use identified by
previous studies may decay rapidly.
6.2 Implications for Practice
Online communities such as Everything2 and Sploder enable large numbers of members
to participate. However, many of these members become inactive. The high dropout rates for
online members suggest new research on factors that contribute to longer active membership
would be beneficial. Prior research showed members’ participation and feedback received on the
content might be important for inducing members to remain active in their online communities.
Understanding the effects of participation and feedback factors on length of membership in
online communities could provide valuable insights for designing online services for which
interactions among members are important.
Prior research has suggested certain design decisions that community administrators can
make to increase the time members stay active in online communities, such as publishing clear
guidelines regarding what constitutes acceptable and valuable content in a community (Sarkar,
2012). If these guidelines were posted on the website where new members could read them,
perhaps members would have a better idea of what is acceptable so their first post would be less
likely to be deleted. This could encourage them to remain active longer. The interviewees from
the qualitative study reported in Chapter 5 said that they were not even informed that their posts
were deleted, or why, and this discouraged them from participating further. If there are clear
guidelines and posts are deleted, the administrators could inform the author why the post was
deleted. The same goal can be further advanced by hiring professionals to
generate content at the early stages of the community so that it can set an example for new and
mature members in the community (Kraut, 2012). Sustained existence for peer-production
communities is dependent on cordial relationships between members and online editors. If
members are NOT encouraged by editors, they may feel frustrated (Kraut, 2012). This could lead
to a reduction in their participation and, eventually, members may become inactive in a
community and stop using services associated with it. Understanding how participation and
feedback factors affect length of membership in online communities could inform the design of
tools that may reduce the burden of peer production editors in managing the community.
If a member does not remain active in a community long-term, it could have a negative
impact on the longevity of the site. Thus, online administrators should encourage members to
remain active in the community. As the interview participants mentioned, a primary reason for
them to remain active was to maintain friendships they built. Administrators should encourage
these friendships to promote cooperation and a longer period of active membership. The insight
gained from this research about the impact of initial participation and initial feedback on length
of active membership may help administrators determine where they should spend their time and
effort to incentivize members to participate further in a community, which could improve the
sustainability of the community. The findings further suggest that early negative feedback has a
strong negative impact on how long a member will remain active in an online community.
Administrators should use their discretion carefully when providing negative feedback on
content, especially with new members.
6.3 Limitations
While this study can provide insights into factors that influence length of active
membership in online communities, it has limitations that future studies can address to expand
this line of research. This study does NOT generalize to every online community. Findings from
this study may apply to some user generated content-based communities, such as Wikipedia or
World of Warcraft, that are similar to Everything2 and Sploder. However, even for the two
communities examined the findings were somewhat different. Further, even though the sample
sizes were large, the data analyzed in this dissertation is a convenience sample based on two
online communities, NOT a true random sample drawn from the universe of online communities.
Quality of posts and quality of feedback, which this study did not take into account, may
also contribute to length of active membership. Evaluating the quality of posts for a data set of
this size would be a challenge. This may require automated natural language processing, which
future research could add as an elaboration of the approach used here.
The dissertation could not account for psychological variables, such as members’
motivations to participate or members’ personalities because server logs capture only actions and
the results of actions. Psychological factors may help further explain what contributes to length
of active membership in a community.
Another limitation is that the two communities did not automatically log a member out if
a member closed the browser without signing out. In this instance, the server logs did not capture
the next login. This may add bias to the results. Future research should address this.
In addition, data sets were derived from a snapshot of members' behaviors. Due to
computational challenges and restrictions in the server logs, analyses of early participation and
early feedback were restricted to members’ first posts only. Future research should expand the
measures of early participation.
Also, the qualitative and the quantitative studies of Everything2 were not jointly designed.
The interview responses used for the qualitative study came from a previous study, which
was therefore not designed to address many of the questions addressed by the quantitative
studies. Nevertheless, the answers from the interviewees still helped to provide additional
context for some of the findings and they helped identify factors, such as life changing events,
that might influence how long members remained active in an online community that could not
be analyzed in the two quantitative studies reported in this dissertation.
Due to computational challenges, I could not separate feedback received from higher
status members versus other members. This may have skewed the results in terms of how
feedback factors are related to length of membership in the community. However, only 2.29% of
the members in the Everything2 sample have achieved a higher status rank that allowed them to
submit cools. This is a hard problem that future studies should find a way to address when they
examine length of membership in the community.
6.4 Future Research
This study was limited to the first instance of each type of early participation and early
feedback received variables (captured through server logs). Since there is at most only one event
recorded for each early participation variable and each early feedback received variable, there are
no prior observations. Because of this, Granger causality tests could not be performed to examine
whether early participation causally contributes to greater length of active membership and whether
early feedback received is causally linked to length of active membership. Future studies should
examine these relationships further by using the first few months of activity for early
participation and early feedback received measures.
6.5 Conclusions
This dissertation focuses on length of active membership in online communities and ways
it might be related to members’ participation and feedback they receive on content they
contribute. Keeping members active is important for the success of online communities. For
many of these online communities, members are the main sources for sharing information and
generating content. It is also possible that over time members who currently only receive
information (e.g., only read) may start providing information (e.g., posts) to other members.
However, it is not guaranteed that they will remain active. This research was driven by the goal
of better understanding to what extent different activities may impact length of active
membership and also to see how these findings can be generalized for two communities that are
similar in important ways. In the past, scholars have examined different communities (such as
the online encyclopedia Wikipedia, the health community Breastcancer.org, the game
community World of Warcraft, the news and discussion community Slashdot, the question and
answer forum Yahoo! Answers, etc.) as standalone objects of study. These studies thus reported
findings that were specific to their individual communities.
Studying two communities provides evidence on whether findings from one community
generalize to the other community. The findings reported in this dissertation show that while
some results may generalize across communities, not all results generalize. The finding that
early negative feedback is negatively correlated with how long a member remains active in an
online community applied to the measures of early negative feedback for both Everything2 and
Sploder.

APPENDICES

Appendix A
Results from a Cox Proportional Hazard Rate Model with a Cutoff Period of Two Months
(Sixty Days) for Everything2
This appendix provides the results from the Cox proportional hazard rate model for Everything2
with a cutoff point of two months, so only members who have remained active in the community
for at least two months after they registered in the community are included in this analysis.
Variable                              N       Min      Max        Mean      S.D.
Length of Membership                  21909   60.003   3451.289   793.151   829.200
Rate of Writeups                      21909   .0       5.7        .014      .090
Rate of Messages sent                 21909   .0       11.0       .038      .284
Rate of Votes given                   21909   .0       20.5       .109      .757
Rate of Messages received             21909   .0       10.4       .003      .075
Rate of Votes received                21909   .0       .9         .000      .014
Rate of Cools received                21909   .0       1.5        .384      .297
Fraction of deleted Writeups          1714    .2       .9         .603      .274
First Writeup submitted or not        21909   0        1          .31       .461
First Message submitted or not        21909   0        1          .48       .499
First Downvote submitted or not       21909   0        1          .31       .461
First Upvote submitted or not         21906   0        1          .98       .135
First Upvote received on a Writeup    21909   .0       1          .018      .131
First Downvote received on a Writeup  21909   .0       1          .311      .462
First Deletion of a Writeup           21909   0        1          .36       .480
First Cool received on a Writeup      21909   .0       1          .181      .384
Familiarization Time                  21909   0        5          .50       .899
Table A-1: descriptive statistics for the variables with a two month cut off time in Everything2

Table A-2 shows the chi-square difference between the null model and the full model for the Cox
proportional hazard rate model at two months.
-2 Log Likelihood   Overall (score)                Change From Previous Step
                    Chi-square   df    Sig.        Chi-square   df    Sig.
11528.284           517.599      16    .000        381.617      16    .000
Table A-2: Omnibus Tests of Model Coefficients for Everything2
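
The chi-square change in Table A-2 can be converted to a p-value by hand. The following is a minimal sketch, not the statistical package output itself: the function name `chi2_sf_even_df` is hypothetical, and the closed form it uses holds only when the degrees of freedom are even (as with the 16 degrees of freedom reported above).

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with even df.

    For even df the tail has the closed form
    exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for positive even df")
    half = x / 2.0
    term = 1.0   # current summand (x/2)^i / i!, starting at i = 0
    total = 1.0  # running sum of the summands
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

# Values reported in Table A-2 (change from the null model).
chi_square_change = 381.617
degrees_of_freedom = 16
p_value = chi2_sf_even_df(chi_square_change, degrees_of_freedom)
print(f"p = {p_value:.3g}")
```

A p-value far below .001 is what the table reports as Sig. = .000.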

Table A-3 shows the results for the Cox proportional hazard rate model for Everything2 for
members who remained active in the community at least two months. The column labeled
Exp(B) gives the estimated coefficient values for the explanatory variable in terms of hazard
ratio and Sig. gives the significance level for each variable.
Factors                     Variables                              Exp(B)    SE      Sig.
Rate of Participation       Rate of Writeups                       3.921     .474    .004
                            Rate of Messages sent                  1.572*    .080    .000
                            Rate of Votes given                    1.117     .049    .023
Rate of Feedback Received   Rate of Messages received              .012      2.388   .065
                            Rate of Votes received                 2.995     1.584   .489
                            Rate of Cools received                 4.609     .134    .000
                            Fraction of deleted Writeups           1.044     .122    .722
Early Participation         First Writeup submitted or not         2.628     1.289   .453
                            First Message submitted or not         .201*     .112    .000
                            First Downvote submitted or not        .403      1.284   .480
                            First Upvote submitted or not          .924      .267    .768
Early Feedback Received     First Upvote received on a Writeup     .705      .222    .116
                            First Downvote received on a Writeup   1.595*    .079    .000
                            First Cool received on a Writeup       5.639     .837    .039
                            First Deletion received on a Writeup   .963      .068    .583
Control                     Familiarization Time                   1.146*    .039    .001
Table A-3: Hazard rate model results for participation and feedback factors and length of active
membership in Everything2 (*p <.001)
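
The Sig. column of Table A-3 can be approximately recovered from the Exp(B) and SE columns. The sketch below assumes that SE is the standard error of the coefficient B on the log-hazard scale and that significance comes from a two-sided Wald test; neither assumption is stated in the table, and the function name `wald_p_value` is hypothetical.

```python
import math

def wald_p_value(exp_b, se):
    """Two-sided p-value for H0: B = 0, given the hazard ratio Exp(B) and the SE of B."""
    b = math.log(exp_b)   # coefficient on the log-hazard scale
    z = b / se            # Wald z statistic
    return math.erfc(abs(z) / math.sqrt(2.0))

# Rows copied from Table A-3: (variable, Exp(B), SE)
rows = [
    ("Rate of Writeups", 3.921, 0.474),
    ("Rate of Messages sent", 1.572, 0.080),
    ("First Writeup submitted or not", 2.628, 1.289),
]
for name, exp_b, se in rows:
    print(f"{name}: p = {wald_p_value(exp_b, se):.3f}")
```

Under these assumptions the computed values are close to the reported Sig. entries for the same rows; small differences are expected because the table values are rounded.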

The results showed a significant difference for the rate of cools received on length of
active membership for members who stayed in the community for at least 60 days compared to
those who stayed in the community over a day but less than 60 days. Rate of cools received is
positively correlated with length of active membership (at the p=.001 level). First cool received on a
writeup is not statistically significant at the p=.001 level.

Appendix B
Variance Inflation Factor Analysis for Everything2
Table A-4 gives the results of a Variance Inflation Factor (VIF) analysis. A VIF value of 5 or
above usually indicates multi-collinearity among the variables. I found VIF values between
1.011 and 2.609 for all variables. The VIF test for multi-collinearity confirms that all of the
individual participation variables and all individual feedback variables that have been used in the
analysis are NOT collinear with each other. This means that no participation variable is collinear
with any other participation variable, nor is any participation variable collinear with any
feedback variable. No feedback variable is collinear with any other feedback variable. Also the
VIF values showed that length of active membership is not collinear with any participation or
any feedback received variables.
Variables                                  Collinearity Statistics
                                           Tolerance   VIF
Rate of Writeups                           .786        1.272
Rate of Messages sent                      .989        1.011
Rate of Votes given                        .964        1.037
Rate of Messages received                  .986        1.015
Rate of Votes received                     .888        1.126
Rate of Cools received                     .540        1.853
Fraction of deleted Writeups               .993        1.007
First Writeup submitted or not             .517        1.936
First Message submitted or not             .383        2.609
First Downvote submitted or not            .711        1.406
First Upvote submitted or not              .945        1.058
First Upvote received on First Writeup     .886        1.129
First Downvote received on First Writeup   .845        1.183
First Cool received on First Writeup       .711        1.407
Deletion of First Writeup                  .962        1.039
Familiarization Time                       .967        1.034
Table A-4: results from a Variance Inflation Factor (VIF) analysis for Everything2
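
The two columns of Table A-4 are linked by the standard identity VIF = 1 / Tolerance, where Tolerance is 1 minus the R-squared from regressing one variable on all the others. A small sketch of the check follows; the helper name `vif_from_tolerance` is hypothetical and the three rows are copied from the table.

```python
def vif_from_tolerance(tolerance):
    """Variance Inflation Factor as the reciprocal of tolerance (1 - R^2)."""
    return 1.0 / tolerance

# A few (variable, tolerance) pairs copied from Table A-4.
tolerances = {
    "Rate of Writeups": 0.786,
    "Rate of Cools received": 0.540,
    "First Message submitted or not": 0.383,
}
VIF_THRESHOLD = 5.0  # rule of thumb used in this appendix

for name, tol in tolerances.items():
    vif = vif_from_tolerance(tol)
    status = "possible multi-collinearity" if vif >= VIF_THRESHOLD else "ok"
    print(f"{name}: VIF = {vif:.3f} ({status})")
```

Because the tolerances in the table are rounded to three decimals, the recomputed VIF values can differ from the reported ones in the last digit.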

Appendix C
Survival Function at Mean of Covariates in Everything2
The x axis is time to cessation of active membership in days. The y axis indicates
probability of survival. Any point on the survival curve shows the probability that an active
member will remain active for at least that amount of time.
[Figure: Survival Function at Mean of Covariates. x axis: Length of Membership in Days (0 to 4000); y axis: Cum Survival (0.0 to 1.0)]
Figure A-1: Survival graph for members of Everything2

Appendix D
Model Fit In Terms of a Chi-square Difference for the Hazard Rate in Everything2
The chi-square difference between the null model (only the intercept has non-zero value;
all explanatory (independent) variables in a model have zero regression coefficients in the
equation) and the full model (when all explanatory variables and the intercept are included; the
intercept and at least one explanatory variable has a value in the equation) is significant at the
p=.001 level (refer to Table A-5), suggesting that the effect size of the explanatory variables in
terms of hazard ratio is significant for explaining the length of active membership. A null model
provides a baseline measure of a model’s fit. The fit of a model refers to how well the data
points fit the equation of the model and how close the predicted values are to the observed
values. A model fit is often assessed in terms of a chi-square difference. A larger chi-square
difference between a null model and a full model represents a better model fit. A better model fit
suggests that the margin of unexplained variance (in the model) is low and provides evidence for
stronger correlation between explanatory variables and the dependent variable.

-2 Log Likelihood   Overall (score)          Change From Previous Step
                    Chi-square   Sig.        Chi-square   Sig.
21706.418           3393.323     .000        838.576      .000
Table A-5: Omnibus Tests of Model Coefficients

Appendix E
Statistical Tests to Determine a Lag Order
Five statistical tests were performed to determine the lag order for a Granger causality
test: the Likelihood Ratio test (LR), Akaike Information Criteria (AIC), Hannan-Quinn
Information Criteria (HQIC), Schwarz' Bayesian Information Criterion (SBIC), and Akaike's
Final Prediction Error (FPE). These tests are designed to balance the trade-off between
selecting too few lags and too many lags, and they are often used to select a lag order for
forecasting. Granger causality uses forecasting techniques to establish a relationship between
variables using past values of two separate variables. In this study, I used the software
package STATA for these tests, which provided a corresponding lag order for the Granger
causality test.
Likelihood Ratio test (LR)
A likelihood ratio test is a statistical test that compares the fit of two models, a null
model and an alternative (full) model. The null model contains the intercept and all variables
with zero coefficients, whereas the alternative model contains all explanatory variables
and the intercept (the intercept and at least one explanatory variable has a value in the equation).
The test uses the likelihood ratio to determine how likely the observed data are
under one model compared to the other model. The likelihood ratio is used to generate a p-value
to decide whether the null model should be rejected in favor of the alternative model.
Akaike Information Criteria (AIC)
Akaike Information Criteria (AIC) is a statistical criterion that uses information entropy
to provide a measure of the relative quality of a model. If $k$ is the number of parameters in a
model and $L$ is the maximum value of its likelihood function, the AIC value for the model is
measured as $\mathrm{AIC} = 2k - 2\ln(L)$.
Hannan-Quinn Information Criteria (HQIC)
Hannan-Quinn Information Criteria (HQIC) is a well-known criterion for lag order
detection in time series models. In this study, this information criterion is used to determine the
lag order for the Granger causality test. The HQIC is measured using the following expression:
$\mathrm{HQIC} = u \log(\mathrm{RSS}/u) + 2m \log\log u$, where $m$ is the number of parameters, $u$ is the number of
observations, and RSS is the residual sum of squares, which is obtained from a linear regression.
Schwarz' Bayesian Information Criterion (SBIC)
Schwarz' Bayesian Information Criterion (SBIC) is a statistical test often used for model
selection. The criterion utilizes the expression $-2L_p + p \ln q$, where $q$ is the
sample size, $L_p$ is the maximized log-likelihood of the model, and $p$ is the number of parameters
in the model. It uses a Bayesian prior in terms of the parameter $p$.
Akaike's Final Prediction Error (FPE)
Akaike's Final Prediction Error (FPE) is often used to estimate a lag order. In econometrics,
it is defined as the expected variance of the prediction error. It estimates the
expected variance of the error when an autoregressive time series is fitted against another time
series of similar covariance structure.
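
The expressions given in this appendix for AIC, SBIC, and HQIC can be written out directly. The following is only an illustrative sketch: the function names are hypothetical, natural logarithms are assumed throughout, and the inputs are made-up numbers rather than values from the dissertation's data.

```python
import math

def aic(log_likelihood, k):
    """AIC = 2k - 2 ln(L), with ln(L) passed in as the maximized log-likelihood."""
    return 2 * k - 2 * log_likelihood

def sbic(log_likelihood, p, q):
    """SBIC = -2 L_p + p ln(q), for p parameters and sample size q."""
    return -2 * log_likelihood + p * math.log(q)

def hqic(rss, m, u):
    """HQIC = u log(RSS/u) + 2 m log(log(u)), for m parameters and u observations."""
    return u * math.log(rss / u) + 2 * m * math.log(math.log(u))

# Made-up inputs, for illustration only.
example_log_likelihood = -100.0
example_parameters = 3
example_observations = 100
example_rss = 50.0

print("AIC :", aic(example_log_likelihood, example_parameters))
print("SBIC:", sbic(example_log_likelihood, example_parameters, example_observations))
print("HQIC:", hqic(example_rss, example_parameters, example_observations))
```

All three criteria add a penalty that grows with the number of parameters, which is what lets them trade off too few lags against too many; SBIC penalizes extra parameters more heavily than AIC whenever ln(q) exceeds 2.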

Appendix F
Results from a Lag Order Test for Granger Causality in Everything2 Pertaining to Length
of Active Membership and Participation

Results from a lag order selection test (members' length of active membership and participation
variables such as rate of writeups, rate of messages sent, and rate of votes) are reported based on
LR, FPE, AIC, HQIC, SBIC information criteria.

Lag Order   P-Value   LR        FPE         AIC        HQIC       SBIC
0                               .069489     8.68492    8.72917*   8.79456*
1           .010      31.946    .069559     8.6855     8.90677    9.23372
2           .005      34.221*   .068042*    8.66136*   9.05964    9.64815


Table A-6: lag order results for Granger causality test on length of active membership and
participation in Everything2
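
To make the reading of Table A-6 concrete, the sketch below picks, for each information criterion, the lag with the smallest value. It assumes the asterisks in the table mark that minimum, which is consistent with the reported FPE, AIC, HQIC, and SBIC values; the names `table_a6` and `select_lag` are hypothetical.

```python
# Criterion values copied from Table A-6, keyed by lag order.
table_a6 = {
    "FPE":  {0: 0.069489, 1: 0.069559, 2: 0.068042},
    "AIC":  {0: 8.68492, 1: 8.6855, 2: 8.66136},
    "HQIC": {0: 8.72917, 1: 8.90677, 2: 9.05964},
    "SBIC": {0: 8.79456, 1: 9.23372, 2: 9.64815},
}

def select_lag(values_by_lag):
    """Return the lag order whose criterion value is smallest."""
    return min(values_by_lag, key=values_by_lag.get)

selected = {name: select_lag(values) for name, values in table_a6.items()}
for name, lag in selected.items():
    print(f"{name}: lag {lag}")
```

The criteria do not have to agree: here FPE and AIC favor a longer lag than HQIC and SBIC, which penalize additional lags more heavily.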

Appendix G
Results from a Lag Order Test for Granger Causality in Everything2 Pertaining to Length
of Active Membership and Feedback Received

Results from lag order selection test based on LR, FPE, AIC, HQIC, SBIC information criteria.

Lag Order   P-Value   LR        FPE         AIC        HQIC       SBIC
0                               4.5e+09     33.5715    33.6313*   33.7584*
1           0.010     32.101*   4.5e+09*    33.5682*   33.867     34.5023

Table A-7: results from a lag order test for Granger causality on length of active membership
and feedback in Everything2

Appendix H

Results from a Cox Proportional Hazard Rate Model with a Cutoff Period of Two Months
(Sixty Days) for Sploder
This appendix provides the results from the Cox proportional hazard rate model for Sploder with
a lower bound membership cutoff of two months such that only members who remained active in
the community for at least two months after they registered in the community were included in
this analysis.
Variables                                            N     Min     Max      Mean      S.D.
Length of active membership                          971   60.02   951.52   293.087   220.533
Rate of Discussion Article                           971   0       9.51     .281      .663
Rate of Comments sent                                971   0       23.44    1.670     3.543
Rate of Votes given                                  971   0       19.34    .876      1.558
Rate of Whisper given                                971   0       7.00     .240      .667
Rate of Comments Received                            971   0       5.00     2.463     1.597
Rate of Votes Received                               971   0       11.35    6.128     3.043
Rate of Whisper Received                             971   0       1.50     .738      .476
Fraction of Deleted Discussion Article               971   0       20.1     .113      .715
First Comment received on First Discussion Article   971   0       1.00     .213      .409
Deletion of First Discussion Article                 971   0       1.00     .233      .423
First Discussion Article submitted or not            971   0       1.00     .808      .393
First Vote submitted or not                          971   0       1.00     .996      .055
First Whisper submitted or not                       971   0       1.00     .955      .205
Familiarization Time                                 971   0       950.15   279.683   254.756
Table A-8: descriptive statistics for the variables with a two month cut off time in Sploder
Table A-9 shows the chi-square difference between the null model and the full model for the Cox
proportional hazard rate model at two months.
-2 Log Likelihood   Overall (score)                Change From Previous Step
                    Chi-square   df    Sig.        Chi-square   df    Sig.
19827.233           521.038      14    .000        721.626      14    .000
Table A-9: Omnibus Tests of Model Coefficients for Sploder

Table A-10 shows the results for the Cox proportional hazard rate model for Sploder for members
who remained active in the community at least two months, with column Exp(B) giving the
estimated coefficient values for the explanatory variables in terms of hazard ratio and Sig. giving
the significance level for each variable.
Factors                     Variables                                            Exp(B)   SE     Sig.
Rate of Participation       Rate of Discussion Article                           1.000    .056   .994
                            Rate of Comments sent                                1.001    .012   .957
                            Rate of Votes given                                  1.000    .028   .993
                            Rate of Whisper given                                .958     .070   .541
Rate of Feedback Received   Rate of Comments received                            .999     .023   .952
                            Rate of Votes received                               .999     .012   .952
                            Rate of Whisper received                             1.002    .078   .974
                            Fraction of deleted Discussion Articles              .998     .050   .962
Early Feedback Received     First Comment received on First Discussion Article   1.022    .096   .820
                            Deletion of First Discussion Article                 .048*    .327   .000
Early Participation         First Discussion Article submitted or not            1.000    .109   .997
                            First Whisper submitted or not                       1.655    .593   .396
                            First Vote submitted or not                          1.001    .582   .998
                            Familiarization Time                                 1.000    .000   .971
Table A-10: Hazard rate model results on participation and feedback factors and length of active
membership in Sploder (*p <.001)

Appendix I
Variance Inflation Factor Analysis for Sploder
Table A-11 gives the results of a Variance Inflation Factor (VIF) analysis. A VIF value of 5
or above usually indicates multi-collinearity among the variables. After removing
two variables that showed a VIF value above 5, I found VIF values between 1.004 and 1.504 for
all variables used in the analysis. The VIF test for multi-collinearity confirms that all of the
individual participation variables and all individual feedback variables that have been used in the
analysis are NOT collinear with each other. This means that no participation variable is collinear
with any other participation variable, nor is any participation variable collinear with any
feedback variable. No feedback variable is collinear with any other feedback variable. Also the
VIF values showed that length of active membership is not collinear with any participation or
any feedback received variables.
Variables                                            Collinearity Statistics
                                                     Tolerance   VIF
Rate of Discussion Article                           .949        1.054
Rate of Comments sent                                .679        1.473
Rate of Votes given                                  .683        1.463
Rate of Whisper given                                .924        1.082
Rate of Comments received                            .996        1.004
Rate of Votes received                               .994        1.006
Rate of Whisper received                             .993        1.008
Fraction of deleted Discussion Articles              .982        1.018
First Comment received on First Discussion Article   .784        1.276
Deletion of First Discussion Article                 .773        1.293
First Discussion Article submitted or not            .665        1.504
First Whisper submitted or not                       .841        1.189
First Vote submitted or not                          .992        1.008
Familiarization Time                                 .757        1.320
Table A-11: results from a Variance Inflation Factor (VIF) analysis for Sploder

Appendix J
Survival Function at Mean of Covariates
The x axis is time to cessation of active membership in days. The y axis indicates
probability of survival. Any point on the survival curve shows the probability that an active
member will remain active for at least that amount of time.
[Figure: Survival Function at Mean of Covariates. x axis: Length of Membership in Days (0 to 1000); y axis: Cum Survival (0.0 to 1.0)]
Figure A-2: Survival graph for members of Sploder

Appendix K
Model Fit In terms of a Chi-square Difference for the Hazard Rate in Sploder
The chi-square difference between the null model (only the intercept has non-zero value; all
explanatory (independent) variables in a model have zero regression coefficients in the equation)
and the full model (when all explanatory variables and the intercept are included; the intercept
and at least one explanatory variable has a value in the equation) is significant at the p =.001
level, suggesting that the effect size of the explanatory variables in terms of hazard ratio is
significant for explaining the length of active membership (refer to Table A-12). A null model
provides a baseline measure of a model fit. The fit of a model refers to how well the data
points fit the equation of the model and how close the predicted values are to the observed
values. A model fit is often measured in terms of a chi-square difference. A larger chi-square
difference between a null model and a full model represents a better model fit. A better model fit
suggests that the margin of unexplained variance (in the model) is low and provides evidence for
stronger correlation.
-2 Log Likelihood   Overall (score)          Change From Previous Step
                    Chi-square   Sig.        Chi-square   Sig.
19770.023           570.831      .000        778.837      .000
Table A-12: Omnibus Tests of Model Coefficients

Appendix L
Results from a Lag Order Test for Granger Causality in Sploder Pertaining to Length of
Active Membership and Participation

Results from lag order selection tests based on LR, FPE, AIC, HQIC, SBIC information
criteria.

Lag   P-Value   LR        FPE         AIC        HQIC       SBIC
0                         5.4e+06     29.6966    29.7693    29.9345*
1     0.522     23.96     1.4e+07     30.6266    31.0629    32.0539
2     0.001     53.914*   1.5e+07*    30.4868*   31.2868*   33.1036

Table A-13: lag order diagnosis for a Granger Causality Test

Appendix M
Results from a Lag Order Test for Granger Causality in Sploder Pertaining to Length of
Active Membership and Feedback Received

Results from lag order selection tests based on LR, FPE, AIC, HQIC, SBIC information criteria.

Lag   P-Value   LR        FPE         AIC        HQIC       SBIC
0                         395.27      20.1688    20.2415    20.4067
1     0.225     29.968    835.967     20.8842    21.3206*   22.3116*
2     0.081     35.402*   1701.69*    21.4056*   22.2056    24.0224

Table A-14: lag order diagnosis for a Granger Causality Test

Appendix N
Interview Questions
1. Think back to when you first heard about Everything2/Sploder. Where did you
hear about the site? What prompted you to join Everything2/Sploder [Initial
motivation]?
2. Think back, what was the first thing you contributed to E2/Sploder?
• Do you remember it? (If needed: Our logs indicated that it was node <x>)
• Do you think it was a good contribution? Why or why not? What did you
like about it?
• Was it well received by the community?
• How / why did you decide to contribute content to the site?
• Did you contribute privately first with members and gradually start
posting publicly? [Private vs. Public]
• What was your sense of the other users of the site?
3. How do you think your use of E2/Sploder changed over time?
a) Were you ever contributing “regularly”?
b) Did you contribute to the site in other ways than just nodes?
c) How often would you say you contributed? Was it always about the same
amount, or did that amount of contribution change at different times? (If
so, why did it change?)
d) On average, how much time was spent in E2/Sploder every week? Did it
change over time?
e) Did your role change over the years while contributing to E2/Sploder as a
member? (Trying to probe general contributor vs. moderator)
4. Did you participate in giving feedback to other users on the site?
a) How did you decide what deserved a C!?
b) Did you use these feedback features much? Why or why not?
c) Did your use of these community feedback features change the way you
contributed nodes?
5. Think back, what was the last thing you contributed to the site?
a) Why did you decide to contribute that?
b) How well was it received by the community?
c) What happened that caused you to stop after that?
d) (If accurate) That was contributed around <date of late contribution in
logs>. What else was happening in your life around that time?
e) Did you receive private messages from other members of the
node/community during your contribution? [ validating the spikes in
private messages]
f) After you stopped contributing, did you follow the other members’
contribution in E2/Sploder? [Indirectly validating the lurking behavior as
precursors of Exiting]
g) While contributing to E2/Sploder did you also contribute to any other
similar social networking sites? If yes, did you continue to contribute to the
other site even after withdrawing your contributions from E2/Sploder?
[Competing Sites]
h) After withdrawing from E2/Sploder did you join any other similar
websites? [Validity Checking]
6. According to our logs, the last node you contributed was <x>. Do you remember
contributing that?

i) Why did you decide to contribute that?
j) How well was it received by the community?
k) What happened that caused you to stop after that?
l) That was contributed around <date of late contribution in logs>. What
else was happening in your life around that time?
7. What are your current thoughts about Everything2/Sploder?
8. If you could start over, would you contribute again? What would you do
differently?
Contributors who are no longer active
1. Before contributing to E2/Sploder did you observe the other users in the site?
2. What was the content you contributed to E2/Sploder?
a) Do you remember it? (If needed: Our logs indicated that it was
node <x>)
b) Do you think it was a good contribution? Why or why not? What
did you like about it?
c) Was it well received by the community?
d) How / why did you decide to contribute content to the site?
e) Did you contribute privately first and gradually started contributing
publicly? [Private vs. Public]
f) What was your sense of the other users of the site?
g) What caused you to stop after that?
h) (If accurate) That was contributed around <date of late
contribution in logs>. What else was happening in your life
around that time?
i) Did you receive private messages from other members of the
node/community during your contribution? [ validating the spikes
in private messages]
j) After you stopped contributing, did you follow the other members’
contribution in E2/Sploder? [Indirectly validating the lurking
behavior as precursors of Exiting]
k) While contributing to E2/Sploder, did you also contribute to any
other similar social networking sites? If yes, did you continue to
contribute to the other site even after withdrawing your
contribution from E2/Sploder? [Competing Sites]
l) After withdrawing from E2/Sploder did you join any other similar
websites? [Validity Checking]
Long term contributors who are still active
1. Think back to when you first heard about Everything2/Sploder. Where did you
hear about the site? What prompted you to join Everything2 [Initial motivation]?
2. Think back, what was the first thing you contributed to E2/Sploder?
• Do you remember it? (If needed: Our logs indicated that it was node <x>)
• Do you think it was a good contribution? Why or why not? What did you
like about it?
• Was it well received by the community?
• How / why did you decide to contribute content to the site?
• Did you contribute privately first with members and gradually start
posting publicly? [Private vs. Public]
• What was your sense of the other users of the site?
3. How do you think your use of E2/Sploder changed over time?
a) Were you ever contributing “regularly”?
b) Did you contribute to the site in other ways than just nodes?
c) How often would you say you contributed? Was it always about the same
amount, or did that amount of contribution change at different times? (If
so, why did it change?)
d) On average, how much time do you spend on E2/Sploder every week? Did
it change over time?
e) Did your role change over the years while contributing to E2/Sploder as a
member? (Trying to probe general contributor vs. moderator)
4. Did you participate in giving feedback to other users on the site?
a) How did you decide what deserved a C!/Whisper?
b) Did you use these feedback features much? Why or why not?
c) Did your use of these community feedback features change the way you
contributed nodes?
5.

a) When and what is the most recent contribution you have made?
b) What comments did you receive about your contribution?
c) How do you feel about the comments? <useful, meaningless>
d) Are you also a member of other social network sites like E2/Sploder?
e) What changes and improvements do you want to see in E2/Sploder?
[N.B.: Based on the answers users provided to the earlier questions, if needed, or if they were
wrong about their answer to the previous question]

REFERENCES


Arguello, J, Butler, B. S, Joyce, L, Kraut, R. E, Ling, K. S, Rosé, C. P. and Wang, X. (2006).
Talk to me: Foundations for Successful Individual-Group Interactions in Online Communities. In
Proceedings of the 2006 ACM Conference on Human Factors in Computing Systems, New
York: ACM Press, 959-968.
Arkes, H. and Blumer, C. (1985). The Psychology of Sunk Cost. Organizational Behavior
and Human Decision Processes, 35(1), 124-140.
Arkes, H. and Hutzel, L. (2000). The Role of Probability of Success Estimates in the Sunk Cost
Effect. Journal of Behavioral Decision Making, 13(3), 295-306.
Bender, T. (1978). Community and Social Change in America. New Brunswick, NJ: Rutgers
University Press.
Bendor, J. and Swistak, P. (2001). The Evolution of Norms. American Journal of Sociology 106
(6), 1,493–1,545.
Brothers, L, Hollan, J, Nielsen, J, Stornetta, S, Abney, S, Furnas, G. and Littman, M. (1992,
November 1-4). Supporting Informal Communication via Ephemeral Interest Groups.
Proceedings of CSCW 1992. Toronto, Ontario: ACM.
Bowes, J. (2002, March). Building Online Communities for Professional Networks. Proceedings
of the Global Summit of Online Knowledge Networks. Adelaide, Australia.
<http://www.educationau.edu.au/globalsummit/papers/jbowes.htm> checked 25 July 2005.
Burke, M, Marlow, C, and Lento, T. (2009). Feed Me: Motivating Newcomer Contribution in
Social Network Sites. In Proceedings of CHI 2009, ACM Press, 945-954.
Cheshire, C. (2008). The Social Psychological Effects of Feedback on the Production of Internet
Information Pools. Journal of Computer-Mediated Communication, 13, 705–727.
Choi, B, Alexander, K, Kraut, R.E. and Levine, J.M. (2010, February). Socialization Tactics in
Wikipedia and Their Effects. Presented at CSCW 2010.
Churchill, E, Girgensohn, A, Nelson, L, and Lee, A. (2004). Information Cities: Blending Digital
and Physical Spaces for Ubiquitous Community Participation. Communications of the ACM, 47 (2),
38-44.
Cox, D. R. and Oakes, D. (1984). Analysis of Survival Data. New York: Chapman & Hall.
Cox, D. R. (1972). Regression Models and Life-Tables. Journal of the Royal Statistical Society,
Series B 34 (2): 187–220.
Cox Reference 1. http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-coxregression.pdf
Cox Reference 2. https://mywebspace.wisc.edu/jmullahy/web/basu%20manning%20mullahy.pdf
Delanty, G. (2010). Community. (Second edition). New York: Routledge.
Ducheneaut, N. (2005). Socialization in an Open Source Software Community: A Socio-Technical Analysis. Computer Supported Cooperative Work, 14 (4), 323-368.
Durkheim, E. (1960). The Division of Labour in Society. New York: Free Press.
Etzioni, A. and Etzioni, O. (1999). Face-to-Face and Computer-Mediated Communities, a
Comparative Analysis. The Information Society 15: 241 – 8.
Eytan, A. and Huberman, B.A. (2000, October). Free Riding on Gnutella. First Monday, 5(10).
Everything2: http://www.everything2.com/
Farzan, R. and Brusilovsky, P. (2011, January). Encouraging User Participation in a Course
Recommender System: An Impact on User Behavior. Computers in Human Behavior: Volume
27, Issue 1, 276-284.
Goodman, L.A. (1961). Snowball Sampling. Annals of Mathematical Statistics, 32 (1): 148–170.
doi:10.1214/aoms/1177705148.
Granger, C. (1969). Investigating Causal Relations by Econometric Models and Cross-Spectral
Methods. Econometrica 37 (3).
Granger Reference 1.
http://academic.reed.edu/economics/parker/s13/312/tschapters/S13_Ch_5.pdf
Gupta, S. and Kim, H.W. (2004). Virtual Community: Concepts, Implications, and Future
Research Directions. In Proceedings of the Tenth Americas Conference on Information Systems
(New York, NY, August), C. Bullen, and E. Stohr, Eds. AIS, Atlanta, GA.
Halfaker, A, Kittur, A. and Riedl, J. (2011). Don't Bite the Newbies: How Reverts Affect the
Quantity and Quality of Wikipedia Work. In Proceedings of the 7th International Symposium on
Wikis and Open Collaboration (WikiSym '11). New York, NY: ACM, 163-172.
Heckathorn, D. (1997). Respondent-Driven Sampling: A New Approach to the Study of Hidden
Populations. Social Problems, 44 (2), 174-199.
Heckathorn, D.D. (2002). Respondent-Driven Sampling II: Deriving Valid Estimates from
Chain-Referral Samples of Hidden Populations. Social Problems. 49: 11-34.

Horrigan, J. (2007). A Typology of Information and Communication Users. Pew Internet &
American Life Project. Available from http://www.pewInternet.org/pdfs/PIP_ICT_Typology.pdf
[Accessed on 15 February 2012].
Hughes, D, Coulson, G. and Walkerdine, J. (2005). Free Riding on Gnutella Revisited: The Bell
Tolls. IEEE Distributed Systems Online, 6(6): 1–18.
Jensen, C, Sarkar, C, Jensen, C. and Potts, C. (2007, July). Tracking Website Data-Collection
and Privacy Practices with the iWatch Web Crawler. In Proceedings of the Symposium of
Usable Privacy and Security. Pittsburgh, PA.
Jones, B.D. (2001). Politics and the Architecture of Choice: Bounded Rationality
and Governance. Chicago: University of Chicago Press.
Joyce, E. and Kraut, R. (2006). Predicting Continued Participation in Newsgroups. Journal of
Computer-Mediated Communication, 11(3):723-747.
Kiesler, S. (2007).
http://www.ee.oulu.fi/~vassilis/courses/socialweb10F/reading_material/5/Kiesler07.pdf
King, N, and Horrocks, C. (2010). Interviews in Qualitative Research. London: Sage.
Kollock, P. (1999). The Economies of Online Cooperation: Gifts and Public Goods in
Cyberspace. M. Smith and P. Kollock (editors). Communities in Cyberspace. London:
Routledge, 219–239.
Koh, J, Kim, Y, Butler, B. and Bock, G. (2007). Encouraging Participation in Virtual
Communities. Communications of the ACM 50 (2), 68-73.
Kraut, R. E. and Resnick, P. (2012). Evidence-Based Social Design: Mining the Social Sciences
to Build Online Communities. Cambridge, MA: MIT Press.
Kuzel, A.J. (1999). Sampling in Qualitative Inquiry. In B.F. Crabtree and W.L. Miller (Eds.).
Doing Qualitative Research. Thousand Oaks, CA: Sage, 33–46.
Lakhani, K. and Wolf, R. (2005). Why Hackers do What They do. Perspectives in
Free and Open Source Software. J. Feller, B. Fitzgerald, S. Hissam, and K. Lakhani (Eds.). MIT
Press, Cambridge, MA.
Lampe, C. and Johnston, E. (2005). Follow the (Slash) Dot: Effects of Feedback on New
Members in an Online Community. Proceedings of the 2005 international ACM SIGGroup
conference on supporting group work. New York, NY: ACM Press.
Lampe, C. (2009). Participation Lifecycle in Online Communities. Submitted to National
Science Foundation.

Lampe, C, Wash, R, Velasquez, A. and Ozkaya, E. (2010). Motivations to Participate in Online
Communities. In the Proceedings of the ACM Conference on Human Factors in Computing
Systems (CHI) Atlanta, GA.
Ling, K, Beenen, G, Ludford, P, Wang, X, Chang, K, Li, X, Cosley, D, Frankowski, D,
Terveen, L, Rashid, A, Resnick, P. and Kraut, R. (2005). Using Social Psychology to Motivate
Contributions to Online Communities. Journal of Computer-Mediated Communication. 10 (4).
Mockus, A, Fielding, R. T. and Herbsleb, J. D. (2002). Two Case Studies of Open Source
Software Development: Apache and Mozilla. ACM Transactions on Software Engineering and
Methodology, 11(3), 309-346.
Moran, E. (2008). http://www.sitepoint.com/study-why-most-online-communities-fail/
Moreland, R. L. and Levine, J. M. (2001). Socialization in Organizations and Work Groups.
M.E. Turner (Ed.). Groups at Work: Theory and Research, 69–112. Mahwah, NJ: Lawrence
Erlbaum.
Nielsen, J. (2006). Participation Inequality: Lurkers vs. Contributors in Internet Communities.
http://www.useit.com/alertbox/participation_inequality.html
Nonnecke, B. and Preece, J. (2000). Lurker Demographics: Counting the Silent. Paper presented
at the ACM CHI, The Hague.
Nonnecke, B. and Preece, J. (2001). Why Lurkers Lurk. Paper presented at the Americas
Conference on Information Systems, Boston.
Nonnecke, B. and Preece, J. (2003). Silent Participants: Getting to Know Lurkers Better. C. Lueg
& D.Fisher (Eds.). From Usenet to CoWebs: Interacting with Social Information Spaces,
Springer.
Nov, O, Naaman, M. and Ye, C. (2009). Motivational, Structural and Tenure Factors that Impact
Online Community Photo Sharing. In Proceedings of ICWSM 2009: ACM Press, 555-566.
O'Mahony, S. and Ferraro, F. (2007). The Emergence of Governance in an Open Source
Community. Academy of Management Journal, 50 (5), 1079-1106.
Panciera, K, Priedhorsky, R, Erickson, T. and Terveen, L. (2010). Lurking? Cyclopaths? A
Quantitative Lifecycle Analysis of User Behavior in a Geowiki. In Proceedings of CHI.
Panciera, K, Halfaker, A. and Terveen, L. (2009). Wikipedians are Born, not Made. In ACM
Special Interest Group on Computer-Human Interaction.
Phang, C.W, Kankanhalli, A. and Sabherwal, R. (2009). Usability and Sociability in Online
Communities: A Comparative Study of Knowledge Seeking and Contribution. Journal of the
Association for Information Systems, 10 (10), 721–747.
Poblocki, K. (2001, November). The Napster Network Community. First Monday 6(11):
at http://firstmonday.org/issues/issue6_11/poblocki/index.html.
Preece, J. (2001). Sociability and Usability: Twenty Years of Chatting Online. Behavior and
Information Technology Journal, 20 (5), 347–56.
Priedhorsky, R, Chen, J, Lam, S.K, Panciera, K, Terveen, L. and Riedl, J. (2007). Creating,
Destroying, and Restoring Value in Wikipedia. Sanibel Island, Florida: Proceedings of the 2007
international ACM conference on Supporting group work, ACM.
Ren, Y, Kraut, R. and Kiesler, S. (2007). Applying Common Identity and Bond Theory to Design
of Online Communities. Organization Studies, 28(3), 377–408.
Ren, Y, Harper, F. M, Drenner, S, Terveen, L, Kiesler, S, Riedl, J. and Kraut, R. (2010).
Increasing Attachment to Online Communities: Evidence-Based Design. MIS Quarterly (Under
review).
Ren, Y, Chen, J. and Riedl, J. (2011). The Impact and Evolution of Group Diversity on Online
Communities.
Richardson, C.R, Buis, L.R, Janney, A.W, Goodrich, D.E, Sen, A, Hess, M.L, Mehari,
K.S, Fortlage, L. A. and Resnick, P.J. (2010). An Online Community Improves Adherence in an
Internet-Mediated Walking Program. Part 1: Results of a Randomized Controlled Trial. Journal
of Medical Internet Research.
Sarkar, C., Wohn, D. Y., Lampe, C., and DeMaagd, K. (2012). A Quantitative Explanation of
Governance in an Online Peer-Production Community. Proceedings of CHI 2012, ACM Press,
2939-2942.
Skinner, B. F. (1953). Science and Human Behavior. New York: MacMillan.
Skinner, B. F. (1957). Verbal Behavior. New York: Appleton-Century-Crofts.
Smith, T, Smith, B. and Ryan, M.A.K. (2003). Survival Analysis Using Cox Proportional
Hazards Modeling For Single And Multiple Event Time Data. San Diego, CA.
Strauss, A. and Corbin, J. (1998). Basics of Qualitative Research: Techniques and Procedures for
Developing Grounded Theory. Thousand Oaks, CA: Sage Publications Inc.
Stewart, O, Heights, Y. and Lubensky, D. (2010). Crowdsourcing Participation Inequality: A
SCOUT Model for the Enterprise Domain. Proceedings of the ACM SIGKDD Workshop on
Human Computation (HCOMP '10). New York: ACM.
Takahashi, N. (2000). The Emergence of Generalized Exchange. American Journal of Sociology,
105 (4), 1,105–1,134.
Therneau, T. M. and Grambsch, P. M. (2000). Modeling Survival Data: Extending the Cox
Model. New York: Springer.
Toral, S, Martinez-Torres, M. R, Barrero, F. and Cortes, F. (2009). An Empirical Study of the
Driving Forces Behind Online Communities. Internet Research, 19(4), 378-392.
Velasquez, A, Lampe, C, Wash, R. and Bjornrud, T. (2013, March 16). Latent Users in an Online
User-Generated Content Community. Computer Supported Cooperative Work. DOI
10.1007/s10606-013-9188-4.
Walther, J. B. (1996). Computer-Mediated Communication: Impersonal, Interpersonal, and
Hyperpersonal Interaction. Communication Research, 23, 3-43.
Wang, Y, Kraut, R. and Levine, J.M. (2012). To Stay or Leave? The Relationship of Emotional
and Informational Support to Commitment in Online Health Support Groups. Presented at
CSCW 2012.
Whittaker, S, Terveen, L, Hill, W. and Cherny, L. (1998). The Dynamics of Mass Interaction.
Proceedings of CSCW 1998, ACM Press, 257-264.
Williams, D. (2006). Groups and Goblins: The Social and Civic Impact of an Online
Game. Journal of Broadcasting & Electronic Media, 50(4), 651-670.
Wenger, E. (2001). Supporting Communities of Practice: a survey of community-oriented
technologies. Retrieved on January 2, 2012 from http://www.ewenger.com/tech
Yang, J, Wei, X, Ackerman, M.S. and Adamic, L.A. (2010). Activity Lifespan: An Analysis of
User Survival Patterns in Online Knowledge Sharing Communities. Presented at the Fourth
International AAAI Conference on Weblogs and Social Media.
Zhang, G. (2012, April). Community: Issues, Definitions, and Operationalization on the Web.
Presented at WWW 2012.
Zhang, X. and Zhu, F. (2006). Intrinsic Motivation of Open Content Contributors: the Case of
Wikipedia.
