E-CONSTITUTIONS: CONCEPTUALIZATION, THEORY, DESIGN MODEL AND EXPERIMENTAL EVALUATIONS

By

Hamed Khaledi

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Business Administration - Business Information Systems - Doctor of Philosophy

2018

ABSTRACT

E-CONSTITUTIONS: CONCEPTUALIZATION, THEORY, DESIGN MODEL AND EXPERIMENTAL EVALUATIONS

By Hamed Khaledi

This project addresses the problem of collective design in cyberspace using computerized governance rules. Despite many applications of computerized rules, there is no systematic method or model to design them. I conceptualized an e-constitution as a set of rules in computer code that allocate decision rights and incentives to govern a decision-making process. This dissertation develops a design model consisting of a structured representation that breaks down a constitution into 14 components, including a state transition function and a weighting function. As a meta-artifact, this model provides a unified architecture for governance structures in a wide range of situations including crowdsourcing, blockchains and corporate governance. The model enables the use of quantifiable performance measures to evaluate constitutions objectively, liberated from the fairness criteria used in impossibility theorems. A systematic methodology is also presented to improve the performance of constitutions efficiently. As a proof of concept, I implemented a generic e-constitution in a web application and measured the effects of different factors on the constitutional performance metrics through online experiments. One finding is that approval voting is significantly superior to plurality voting, even under a prediction-voting incentive scheme.

Keywords: Design Science, Constitution, Governance, Mechanism Theory, Game Theory, Crowdsourcing, Collective Intelligence, Blockchain, Distributed Autonomous Organizations.

ACKNOWLEDGEMENTS

Hereby, I want to thank my advisor and the chair of the dissertation committee, Professor Severin Grabski, for his tremendous support and help. Moreover, I would like to express my deepest appreciation and gratitude to Professor Joshua Introne for his precious guidance and direction. I also thank Professor Bill McCarthy and Professor Frank Ravitch for their support and for believing in me. Additionally, I am grateful to the system administrator, Jeremy Isaac, for helping me to develop and debug the web application for the experiments.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
1. Introduction
2. Literature Review
3. Problem and Motivation
   3.1. Collective Design
   3.2. Enforcement versus Implementation
4. Constitutions for Collective Design
   4.1. Iterative Design
   4.2. Parallel Design
   4.3. Range Voting
5. Incentives
6. Structured Representation
7. Formalization
8. Weighting
9. Constitutional Design
10. Analytical Model
11. Propositions and Relationships
12. Research Method
   12.1. Experimentation Procedure
   12.2. Factor Selection Guidelines
      1 - Accuracy of selection (p)
      2 - Number of suggestions (m)
      3 - Average quality of suggestions (µ)
      4 - Variance of suggestions (σ)
      5 - Number of rounds (z)
      Parameter-wise Summary
   12.3. Evaluation of Constitutions
13. Proof of Concept
   13.1. Implementation
   13.2. Pilot Experiments
14. Results and Analysis
   14.1. Participants
   14.2. Dependent Variables
   14.3. Treatment (1)
   14.4. Treatments a, b and ab
   14.5. Treatments ac, ad and acd
   14.6. Treatments ac'd, acde and ac'de
   14.7. Including Control Variables
   14.8. Subject Level Analysis
15. Discussion and Limitations
16. Conclusions
APPENDICES
   APPENDIX A: IRB Application Determination Letter
   APPENDIX B: Registration Page Including the Consent Form
   APPENDIX C: Screenshots of the Webpages for the Experiment
   APPENDIX D: Final Survey Webpage
   APPENDIX E: Control Panel for the Experimenter
   APPENDIX F: Computer Code of the Generic E-Constitution
   APPENDIX G: Database of the Website
   APPENDIX H: HIT Description in MTurk
   APPENDIX I: Constitution for Treatment (1)
   APPENDIX J: Price-Based Constitution
BIBLIOGRAPHY

LIST OF TABLES

Table 2-1: Eight Components of an Information Systems Design Theory (Gregor & Jones, 2007)
Table 3-2: Different Mechanisms to Carry out Rules
Table 8-1: Distribution of Voting Rights using Linear and Concave Weighting Functions
Table 12-1: Three Levels of Design Artifacts and Methods
Table 12-2: Elements of the Improvement Direction
Table 12-3: Pairwise Comparisons between the Effects of the Mediators
Table 12-4: Summary of Guidelines for each Constitutional Parameter
Table 14-1: Summary of the Outcomes for Treatment (1)
Table 14-2: Summary of the Outcomes for Treatments a, b and ab
Table 14-3: Levels of the Dependent and Independent Variables in the First Four Treatments
Table 14-4: Regression Results for the First Four Treatments
Table 14-5: Regression Results for the First Four Treatments Excluding Variable n
Table 14-6: Summary of the Outcomes for Treatments ac, ad and acd
Table 14-7: Levels of the Dependent and Independent Variables in the First Seven Treatments
Table 14-8: Regression Results for the First Seven Treatments
Table 14-9: Stepwise Regression Results for the First Seven Treatments
Table 14-10: Summary of the Outcomes for Treatments ac'd, acde and ac'de
Table 14-12: Descriptive Statistics and Correlations among Group-Level Variables
Table 14-13: Regression Results for the Ten Treatments
Table 14-14: Stepwise Regression Results for the Ten Treatments
Table 14-15: Regression Results for the Ten Treatments with all Control Variables
Table 14-16: Subject-Level Descriptive Statistics on the 185 Final Participants from All Treatments
Table 14-17: Path Coefficients for Bonus as the Dependent Variable
Table 14-18: Path Coefficients for Comprehension as the Dependent Variable
Table 14-19: Path Coefficients for Expertise as the Dependent Variable
Table 14-20: Path Coefficients for Bonus as the DV in the Reduced Model
Table 14-21: Path Coefficients for Comprehension as the DV in the Reduced Model
Table 14-22: Path Coefficients for Expertise as the DV in the Reduced Model

LIST OF FIGURES

Figure 2-1: Interrelationships among Theory Types (Gregor, 2006)
Figure 2-2: Integrated Roadmap for Information Design Science Research (Deng & Ji, 2018)
Figure 3-1: Part of the Versioning (Forking) History of the Linux Operating System (from debian.org)
Figure 4-1: Flowchart of the State Transition Function for Periods and Stages in Iterative Design
Figure 4-2: Design Process with Multiple Parallel Suggestions per Period
Figure 4-3: Range Voting with an Open Range of Scores Starting from 100
Figure 6-1: Data Flow Diagram of the Generic Design Model for e-Constitutions
Figure 8-3: Taxonomy of Governance Structures based on the Design Model for Constitutions
Figure 10-1: Numerical Approximation of Function g(m)
Figure 13-1: Flow of Control among the Webpages in the Website and MTurk
Figure 14-2: Standardized Results of Path Analysis from AMOS
Figure C-1: Constitution Page
Figure E-1: Control Panel for Experimenter to Instantiate E-Constitutions
Figure J-1: Prices of Parallel Versions Using a Price-Based Constitution
1. Introduction

The open sourcing and crowdsourcing paradigms not only question the traditional hierarchical organization, they also suggest a new conception of the organization (Thuan, et al., 2017). This conception of an organization is inclusive in the sense that it can include everyone who is able and willing to participate in the decision-making or design process. In many circumstances, groups are smarter than their smartest members, as humans have evolved to be smarter collectively (Surowiecki, 2004). In this regard, the Internet provides a seamless technology to aggregate millions of dissimilar independent ideas. Brabham (2008) differentiated between crowdsourcing and open sourcing in that open sourcing allows anyone to contribute to and modify a product and freely distribute it, whereas crowdsourcing usually includes a policy for compensating the contributors and allows the originator to profit from the outcomes. In the end, the originator calling for solutions to its problems owns the outcomes. Brabham stressed that crowdsourcing, while cost effective, can deliver faster and even better results than top experts in most cases. Sakamoto and Bao (2011) used crowdsourcing to produce creative text solutions for a social problem and found that the best ideas from the crowd were as novel and useful as the best ones from experts.

Collective intelligence is a subclass of crowdsourcing that decentralizes some decisions to the crowd. Malone et al. (2009) defined collective intelligence as groups of individuals doing things collectively that seem intelligent; their focus was on web-enabled collective intelligence. Collective intelligence occurs when an information technology helps a group to reach superior results compared to results obtained by individuals (Kornrumpf & Baumol, 2014). Based on design science, Kornrumpf and Baumol developed a conceptual model to design collective intelligence systems for a business challenge. They framed this design process as the inverse problem of predicting outcomes of collective intelligence.

Many open sourcing methods let users modify a solution (e.g., software code) locally and share their own versions with the public, thereby forming a tree of versions. Such divergence can result in contradictions and inefficiencies when there are large externalities and interdependencies, such as a shared resource or a common objective. Generally, a resource can be allocated one way or another, and there can only be one resource allocation scheme at a time. For example, a website can have only one code base at a time because a server cannot execute multiple versions of a code simultaneously. Similarly, Wikipedia needs to choose which version of an article is effective at any time, even though it can change over time.
To have a blockchain rather than a block tree, we can only add one block of information as the next block at each point, so that there is one version of the distributed ledger at any time. In such situations, participants need to reach joint decisions. One way to reach unity of decisions is unity of the decision maker, as in centralized organizations. However, central authorities are susceptible to moral hazard and abuse of power. It would be better if multiple stakeholders could make joint decisions collectively, because a group of people who do not trust each other can become more trustable as a whole. This requires some rules to collect and aggregate individual choices into group decisions. We refer to those rules as a constitution.

A constitution distributes power and decision rights among members and controls their power by constraining their set of choices. It should specify how the possible choices and ideas are generated and how one out of many is selected. Moreover, a constitution outlines how to allocate resources and incentivize members. In a firm or a private institution, such rules are usually found in corporate bylaws or operating agreements, but sometimes they are in management control systems, partnership agreements, articles of incorporation, articles of organization or articles of association.

A constitution can be regarded as a social contract. Generally, contracting parties rely on courts and brick-and-mortar institutions for interpretation and enforcement of rules. Traditional enforcement methods are slow and costly and depend on human interpretation, subjectivity and jurisdiction, which is ambiguous for multinational organizations and on the Internet. This can lead to uncertainties in implementation and inconsistencies among intended, actual, perceived, and anticipated enforcement, thereby increasing risk and transaction costs. Most transaction costs are due to enforcement issues, and on the Internet it is impossible to use traditional enforcement that relies on physical force (Szabo, 1997).

Enforcement of a constitution is even more challenging because it usually involves multiple unspecified parties and is prone to free riding. However, if we could convert constitutional rules into computer code, computers could execute and enforce them. Computers are faster, cheaper and more trustworthy than humans are, thereby reducing transaction costs. Computers cannot misinterpret code and do not care about jurisdiction. They execute instructions without caprice, whereas humans may refuse to obey (Yu & Nickerson, 2011). Moreover, we can test, evaluate and compare such digital rules through online experiments, which would be too costly otherwise. Fortunately, many important rules in business settings are precise enough to be convertible into computer code. ERP and EDI systems already execute many rules in corporate governance structures, managerial control systems and supply chain contracts.

Generally, computerized or digital rules are constraints, outcome functions, and conditional executions. Constraints include business integrity constraints, automated controls, and limits on access levels. Outcome functions include payoff functions, state transition functions, social choice functions and weighting functions. Conditional executions include scheduled and conditional transactions.
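To make these three kinds of rules concrete, the listing below expresses one rule of each kind as executable code. It is a minimal sketch for illustration only; the function names, thresholds and reward amount are hypothetical and are not taken from the generic e-constitution in Appendix F.

    from datetime import datetime

    # 1) Constraint: an integrity rule that blocks an action outright (ex-ante).
    def withdrawal_allowed(balance: float, amount: float, access_level: int) -> bool:
        """Permit a withdrawal only within the balance and only for authorized levels."""
        return amount <= balance and access_level >= 2  # hypothetical access threshold

    # 2) Outcome function: a payoff rule that maps a decision outcome to a reward.
    def payoff(winner_is_suggestion: bool, reward: float = 200.0) -> float:
        """Pay a fixed reward when a new suggestion (not the incumbent) wins."""
        return reward if winner_is_suggestion else 0.0

    # 3) Conditional execution: a scheduled transaction that fires when its condition holds.
    def transfer_is_due(now: datetime, due: datetime, paid: bool) -> bool:
        """Execute the transfer once the due time has passed and it has not been paid yet."""
        return now >= due and not paid

    if __name__ == "__main__":
        print(withdrawal_allowed(balance=500.0, amount=120.0, access_level=3))            # True
        print(payoff(winner_is_suggestion=True))                                           # 200.0
        print(transfer_is_due(datetime(2018, 5, 2), datetime(2018, 5, 1), paid=False))     # True

Each function is deterministic and requires no human interpretation, which is what allows a machine, rather than a court, to carry the rule out.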
Combining these three kinds of rules, one can produce various processes such as conditional access-level authorization, digital escrows and different voting schemes. If a constitution consists only of such computerized rules, machines can execute and enforce it. I define an e-constitution as the set of rules or bylaws in computer code that determine the incentive schemes, information access levels, decision rights and possible choices for collective design and decision-making. It can only include instructions for machines and not humans. For example, it cannot prohibit collusion among members or force anyone to do something. An e-constitution may have several different expressions in human language, and vice versa. This project focuses on constitutions that can be implemented in computer code as e-constitutions, but it does not make any claim about formal verification, even though the proposed meta-model facilitates effective verification.

This research also presents a meta-model for constitutions defining a family of constitutions. A meta-model enables reuse and customization so that designers manipulate some parameters to generate instances that fulfil their needs (Kyriakou, et al., 2017). It defines the dimensions for constitutions and enables systematic search through the domain space of possible constitutions. In fact, innovation is a search through a multidimensional design space (Brooks, 2010). Kyriakou et al. (2017) found that meta-models were reused more than other models, especially when built by experts. In their case (Thingiverse), meta-models were used to design customized 3D objects, but they suggested generalizing this concept to software design.

The meta-model for constitutions facilitates their design and formal verification. Instead of coding each constitution, designers can modify a generic e-constitution (meta-code) to implement different constitutions. This is especially important when there is a possibility of errors and bugs in coding, and formal verification is costly and time consuming. Once a generic e-constitution is formally verified, it can be customized and reused thousands of times. If a loophole is discovered, it can be corrected in the generic e-constitution and then reflected in every instantiation. It is like having a generic standard contract with default parameter values, so that for each case the parties only specify the deviations from the default values, and it is not necessary to analyse all the contract terms every time.

This concept is applicable to a wide range of collective action situations. Crowdsourcing protocols and collective intelligence schemes are almost entirely composed of digital rules. A Distributed Autonomous Organization (DAO) is essentially an e-constitution implemented on top of the application layer of a generalized blockchain like Ethereum (Buterin, 2013) or EOS (Grigg, 2017). Blockchain protocols are a special kind of e-constitution that, like constitutional laws and social contracts, rely on inducing a Subgame Perfect Nash Equilibrium (SPNE) for implementation. An SPNE is a Nash equilibrium with no incredible threat or promise.

Despite numerous applications of e-constitutions, no model or method exists to systematically design them. As a result, many crowdsourcing campaigns fail, as did Wikinovel (Pullinger, 2007). The first DAO failed in 2016 precisely due to lack of a systemic design and formal verification (Atzei, et al., 2017).
Many of the top financial institutions in the world have joined a consortium to develop and utilize blockchains in the financial sector (Boreham & Rutter, 2018). These institutions are interested in designing effective smart contracts to improve efficiency and reduce transaction costs. However, there has been minimal theoretical development to systematically model and design blockchains and their applications. As Norta (2017) explained, the lack of academic involvement is one reason for suboptimal designs in this space.

Szabo (1997) introduced the concept of the smart contract. A smart contract is an autonomously executing piece of code whose inputs and outputs can include money and other digital rights, thereby eliminating the need for trusted intermediaries or reputation systems (Juels, et al., 2016). In more recent usage, a smart contract is a program written in a Turing-complete scripting language such as Serpent or Solidity to be executed by a generalized blockchain like Ethereum (Wood, 2015). A DAO is a multi-party smart contract that governs an organization on the Internet. Norta (2015) investigated the collaboration setup lifecycle for DAOs and explained how DAOs can make it possible to have Governance-as-a-Service (GaaS) in the cloud. Norta (2017) described a conceptual setup lifecycle for the establishment of smart contracts and Distributed Governance Infrastructure (DGI), which relates to the concept of the e-constitution.

The next chapter reviews the literature and describes various design science models. Chapter three further explains the problem of collective action on the Internet. Chapter four presents three simple constitutions along with the key concepts. Chapter five explores the effects of rewards and incentives in constitutions. Chapter six presents a structured representation for e-constitutions, providing a unified architecture for governance structures. Chapter seven formally defines an e-constitution as a 14-tuple, specifying a constitution as a limited set of design parameters and functions. Chapter eight expounds the weighting function in the constitution model and classifies constitutions based on it. Chapter nine is about designing constitutions and setting constitutional parameters. Chapter ten provides a mathematical model for the quality of constitutions, and chapter eleven presents a structural model for predicting the performance of constitutions, paving the way for experimental evaluations. Chapter twelve proposes a research method to conduct experiments and to improve and design constitutions. It includes an evaluation strategy, factor selection guidelines and a procedure for experimental design. Chapter thirteen explains the implementation of a generic constitution (meta-artifact) as a web application, demonstrated through some pilot experiments. Chapter fourteen presents the results of the experiments with group-level and subject-level analyses. Chapter fifteen interprets the results and explains their theoretical implications and practical applications. It also describes the limitations of the study and suggests avenues for future research. Chapter sixteen summarizes the contributions of this study.

2. Literature Review

This project models a constitution as a Distributed Collective Design Process. Constitutions are more distributed than decentralized, in the sense that Bonabeau (2009) differentiated between them.
A constitution distributes the sources of power, whereas in decentralization a central authority or principal delegates decisions to agents (Melkonyan, 2013). Hence, a constitution is closer to a multi-principal system than a multi-agent one. Here, a constitution governs collaborative design of an object or solution. It is more about collaboration than collection, based on how Malone et al. (2009) delineated them, but it also relates to what Leimeister (2010) described as collective decision making. In particular, it relies on a Group Decision gene that binds everyone in the group to the same decision (Malone, et al., 2010). Design can be regarded as a series of interrelated decisions.

Modelling a constitution as a design process establishes quantifiable performance measures. I outline quality, cost and time as such measures, consistent with engineering studies (Gardiner & Stewart, 2000) and some crowdsourcing studies (Chilton, et al., 2013). I define the performance of a constitution as the value or quality of the outcome or final edition of the solution. Essentially, the quality of the outcome reflects the quality of the process (constitution). Cost is the expected total cost of reaching the outcome. Evaluating a constitution is a multi-criteria problem, but one can fix the speed (time limit) or cost (budget) and make a trade-off between the other two. Unlike the impossibility theorems (Nisan, et al., 2007), these performance measures do not depend on individual preferences and fairness criteria.

This project usually uses the term design to mean the process (set of activities) of designing, consistent with Hevner et al. (2004). They asserted that design science is a problem-solving paradigm and that design is essentially a search process to discover an effective solution. The designer must specify the goals of a constitution, or define the characteristics of acceptable solutions and the purpose or objective of a constitution, in a specific situation. A solution can be a product, a policy, a financial portfolio, a trading strategy, a resource allocation scheme or a simple decision.

Following Hevner et al. (2004), this project contributes to the Design Science Research (DSR) foundations by introducing the concept of the e-constitution as a new construct, and by providing a structured representation for constitutions as a new model. An e-constitution is a technology-based (digital implementation), organization-based (structure) and people-based (consensus building) artifact (meta-artifact), thereby addressing the relevance aspect of DSR (Hevner, et al., 2004). The design evaluation method is experimental, such that one can evaluate the utility, quality and efficacy of a constitution via online experimentation. However, this project explores only a small subset of the search space of possible e-constitutions through a few experiments. As Prestopnik (2010) acknowledged, comprehensive coverage of all three aspects (theory, design and evaluation) of a design science research project can be too large for one project. He recommended addressing different aspects of such research in different papers. A research study does not need to include every issue of value; rather, the design and development of constructs, models and methods that address important social or organizational problems is a significant contribution by itself (Niederman & March, 2012).
This project belongs to the exaptation and improvement quadrants in the DSR knowledge framework (Gregor & Hevner, 2013), because it extends the smart contract paradigm to constitutions and governance structures, and also enables systemic design and enhancement of crowdsourcing protocols, blockchain protocols and DAOs, as they are forms of e-constitutions. It has already been established that crowdsourcing protocols are algorithms undertaking collective action (Yu & Nickerson, 2013).

Relating to Gregor (2006), the main contribution of this project is to establish a theory for design and action, or theory type V. Gregor identified five types of theory relevant to IS: (1) theory for analysing, (2) theory for explaining, (3) theory for predicting, (4) theory for explaining and predicting, and (5) theory for design and action. She discussed that the theory for design can be informed by all the other classes of theory, especially theory for explanation and prediction, as figure 2-1 illustrates.

Figure 2-1: Interrelationships among Theory Types (Gregor, 2006)

Gregor and Jones (2007) proposed a structure with eight components for an Information Systems Design Theory (ISDT): (1) purpose and scope, (2) constructs, (3) principles of form and function, (4) artifact mutability, (5) testable propositions, (6) justificatory knowledge, (7) principles of implementation and (8) an expository instantiation. They described each component briefly in a table like table 2-1 here. This project addresses all eight components in the proper chapters.

Table 2-1: Eight Components of an Information Systems Design Theory (Gregor & Jones, 2007)

Relating to Gregor and Jones (2007), the primary design goal of this research is to develop a model (as an abstract artifact) for constitutions (as product) and a method for designing them. Particularly, the scope and purpose of this model is automatic governance of design processes using machine code. This addresses the first component of the ISDT. This design goal and purpose also address the second activity (objective) of the Design Science Research Methodology (DSRM) proposed by Peffers et al. (2007). DSRM is a process model for conducting design science research. It has six activities or steps: (1) problem identification and motivation, (2) definition of objectives for a solution, (3) design and development, (4) demonstration, (5) evaluation and (6) communication. The next chapter identifies the problem of governing collective design in cyberspace. Chapters 4 to 11 develop and design a model for e-constitutions, and chapter 12 develops and designs a method to improve and design e-constitutions. Chapters 13 and 14 and appendices B to E demonstrate how e-constitutions operate and how the proposed method can improve e-constitutions efficiently. Chapter 14 presents an evaluation of the performance of e-constitutions and of the proposed method to design them. Chapter 15 communicates the findings and implications of this research.

Deng and Ji (2018) did a comprehensive literature review of design science papers and identified four aspects for Information Systems Design Science Research (ISDSR): (1) concept, (2) process, (3) outcome, and (4) evaluation. They then categorized design science papers according to their main topic in an integrated roadmap, as shown in figure 2-2.
According to their classification, this project falls under the process aspect, because it models a constitution as a design process and proposes a process or method to design constitutions efficiently. I also propose an evaluation strategy to evaluate the model and method through online experiments. The outcome of this project is a model (abstract artifact) and method for designing e-constitutions, leading to a nascent design theory for e-constitutions.

Figure 2-2: Integrated Roadmap for Information Design Science Research (Deng & Ji, 2018)

3. Problem and Motivation

3.1. Collective Design

This chapter explains the problem of governing collective design. Many open sourcing schemes let people use or modify the artifact/solution locally, resulting in a divergent tree of versions as in Figure 3-1.

Figure 3-1: Part of the Versioning (Forking) History of the Linux Operating System (from debian.org)

This method works best when the interdependencies are low and the costs of externalities are less than the costs of governance, so individuals can have independence and freedom. We might consider such cases as private actions. Essentially, we first need to establish the border between collective and private actions, and then we need to form the collective decision-making rules (Buchanan & Tullock, 1961). Table 3-1 maps different rules to different situations.

                                      | Autonomy (Multiple Outcomes, Local Authority) | Extractive Governance (Unity of Outcome, Central Authority) | Inclusive Governance (Unity of Outcome, Distributed Authority)
Low Externalities, Private Resources  | Liberty (Open Source)                         | Totalitarianism (Error Type I)                              | Tyranny of Majority (Error Type I)
Large Externalities, Shared Resources | Anarchy (Error Type II)                       | Autocracy (Error Type III)                                  | Democracy, Plutocracy, ...

Table 3-1: Situations versus Collective Action Schemes

On one hand, when the externalities are low and the cost of governance exceeds them, liberty and local decision-making are efficient; governance (totalitarianism or tyranny of the majority) would constrain individual freedom and impose one decision upon everyone even though they are not interdependent. On the other hand, when externalities are larger than the costs of governance (e.g., sharing a finite resource), we want to reach common decisions; otherwise there is contradiction or anarchy. Generally, externalities can make decisions interdependent, requiring a Group Decision gene that binds everyone in the group to the same decision (Malone, et al., 2010).

A small and homogeneous group of decision makers, like a family business, may reach unanimity and combine compatible features and possibilities, but in general, unanimity is rarely achievable. In crowdsourcing, individual choices may not converge to one group choice even after many iterations (Ba, et al., 2001a). Disagreement among the writers in a globally writeable wiki that allows changes coming from anyone results in chaos and edit wars (Valentine, et al., 2017). In order to reach a common group decision on each case, we need a set of rules, or a constitution, that aggregates individual inputs into a group output. Some define a constitution as a collection of common goals, norms, social relations and the responsibilities and rights of the participants (Kline, et al., 2017), but this dissertation focuses on the distribution of decision rights and power, determining the structure of governance.
The governance structure can be centralized (autocratic) or distributed (democratic, plutocratic, etc.). When externalities are large, autocracy is categorized as error type III because, while solving one problem, it creates another (moral hazard) due to the potential abuse of power. Moreover, centralized control can silence good ideas from the crowd (Valentine, et al., 2017). Acemoglu and Robinson (2012) labeled centralized governance structures as extractive and distributed ones as inclusive. A central authority may delegate some decisions so that the structure becomes hierarchical (decentralized), but it is still extractive because the source of power is centralized. It is worth noting that many hierarchical structures that appear inclusive are actually extractive. They depend on some kind of pyramid scheme. Pyramid marketing is a more obvious Ponzi scheme, in which the seniors exploit the newcomers, who hope they can exploit the future newcomers when they become seniors.

Malone and Smith (1988) asserted that centralization provides some economies of scale with respect to coordination costs. They analytically showed how high coordination costs make hierarchical structures more desirable. However, concentration of power results in a single point of failure for both ability and willingness, so that an intentional or unintentional error at the top propagates through the organization. In many cases, this offsets the benefits of the economies of scale that concentration provides (Chan, et al., 2016). Moreover, Chan et al. (2016) showed that centralized evaluation systems scale poorly for crowdsourcing. Particularly, using hierarchy to evaluate thousands of submissions is both costly and time-consuming and diminishes the benefits of using the crowd. On the other hand, when a central authority allocates resources, hierarchy can cope with the free riding problem (Ba, et al., 2001a). A central authority can play the role of a principal who, as Holmstrom (1982) described, can make group penalties credible and prevent free riding.

Some studies contrast anarchy to hierarchy (Fidler, 2008) or argue that a benevolent dictator is better than anarchy (Jain, 2010). Here I focus on a third option: distributed authority, where people can participate in making decisions without hierarchy or superiority. Edge and Remus (1984) conducted several experiments in simulated business settings and found that egalitarian groups demonstrate higher performance than hierarchical groups, and that participants in the egalitarian groups are more satisfied with their tasks. They found that groups with supervisors performed less resourcefully than groups without a supervisor. Meanwhile, a distributed system is reminiscent of the structure of neurons in the brain. After millions of years of evolution, the neurons have not developed any hierarchy among themselves and make decisions collectively without explicit superiority. Particularly, they always converge to one action towards the outside environment, despite possible hesitations or disagreements inside themselves.

3.2. Enforcement versus Implementation

As mentioned before, a constitution or governance structure consists of a set of rules. There are different mechanisms to make people follow those rules. Here, I classify those mechanisms into two categories:

1 - Enforcement imposes ex-post costs on violation through incentives and penalties.
It relies on economic mechanisms or behavioural theories to deter violation or incentivize adherence. From an accounting and auditing perspective, detective control mechanisms are in this category.

2 - Implementation imposes ex-ante costs on violation or makes it impossible (infinite cost) via the laws of nature and physics. It relies on physical mechanisms or mathematical principles to prevent violation. From an accounting and auditing perspective, preventative control mechanisms are in this category.

Some rules can be implemented and/or enforced, and some rules can only be implemented or only enforced. Usually we use "must" or "must not" to express the rules that are to be enforced, and we use "can" or "cannot" to express the implemented rules. For example, an enforced rule might state that "unauthorized people must not enter this area," enacting a fine of $200 on violators. Alternatively, one can implement the rule so that "unauthorized people cannot enter this area."

Social norms and criminal laws fall under the enforcement category. The incentives or penalties that enforce a rule can be extrinsic, intrinsic, financial, reputational, etc. Friedman (2000) explained how reputational enforcement can replace institutional enforcement in cyberspace. Friedman asserts that it is hard to enforce laws and contracts in the virtual world upon parties when we might not know where they live. In such situations, reputation becomes important and can convince parties to comply with the contractual terms in order to preserve and improve their reputations. However, this can only work for positive reputation, because negative reputation can be abandoned with a new digital identity. Generally, in cyberspace, there is no way to penalize other than by taking previously accumulated positive reputations or balances.

Digital rules and information security measures fall under the implementation category, because computers follow the laws of physics and mathematics (nature), and implementing rules in a machine applies those laws. A Digital Rights Management System (DRMS) is an example of digital rules. A DRMS is a computer program that restricts the usage and distribution of a digital product and implements the terms of transferring digital content (Radin, 2000). A DRMS can prevent the copying of content, erase it after an agreed-upon time, or associate it with a specific machine and make it unusable anywhere else. It may eventually replace intellectual property laws to govern the distribution of rights. This system is more trustworthy than humans are because it is incapable of deviating from the terms (Radin, 2004).

Enforcement can depend either on a central authority (an enforcement agency) or on inducing a SPNE (Subgame Perfect Nash Equilibrium) among the enforcers. Similarly, implementation can depend either on a central authority (implementer) or on inducing a SPNE among the implementers. Table 3-2 provides examples for the four possibilities.

               | Central Authority | SPNE
Enforcement    | Corporate Bylaws  | Constitutional Law
Implementation | ERP rules, DAO    | Blockchain, DAG

Table 3-2: Different Mechanisms to Carry out Rules

Contracts and corporate bylaws usually rely on courts and the judicial system as the central authority for enforcement. ERP rules, collective intelligence systems and crowdsourcing protocols normally rely on a server administrator or an IT provider as the central implementer.

Unlike usual contracts, a social contract cannot rely on pre-existing laws for enforcement, and thus should be self-enforcing (i.e.,
induce a SPNE), so that the participants can opt out, but the continuation of punishment (or lack of benefits) should be enough to deter that (Kim, 2016). Similarly, a constitutional law relies on its own provisions and institutions for enforcement. More precisely, a constitution should enforce itself by inducing a SPNE. Most participants follow the rules because they expect that there are enough people who follow and enforce the rules and who have enough power to penalize those who violate a rule. However, a Nash equilibrium, even under dominant strategies, is susceptible to collusion unless it is a strong equilibrium (Nisan, et al., 2007). While a collusion-proof constitution with a (100%) strong equilibrium is almost unattainable, a relatively strong equilibrium can safeguard against plausible collusions and result in a collusion-resistant constitution. However, even that is challenging at the beginning, when no precedent or norm has yet formed for interpretations.

A constitutional law should distribute resources strategically to incentivize enough participants to follow the rules. In this regard, budgeting and monetary policies play important roles. There is a reason that every country in the world has some institution to control the money supply. A currency gives power to the constitution. Conversely, a constitution gives value to its currency by making it scarce and transferable. Usually a constitution uses regulations and laws to make its fiat currency artificially scarce and valuable. Even when precious metals (e.g., gold) back a fiat currency, the currency still depends on the institutions and laws established by the constitution to honor that promise and deliver the equivalent precious commodity if demanded. Accordingly, the value of a currency reflects the effectiveness of its constitution in protecting property rights and the scarcity of the currency.

Parallel to constitutional laws, blockchains and comparable technologies like Directed Acyclic Graphs (DAG) bring implementers into an equilibrium (SPNE). In fact, blockchain protocols are e-constitutions that compete for governance in cyberspace. Instead of geographical boundaries, devices and passwords form the borders and jurisdictions in cyberspace, which has its own legal institutions (Johnson & Post, 1996). While cyberspace poses a threat to local institutions and constitutions, it paves the way for global institutions. A few years ago, Musiani (2013) predicted that the relationship between algorithms and rules addresses the problem of governance in the Internet; I envisage that it will.

This project focuses on the bottom-left quadrant of table 3-2 and proposes a design model for the family of e-constitutions that rely on a central authority for implementation, such as crowdsourcing protocols, DAOs and ERP-driven corporate bylaws. Meanwhile, while blockchain protocols need to induce a SPNE among the implementers (miners), DAOs do not need to induce a SPNE, because they are implemented on top of a blockchain like the Ethereum Virtual Machine (EVM), as if it were a central authority. This is analogous to laws and contracts that depend on constitutional institutions (central authority) for enforcement. However, digital rules have only one interpretation, and distributed computers can reach consensus on the outcomes and operate as one giant virtual machine, whereas rules in human language can yield multiple interpretations, making consensus unlikely.
4. Constitutions for Collective Design

This chapter presents three generic constitutions for governing collective design: iterative design, parallel design, and range voting. In all of these constitutions, the Initial Edition refers to the first design or edition of the solution at the start of the design process. The Updated Edition is the winner at the end of each selection period, before further modification. The modifications to the Initial Edition or an Updated Edition are referred to as Suggestions. Submitting a suggestion is the same as suggesting a modification. These constitutions are democratic, and selection is based on voting with equal weights for all voters.

4.1. Iterative Design

General Process: The process begins with an initial solution to the problem. Then the solution evolves through several editing rounds. Each round consists of a suggestion period followed by a selection period that results in an Updated Edition for the next round. This iterates until T_Z time passes. Anybody can participate in suggestion, voting, or both.

Suggestion Period: Each suggestion period is open until a participant submits a suggestion; then it ends and a selection period begins with two choices: accepting or rejecting the suggested modification.

Selection Period: Each selection period lasts T_V time, during which participants can vote for or against the suggestion, but the participant who submitted the suggestion cannot vote.

Winning Version: After each selection period, if the suggestion has the majority of votes (more than 50%), it becomes the Updated Edition for the next period and the name is announced. Other suggestions and all votes remain anonymous. Then the next suggestion period begins for further modifications, if the design process has not ended.

Generally, a constitution has four states or stages: Registration, Suggestion, Voting and Concluded. Figure 4-1 illustrates the flowchart of the state transition function for periods and stages. With some modifications, it can be extended to other constitutional design processes.

Figure 4-1: Flowchart of the State Transition Function for Periods and Stages in Iterative Design

This constitution is similar to Wikipedia, except that it gives the moderation power to an open crowd, so that participants cannot be trusted (moral hazard) or tested/vetted for expertise (adverse selection). In order to prevent double voting, we may need a physical registration and non-repudiation method. Any person can suggest a modification. Once someone submits a suggestion, a selection period begins and people can vote for or against the suggestion. A selection period lasts for T_V time, at the end of which, if the suggestion has the majority of votes, the suggested edition is approved and announced as the Updated Edition. This is a binary voting scheme and does not fall under Arrow's conditions or, more precisely, the Muller-Satterthwaite theorem conditions (Shoham & Leyton-Brown, 2010).

The state transits into Concluded when a termination condition is met. Here, the condition is time T_Z, but it could be based on budget, number of iterations, or lack of a successful modification for a period. It could also be limited to one iteration, after which the selected choice is final without further modification. Without a termination condition, the object or solution evolves indefinitely as it is utilized.

Votes are anonymous and confidential and are cast with full privacy. Technically, distributed systems can implement such voting using group signatures (Szabo, 1997). Accordingly, the group encompasses all voters, and digital signatures authenticate memberships (i.e., eligibility to vote). Moreover, votes for the choices are hidden until the end of each selection period, because information cascading can influence the outcome
Moreover, votes for the choices are hidden until the end of each selection period , because i nformation cascading can influence Figure 4 - 1: Flowchart of the State Transition Function for Periods and Stages in Iterative Design 20 and make it unreliable (Johnson, 2007) . R bias due to information cascades (Easley & Kleinberg, 2010) . Conversely, lack of interacti on , isolated learning and diversity improve the accuracy of collective prediction or decision (Hong, et al., 2012) . T he efficiency of collective decision - making depends on the number of individuals whose consent is required to approve a decision (Buchanan & Tullock, 1961) . Buchanan and Tullock explain that smaller number s make external costs large r, because a few individuals can impose costs on others. If this number becomes larger the external costs decrease, but the decision - making costs increase because it is harder to reach an agreement among more people. A dd ing the two costs, the social interdependence cost first decreases and then increases as the criterion becomes more inclusive and thus more restrictive. Therefore, the optimal number of individuals required to consent is in the middle , but it can be different for different categories of actions, and in many cases , the majority rule (50%) may not minimize the social interdependence costs (Buchanan & Tullock, 1961) . However, the majority rule can be justified if the criterion is the proportion of people who find the solution desirable or acceptable. The majority rule approves only changes that increase the acceptability of the solution assuming that participants have stable and transitive preferences. A more conservative condition (e. g. super majority), reduces the probability of accepting inferior suggestions (less error type I), but inevitably increases the probability of rejecting superior suggestions (more error type II). This leads to losing opportunities and keeping inferior edit ions. Nevertheless, when there exists an objective precise measure to evaluate and compare versions (e.g. ground truth) , we expect high agreement among the voters and thus conservatism should not hinder progress and may improve security. 21 4 .2 . Parallel Design When suggestions come frequently , h aving a selection period for each suggestion results in the need to reject every poor suggestion and thus a slow and tedious process. Moreover, exposure to one or few idea s may prime or induce similar and uniform ideas while exposure to large number of ideas can stimulate the generation of better ideas (Paulus, et al., 2013) . Rather than allowing only one suggestion per period we can allow multiple suggestions each round . That will likely speed up the process and can improve novelty because it results in the generation of more suggestions independently (Little, et al., 2010) . To this end, the winning criterion can be based on the plurality rule instead of majority and the clause for Suggestion Period in the constitution changes to the following : Each suggestion period ends after T P time if AT LEAST one suggestion is submitted. Then the selection period begins with a minimum of two choices including the submitted suggestion ( s ) Updated T P time , the program waits until one suggestion is submitted and then the selection period begins immedi ately. D uring each suggestion period , a participant can submit only one suggestion if s/he want s to . Limiting the number of solutions that an agent can submit is common in crowdsourcing contests (Archak & Sundararajan, 2009) . 
With a large group (crowd), the parallel suggestion process may result in too many suggestions per period, which may cause cognitive overload and ignored choices in the selection period (Paulus, et al., 2013). To control that, we can shorten the length of the suggestion period (T_P) and also limit the suggestions to a maximum of M. Counting the Updated Edition, there can be at most M+1 possible versions. Therefore, the first part of the above clause changes to: "Each suggestion period ends after M suggestions are submitted OR after T_P time if AT LEAST one suggestion is submitted. Then, the selection period begins with a minimum of two and a maximum of (M+1) choices, including the Updated Edition." Figure 4-2 shows a process with parallel suggestions. Appendix I presents an example of a constitution with parallel suggestions.

Figure 4-2: Design Process with Multiple Parallel Suggestions per Period

The number of suggestions per period is analogous to what Little et al. (2010) described as the number of parallel ideas. They classified crowdsourcing schemes into parallel and iterative processes and tested them via online experiments on Amazon Mechanical Turk (MTurk). They showed that iterative processes work better on average, but parallel processes can result in higher best-quality ideas due to their larger variance, despite their lower average. Here, a constitution conceals parallel suggestions in each suggestion period. In iterative processes, showing the previous works of others negatively affects creativity and lowers diversity (Little, et al., 2010). Generally, simultaneous exploration of multiple options results in more innovation than exploring one alternative at a time (Malone, et al., 2017). A hybrid system of both parallel and iterative processes could be more effective; thus, the average quality (iterative responses) and variance (parallel responses) should be in balance (Little, et al., 2010). In other words, there should be a balance between creative freedom and structure (Chilton, et al., 2016). The constitutional model can cover a wide range of iterative and parallel processes, and many hybrids in between, via adjusting parameters like M and T_P. When M=1 or T_P is very small, the constitution becomes an iterative process. When M and T_P are large, it becomes a parallel process or even a greenfield process, in which participants generate ideas from scratch (Yu & Nickerson, 2013). A greenfield process can result in many creative and diverse ideas, but it highly depends on expertise (Ren, et al., 2014).

The selection process can be modelled as a social choice function that determines the winning choice and repeats every round. With more than two choices, it falls under the Muller-Satterthwaite conditions (Shoham & Leyton-Brown, 2010). Precisely, the plurality rule violates the Independence of Irrelevant Alternatives. Consequently, two superior similar versions may share votes (steal from each other) so that an inferior version wins over them. Approval voting copes with this problem by letting participants select multiple choices in each period. It does not fall under the Muller-Satterthwaite conditions, because each participant classifies all choices into two sets of approved and disapproved versions.

4.3. Range Voting

In range voting, selectors rate each choice using a range of scores. The scores may range from one to seven, as in a Likert scale. The range could be only zero and one, as in approval voting. Generally, range voting is a cardinal valuation and not a preference ordering; thus it does not fall under the impossibility theorems (Shoham & Leyton-Brown, 2010).
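As a concrete illustration, the sketch below selects a winner under range voting, aggregating each version's scores with the median (one of the aggregation options discussed next). Restricting the allowed scores to {0, 1} turns the same code into approval voting. The function names are hypothetical and are not taken from the implemented e-constitution.

    from statistics import median
    from typing import Dict, List

    def range_voting_winner(ballots: List[Dict[str, float]], choices: List[str]) -> str:
        """Return the choice with the highest median score across all ballots.

        Each ballot maps every choice to a numeric score (e.g., 1-7 on a Likert
        scale, or 0/1 for approval voting as a special case).
        """
        def agg(choice: str) -> float:
            scores = [b[choice] for b in ballots if choice in b]
            return median(scores) if scores else float("-inf")

        # Ties are resolved in favour of the earlier-listed choice (e.g., the incumbent).
        return max(choices, key=lambda c: (agg(c), -choices.index(c)))

    # Three raters score an incumbent and two suggestions on a 1-7 scale.
    ballots = [
        {"v0": 4, "v1": 6, "v2": 5},
        {"v0": 3, "v1": 7, "v2": 5},
        {"v0": 5, "v1": 2, "v2": 6},
    ]
    print(range_voting_winner(ballots, ["v0", "v1", "v2"]))  # v1 (median score 6)

Because the aggregation here compares scores rather than preference orders, a single voter giving one version an extreme score cannot flip the result the way an outlier can under averaging, which is the robustness property the next paragraph attributes to the median.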
Generally, range voting is a cardinal valuation rather than a preference ordering, so it does not fall under the impossibility theorems (Shoham & Leyton-Brown, 2010). The aggregation of the scores can be based on averaging, summation, the median, or other formulations such as the mid-mean (Heer & Bostock, 2010). Aggregate measures such as peer averages enhance efficiency by conveying information about common uncertainties (Holmstrom, 1982). For extensive score ranges, the median is better than the average, as it filters out outliers, and the median of individual guesstimates quickly converges to the actual value as the number of people grows (Lorge, et al., 1958). Nonetheless, rating takes longer and is harder than voting; therefore, people usually prefer voting and do more voting than rating (Bao, et al., 2011). Moreover, Bao, et al. found that rating results do not have much resolution at the extremes. Hence, when only the best solutions matter, voting is more efficient and more effective than rating.

The score range can be open and infinite, in which case the score of the initial edition sets the standard and reference point for other versions. Each period takes the score of its updated edition from the previous period, and participants rate the other versions (suggestions) relative to that updated edition. Figure 4-3 depicts an example of range voting assuming an open valuation starting from a value of 100 for the initial design. The score of each other version is set to the median of its individual scores, so that at the end of each rating period, the version with the highest median score wins and becomes the updated edition for the next period. Each horizontal dotted line represents the updated edition in a period. Each solid line represents a suggestion and its score improvement over the updated edition. The colored solid lines are the winning suggestions. The small red circles above and below each winning version are individual scores for that version; the median of those scores is the score of the winning version. This example covers five periods with three suggestions per period.

Figure 4-3: Range Voting with an Open Range of Scores Starting from 100

5. Incentives

A constitution should induce desirable activities and deter undesirable ones. The desirable activities can include more and better suggestions, more and better selection inputs, delegation to better selectors, payment of fees and investment. The undesirable activities can include low-quality suggestions (spam), haphazard selection inputs (noise) and malicious selection inputs (manipulation). These undesirable actions can be deterred through ex-ante costs (e.g., fees and constraints) and ex-post costs (e.g., penalties). Each desirable activity requires both willingness and ability to participate in that activity. Suggestion demands the ability to create valuable ideas. Selection requires the ability to evaluate quality. Paying fees and investing requires financial ability. A constitution can increase willingness by motivating participants, and enough motivation can also induce people to acquire higher skills (Cerasoli, et al., 2014). Generally, there are intrinsic and extrinsic motivations. Extrinsic motivation includes money, recognition, and stakes in the outcome, which translate into money for financial stakes. Malone et al. (2017) used a points system as a recognition method to motivate people.
They explained that an incentive system should motivate people towards more valuable activities, and should not motivate them to game (manipulate) the system or waste time. In addition, the incentive system should be fair and easy to understand. If recognition is used to motivate individuals, participants need to reveal their identities (or pseudo-identities), so recognition cannot motivate anonymous participation. Consequently, most of the anonymous contributions in Wikipedia are due to intrinsic motivation. Intrinsic motivation can include having fun, improving skills, love of community, etc. (Ren, et al., 2017). For example, in citizen science, gamification of scientific discoveries motivates contributions to science (Prestopnik & Crowston, 2012). Citizen science crowdsources scientific contributions to inexpert enthusiasts in the public. Mole Game, Eyewire, and Foldit are other examples of gamification (Kornberger, 2016). However, intrinsic motivations mostly depend on problems and settings that constitutions cannot control. Hence, constitutional design is mainly based on extrinsic motivations.

Wightman (2010) classified crowdsourcing websites into four classes along two dimensions: motivation, which can be direct or indirect, and competition, which can be competitive or non-competitive. He referred to the process and the set of rules as a heuristic, which is actually a special kind of constitution. The constitutional incentives also have two dimensions, but different ones: reward for winning suggestions and reward for accurate selections. I first discuss the reward for winning suggestions. Many crowdsourcing websites such as Innocentive, Taskcn, TopCoder and Threadless provide monetary incentives to motivate good contributions. Adding the following clause to the constitution implements a simple reward: After each selection period, if the winning choice is a suggestion (not the Updated Edition), its proposer receives a reward of amount R_P.

Horton and Chilton (2010) presented a model as a basis for a price theory of crowdsourcing. They conducted experiments on MTurk and found that workers behave rationally and work less for less payment, yet are insensitive to an increase in task difficulty. Mason and Watts (2009) also conducted online experiments on MTurk and found that a higher payment increased the quantity of work but not the quality or accuracy. Similarly, Heer and Bostock (2010) found that higher rewards slightly decreased accuracy while increasing the rate of task completion. In short, higher rewards result in more and faster, but not necessarily better, work. Mason and Watts attributed this to anchoring (Ariely, et al., 2003): the offered payment anchors how workers perceive the value of their work. They suggested that intrinsic motivation is a better driver for improving the quality of work, but when such motivation is not viable, it is best to offer as little reward as possible to a large crowd who can provide enough quantity. Moreover, they found that paying workers a low amount changed how they perceived the value of the job and yielded higher performance than not paying them at all. Contrariwise, other studies (Heyman & Ariely, 2004) found that paying nothing results in higher performance than paying a low wage. Mason and Watts (2009) reconciled these findings: when workers expect payment, as on MTurk, a low payment is better than no payment, but when there is no such expectation, intrinsic motivations dominate. A contributor's expectations also depend on whether a non-profit or a for-profit organization sponsored the project (Hoffman, 2009).
Monetary incentives can have no effect, or even an adverse effect, on performance when they crowd out intrinsic motivation (Gneezy & Rustichini, 2000). Liu et al. (2014) conducted randomized experiments on Taskcn and found that higher rewards result in higher-quality submissions, more participation and higher-quality users. Taskcn is a crowdsourcing website based on all-pay auctions, which have only one winner although many users may expend effort and submit solutions. The users also gain reputation and credit for submitting good solutions and winning rewards. In addition, Wu et al. (2015) conducted several experiments on MTurk and found that with higher payments, the workers (i.e., Turkers) generate higher-quality designs. They found that even an untrained and unskilled crowd could generate high-quality designs and assess designs effectively.

With rewards, each suggestion period is like an all-pay auction or a tournament (Lazear & Rosen, 1981) with one prize and perhaps status-seeking subjects. Competitions and tournaments use information efficiently and improve risk sharing by relying on relative performance evaluation (Holmstrom, 1982). Moreover, experiments showed that tournament incentives increase performance (Hossain, et al., 2014; Delfgauw, et al., 2013). This performance increase is larger for more able participants, who are more likely to win (Freshtman & Gneezy, 2011; De Paola, et al., 2012; Bandiera, et al., 2013). In a constitution, only the highest performances matter, not the average. Dechenaux et al. (2015) reviewed numerous experimental studies on Tullock contests, all-pay auctions and rank-order tournaments, and presented a general unified contest model. They described the performance of contestants as depending on effort, ability and luck. Contests differ from reverse auctions in that contests declare the winner(s) after the delivery of goods or services (Archak & Sundararajan, 2009). Archak and Sundararajan developed a game-theoretic model to analyse the properties of crowdsourcing contests when the number of participants is large. They showed that when agents are sufficiently risk-averse, offering multiple prizes is more efficient than one grand prize, even if only one best solution is desired, whereas for risk-neutral agents it is optimal to reward only the best submission. Orrison et al. (2004) have two findings that I use in the analytical and design model of constitutions: in tournaments, one large prize is more effective than many small prizes, and the number of players does not affect the average effort level if the distribution of noise (i.e., unknown ability) is uniform.

Having stakes in the outcome may incentivize better participation, including more accurate selections. People are more likely to vote thoughtfully and truthfully when they share a stake in the outcome. However, when the selectors do not have a stake in the outcome, we need a criterion to measure the accuracy of selection inputs. The only endogenous criterion is the selection outcome itself; if there were any better criterion, we would use it instead of human selection inputs. Therefore, the selectors define the quality or correctness of versions (i.e., ontological relativism). The accuracy of individual evaluations is measured by their alignment with the aggregated selection outcome. Shaw et al. (2011) experimented with online crowdsourcing on MTurk and tested different incentive schemes to motivate workers to give an accurate qualitative assessment of content.
They achieved the highest performances with financial incentives tied to the majority responses. Their explanation is that anticipating the majority response entails additional reflection and higher cognitive demand, leading to more engagement with the question. An alternative explanation is that the workers perceived the open crowd as more trustworthy and less corruptible than a central authority. If selection is based on plurality voting, the criterion for the quality of a choice can be the number of votes it received, and the best voters are those who aligned with the majority. I assume that the voters cannot, or do not want to, coordinate among themselves and collude on voting for an inferior version. Therefore, the choice they think will win is the one they think should win. As a result, in (Nash) equilibrium everyone votes for the version they predict others will vote for, so this scheme can be labelled Prediction Voting. Sakamoto and Bao (2011) compared prediction voting, Likert-scale rating and other evaluation methods and observed more participation in prediction voting. Moreover, prediction voting is more efficient because the evaluators focus on the best solutions instead of all of them, and in constitutions only the best solutions matter. To reward prediction voting, one can add the following clause to a constitution: After each selection period, those who voted for the winning choice receive a reward of R_V.

Having stakes in the outcome can also incentivize better suggestions. A hybrid method is to give shares as the reward for winning suggestions, so that proposers share the value they add. This dilutes the total shares but increases the total value. In particular, the reward shares can be proportional to the contributions that the proposers made to the solution. To this end, we need an evaluation method that indicates how much improvement the winning suggestion made, so this approach works best with rating scores. The constitution issues extra shares for the proposers of successful modifications, proportional to the increase in the score. This provides stronger incentives for superior suggestions because the winners will own shares of the solution proportional to their contributions. Meanwhile, higher rating scores result in more new shares and more dilution of existing shares. Therefore, shareholders have a systematic bias towards underrating suggestions. To control for this conflict of interest, we may exclude the shareholders from the rating process, making it the opposite of plutocracy. As with voting, the criterion for the quality of individual ratings can be the group result. Shaw et al. (2011) examined several incentive schemes and found that rewarding inexpert raters for giving scores close to the majority scores results in the highest collective rating performance. When selection is based on the median of the rating scores, the best raters are the ones whose scores are closest to the median scores. The following clause rewards the scores that are closest to the median: After each selection period, for each choice, the rater who gave a rating score closest to the median of the scores receives an R_V reward. If there is a tie for a choice, the reward is divided equally among the raters who were closest to the median for that choice. In a tie, the reward is divided amongst the raters to deter collusion. Theoretically, a participant can rate multiple versions and receive multiple rewards in a period, but this makes the scheme prone to haphazard ratings. A sketch of both reward rules appears below.
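The following is a minimal Python sketch of the two reward clauses above (the prediction-voting reward and the closest-to-median reward). The dictionary data shapes, the balances store and the parameter r_v standing for R_V are assumptions for exposition, not the dissertation's implementation.

from statistics import median

def reward_winning_voters(votes, winner, r_v, balances):
    # After a selection period, every participant who voted for the winning
    # choice receives a reward of R_V (prediction-voting incentive).
    for voter, choice in votes.items():
        if choice == winner:
            balances[voter] = balances.get(voter, 0) + r_v

def reward_median_raters(ratings, r_v, balances):
    # For each choice, the rater(s) whose score is closest to the median of
    # that choice's scores share a reward of R_V; ties split it equally.
    for choice, scores_by_rater in ratings.items():
        med = median(scores_by_rater.values())
        closest = min(abs(s - med) for s in scores_by_rater.values())
        winners = [r for r, s in scores_by_rater.items() if abs(s - med) == closest]
        for rater in winners:
            balances[rater] = balances.get(rater, 0) + r_v / len(winners)

Splitting the reward equally among tied raters mirrors the tie rule in the clause and keeps collusion on a shared score unprofitable.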
To deter such rating, one can make rating each choice time - consuming or costly as described in chapter 11 . With monetary rewards, the selection process turns into an algori thmic mechanism . A mechanism is an implementation of a social choice function with asset transfers (payments and rewards) amongst the members (Jackson, 2003) . Simply put, mechanism theory is concerned with manipulating rules of the game (i.e. selection process) and payments to agents to direct their decisions so as to realize specific desired 30 outcomes. In an incentive - compatible mechanism , the agents do not benefit from gaming the system, so truthful selection is the dominant st rategy for every agent (Nisan, et al., 2007) . Horton and Chilton (2010) asserted that game theory and mechanism design are not very useful in crowdsourcing, because crowdsourcing is not about workers revealing private information but rather exerting effort and performing tasks. They explained that w hen output is observable and highly correlates with effort, there is not a moral hazard problem. However , even though output is observa ble, its evaluation can be costly and evaluation by crowd has complexities that only mechanism theory can address. Meanwhile, while humans are not perfectly rational, automated computerized agents can closely resemble a homo - economicus (Norta, 2017) . 31 6 . Structured Representation This chapter develops a high - level design model for e - constitution s, defining their domain space. It provides the blueprint and principles of form and function for constitutions and the processes for implementing them, addressing the third and seventh components of ISDT (Gregor & Jones, 2007) . Pederson et al. (2013) did a review on crowdsourcing literature and provided a conceptual model with six components for crowdsourcing: problem, process, people, technology, outcome and governance. They asserted that govern ance is the key success factor in crowdsourcing, but minimal research has been done on it. Tuan et al. (2017) highlighted the high demand for, yet the lack of, a holistic model for crowdsourcing processes. They suggested Business Process Crowdsourcing as one such model with three stages, but it is more of a plan rather than a governance structure. Wu et al. (2015) proposed another plan as a methodology for crowdsourced design. It has four st ages: specification, validation, execution and evaluation. Similarly, Ren (2011) decomposed a web - based crowdsourcing project into four stages: identify the crowd, request ideas from the crowd, evaluate the ideas and retain the crowd. Later , Ren et al. (2017) used this model to compare two cases and concluded that organizers should actively motivate crowd based on a top - down model rather than hoping the crowd commit to the campaign as in bottom - up models. As such, the model presented in this research for constitutions is a top - down holistic one with a limited number of parameters and functions that one may call the genome of e - constitution s. Malone et al. (2009; 2010) classified the building blocks (genes) of collective intelligence into four genes t ask (What? create or decide), staffing (Who? crowd or hierarchy), incentives (Why? [extrinsic or intrinsic]) and structure (How? collection or collaboration). The create and decide tasks are akin to what Leimeister (2010) described as generating new solutions and evaluating them. Yu and Nickerson (2011) also classified crowd activiti es into creation and decision. 
They followed the principles of genetic algorithms, in which a solution evolves through random variations (mutations) and combinations of existing solutions. Their experiment showed that a system with combination induces more creative (practical and original) ideas than a system without it (the control). They recommended this approach for macro institutional innovation. The Human-Based Genetic Algorithm (HBGA) is a system that outsources the innovation (create) and selection (decide) operations of a genetic algorithm (GA) to human agents, while computers perform the organizational functions and control the flow of the process (Kosorukoff, 2000). Some studies refer to HBGA as an interactive genetic algorithm (Bao, et al., 2011). Kosorukoff (2001) stated that HBGA is a multi-agent system that combines the intellectual power of human agents with the coordination power of computers. He described that some agents are convergent thinkers who tend to participate in the selection process while others are divergent thinkers who are more creative and propose solutions; HBGA takes advantage of both. It is robust because it does not depend on individual agents performing particular functions. The contributions of participants can be regarded as deliberate and directed modifications instead of random mutations, so HBGA is closer to Lamarckian evolution than Darwinian evolution. Yu and Nickerson (2013) described HBGA as making a class of unexplored organizational structures possible. Kosorukoff and Goldberg (2002) asserted that HBGA is a kind of organization that is more reliable and effective than conventional organizational forms, one that organizes workers like a system with humans as its parts. They explained that evolutionary human computation accommodates and utilizes human creativity in a constructive manner.

The constitution model developed in this research expands HBGA and human computation systems to cover a wide range of protocols and governance structures. The Suggestion Interface addresses the idea generation or creation task, and the Selection Process includes the evaluation or decision task. This model also incorporates other forms of participation and human input such as delegation (proxy voting), betting, investing and spending. Moreover, the model includes other structural components such as Filtration, Sorting and Weighting, as Figure 6-1 illustrates. Green arrows denote transfers of assets; they can have an incentive or deterrence effect on the participants and change their balances and the system states (e.g., the treasury balance). Bold black arrows indicate the flow of versions (objects) and thin black arrows carry numerical data. Yellow boxes are stock variables. Red ovals are functions and conditions. Blue clouds represent external information entering the process. Blue arrows denote the flow of decisions and information to and from the process. Parallelograms represent the interfaces for participants. An interface mediates the flow of information and structures the interactions between internal and external entities (Kornberger, 2016).

The suggestion interface is essential for collecting ideas and alternative solutions. Leimeister et al. (2009) emphasized the importance of managing the idea generation process. The suggestion interface controls how people propose different versions of the solution and determines the sources of innovation, creation and generation of ideas, as well as the boundaries of the organization: it can include one person, a group of authorized experts, employees or a crowd on the Internet.
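The division of labour in HBGA can be sketched as a short coordination loop in which humans supply the variation and selection operators while the program controls the flow. The callables and the fixed round count below are illustrative assumptions.

def hbga_round_loop(initial_edition, ask_humans_for_suggestions,
                    ask_humans_to_select, rounds):
    # Human-Based Genetic Algorithm sketch: the computer coordinates the rounds,
    # while humans perform the innovation (create) and selection (decide) steps.
    best = initial_edition
    for _ in range(rounds):
        candidates = ask_humans_for_suggestions(best)   # human variation
        candidates.append(best)                         # keep the updated edition
        best = ask_humans_to_select(candidates)         # human selection
    return best

A constitution generalizes this loop by adding the filtration, sorting, weighting and payment steps described in this chapter.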
This boundary of inclusion is a trade-off between diversity and in-depth expertise (Leimeister, 2010), but open involvement regardless of expertise can bring about more novel ideas (Davis, 2015). Moreover, electronic participation can result in more creativity and idea generation than direct meetings and brainstorming (Leimeister, et al., 2009; Paulus, et al., 2013).

Figure 6-1: Data Flow Diagram of the Generic Design Model for e-Constitutions

Some studies have shown that direct interaction can undermine creativity and dissuade the contribution of novel ideas (Mullen, et al., 1991; Lorge, et al., 1958; Gallupe, et al., 1992). Confidential and independent contributions work better than open discussion forums, which deter uncommon suggestions (Kahneman, 2011). In fact, it is better to have independent participation (Yu & Nickerson, 2011) and a lack of communication among participants (Bao, et al., 2011). However, the suggestion interface can share specific information (e.g., the number of submissions), and mainly it mediates collaboration and coordination by showing previous designs, as in HBGA crowdsourcing design processes (Yu & Nickerson, 2011). The suggestion interface can also impose formatting on submissions.

The suggestion interface provides the participants with a set of information from internal and external sources and enables them to submit their suggestions. One piece of external information is the specification of the problem to be solved. It determines the nature of the solutions and the objective of the process and, relating to Ren, et al. (2014), it is the first force that affects the type and quality of the generated ideas. Another piece of external information is the initial edition(s) in the first period, which correspond to the seeds in Yu and Nickerson's (2013) program. The set of initial editions can be large or small or just one edition (e.g., the status quo). It can even be blank, so that the crowd generates the first set of versions from scratch, as in the greenfield idea generation system that Yu and Nickerson define. The suggestion interface incorporates internally generated information as well. It can present the best version(s) from the previous round for participants to build upon. A more complicated interface may algorithmically combine the best versions for participants or enable the participants to combine them. Kosorukoff (2001) focused on solving problems by combining existing solutions. Yu and Sakamoto (2011) described a sequential combination process in which one crowd generated initial designs and another combined them. They found that combination improved both originality and practicality across generations (i.e., rounds). In addition, Yu (2011) and Nickerson et al. (2011) used combination to aggregate the ideas of multiple participants and found similar results regarding the originality and practicality of designs through generations. Likewise, Yu and Nickerson (2013) found that a sequential combination system results in significantly more creative designs than a greenfield idea generation system after three generations. On the other hand, Ren et al. (2014) showed that modification results in better outcomes than both combination and greenfield systems in all dimensions (divergence, relevance and effectiveness). In a constitution, the suggestion interface is essentially a modification platform, which can also support combination. A constitution needs a selection process to decide on one alternative from a set of alternatives and eliminate the others. Wu et al.
(2015) stated that the most important part of every crowdsourcing system is effective selection of design choices. The selection process specifies the sources of selection and distributes power and decision rights among the participants . The sources may consist of one person (autocracy), a group (oligarchy), crowd (democracy, meritocracy, plutocracy) or a computer program (cryptocracy? !). A computer program can perform selection if the quality of solutions is computationally assessable . To incorporate human judge ment, the selection process has an Evaluation Interface , which provides participants with relevant information and collects their assessments in proper format. A form of selection inputs is voting, which is a unity vector with 1 for the preferred choice and 0 for other choices. Approval voting is a vector of 1 0 The weighting function can be a constant and weight all evaluations equally. Alternatively, it can depend on some meas spending (betting amounts), external information (random number, attributes) or a combination of the se factors . The weighting function can be nonlinear and multivariate as will be discussed in chapter eight . Each period, each participant can submit one selection input with one weight and cannot distribute the weight across multiple selection inputs . V oters have strong incentives to use all their voting power (weight) onl y on their most preferred option (Scott & Antonsson, 1999) . The selection process includes an Aggregation Function that combines individual selection inputs and gives a vector of Aggregated Scores for all versions in a period. Depending on the format of the selection inputs, the aggregated scores can be the number of votes, the median of the ratings, market prices, etc. Based on the aggregated scores, the Selection Criterion determines the winning choice(s) at the end of each round. It can be maximum (e.g. number of votes), minimum (e.g. evaluated costs) or meeting a condition 36 (e.g. more than 30% votes). The selection criterion also includes the condition to finalize the selection period and start t he next round. Th is criteri on can be based on time or selection inputs , so that the selection period ends when enough votes are cast, and the selection is statistically conclusive (Ertekin, et al., 2013) . Filtration and sorting can facilitate better evaluation and improve the accuracy of selection. They can be functions of costs, previous performance and time. The suggestion s stock variable is an object vector that accumulates valid suggestions. The Release Criterion incorporates the condition to end the suggestion period and release the accumulated suggestions for sorting and evaluation. It can be based on time or the number of suggestions or both. A constitution needs a source to supply the money for the rewards and expenses. I f an external authority supplies the money and controls the source, the constitution is incomplete because it does not include the source of money supply. If a constitution is financially self - sufficient and (weakly) balances the budget, I call it autonomo us because it does not depend on an external authority. A constitution may issue money (currency) through the payment function as blockchains do. Other possible financial sources include membership fees (taxation), participation fees , betting , advertisement auction and investment by participants or crowdfunding when participants can buy shares. The shareholders may share the ownership of the solution. 
This not only supplies money but also can provide incentives for better participation. To this end, the solution should have some value. Hong and Page (2001) interpret the value of a solution as the equilibrium price of the outcome in a market. Generally, stakes in the outcome and the reward function can incentivize valuable participation, whereas the costs and fees can deter haphazard participation.

7. Formalization

Generally, a constitution is an automaton with a function that determines the permissible decisions for each person based on the state of the person and the system. For example, a person can only spend less than or equal to his/her balance, or can submit a rating score only if s/he has paid the selection fee and has more than 10 unit shares. I formally define a constitution as a 14-tuple (N, Ω_p, Ω_v, I_p, I_v, G_p, G_v, Φ, Ψ, Σ, Σ_o, τ, W, F) that includes seven sets, two conditions (binary functions) and five vector functions. They cover all elements and units in constitutions, and their variations generate different instantiations of constitutions (i.e., mutability), addressing the second and fourth components of ISDT (Gregor & Jones, 2007). The components of a constitution are defined as follows:

N: The set of possible participants or agents. I define n_v(t) as the number of agents who participated in evaluation in period t, n_p(t) as the number of agents who proposed a suggestion in period t, and n(t) as the total number of agents who participated in some way in period t.

Ω_p: The set of information provided in the suggestion interface. It includes external information such as the specification of the problem and the initial editions (seeds), which can be blank. It also includes the specification of the internal information - such as the winning version - to be presented to the proposing agents.

Ω_v: The set of information provided in the evaluation interface. It can include the submitted suggestions, the updated edition(s), the weights and sometimes the aggregated scores. In some cases (e.g., reputation), it might include the identities of the proposers of the suggestions.

I_p: The set of possible individual suggestion inputs. It is the domain of the versions of the solution. It can impose formatting and restrictions on the suggestions.

I_v: The set of possible individual selection inputs. It determines the format for collecting individual evaluations. It can be votes, approval votes, scores, price biddings, etc.

G_p: A condition for releasing submitted suggestions and ending a suggestion period. It can include a maximum number of suggestions (M) and/or a time limit (T_P) in a logical statement.

G_v: A function that determines the outcome of the selection process based on the scores and time. It incorporates the condition for ending the selection period and releasing the winning version(s), and yields null when the condition is not met. The condition can be a simple time limit (T_V) or depend on the latest aggregated scores, such as the number of votes.

Φ: A condition for filtering out suggestions. It can depend on the characteristics of the suggestion or its proposer. It can impose a submission fee for suggestions by excluding members who did not pay the fee. It can enforce the banning or suspension of members based on their past performance.

Ψ: A vector function sorting the suggestions. It can depend on submission time (chronological order), a property of the suggestions, or the amount paid by the proposers for advertisement, which amounts to an auction for ranking places.
Σ: The set of states for the system and the participants. The states of the system include the period number and the updated edition. The states of the participants include their balances, shares, merit scores, net proxy votes and their allowable activities.

Σ_o: The initial values of the states for the system and the participants.

τ: The state transition function for the states of the system and the participants. It is a vector function of their current states, the actions of the participants (suggestion, evaluation, delegation and investment) in each round, and the outcomes of each round. It can depend on whose suggestion and selection won. This function also incorporates the termination condition; the constitution terminates when meeting that condition, which can depend on time, the number of periods, idleness, lack of progress, etc. Since the balances of the participants are states, the state transition function can reward the proposers of the winning suggestions, the voters for the winning suggestions, or the raters whose scores were closest to the median. If possible, it can also assign negative rewards to penalize members or impose taxes.

W: The weighting function. Its output is an n-dimensional vector of weights for the n participants each period. It can be a constant (democracy) or depend on some n-dimensional vectors about the participants such as their attributes (oligarchy), proxy votes (liquid democracy), spending amounts (proof of work), past performances (meritocracy) and shares (plutocracy). The next chapter provides more detail.

F: A vector function that aggregates scores based on the individual selection inputs and the weights. This function determines the estimated quality of the submitted suggestions. It can be the sum of the number of votes for each choice, the median of the rating scores for each choice, etc.

8. Weighting

The weighting function distributes power among the members by determining how much influence each person has on the selection outcome. Direct democracy weights all votes equally and applies no information other than membership (i.e., citizenship). However, not all votes are created equal. Participants may have different levels of incentive and skill vis-à-vis the problem, and therefore their evaluations may have different values. Hence, unequal weighting schemes may result in better outcomes. The weighting scheme can be ones and zeros (binary) based on a criterion, which can depend on belonging to a specific class or group, as with an oligarchy or elected members. Generally, there can be multiple classes with different voting weights. In autocracy, the dictator is the only member of a class with a non-zero voting weight. A large negative weight can give a member veto power to reject any choice. In proxy voting, or liquid democracy, participants can delegate their voting rights to others (target voters), thereby transferring their voting weights. When participants do not have the expertise or time to evaluate choices, they may decide to delegate their votes to those whose judgement they trust. Proxy voting is mainly based on reputation and trust, and thus the target voters cannot be anonymous. They do not need to reveal their real identities but should have persistent identities. However, in many situations, to reduce the possibility of collusion or vote selling, target voters should not know who delegated votes to them. Proxy voting can also produce binary weights such that only the members with more than a specific number of proxy votes have a positive voting weight.
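A few of the weighting schemes named above can be sketched as follows. The dictionary representation of shares and delegations is an assumption for illustration, and the liquid-democracy resolver assumes an acyclic delegation graph.

def democratic_weights(participants):
    # Direct democracy: every participant's selection input counts equally.
    return {p: 1.0 for p in participants}

def plutocratic_weights(shares):
    # Linear plutocracy: weight is proportional to the shares held.
    total = sum(shares.values())
    return {p: s / total for p, s in shares.items()}

def liquid_democracy_weights(base_weights, delegations):
    # Liquid democracy: a participant who delegates transfers his/her weight to
    # the target voter; chains are followed to the final, non-delegating voter.
    def final_target(p):
        while p in delegations:
            p = delegations[p]
        return p
    effective = {}
    for p, w in base_weights.items():
        target = final_target(p)
        effective[target] = effective.get(target, 0.0) + w
    return effective

# Example: C ends up casting A's and B's votes on top of his/her own base vote.
print(liquid_democracy_weights({"A": 1, "B": 1, "C": 1}, {"A": "B", "B": "C"}))
# {'C': 3.0}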
In representative democracy, only a limited number of target voters have non-zero weights, and the weights can change only at specific times. A binary weighting function can distribute ones and zeros based on a random number, in order to select a random sample of voters each period. To enforce a selection fee, binary weights can depend on whether a participant has paid the fee; as explained before, this deters haphazard selection inputs. Betting contests weight votes proportional to the amount participants spend (bet), so that whoever spends more receives larger voting power. The betting amounts can reflect the participants' degree of confidence in their choices. Proof of work is a binary weighting function that randomly selects one participant based on the betting amounts and a random number each period, so that the participants who spend more are more likely to hold the non-zero weight in a period. Technically, this function partitions the line between zero and one into n intervals (probabilities), where n is the number of participants, and the lengths of the intervals are proportional to the participants' spending. A uniform random number falls into one interval each period and determines whose selection matters. The Bitcoin blockchain distributes the intervals according to a convex function of spending rather than a linear one, because of the economies of scale in mining.

Meritocratic weighting schemes depend on the merits of the participants: the constitution can decide how much to trust each participant and weight his/her votes accordingly (Ertekin, et al., 2013). However, this requires a criterion for expertise. Hill and Ready-Campbell (2011) suggested using past performance to detect and rank experts in the crowd. Another dimension of merit, besides expertise, is credibility or trustworthiness. Davis and Lin (2011) used a credibility measure for each participant that was deduced from his/her past inputs. They acknowledged the need to develop better aggregation techniques to handle disagreements. The challenge is to find a criterion that determines expertise or credibility based on endogenous information, rather than relying on an external authority to label participants. A constitution can accumulate such information in a state vector of merit scores. The performance measurement function, which is part of the state transition function, can adjust the merit scores periodically based on the outcomes of the selection process. The performance measures can relate to the selection activities and their alignment with the majority, as in Ertekin et al. (2013). Another approach is to use the success rate of participants' suggestions. The suggestions can reflect their understanding of the problem and thus the value of their evaluations. One such measure is the number of votes each suggestion receives, based on which the proposers can obtain extra votes (merit scores) to cast in upcoming periods. The following clause reflects this approach: At the end of each selection period, the proposers of all suggestions (not the "Updated Edition") receive extra votes equal to the number of votes given to their suggestions in that period. An alternative approach is to give extra votes only to the winner, but the above rule is more robust in close contests. One can make it even more robust by using relative votes instead of absolute votes - relative to the updated edition or to the least-voted choice, as in the following version: At the end of each selection period, the proposers of all suggestions (not the "Updated Edition") receive extra votes equal to the number of votes given to their suggestions in that period minus the least number of votes given to any choice in that period.
Hence, the proposer of the least-voted choice does not receive any extra votes. To keep power diversified and prevent the dominance of a few participants, the weighting function can limit the extra voting power usable in each period and spread it over multiple periods. Otherwise, some proposers could use their extra votes to make their own suggestions win again and thereby win extra votes again. To obtain reliable signals and discover actual innovative potential, the scheme should also control for popularity effects (Chan, et al., 2016). The following clause controls the usage of extra votes: However, only W extra votes can be used in each selection period, so the maximum vote weight is (W+1), because every participant has one base vote in each selection period. After voting, the number of extra votes decreases by the number of used votes, and the remaining extra votes are carried forward into the next rounds.

The meritocratic weighting function can be linear, nonlinear or even binary, in which case it gives voting rights only to qualified participants. The extra votes resemble proxy voting: the winning proposers can use their extra votes to select better suggestions, which in turn gives extra votes to the proposers of those suggestions. After several iterations, voting power asymptotically accumulates with the proposers of the best suggestions. Therefore, meritocracy and proxy voting are related to Ranking Systems with Cumulative Voting in mechanism theory. In the ranking systems setting, approval voting is the only ranking rule that satisfies ranked independence of alternatives, positive response, and anonymity (Shoham & Leyton-Brown, 2010). Hence, approval voting can be particularly beneficial in meritocracy and proxy voting. Meritocracy and proxy voting work like a cumulative positive reputation system and make the selection results less sensitive to registration and double voting. Liu et al. (2014) assumed that each user ID belongs to a unique user because the Taskcn reputation system incentivizes users to use only one identity for all tasks. A reputation system disincentivizes users from creating new accounts and starting over as newcomers, especially when the expected future profits gained through reputation exceed the profit from cheating that would ruin it (Szabo, 1997). Nevertheless, if reputation is accumulated based on subjective human judgment, registration is required to reliably aggregate user judgments. Registration is not needed if reputation is based on a verifiably objective measure, as in the special case of the New York diamond industry described by Friedman (2000).

While meritocracy aims to improve the ability to find the best version, plutocracy can improve the willingness to select the best version. In plutocracy, the weighting function depends on the number of shares that the selectors hold. Investment by participants is not only a financial source but also a source of incentive for better selection, because participants share ownership of the outcome or have a stake in it. Plutocracy weights their decisions based on their shares. Usually, the weights are proportional to the shares, as in public corporations and proof of stake in most blockchains. The weighting can also be binary and give positive weight only to the participants who have enough stake; in the Dash blockchain, a master-node with at least 1000 Dash can vote. An advantage of such linear plutocracy is that it is not sensitive to the non-repudiation method, whereas democracy is. In linear plutocracy, the amounts invested determine the voting power, and it does not matter if someone controls multiple identities.
Linear plutocracy thus obviates the need for non-repudiation, and it has an investment stage/state instead of a registration stage/state. Conversely, democracy requires registration and proof of individuality to prevent double voting and Sybil attacks. In a Sybil attack, individuals create multiple fake identities to influence the outcome. In democracy, one's voting power increases with the number of identities or accounts one controls. As a result, democracy is not very effective in cyberspace, with its anonymous or pseudonymous digital identities, and in permission-less blockchains. Some techniques try to prevent double registration by using IP addresses or persistent cookies, but one can easily circumvent them through client-side manipulations. Other techniques make the registration process costly or time-consuming; this turns the scheme into a type of plutocracy, wherein registration is the investment and the number of identities is the share. Essentially, we need to link digital identities to physical entities to detect whether different digital signatures belong to the same body. MTurk connects workers' accounts to their bank accounts to ensure each person is associated with one ID (Mason & Watts, 2009).

Conversely, democracy is more resistant to a 51% attack, or collusion attack, than plutocracy. Such an attack in democracy requires a majority of members to collude or vote maliciously, hence the term tyranny of the majority. In plutocracy, a few shareholders can own 51% of the shares, control 100% of the resources, and exploit the other 49% of the investments. This concentration of power undermines the impartiality and perceived impartiality of the selection process and disincentivizes effective participation by minority shareholders. In practice, participation is not perfect, and a small group of major shareholders with as little as 30% of the shares might be able to influence all decisions most of the time. That may explain why some top executives receive astronomical salaries at the expense of minority stockholders. Boards of directors often claim that such salaries are necessary to hire high-quality managers and are worth the benefits. However, it is usually hard to ascribe such salaries to improvements in firm performance. Most successful managers do not perform better than average ones in the long term, and the success of firms is mostly due to luck and other factors rather than the effectiveness of the top executives (Kahneman, 2011).

One may suggest requiring more than 50% of the share-votes in the selection criterion. However, as explained earlier, more restrictive criteria increase decision-making costs and the risk of rejecting superior suggestions. Here, I instead suggest a concave weighting function to control power. One such function is the square root: the weighting function weights the votes proportional to the square root of the shareholders' shares and then linearly normalizes them to add up to 100%, as table 8-1 illustrates. This makes a 51% attack very difficult. To gain 50% of the voting power, an attacker would have to hold about n/(n+1) of the shares, where n is the number of other investors with approximately equal shares. Therefore, an investor needs to own more than 95% of the shares to gain such power if there are only 19 other investors. However, when the weighting function becomes concave, it becomes sensitive to the registration and non-repudiation method. To make a 51% attack (by one person) impossible, we can use the r-th root of the shares, S_i^{1/r} (where S_i is agent i's proportion of the shares), as the weighting function and normalize it by adjusting the degree of the root (r) so that the total voting power sums to 100%, as table 8-1 shows.
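A minimal sketch of the concave (r-th-root) weighting and of the resulting attack threshold follows; the function names and example numbers are illustrative assumptions.

def concave_weights(shares, r=2.0):
    # Weight each shareholder by the r-th root of his/her share of the stock,
    # then normalize so the voting power sums to 100% (r=2 is the square root;
    # r=1 reduces to linear plutocracy).
    raw = {p: s ** (1.0 / r) for p, s in shares.items()}
    total = sum(raw.values())
    return {p: w / total for p, w in raw.items()}

def attacker_share_for_majority(n_other_investors):
    # Under square-root weighting, an attacker facing n other investors with
    # approximately equal shares needs about n/(n+1) of all shares to reach
    # 50% of the voting power.
    return n_other_investors / (n_other_investors + 1)

print(attacker_share_for_majority(19))   # 0.95: 95% of the shares when there are 19 others

With r = 1 the scheme reduces to linear plutocracy; larger r flattens the distribution of power and raises the share an attacker needs.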
Another way to prevent such an attack is to use a hybrid of democracy and linear plutocracy, obtained by averaging the percentages that result from the two functions. Notably, hybrid and concave plutocracy schemes need both registration and investment stages. More egalitarian and inclusive constitutions distribute power more concavely (figure 8-1). Contrariwise, when the distribution of power is more convex, the governance becomes more centralized and extractive. However, democracy is not necessarily always the best: first, it is sensitive to the non-repudiation method and susceptible to the Sybil attack; second, it provides limited and low incentives for thoughtful participation, which makes it susceptible to the tyranny of the majority. Generally, the constitutions in the middle balance the powers of the majority (low incentives) and the minority (high incentives), and thus their selection process is less biased and more reliable.

Table 8-1: Distribution of Voting Rights using Linear and Concave Weighting Functions
Figure 8-1: Distribution of Power with Respect to the Share of the Strategic Resource
Figure 8-2: Performance of Constitutions with respect to Distribution of Power

Figure 8-2 illustrates the relative effectiveness of different degrees of inclusivity, ordered from anarchy, autocracy, oligarchy, convex plutocracy, linear plutocracy, concave plutocracy, hybrid schemes and meritocracy to democracy. Anarchy includes nobody in decisions and oligarchy extends decision rights to only a few. Convex plutocracy is vulnerable to a collusion attack, which can concentrate power and turn it into an oligarchy. As convexity decreases, the selection process becomes more inclusive and egalitarian and thus more impartial and trustworthy. When the function becomes concave, it becomes sensitive to the registration and non-repudiation method, and the sensitivity increases with concavity, because multiple identities become more useful. Perhaps linear plutocracy is popular because it is the least collusion-vulnerable scheme that is not vulnerable to the Sybil attack at all. To sum up, figure 8-3 provides a taxonomy of governance structures with respect to the distribution of power. As it shows, the market equilibrium price is the only collective decision that is distributed and insensitive to registration while also being strategy-proof (with rational agents), in the sense that it is resistant to collusion attacks and the tyranny of the majority. Appendix J presents a constitution that makes selections based on market equilibrium prices.

Governance structures (figure 8-3):
- Centralized: autocracy, oligarchy, decentralized hierarchy
- Distributed, sensitive to registration: democracy, meritocracy, concave plutocracy, hybrid plutocracy
- Distributed, insensitive to registration, prone to collusion: linear plutocracy, convex plutocracy
- Distributed, insensitive to registration, collusion resistant: market price

Figure 8-3: Taxonomy of Governance Structures based on the Design Model for Constitutions

9. Constitutional Design

Previous chapters focused on the rules for governing the collective design of a solution to a problem. A constitution is itself a solution to a problem (how to govern collective design). Therefore, we need some rules to design, modify and amend constitutions. Institutional economists refer to such rules as a second-level constitution (Buchanan & Tullock, 1961). The amendment clause in a constitution is a second-level constitution whose problem is to improve the primary constitution, and whose initial edition is the existing primary constitution.
Blockchain communities often refer to such amendment rules as the constitution or governance structure of a blockchain. Bitcoin lacks an amendment protocol and is hard to upgrade; thus, the developers cannot change its protocol without a fork in the Bitcoin chain. In the crowdsourcing context, Chilton et al. (2016) suggested having the crowd discover the micro-tasks in design workflows. Nickerson et al. (2011) considered having the crowd modify the crowdsourcing workflow processes, which they called human-based genetic programming. Interestingly, if we regard a constitution as the DNA of an organization, designing it is analogous to genetic engineering.

Generally, to instantiate and execute a constitution for a class of problems, several parameters need to be determined. They include the amount of the rewards for winning suggestions, the length of each selection period, the form of the weighting function, the specification of the problem, the initial edition, the population of participants, and the other components of the design model (the 14-tuple). Constitution designers may exogenously (i.e., autocratically) determine those parameters based on theoretical or experimental analysis. The results may or may not be generalizable to other situations or problems. Moreover, constitution designers may set suboptimal values if they have a conflict of interest with stakeholders. In general, any exogenous value is a potential source of moral hazard. Alternatively, participants may decide upon some parameter values endogenously through rules or protocols in an initialization stage after the registration or investment stage. These rules form a second-level constitution that specifies parts of the primary constitution. For example, the investors can determine a reward amount by proposing different amounts; the median of the proposed values (weighted by shares) then becomes the reward level. This approach is endogenous but is still sensitive to the incentives of the investors whose inputs specify the value. A better approach might be to use competition and equilibrium points that require fewer parameters. For example, a selection fee has one parameter, whereas a betting contest makes selection costly through competition. Similarly, a suggestion fee has one parameter, whereas an advertisement auction has none.

The initial edition of the solution is also a decision variable (included in Ω_p). It can be decided exogenously or in the initialization stage. It can be the existing solution if one already exists, or it may be set to blank so that suggestions start from nil and the outcome of the first round becomes the initial edition. In blockchains, usually the first block (the genesis block) sets the initial balances to zero. Another parameter is the problem definition and solution specification (included in Ω_p and I_p), which determine the nature of the solution and the objective of the constitution. Different individuals may represent a problem differently according to their own perspectives (Hong & Page, 2001). Problem specification is a decision that participants can make collectively in the initialization stage. Some problems are decomposable into multiple sub-problems and segments, so that different parts of the solution can evolve independently and then be combined into a complete solution. This is beneficial when people with different skills can work on different parts and the integration cost is low (Kornberger, 2016). Malone et al.
(2017) described combining many partial solutions into one big solution, but this has two challenges: identifying a good partitioning, and constraint management, which means that subcomponents from different sources should be mutually compatible. Malone et al. explained that this can largely be automated using algorithmic and mathematical rules. Representative democracy is different from proxy voting in that there can be specific positions with specific voting powers to be filled for specific periods. The electoral process for filling the representative positions is actually a second-level constitution that determines how candidates win positions and what decision rights and voting power each position has. The outcome of this constitution is the set of winning candidates who take the positions and constitute the population of participants (N) in the primary constitution.

10. Analytical Model

Here I assume that we have a unidimensional measure for the quality of the solution. The final quality depends on the improvement in each period and the number of periods. The quality added per period depends on the number of suggestions in each period (m), the average quality (improvement) of the suggestions in each period (µ), the diversity of the suggestions in each period (σ²) and the likelihood of selecting the highest quality in each period (p). The quality of a suggestion is its contribution to improving the quality of the solution. Since the suggestions are purposeful, we expect µ > 0, whereas in Darwinian evolution generally µ < 0, because mutations more often result in defects than in improvements.

Assume m suggestions are submitted in a period and their qualities are independent random variables $q_1, q_2, \dots, q_m$ drawn from the probability density function $f(q)$ with cumulative distribution function $F(q)$. Then the cumulative distribution function of the maximum quality $q_{\max} = \max\{q_1, q_2, \dots, q_m\}$ is:

$F_{\max}(q) = P(q_{\max} \le q) = P(q_1 \le q, \dots, q_m \le q) = P(q_1 \le q) \cdots P(q_m \le q) = F(q)^m$

Therefore, the probability density function of the maximum quality amongst m qualities is:

$f_{\max}(q) = m \, F(q)^{m-1} f(q)$

Hence, the expected level of the maximum quality amongst m (non-negative) qualities is:

$E[q_{\max}] = \int q \, m \, F(q)^{m-1} f(q) \, dq = \int_0^{\infty} \left(1 - F(q)^m\right) dq$

Sakamoto and Bao (2011), in their results section, provided figures for the distributions of ideas generated by a crowd. They showed that the distributions of the qualities - in terms of both practicality and originality - resemble the normal distribution, especially in their upper tails. Thus, assuming a normal distribution N(µ, σ²) for the quality of the suggestions, the above equation becomes:

$E[q_{\max}] = \mu + \sigma \, g(m)$

wherein $\phi$ and $\Phi$ denote the standard normal density and distribution functions, and

$g(m) = \int_{-\infty}^{\infty} x \, m \, \Phi(x)^{m-1} \phi(x) \, dx$

is the expected maximum of m standard normal variables. This function is concave, as figure 10-1 illustrates. Also, I define µ' as the expected quality of the selected suggestion if it is not the best version. Therefore, each round results in this amount of improvement:

$\Delta q = p \left(\mu + \sigma \, g(m)\right) + (1 - p)\,\mu'$

Figure 10-1: Numerical Approximation of Function g(m) for m = 1 to 100 using MATLAB®

Approximating µ' with µ simplifies it to the following:

$\Delta q = \mu + p \, \sigma \, g(m)$    (10-2)

Now, considering that this improvement repeats for z iterations, the objective becomes maximizing:

$q_f = h(z) \left(\mu + p \, \sigma \, g(m)\right)$    (10-3)

wherein $q_f$ is the final quality compared to the initial quality, and $h(z)$ is a concave function of the number of iterations reflecting the saturation of quality after the design evolves to higher qualities ($h(0)=0$, $h(1)=1$). A good example would be $h(z) = z^d$, where $d$ ($0 \le d \le 1$) is the degree of robustness to saturation. So, d = 1 means that the quality of the design can improve indefinitely, and d = 0 means it can improve only once. Therefore, equation 10-3 becomes:

$q_f = z^d \left(\mu + p \, \sigma \, g(m)\right)$    (10-4)
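A numerical sketch of g(m) and of the per-round improvement in equation 10-2 follows. The Monte Carlo estimator stands in for the MATLAB numerical approximation behind figure 10-1, and the sample parameter values are arbitrary assumptions.

import random

def g(m, trials=100_000, rng=random.Random(0)):
    # Monte Carlo estimate of E[max of m standard normal variables].
    return sum(max(rng.gauss(0, 1) for _ in range(m)) for _ in range(trials)) / trials

def improvement_per_round(mu, sigma, p, m):
    # Expected quality added per round (equation 10-2): mu + p * sigma * g(m).
    return mu + p * sigma * g(m)

# g grows without bound but concavely: roughly 0.0, 0.56, 0.85, 1.03 for m = 1..4.
print(round(improvement_per_round(mu=0.5, sigma=1.0, p=0.8, m=4), 2))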
11. Propositions and Relationships

In equation 10-4, the number of iterations (z) is important when d is large and the quality of the solution does not saturate quickly. Shortening the suggestion or selection periods, or increasing the total time (T_Z), can increase z. If the solution is simple and quickly approaches perfection, there is no reason to have many iterations. Relating to the 14-tuple, the state transition function (τ) determines the termination condition and thus the number of rounds. Figure 11-1 illustrates the other parameters and their relationships with the quality (improvement) in one period. The orange boxes are direct antecedents of quality according to equation 10-2, with multiplication presented as moderation. They are the main mediators or moderators for quality; hereafter, I refer to them as mediators. The green boxes are controllable variables, and the blue boxes are secondary mediators for the primary mediators (mediated mediation). The black arrows represent positive associations and the red arrows represent negative ones. They provide testable propositions that are supported by economic theories as the justificatory knowledge, thereby addressing the fifth and sixth components of an ISDT (Gregor & Jones, 2007). They establish a proto-theory (Niederman & March, 2012) that can be tested.

Ren et al. (2014) proposed a brief model composed of three forces that affect the quality of ideas: the domain, the actors and the process. The second force includes the motivations and skills of the actors. Higher incentives (willingness) and expertise (ability) can increase the average quality (µ). As equation 10-2 shows, the average quality of suggestions (µ) matters more when m, σ or p are small. Filtration of low-quality suggestions and submission costs can also increase the average quality, but they obviously decrease the number (m) and diversity (σ) of the suggestions. Per equation 10-2, when the suggestion variance (not the mean) is large, the number of suggestions (m) and the accuracy of selection (p) become more important (Mulgan, 2006). Longer suggestion periods (T_P) and a larger maximum level (M) can increase m, but they can decrease the number of iterations (z) under a fixed time limit. Higher incentives and a larger population can increase participation and m. Relating to the 14-tuple, the state transition function (τ) includes the conditions to end the suggestion and selection periods, and thus it determines the lengths of the suggestion and selection periods. Meanwhile, I_v, F and G_v in figure 11-1 are the components of the selection process in the 14-tuple. Remarkably, in equation 10-2, the function g(m) does not have an upper bound; thus, with a large number of suggestions (e.g., a crowd), diversity is more important than the average quality. Diversity of the proposing population can increase the diversity of suggestions. That may partially explain why diverse teams are more effective (Hansen, 2009). In fact, any good model for crowdsourcing contests should take into account the expertise of the participants (Archak & Sundararajan, 2009). In the 14-tuple, the first component (N) determines the population of participants and their heterogeneity, size, expertise, etc. In addition, the filtration condition (Φ) determines which and whose suggestions can go through. Chan et al. (2016) linked divergence to novelty and convergence to value. They explained that one way to increase the number and diversity of ideas is to recruit many and more diverse participants.
On the other hand, selection and voting should ensure convergence, to result in more valuable and feasible ideas. Anonymity of the proposers can also increase diversity, but the winning proposers may prefer to receive recognition for their contributions. In the 14-tuple, v determines what information about the proposers is revealed to the selectors.

Figure 11-1: Antecedents of Constitution Performance (Quality) and their Relationships

The third force (process) that Ren et al. (2014) identified includes idea selection and evaluation. The value of p reflects the accuracy of the selection process and is associated with its unbiasedness, impartiality and perceived impartiality. Therefore, it affects the participants' expectations of winning and thus the average quality (µ) and quantity (m) of their contributions. In fact, concerns about the accuracy of the output could make participants suspicious about manipulation and undermine their participation (Bonabeau, 2009). Malone et al. (2017) stressed the importance of using a systematic way to measure the quality of proposals (suggestions) in crowdsourcing. Conversely, Clarkson and Alstyne (2007) showed that the selection process does not need to be perfect or optimal to produce good outcomes. In the 14-tuple, the set I_V and the functions F and W determine the selection process and the social choice function that aggregates individual preferences.

The number of suggestions can affect the accuracy of selection. Ren et al. (2017) argued that if a crowd is not motivated enough, they submit too many mediocre ideas and make evaluations costlier. Therefore, high incentives can improve the quality of submissions and thereby reduce the effort in the selection stage. However, their argument may not be complete. While higher incentives may (or may not) increase the average quality of suggestions (µ), they certainly increase the number of suggestions (m), which requires more evaluation effort in the selection stage. When evaluation is costly or time consuming, a large number of suggestions (m) can have an adverse effect on the quality of selection (p).

More selection inputs can improve the accuracy of the selection results (p) directly, and indirectly by making collusion harder. Longer selection periods (T_V) and higher incentives for selection can attract more participation and thus more selection inputs. T_V is determined by G_V as described in chapter seven. A larger population of selectors can also increase selection inputs. Involving a more diverse population can decrease bias and the possibility of collusion, while increasing selection variance. An advantage of distributed systems like blockchains is the large and heterogeneous population of selectors, who make the selection process more reliable and secure. Some blockchains like Bitcoin also use a random sample of selectors to make collusion harder. Relying on a random sample of selectors in each iteration can deter collusion without collecting too many selection inputs. Random-sample voting makes it harder to manipulate the selection process, because no one knows who is going to vote in the next period (Chaum, 2015). Moreover, a small random sample of voters is more motivated, because each vote carries more weight and is more meaningful. Chaum suggested that when there is not enough voting turnout or not enough agreement among the random voters, more (random) voters can be invited to vote, until the outcome becomes statistically conclusive.
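A minimal sketch of this stopping rule in C#: ballots arrive from voters sampled in random order, and the sample keeps growing until the lead of the top choice over the runner-up is larger than roughly two standard errors. The batch size, the 1.96 threshold and the rough standard-error formula for the lead are illustrative assumptions; this fragment is not part of the e-constitution implementation described later.

using System;
using System.Linq;
using System.Collections.Generic;

static class RandomSampleVoting
{
    // Invite voters in batches (the ballots are assumed to be in random order)
    // and stop once the margin between the top two choices is statistically conclusive.
    // Assumes numChoices >= 2.
    public static int Decide(IReadOnlyList<int> ballotsInRandomOrder, int numChoices, int batchSize = 10)
    {
        var tally = new int[numChoices];
        int used = 0;
        while (used < ballotsInRandomOrder.Count)
        {
            int stop = Math.Min(used + batchSize, ballotsInRandomOrder.Count);
            for (; used < stop; used++)
                tally[ballotsInRandomOrder[used]]++;            // count one more randomly sampled voter

            var top = tally.OrderByDescending(t => t).Take(2).ToArray();
            double n = used;
            double margin = (top[0] - top[1]) / n;              // lead of the best choice over the runner-up
            double se = Math.Sqrt((top[0] + top[1]) / (n * n)); // rough standard error of that lead
            if (margin > 1.96 * se) break;                      // conclusive enough: stop inviting voters
        }
        return Array.IndexOf(tally, tally.Max());
    }
}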
The main objective of random sampling of voters is to prevent collusion. To this end, we may use demographic information, location, and IP addresses to make the randomization more purposive toward maximizing heterogeneity. Ertekin et al. (2013) explained that when a problem has a correct answer that the majority of the crowd can detect, the majority vote is a good criterion. They took the majority opinion as the ultimate criterion (true label) and tested two algorithms to approximate the majority opinion using the votes of a representative subset of the crowd instead of everybody. Their algorithms balance the cost of collecting votes against the accuracy gained from identifying high-quality voters (labellers). In each round, the algorithm (i.e., constitution) tries a random subset of the crowd to find the best voters. Then it gives more weight to the votes of those who aligned with the majority.

However, when there is no ground truth and the choices are harder to evaluate, we need to rely on subjective judgements of the relative quality of different choices. Then, compliance with the majority is not necessarily a good criterion, especially when the evaluation requires some expertise which may be lacking in most of the crowd. In this type of setting there is still a best choice, but only a smaller proportion of the crowd (experts) can detect that choice, thereby disqualifying the majority criterion in favour of the experts. A constitution can give heavier weights to the votes of experts or stakeholders who may not align with the majority. Chapter 8 explains different weighting techniques to improve the accuracy of selection. In the 14-tuple, the function W determines the weights of the selection inputs and the pool of selectors (positive weight). It can also pick a random sample of selectors by giving zero or positive weight to participants based on a random number.

Generally, when the problems and solutions are more complex and less precise, accurate evaluation requires more expertise (ability) and incentive (willingness). The level of disagreement among the voters increases when the evaluation of choices becomes more difficult (Gillick & Liu, 2010). In such cases, the difference between the evaluation score (e.g., number of votes) of the winning choice and the other choices is small and the variance of the distribution of scores is high. In particular, the selection variance depends on the problem and its solutions and on the expertise of the evaluators (Ren, et al., 2017). For example, 99% of physicists may agree on the answer to an equation, but the general population may largely disagree. If the evaluation variance is small enough, we do not need weighting or many selection inputs for accurate selection, but large variances compel more attention to improving the selection accuracy.

Haphazard and unthoughtful selection activities (voting) add noise and increase the selection variance, thereby reducing the accuracy of selection. Incentives do not deter such activities but can increase them. Excluding outlier selectors can reduce noise. Ertekin et al. (2013) excluded the voters who did not align with the majority. However, restricting a minority in favour of the majority can diminish diversity and lead to a more homogeneous voter population. This can then increase bias or the possibility of collusion. Another way to detect and exclude haphazard voters is to use computer-generated inapt suggestions (ploys). An ex-ante deterrence for haphazard selection activities is to make them costly by imposing a time cost or a fee for selection. The selection cost should be smaller than its reward, so as not to deter valid voting.
An endogenous selection cost is betting: weighting votes by the amounts that selectors stake on them, which reflect their confidence in their choices. However, as chapter eight describes, this opens the door to manipulation and 51% attacks, especially when they can yield large enough profits. The problem is to obtain thoughtful but unbiased evaluations. Generally, a selection cost can filter out noise and haphazard inputs, but it cannot deter malicious activities and collusion attacks. Excessive costs can deter honest participation for small rewards, but not malicious participation, which aims for high profits. Higher fees increase the seriousness of the participants but may not change what they are serious about. In the 14-tuple, the weighting function W can depend on the payment of a selection fee, thereby imposing the fee.

Another way to reduce the selection variance is to facilitate better selection. Making the problem and the evaluation criterion more precise can reduce the variance and improve accuracy. When the quality is mathematically precise, computers can evaluate the choices and reach minimum variance (consensus). This is how blockchain protocols can differentiate between valid and invalid blocks and result in consensus. However, proof of work also imposes a cost (bidding) to deter haphazard inputs. Relating to the 14-tuple, the sets P and I_P specify the problem and the format of its solutions respectively.

Another facilitation method is (ex-post) filtration of spam and irrelevant suggestions. When there are many suggestions, filtering out low-quality ones can help selectors focus on evaluating amenable choices. Spam or irrelevant suggestions can slow down the process and waste evaluation resources. However, such filtration needs a precise formula to classify suggestions based on meeting some minimum criteria. The challenge is to minimize type two error without getting into type one error and undermining diversity. In the 14-tuple, the filtration condition does the filtration. It can depend on some properties of the suggestions, some properties of the proposers of the suggestions, or the payments made by proposers for their suggestions. A classification formula may specify a population such as experts, shareholders or elected participants to act as moderators and perform the (ex-post) filtration, resulting in a multi-stage selection. The classification formula could also limit proposers to experts, shareholders or representatives in the first place.

An ex-ante filtration method is to exclude or suspend the lower-performance proposers for a period or indefinitely. Fullerton and McAfee (1999) suggest that a contest should only include the two most skilled participants. However, that likely reduces diversity. A more moderate approach is filtering out only the worst-performing participants. The proposer of the least-voted suggestion in a period could be considered low performance. The following clause is an example: "After each selection period, the proposer of the least-voted suggestion is suspended for T_E time, unless the least-voted choice is the Updated Edition." A suspended proposer cannot propose a suggestion but can still vote during selection periods and receive rewards for voting for the winning choice.

Another ex-ante filtration method to deter spam suggestions is to make suggesting costly, so that only serious proposers, who have enough confidence in their ideas, submit suggestions. This cost can be imposed via a submission fee. Taylor (1995) found that free entry is not optimal in contests.
Assessing suggestions is costly, and the proposers know best whether their suggestions are worth that cost. The cost of a suggestion should be less than its expected benefits; otherwise, it can deter risky and novel suggestions. Generally, filtration can deter fresh viewpoints that differ from the popular belief (Chan, et al., 2016).

Sorting is another facilitation method that can get the spam suggestions out of the way without eliminating any suggestion. Computers can sort choices based on precise measures like time of submission (i.e., chronologically). The sorting criterion can also be based on properties of the proposers, like their number of shares, reputation and past performance. Those who have a stake in the outcome are more likely to propose valuable suggestions. Proposers' previous voting or selection activities can reflect their awareness of and attention to the problem, and their previous suggestions can reflect their expertise and ability. In the 14-tuple, a vector-valued sorting function orders the suggestions.

Here I suggest advertisement auctions as a new sorting technique. In each suggestion period, there is an auction for ranking places, so that the suggestions of the proposers who pay more appear higher on the list. The proposers know the most about the values of their suggestions, and this method elicits that information. Like betting, it imposes an endogenously costly competition. The following clause implements an advertisement auction: "Proposers can pay to have their suggestions shown in higher places. The suggestions are sorted based on the amounts paid, except the Updated Edition." One may also combine multiple layers of sorting criteria, linearly or hierarchically, or let each selector customize them. Sorting can be dynamic and depend on the selection inputs, but this can bias the later inputs due to information cascades. Notably, the order of presentation of the ideas can introduce position bias, but using a randomized order for each selector can reduce the overall bias (Malhotra, 1982).

12. Research Method

This chapter outlines a research method to design and improve e-constitutions based on the proposed design model. I refer to this method as a meta-design method, because its goal is to design a design process (i.e., an e-constitution). Table 12-1 illustrates the three levels of artifacts involved in this research. Levels 1 and 2 resemble levels 1 and 2 described by Purao (2002). This chapter is about level 2 and proposes a systematic methodology to design constitutions.

Table 12-1: Three Levels of Design Artifacts and Methods
Level 2: Research Method / Meta-Design Method / 2nd-Level Constitution
Level 1: E-Constitution / Design Process (Meta-Artifact) / 1st-Level Constitution
Level 0: Solution / Design Outcome / Policy or Decision

12.1. Experimentation Procedure

Modeling a constitution as a process allows the use of response surface methodology (RSM) as the meta-design method to improve constitutions. To estimate the effect of multiple factors on the performance of a process, RSM provides many designs such as factorial designs, fractional factorial designs, central composite designs, Box-Behnken designs and Plackett-Burman designs (Khuri & Mukhopadhyay, 2010). These designs try to spread the experiment points over the design space (factorial combinations) to obtain the most information with the fewest experiments without resulting in multicollinearity. The choice of design depends on the number of factors, the levels for each factor and the number of experimental runs (sample size).
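To make these two-level treatment sets concrete, the following C# sketch (not part of the experimentation website) enumerates a 2^k full factorial and, for four factors, the half fraction with defining relation I = ABCD (D = ABC, resolution IV), which yields the eight runs (1), ab, ac, bc, ad, bd, cd, abcd used in the procedure below. Coding the factor levels as -1/+1 is an illustrative convention.

using System;
using System.Collections.Generic;

class FactorialDesigns
{
    // Enumerate all 2^k treatment combinations, coding each factor level as -1 (low) or +1 (high).
    static List<int[]> FullFactorial(int k)
    {
        var runs = new List<int[]>();
        for (int i = 0; i < (1 << k); i++)
        {
            var run = new int[k];
            for (int j = 0; j < k; j++)
                run[j] = ((i >> j) & 1) == 1 ? +1 : -1;
            runs.Add(run);
        }
        return runs;
    }

    // Half fraction of a 2^4 design with defining relation I = ABCD (i.e., D = ABC),
    // a resolution IV design with 8 runs.
    static List<int[]> HalfFraction2_4()
    {
        var runs = new List<int[]>();
        foreach (var abc in FullFactorial(3))
        {
            int d = abc[0] * abc[1] * abc[2];   // generator: D = A*B*C
            runs.Add(new[] { abc[0], abc[1], abc[2], d });
        }
        return runs;
    }
}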
Full factorial design is a good choice when the sample size is large , and factors are few . Zhang, et al. (2011) considered four design features (out of five) as factors that affect the performance of collaboration systems. Full factorial design would be a good choice. However, t hey evaluated the performance of 190 teams using systems with or without the four features , so that s ome teams had all four design features (treatment) and some teams had none (control). 61 Despite the large sample size , this study does not detect the effect of any particu lar factor due to perfect multi - collinearity among the factors . Their claimed findings about the specific features and manipulations are due to their specific assumptions . On the other hand, when there are many factors and the number of data points is limited, the choice of factor combinations becomes challenging. Even though RSM is sequential by nature (Khuri & Mukhopadhyay, 2010) , it needs to t est all factors in the early steps particularly in the screening phase (phase 0) in order to detect and filter out unimportant factors. However, constitutions have too many parameters to screen out as factors . Therefore, we need a better approach to detect important factors. RSM does not consider the nature of the process and the mediational mechanisms through which the factors affect the performance of the process. Therefore, i t treats all factors equally and aims for symmetry, rotatability and orthogonal ity. However, in most practical situations, different factors have different magnitudes of effect and some factor levels may result in much higher or lower performance levels making a subset of data points (low performance) irrelevant because those paramet er levels fall out of the region of interest. For example, a full f actorial de sign with 5 factors requires 32 runs. If one level of two factors produce very low performance, 24 out of 32 runs will be out of the region of interest and 8 out of the 32 runs w ill provide valuable information for estimating the response surface near the optimal point. This section suggests an experimentation procedure that might be labeled as Sequential Factorial Design with Alternative Treatments . T his procedure tries to balan ce between collecting data (exploration) and using data (exploitation) to detect effects and to collect new data efficiently. This procedure does not result in rotatable or symmetric experimental design and does not satisfy any of the alphabetic optimality criteria, rather it collects just enough data to mak e decisions including the design of further experiments . R unning all treatments simultaneously would allow for assigning subjects to all treatment groups randomly, but would not allow for using some tre atment results to design other 62 treatments. Moreover, in many situations (e.g. MTurk), it may not be practical to recruit a large number of subjects at the same time to assign them to all treatments simultaneously. While most of RSM methods consider all fa ctors at the same time, the procedure suggested here introduces batches of factors to the model in sequential rounds. This is important because when a process has many parameters (e.g. constitutions) , we would need a large sample size and many data points to have enough degrees of freedom for considering all parameters as factors at the same time. 
To select a subset of parameters as factors, we need to analyze the process in deeper level and consider the nature of the factors and how (mediation) they can af fect the performance of the process. For constitutional design, the next section provide s the guidelines to select a subset of parameters that are good candidates as import a n t factors. The suggested procedure uses that information for efficient improvement of constitutions . The procedure has the following steps: 0 - Collect data on an initial constitution. It can be a crowdsourcing protocol that has already been tried. Alternatively, one may experiment a best guess based on experience. This is treatment (1). 1 1 - Apply the factor selection guidelines on the results and select k parameters as factors X 1 ...X k and form hypotheses about their effects. The number of factors depend on the guidelines, the problem and number of runs the budget allows. A safe choice is two factors when the behavior of the response variable is unknown and unpredictable. 2 - Generate a factorial or fractional factorial set of treatments with the k factors. The choice of factorial resolution depends on k and the amount of noise in the response. We need a small statistical power to detect large significant effects and anomalies. The goal is not to test a hypothesis using this set of trials, but rather to detect if any levels of the factors are likely to fall outside the region of interest. With one factor, three additional runs including two runs of treatment A and one replicate of the 1 By convention, treatments are named as the sequence of their upper level factors (but in lowercase) or (1) if all factors are in their lower levels (Myers, et al., 2009) . 63 initial treatment (1) can yield enough information. With two factors, three additional runs ( a, b, ab ) and the initial treatment (1) can form a 2 2 full factorial that can reveal large effects and anomalies. With three factors, seven additional runs ( a, b, c, ab, ac, bc, abc ) and the initial treatment can make a 2 3 full facto rial to yield enough data for detecting large effects. With four factors, seven additional runs ( ab, ac, ad, bc, bd, cd, abcd ) and the initial treatment can form a 2 4 - 1 fractional factorial (resolution IV), which has enough degrees of freedom without too m any trials . 3 - Experiment the factorial treatments preferably simultaneously or in random order. Use random assignment without replacement to assign users to treatments. The inclusion of one or two replicates of the initial treatment or the best performing t reatment from the previous round (block) will allow compar ison of blocks , and the estimation of pure error (SS PE ) and lack of fit. 4 - Run stepwise linear regression to detect significant effects. This is the screening phase of RSM, trying to detect the import ant factors with the available data as the degrees of freedom grow. Stepwise regression also helps to reduce the possible collinearities among predictors. 5 - If the regression resulted in significant coefficients for one or more continuous variables, go to th e next step. Otherwise, go back to step one using the highest performance treatment as the initial constitution and apply the factor selection guidelines to include more factors or different levels of the same factors if the guidelines suggest already incl uded factors. This will add dimensions (factors) to the model while increasing degrees of freedom and statistical power. 
6 - Use the highest performance treatment as baseline and estimate the first - order model for continuous factors while keeping discrete fact ors at their high performance levels. Dropping some low performance points may increase R 2 adj of the model and improve the model. That is because of some plausible interaction effects that are not relevant for steepest ascent. 7 - Start from the treatment with the highest performance and use the linear model on continuous variables to move in the direction of the steepest ascent (Myers, et al., 2009) while keeping the discrete variables at their high performance levels. The model not only confirms the importance of specific factors but also shows the direction of the steepest ascent for the continuous ones. 64 8 - Regress a second - order model on the directio n of the steepest ascent (one - dimensional) and find the optimal constitution with respect to the factors in the model. Experiment this constitution. 9 - If the optimal constitution is close enough to the points that formed the first - order model, go to the next step. Otherwise, go back to step 1 using this optimal constitution as the initial constitution and apply the factor selection guidelines to include more factors or different levels of the same factors. 10 - Regress a second - order model on the continuous variab les (multi - dimensional) and find the stationary point (zero gradient). If there are not enough data for a variable, run more trials around the optimal constitution to have enough sample size for the second - order model. 11 - Analyze the eigenvalues and detect the nature of the stationary point. If it is not optimal, move in the proper direction (ridge analysis) and find the optimal point (Myers, et al., 2009) . Experiment the optimal constitution. 12 - Go back to step 1 using this optimal constitution as the initial constitution. The main difference between this procedure and the standard RSM is that this procedure uses the factor selection guidelines to look deep into the results and choose factors, where as RSM looks at the phenomenon as a black box and considers all possible factors. Moreover, most of the RSM designs aim for symmetry, uniform distribution of prediction variance and rotatability, which overlook the fact that different factors have differen t magnitudes of effects. The suggested procedure is asymmetric toward collecting more data points closer to the region of interest and plausible optimal point, leading to better prediction variance there. This enables better estimation of first order and s econd order models in the region of interest with a smaller sample size . It takes advantage of the fact that some factors are more impactful and need less statistical power to show significant effects, while some others have smaller effects needing larger statistical power for significance. 65 12.2. Factor Selection Guidelines To decide which parameters should be use d as factors , we need to look for improvement opportunities. Referring back to equation 10 - 4, we can improve the performance of a constitution through five m e diators : accuracy of selection ( 1: p ), number of versions (choices) per period ( 2: m ), average quality of suggestions ( 3: µ ), variance of the quality of suggestions ( 5: ) and the number of rounds ( 5: z ). Theoretically, we should increase th e mediators that have larger effects per unit and have more room to increase. 
That means we should estimate the direction of the gradient of the expected quality (∇E[q_f]) times (element-wise) the range of variation of the mediators. Table 12-2 shows the elements of this element-wise multiplication, derived from equation 10-4.

Table 12-2: Elements of the Improvement Direction
(∂E/∂p) Δp = z^d σ g(m) Δp
(∂E/∂m) Δm = z^d p σ g'(m) Δm
(∂E/∂µ) Δµ = z^d Δµ
(∂E/∂σ) Δσ = z^d p g(m) Δσ
(∂E/∂z) Δz = d z^{d−1} (µ + p σ g(m)) Δz
Note: p = accuracy of the selection results; m = number of versions (choices) per period; µ and σ = average and standard deviation of the quality of suggestions; z = number of rounds; d = saturation effect on improvements (constant); Δ denotes the feasible range of variation of each mediator.

Therefore, between p and m, we should focus on increasing p if and only if σ g(m) Δp > p σ g'(m) Δm (equivalently, g(m) Δp > p g'(m) Δm); otherwise we aim to increase m. Table 12-3 shows all the pairwise comparisons: its rows ("increase p / m / µ / σ if") and columns ("not m / µ / σ / z if") compare the corresponding elements of table 12-2, and the mediator whose element is larger should be increased first.

Table 12-3: Pairwise Comparisons between the Effects of the Mediators

Increasing each mediator requires the manipulation of particular constitutional parameters. This section offers some guidelines to detect the parameters that are most relevant and effective for improving each mediator. The guidelines are based on the discussions in the previous chapter, particularly figure 11-1, but their scope is primarily limited to crowdsourcing applications. The guidelines rely on observations (O_B) from a previous trial or run. Accordingly, I assume we have already observed the outputs of an e-constitution (crowdsourcing protocol) that we want to improve. For every mediator, one of the observations is the comments left by the participants. The comments can be used to direct the analysis toward specific objective measures.

1 - Accuracy of selection (p): Investigating tables 12-2 and 12-3 reveals that an increase in p has the largest impact on the quality, because g(m) and σ are the largest terms in most practical situations. Therefore, we should focus on improving the selection accuracy if it has room for improvement. First, we should examine the previous trials, detect the best choices in each period and measure how often they were selected. This gives us a measure of p. If the best choices won every time with a relatively high number of votes, this mediator (p) may not have much room for improvement. However, if the best choices have not won as often as expected, we can look for the possible causes of bad selection and the possible remedies. Generally, the causes and remedies can be classified into two categories: too few good votes and too many bad votes. Votes refer to any kind of selection inputs, including approval voting, scoring, etc.

Too few accurate votes: When there is not enough participation in the selection of an alternative, the results can become unreliable. There are different ways to collect more votes:
1 - If possible, expanding the population of selectors can result in more selection inputs.
2 - If the budget allows, higher selection rewards can incentivize more voting, but if the reward is not linked to performance, it can bring haphazard votes. The selection reward can be based on individual performance (R_V) or group performance (R_G). Expectedly, individual rewards should provide stronger incentives, but in many situations the only criterion for individual selection performance is alignment with the majority. In some circumstances there is no objective measure of group performance, in which case we should rely on external judgment of performance.
3 - Reducing selection costs can increase selection activities.
4 - If there are accurate votes towards the end of the selection periods, increasing the time for evaluation ( T V ) may bring about more accurate votes . 5 - When the selection criterion is plurality and multiple good choices compete at each period, they may steal votes from each other so that an inferior choice wins. A quality score that has a limited range (e.g. correct and incorrect) can exacerbate the situation. The t raditional approach to cope with this problem is multi stage elimination , but a simpler solution is approval voting . Too many erroneous votes : Haphazard and intentionally wrong votes can reduce the accuracy of selection results ( p ). One may measure the inaccurate vot ing rate by aver aging the periodical s election variance s across all selection periods. Periodical selection variance is the variance of select ion scores ( e.g. number of votes) across different choices in each period . However, when the selection input is voting and we can detect the best version in each period, a better metric for accuracy of selection is to average the percentage of right votes per period across all periods. This metric takes the best choice into account in addition to the spread of the votes. Moreover, it is not very sensitive to the number of voters whereas the variance is. However, it is sensitive to the number of choices. A small number of choices inflates the percentage of votes cast on the best version, and a large number of choices deflates it even if the best version wins with a high margin. Therefore, I define the accuracy ratio in a period as the score (number of votes) that the actual best version received over the maximum score that any other version received in that period. If i t is more than one , the best version win s . I f it is less than one, a wrong choice wins 2 . The magnitude of this metric reflects the margin of success or failure of the selection. This metric is valid for other selection criterion as well as various voting systems. The average of accuracy ratio s a cross all periods reflects the selection accuracy of the constitution. 2 When it is equal to one, it becomes dependent on the tie - breaking rule in the constitution, but the accuracy ratio still reflects the selection accuracy in any case. 68 Depending on the situation, d ifferent tactics can control for inaccurate votes : 1 - If possible, limiting selectors to experts or stakeholders may improve the accuracy of selection. 2 - If the accuracy of votes increases during each selection period on average , letting selectors revise their choices may improve the quality of votes . 3 - T he selection reward can induce haphazard voting if it does not penalize for wrong choices . Particularly, in approval voting, fixed rewards for selecting the winner can incentivize individuals to vote for all choices (or as many as possible) to improve the ir chance of voting for the winner. One way to cope with this problem is to limit the number of choices a per son can select. Limiting it to one choice makes it regular voting. Another way is to adjust the individual rewards based on the number of choices selected or not selected. For example, the reward can be ( m - V i ). R W , wherein m is the total number of choices in a period and V i is the number of choices voter i selected in that period. R W is the reward per rejecting a wrong choice while selecting the winner. 4 - One can deter bad selection inputs by imposing a selection fee. 
However, many settings do not allow ch arging voters and h igh selection costs may deter accurate selection as well . Using a b etting contest is a better mechanism w hen the right selection fee is hard to determine or can vary from period to period. Accordingly, selectors decide how much to invest on their evaluations and then the weight of the ir votes and selection rewards are proportional to their bets. 5 - If there is bias toward or against a choice (e.g. last choice) in the list, the constitution may present the choices randomly to each participant to distribute and eliminate the bias. If the bias is toward (against) the updated edition from the previous period ( i.e. too conservative or too progressive ), its selection reward ( R O ) may be lower (hig her) for voting for that choice when it wins. 6 - If a formula can classify those who provided better selection inputs, the constitution can give higher weights to their votes (i.e. meritocracy). If the participants whose suggestions won made better selec tions in su b sequent periods, their votes are worth more . However, too large of voting weights for the winners can bias the results toward specific viewpoints. 69 7 - M eritocra cy can also be on the opposite direction so that the constitution bans (gives zero weight to ) voting by the worst selectors if they can be classified algorithmically. If the proposers of the least voted (least scored) suggestions made the worst choices in subsequent per iods, the constitution may ban them from voting. This can bias results against unpopular viewpoints though. 8 - Presenting too many choices to the selectors can result in inaccurate votes . There are different approa ches to control for the number of suggestions per period: a. Imposing a suggestion fee can filter out poor suggestions if it is possible to charge participants . However, high suggestion costs can deter good suggestions as well. Particularly, different participants have different utility and cost preferences. Hence, a better approach is sorting based on advertisement auction . b. Sor ting can facilitate better selection. A dvertisement auction sorts suggestions based on t he amounts participants pay to place their suggestions. However, sorting can bias sel ection in many situations such as prediction voting. c. If there are enough selectors, the constitution can show a random subset of the choices to the selectors so that they can focus on evaluating a smaller number of choices. d. Banning the proposers of the least voted (least scored) suggestions can deter low quality submissions . However, we need to investigate the results and see if the least voted choices in each period were actually the worst ones. Otherwise, we may lower the quality of suggestions. This can be combined with the item 7 so that the least voted proposers are ban ned from participation in voting and suggestion periods . e. If the best suggestions were submitted in the middle of the suggestion periods, one can limit the number of choices by shortening the suggestion period ( T P ) without losing the best suggestions. One m ight as well impose (or decrease) limit on the number of suggestions per period ( M ) . The limit M makes it uncertain when a suggestion period ends, but the period length T S makes the number of choices in each period uncertain . f. If there is reward for suggestion, decreasing the reward can reduce submissions. 70 2 - Number of suggestions ( m ): As figure 10 - 1 illustrates, g(m) is concave. 
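Figure 10-1 approximates g(m) numerically in MATLAB; the following C# sketch estimates the same quantity by Monte Carlo, under the reading of g(m) as the expected maximum of m independent standard normal draws. The sample size, the fixed seed and the Box-Muller transform are illustrative choices; the printed values show the diminishing returns (concavity) while g(m) still grows without an upper bound.

using System;

class GOfM
{
    // Monte Carlo estimate of g(m) = E[max(Z_1, ..., Z_m)] for standard normal Z_i.
    static double G(int m, int samples = 200_000)
    {
        var rng = new Random(12345);            // fixed seed for reproducibility (illustrative)
        double sum = 0.0;
        for (int s = 0; s < samples; s++)
        {
            double max = double.NegativeInfinity;
            for (int i = 0; i < m; i++)
            {
                // Box-Muller transform: one standard normal draw from two uniforms.
                double u1 = 1.0 - rng.NextDouble();
                double u2 = rng.NextDouble();
                double z = Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Cos(2.0 * Math.PI * u2);
                if (z > max) max = z;
            }
            sum += max;
        }
        return sum / samples;
    }

    static void Main()
    {
        foreach (int m in new[] { 1, 2, 5, 10, 50, 100 })
            Console.WriteLine($"g({m}) ~ {G(m):F3}");
        // Expected pattern: g(1) ~ 0, g(2) ~ 0.56, g(10) ~ 1.54, g(100) ~ 2.51,
        // i.e., increasing with diminishing returns (concave) and unbounded in m.
    }
}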
So one should weigh the benefits of additional suggestions against the possible decrease in selection accuracy. Increasing the number of suggestions can be effective only if there are too few suggestions and the selection results are safely accurate . In that case, there are some strategies to increase the number of suggestions per period: 1 - If possible, expanding the population of proposers can result in more suggestions. 2 - If budget allows, higher rewards for winning suggestion s ( R P ) can incentivize more suggestions . However, t he reward should be linked to the quality of suggestions. Otherwise, it could induce spam suggestions. Th e relative selection scores and winning in selection period can be effective endogenous criterion for the quality of suggestions . 3 - If there are submissions towards the end of suggestion periods, increasing the suggestion time ( T P ) or the maximum number of suggestions ( M ) can bring about more suggestions , depending on which one is limiting the number of suggest ions. 4 - If the constitution bans or suspends suggestions to control number of suggestions, that could reduce number of submissions. Removing or lightening that rule can increase suggestions. 3 - Average quality of suggestions ( µ ): When there are not enough high quality suggestions , the process stagnates and does not improve the quality of solution . There are several strategies that can be used to improve the quality of suggestions: 1 - If possible, expanding the population of proposers to more experts can improve the average quality . 2 - H igher rewards for winning suggestions ( R P ) may incentivize better suggestions , but it can also increase the number o f mediocre suggestions, thereby jeopardizing the selection accuracy. 3 - If better suggestions and winning ones are towards the end of suggestion periods, increasing the suggestion time ( T P ) or the maximum number of suggestions ( M ) can bring better suggestions, depending on which one is limiting the number of suggest ions. 4 - Generally, let ting participants edit their suggestions during each period can improve quality. 71 4 - Variance of suggestions ( ): Here are some strategies to increase the variance of suggestions: 1 - If possible, a more diverse population of proposers can increase the suggestions variance. 2 - Some studies (Paulus, et al., 2013) found that more original ideas are generated lat e in a session. Hence , extending suggestion periods by increasing T P or M may result in more diversity if there are more novel suggestions towards the end. 3 - If the constitution bans or suspends suggestions to control number of suggestions, that could reduce diversity. Removing or lightening that rule can increase diversity. 4 - Meritocracy could deter diverse suggestions and r educing extra votes can improve diversity. 5 - Number of rounds ( z ): If the winning versions settled and stayed a specific version in the process , having more rounds may not be beneficial, but rather it might be better to focus on improving suggestions and selections ( perhaps by increasing T V , T P or M ). Nevertheless , i f the versions improved until the end of process, more rounds may increase the quality of outcome. To increase the number of rounds we should either increase the total time or shorten the suggestion period s or shorten the selection period s as described below: 1 - If it is possible, extending the total process time allows for more rounds. 
2 - If the best suggestions are in the middle of suggestion periods (not towards the end), decreasing T_P or M (shortening the suggestion periods) can increase the number of rounds in the process.
3 - If enough accurate votes are cast in the middle of selection periods (not towards the end), decreasing T_V (shortening the selection periods) can increase the number of rounds in the process. Moreover, if the selection results are safely accurate and the selection scores do not have a large variance, that many votes may not be necessary and shorter selection periods can be more efficient.

Parameter-wise Summary: Table 12-4 summarizes the factor selection guidelines for each constitutional parameter. For each parameter, the left column gives the symptoms that the level is too low (H1: a higher level is better) and the right column the symptoms that it is too high (H1: a lower level is better).

Table 12-4: Summary of Guidelines for each Constitutional Parameter
T_V: Too low: P < 1 & too few good votes & good votes at the end of T_V. Too high: need more rounds (z) & safely accurate selection.
T_P, M: Too low: too few good or novel suggestions & good or novel suggestions are toward the end of T_P. Too high: P < 1 & too many bad votes & bad suggestions towards the end of T_P & need more rounds (z).
Approval voting: Too low: P < 1 & too few good votes & multiple good choices in plurality. Too high: P < 1 & too many bad votes & over-selection in approval voting.
Revisable voting: Too low: P < 1 & too many bad votes & accuracy increases during T_V. Too high: technical limitations.
R_P: Too low: too few good suggestions & enough budget. Too high: P < 1 & too many suggestions & low budget.
R_V, R_G: Too low: P < 1 & too few good votes & enough budget. Too high: low budget & safely accurate selection.
Adjusted selection reward ((m − V)·R_W): Too low: P < 1 & too many bad votes & over-selection in approval voting. Too high: technical limitations.
R_O: Too low: P < 1 & too many bad votes & bias against the current updated version. Too high: P < 1 & too many bad votes & bias towards the current updated version.
Selection cost: Too low: P < 1 & too many bad votes & can charge selectors. Too high: P < 1 & too few good votes & selection is too costly.
Suggestion cost: Too low: P < 1 & too many bad votes & too many bad suggestions. Too high: too few suggestions per period.
Sorting (vs. random order): Too low: P < 1 & too many bad votes & too many bad suggestions. Too high: P < 1 & too many bad votes & bias towards specific choices.
Weighting votes: Too low: P < 1 & too many bad votes & can classify the best selectors. Too high: low variance of suggestions.
Banning votes: Too low: P < 1 & too many bad votes & can classify the worst selectors. Too high: low variance of suggestions.
Banning suggestions: Too low: P < 1 & too many bad votes & too many bad suggestions & the least voted = the worst. Too high: too few suggestions & low variance of suggestions.
Random subset of choices: Too low: P < 1 & too many bad votes & too many bad suggestions. Too high: P < 1 & too few good votes.

Each row in table 12-4 corresponds to one or two relevant constitutional parameters. The contents of the table determine whether the level of a parameter is too low or too high. The guidelines suggest increasing the level when it is too low and decreasing it when it is too high. For example, for T_V, if we see the conditions (symptoms) in the left column (too low), we may hypothesize that the voting period is too short and that increasing it can increase performance; but if the conditions in the right column are met, a better hypothesis would be that shortening T_V can improve performance. For approval voting, the high level means that the voters can select as many choices as they want, while lower levels of approval voting refer to fewer (more restrictive) numbers of choices a voter can select. Sorting can be systematic (high level) or random (low level).
Meeting conditions in the left column suggests that the choices are too random and hypothesizes that presenting in a better order can improve the selection accuracy. Similar argument s apply for the right column. 12.3 . Evaluation of Constitutions Having constitutions in computer code enables us to evaluate and compare them efficiently via online experimentation. T his research evaluated different e - constitutions via between - group experiments using subjects recruited from MTurk . MTurk is a web service that enables outsourcing simple tasks to human workers from all over the world (Davis & Lin, 2011) . It provides a reasonable and cost - effective platform with diverse participants that are more representative of a real labor market than university students are (Mason & Watts, 2009) . To evaluate the performance of e - constitutions, I crowdsourced solving a problem whose solutions have different levels of quality or performance. I defined the problem as follows: Problem Definition: Imagine it is June 1, 2013. You have $1000 to invest in stocks, currencies and preciou s metals like silver. What would be the best trading plan and strategy to make the most profit during the 5 years period until May 31, 2018? The goal is to maximize the total wealth on June 1, 2018. To explore the most profitable assets during these 5 yea rs, you can use historical data found in financial websites such as finance.yahoo.com/most - active and tradingview.com/chart . For simplicity, assume that there is no transaction fee, no commission, and no dividend. 74 This problem has several desirable properties. First, the performance of solutions can be evaluated objectively without dealing with raters and interrater reliability issues. Particularly, it ha s virtually no measurement error. Second, the performance has only one dimension (profit) and there is no uncertainty or risk involved. Zhang, et al. (2011) also used a one - dimensional and objectively evaluable quality measu re (bug severity) as the performance of the collaboration processes, but their outcome solutions or products (software programs) had several other important quality dimensions, which they simply ignored instead of combining them or at least justifying thei r choice. Unlike that study, this project uses a specific problem or artifact that actually has one quality measure. However, I acknowledge that in addition to this one - dimensional quality of the outcome solution, a constitution can have other important pe rformance measures such as cost of the process or time of convergence . Third, this problem does not require forecasting expertise and a non - expert crowd can understand the problem and make significant contributions in a limited time. Decisions involving fi nancial forecasts would require high knowledge and long time to yield meaning variations in performance. Essentially this is an investment planning problem without the forecasting part. Fourth, the quality of the solutions can improve to very large numbers likely to result in high statistical power. To start a design process, we need an initial edition. The quality of the initial edition is a pre - test observation a nd is denoted by O 0 . 
In the experiments of this project the initial solution is the following plan : Initial Edition: On June 1, 2013: Use 100% of Cash to Buy Dow Jones Industrial Average On Feb 23, 2015: Sell Dow Jones On Dec 31, 2015: Use 50% of Cash to Buy IBM & Use 50% of Cash to Buy BTCUSD On May 31, 2018: Sell IBM & Sell BTCUSD 75 This plan is feasible, but it is relatively poor and has some obvious rooms for improvement so that most constitutions can result in improvements in the plan. Its performance i s about $11k. That means it turns $1000 to about $11,000 in five years. Now we see if the constitutions can govern the crowd to improve this initial edition to an edition with a higher return. 76 13. Proof of Concept 13.1. Implementation I used ASP.NET and SQL Server to implement a generic e - constitution in a dynamic website, which currently runs on https:// Hamed - Constitution.Broad.MSU.edu . This generic e - constitution has several parameters that cover a small subset of the p ossible constitutions based on the meta - model. The website , experimenter) can access constitutional parameters and adjust them to instantiate different constitutions for collective design . This website enables online experimentation of e - constitution s as treatment s . H ereafter, a treatment means a n e - constitution with specific parameter values. This website provides proof by construction and an expository instantiation of the e - constitution d esign model as the eighth component of ISDT (Gregor & Jones, 2007) . Appendices B, C and D present the screenshots of the pages of the website. A ppendix F presents the C# code for the generic e - constitution and a ppendix G il lustrates the data structure diagram of the database for the website. The main page in the website enables the participants to login or sign up. To sign up, the users need to accept the consent form agreement and complete the registration form. The adminis trator can also use this page to log into the as appendix E shows . In the control panel, the administrator can see the results of completed treatments. When participants sign up, the website directs them to the test) about the constitution . If they answer correctly, they can participate in the process and the final survey , if they fail the test they are not allowed to participate . This ensures that all participants understand the basic rules in the constitution. This test has a deadline and the subjects should pass the test before its deadline to participate. After passing the test, if the design process has not started yet, the participants wait until it starts. Once the design process starts, it iterates between two pages: suggestion and selection. Participants can propos e modifications during suggestion periods and/or vo t e during selection periods. At the end, participants are asked to complete the final survey and answer a few questions about the process. 7 7 Participants would not receive payment from MTurk if they failed to complete the final survey. During the suggestion see previous versions in the process. Figure 13 - 1 illustrates the flow of operations among these pages. As figure 13 - 1 shows, users can come from MTurk or directly by registering their emails . This project recruited subjects from MTurk. Appendix H shows the HIT (Human Intelligence Task) page. T he HIT page sends the worker ID to the registration form in the website as username. However, outside MTurk, users may sign up directly with their email addresses as username. 
In such case, a user receives a verification message that includes a link for the user to verify his/he r email addr ess. In either case (MTurk or Direct), the website assigns each subject randomly to an upcoming treatment (constitution) based on timing and availability. Figure 13 - 2 shows the timeline for the events and actions starting from HIT in MTurk. Figure 1 3 - 3 shows the sequence of events and actions in different pages. Blue boxes present enter. At the end, when the participants complete the final survey, the website provides them with a code and the experimenter approves their assignment in MTurk and pays them. Figure 13 - 1 : Flow of Control among the Webpages in t he Website and MTurk Figure 1 3 - 2: Timeline for one Treatment from the HIT in MTurk to the Final Survey 78 Variable Period controls the timing and period change. When Period is null, it means that the constitution (treatment) is not operational yet. Once the administrator sets the values of t he constitutional parameters, and decides to release it, s/he sets the value of period to zero ( Period 0 ). Then at the starting time of the treatment group, Period automatically becomes one ( Period 1 ) and the first suggestion period begins. Then at spe cific times the value of period increases one by one. Odd numbers of Period are suggestion periods and even numbers (except zero) are voting periods. At the end of the design process (Closing Time), Period becomes minus one ( Period - 1 ) indicating the survey period. At the end of the survey period, Period becomes - 11 indicating the end of the experiment for that treatment group. Web Page Participant Experimenter / Code Post the Treatment Period = 0 ; Publish HIT MTurk HIT Read Task ; Accept HIT ; Click on Link Get Worker ID ; Open First Page (Registration Form) Log In Read Consent ; Fill the Form ; Sign Up Assign a Treatment - Group ; Show Constitution Constitution Read Constitution ; Answer Questions ; Submit Answers If Correct Prompt the Participant to Wait Process Starts: Switch to Suggestion Period Period = 1 Suggestion Switch to Selectin Period Period = 2 Selection . . . Switch to the Final Period Period = - 1 Final Survey Complete the Final Survey ; Submit Completion Code Approve Work and Pay Balance Figure 13 - 3: Sequence of Events and Actions for one Participant in one Treatment 79 Figure 1 3 - 4 illustrates more details about how t he website governs the incremental design process in one treatment. At first, the experimenter sets the value of period to zero and releases the constitution. When the constitution is available for the participants, they can read it and pass the constituti on test by answering a few questions. They can do so until its deadline, which is T A time after the Starting time of the process as figure 1 3 - 4 shows. At the S tarting time, the value of Period automatically becomes one and the design process starts with the first suggestion period. For technical reasons, the versions of the solution are assigned to the even periods and the initial edition is assigned to period zero. Hence, at the beginning of each suggestion period, the winner from the previous selection period is copied to the next selection period as choice zero and becomes the updated edition. This makes it easier to track the versions across the periods. 
For each version of the solution, there is a proposer, except f or the Figure 13 - 4: Flow of Versions in an Incremental Design Process (Example) 80 any period, the initial edition is carried from period zero to the last period and the final edition is the same as the initial edition with Experimenter as its proposer. As figure 1 3 - 4 shows, the design process ends at the Closing time point , which is T Z minutes after the Starting time point. Then the final survey begins and lasts for T F minutes ending at the Ending time, which is the end of that treatment. At the Closing time point, the last winning version is considered as the final edition and the outcome of the treatment. During the final survey period, participants answer a few questions to receive the compensat ion for participation in the experiment. 13.2. Pilot Experiments To make sure of the functionality of the website, several pre - test or pilot experiments were conducted. Pilot #0 or pre - pilot was with some students in the Broad college of business at MSU. That pre - pilot verified that with that very small sam ple, two students colluded and voted for each other to win more rewards. The first pilot with MTurk (pilot #1) was not successful, and there was no effective participation, even though many workers signed up in the website. That was because the HIT in MTurk let everybody click on the link to the website and sign up even without accepting the HIT. As a result, the treatment group became full whereas the worker list was empty. Therefore, I cancelled the HIT and paid the one worker who contacted me. Based on that result, I made a JavaScript code that lets each worker to see the web site link only after accepting the HIT. Moreover, the code only takes ID from the JavaScript code in the HIT . This prevents others to sign up and ensures the authenticity of the worker IDs. Another imp rovement after this pilot was direct login with signup, so that when a participant signs up, they become logged in automatically. This speeds up the process and the participants do not need to enter their username and password unless they sign out. Another improvement 81 was removing the sign out button from every page, and also remind the workers to leave the page open and stay logged in. This is because the website needs to notify the workers about the start of the experiment and change of periods through de sktop notifications and alerts. The website notification is the only practical option because MTurk does not allow asking for wor , the MTurk API cannot send messages to workers until afte r the end o f the experiment. Pilot #2 was the first pilot experiment with some results. However, some workers accepted the HIT but signed up too late when the experiment had already begun. This early stage attrition can cause large variation in the number of participants across treatments. While I statistically control for the number of participants, large variations can bring about nonlinearities . Hence, I modified the HIT to tell the workers that they need to sign up before the start of the experiment, so they should sign up immediately or return the HIT. Moreover, every HIT expires 5 minutes before the start of its experiment, so there is enough time to sign up for everybody. Another change is that the new HIT does not allow a work er to participate twice and asks the workers not to share details of the experiment for one month. 
That is because the participants of later experiments may access that information and then the order of experiments would matter. There were useful comments received from the parti cipants in pilot #2. Some commented that the voting period was too long and not necessary. The records also showed that most of the participants voted in the first half of the period. Another comment was about letting the workers communicate with each othe r through some discussion forum or chat room. That might be reasonable but it can introduce many uncontrollable variables. So instead, I modified the system to let the participants include their reasons with their submissions. An additional observation in pilot #2 was that some workers submitted the updated edition without any change. Therefore, I added a restriction that suggestions must be different from the updated edition. After some modifications, I conducted pilot #3 with an iterative design process, in which the suggestion periods end after one submission. I then discovered that when a suggestion period ends because of a did not notify the other part icipants until they refresh ed the page. Many workers commented that they were waiting for the end of suggestion period, even though it had already 82 ended. To address this problem, I embedded a JavaScript code in the suggestion page to ch eck the value of per iod every 8 seconds so when the period changes, the code notifies the users to refresh the page . This allows the user s to decide when to refresh the page so that if they were working on a suggestion, they could copy and paste that content and not lose it . Another problem encountered in pilot #3 was the small number of participants (only six workers). Hence, I removed all the restrictions on participation for pilot # 4, so anyone could participate in pilot #4. However, this did not increase the number of par ticipants in pilot #4, but rather decreased the quality of participation. Hence, I put back restrictions for pilot #5. In pilot #5, a s another attempt to increase the number of participants , I increased the base payoff from $5 to $10. T his usually does not improve the quality of participation, but can increase the number of people participating (Mason & Watts, 2009) . This resulted in more participants (nine workers) in pilot #5, but not enough. Therefore, for pilot #6, I also in cluded rewards for good suggestion and good voting. This increased the number of subjects to 13, but only six of the workers stayed and finished the experiment. In pilot #7, I published four HITs with total of 36 assignments which is twice as many workers as I wanted (18 workers). I published all HITs about one hour before the start of the experiment. 34 workers signed up in the experiment and 17 of them participated and stayed until the end and finished the final survey. This result was satisfactory, but there were some comments about the long waiting time. Moreover, most of the workers signed up in the first few minutes and had to wait more than 30 minutes before the experiment. Therefore, in pilot #8 I published the HITs about 40 minutes before the start of experiment instead of one hour. I also modified the interface, made it more user friendly, and asked the workers to explore financial markets while waiting for the experiment. I also increased the reward for winning suggestions from $0.50 to $1.00 beca use of some comments regarding low incentives. 
I also shifted the period from the period between May 1, 2013 and April 30, 2018 to the period between June 1 , 2013 and May 31 , 2018 to be more recent . More importantly , as before , I published twice as many assignments as I needed, but this time I cancelled all HITs once, I got 20 workers. This limit s excessive variation s in the number of participants. 83 Pilot #8 resulted in active participation and smo oth functioning of the website, but the final solution was not a valid feasible plan. M any participants comment ed that the task description and instructions were confusing. Therefore, I edited the task description and improved the instructions after consul ting with some English speakers. Pilot #9 resulted in a valid and feasible plan, but relatively low quality plan. Close investigation of the results revealed that the votes were not very accurate ( low selection accuracy ) . To incentivize better participatio n and particularly voting activities, p ilot #10 included a group bonus ($2 each participant) for the be st performing group. This simulate s how shareholders and stakeholders in corporations share the success or failure of the firm. However, this resulted in much worse performance than previous trials. Therefore, I eliminated the group bonus. Particularly, a high performing constitution that does not rely on any external judgement would be more applicable to crowdsourcing situations, because it separates the decision - making and shareholding roles. A fter a few additional minor changes in the interface and some tests , I proceeded to conduct the experiment s with two treatments at a time. During the first pair of experiments , the server cra shed for a few minutes. This made the results not comparable to other results. Therefore, I consider ed these two exp eriments as pilots #11 and #12 and analysed their results to improve the website. Accordingly, I added a help page with visual directions for using Yahoo Finance an d better contribution s . Moreover, that incident showed that experiments might not always go as expected, so I modified the rigid factorial design of standard RSM to a more flexible procedure presented in the previous chapter . 84 14. Results and Analysis 14.1 . Participants The subjects we re recruited from MTurk and were restricted to workers who liv e in the US and hav e 97% approval rate with more than 100 assignment s completed . E xperiments were conducted following the sequential procedure and factor selection guidelin es suggested in chapter 12. In such sequential experimentation, some participants from earlier groups may s hare information to the later groups ( treatment diffusion ). Fortunately, t his cannot affect the treatments ( rules of the constitutions ), the decis ion rights, or their incentives except different reward levels in different cons titutions may result in resentful demoralization . I f participants find out they wer e paid less than another group, t his may have a negat ive impact on their performance . This problem is controlled in two ways. First, the rewards are the same across all constitutions except the last two , which have a higher selection reward . Therefore, reward never decreased across treatments. Second , I tried to minimize treatm ent d iffusion , by stating both in the HIT description and in the consent form that the subjects should not share information about the experiment for one month . The program does not let sign ing up without accepting the consent form a s appendix B shows . 
Appendix A presents the IRB approval document related to informed consent . T hose who gave informed consent and signed up to participate in experiment are counted as subjects . When sign ing up , a subject can declare his/her age, gender , education and whether s/he is a native English speaker. Once signed up, each subject is randomly assigned to a treatment, and can see its constitution as instructions . The number of subjects per treatment is a measure of treatment delivery . To participate in experiment s , s ubjects answer ed a few questions about their constitution . Those who answered correctly were considered participants and could propose suggestions and /or vote for versions in their experiment . This is a measure of treatment receipt . The program kept record of all previous participants and did not let anyone participate twice . T he HIT description inform ed workers that they cannot participate twice. The number of participants in each treatment can affect the outcome of treatment . To min imize variation of the number of participants across treatment groups, the program (randomly) assigned workers 85 to the least populated groups, thereby f illing them as evenly as possible . In addition , I statistically controlled for the number of participants per group in the model . The number of participants who stayed in the experiment until the end and participated in the final survey measures the treatment adherence for each treatment group . The me asurement attrition of each treatment group is t he percenta ge of participants who abandoned the experiment at some stage and did not complete the final survey. Appendix D shows t he f inal survey page , which includes three long answer questions and four rating questions about the participant and the experiment as posttest observations ( O B ) . The scores are between 1 and 1 0 l ike other experimental studies on cooperative design (McComb, et al., 2015; Little, et al., 2010) . In every treatment, workers who completed the survey received a base compensation of $10 plus a bonus depending on their contributions . 14.2 . Dependent Variables As dependent variables , I measure d the quality of the final solutions (plans) in terms of feasibility , (actual) return , truthfulness and total c ost . Feasibility is a binary variable that indicates if a solution is well formatted and valid and results in an unambiguous return. T he return is the primary indicator of the performance of a constitution and is the main response variable that is defined only for feasible solutions. If the final plan is feasible and claims a specific return, truthfulness measures the proximity of the claimed return to the actual return calculated by the experi menter. Truthfulness is not defined for a plan that is not feasible (no actual return) or does not include claim. T he total c ost is the sum of payments for each treatment group. Other dependent variables include t otal bonus , average bonus per person , number of versions per period (m) , number of votes per period , selection accuracy ( p ) and accuracy ratio as defined in chapter 12. The following sections present the treatments experimented based on the research method proposed in chapter 12, startin g from the initial constitution or treatment (1) . These experiments serve as proof of concept for the model and the methodology. To minimize the time effect, I conducted e very set of exp eriment s on a weekday (Monday - Thursday) starting at 7:00pm EST . 
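A note on the truthfulness measure defined above: the text does not state an explicit formula, but the values reported in the tables that follow are consistent with taking the ratio of the smaller to the larger of the actual and claimed returns. This operationalization is my reading, not a quotation from the constitution or the analysis scripts:

Truthfulness = min(actual return, claimed return) / max(actual return, claimed return)

Under this reading, a plan that claims exactly its actual return has a truthfulness of 100%, and a plan that claims twice its actual return has a truthfulness of 50%.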
Each experiment lasted about one hour (TZ = 60 minutes), followed by a final survey with a 20-minute time limit (TF = 20 minutes). I published four HITs per treatment with 9 assignments per HIT about 40 minutes before the start of the experiment (6:20 pm).

14.3. Treatment (1)

In the initial constitution, the suggestion and selection periods are 4 minutes (TP = TV = 4) and there is no limit on the number of suggestions (M = ∞). The selection process is one vote per person with the plurality rule as the criterion for winning. The program lets participants edit their suggestions during suggestion periods, but votes are not revisable in this treatment. Appendix I presents the initial constitution in plain English text.

36 workers signed up after the HITs were published, but five of them finished the registration process too late, after the experiment had started. Hence, 31 subjects were assigned to this treatment. All of them passed the constitution test and qualified as participants. Table 14-1 presents a summary of the results of this treatment.

Metrics                                        Treatment (1): The initial constitution
Subjects (Treatment delivery)                  31
Participants (Treatment receipt)               31 (100%)
Abandoned (Measurement attrition)              9 (29%)
Completed (Treatment adherence)                22 (71%)
% Female                                       48%
% English speaker                              95%
Average age                                    37.15
Actual return (performance)                    $0.78M
Claimed return                                 $7056.21M
Truthfulness                                   0.01%
Total cost                                     $273.54
Total bonus                                    $7.95
Average bonus per participant                  $0.36
Number of rounds                               8
Average number of versions per period (m)      8.0
Average number of votes per period             21.5
Accurate selections (p)                        4 out of 6 rounds (0.67)
Average accuracy ratio                         2.31

Table 14-1: Summary of the Outcomes for Treatment (1)

This treatment resulted in a feasible final plan that would turn $1,000 into about $780k. This is the actual return that I calculated based on the historical prices for the transactions in the final plan. The plan came with a claim of about $7 billion in return, which indicates a low truthfulness of 0.01%. This treatment had eight rounds, but the last round was not a full round due to the time limit and had only one suggestion. Moreover, I consider the first round a warm-up phase because the participants are still learning how the program works. Therefore, the selection accuracy (p) and the metrics averaging across periods (average m, average number of votes per period and average accuracy ratio) are based on the six rounds in the middle, disregarding the first and last rounds. They are the italicized metrics in table 14-1. As for the selection accuracy, in four out of six rounds the (actual) best choice won, so a rough estimate of p is 4/6 = 66.7%, which has room for improvement. Since the selections are far from accurate and the number of suggestions (m − 1 = 7) is reasonable, increasing the number of suggestions is not a priority. Hence, I analyzed possible improvements in the selection process. The selection criterion was plurality, and most rounds had multiple good choices competing against each other. Hence, I chose approval voting as factor A and hypothesized (HA) that it improves performance. This factor resulted in the following sentence being added to the voting clause of the constitution: each period, you can vote for multiple choices, but not all choices. No bias toward or against any choice order was detected, but during each voting period, the later votes were more accurate.
Therefore, increasing TV may help, but it would decrease the number of rounds, and fewer rounds can decrease quality because the winning version kept improving until the last period (it did not settle). Moreover, some comments mentioned that the voting periods were too long. Hence, instead of increasing TV, I chose revisable voting as factor B and hypothesized (HB) that it improves performance. Including this factor added a sentence to the voting clause stating that voters may change their votes at any time during the voting period. Therefore, according to step 2 in the suggested procedure, the next treatments are a, b and ab, thereby forming a 2×2 full factorial scheme along with the results of treatment (1).

14.4. Treatments a, b and ab

Treatment a uses approval voting and lets participants select multiple choices in each voting period. Treatment b allows voters to revise their votes in each voting period. Treatment ab allows both multiple choices and revising in voting. Everything else is identical to the initial constitution in all treatments. After publishing the HITs for these treatments, a total of 102 workers signed up, but 10 of them finished registration too late, after the experiments had started. Hence, 31, 31 and 30 subjects were assigned to treatments a, b and ab respectively, and 27, 28 and 29 of them passed the constitution test and qualified as participants. Table 14-2 summarizes the results of these three treatments.

Metrics                                 Treatment a        Treatment b        Treatment ab
Subjects (Treatment delivery)           31                 31                 30
Participants (Treatment receipt)        27 (87%)           28 (90%)           29 (97%)
Abandoned (Measurement attrition)       10 (37%)           8 (29%)            10 (34%)
Completed (Treatment adherence)         17 (63%)           20 (71%)           19 (66%)
Average age                             30.29              39.8               32.22
% Female                                41%                35%                32%
% English speaker                       100%               95%                100%
Actual return (performance)             $36.8M             $0.49M             $11.45M
Claimed return                          $79.95M            $0.99M             $12.69M
Truthfulness                            46.06%             49.19%             90.25%
Total cost                              $214.49            $262.42            $238.42
Total bonus                             $8.74              $8.68              $8.68
Average bonus per participant           $0.51              $0.43              $0.46
Number of rounds                        8                  8                  8
Average m (versions per period)         10.83              8.67               10.83
Average number of votes per period      20.33              17.17              27.83
Average number of selected choices      1.4                1.0                1.88
Accurate selections (p)                 6 out of 6 (1.0)   3 out of 6 (0.5)   3 out of 6 (0.5)
Average accuracy ratio                  1.71               1.2                0.9

Table 14-2: Summary of the Outcomes for Treatments a, b and ab

Treatments a, b and ab resulted in feasible final plans with actual returns of 36.8, 0.49 and 11.45 million dollars and claimed returns of 80, 1 and 12.7 million dollars respectively. Therefore, treatment a yielded the highest performance and treatment ab yielded the most truthful claim. Each treatment had eight rounds and, for the same reasons mentioned under treatment (1), only the six middle rounds were considered when estimating the last five metrics (italicized) in table 14-2. In addition to the previous metrics, there is a new metric: the average number of selected choices (per voter per period). This is because in approval voting voters can select a different number of choices each period, and this metric is the average of the number of selected choices across voters and periods. Obviously, it is 1.0 for treatments (1) and b because they only allow voting for one choice at a time. The average number of selected choices was 1.4 and 1.88 for treatments a and ab respectively.
These levels are surprisingly small, because the average number of choices in each treatment was 10.83 and selecting more choices would increase a voter's chance of voting for the winning choice and winning the selection reward. That means on average voters in treatments a and ab selected 13% and 17% of the choices each period respectively.

To perform a preliminary analysis of the effects of the two factors A and B on the performance of the constitution, I defined dummy variables XA and XB as predictors. XA = 1 when the selection criterion is approval voting and XA = 0 otherwise. Also, XB = 1 when votes are revisable and XB = 0 otherwise. Moreover, I included the variable n, the number of participants (treatment receipt), as a control variable. As mentioned before, the performance of each constitution (actual return) is the response variable y. Due to the multiplicative and exponential nature of returns, I used the logarithm of the response variable as the dependent variable and fitted the following linear model to the data:

Ln(y) = β0 + βn·n + βA·XA + βB·XB + ε

Table 14-3 shows the data and the numerical values of the variables, and table 14-4 shows the estimates of the model coefficients as well as the relevant statistics.

Treatment   Performance y ($M)   Dependent variable: Ln(y)   Participants n   Approval voting XA   Revisable voting XB
(1)         0.78                 -0.245                      31               0                    0
a           36.83                3.606                       27               1                    0
b           0.49                 -0.722                      28               0                    1
ab          11.45                2.438                       29               1                    1

Table 14-3: Levels of the Dependent and Independent Variables in the First Four Treatments

Term        Coefficient   Standard Error   Standardized Beta Coefficient   p
Intercept   4.045         0                —                               —
n           -0.138        0                -0.113                          —
XA          3.298         0                0.912                           —
XB          -0.892        0                -0.247                          —
R² = 1, df = 0, N = 4

Table 14-4: Regression Results for the First Four Treatments

Three predictors and four data points leave no degrees of freedom for the residuals and do not allow for any test of significance. However, the number of subjects (n) had the least effect on the response variable and only varied between 27 and 31, with a small standard deviation of 1.7, which was less than 6% of its mean. Moreover, n is the least significant variable (p = .872) using forward stepwise regression. Therefore, for a preliminary test of the significance of the factors, I excluded n from the model and estimated the coefficients with one degree of freedom for the residuals. Table 14-5 presents the results.

Term        Coefficient   Standard Error   Standardized Beta Coefficient   p
Intercept   -0.072        0.30             —                               0.85
XA          3.506*        0.35             0.969                           0.06
XB          -0.822        0.35             -0.227                          0.25
R² = 0.991, R²adj = 0.973, F = 54.2*, df = 1, N = 4

Table 14-5: Regression Results for the First Four Treatments Excluding Variable n

As table 14-5 shows, approval voting significantly improved the performance of the constitutions, but revisable voting had an insignificant negative effect. These results are preliminary, but they can direct the design of the next experiments and the collection of additional data. Therefore, the next set of treatments will use approval voting but not revisable voting. In fact, since treatment a was the best performing constitution, it became the baseline (initial) constitution for the next set of treatments, according to step 5 of the proposed procedure. Moreover, I applied the factor selection guidelines to the results of treatment a to include new factors for the next treatments.
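As an aside, the preliminary fit in table 14-5 can be reproduced from the four rows of table 14-3. The following minimal sketch uses Python and NumPy rather than the SPSS workflow used in the study, and the variable names are mine, purely for illustration:

import numpy as np

# Log-performance and factor levels for the four factorial treatments (Table 14-3).
ln_y = np.array([-0.245, 3.606, -0.722, 2.438])   # treatments (1), a, b, ab
x_a  = np.array([0, 1, 0, 1])                     # approval voting (factor A)
x_b  = np.array([0, 0, 1, 1])                     # revisable voting (factor B)

# Design matrix with an intercept, excluding n as in Table 14-5.
X = np.column_stack([np.ones(4), x_a, x_b])
beta, *_ = np.linalg.lstsq(X, ln_y, rcond=None)
print(beta)   # approximately [-0.072, 3.506, -0.822]

Running this returns the unstandardized coefficients reported in table 14-5 (about -0.072 for the intercept, +3.506 for approval voting and -0.822 for revisable voting); the standard errors and p values follow from the usual OLS formulas with the single residual degree of freedom.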
As for the selection accuracy, throughout all rounds in treatment a , the best choice won, giving an estimate of 100% for p , which does not have room for improvement. This leads to other improvement opportunities specially regarding the quantity and quality of suggestions. Three comments mentioned that the suggestion periods were too short and they could not make a good plan in that short period. Moreover, o ut of seven winning suggestions, four were submitted in the last minute and three of them were submitted in the last 10 seconds. Hence, I chose T P as factor C and increased it from four to six minutes , hypothesizing ( H C ) that T P has a positive effect on performance . I changed the suggestion clause accordingly. T he winning version improved until the last period ( did not settle) and thus more rounds may improve the outcome. Since the selection was accurate ( p 1 ) for the best performing constitution (treatment a ) , we may increase the number of rounds (z) by shortening the voting periods. Moreover, a couple of participants commented that the voting periods were too long . Hence, I chose T V as factor D and de creased it from four minutes to three minutes , hypothesizing ( H D ) that T V has a negative effect on performance . I changed the voting clause in the constitution accordingly. Therefore, based on the suggested procedure, the next treatments are ac , ad and acd , thereby yielding a total of seven points including the previous results . 92 14.5. Treatments ac , ad and acd Treatment ac had longer suggestion period s ( T P = 6 min ) and t reatment ad had shorter voting period s ( T V = 3 min ). Treatment acd had longer suggestion period s and shorter voting period s . Everything else was identical to treatment a including approval voting . After publishing the HITs for these treatments, total of 82 workers signed up, but 3 of them did it too late after the experiments had started. Hence, 26 , 27 and 26 subjects were assigned to treatments ac , ad and acd respectively and 2 6 , 2 6 and 2 5 of them passed the constitution test and qualified as participants. Table 14 - 6 summarizes the results of these three treatments. Metrics Treatment ac Treatment ad Treatment acd Subjects (Treatment delivery) 26 27 26 Participants (Treatment receipt) 26 ( 100 %) 2 6 (9 6 %) 2 5 (9 6 %) Abandoned (Measurement attrition) 6 ( 23 %) 7 ( 27 %) 7 ( 28 %) Completed (Treatment adherence) 20 ( 77 %) 19 (7 3 %) 18 ( 72 %) Average age 37.05 37.3 37.89 % Female 20 % 40 % 50 % % English speaker 9 0% 100 % 100% Actual return (performance) $ 12 . 14 M $ 24 M $ 427 M Claimed return $ 2,186 M No Claim No Claim Truthfulness 0.56% N/A N/A Total Cost $ 2 48 . 78 $ 2 5 2. 96 $ 2 24 . 82 Total bonus $ 7 . 32 $ 10 .8 $ 7 . 35 Average bonus per participant $ . 37 $ . 57 $ . 41 Number of rounds 6 9 7 Average m (versions per period) 12.4 16 . 25 1 1 . 5 Average number of votes per period 2 4 . 6 3 7. 5 27. 17 Average number of selected choices 1.4 8 1. 99 1. 94 Accurate Selections (p) 3 out of 5 ( 0.6 ) 5 out of 8 (0. 63 ) 3 out of 6 (0.5 ) Average accuracy ratio 1.77 1. 07 1.57 Table 14 - 6 : S ummary of the O utcomes for T reatments ac , ad and acd 93 Treatments ac , ad and acd resulted in feasible final plans with actual returns of 12.14 , 24 and 427 million dollars . Treatment ac resulted in a claimed return of about 2.2 b illion dollars with a low truthfulness of 0.56% , but the other two plans did not include a claim. Therefore, treatment acd resulted in the highest actual return but without a claim . 
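Before turning to the pooled analysis, a brief hedged aside on the number of rounds: the text does not give a formula, but with a fixed treatment length the round counts reported in tables 14-1, 14-2 and 14-6 are consistent with the approximation

z ≈ TZ / (TP + TV), rounded up,

that is, roughly 60/(4+4) ≈ 8 rounds for the first four treatments, 60/(6+4) = 6 for ac, 60/(4+3) ≈ 9 for ad and 60/(6+3) ≈ 7 for acd, with the last round possibly truncated by the Closing time.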
Due to the different period lengths, the treatments had different numbers of rounds, as table 14-6 shows. The last five metrics (italicized) in this table did not include the first round, for the same reason mentioned under treatment (1). However, these treatments had complete last rounds, which were included in the estimations. The seven data points can give an estimate of the effects of the four factors. To include the additional factors, I defined the variables XC = TP and XD = TV as predictors in the new model:

Ln(y) = β0 + βn·n + βA·XA + βB·XB + βC·XC + βD·XD + ε

Table 14-7 shows the data and table 14-8 shows the estimates of the model coefficients as well as the relevant statistics.

Treatment   y ($M)   Ln(y)    n    XA   XB   TP = XC   TV = XD
(1)         0.78     -0.245   31   0    0    4         4
a           36.83    3.606    27   1    0    4         4
b           0.49     -0.722   28   0    1    4         4
ab          11.45    2.438    29   1    1    4         4
ac          12.14    2.497    26   1    0    6         4
ad          24       3.178    26   1    0    4         3
acd         427      6.057    25   1    0    6         3

Table 14-7: Levels of the Dependent and Independent Variables in the First Seven Treatments

Term        Coefficient   Standard Error   Standardized Beta Coefficient   p
Intercept   2.48          22.2             —                               0.93
n           0.061         0.77             0.055                           0.95
XA          3.1           2.32             0.653                           0.41
XB          -0.293        1.94             -0.062                          0.90
XC          0.473         1.16             0.199                           0.75
XD          -1.627        2.32             -0.343                          0.61
R² = 0.876, R²adj = 0.259, F = 1.42, df = 1, N = 7

Table 14-8: Regression Results for the First Seven Treatments

This model has one degree of freedom for the residuals, and none of the predictors has a significant effect. After applying stepwise regression (forward or backward), factor A is significant (positive effect) with five degrees of freedom for the residuals. Table 14-9 shows the results of the stepwise regression.

Term        Coefficient   Standard Error   Standardized Beta Coefficient   p
Intercept   -0.483        0.94             —                               0.63
XA          4.038**       1.12             0.851                           0.015
R² = 0.724, R²adj = 0.669, F = 13.1**, df = 5, N = 7

Table 14-9: Stepwise Regression Results for the First Seven Treatments

This supports hypothesis HA that approval voting significantly improved the performance of the constitutions, but we need more power to detect smaller effects. Since treatment acd resulted in the highest performance, it became the baseline and initial constitution for the next set of treatments. I applied the factor selection guidelines to the results of treatment acd to choose new factors for the next treatments. This treatment resulted in a low selection accuracy (0.5) and average accuracy ratio (1.57): only in 3 out of 6 rounds did the best version win. The average number of selected choices per period was 1.94, with 11.5 choices per period on average. That means each voter selected about 17% of the choices each period on average. Looking into the data reveals that some participants took advantage of the approval voting system and selected almost all choices to get selection rewards without evaluating the versions. One way to cope with this problem is to make the selection reward dependent on the number of selected choices, to deter selecting bad choices. Hence, as factor E, I changed the selection reward from a fixed amount of $0.03 to a variable amount of $0.01 × (number of choices not selected) and hypothesized (HE) that it improves performance. Accordingly, I changed the reward clause in the constitution to the following:

Rewards: After each voting period, if your suggestion wins, you will receive a $1.00 bonus, and if the choice you voted for wins, you will receive a $0.01 bonus for every choice you did not select in that period.
That is $0.01 × (Number of Choices − Number of Selected Choices). For example, if there were 10 choices and you voted for 3 of them and one of them wins, then you receive 7 cents for voting in that period.

I defined the dummy variable XE to represent the new factor in the model. It equals one when the selection reward depends on the number of selected choices and zero otherwise. Generally, the reward function is a categorical variable when there are multiple options, but here we compare only two options. Another improvement opportunity is the quantity and quality of suggestions. Again, two participants commented that they could make better plans if the suggestion periods were longer. Moreover, out of seven winning suggestions, three were submitted in the last minute. Hence, I increased factor C further to a new level, XC = TP = 8 minutes, denoted by C' hereafter. I changed the suggestion clause accordingly. This provides more data to test hypothesis HC. As a result, based on the suggested procedure, the next treatments are ac'd, acde and ac'de, thereby yielding a total of ten points including the previous results.

14.6. Treatments ac'd, acde and ac'de

Treatment ac'd has longer suggestion periods (TP = 8 min) and treatment acde uses the variable reward function for voting. Treatment ac'de applies both changes. Everything else is identical to treatment acd from the previous section. After publishing the HITs for these treatments, a total of 75 workers signed up, but 9 of them did so too late, after the experiments had started. Hence, 23, 22 and 21 subjects were assigned to treatments ac'd, acde and ac'de respectively, and 22, 19 and 20 of them passed the constitution test and qualified as participants. Table 14-10 summarizes the results of these three treatments.

Metrics                                 Treatment ac'd     Treatment acde     Treatment ac'de
Subjects (Treatment delivery)           23                 22                 21
Participants (Treatment receipt)        22 (96%)           19 (86%)           20 (95%)
Abandoned (Measurement attrition)       6 (18%)            2 (11%)            5 (25%)
Completed (Treatment adherence)         18 (82%)           17 (89%)           15 (75%)
Average age                             37.06              33.53              31.36
% Female                                39%                41%                53%
% English speaker                       94.4%              100%               100%
Actual return (performance)             $101.8M            $1.203M            $15.93M
Claimed return                          $121.97M           $1.5M              $22.69M
Truthfulness                            83.46%             80%                70.2%
Total cost                              $224.12            $229.03            $199.30
Total bonus                             $6.77              $20.86             $16.08
Average bonus per participant           $0.376             $1.227             $1.072
Number of rounds                        6                  7                  6
Average m (versions per period)         9.8                13.33              9.2
Average number of votes per period      31.0               30.33              19.20
Average number of selected choices      2.14               2.0                1.63
Accurate selections (p)                 5 out of 5 (1.0)   4 out of 6 (0.67)  2 out of 5 (0.4)
Average accuracy ratio                  1.42               1.03               0.8

Table 14-10: Summary of the Outcomes for Treatments ac'd, acde and ac'de

These treatments resulted in feasible final plans with actual returns of about 102, 1.2 and 16 million dollars and truthfulness values ranging from about 70% to 83%. Treatment ac'd resulted in the highest actual return, with the highest truthfulness of 83%. It also had the best selection accuracy (p = 1.0), with the best choice winning every period. Its average accuracy ratio (1.42) was large compared to the other two (1.03 and 0.8). Apparently, the new reward function was not very effective. It could not even lower the average share of choices selected by the voters (18% and 15%), despite these treatments having the largest total and average bonuses.
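To make the factor-E rule concrete, the per-period voting bonus under the two reward schemes can be sketched as a small function. This is an illustrative Python sketch; the function and argument names are mine and are not part of the experimental software:

def voting_bonus(num_choices, num_selected, voted_for_winner, variable_reward=True):
    """Per-period voting bonus.

    Fixed rule: $0.03 if one of the selected versions wins.
    Factor-E rule: $0.01 per non-selected choice if one of the selected versions wins.
    """
    if not voted_for_winner:
        return 0.0
    if variable_reward:                      # factor E (treatments acde and ac'de)
        return round(0.01 * (num_choices - num_selected), 2)
    return 0.03                              # fixed reward used in the other treatments

print(voting_bonus(10, 3, True))   # 0.07, matching the seven-cent example above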
Perhaps this new rule was too complex for subjects. The next section tests that.

14.7. Including Control Variables

Due to the different period lengths, the treatments had different numbers of rounds. The last five metrics in the table did not include the first round, but did include the last round, because the last rounds in these treatments were complete. Now we have ten data points to estimate the effects of the five factors and some control variables. Table 14-11 shows the data for the response variable y, the five predictors (factors) and six control variables. The control variables, from left to right, are n, Day, average expertise, average comprehension, female percentage and total bonus. As before, n is the number of participants per group. Day is the day of the experiment among the four weekdays, starting from Monday as day one. The experiments were run in blocks of one or three treatments and each block was on a different day of the week. This might have brought about a block effect on the performance.

Treatment   y ($M)    Ln(y)    XA   XB   XC   XD   XE   n    Day   Exp     Comp    F%      Bonus
(1)         0.783     -0.245   0    0    4    4    0    31   1     3.227   6.818   0.476   7.95
a           36.83     3.606    1    0    4    4    0    27   2     3.882   7.294   0.412   8.74
b           0.486     -0.722   0    1    4    4    0    28   2     4.100   7.450   0.316   8.68
ab          11.45     2.438    1    1    4    4    0    29   2     4.000   7.000   0.316   8.68
ac          12.14     2.497    1    0    6    4    0    26   3     4.700   6.900   0.200   7.32
ad          24        3.178    1    0    4    3    0    26   3     3.368   7.368   0.421   10.80
acd         427       6.057    1    0    6    3    0    25   3     3.667   8.056   0.500   7.35
ac'd        101.8     4.623    1    0    8    3    0    22   4     3.167   5.389   0.389   6.77
acde        1.203     0.185    1    0    6    3    1    19   4     4.235   7.412   0.412   20.86
ac'de       15.93     2.768    1    0    8    3    1    20   4     3.200   7.000   0.533   16.08

Table 14-11: Levels of the Dependent and Independent Variables in All Ten Treatments

Exp is the average expertise score of participants in each group, and Comp is the average comprehension score of participants in each group. I defined the comprehension of participants as their self-reported score (between one and ten) in response to the following rating question in the final survey: Was the task description clear and understandable? [1 = … , 10 = Completely clear and easy to understand]. I also defined expertise as the score (1 to 10) a participant gave to the following question in the final survey: What is your level of expertise in financial investment and the stock market? [1 = Never heard of it, 10 = A professional trader in financial markets]. Appendix D provides screenshots of the final survey including the above questions. F% is the percentage of female participants in each group, and Bonus is the total bonus earned by participants in each group. Table 14-12 presents the averages and standard deviations of the factors and control variables, as well as the Pearson correlations among them.
Variable   Mean    Std. Dev.   Ln(y)    XA      XB      XC      XD       XE       n        Day      Exp      Comp     F%
Ln(y)      2.44    2.16
XA         .80     .42         .71**
XB         .20     .42         -.38     -.38
XC         5.40    1.65        .41      .45*    -.45
XD         3.50    .53         -.45     -.5*    .5      -.64**
XE         .20     .42         -.23     .25     -.25    .51     -.5
n          25.3    3.89        -.22     -.57*   .43     -.8***  .79***   -.79***
Day        2.80    1.03        .38      .66**   -.41    .84***  -.82***  .61*     -.95***
Exp        3.75    .52         -.26     .09     .3      -.25    .46      -.04     .07      -.07
Comp       7.07    .69         -.07     -.05    .12     -.46    .04      .10      .14      -.22     .34
F%         .40     .10         .22      .01     -.43    .24     -.57*    .4       -.28     .14      -.77***  .18
Bonus      10.32   4.58        -.34     .23     -.19    .26     -.47     .94***   -.7**    .53      .06      .23      .32

Table 14-12: Descriptive Statistics and Correlations among Group-Level Variables

First, I ran the regression on the five factors and the control variable n, but not the other control variables, so the results would be comparable to the previous results. Table 14-13 shows the results of this regression with ten treatments. This model had three degrees of freedom for the residuals, and none of the predictors had a significant effect except XA, as before. The insignificance of the F statistic indicates a lack of fit of the model. However, after applying stepwise regression (backward), factors A and E are significant with seven degrees of freedom for the residuals. Table 14-14 shows the results of the stepwise regression. This demonstrates that approval voting significantly improved the performance of the constitutions, and the new reward function significantly reduced it. The control variable n was not significant.

Term        Coefficient   Standard Error   Standardized Beta Coefficient   p
Intercept   -5.185        11.424           —                               0.681
n           0.333         0.404            0.598                           0.471
XA          3.597*        1.428            0.701                           0.086
XB          -0.25         1.258            -0.049                          0.855
XC          0.617         0.494            0.47                            0.3
XD          -1.864        1.495            -0.454                          0.301
XE          -2.153        2.076            -0.419                          0.376
R² = 0.874, R²adj = 0.621, F = 3.46, df = 3, N = 10

Table 14-13: Regression Results for the Ten Treatments

Term        Coefficient   Standard Error   Standardized Beta Coefficient   p
Intercept   -0.483        0.97             —                               0.634
XA          4.216***      1.12             0.821                           0.007
XE          -2.257*       1.12             -0.44                           0.084
R² = 0.687, R²adj = 0.598, F = 7.7**, df = 7, N = 10

Table 14-14: Stepwise Regression Results for the Ten Treatments

Now, with more data points, I consider the other control variables in the model as well:

Ln(y) = β0 + βA·XA + βB·XB + βC·XC + βD·XD + βE·XE + βn·n + βDay·Day + βExp·Exp + βComp·Comp + βF%·F% + βBonus·Bonus + ε

Since the number of parameters is more than the number of data points, I used forward and backward stepwise regression to achieve the best fitness statistics (R²adj, F and p values). In this process, the total bonus (Bonus) showed a significantly negative effect on performance, but it has large collinearity with factor E, so when factor E is included, Bonus no longer has a significant effect and is eliminated from the model. Table 14-15 shows the coefficient estimates for this model after stepwise regression.

Term        Coefficient   Standard Error   Standardized Beta Coefficient   p
Intercept   -12.16***     1.65             —                               0.002
XA          3.538***      0.284            0.689                           0
XC          0.732***      0.103            0.557                           0.002
XE          -4.486***     0.325            -0.874                          0
F%          8.605***      1.18             0.394                           0.002
Comp        0.748**       0.197            0.239                           0.019
R² = 0.995, R²adj = 0.98, F = 87.67***, df = 4, N = 10

Table 14-15: Stepwise Regression Results for the Ten Treatments with All Control Variables
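The stepwise selection used here can be thought of as a simple backward-elimination loop: fit the full model, drop the least significant predictor, and refit until every remaining predictor is significant. The sketch below illustrates that logic in Python with statsmodels; the dissertation used SPSS, so the exact entry/removal thresholds and tie-breaking may differ, and the data-frame and column names are mine, purely for illustration:

import pandas as pd
import statsmodels.api as sm

def backward_eliminate(y, X, alpha=0.10):
    """Drop the predictor with the largest p-value until all are below alpha."""
    X = sm.add_constant(X)
    while True:
        fit = sm.OLS(y, X).fit()
        pvals = fit.pvalues.drop("const")          # never drop the intercept
        worst = pvals.idxmax()
        if pvals[worst] <= alpha or len(pvals) == 1:
            return fit
        X = X.drop(columns=[worst])

# Usage sketch, assuming a DataFrame `groups` with one row per treatment and
# columns ln_y, X_A, X_B, X_C, X_D, X_E, n, Day, Exp, Comp, F_pct:
# fit = backward_eliminate(groups["ln_y"], groups.drop(columns=["ln_y"]))
# print(fit.summary())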
The model in table 14-15 has four degrees of freedom for the residuals. Factors A and C have significantly positive effects, but factor E has a significantly negative effect. In addition, the percentage of female participants had a significantly positive effect on performance. Average comprehension also had a significantly positive effect on performance, but average expertise did not have a significant effect after including average comprehension. However, average expertise was significantly positively associated with comprehension (standardized beta coefficient = 1.18, p = .016) when controlling for the percentage of female participants. Thus, average comprehension fully mediates the effect of average expertise on performance. Interestingly, factor C did not have a significant effect before the control variables were included, as tables 14-13 and 14-14 show. In fact, the control variables F% and Comp were suppressors, such that excluding them from the model suppressed the effect of factor C. After controlling for suppression, the results demonstrated that longer suggestion periods (factor C) improved performance as expected, supporting hypothesis HC. Nonetheless, the results rejected hypothesis HE and did not support hypotheses HB or HD.

At this point, we can move in the direction of the steepest ascent, using the estimated coefficient of factor C for direction (βC = +0.732). The other factors are set at their best performing levels, so XA = 1 and XE = 0. Factor B was not significant, so we can set XB = 0, which was the level used in seven treatments, including the best performing ones. Factor D was not significant either, but because its coefficient was negative, we use XD = 3 minutes for the steepest ascent. The rest of the procedure follows standard RSM.

14.8. Subject Level Analysis

Throughout the ten treatments, a total of 295 subjects signed up on the website (Appendix B) and 253 of them signed up in time and passed the constitution test (Appendix C). Even though the program collected demographic information on all the subjects who signed up, the survey scores are only available for the ones who stayed in the experiment and completed the final survey. Moreover, the subjects who did not participate in the experiment might have provided false demographic information. Therefore, for the subject level analysis, I only used the data from the 185 final participants who completed the final survey. Table 14-16 presents some basic descriptive statistics for the subjects. In the analyses, I used the average age (35.68) for the five participants who did not specify their ages, and the expected value (.3934) of the gender dummy variable for the two participants without a gender specification. There was no other missing information.

Variable        Average    Standard Deviation (Sample)    Minimum    Maximum
Bonus earned    $0.562     0.756                          0          $3.45
Age             35.68      9.838                          20         70
Comprehension   7.065      2.43                           1          10
Expertise       3.762      2.154                          1          10
Language        97.3% English speaker; 2.7% not English speaker
Gender          39.34% female; 60.66% male

Table 14-16: Subject-Level Descriptive Statistics on the 185 Final Participants from All Treatments

The most relevant performance measure for participants is the bonus each participant earned, because bonuses were based on contributions and reflect their performance. This section proposes a structural model to explain the antecedents of bonus as the response variable. Figure 14-1 illustrates the path diagram of the proposed structure, with the relevant hypotheses labeling the associations.
Figure 14 - 1: Path D iagram for the F ull M odel with Labeled Associations T his model includes treatment factors and participant characteristics to predict/ explain bonus in a treatment . This model has four control variables among which, g ender and a ge can affect all three endogenous variables as figure 14 - 1 shows. Block ( day of exp e riment ) and voting reversibility are control variables for the response variable bonus . There is no latent variable and all variables are directly measured. While the main purpose of this section is exploration rather than confirmation, the following ten hypotheses justif ied the proposed structure : H 1 : The number of participants in a treatment group is negative ly associated with the bonus for each participant. That is simply because more competition reduces the chance of each participan t to win. H 2 : Approval voting is positively associated with the bonus. That is because each participant can select multiple choices and have a higher chance of winning the prediction voting reward. 103 H 3 & H 4 : Longer suggestion periods ( T P ) and longer selection periods ( T V ) have negative effects on bonus. That is because longer periods lead to fewer periods thereby fewer winner s . H 5 : Using reward function instead of fixed reward (Factor E) is positively associated with bonus. That is because this reward function pays more to incentivize more selective votes. H 6 : Using reward function is negatively associated with how well participants understood the task and instructions (comprehension) . That is because it is relatively more complex than a fixed reward. H 7 : English speakers better understand (comprehension) the task and process. That is because the task and instructions are in English. H 8 : Higher comprehension is positively associated with higher bonus. That is because those who understood the task better, could participate better and win more rewards. H 9 : The participants with higher exp ertise in the subject have better task understand ing and comprehension. That is because of their familiarity with the concepts. H 10 : The participan ts with higher expertise in financial markets earn more bonus. That is because of their experience and knowledge in using the tools . I used SPSS and AMOS software for path analysis on the structural model . T ables 14 - 17 , 14 - 18 and 14 - 19 present the path coefficients for the endogenous variables bonus, comprehension and expertise respectively . Figure 14 - 2 provides a screenshot of the output (standardized) of the AMOS software . 104 Term Coefficient Standard Error Standardized Beta Coefficient P Intercept 1.97 3.001 0.512 N - 0.025 0.085 - 0.121 0.771 Approval Voting (X A ) - 0.023 0.192 - 0.013 0.904 Revisable Voting (X B ) - 0.006 0.157 - 0.003 0.972 T P = X C - 0.046 0.066 - 0.094 0.483 T V = X D - 0.178 0.191 - 0.118 0.354 Reward Function (X E ) 0.62 ** 0.302 0.311 0.041 Comprehension 0.037 * 0.022 0.12 0.091 Expertise 0.073 *** 0.025 0.209 0.004 Gender - 0.222 ** 0.109 - 0.143 0.044 Age - 0.008 0.005 - 0.099 0.152 Block (Day) - 0.061 0.303 - 0.08 0.84 R 2 = 0. 272 , R 2 adj = 0. 226 , F= 5.882 ** * , df = 1 73 , N=185 Table 14 - 17 : Path C oefficients for B onus as the D ependent V ariable Note: Term Coefficient Standard Error Standardized Beta Coefficient P Intercept 7.198 *** 1.33 0 Reward Function (X E ) 0.254 0.457 0.04 0.579 Expertise 0.304 *** 0.084 0.27 0 English Speaker - 0.834 1.06 - 0.056 0.432 Gender - 0.671 * 0.372 - 0.135 0.073 Age - 0.007 0.018 - 0.028 0.699 R 2 = 0. 
119 , R 2 adj = 0. 0 9 4 , F= 4.833** * , df = 1 79 , N=185 Table 14 - 18 : Path C oefficients for C omprehension as the D ependent V ariable Note: Term Coefficient Standard Error Standardized Beta Coefficient P Intercept 3.663 *** 0.574 0 Gender - 1.398 *** 0.312 - 0.316 0 Age 0.018 0.015 0.083 0.24 R 2 = 0. 102 , R 2 adj = 0. 093 , F= 10.386** * , df = 1 82 , N=185 Table 14 - 19 : Path C oefficients for E xpertise as the D ependent V ariable Note: 105 The results support hypotheses H 5 , H 8 , H 9 , H 10 , but coul d not reject the null hypothesis for the other hypothesized relationships ( H 1 , H 2 , H 3 , H 4 , H 6 and H 7 ) . Moreover, two control variables age and block did not have significant relationships with the relevant endogenous variables, but gender had a significant ly negative association with comprehension, expertise and bonus. Gender was coded as zero for male and one for female. Particularly, on average , men ($.69) earned about twice as much as women ($.37) . This finding is interesting given that in the previous analysis, the percentage of female participants had a significantly positive effect on performance. The goodness of fit statistics for t his model are relati vely poor : CMIN = 1452 ( p< 0 .01 ) , NFI = . 065 , GFI = . 456 , AGFI =. 175 , RMR = . 735 , RMSEA = 0. 355 and as for parsimony PNFI = . 05 . Therefore, t o better illustrate and estimate the effects of significant variables, I ran backward elimination and f ou nd the path coefficients in a reduced model. Tables 14 - 2 0 , 14 - 2 1 and 14 - 2 2 present t he significant path coefficients for bonus, comprehension and expertise respectively . Figure s 14 - 3 and 14 - 4 respectively present screenshot s of the standardized and unstandardized outputs of the AMOS software for the reduced model. Figure 14 - 5 presents another illustration of the path diagram for the reduced model, declaring the s ignificance of the unstandardized coefficients. Figure 14 - 2: Standardized Results of Path Analysis from AMOS for the Full Model 106 Term Coefficient Standard Error Standardized Beta Coefficient P Intercept - 0.03 0.178 0.866 Reward Function (X E ) 0.755 *** 0.129 0.379 0 Comprehension 0.043 ** 0.021 0.137 0.047 Expertise 0.066 *** 0.025 0.188 0.008 Gender - 0.225 ** 0.106 - 0.145 0.035 R 2 = 0. 225 , R 2 adj = 0. 239 , F= 15.413** * , df = 1 80 , N= 185 Table 14 - 2 0 : Path C oefficients for B onus as the DV in the R educed M odel Note: Term Coefficient Standard Error Standardized Beta Coefficient P Intercept 6.179 *** 0.417 0 Expertise 0.306 *** 0.083 0.271 0 Gender - 0.671 ** 0.366 - 0.134 0.068 R 2 = 0.114 , R 2 adj = 0. 104 , F=11.696 *** , df = 1 82 , N=185 Table 14 - 2 1 : Path C oefficients for C omprehension as the DV in the R educed M odel Note: Term Coefficient Standard Error Standardized Beta Coefficient P Intercept 4.3 *** 0.194 0 Gender - 1.367 *** 0.311 - 0.309 0 R 2 = 0. 096 , R 2 adj = 0. 091 , F= 19.339*** , df = 1 83 , N=185 Table 14 - 2 2 : Path C oefficients for E xpertise as the DV in the R educed M odel Note: 107 Figure 14 - 3: Standardized Results of Path Analysis from AMOS for the Reduced Model Figure 14 - 4: Unstandardized Results of Path Analysis from AMOS for the Reduced Model Figure 14 - 5: Path Diagram with the Unstandardized Path Coefficients for the Reduced Model 108 The reduced model has far be tter goodness of fit statistics: CMIN is 1.325 and is not significant anymore ( p=0.723 ). 
For the reduced model, NFI = .986, GFI = .997, AGFI = .986, RMR = .01 and RMSEA is less than .01; regarding parsimony, PNFI = .296. Furthermore, the reduced model better revealed the three mediational relationships.

First, comprehension partially mediated the effect of expertise on bonus. The direct effect of expertise on bonus was .066 (p = .008) and the indirect effect was .013, for which the Sobel test statistic (z = ab/√(b²·sa² + a²·sb²), where a and b are the two coefficients on the mediational path and sa and sb are their standard errors) is 1.79 with a p value of .07, indicating a significant indirect effect. Therefore, the total effect of expertise on bonus was .079, of which 84% was direct and 16% was through increasing comprehension. Participants with more expertise earned more rewards partially because they could better understand the task and process, but mostly (five times more) because of other reasons above and beyond the resulting improvement in comprehension.

Second, expertise partially mediated the association between gender and comprehension. Gender had a significant negative direct effect of -.67 (p = .068) on comprehension. Its indirect effect on comprehension was -.42, for which the Sobel test statistic is -2.82 with a p value of .005, implying a significant mediation. Therefore, the total effect of gender on comprehension was -1.09, of which 62% was direct and 39% was due to a negative association with expertise.

Third, in addition to a significantly negative direct effect of -.225 (p = .035), gender also had indirect negative effects on bonus through comprehension (-.029), expertise (-.09) and the comprehension resulting from expertise (-.02). The numbers in parentheses are the estimated indirect effects of the three mediational paths. The Sobel test statistic for the first mediator (comprehension) is -1.37 with a p value of .17, indicating a non-significant path. For the second mediator (expertise), it is -2.26 with a p value of .024, indicating a significant mediation. Hence, expertise partially mediated the relationship between gender and bonus. The total effect of gender on bonus was -.36, of which 62% was direct and 38% was indirect. While expertise can partially explain the gender wage gap, one should look for other plausible mediators, such as attention span and care for money, to explain the direct effect. Also, the effect of gender deserves further study, as the percentage of female participants had a significantly positive effect on group performance, yet, when examined at the subject level, females received a lower bonus.

15. Discussion and Limitations

The subject level results demonstrated the role of expertise and gender in the success of participants in earning bonuses. As figure 14-3 illustrates, the standardized direct effect of expertise on bonus (.19) was the largest direct effect among gender, comprehension and expertise (not counting factor E). Expertise also had an indirect effect through improving comprehension. The total standardized effect of expertise on bonus (.225) was comparable to the total standardized effect of gender on bonus (-.232) in the subject level analysis. One important finding was the significantly negative effect of gender on bonus. It is consistent with the findings of Niederle and Vesterlund (2011). Since the e-constitution code could not discriminate against or in favour of any group, this result demonstrated the possibility of a significant gender pay gap under zero possibility of discrimination.
On the other hand, in group level, the percentage of female participants had a significantly positive effect on the group performance as table 14 - 15 shows. This might be because female participants cared more about teamwork and group performance rather tha n winning rewards, while perhaps male participants were more selfish and aggressive in competing inside group s to win more rewards. Another possible explanation is that most groups had disproportionately more male participants than female (40% female on av erage) and thus groups with more female participants were more balanced and diversified , which led to generating and selecting more novel and superior ideas. As explained in chapter 11, Hanson (2009) found diverse teams are more effective. All in all , this is an area that requires further research. While gender had large significant effects on all endogenous variables in the subject level analysis, age did not have any significant effect on any variable even on expertise. Th at is surprising because older people often have more experience and knowledge. One possibility is that older people with expertise in financial markets are too busy to work in MTurk. Another surprising finding is that , being English speaker did not have a significant effect on comprehension. Perhaps, that is because the registration form asked if the - native English speakers were fluent in English and could understand the task and instructions well enough to earn a good bonus. Factor E did not have significant 110 effect on comprehension either, rejecting the hypothesis that the new reward function made the task harder to understand. The significant effect of factor E on bonus was predictable , because the new reward function paid more on average. However, one cannot interpret this bonus increase as performance improvement . P articularly , the two treatments with the new reward function did not yield bette r outcomes. The insignificant effect of approval voting on bonus is because most of participants did not select too many choices to take advantage of approval voting scheme. As tables 14 - 2, 14 - 6 and 14 - 10 show, the average number of choices selected was between one and two in most treatments with approval voting as opposed to one in other treatments. Two o ther unexpectedly in significant factors in subject level are C ( T P ) and D ( T V ), which determine d the number of rounds. S horter periods resulted in more rounds and more rou nds should have brought more winning reward s . However, as table 14 - 20 shows, the effects while negative were not large enough to be significant. That seems to be due to the small variation in the number of rounds, which was obscured by larger significant effects of other variables such as gender and expertise. The same argument goes for the in s ignificant effect s of the number of participants in each group ( n ). It is worth noting that the effects of variables on individual bonuses are different from their effects on the performance of group. Expertise had direct and indirect positive effects (through comprehension) on bonus in the subject level analysis, but average expertise had only indirect positive effect (through average comprehension) on group performance. Factor E had opposite effect s on individual bonus and group performance. Approval voting improved group performance, but did not increase individual rewards. Same thing holds for longer suggestion periods ( T P ). 
In fact, the total bonuses paid per treatment reflect the cost of treatment as another response. While a higher group performance implies a more effective constitution, a larger bonus means a more costly and less efficient constitution. This makes approval voting and longer suggestion periods even more desirable, because they improved performa nce while not increasing costs. The group level results demonstrated the effectiveness of approval voting in improving the performance of constitu tion . Th at is probably because approval voting does not fall under the impos sibility theorems (Maniquet & Mangin, 2011) , while plurality voting violates the independence of irrelevant alternatives (Nisan, et al., 2007) . Remarkably, approval voting resulted in superior performance despite rewarding the 111 voter s based on prediction voting. The p redication voting incentive scheme s reward voters for voting for the winner choice and approval voting al lows voters to vote for as many choices as they want. This should incentivize rational voters to vote for every choice to maximize their chance of winning. However, it did not happen as often as utility theory would predict. Voters rarely voted for more than three choices and they were more sele ctive than approval voting allowed . However, a mechanism that deters selection of inferior choices may improve the performance of approval voting even further . A reward function that penalized voting for wrong choices (Factor E) was o ne such attempt , but the results did not support its hypothesized effect . One may try to ascribe this observation to the complexity of the function and hypothesize that participants did not understand it , but the subject level results rejected the hypothesized negative relationship between comprehension and factor E. Ano ther possible explanation is that participants were selecting too few choices anyway , and this incentive could not reduce the ir number of selected choice s any further , but rather distracted them . Perhaps a less distractive mechanism is to limit the number of choices each participant can select . Accordingly, one may define variable X A as a continuous variable between zero and one, indicating the proportio n o f choices a voter can select in each period. Another finding in the group level analysis was that l onger suggestion periods ( T P ) improved the performance of the design process. In fact, several participants commented in the final survey that they needed more time to edit and create a new suggestion. That can also be ascribed to the relative reward amounts for suggestion and voting. The reward for the winning suggestion ($1) was much larger than the reward for right selection ($.03), therefore participants wanted to spend more time on making suggestion s rather than voting. As future research, one can investigate the interaction effects between relative rewards and period lengths. The significance of control variables in the group level analysis has an important theoretical implication. It shows that missing a confounding variable in the model can suppress some significant effects and give rise to misleading results . Hence, it is imperative to consider every plausible cause and test their effects . When the degrees of freedom are very limited (as in this project), stepwise regression can be considerably helpful in detecting significant confounding variables. 
112 Generally, t he results demonstrated that t he characteristics of the design process (factor s A, C and E) as well as the designers (expertise, gender) can ha ve significant effect s on the quality of the outcome. However, others (Brooks, 2010; Deng & Ji, 2018) proclaimed that i t is not the design process, but rather it is the designer that drives the quality of design. This research showed that the process often complements the designer(s) particularly when there are many designers (collective design) involved . For example, meritocr atic schemes give more weight s to the inputs from more expert designers. F urther research can analyze the relative impacts of design process and designer characteristics on performance. The group level results also demonstrated how the suggested procedure could improve the performance of constitution from $.78M to $427M in just ten experiments. The main advantage of the suggested procedure was that the effects of factors C and E were estimated mostly at the more effective levels of fact ors A and B , because that is the region of interest and thus the effects of interest. Standard RSM would require more than 24 runs to detect the same effects and reach the same conclusion . O nly six out of those 24 runs would be i rreversible approval voting (region of interest) . Other trials would estimate effects outside the region of interest that do not help in moving toward the optimal point. Conversely, t he suggested procedure yielded 7 out of ten runs in that region of interest as tables 14 - 11 show s . That is because standa rd RSM emphasizes on symmetry and orthogonality, whereas the suggested procedure distorts the experimental design towards more experiments inside or closer to the region of interest. Moreover, RSM sees the process as a black box and considers all factors equally as important while t he suggested procedure applies specific g uidelines to utilize deeper analysis of the mediators to decide which factors can better improve the performance of the process. This allow s for using information from prior experiments t o design later ones by introducing more factors to the model. The suggested procedure is particularly useful when there are many parameters and we do not know which ones are more important to change as factors. One may modify the guidelines to incorporate other constitutional parameters and features and apply the suggested procedure to improve crowdsourcing protocols, blockchain protocols or other types of constitutions . Practitioners may develop similar guidelines for other processes and applications so that they can use the suggested procedure. To this end, they need to specif y parameters for their process as in RSM. 113 One limitation of the suggested procedure is that it is not an algorithm, but rather depends on a set o f guidelines which require subjective judgements . As a result, different practitioners or experimenters may go in different paths to improve constitutions. Particularly, the outputs of experiments are subject to noise and sample variance, thus even RSM can result in different paths for the improvement of constitutions in different trials with different subjects. Another limitation of this study is the small number (10) of experiments. A larger sample size (in terms of treatments) could give more statistical power and enable us to estimate the effect of less impactful factors such as revisable voting (factor B) . Another limitation is the lack of replicate d runs and estimate of pure error ( SS PE ) . 
To test the lack of fit, one should estimate the pure error by replicating some experiments. Testing lack of fit is particularly important in forming and using the second-order model in the final steps of RSM.

One limitation of the subject-level analysis is that there was only one item to measure each of the two constructs, comprehension and expertise. This might pose a threat to construct validity. A better approach would include two or three questions for each construct. Another threat to construct validity is that expertise was measured in the final survey after the experiment, so the performance of the subject in the experiment could affect his or her perception of his or her expertise in the financial markets. A better approach would be to measure expertise before the experiment begins.

Perhaps the most important limitation of the results is external validity. The groups were very small (about 20 subjects), the duration of the process was one hour, and prediction voting was the only incentive for better selection. Other incentive mechanisms, such as a group reward, could result in different outcomes and different effects for the factors, perhaps an even stronger effect for approval voting. Moreover, the design problem (retroactive investment plans) was unique. While this particular problem brought many advantages regarding internal validity and reliability, it poses a threat to external validity and ecological validity because it does not have a real-world application. For future research, one may try to use the suggested procedure to solve a more practical problem, such as designing a financial portfolio. However, other problems may need more elaborate evaluation of the quality of solutions.

16. Conclusions

This project started with the problem of collective design in cyberspace. While it is akin to the problem of collective action, information technology has brought about new aspects to this problem. On the one hand, computers, being impartial and more trustworthy, can execute rules faster than humans. On the other hand, the anonymity of users poses new challenges to collective decision making in cyberspace. To better understand various aspects of this problem, I introduced the concept of an e-constitution and developed a design model, including a structured representation and formalization, for it. This design model, as a meta-artifact, decomposes an e-constitution into 14 components and parameters, including a state transition function and a weighting function. This model implies that most collective action structures are special cases of the same phenomenon, mostly with different weighting functions. Moreover, this model highlights the importance of the convexity or concavity of the weighting function in distributing and controlling power.

The constitution model provides a framework to design various governance structures such as crowdsourcing schemes, collective intelligence systems, blockchain protocols, DAOs and organizational bylaws. The main problem in most of these situations is how to aggregate individual inputs into one collective output. In this regard, the social choice function and the weighting function play crucial roles. This model defines the objective of a constitution as the collective design of a solution for a problem. The solution can be a decision, a policy, a product or a financial plan, as in the experiments of this project. This objective provides quantifiable performance measures for constitutions, namely quality, cost and speed.
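As a minimal illustration of how these three measures could be computed for a finished treatment, consider the following sketch; the method signature and field names are hypothetical and are not drawn from the web application's schema.

    // Illustrative only: the three performance measures for one finished treatment.
    static (float Quality, float Cost, double DurationHours) Evaluate(
        float outcomeValue, float totalBonusesPaid, DateTime started, DateTime finished)
    {
        float quality = outcomeValue;                   // value of the final outcome design
        float cost = totalBonusesPaid;                  // total rewards paid to participants
        double duration = (finished - started).TotalHours; // speed is inversely related to this duration
        return (quality, cost, duration);
    }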
The value of the outcome solution reflects the quality of the constitution. This project discussed its mediators, which imply several hypotheses regarding constitutional parameters and features. One may consider this model a nascent design science theory for e-constitutions.

Another contribution of this project is proposing a method to improve the performance of constitutions systematically. It includes a procedure based on RSM and a set of guidelines to introduce factors to the model in batches instead of a simultaneous factorial design. The procedure improves a constitution more efficiently by utilizing information obtained in the prior experiments to design the next experiments. To this end, the constitution design model and its parameterization were essential in the development of the guidelines. As a proof of concept, I used the method and design model to improve a simple collective design process through online experiments on MTurk. The results demonstrated the utility and effectiveness of the model and method in improving the performance of constitutions. An important finding is that some constitutional characteristics, such as the voting scheme (social choice function), can significantly affect the quality of the outcome design. Hence, the quality of the outcome design can be used as an indicator of the performance of constitutions. This enables us to evaluate and compare constitutions objectively, free from any value assumption. The ultimate goal is to move towards an optimal constitution.

APPENDICES

APPENDIX A: IRB Application Determination Letter
Figure A-1: IRB Approval and Application Determination Letter

APPENDIX B: Registration Page Including the Consent Form
Figure B-3: Screenshot of the Main Page in the Web Application

APPENDIX C: Screenshots of the Webpages for the Experiment
Figure C-1: Constitution Page for Participants to Read the Constitution as Instructions
Figure C-2: Suggestion Page for Participants to Submit their Suggestion
Figure C-3: Voting Page for Participants to See Versions and Submit their Selection
Figure C-4: History Page for Participants to See the Versions from Previous Periods

APPENDIX D: Final Survey Webpage
Figure D-1: Survey Page for Participants to Answer Seven Questions after Experiment

APPENDIX E: Control Panel for the Experimenter
Figure E-1: Control Panel for Experimenter to Instantiate E-constitutions and Treatment Groups

APPENDIX F: Computer Code of the Generic E-Constitution

This appendix presents the computer code that instantiates a generic e-constitution based on the constitutional parameters specified in the control panel.

/* Period:
   -9          : Null
    0          : Before Starting
    1, 3, odd  : Suggestion
    2, 4, even : Voting
   -1          : Final Survey
   -11         : Experiment Ended After Final Survey
   DT = The time point indicating the end of the current period. */

if (DeadLine < DateTime.Now && !Active) // Did not pass the constitution test and now it is too late.
{
    ClientScript.RegisterStartupScript(GetType(), "Attention!", "alert('Your time has expired.');", true);
    return;
}
else if (Period < -10 || !User["Terminated"].Equals(DBNull.Value))
{
    LabelLogin.Text = User["Name"] +
        " ! Your final balance is $" + ((float)User["Balance"]).ToString("N2");
    return;
}
else if (Period == -9) // Null : Experiment Not Started Yet
{
    return;
}
else if (Period == 0 || !Active) // Period = 0 or Participant not active yet
{
    Response.Redirect("~/Constitution.aspx");
}
else if (Period == -1) // Final Period and Participant is active
{
    Session["Treat"] = (int)User["Treatment"];
    Session["Group"] = (int)User["Group#"];
    Response.Redirect("~/Survey.aspx");
}
else if (Period % 2 == 1) // Suggestion Period and Participant is active
    Response.Redirect("~/Suggestion.aspx");
else // if (Period % 2 == 0) // Voting Period and Participant is active
    Response.Redirect("~/Voting.aspx");

// Submitting Suggestion: ********************************************************************************
if (DateTime.Now < Suspended)
{
    LabelLogin.Text = "You cannot propose for " + Global.LeftTime(Suspended);
    return;
}

// Already proposed?
query = "select * from Versions where Treatment = " + Treat + " and Group# = " + Group + " and Period=" + (Period + 1) + " and Proposer=@User and Choice <> 0";
SqlDataReader Version = com.ExecuteReader();
if (Version.Read())
{
    LabelVersion.Text = "You changed your suggestion";
    query = "UPDATE Versions SET Solution = @Solution, Time = @Time, HtmlSolution = @HtmlSolution " +
            "WHERE Treatment = @Treatment AND Group# = @Group AND Period = @Period AND Choice = @Choice";
    return;
}

// Number of versions in this period:
query = "select count(*) from Versions where Treatment= " + Treat + " and Group#= " + Group + " and Period= " + (Period + 1);
int m = (int)com.ExecuteScalar();

// Insert the suggestion as a new version:
query = "insert into Versions (Treatment, Group#, Period, Choice, Solution, HtmlSolution, Proposer, Time) Values (@Treatment, @Group, @Period, @Choice, @Solution, @HtmlSolution, @Proposer, @Time)";

// Whether it just became enough to close the suggestion period:
if (m >= M || Closing <= DT)
{
    Period++;
    DT = DateTime.Now.AddHours(Tv);
    query = "update Groups set Period=" + Period + " , DT='" + DT + "' where Treatment = " + Treat + " and Group# = " + Group;
    Global.InviteVoting(Treat, Group, DT);
    Response.Redirect("~/Voting.aspx");
}

// Casting Vote: ***************************************************************************************
query = "select * from Versions where Treatment = " + Treat + " and Group# = " + Group + " and Period = " + Period + " and Proposer = @User and Choice = @Choice";
if (com.ExecuteScalar() != null) // Voting for one's own suggestion?
{
    Message.Text = "You cannot vote for your own edition!";
    return;
}

// Already voted in this period?
query = "select * from Voting where Treatment = " + Treat + " and Group# = " + Group + " and Period = " + Period + " and Voter = @User";
SqlDataReader Voted = com.ExecuteReader();
if (Voted.Read()) // Already voted
{
    if (!VoteChange) return; // If the constitution does not allow changing votes.
    query = "update Voting set Choice = @Choice, Time = @Time where (Treatment = @Treatment and Group# = @Group and Period = @Period and Voter = @Voter)";
}
else // Not voted yet
{
    // Meritocracy:
    if (TotalExtra > 0 && W > 0)
    {
        int NewExtra = TotalExtra - VoteW + 1;
        ExtraVotes.Text = "You have " + NewExtra + " Extra Votes left.";
        query = "update People set ExtraVote = " + NewExtra + " where Email = @Voter";
    }
    if (BetFee > 0)
    {
        query = "select Balance from People where Email = @Voter";
        float Balance = Convert.ToSingle(com.ExecuteScalar()); // read the scalar as a number for comparison and update
        if (BetFee > Balance)
        {
            Message.Text = "Sorry! " +
                "You do not have enough balance to vote.";
            return;
        }
        LabelLogin.Text = "You spent $" + BetFee + " to vote.";
        LabelBalance.Text = "Your Balance = $" + (Balance - BetFee);
        query = "update People set Balance = " + (Balance - BetFee) + " where Email = @Voter";
    }
    query = "insert into Voting (Treatment, Group#, Period, Choice, Stage, Voter, Time, VoteWeight, Value) Values (@Treatment, @Group, @Period, @Choice, @Stage, @Voter, @Time, @VoteWeight, @Value)";
}
Message.Text = "You cast " + VoteW + " units of vote";

// At the end of each period: [ if (DT < DateTime.Now) ]
if (Period == -1) // Was final rating period: *********************************************
{
    Period = -11; // Flag the treatment as finished
    DT = DateTime.MaxValue;
}
else if (Period == 0) // Was registration Period: ***********************************************
{
    Period = 1; // Switch to Suggestion Period
    DT = DateTime.Now.AddHours(Tp);
}
else if (Period % 2 == 0) // Was Voting Period: **********************************************
{
    query = "SELECT CASE WHEN SumVotes IS NULL THEN 0 ELSE SumVotes END AS SumVoteZ, Versions.Choice, Versions.Proposer, Versions.Solution " +
            "FROM (SELECT treatment, Group#, period, choice, sum(VoteWeight) AS SumVotes FROM Voting GROUP BY treatment, Group#, period, choice) AS VotesOnChoices " +
            "RIGHT JOIN Versions ON Versions.Treatment = VotesOnChoices.Treatment AND Versions.Group# = VotesOnChoices.Group# AND Versions.Period = VotesOnChoices.Period AND VotesOnChoices.Choice = Versions.Choice " +
            "WHERE Versions.Treatment=" + Treat + " AND Versions.Group#=" + Group + " AND Versions.Period=" + Period + " ORDER BY SumVoteZ DESC, Choice ASC";
    var DataReader = com.ExecuteReader();
    var VersionVotes = new DataTable();
    VersionVotes.Load(DataReader);

    int MinVote = (int)VersionVotes.Rows[VersionVotes.Rows.Count - 1][0];
    int MaxVote = (int)VersionVotes.Rows[0][0];
    int Winner = (int)VersionVotes.Rows[0][1];         // (int)Winning["Choice"];
    string Proposer = (string)VersionVotes.Rows[0][2]; // Version["Proposer"].ToString();
    string Solution = (string)VersionVotes.Rows[0][3]; // Version["Solution"].ToString();

    query = "select * from People where Email = @Proposer";
    User = com.ExecuteReader();
    string ProposerName = (string)User["Name"];
    float Balance = (float)User["Balance"];
    int TotalExtra = (int)User["ExtraVote"];

    // Carry the winning solution forward as the Updated Edition (Choice = 0) of the next round:
    query = "insert into Versions(Treatment, Group#, Period, Choice, Solution, HtmlSolution, Proposer, Time) values(" + Treat + "," + Group + "," + (Period + 2) + ", 0, @Solution, @HtmlSolution, @Proposer, '" + DateTime.Now + "')";

    // Suspend the loser if required by the constitution
    if (Te > 0 && MinVote < MaxVote)
    {
        DateTime Until = DateTime.Now.AddHours(Te);
        query = "update People set Suspended = '" + Until + "' where Email in (' '"; // blank placeholder so the loop can prepend commas
        string Loser;
        for (int i = VersionVotes.Rows.Count - 1; (int)VersionVotes.Rows[i][0] == MinVote && (int)VersionVotes.Rows[i][1] != 0; i--)
        {
            Loser = (string)VersionVotes.Rows[i][2];
            query += ", '" + Loser + "'";
        }
        query += ")";
    }

    // Meritocracy if the constitution merits only the winner.
    if (Winner > 0 && W > 0 && !Merit2All)
    {
        switch (Meritocracy)
        {
            case 1: V += MaxVote; break;
            case 2: V += MaxVote - (int)VersionVotes.Select("Choice=0")[0][0]; break;
            case 3: V += MaxVote - MinVote; break;
        }
        TotalExtra += V;
        query = "update People set ExtraVote = " + TotalExtra + " where Email = @Proposer";
    }

    // Meritocracy if the constitution merits all the proposers
    if (W > 0 && Merit2All)
    {
        string Proposeri;
        int Votei;
        int SumVotei;
        switch (Meritocracy)
        {
            // V + Votes(i) --> Every Proposer
            case 1:
                for (int i = 0; i < VersionVotes.Rows.Count; i++)
                {
                    if ((int)VersionVotes.Rows[i][1] == 0) continue;
                    Votei = (int)VersionVotes.Rows[i][0];
                    SumVotei = V + Votei;
                    if (SumVotei <= 0) break;
                    Proposeri = (string)VersionVotes.Rows[i][2];
                    query = "update People set ExtraVote = ExtraVote + " + SumVotei + " where Email = @Proposer";
                }
                break;
            // V + Votes(i) - Votes(0) --> Every Proposer
            case 2:
                int Vote0 = (int)VersionVotes.Select("Choice=0")[0][0];
                V -= Vote0;
                for (int i = 0; i < VersionVotes.Rows.Count; i++)
                {
                    if ((int)VersionVotes.Rows[i][1] == 0) continue;
                    Votei = (int)VersionVotes.Rows[i][0];
                    SumVotei = V + Votei;
                    if (SumVotei <= 0) break;
                    Proposeri = (string)VersionVotes.Rows[i][2];
                    query = "update People set ExtraVote = ExtraVote + " + SumVotei + " where Email = @Proposer";
                }
                break;
            // V + Votes(i) - MinVotes --> Every Proposer
            case 3:
                V -= MinVote;
                for (int i = 0; i < VersionVotes.Rows.Count; i++)
                {
                    if ((int)VersionVotes.Rows[i][1] == 0) continue;
                    Votei = (int)VersionVotes.Rows[i][0];
                    SumVotei = V + Votei;
                    if (SumVotei <= 0) break;
                    Proposeri = (string)VersionVotes.Rows[i][2];
                    query = "update People set ExtraVote = ExtraVote + " + SumVotei + " where Email = @Proposer";
                }
                break;
            // Fixed Votes V --> Every Proposer
            default:
                if (V == 0) break;
                query = "update People set ExtraVote = ExtraVote + " + V + " where Email in (' '"; // blank placeholder so the loop can prepend commas
                for (int i = 0; i < VersionVotes.Rows.Count; i++)
                {
                    if ((int)VersionVotes.Rows[i][1] == 0) continue;
                    Proposeri = (string)VersionVotes.Rows[i][2];
                    query += ", '" + Proposeri + "'";
                }
                query += ")";
                break;
        }
        // query = "update People set ExtraVote = ExtraVote + " + V + Votes + " where Email = @Proposer";
    }

    // Reward the winner proposer
    if (Winner > 0 && Reward > 0)
    {
        Balance += Reward;
        query = "update People set Balance = " + Balance + " where Email = @Proposer";
    }

    // Reward the right votes on the winning suggestion
    if (Winner > 0 && Rv > 0)
    {
        query = "update People set Balance = Balance + " + Rv + " where Email in (select Voter from Voting where Choice= " + Winner + " and Treatment=" + Treat + " and Group#=" + Group + " and Period= " + Period + ")";
    }

    // Reward the right votes on the updated edition
    if (Winner == 0 && Ro > 0 && MaxVote > 0)
    {
        query = "update People set Balance = Balance + " + Ro + " where Email in (select Voter from Voting where Choice=0 and Treatment=" + Treat + " and Group#=" + Group + " and [Period] =" + Period + ")";
    }

    if (Closing < DateTime.Now) // If the process ended.
    {   // Switching to the Final Period: *************************************************************
        Period = -1;
        DT = EndingTime;
    }
    else
    {
        Period++; // Switch to Suggestion Period
        DT = DateTime.Now.AddHours(Tp);
    }
}   // end of the voting-period branch
else if (Period % 2 == 1) // Was Suggestion Period: ********************************************
{
    query = "select count(*) from Versions where Treatment=" + Treat + " and Group#=" + Group + " and Period=" + (Period + 1);
    int m = (int)com.ExecuteScalar();
    if (m > 1) // Enough suggestions for voting
    {
        Period++; // Switch to Voting Period
        DT = DateTime.Now.AddHours(Tv);
        InviteVoting(Treat, Group, DT);
    }
    else // if (m == 1) : Not enough suggestions for Voting
    {
        if (Closing < DateTime.Now) // Switching to the Final Period:
        {
            Period = -1;
            DT = EndingTime;
        }
        else
            DT = Closing; // Stay in the suggestion period and wait for a suggestion until the end
    }
}

APPENDIX G: Database of the Website

Figure G-1 illustrates the data structure diagram of the database used in the website. As it shows, the database has six tables, including a table for the treatments. As illustrated, each participant, version, vote and rating score belongs to a treatment; hence the treatment number is a foreign key in all other tables, and its corresponding relationships are highlighted in green. The treatment table includes the parameters for the e-constitutions that the website can instantiate.

Figure G-1: Data Structure Diagram for the Database Used in the Web Application

APPENDIX H: HIT Description in MTurk

Figure H-1: HIT Description for the Workers in Mechanical Turk

APPENDIX I: Constitution for Treatment (1)

Problem: Imagine it is June 1, 2013 and you have $1000 to invest in stocks, currencies and precious metals like silver. What would be the best trading plan and strategy to maximize your profit over the 5-year period from June 1, 2013 until May 31, 2018? Your only goal is to reach maximum wealth on June 1, 2018. You can use historical data on financial websites such as Finance.Yahoo.com and CoinRanking.com. Assume there is no transaction fee, no commission, and no dividend.

Process: The game begins with an initial plan. Then you improve the plan in several editing rounds. Each round consists of a suggestion period followed by a voting period that results in an Updated Edition for the next round. This game iterates for about one hour. Then you must complete a short survey to receive $10 plus your rewards.

Suggestion: In each suggestion period, you can submit one suggestion and modify it if you want to. You are not required to submit a suggestion every round. Suggestions should be different from the Updated Edition. Each suggestion period ends after 4 minutes if at least one suggestion is submitted. Otherwise, the program waits until one submission.

Voting: Each voting period begins with a minimum of two choices and lasts 4 minutes. The choices are the Updated Edition and the submitted suggestions. You cannot vote for your own edition. You are not required to vote every round. You cannot change your vote within a voting period.

Winning: After each voting period, the most voted version wins and becomes the Updated Edition for the next period. The votes remain anonymous. Then, the next suggestion period begins.

Rewards: After each voting period, if your suggestion wins, you will receive a $1.00 bonus, and if the choice you voted for wins, you will receive a $0.03 bonus.
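To connect this plain-text constitution with the parameterized code of Appendix F, the following sketch shows one plausible reading of the constitutional parameters for Treatment (1); the anonymous-object form is illustrative, and the values for features the text does not mention (entry fees, meritocratic extra votes, suspension of losers) are assumed to sit at their neutral levels.

    // Illustrative parameter settings for Treatment (1), inferred from the wording above.
    // Parameter names follow the variables used in Appendix F.
    var treatment1 = new
    {
        Tp = 4.0 / 60,       // suggestion period length in hours (4 minutes)
        Tv = 4.0 / 60,       // voting period length in hours (4 minutes)
        Reward = 1.00f,      // bonus for the proposer of the winning suggestion
        Rv = 0.03f,          // bonus for voting for the winning choice
        VoteChange = false,  // votes cannot be changed within a voting period
        BetFee = 0f,         // assumed: no fee is charged for casting a vote
        W = 0,               // assumed: no meritocratic extra votes in this baseline treatment
        Te = 0.0             // assumed: losing proposers are not suspended
    };

Under this reading, the roughly one-hour duration of the game corresponds to the Closing time that the Appendix F code checks at the end of each period.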
APPENDIX J: Price-Based Constitution

This appendix presents a constitution based on market equilibrium prices as the criterion for selection. Price is a collective decision that is least vulnerable to attacks and malicious activities. Market price is the most efficient aggregation of dispersed information (Hayek, 1945). In nonmarket mechanisms, participants lack adequate incentives to estimate or reveal the value of the contributions accurately (Ba, et al., 2001b). Market price is a sufficient statistic that summarizes the relative strengths of the alternatives. The possibility to compare individual preferences yields a group utility function and unshackles us from the impossibility theorems (Scott & Antonsson, 1999). Hence, it is Pareto-efficient, monotone, independent from irrelevant alternatives and non-dictatorial with any number of choices. Ren et al. (2017) presented a process model that used price to evaluate choices. Perhaps that is the best attempt thus far to use price for collective selection. However, their method is not applicable to a design process.

In the price-based constitution, participants can trade shares of versions during selection (trading) periods. Then, at the end of each period, it uses the equilibrium prices as the aggregate evaluation of the versions, judges the version with the highest share price as the most valuable version, and makes it the updated edition. Then, it voids all transactions of all other versions to prevent forking. The e-constitution contains an automatic market maker that buys and sells to match asks and bids. A fully automated electronic exchange can serve as market maker (Ba, et al., 2001b). Automatic market makers can use algorithms to adjust prices based on transactions and give instant feedback to traders (Boer, et al., 2007). One applicable market maker is the Logarithmic Market Scoring Rule (Jian & Sami, 2010). Meanwhile, markets should have at least 30 participants to be efficient (Christiansen, 2007); with only a few traders, markets cannot effectively aggregate information (Healy, et al., 2010).

The price-based constitution inherits an endogenous meritocracy from market competition. Traders are responsible for their decisions and have an incentive to obtain information, evaluate versions accurately, and then trade wisely, or not trade if they do not have the expertise or information to do so. Hence, unqualified traders incur all the costs of their bad decisions and eventually fade out from the market. Smart traders benefit from their informed selections, survive in the market, make more trades, and exert more influence in selection. This constitution sorts the versions based on their prices so that the most valued versions are exposed to be traded more often and thus are priced more precisely. After all, the only selection criterion is having the maximum price, and thus only the relative prices of the best versions matter. Therefore, the outcome is not sensitive to the prices of the average versions or the magnitude of the prices in general. Hence, the typical over-optimism common in prediction markets does not cause a problem here.

The constitution in plain text is as follows:

General Process: At the beginning, for a period of T_A time, there is an auction for the shares (100%) of the initial solution and anyone can buy the shares to invest in the initial solution. The shareholders own the solution and its IP rights proportional to their shares. Budget B is the total money invested, to be used to reward winning suggestions.
The solution evolves through several editing rounds. Each editing round consists of a suggestion period followed by a trading (selection) period, repeating until running out of budget B. Anyone can participate in suggestion, trading or both.

Suggestion Period: A participant can submit only one suggestion during each suggestion period if they want to. The proposers (participants who submitted suggestions) can modify their suggestions until the trading period begins. In each round, the Updated Edition is updated to be the winning version from the previous round.

Ending Suggestion Period: A suggestion period ends after T_P time if AT LEAST one suggestion is submitted. Then the trading period begins with a minimum of two choices, including the submitted suggestions and the Updated Edition. Otherwise, if no suggestion is submitted within T_P time, the program waits until one suggestion is submitted and then the trading period begins immediately.

Trading Period: Each trading period lasts T_V time, during which participants can trade the shares of all the versions, including the Updated Edition, by submitting asks and bids. Versions are continuously sorted based on their equilibrium share prices so that the highest prices are at the top. At the beginning of each trading period, the share prices of all suggested versions are initially set to the share price of their parent version (the Updated Edition).

Winning Version: After each trading period, the version with the highest equilibrium share price wins and shall be published with all its transactions confirmed. The transactions and shares of all other versions are voided. If the winning version is not the previous Updated Edition, its proposer receives a compensation equal to R times the increase in the share price, and the proposer's name is announced. Then the next suggestion period begins for further modifications if the design process has not ended.

Accounting: The shareholders of a version own the shares of all versions derived (forked) from it and can sell all or any of them. Traders can spend the same money in multiple markets, but at the end of each trading period, only the balances for the winning version are valid and all other parallel balances become void. Conversely, when a trader cashes out, the withdrawal amount is deducted from all his parallel balances. In a trading period, the maximum amount that a trader can cash out is the minimum of all his parallel balances. One accounting technicality is that a trader cannot cash out an amount that may result in a negative balance on any version. At the end of each trading period, only one version becomes valid and the transactions of the other versions become void. The winning version can be any version in a period. Therefore, during a trading period, traders can withdraw only the least of their balances across all versions. However, at the end of each round, traders can withdraw all their confirmed balances on the winning version.

Figure J-1 illustrates how the prices may change during selection periods. The picture only shows the trading periods back to back, hiding the suggestion periods between them, because the prices cannot change during the suggestion periods. By default, this constitution does not allow for trade during suggestion periods; it has distinct suggestion and trading periods, and all suggestions in a round start competing at the same time at the beginning of the trading period. This prevents participants from copying other suggestions and makes the competition fairer in some situations. Another approach is to reveal each suggestion once submitted and let traders trade at any time, even during the suggestion periods.
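As a concrete illustration of the kind of automatic market maker cited above, the following minimal sketch implements the Logarithmic Market Scoring Rule; the function names and the liquidity parameter b are assumptions of the sketch, and the price-based constitution does not prescribe this particular implementation.

    // Illustrative LMSR market maker. q[i] is the number of outstanding shares of version i;
    // b > 0 is a liquidity parameter that controls how quickly prices respond to trades.
    static double Cost(double[] q, double b)
    {
        double sum = 0.0;
        foreach (double qi in q) sum += Math.Exp(qi / b);
        return b * Math.Log(sum);              // C(q) = b * ln( sum_i exp(q_i / b) )
    }

    static double Price(double[] q, double b, int i)
    {
        double sum = 0.0;
        foreach (double qj in q) sum += Math.Exp(qj / b);
        return Math.Exp(q[i] / b) / sum;       // instantaneous price of version i; prices sum to one
    }

    // A trader who buys delta shares of version i pays Cost(qAfter, b) - Cost(qBefore, b),
    // where qAfter differs from qBefore only in element i.

At the end of a trading period, the constitution would compare these prices across the parallel versions and retain the one with the maximum price as the Updated Edition, voiding the others as described above.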
It improves the liquidity of the market and helps to reach the equilibrium faster. Accordingly, each iteration has a suggestion period with the possibility of trading, and then there is an exclusive trading period without any suggestion, so that all prices reach relative equilibrium before the selection of the maximum price.

Figure J-1: Prices of Parallel Versions Using a Price-based Constitution (Example)

Allowing trades during the suggestion periods automatically rewards the best proposers while eliciting their information on their suggestions. The proposer of a superior suggestion is the first one who knows about the value of that suggestion and can be the first buyer before its price is raised by other investors. Additionally, at the same time the proposer informs others about the value of that suggestion by raising its price and making it more visible in the rankings. If a proposer strongly believes in the quality of his proposed suggestion, he can use his exclusive information to extract the most profit out of it before anyone else. Perhaps this possibility obviates an explicit compensation clause in the constitution and eliminates the problem of copying.

This constitution, while it may seem complicated, is easier to implement on a blockchain. Each suggestion forks the blockchain, and after a pre-specified trading period (i.e., a number of blocks), only the branch with the highest price for the token (share) is retained and all other branches become invalid automatically. Therefore, each suggestion is essentially a controlled, purposive fork, and the selection criterion determines which branch is valid as the canonical chain.

The price-based constitution can be very effective for making decisions in publicly traded corporations. For any decision, each choice creates a parallel market for the corporate shares, functioning as if that specific choice will be taken. Therefore, the corporate shares are traded in different parallel markets based on hypothetical decisions. The parallel markets function for a period until reaching an equilibrium or the deadline for making the decision, and then the choice that yielded the highest share price becomes effective and its transactions will be confirmed. However, the shares and transactions corresponding to the other alternatives will be voided. Everybody can propose a decision or improvement, but only the decisions that maximize the share price will be implemented. Accordingly, the e-constitution can replace the CEO by making all decisions through financial markets. This eliminates the moral hazard problem altogether.

One important challenge in using this constitution is that it requires an exogenous criterion to value the final edition and to pay off the shareholders at the end. This is necessary because the incentives for trading shares come from the prospect of final profit. Corporations associate annual profit and net income with the shares, and that directs the trades. Otherwise, if there is a market for the final solution or product, its selling price or revenue serves as the exogenous evaluation and incentivizes thoughtful investment. However, if such exogenous criteria do not exist, we need to define the quality of choices and incentivize investment in higher quality ones. One approach is to hire some independent raters who do not benefit from any alternative and have not traded any shares, have them assign financial values to the final edition, and then compensate the shareholders based on the median of the values given by the raters.
Also, reward the rater(s) whose valuation was closest to the median. To make the rating (valuation) process more reliable and accurate, one may select a random sample of independent raters. However, we are then back to the aggregation of voting and rating for evaluation.

Prediction markets for idea evaluation need to tie the payoffs to a real observable outcome on which the participants can bet. Otherwise, they bet on expert evaluations instead of the idea quality (Blohm, et al., 2011). Blohm et al. (2011) explained that markets need several participants and several trades to reach a meaningful equilibrium price as an evaluation, whereas other evaluation methods like rating only need one or two participants and one round to result in a meaningful evaluation through an aggregation function. They compared the performance of rating scales with prediction markets and found that rating scales result in better evaluation accuracy and higher satisfaction for the participants. They concluded that rating is a better mechanism than trading. However, there were two problems with their experiment. First, in the prediction markets, the participants did not use real money to provide incentives, which is the main point of using markets. Second, the results were compared against other rating scores given by a panel of experts. This suffers a method bias in favour of rating and against the prediction markets. Similarly, Gottschlich and Hinz (2014) developed a Decision Support System for designing stock portfolios whose recommendations outperform the market benchmark and comparable public funds. However, the market benchmark does not result from market equilibrium, but rather from a portfolio that is decided by a group of people through some collective decision making process other than the equilibrium price.

Blume et al. (2010) claimed that markets are open to gaming because a participant can transfer money from one account to another and trade with himself to manipulate the price. However, they overlook how bids and asks clear in liquid markets. If a participant tries to make a trade at a price far from the equilibrium price, his bid or ask will first clear the existing asks or bids that are closer to the equilibrium price, not his own opposing offer at his planned price. The only way to trade with oneself (or an accomplice) is outside the market, which does not affect the market price. Moreover, the only way to manipulate the price is through large amounts of trading, and incurring all the costs associated with moving the price.

BIBLIOGRAPHY

Acemoglu, D. & Robinson, J., 2012. Why Nations Fail: The Origins of Power, Prosperity and Poverty. 1st ed. New York: Crown Publishers.
Archak, N. & Sundararajan, A., 2009. Optimal Design of Crowdsourcing Contests. Phoenix, The 13th International Conference on Information Systems.
Ariely, D., Loewenstein, G. & Prelec, D., 2003. Coherent Arbitrariness: Stable Demand Curves without Stable Preferences. Quarterly Journal of Economics, 118(1), pp. 73-105.
Atzei, N., Bartoletti, M. & Cimoli, T., 2017. A Survey of Attacks on Ethereum Smart Contracts. Heidelberg, Springer, pp. 164-186.
Bandiera, O., Barankay, I. & Rasul, I., 2013. Team incentives: evidence from a firm level experiment. Journal of the European Economic Association, Volume 11, pp. 1079-1114.
Bao, J., Sakamoto, Y. & Nickerson, J., 2011. Evaluating Design Solutions Using Crowds. Detroit, MI, Americas Conference on Information Systems.
Ba, S., Stallaert, J. & Whinston, A. B., 2001a.
Research Commentary: Introducing a Third Dimension in Information Systems Design The Case for Incentive Alignment. Information Systems Research, 12(3), pp. 225 - 239. Ba, S., Stallaert, J. & Whinston, A. B., 2001b. Optimal Investme nt in Knowlege within a Firm Using a Market Mechanism. Managment Science, 47(9), pp. 1203 - 1219. Benoit, J. - P. & Kornhauser, L., 2010. Only a Dictatorship is Efficient. Games and Economic Behavior. Bix, B., 2013. Boilerplate, Freedom of Contract and Democra tic Degradation. The Tulsa Law Review, Volume 49. Blohm, I., Riedl, C., Leimeister, J. M. & Krcmar, H., 2011. Idea Evaluation Mechanisms for Collective Intelligence in Open Innovation Communities: Do Traders Outperform Raters? Shanghai, Thirty Second Inter national Conference on Information Systems. Blume, M., Luckner, S. & Weinhardt, C., 2010. Fraud Detection in Play - Money Prediction Markets. Information Systems and E - Business Management, 8(4), pp. 395 - 413. Boer, K., Kaymak, U. & Spiering, J., 2007. From Di screte - Time Models to Continuous - Time Asynchronous Modeling of Financial Markets. Computational Intelligence, 23(2), pp. 142 - 161. Bonabeau, E., 2009. Decisions 2.0: The Power of Collective Intelligence. MIT Sloan Management Review, 50(2), pp. 44 - 53. Boreha m, R. & Rutter, K., 2018. R3. Available at: https://www.r3.com/research/ Brabham, D., 2008. Crowdsourcing as a Model for Problem Solving: An Introduction and Cases. Convergence: The International J. of Research into New Media Technologies, Volume 14, pp. 7 5 - 90. 144 Brooks, F. P., 2010. The Design of Design: Essays from a Computer Scientist. 2nd ed. Upper Saddle River, NJ: Addison - Wesley. Buchanan, J. & Tullock, G., 1961. The Calculus of Consent: Logical Foundations of Constitutional Democracy. 1st ed. Ann Arbor: University of Michigan Press. Buterin, V., 2013. Ethereum: A Next - Generation Smart Contract and Decentralized Application Platform. Available at: http://ethereum.org/ethereum.html Cerasoli, C., Nicklin, J. & Ford, M., 2014. Intrinsic Motivation and extrinsic incentives jointly predict performance: A 40 - year meta - analysis. Psychol Bull, 140(4), pp. 965 - 980. Chan, J., Dang, S. & Dow, S., 2016. Improving Crowd Innovation with Expert Facilitation. San Francisco, CA, ACM. Chanron, V. & Lewis, K., 2005. A Study of Convergence in Decentralized Design Processes. Research in Engineering Design, Volume 16, pp. 133 - 145. Chaum, D., 2015. Random - Sample Voting: Far lower cost, better quality and more democratic, New York: Scribd. Chilton, L., Landay, J. & Weld, D. , 2016. HumorTools: A Microtask Workflow for Writing News Satire. El Paso, Texas, ACM. Chilton, L. et al., 2013. Cascade: Crowdsourcing Taxonomy Creation. Paris, France, CHI. Christiansen, J. D., 2007. Prediction Markets: Practical Experiments in Small Mar kets and Behaviors Observed. The Journal of Prediction Markets, 1(1), pp. 17 - 41. Clarkson, G. & Alstyne, M. V., 2007. The Social Efficiency of Fairness: An Innovation Economics Approach to Innovation. Hawaii, 40th Annual Hawaii International Conference on Systems Sciences. Collins, J. & Porras, J., 1994. Built to Last: Successful Habits of Visionary Companies. 1st ed. New York: William Collins. Davis, J. & Lin, W. H., 2011. Web 3.0 and Crowdservicing. Detroit, MI, America's Conference on Information Systems. Davis, K., 2015. InnoCentive.com Collaboration Case Study. Journal of Management Policies and Practices, 3(1), pp. 20 - 22. De Paola, M., Scoppa, V. & Nistico, R., 2012. 
Monetary incentives and student achievement in a depressed labor market: Result s from a randomized experiment. Journal of Human Capital, Volume 6, pp. 56 - 85. Dechanaux, E., Kovenock, D. & Sheremeta, R. M., 2015. A Survey of Experimental Research on Contests, All - Pay Auctions and Tournaments. Experimental Economics, Volume 18, pp. 609 - 669. Delfgauw, J., Dur, R., Sol, J. & Verbeke, W., 2013. Tournament incentives in the field: Gender differences in the workplace. Journal of Labor Economics, Volume 31, pp. 305 - 326. 145 Deng, Q. & Ji, S., 2018. A Review of Design Science Research in Informati on Systems: Concepts, Process, Outcome, and Evaluation. Pacific Asia Journal of the Association for Information Systems, 10(1), pp. 1 - 36. Easley, D. & Kleinberg, J., 2010. Networks, Crowds, and Markets: Reasoning About a Highly Connected World. 1st ed. Cam bridge, MA: Cambridge University Press. Edge, A. G. & Remus, W., 1984. The Impact of Hierarchical and Egalitarian Organization Structure on Group Decision Making and Attitudes. Developments in Business Simulation & Experiential Learning, Volume 11. Ertekin , S., Rudin, C. & Hirsh, H., 2013. Approximating the crowd. Data Mining and Knowledge Discovery, 1(1), pp. 1 - 32. Fidler, D., 2008. A Theory of Open - Source Anarchy. Indiana Journal of Global Studies, Volume 15. Freshtman, C. & Gneezy, U., 2011. The trade - of f between performance and quitting in high - power tournaments. Journal of the European Economic Association, Volume 9, pp. 318 - 336. Friedman, D., 2000. Contracts in Cyberspace. American Law and Economics Association Mtg, 4 May. Fullerton, R. & McAfee, P., 1 999. Auctioning Entry into Tournaments. Journal of Political Economy, 107(3), pp. 573 - 605. Gallupe, R. B. et al., 1992. Electronic Brainstorming and Group Size. Academy of Management Journal, Volume 32, pp. 350 - 369. Gardiner, P. D. & Stewart, K., 2000. Rev isiting the golden triangle of cost, time and quality: the role of NPV in project control, success and failure. International Journal of Project Management, 18(4), pp. 251 - 256. Gillick, D. & Liu, Y., 2010. Non - Expert Evaluation of Summarization Systems is Risky. Los Angeles, California, Association for Computational Linguistics, pp. 148 - 151. Gneezy, U. & Rustichini, A., 2000. Pay enough or don't pay at all. Quarterly Journal of Economics, 115(3), pp. 791 - 810. Gottschlich, J. & Hinz, O., 2014. A Decision Sup port System for Stock Investment Recommendations using Collective Wisdom. Decision Support Systems, 59(1), pp. 52 - 62. Gregor, S., 2006. The Nature of Theory in Information Systems. MIS Quarterly, 30(2), pp. 611 - 642. Gregor, S. & Hevner, A. R., 2013. Positioning and Presenting Design Science Research for Maximum Impact. MIS Quarterly, 37(2), pp. 337 - 355. Gregor, S. & Jones, D., 2007. The Anatomy of a Design Theory. Journal of the Association for Information Systems, 8(5), pp. 312 - 335. Grigg, I., 2017. EOS - An Introduction. Available at: https://eos.io/documents/EOS_An_Introduction.pdf Hansen, M. T., 2009. Collaboration: How leaders avoid the traps, create unity, and reap big results. 1st ed. Boston, MA: Harvard Business Press. 146 Hayek, F. A., 1945. The Use of Knowledge in Society. American Economic Review, 35(4), pp. 519 - 530. Hazelrigg, G. A., 1996. The Implications of Arrow's Impossibility Theorem on Approaches to Optimal Engineering Design. Journal of Mechanical Engineering, 118 (1), pp. 161 - 164. Healy, J., Linardi, S., Lowery, J. R. & Ledyard, J. O., 2010. 
Prediction Markets: Alternative Mechanisms for Complex Environments with Few Traders. Management Science, 56(11), pp. 1977 - 1996. Heer, J. & Bostock, M., 2010. Crowdsourcing Gra phical Perception: Using Mechanical Turk to Assess Visualization Design. Atlanta Georgia, CHI. Hevner, A., March, S., Park, J. & Ram, S., 2004. Design Science in Information Systems Research. MIS Quarterly, 28(1), pp. 75 - 105. Heyman, J. & Ariely, D., 2004. Effort for Payment: A Tale of Two Markets. Psychological Science, 15(11), pp. 787 - 793. Hill, S. & Ready - Campbell, N., 2011. Expert Stock Picker: The Wisdom of (Experts in) Crowds. International Journal of Electronic Commerce, 15(1), pp. 73 - 102. Hoffman, L ., 2009. Crowd Control. Communications of the ACM, 52(3), pp. 16 - 17. Holmstrom, B., 1982. Moral Hazard in Teams. The Bell Journal of Economics, 13(2), pp. 324 - 340. Holston, J., Issarny, V. & Parra, C., 2016. Engineering Software Assemblies for Participator y Democracy: The Participatory Budgeting Use Case. Austin, TX, USA, ACM 38th International Conference on Software Engineering Companion. Hong, L. & Page, S., 2001. Problem Solving by Heterogeneous Agents. Journal of Economic Theory, Volume 97, pp. 123 - 163. Hong, L., Page, S. E. & Riolo, M., 2012. Incentives, Information, and Emergent Collective Accuracy. Managerial and Decision Economics, 19 July, Volume 33, pp. 323 - 334. Horton, J. & Chilton, L., 2010. The Labor Economics of Paid Crowdsourcing. Cambridge MA , ACM. Hossain, T., Hong, F. & List, J. A., 2014. Framing manipulations in contests: A natural field experiment. Working Paper. Howe, J., 2006. The Rise of Crowdsourcing. Wired Magazine, 14(1), pp. 1 - 5. Huang, Y., Singh, P. V. & Mukhopadhyay, T., 2012. Cro wdsourcing Contest: A Dynamic Structural Model of the Impact of Incentive Structure on Solution Quality. Orlando, 33rd International Conference on Information Systems. Jackson, M., 2003. Mechanism Theory. Humanities and Social Sciences , pp. 228 - 77. Jain, R ., 2010. Investigation of Governance Mechanisms for Crowdsourcing Initiatives. Lima, Peru, AMCIS 2010 Proceedings. 147 Jian, L. & Sami, R., 2010. Aggregation and Manipulation in Prediction Markets: Effects of Trading Mechanism and Information Distribution. Cam bridge, Massachusetts, 11th ACM Conference on Electronic Commerce. Johnson, D. & Post, D., 1996. Law and Borders The Rise of Law in Cyberspace. Stanford Law Review, pp. 1367 - 1377. Johnson, J., 2007. Social Networks and the Wisdom of Crowds. Network World, 24(1), pp. 28 - 29. Juels, A., Kosba, A. & Shi, E., 2016. The Ring of Gyges: Using Smart Contracts for Crime. Vienna, Austria, 23rd ACM Conference on Computer and Communications Security (CCS). Kahneman, D., 2011. Thinking, Fast and Slow. 1st ed. New York: Farrar, Straus and Giroux. Keuschnigg, M., Bader, F. & Bracher, J., 2016. Using crowdsourced online experiments to study context - dependency of behavior. Social Science Research, Volume 59, pp. 68 - 82. Khuri, A. & Mukhopadhyay, S., 2010. Response surface methodology. WIREs Computational Statistics, 2(March/April), pp. 128 - 149. Kim, S. - H., 2016. On the Optimal Social Contract: Agency Costs of Self - Government. Journal of Comparative Economics, 44(1), pp. 982 - 1001. Kl ine, W., Kotabe, M., Hamilton, R. & Ridgley, S., 2017. Organizational Constitution, Organizational Identification, and Executive Pay. Asia - Pacific Journal of Business Administration, 9(1), pp. 51 - 68. Kornberger, M., 2016. 
The visible hand and the crowd: An alyzing organization design in distributed innovation systems. Strategic Organization, Organizing Crowds and Innovation(1), pp. 1 - 20. Kornrumpf, A. & Baumol, U., 2014. A Design Science Approach to Collective Intelligence Systems. Hawaii, 47th International Conference on System Science. Kosorukoff, A., 2000. Human - Based Genetic Algorithm. Available at: http://www.HBGA.com Kosorukoff, A., 2000. Social Classification Structures: Optimal Decision Making in an Organization. Las Vegas, Nevada, Genetic and Evoluti onary Computation Conference. Kosorukoff, A., 2001. Human Based Genetic Algorithms. Proceeding of IEEE Conference on Systems, Man, and Cybernetics, pp. 3464 - 3469. Kosorukoff, A. & D., G., 2002. Evolutionary Computation as a form of Organization. IEEE. Kyri akou, H., Nickerson, J. V. & Sabnis, G., 2017. Knowledge Reuse for Customization: Metamodels in an Open Design Community for 3D Printing. MIS Quarterly, 41(1), pp. 315 - 332. Lazear, E. P. & Rosen, S., 1981. Rank - order tournaments as optimum labor contracts. Journal of Political Economy, Volume 89, pp. 841 - 864. Leimeister, J. M., 2010. Collective Intelligence, Cambridge, MA: Business & Information Systems Engineering. 148 Leimeister, J. M., Huber, M., Bretschneider, U. & Krcmar, H., 2009. Leveraging Crowdsourcing : Activation - Supporting Components for IT - Based Ideas Competition. Journal of Management Information Systems, 26(1), pp. 197 - 224. Lévy, P., 1997. New York, Plenum. Liebenaua, J. & Harindranat hb, G., 2002. Organizational Reconciliation and its Implications for Organizational Decision Support Systems: A Semiotic Approach. Decision Support Systems, Volume 33, p. 389 398. Little, G., Chilton, L., Goldman, M. & Miller, R., 2010. Exploring Iterati ve and Parallel Human Computation Processes. Washington DC, ACM. Little, G., Chilton, L., Goldman, M. & Miller, R., 2010. Turkit: Human Computation Algorithms on Mechanical Turk. New York City, NY, Proceedings of the 23rd annucal ACM symposium on User Interface software and technology, pp. 57 - 66. Liu, T. X., Yang, J., Adamic, L. A. & Chen, Y., 2014. Crowdsourcing with All - Pay Auctions: A Field Experiment on Taskcn. Management Science, 60(8), pp. 2020 - 2037. Lorge, I., Fox, D., Davits, J. & Brenner, M., 1 958. A Survey of Studies Contrasting the Quality of Group Performance and Individual Performance, 1920 - Psychological Bulletin, Volume 55, p. 337. Malhotra, N. K., 1982. Reflections on the Information Overload Paradigm in Consumer Decision Making. Th e Journal of Consumer Research, 10(4), pp. 436 - 440. Malone, T., Laubacher, R. & Dellarocas, C., 2010. The Collective Intelligence Genome. MIT Sloan Management Review, 51(3), pp. 21 - 31. Malone, T. et al., 2017. Putting the Pieces Back Together Again: Cont est Webs for Large - Scale Problem Solving. Portland, OR, Proceedings of the ACM Conference on Computer - Supported Cooperative Work and Social Computing. Malone, T. & Smith, S., 1988. Modeling the Performance of Organizational Structures. Operations Research, 36(3), pp. 421 - 436. Malone, T. W., Laubacher, R. & Dellarocas, C., 2009. Harnessing Crowds: Mapping the Genome of Collective Intelligence, Cambridge, MA: MIT Sloan Research Paper No. 4732 - 09. Maniquet, F. & Mangin, P., 2011. Approval Voting and Arrow's Im possibility Theorem, Warwick: University of Warwick. Mason, W. & Watts, D. J., 2009. Financial Incentives and the Performance of Crowds. SIGKDD Explorations, 11(2), pp. 100 - 108. McComb, C., Goucher - Lambert, K. 
& Cagan, J., 2015. Fairness and Manipulation: An Emprical Study of Arrow's Impossibility Theorem. Milan, Italy, Proceedings of the 20th International Conference on Engineering Design (ICED15). Melkonyan, T., 2013. Decentralization, incentive contracts and the effect of distortions in performance measu res. The Manchester School, 1(1), pp. 1 - 22. 149 Miller, M. S., Morningstar, C. & Frantz, B., 2001. Capability - based Financial Instruments, Cupertino, CA : ODE . Miller, M. & Stiegler, M., 2003. The Digital Path: Smart Contracts and the Third World. Information and Communication. Austrian Perspective on the Internet Economy. Mookherjee, D., 2005. Decentralization, Hierarchies and Incentives: A Mechanism Design Perspective. Economic Letters. Mulgan, G., 2006. The Process of Social Innovation. Innovations, 1(2), pp . 145 - 162. Mullen, B., Johnson, C. & Sales, E., 1991. Productivity loss in brainstorming groups: A meta - analytic integration. Basic and Applied social Psychology, Volume 72, pp. 3 - 23. Musiani, F., 2013. Governance by Algorithms. Internet Policy Review, 2(3 ), pp. 1 - 8. Myers, R., Montgomery, D. & Anderson - Cook, C., 2009. Response Surface Methodology - Process and Product Optimization Using Designed Experiments. 3rd ed. Hoboken, New Jersey: John Wiley & Sons, Inc. Nakamoto, S., 2008. Bitcoin: A Peer - to - Peer El ectronic Cash System. Nan, N., 2008. A principal - agent model for incentive design in knowledge sharing. Journal of Knowledge Management, 12(3), pp. 101 - 113. Nickerson, J. V., Sakamoto, Y. & Yu, L., 2011. Structures for Creativity: The crowdsourcing of design. Vancouver, BC, Canada, CHI Workshop on Crowdsourcing and Human Computation: Systems, Studies, and Platforms. Niederle, M. & Vesterlund, L., 2011. Gender and competition. Annual Review of Economics, Vo lume 3, pp. 601 - 630. Niederman, F. & March, S., 2012. Design Science and the Accumulation of Knowledge in the Information Systems Discipline. ACM Transactions on Management Information Systems, 3(1), pp. 1 - 15. Nisan, N., Roughgarden, T., Tardos, E. & Vazir ani, V. V., 2007. Algorithmic Game Theory. 1st ed. New York: Cambridge University Press. Norta, A., 2015. Creation of Smart - Contracting Collaborations for Decentralized Autonomous Organizations. Singapore, Springer, pp. 3 - 17. Norta, A., 2017. Designing a S mart - Contract Application Layer for Transacting Decentralized Autonomous Organization. Singapore, Springer, pp. 595 - 604. Olafsson, J., 2011. An Experiment in Iceland: Crowdsourcing a Constitution? Working Paper. Orrison, R., Wilson, B. J. & Zillante, A., 2 004. Multiperson tournaments: an experimental examination. Management Science, Volume 50, pp. 268 - 279. Page, S. E., 2012. A Complexity Perspective on Institutional Design. Politics, Philosophy & Economics, II(1), pp. 5 - 25. 150 Paulus, P., Kohn, N., Arditti, L. & Korde, R., 2013. Understanding the Group Size Effect in Electronic Brainstorming. Small Group Research, 44(3), pp. 332 - 352. Pederson, J. et al., 2013. Conceptual Foundations of Crowdsourcing: A Review of IS Research. Hawaii, 46th International Conferenc e on System Sciences. Peffers, K., Tuunanen, T., Rothenberger, M. & Chatterjee, S., 2007. A Design Science Research Methodology for Information Systems Research. Journal of Management Information Systems, 24(3), pp. 45 - 77. Prestopnik, N., 2010. Theory, Des ign and Evaluation - (Don't Just) Pick any Two. Transactions on Human - Computer Interaction, 2(4), pp. 167 - 177. Prestopnik, N. & Crowston, K., 2012. 
Exploring Collective Intelligence Games with Design Science: A Citizen Science Design Case. Sanibel Island, Florida, ACM Group Conference. Pullinger, K., 2007. Living with A Million Penguins: inside the wiki - novel. Available at: https://www.theguardian.com/books/booksblog/2007/mar/12/livingwithamillionpenguins Purao, S., 2002. Design Research in the Technology of Information Systems: Truth or Dare, Atlanta: Department of Computer Information Systems, Georgia State University. Radin, M., 2000. Humans, Computers, and Binding Commitment. Indiana Law Journal, 75(4), pp. 1125 - 1162. Radin, M., 2004. Regulation by Cont ract, Regulation by Machine. Journal of Institutional and Theoretical Economics, Volume 160, pp. 1 - 15. Raykar, V. C. et al., 2010. Learning from crowds. Journal of Machine Learning Research, 11(7), pp. 1297 - 1322. Ren, J., 2011. Exploring the Process of Web - based Crowdsourcing Innovation. Detroit, Michigan, AMCIS Proceedings. Ren, J. et al., 2014. Increasing the Crowd's Capacity to Create: How Alternative Generation Affects the Diversity, Relevance and Effectiveness of Generated Ads. Decision Support Systems , 1(1), pp. 1 - 12. Ren, J., Ozturk, P. & Yeoh, W., 2017. Online Crowdsourcing Campaigns: Bottom - Up versus Top - Down Process Model. Journal of Computational Information Systems, 1(1), pp. 1 - 12. Rosen, S., 1986. Prizes and incentive in elimination tournaments. American Economic Review, Volume 76, pp. 701 - 715. Sakamoto, Y. & Bao, J., 2011. Testing Tournament Selection in Creative Problem Solving Using Crowds. Shanghai, 32nd International Conference on Information Systems. Scott, M. & Antonsson, E., 1999. Arrow's Theorem and Engineering Design Decision Making. Research in Engineering Design, 11(1), pp. 218 - 228. Shaw, A., Horton, J. & Chen, D., 2011. Designing Incentives for Inexpert Human Raters. Han gzhou, China, CSCW. 151 Shoham, Y. & Leyton - Brown, K., 2010. MULTIAGENT SYSTEMS: Algorithmic, Game - Theoretic, and Logical Foundations. 1.1 ed. Cambridge: http://www.masfoundations.org. Singla, A. & Krause, A., 2013. Truthful Incentives in Crowdsourcing Tasks u sing Regret Minimization Mechanisms. Rio de Janeiro, Brazil, World Wide Web Conference Committee (IW3C2). Surowiecki, J., 2004. The Wisdom of Crowds: Why the Many are Smarter than the Few and How Collective Wisdom Shapes Business, Economies, Societies, and Nations. New York, Doubleday. Szabo, N., 1997. Formalizing and Securing Relationships on Public Networks. First Monday, 2(9). Taylor, C., 1995. Digging for Golden Carrots: an Analysis of Research Tournaments. The American Economic Review, 85(4), pp. 872 - 8 90. Thuan, N. H., Antunes, P. & Johnstone, D., 2017. A Process Model for Establishing Business Process Crowdsourcing. Australasian Journal of Information Systems, 21(1), pp. 1 - 21. Valentine, M. A. et al., 2017. Flash Organizations: Crowdsourcing Complex Wo rk by Structuring Crowds as Organization. Denver, ACM CHI. Vincent, T. L., 1983. Game Theory as a Design Tool. Journal of Mechanisms, Transmissions, and Automation in Design, Volume 105, pp. 165 - 170. Wightman, D., 2010. Crowdsourcing Human - Based Computatio n. Reykjavik, Iceland, Proceedings of the 6th Nordic Conference on Human - Computer Interaction: Extending Boundaries, pp. 551 - 560. Wood, G., 2015. Ethereum: A Secured Decentralised Generalised Transaction Ledger, s.l.: Ethereum. Wu, H., Corney, J. & Grant, M., 2015. An evaluation methodology for crowdsourced design. Advanced Engineering Informatics, Volume 29, pp. 775 - 786. Yu, L., 2011. 
Crowd Creativity through Combination. Atlanta, Georgia, USA, C&C ACM 978 - 1 - 4503 - 0820. Yu, L. & Nickerson, J., 2011. Cooks o r Cobblers? Crowd Creativity through Combination. Vancouver, BC, CHI. Yu, L. & Nickerson, J., 2011. Generating Creative Ideas through Crowds: An Experimental Study of Combination. Shanghai, 32nd International Conference on Information Systems. Yu, L. & Nic kerson, J., 2013. An Internet Scale Idea Generation System. ACM Transactions on Interactive Intelligent Systems, 3(1), pp. 1 - 24. Yu, L. & Sakamoto, Y., 2011. Features Selection in Crowd Creativity. FAC, HCII, LNAI, pp. 383 - 392. Zhang, X., Venkatesh, V. & B rown, S. A., 2011. Designing Collaborative Systems to Enhance Team Performance. Journal of the Association for Information Systems, 12(8), pp. 556 - 584.