DECODE PHENOME-GENOME INTERACTIONS: A DATA SCIENCE APPROACH

By Abhijnan Chattopadhyay

A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Statistics – Doctor of Philosophy

2022

ABSTRACT

DECODE PHENOME-GENOME INTERACTIONS: A DATA SCIENCE APPROACH

By Abhijnan Chattopadhyay

The responses of plants to their environments are determined by multiple interacting genetic factors that may themselves operate through numerous biological mechanisms. Disentangling these complex genome-by-environment interactions is a significant challenge to understanding the underlying biology and developing more robust crops. This dissertation integrates high-throughput phenotyping and genome sequencing and aims to harness these multidimensional interactions to test whether different genetic components affect biological processes through similar or distinct mechanisms. First, we present a comparison of different methods that can be practically used for genome-enabled prediction and selection, with the help of synthetic datasets with varying levels of difficulty and variability. Using such tools, we have found that multiple traits are modulated by similar genomic regions, termed "co-localization." But the question remains: how can one test for co-segregation, or co-linkages, of multiple phenotypes to specific genetic polymorphisms? From domain knowledge, we can argue that various physical modes of interaction exist among photosynthetic processes, which result in distinct patterns of interactions between measured parameters. We propose a Bayesian latent variable (LV) approach that imitates such physical modes of interaction among photosynthetic processes by projecting the multivariate phenotypes onto lower-dimensional latent factors. The entries of the loading matrix (the connection between the multidimensional phenotypes and the LVs) are estimated through the Automatic Relevance Determination (ARD) prior, which can automatically remove irrelevant latent factors and adds immediate interpretability. This means that, for a single genotype, the observed latent factors will likely reflect the effects of environmental or developmental influences on mechanistic interconnections. Moreover, these low-dimensional structures (latent factors) can be genetically mapped using quantitative trait loci (QTL) mapping and can be validated against the linkages from co-localized traits obtained from univariate QTL analysis. The added advantage of our approach is that we can describe specific classes of relationships among multiple phenotypes governed by specific genetic regions, shared across or specific to environments, which can be further used to distinguish functional and genetic linkages among a range of photosynthetic regulatory processes. We extended our setup to integrate multiple environments and showed that the latent variables, either specific to one treatment or shared by several treatments, can be mapped to distinct genetic loci, revealing specific genetic polymorphisms that alter the co-regulatory network among phenotypes in Genotype × Phenotype × Environmental space. The final piece of my work models the association/correlation between phenotypes as a function of genetic and environmental explanatory variables to pin down distinct mechanisms. We develop an efficient estimation methodology called Correlation Modeling under Pairwise Likelihood Estimation (CMPLE), aided by a novel Minorize-Maximize (MM) algorithm, and provide statistical inference techniques.
Simulation studies mimicking biological data show that the method is beneficial for recovering pertinent information, including different regulatory pathways, and is computationally efficient in handling many parameters. Our approach is also illustrated by analyzing a motivating dataset from recombinant inbred cowpea lines. Using CMPLE, we can identify the specific genetic variations affecting distinct biological mechanisms, namely "Photoprotection" and "Photoinhibition," under various environmental conditions.

Dedicated to the memory of Atindra Mohon Sarkar (Dadu), Gita Sarkar (Didun), and Binapani Chattopadhyay (Thakuma).

ACKNOWLEDGEMENTS

First and foremost, I am incredibly grateful to my supervisors, Dr. Tapabrata Maiti and Dr. David Mark Kramer, for their invaluable advice, continuous support, and patience during my Ph.D. study. Both have seen my ups and downs as a researcher and kept pushing me for the greater good. It has been an enormous honor for me to have you both as my academic advisors at Michigan State University. You have inspired me with your immense knowledge and experience and helped shape me as a better researcher and person. This dissertation would not have been possible without your dedication, advice, continuous encouragement, invaluable guidance, and persistent help. Next, I thank Dr. Samiran Sinha, whose comprehensive support influenced my statistical methods and critical thinking. I have significantly benefited from your wealth of knowledge and meticulous editing. Also, thanks to my committee members, Dr. Chih-Li Sung and Dr. Shrijita Bhattacharya, who offered guidance and support. I am also indebted to several members of our lab. Special thanks to Isaac, Oliver, and Donghee, with whom I spent countless hours running different statistical models and debugging code. I gratefully recognize the help of Sebastian and Atsuko for lending biological explanations for all the questions I had regarding photosynthesis. Also, thanks to the lovely family of STT for your overwhelming love and support. Thanks to my roommates, Anurag and Dipti, for making East Lansing a home away from home. Thank you, Atri, Rejada, and Tathagata, for countless hours of playing FIFA and letting me win. Thank you, Anushree, Alex, Shreya, and Sneha, for constantly listening to me rant and talking things out when things became too severe. Lastly, my family deserves endless gratitude: my father for teaching me to appreciate persistence and inculcating a passion for research, my mother for teaching me the act of being selfless, and my brother for teaching me that it is not over until it is over. To my family, I give everything, including this.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
CHAPTER 1 PHENOME-BY-GENOME-BY-ENVIRONMENT INTERACTIONS AND THE SCOPE OF DATA SCIENCE
1.1 Background
1.2 Photosynthetic model and Electron transfer chain
1.3 Biological Questions and Big Data Platform
1.3.1 Facilitating science to generate "benchmark" models
1.3.2 Generating Hypothesis and regulatory Pathways
1.4 Scope of Data Science
CHAPTER 2 FINDING THE BEST TOOL FOR GENOME-ENABLED-PREDICTION: A COMPARISON STUDY
2.1 Background
2.2 Methods and Materials
2.2.1 Bayesian Linear Models
2.2.2 Genomic-BLUP
2.2.3 LASSONET
2.3 Experiments
2.3.1 Simulation setup 1: Significant markers in equispaced locations
2.3.2 Simulation setup 2: Significant markers in one cluster
2.3.3 Simulation setup 3: Significant markers in two clusters
2.4 Discussion
CHAPTER 3 BAYESIAN LATENT FACTOR MODELS TO DIFFERENTIATING GENETIC AND MECHANISTIC BASES OF PHOTOSYNTHESIS
3.1 Background
3.2 Materials and methods
3.2.1 Linkage maps using QTL mapping
3.3 Bayesian Latent Factor Models
3.3.1 Bayesian Factor Analysis (BFA)
3.3.2 Bayesian Canonical Correlation Analysis (BCCA)
3.3.3 Bayesian Group Factor Analysis (BGFA)
3.3.4 Mean Field Variational Approximation
3.4 Results
3.5 Discussion
CHAPTER 4 CMPLE TO DECODE PHOTOSYNTHESIS USING THE MINORIZE-MAXIMIZE ALGORITHM
4.1 Motivation
4.1.1 General background
4.1.2 Contributions to the literature
4.2 Models and notations
4.2.1 Background
4.2.2 Correlation modeling
4.2.3 Standard deviation modeling
4.3 Estimation methodology
4.3.1 Composite likelihood
4.3.2 The MM algorithm
4.4 Inference
4.5 Simulation studies
4.5.1 Simulation design
4.5.2 Method of analysis
4.5.3 Results
4.5.4 Computational advantage
4.6 Data Example
4.6.1 Background
4.6.2 Method of analyses
4.6.3 Interpretation
4.7 Discussion
CHAPTER 5 IMPACT OF THIS DISSERTATION
APPENDICES
APPENDIX A SUPPLEMENT FOR PHENOME-BY-GENOME-BY-ENVIRONMENT INTERACTIONS AND THE SCOPE OF DATA SCIENCE
APPENDIX B SUPPLEMENT FOR CMPLE TO DECODE PHOTOSYNTHESIS USING THE MINORIZE-MAXIMIZE ALGORITHM
BIBLIOGRAPHY

LIST OF TABLES

Table 2.1: Prediction performance from simulation setup 1 with heritability score as 0.2 and 0.5. Cor with signal: correlation of predicted response with the signal, Cor with actual y: correlation of predicted response with actual response, MSE: Mean square error
Table 2.2: Prediction performance from simulation setup 2 with heritability score as 0.2 and 0.5. Cor with signal: correlation of predicted response with the signal, Cor with actual y: correlation of predicted response with actual response, MSE: Mean square error
Table 2.3: Prediction performance from simulation setup 3 with heritability score as 0.2 and 0.5. Cor with signal: correlation of predicted response with the signal, Cor with actual y: correlation of predicted response with actual response, MSE: Mean square error
Table 4.1: Results of the simulation study for scenario 1 with n = 261, p = 2, q = 4. All entries of the table except for the true parameter values are multiplied by 100. Par: Parameter, SD: standard deviation, SE: standard error, CP: 95% coverage probability, RMSE: root mean squared error
Table 4.2: Results of the simulation study for scenario 2 with n = 600, p = 2, q = 4. All entries of the table except for the true parameter values are multiplied by 100. Par: Parameter, SD: standard deviation, SE: standard error, CP: 95% coverage probability, RMSE: root mean squared error
Table 4.3: Results of α parameters from the simulation study for scenario 3 with n = 500, p = 6, q = 4. All entries of the table except for the true parameter values are multiplied by 100. Par: Parameter, SD: standard deviation, SE: standard error, CP: 95% coverage probability, RMSE: root mean squared error
Table 4.4: Results of δ parameters from the simulation study for scenario 3 with n = 500, p = 6, q = 4. All entries of the table except for the true parameter values are multiplied by 100. Par: Parameter, SD: standard deviation, SE: standard error, CP: 95% coverage probability, RMSE: root mean squared error
Table 4.5: Average computation time (in seconds) using the MM algorithm and direct optimization (DOP) via the optim function with the "L-BFGS-B" method for 100 simulations under different scenarios.
Table 4.6: Parameter estimates and the 95% confidence intervals in parentheses of the parameters of the standard deviation model for the measured phenotypes from the cowpea dataset.
Table 4.7: Parameter estimates and 95% confidence intervals in parentheses of the parameters of pairwise correlation among the measured phenotypes from the cowpea dataset.
Table 4.8: Average marginal effect estimates and 95% confidence intervals in parentheses of the pairwise correlations between phenotypes based on predictors.
Table 4.9: Pairwise correlation estimates and 95% confidence intervals in parentheses of the measured phenotypes from all genetic combinations of Marker 1 and Marker 2 from the cowpea dataset under Control temperature.
Table 4.10: Pairwise correlation estimates and 95% confidence intervals in parentheses of the measured phenotypes from all genetic combinations of Marker 1 and Marker 2 from the cowpea dataset under Low temperature.
Table B.1: Simulation results for α parameters under scenario 4 with n = 1000, p = 10, q = 4. All entries of the table except for the true parameter values are multiplied by 100. Par: Parameter, SD: standard deviation, DOP: direct optimization, MM: minorize-maximize
Table B.2: Simulation results for δ parameters under scenario 4 with n = 1000, p = 10, q = 4. All entries of the table except for the true parameter values are multiplied by 100. Par: Parameter, SD: standard deviation, DOP: direct optimization, MM: minorize-maximize

LIST OF FIGURES

Figure 1.1: Real data example identifying photoprotection and photodamage being regulated by different genetic variations under different environmental conditions
Figure 1.2: Simplified schematics for regulating light energy capture and storage by plant photosynthesis
Figure 1.3: Flowchart of photo-protection and photo-damage through a purely correlative scheme
Figure 1.4: Three basic mechanistic models describing proposed processes that can limit the LPs of photosynthetic and photoprotective mechanisms
Figure 1.5: Relationships among measured parameters, predicted model behaviours and clustering
Figure 1.6: Correlations among different phenotypes under different conditions: A) Control/Pre-stress, B) DHS, C) recovery after DHS (RecD), D) LHS, and E) recovery after LHS (RecL).
Figure 2.1: Choice of λ and LASSONET path
Figure 3.1: Variation in photosynthetic parameters and leaf temperature across the different treatments.
A-H) Violin and box plots showing the distribution of various parameters among the RILs and parental lines. The red marker indicates the mean of all genotypes. Con = Control, DHS = dark heat stress, LHS = light heat stress, RecD = recovery after DHS and RecL = recovery after LHS. I-J) Correlation and density plots between qL and ϕII under Control and I) DHS or J) LHS using the raw data for each treatment.
Figure 3.2: Genetic and phenotypic linkages among multiple photosynthetic processes. LOD scores for different parameters are presented for Control/Pre-stress (Con) (left-most panel), DHS (middle panel), and LHS (right-most panel). Chromosomes are separated by transparent colors with faint lines for borders.
Figure 3.3: Genetic and phenotypic linkages among multiple photosynthetic processes. LOD scores for different parameters are presented for Recovery after DHS (RecD) (left panel) and Recovery after LHS (RecL) (right panel). Chromosomes are separated by transparent colors with faint lines for borders.
Figure 3.4: A hypothetical illustration of expected relationships between latent variables, correlations among measured parameters, and genetic components
Figure 3.5: Bayesian Factor Analysis coupled with QTL mapping for genetic linkage between photosynthetic traits under Control/Pre-stress (Con) (left-most panel), DHS (middle panel), LHS (right-most panel).
Figure 3.6: Bayesian Factor Analysis coupled with QTL mapping for genetic linkage between photosynthetic traits under Recovery after dark heat stress (RecD) (left panel), and Recovery after light heat stress (RecL) (right panel).
Figure 3.7: Bayesian Canonical Correlation Analysis coupled with QTL mapping for genetic linkage between photosynthetic traits under Control (Con) and dark heat stress (DHS).
Figure 3.8: Bayesian Canonical Correlation Analysis coupled with QTL mapping for genetic linkage between photosynthetic traits under dark heat stress (DHS) and light heat stress (LHS)
Figure 3.9: Bayesian Group Factor Analysis of photosynthetic traits under Control (Con), dark heat stress (DHS), light heat stress (LHS), Recovery after dark heat stress (RecD), and Recovery after light heat stress (RecL).
Figure 3.10: QTL analysis of resulting LVs from BGFA
Figure 4.1: Average computational time comparison between CMPLE and the direct optimization method (DOP) for 100 simulations
Figure 4.2: QTL plot of different phenotypes used in the Cowpea RIL data. The LOD threshold for each phenotype is marked by the bold horizontal line. QTL with a LOD higher than that can be considered significant. Chromosomes are marked by the vertical lines.
Figure A.1: Light and temperature effects on LEF and photosystem II quantum efficiency (ϕII). Each parameter was plotted as a function of the square root of the ambient photosynthetically active radiation (PARamb, X-axis) and leaf temperature (Tleaf, coloration of points).
(a) Dependencies of LEF measured at PARamb; (b) LEF measured at 10 s high light (LEF_high); (c) the high light-induced differences in LEF (LEF_high-amb); (d) the PSII quantum efficiencies measured under ambient PAR (Phi2_amb, points coloured by Tleaf) and at 10 s high light (Phi2_high, grey points).
Figure A.2: Gaussian Mixture Model (GMM) clustering of LEFamb (Panel A) and correlation matrices between LEFamb, PARamb and leaf temperature (Tleaf) for each cluster (Panel B).
Figure A.3: Gaussian Mixture Model (GMM) clustering of LEFhigh (Panel A) and correlation matrices between LEFhigh, PARamb and leaf temperature (Tleaf) for each cluster (Panel B).
Figure B.1: Correlation Modeling Under Pairwise Likelihood Estimation (CMPLE) workflow
Figure B.2: Correlation network under DHS
Figure B.3: Correlation network under DHS and LHS

CHAPTER 1
PHENOME-BY-GENOME-BY-ENVIRONMENT INTERACTIONS AND THE SCOPE OF DATA SCIENCE

Portions of this chapter appeared in the following publication: A. Kanazawa, A. Chattopadhyay, S. Kuhlgert, H. Tuitupou, T. Maiti, and D. M. Kramer, "Light potentials of photosynthetic energy storage in the field: what limits the ability to use or dissipate rapidly increased light energy?," Royal Society of Open Science, vol. 8, p. 211102, 2021.

1.1 Background

Generation and testing of models (or hypotheses) are essential components of the scientific method. The development of Artificial Intelligence (AI) and Machine Learning (ML) promises algorithms, tools and techniques that can identify previously unseen connections between phenomena. Though these AI methods can reveal new correlations and connections, they do not provide mechanistic insights. Indeed, it is often unclear how the unseen networks of ML operate, or whether the algorithms they develop have any relationship to the true mechanisms that govern the phenomena. Different ML approaches may lead to similar predictions, yet the inferences drawn from their algorithms may be unrelated to the true physical processes. This issue also affects the robustness of AI/ML for making universally applicable predictions. The fact that multiple, non-mechanistic algorithms can fit limited sets of data gives rise to the phenomenon of "overfitting," in which model outputs can provide excellent fits to subsets of data that are not universally applicable. More physically realistic models may overcome these issues by constraining algorithms to those that are physically realistic (based on universally applicable models). The lack of tethering of AI/ML to physical models motivates our proposed work to "bridge the gap" between scientifically feasible phenomena and AI-driven models using experimental data. The aims include the development of tools that allow AI algorithms to be compared to or constrained by hypothetical (physically relevant) models, thereby enabling "classical" scientific hypothesis testing as well as the generation of more universally applicable models, reducing the occurrence of overfitting. We chose as a use case understanding how solar energy transduction enables and limits the energy productivity of crops, which is critical for improving the productivity and resilience of crops in a rapidly changing world.
Recent development of large-scale genotyping and phenotyping technologies demonstrates the opportunity to harness natural and induced variation in photosynthetic processes across various environmental conditions. The major scientific challenge is to understand the complex interactions among the genomics, environment and performance (phenotypes) of plant photosynthesis, a hyper-dimensional problem that is difficult for unaided human understanding. Our ultimate aim is to enable global analyses of the flood of data from these technologies to generate and test models relevant to meaningful biological functions. These models will represent hypotheses that can be directly tested using more reductionist approaches in the lab. These tools and models will then be used to identify genetic components that can account for the observed diversity of genotype and phenotype variation and can be used as targets for advanced breeding and engineering efforts. New high-throughput phenotyping platforms that can rapidly measure multiple phenotypes allow us to compare the genomic associations of multiple traits. Such platforms generate data "hyper-cubes" that can relate a wide range of parameters (reflecting potentially linked traits), metadata (e.g., environmental conditions) and genomic content. Here, we explore the possibility of using such "co-association" (or co-segregation) maps of hyper-cubic data sets to test models that predict functional and genetic linkages among a range of photosynthetic regulatory processes. We propose to develop a new class of generative models based on dimensionality reduction methods for high-dimensional phenomics networks. These networks provide biologically significant clusters of interrelated photosynthesis traits, which can be regarded as fundamental mechanisms across different genotypes and different environmental stresses. Representation of such dynamics across "Genotype × Phenotype × Environmental" space is crucial to understanding the mechanistic bases of the "true" phenomenon and motivates researchers to use such structural models with different crops in different climates.

1.2 Photosynthetic model and Electron transfer chain

Consider the case of light capture by photosynthesis [1]. In chloroplasts, photosynthesis can be initiated when light energy is absorbed by pigments (chlorophylls and specific carotenoids). Using high-throughput plant phenotyping tools, it is possible to rapidly measure a range of parameters that reflect distinct, mechanistically related processes related to photosynthetic efficiency for different genotypes under various environmental conditions (light intensity, temperature, humidity, CO2 levels, time and location). Interpretation of these parameters is based on the literature [2, 3, 4]. Also, through affordable sequencing processes, the gene expression of any population can be easily obtained, and one can identify SNP markers that are significantly associated with any phenotype of interest. Under environmental stresses, e.g., high light intensities, high or low temperatures, or lack of water, light input can exceed the capacity to perform photochemistry. This leads to the buildup of photochemical intermediates that can initiate the formation of reactive oxygen species and subsequent photodamage to the photosynthetic machinery, while decreasing the efficiency of photochemistry [5, 6].
Chloroplasts can protect themselves from photodamage by activating various "nonphotochemical quenching" (NPQ) processes that dissipate absorbed light energy, decreasing the accumulation of reactive intermediates. While NPQ can alleviate photodamage, it also decreases photochemical efficiency, and thus the regulation of NPQ is finely adjusted by the chloroplast to balance these tradeoffs. There are multiple forms of NPQ (rapidly formed "energy-dependent" quenching, qE, and slowly activated photo-inhibitory quenching, qI), which are activated under different environmental conditions and modified by genetic variations [7, 8]. These altered NPQ responses ("total" NPQ, designated NPQt [9]) can contribute to the canonical qE mechanism, where the prediction is that the extent of qE will be positively associated with increased lumen acidification, which will be reflected in a positive correlation between NPQt and the thylakoid pmf, in our case estimated by the ECSt parameter [10]. This predicted association should be modified or broken down under certain conditions or in mutants that lack key components required for activation of the qE response. In some cases, the breakdown in normal photoprotective mechanisms can lead to the buildup of a large fraction of photodamaged PSII centers, as reflected in increased qI (slowly reversible NPQ associated with photodamage). The associated loss of photochemical activity can lead to decreased electron and proton transfer, resulting in decreased pmf, which will be reflected in a negative correlation between NPQt and ECSt. As empirical evidence, Figure 1.1 shows a positive correlation between NPQt and ECSt in a cowpea RIL population under the control temperature for a genetic combination of QTL markers on chromosomes 4 and 9 (genotypes with the AA allele for both markers) [11]. On the other hand, NPQt and ECSt are negatively correlated under the low temperature (chilling stress) for a different genetic combination of QTL markers on chromosomes 4 and 9 (genotypes with the AA allele for the first marker and the BB allele for the second).

Figure 1.1: Real data example identifying photoprotection and photodamage being regulated by different genetic variations under different environmental conditions (NPQt plotted against ECSt under Control and Low temperature).

Under ideal conditions, a large fraction of solar energy is used to drive photochemical reactions. This fraction is usually termed the quantum yield of photochemistry. Productive photochemistry induces a series of electron and proton transfer reactions, resulting in the formation of biochemical energy-storing products, ATP and NADPH, which in turn are used to drive the fixation of CO2 and other cellular processes. These electron transfers involve two chlorophyll-containing complexes, Photosystem I (PS I) and Photosystem II (PS II), which are essentially connected by the cytochrome b6f complex and the mobile electron carriers plastoquinone/plastoquinol (PQ/PQH2) and plastocyanin (PC). In "non-cyclic photophosphorylation," PS II oxidizes water and releases protons into the lumen; electrons travel down an electron transport chain to PS I, forming an electrochemical proton gradient (pmf, proton motive force), and are passed to NADP+ to make NADPH (Figure 1.2). Under different abiotic stresses, plants regulate their photosynthetic machinery by triggering various nonphotochemical quenching processes (NPQ).
Process (A) (energy-dependent NPQ, qE) is activated by acidification of the thylakoid lumen, resulting in quenching of excitation energy through the qE mechanism. This is reflected by the positive correlation between NPQt and ECSt. On the other hand, formation of reactive oxygen species can damage PS II, resulting in Process (B) (long-lived photoinhibition-related NPQ, qI) and decreasing the number of active PS II centers, which can be observed through the negative correlation between NPQt and ECSt. These two processes are illustrated in Figure 1.3. It is noteworthy that these two forms of NPQ are both induced under conditions where light input exceeds capacity and have similar effects on photochemical efficiency.

Figure 1.2: Simplified schematics for regulating light energy capture and storage by plant photosynthesis (components shown include PsbS, LHCII, PS II, PQ/PQH2, b6f, PC, PS I, the ATP synthase, the qE and qI processes, and PS II damage and repair).

However, the qE form is typically considered to act as a primary photoprotective mechanism and is readily reversed. In contrast, the qI form involves protein damage, the repair of which requires degradation and resynthesis of the PS II D1 protein, and is thus considered to reflect more severe responses [12]. These patterns are highly influenced by a number of other factors as well. [13] showed that under lower CO2 and increasing light, there is a rapid drop in the yield of PS II (ϕII) and a corresponding rapid rise in the yield of NPQ, together with a decrease in qL. But under high CO2 there is a slower drop in the yield of PS II and qL with increasing light, and a slower rise in the yield of NPQ. This shows that multiple parts of the photosynthetic machinery participate in co-regulating the quenching behaviours.

Figure 1.3: Flowchart of photo-protection and photo-damage through a purely correlative scheme (A: the qE mechanism, in which genetic diversity and environmental conditions drive photo-protection and a positive correlation between NPQt and ECSt; B: the qI mechanism, in which photodamage produces a negative correlation between NPQt and ECSt).

This apparent connection between NPQt and ECSt is further impacted by other photosynthetic responses, e.g., ϕII, qL, etc., as illustrated below. Thus, we have complex interactions among numerous phenotypic responses that can impact the "beneficial" photoprotective and "harmful" photoinhibitory (photodamage) mechanisms. In the real world, the fine balance of such phenotype associations breaks down with certain genetic and environmental predictors. Here we aim to associate genetic markers with the corresponding combinations of environmental conditions modulating the contributions from these two forms of regulatory mechanism. In fact, by identifying this co-segregation, we can gain better insights into the genetic and environmental determinants of variation in biological systems.
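To make the co-segregation idea concrete, the following is a minimal sketch (not the dissertation's actual analysis code) of how one might screen for genotype-by-environment combinations that flip the sign of the correlation between NPQt and ECSt. The column names (NPQt, ECSt, marker1, marker2, temperature) are hypothetical placeholders for a long-format phenotype table such as the cowpea RIL data.

```python
import pandas as pd
from scipy.stats import pearsonr

def correlation_by_group(df: pd.DataFrame,
                         x: str = "ECSt",
                         y: str = "NPQt",
                         group_cols=("marker1", "marker2", "temperature")):
    """Estimate the x-y Pearson correlation within each genotype x environment cell.

    Returns one row per cell with the correlation, its p-value, and the cell size,
    so sign flips (photoprotection-like vs. photoinhibition-like behaviour) can be
    spotted directly.
    """
    rows = []
    for keys, cell in df.groupby(list(group_cols)):
        if len(cell) < 3:          # too few lines to estimate a correlation
            continue
        r, p = pearsonr(cell[x], cell[y])
        rows.append(dict(zip(group_cols, keys), n=len(cell), corr=r, p_value=p))
    return pd.DataFrame(rows).sort_values("corr")

# Usage (assuming one row per RIL measurement):
# pheno = pd.read_csv("cowpea_ril_phenotypes.csv")
# print(correlation_by_group(pheno))
```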
1.3 Biological Questions and Big Data Platform

1.3.1 Facilitating science to generate "benchmark" models

Imagine yourself as a manufacturer of a new model of car. You have built a world-class engineering laboratory to design and produce a car which might be the future of the automobile industry. The only problem is that you are yet to get a licence to test your model of car outside the lab. So, you would not be able to tell how it might behave on freeways or winding roads, or how the tires might fare in snowy conditions. Even if you have carefully designed the new car, you are quite uncertain about its performance in real road conditions. A similar problem has been faced by scientists studying plants and, in particular, photosynthesis, the process by which plants convert light energy into chemical energy, generating all our food. This natural process involves the net movement of electrons through a series of electron carriers performing a series of chemical reactions known as the light-dependent reactions. In short, light energy is absorbed by pigment molecules, which pass excited electrons to an electron transport chain, activating an energetically "downhill" flow of electrons and thus leading to the synthesis of ATP and NADPH. Photosynthesis experts around the globe have been studying photosynthesis in their labs with sophisticated instruments and under specific controlled conditions. From such "reductionist" experiments, researchers are able to dissect the complex processes of electron transfer into the different component parts of the photosystems. Results from such research provide a detailed framework of the wonderful biological machine which has powered life for over a billion years. Such a versatile platform enables researchers to demonstrate novel processes under dynamic environmental conditions. Not only that, science has enlightened us with the genomic architecture of photosynthetic systems, using genomic sequencing to identify combinations of genetic loci associated with specific traits and to generate elite lines with combinations of those traits through marker-assisted breeding. Co-assessing genomic information identifies potentially important genetic loci, helping plants to cope with environmental changes and perils. Coupling sophisticated phenotyping with gene sequencing opens the possibility of testing models that predict functional and genetic linkages among a range of photosynthetic regulatory processes. With reproducible characteristics at the core of the data-generating mechanism, such platforms and the data they generate have the potential to serve as "benchmark" models for phenomics applications.

1.3.2 Generating Hypothesis and regulatory Pathways

Biological schematics do not always behave as expected outside the lab. Photosynthesis is highly sensitive to rapid changes in environmental conditions such as light, temperature, humidity, and the availability of water and other nutrients. Understanding how different photosynthetic parameters respond to rapid fluctuations in environmental conditions is critical for plant productivity and the avoidance of photodamage. With this goal of bringing "Nature to the lab," we developed an experimental approach using the open-science PhotosynQ platform to probe the "Light Potentials" of photosynthetic processes in response to rapid increases or decreases in ambient light. In this work, we describe an approach to studying the extents and mechanisms of the diversity of such dynamic responses in the field. In a selected set of data on Mentha, we show that the capacities to increase LEF and NPQ upon rapid increases in light are strongly suppressed in leaves previously exposed to low ambient PAR or low leaf temperature. A simple linear effects model applied over the entire data set indicated strong correlations between LEFamb, PARamb, and Tleaf, suggesting that both environmental factors controlled LEFamb.
However, such correlations may be coincidental, since PAR and Tleaf are both expected to depend on weather or time of day, as is clear from the solid statistical correlations between PAR and Tleaf. Also, the effects are likely to be co-dependent. For example, at low PARamb, LEFamb should be light-limited and thus have minimal dependence on Tleaf. Still, at higher PARamb, it may be more strongly controlled by temperature-dependent processes. One approach to disentangling these effects would be to slice the data into segments, e.g., at different ranges of PARamb, and test for correlations with Tleaf within each piece. However, arbitrarily chosen ranges can add bias or fail to detect more complex interactions. We thus applied a Gaussian Mixture Model (GMM) clustering approach based on those presented earlier. Because GMM is an unsupervised machine learning method, it can reduce bias in selecting clusters representing regions of distinct interactions among environmental and photosynthetic parameters. GMM assumes that the data points from the population of interest are drawn from a combination (or mixture) of Gaussian distributions with specific parameters and fits a sum of several Gaussian distributions via an optimization scheme, allowing for an entirely unsupervised process and avoiding potential user bias. An expectation-maximization (EM) algorithm was used to fit the GMM to the dataset, generating a series of Gaussian components (clusters) with distributions characterized by specific means and covariance matrices. The optimal number of groups was determined using the Bayesian Information Criterion (BIC), the value of the maximized log-likelihood with a penalty on the number of parameters in the model. This approach also allows the comparison of models with differing parameterizations and differing numbers of clusters, because the volumes, shapes, and orientations of the covariances can be constrained to those described by defined models. Clusters obtained through GMM exhibit both within-cluster (intracluster) and between-cluster (intercluster) variation. Intracluster variations can be analyzed to determine variations in the interactions between parameters and variations in environmental conditions, e.g., to assess whether a relationship is modulated in different ways under different ranges of conditions. Also, as will be seen in the Discussion, intercluster variations (differences in the means and covariances between clusters) can be used to differentiate distinct patterns of behavior, or mechanistic interactions, between conditions.

Figure 1.4: Three basic mechanistic models describing proposed processes that can limit the LPs of photosynthetic and photoprotective mechanisms

Using an unsupervised statistical clustering approach, we showed that these effects could be independent of each other under some environmental conditions while likely interacting under others. This enables us to compare the responses of multiple photosynthetic processes, and we were able to test for contributions from several mechanistic models (Figure 1.4) for limitations to LEF and NPQ potentials: 1) limitations in photosystem I (PSI) electron acceptors; 2) increased thylakoid proton motive force (pmf) leading to rapid increases in NPQ in the form of qE; and 3) increased pmf leading to robust photosynthetic control of plastoquinol oxidation at the cytochrome b6f complex (PCON).
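The GMM-plus-BIC selection described above can be sketched in a few lines. The following is a generic illustration using scikit-learn rather than the exact analysis code used here, and the feature names (qL_high_amb, P_ox_high_amb, Tleaf) are placeholders for the measured inputs.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_gmm_by_bic(X: np.ndarray, max_components: int = 8,
                   covariance_type: str = "full", seed: int = 0):
    """Fit GMMs with 1..max_components clusters via EM and keep the lowest-BIC model.

    X is an (n_samples, n_features) array, e.g. columns for qL_high_amb,
    P_ox_high_amb, and Tleaf (placeholder names).
    """
    best_model, best_bic = None, np.inf
    for k in range(1, max_components + 1):
        gmm = GaussianMixture(n_components=k,
                              covariance_type=covariance_type,
                              random_state=seed).fit(X)
        bic = gmm.bic(X)   # maximized log-likelihood penalized by parameter count
        if bic < best_bic:
            best_model, best_bic = gmm, bic
    return best_model, best_model.predict(X)   # fitted mixture and hard cluster labels

# Usage:
# X = phenotype_table[["qL_high_amb", "P_ox_high_amb", "Tleaf"]].to_numpy()
# gmm, labels = fit_gmm_by_bic(X)
```

In scikit-learn, the covariance_type options ("full", "tied", "diag", "spherical") play the role of the constrained covariance families mentioned above, letting BIC compare both the number of clusters and the parameterization of their shapes.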
Figure 1.5: Relationships among measured parameters, predicted model behaviours and clustering

Figure 1.5b plots the dependence of NPQ_high-amb, which can be attributed to light-induced qE changes, on light-induced pmf changes (ECSt_high-amb). A generally positive correlation was observed between NPQ_high-amb and ECSt_high-amb, but with high variability, especially at higher values. Applying the clustering obtained for Figure 1.5a on top of the data in Figure 1.5b, we see that this variability can be explained by the environmental conditions and the modes of behaviour. Specifically, we see clear evidence for condition-dependent suppression of rapid activation of qE in response to increases in pmf. In particular, the sensitivities of NPQ_high-amb to ECSt_high-amb, as indicated by the slopes in Figure 1.5b, were smallest in clusters 1 (slope ∼ 1.6) and 2 (slope ∼ 17.7), which comprise those with Model 3-like behaviour and occurred at low Tleaf and PARamb values. Higher sensitivities of NPQ_high-amb to ECSt_high-amb were seen for clusters 3 (slope ∼ 28.1) and 4 (slope ∼ 35.1), which comprised those associated with Model 2 and intermediate behaviours and occurred at higher Tleaf and PARamb values. To assess what controlled the switch between Models 2 and 3, we performed GMM clustering (using qL_high-amb, P+_high-amb, and Tleaf as inputs). Four distinct clusters were observed (see symbol colours, Figure 1.5a). Intercluster comparisons show that points in clusters 1 and 2 fell exclusively in the region predicted for Model 3. Cluster 3 fell entirely within the region predicted for Model 2. Cluster 4 extended between these regions, possibly indicating contributions from both mechanisms. The clusters falling in the Model 3 region were associated with relatively low Tleaf (Figure 1.5c) and PARamb (Figure 1.5d), compared with those associated with Model 2 or intermediate behaviours, suggesting that Model 2 prevailed at higher Tleaf and/or PARamb, while Model 3 prevailed at lower values. Within the GMM clusters, qL_high-amb depended predominantly on Tleaf (cluster 3), PARamb (cluster 1), or both (clusters 2 and 4). This dependence suggests that Tleaf and PARamb acted either independently or cooperatively, depending on conditions, affecting the propensity for photosynthesis to adopt Model 2 or Model 3 behaviours. As a first-order test of the robustness of these clusters, we re-analysed randomly selected subpopulations of the data and obtained comparable results, i.e., results that we would interpret in similar ways, with subpopulations as small as 25% of the full dataset, suggesting that the clustering approach was reasonably robust. In summary, we found no evidence for Model 1 under any of our conditions, indicating that in Mentha, under our conditions, PSI was maintained in oxidized forms. At higher leaf temperatures, Model 2 prevailed, meaning robust control of induced LEF by NPQ. Strikingly, at lower leaf temperatures, we saw evidence for Model 3, in which high light induced increases in pmf but not in NPQ, resulting in a net reduction of QA and oxidation of P700. Thus, the results reveal considerable temperature-dependent limitations to NPQ, independent of the formation of pmf, that result in states likely to produce reactive oxygen species. This low-temperature limitation may thus represent a new target for improving the efficiency and robustness of photosynthesis.
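The per-cluster sensitivities quoted above are simply within-cluster regression slopes. A minimal sketch of that step is shown below, again with placeholder column names (NPQ_high_amb, ECSt_high_amb) and assuming cluster labels from a GMM fit like the one sketched earlier.

```python
import numpy as np
import pandas as pd

def slopes_by_cluster(df: pd.DataFrame, labels: np.ndarray,
                      x: str = "ECSt_high_amb", y: str = "NPQ_high_amb") -> pd.Series:
    """Ordinary least-squares slope of y on x within each cluster.

    A large slope indicates high sensitivity of rapid NPQ (qE) induction to the
    light-induced pmf change; small slopes flag clusters where that response
    appears suppressed (Model 3-like behaviour).
    """
    out = {}
    for k in np.unique(labels):
        cell = df.loc[labels == k, [x, y]].dropna()
        slope, intercept = np.polyfit(cell[x], cell[y], deg=1)
        out[k] = slope
    return pd.Series(out, name="slope").sort_index()

# Usage:
# print(slopes_by_cluster(phenotype_table, labels))
```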
1.4 Scope of Data Science

As the concept of co-association maps gains popularity in the photosynthesis community, both practitioners and scientists aim to harness the possibility of simultaneously gathering data and generating models to support the hypotheses invoked from them. In particular, with statistical guarantees, researchers are able to draw insights from such data and test hypotheses in the context of biological discovery. With the support of both experiments performed in the lab and data from the field, we want to accomplish a number of biological goals within the theme of this dissertation. In particular, we aim to answer the following questions:

Biological Query 1: Genome-wide regression and prediction performance

Genetic studies are highly complex in nature, as they involve the analysis of high-dimensional data, with phenotype(s) being regressed upon a large number of predictor variables. Even under a linear setup, the inherent association among the SNPs and the "heritability" factor make the estimation even more complex. One such example would be a complex phenotype where genetic markers from a specific cluster are attributable to a given phenotype. Since different statistical models address the regression problem differently and the nature of the "true" association between the response and the predictor variables is unknown, it is nearly impossible to make robust inference both in terms of prediction and variable selection. Here, we explore various methods, such as Bayesian linear models, G-BLUP (which incorporates genome information) and a shallow neural network approach called LASSONET, to compare prediction accuracy under varied complexity. Based on synthetic datasets encompassing various situations, we discuss the prediction performance and point out several advantages and disadvantages of the different methods.

Biological Query 2: Functional and Genetic linkages for Photosynthetic regulatory Process under different conditions

Plants behave differently with changes in environmental conditions, for example with respect to excess heat, excess light, or combinations of both (which we call stress). These behaviours of plants observed under different stress conditions are reflected by phenotypic variations in the measured photosynthetic parameters, giving rise to natural variation in photosynthetic processes under diverse environmental stresses. The differences in the interconnections between the photosynthetic parameters are evident from the following correlation matrices measured under different stresses:

Figure 1.6: Correlations among different phenotypes under different conditions: A) Control/Pre-stress, B) DHS, C) recovery after DHS (RecD), D) LHS, and E) recovery after LHS (RecL).

Studies confirm that these interconnections among different phenotypes are responsible for the mechanistic bases of adaptations to specific processes of photosynthesis that allow for greater fitness in specific environments. Also, much of what we know about natural variation within a species has come from studies that associate measured phenotypes with specific genomic components, resulting in the familiar quantitative trait loci (QTL) maps. Our objective is to identify specific genomic regions which can potentially regulate different photosynthetic processes across different conditions. Biologically speaking,
Biologically speaking, 14 we explore the possibility of using such “co-association” (or co-segregation) maps of hypercubic data sets (across "Genotype × Phenotype × Environmental" space) to test models that predict functional and genetic linkages among a range of photosynthetic regulatory processes. We explore statistical methods for assessing potential multidimensional linkages and discuss the types of scientific questions that can be asked using the approaches, as well as potential pitfalls. Biological Query 3: Correlation modeling of multivariate phenotypes in terms of genetic and environmental variables Quantitative genomics experiments aim to reveal underlying mechanisms that link genotypic variations with multiple biological responses (phenotypes). Interactions/correlations among vari- ous phenotypes give new insights into how genetic diversity may have tuned biological processes to enhance fitness under diverse conditions. Dealing with multivariate phenotypes along with the genetic and environmental interactions is a challenging task. One advantage of multivariate GWAS over univariate GWAS is that multivariate GWAS can handle across-trait correlation. But, mul- tivariate GWAS does not explicitly model across-trait correlation in terms of relevant predictors. Also, there exist several statistical methods that deal with variance covariance estimation under generalised linear model. But, in our knowledge, there are no work that can model the interactions between the multiple responses in terms of predictor variables. We therefore provide an innovative framework to dissect genetic configurations behind photosynthetic mechanisms by modeling stan- dard deviations and pairwise correlations among a set of multivariate phenotypes through genetic and environmental predictors. Specifically, our framework, "Correlation Modeling under Pairwise Likelihood Estimation", abbreviated as CMPLE is capable to recover pertinent biological models arising from multi-omics platforms, such as high-throughput phenotyping and genome sequencing. Besides, this procedure has the aspect of many desirable qualities, such as efficient computation, interpretability, and statistical foundation. Our main contributions in this regard are as follows • Note that, conventional maximum likelihood estimation of parameters fails when one per- forms standard deviation modeling and pairwise correlation modeling (in terms of predictors). 15 We have developed a pairwise-composite likelihood approach to estimate the model param- eters from the functional forms of standard deviation and correlations among multivariate responses. • We have proposed a gradient Minorize-Maximize (MM) algorithm to efficiently estimate the model parameters. This guarantees optimization convergence and computational advantages under the given setup. • Implementing CMPLE on a motivating dataset from a population of cowpea (Vigna unguic- ulata. (L.) Walp.), we have identified specific genetic variations affecting distinct biological mechanisms, namely “Photoprotection” and “Photoinhibition” under various environmental conditions. In summary, this research advances statistical approaches for the inference of predictors associated with pairwise correlations in a multivariate setup. Instead of estimating the covariance/precision matrix, the pairwise correlation modeling is relevant in Phenotype × Genotype × Environment association studies and helps discover novel bio-physiological pathways in Photosynthesis. 
CHAPTER 2
FINDING THE BEST TOOL FOR GENOME-ENABLED-PREDICTION: A COMPARISON STUDY

2.1 Background

The plummeting cost of gene sequencing and genetic marker assays has effectively made it possible to apply them to thousands of individuals in genetic studies. By integrating high-throughput phenotyping data with genomic information, scientists have led statistical genomics into a new era. The data quality and volume have helped researchers develop tools and techniques that can be efficiently applied for advanced breeding and cultivar improvement. Genome-wide association studies (GWAS) explore an individual's genetic potential and estimate phenotypes using single nucleotide polymorphism (SNP) markers. This process is known as "genome-enabled prediction" (genomic prediction) and can be used to determine breeding program selections (genomic selection). Commonly, genomic prediction is applied early in a breeding program to accelerate the overall selection procedure and thus helps increase the rate of genetic gain in multiple applications. The whole-genome regression (WGR) technique was initially proposed by Meuwissen et al. [14] and has been extensively applied for the genomic analysis of complex traits in plants [15], animals [16] and humans [17, 18]. In WGR, the response phenotypes are regressed over a large number of genetic markers concurrently, invoking the statistical challenge of the "curse of dimensionality" [19], in which the number of predictor variables (e.g., SNPs) is often much larger than the number of observations. This large-p-with-small-n regression (where n represents the sample size and p denotes the number of predictors) has been extensively studied in the statistical and machine learning (ML) literature. Some statistical methods developed for this purpose include Bayesian Ridge Regression (BRR), G-BLUP, BayesA, BayesB, BayesC, the Bayesian Lasso, etc. Machine learning techniques, such as Reproducing Kernel Hilbert Space regression, Gradient Boosting Trees, Random Forests, and Artificial Neural Networks, have also been implemented to cope with the challenge of high-dimensional regression and have proven to be effective in genetics. However, there are concerns regarding the best tool that can be used for simultaneous feature (marker) selection and complex trait (phenotype) prediction. Most statistical methods for phenotype prediction in WGR have linear models as their backbone. Nevertheless, in real datasets, the nature of the association might also be nonlinear. Machine learning methods such as shallow neural networks (e.g., single-layer NNs) have implemented nonparametric high-dimensional regression using nonlinear models and have been utilized in multiple applications in plant, animal, or human genetics. Some studies reflect that NNs can be efficiently trained to obtain high prediction accuracy, but no consistent evidence exists that NNs can outperform linear models. It has been documented that the results obtained from NNs are highly dependent on the genetic architecture, marker density, sample size, span of linkage disequilibrium, and the traits of interest of the species [18]. Hence, empirical evidence suggests that no single approach has uniform superiority across data sets and traits. Another aspect of WGR deals with choosing optimal SNP markers with high predictive accuracy for the phenotype of interest. Though a very large number of SNPs are genotyped for a study, most methods deal with one SNP at a time for genomic selection.
There are several reasons to consider all the SNPs for analysis. The marginal effects of each SNP might have a very different effect from their joint effects. One example of this behavior would be an SNP which is not related to a disease, but correlated with a causal SNP, and will have a marginal association with the disease. Another example might be to think of a situation where several SNPs may have weak marginal effects but strong joint effects. Conditional on causal SNPs which are already included in the model, one would expect that false-positive signals will tend to be weakened while marginally uncorrelated causal SNPs will have a better chance of being selected. Also, the predictive power of a single SNP is assumed to be pretty low. When utilizing a large number of relevant SNPs, one can also improve the prediction power by several folds. While working with a large amount of SNPs, one usually faces the difficulty that the number and extent of spurious associations between the response phenotype and the predictor SNPs increase rapidly 18 while including many predictors into the model. Also, additional challenges are embodied due to the weak effects of causal variants and strong linkage disequilibrium (LD) among SNPs. There is a large number of studies based on variable selection methods as discussed earlier, but most of those methods are statistically inaccurate and computationally infeasible for ultra high dimensional 𝑝. One of the techniques to resolve this problem is through the sure independence screening (SIS) method [20] by first reducing the dimension to a moderate scale (below sample size) by univariate correlation learning, and then selecting important predictors by a popular variable selection method, such as the LASSO. Similarly, Wu et al. [21] reduced the dimension of predictors to a relatively smaller size using a simple score criterion and subsequently applied the LASSO. One of the shortcomings of this approach is that important features which are marginally uncorrelated with response are more likely to be missed. This is because the univariate screening step is carried out using marginal correlations. One can also modify the SIS implementation by iterative sure independence screening (ISIS) procedure. In this procedure, one can iterate the SIS procedure conditional on the previously selected features which helps in capturing meaningful features that are marginally uncorrelated with the response. In a recent approach, LASSONET tackles this problem within a Neural Network framework, where feature sparsity is attained by incorporating a skip layer [22]. It has proven to perform significantly better with simultaneous feature selection and prediction problems. In this study, we present a comparison of different methods that can be practically used for genome enabled prediction and selection problems. With the help of synthetic datasets with different level of difficulty and variability, our goal is to find out the best tool that can perform simultaneous variable selection and achieve higher prediction accuracy. 2.2 Methods and Materials 2.2.1 Bayesian Linear Models Data is collected on a single continuous response, 𝑦𝑖 , 𝑖 = 1, . . . , 𝑛, for 𝑛 individual genotypes. The data equation is given as 𝑦𝑖 = 𝜃 𝑖 + 𝑒𝑖 , where 𝜃 𝑖 is the linear predictor that models the expected value 19 of 𝑦𝑖 given the predictors, and 𝑒𝑖 are independently and normally distributed random variables with mean zero and variance 𝜎𝑒2 . 
The linear predictor θ_i is further expressed as

\theta_i = \mu + \sum_{j=1}^{p} x_{i,j}\,\beta_j,

where μ is the overall intercept, x_{i,j} denotes the marker information for the j'th predictor of the i'th individual, and β_j denotes the effect of the j'th predictor on the response. Let η represent the collection of unknown parameters: the intercept, the regression coefficients, and the residual variance, expressed as η = {μ, β_1, ..., β_p, σ_e²}. Since we are performing our analysis within the Bayesian paradigm, we assume a prior density of the form

p(\eta) = p(\mu)\, p(\sigma_e^2) \prod_{j=1}^{p} p(\beta_j).

The joint posterior density is then proportional to the likelihood times the prior,

p(\eta \mid y_1, \ldots, y_n) \propto \prod_{i=1}^{n} N\!\Big(y_i \,\Big|\, \mu + \sum_{j=1}^{p} x_{i,j}\beta_j,\; \sigma_e^2\Big)\, p(\eta). \qquad (2.1)

We assign a flat prior to the intercept term μ and a scaled-inverse χ² density to the residual variance σ_e²: p(σ_e²) = χ⁻²(σ_e² | S_e, df_e), where df_e (> 0) is the degrees of freedom and S_e (> 0) is the scale parameter. For the regression coefficients β_j, we assign either flat or informative priors. The choice of informative prior plays a significant role in the type of shrinkage attained. We consider several choices: a Gaussian prior for Bayesian ridge regression [23], a scaled-t density for BayesA [14], a double-exponential (Laplace) prior for the Bayesian Lasso [24], a mixture of a point mass at zero and a scaled-t slab for BayesB [14], and a mixture of a point mass at zero and a Gaussian slab for BayesC [25]. We describe the different priors and the choices of hyper-parameters in more detail below.

Bayesian Ridge Regression
In Bayesian Ridge Regression (BRR), the regression coefficients are assigned IID normal distributions with mean zero and common variance σ_β². In the second level of the hierarchy, we assign a scaled-inverse χ² density with parameters df_β and S_β to the variance parameter. The joint prior distribution, including the hyper-parameters, is

p(\beta_1, \ldots, \beta_p, \sigma_\beta^2) = \Big[\prod_{j=1}^{p} N(\beta_j \mid 0, \sigma_\beta^2)\Big]\, \chi^{-2}(\sigma_\beta^2 \mid df_\beta, S_\beta). \qquad (2.2)

The density is parameterized so that the prior expectation and mode of the variance parameter are E(σ_β²) = S_β/(df_β − 2) and Mode(σ_β²) = S_β/(df_β + 2), respectively. The values of df_β and S_β are not known; for our analysis, we set df_β to 5 and solve for the scale parameter so as to match the expected R-squared of the model. In genomic studies, the resulting predictor is commonly known as the best linear unbiased predictor (BLUP).

BayesA
In BayesA, the regression coefficients are marginally modeled with a scaled-t density with parameters df_β and S_β. In our setup, this density is constructed as an infinite mixture of scaled normal densities for computational convenience. In the first level of the hierarchy, the marker regression coefficients are assigned independent normal densities with mean zero and marker-specific variances σ²_{β_j}, and in the second level a scaled-inverse χ² density with parameters df_β and S_β is assigned to each variance parameter. The difference between BRR and BayesA lies in the marker-specific variances and the treatment of the scale parameter S_β: here, the scale parameter is modeled through a gamma density with rate and shape parameters r and s, respectively. We set df_β to 5, s to 1.1, and solve for the rate parameter to match the expected R-squared of the model. In BayesA, the joint prior distribution, including the hyper-parameters, is

p(\beta_1, \ldots, \beta_p, \sigma_{\beta_1}^2, \ldots, \sigma_{\beta_p}^2, S_\beta) = \Big[\prod_{j=1}^{p} N(\beta_j \mid 0, \sigma_{\beta_j}^2)\, \chi^{-2}(\sigma_{\beta_j}^2 \mid df_\beta, S_\beta)\Big]\, G(S_\beta \mid r, s), \qquad (2.3)

where G(· | ·, ·) denotes a gamma density.

Bayesian LASSO
The marginal distribution of the marker effects in the Bayesian LASSO (BL) is double-exponential. Following the work of Park and Casella, we represent the double-exponential density as a mixture of scaled normal densities. The first level of the hierarchy places independent normal densities with zero mean and marker-specific variance τ_j² × σ_e² on the marker effects. The residual variance σ_e² is modeled with a scaled-inverse χ² density, and the predictor-specific scale parameters τ_j² are modeled as IID exponential with rate parameter λ²/2. Lastly, λ² is assigned a gamma prior, λ² ∼ G(r, s). For our setup, we set s to 1.1 and solved for r to match the expected R-squared of the model. In BL, the joint prior distribution, including the hyper-parameters, is

p(\beta_1, \ldots, \beta_p, \tau_1^2, \ldots, \tau_p^2, \lambda^2 \mid \sigma_e^2) = \Big[\prod_{j=1}^{p} N(\beta_j \mid 0, \tau_j^2 \sigma_e^2)\, Exp\!\Big(\tau_j^2 \,\Big|\, \frac{\lambda^2}{2}\Big)\Big]\, G(\lambda^2 \mid r, s), \qquad (2.4)

where Exp(· | ·) denotes an exponential density.

BayesB and BayesC
In these cases, the regression coefficients are assigned IID priors expressed as mixtures of a point mass at zero and a slab. The slab is a scaled-t density for BayesB and a normal density for BayesC. These mixture priors extend BayesA and BRR, respectively, by incorporating an additional parameter π that represents the prior proportion of non-zero predictors. We assign a Beta prior, π ∼ Beta(p_0, π_0), to the mixing parameter, parameterized to achieve E(π) = π_0. Here p_0 > 0 is interpreted as the number of prior counts and π_0 ∈ [0, 1]. Choosing p_0 = 2 and π_0 = 0.5 gives a uniform prior on [0, 1]; on the other hand, a large value of p_0 collapses the prior to a point mass at π_0. The joint prior distribution in BayesB, including the hyper-parameters, is

p(\beta_1, \ldots, \beta_p, \sigma_{\beta_1}^2, \ldots, \sigma_{\beta_p}^2, S_\beta, \pi) = \Big\{\prod_{j=1}^{p} \big[\pi N(\beta_j \mid 0, \sigma_{\beta_j}^2) + (1 - \pi)\,1(\beta_j = 0)\big]\, \chi^{-2}(\sigma_{\beta_j}^2 \mid df_\beta, S_\beta)\Big\}\, G(S_\beta \mid r, s)\, Beta(\pi \mid p_0, \pi_0), \qquad (2.5)

and the joint prior distribution in BayesC is

p(\beta_1, \ldots, \beta_p, \sigma_\beta^2, \pi) = \Big\{\prod_{j=1}^{p} \big[\pi N(\beta_j \mid 0, \sigma_\beta^2) + (1 - \pi)\,1(\beta_j = 0)\big]\Big\}\, \chi^{-2}(\sigma_\beta^2 \mid df_\beta, S_\beta)\, Beta(\pi \mid p_0, \pi_0). \qquad (2.6)

In both cases we set π_0 = 0.5 and p_0 = 10, which corresponds to a weakly informative beta prior for the mixing parameter with prior mode 0.5.

2.2.2 Genomic-BLUP
Instead of the purely linear model, one can also incorporate random effects when structuring the conditional expectation function θ_i. Assuming we have l random effects (u_1, ..., u_l), the conditional expectation for the i'th individual is

\theta_i = \mu + \sum_{j=1}^{p} x_{i,j}\beta_j + \sum_{k=1}^{l} u_{i,k}.

Extending the previous section, the collection of unknown parameters is η = {μ, β_1, ..., β_p, u_1, ..., u_l, σ_e²}, with prior density

p(\eta) = p(\mu)\, p(\sigma_e^2) \prod_{j=1}^{p} p(\beta_j) \prod_{k=1}^{l} p(u_k).

One common choice is Gaussian random effects with a specified covariance structure. In Bayesian settings, this form has been studied extensively as reproducing kernel Hilbert space regression (RKHS); Gianola et al. [26] proposed this approach for prediction in genomic studies. The general idea of RKHS is as follows. First, one needs to specify the reproducing kernel (RK), a positive-definite function mapping pairs of individuals to the real line.
For example, given two genotypes x_i and x_{i'}, we can construct the reproducing kernel as a real-valued function k(x_i, x_{i'}) that maps the genotype pair {x_i, x_{i'}} to the real line and satisfies

\sum_{i}\sum_{i'} \alpha_i \alpha_{i'}\, k(x_i, x_{i'}) > 0

for any non-zero coefficients α_i and α_{i'}. Next, we represent the regression function as a linear combination of basis functions determined by the reproducing kernel. In Bayesian settings, the RKHS model can be expressed as

y_i = \mu + u_i + e_i, \qquad u \sim N(0, K\sigma_u^2), \qquad e \sim N(0, I\sigma_e^2),

where K = {k(x_i, x_{i'})} is an n × n kernel matrix. In Genomic-BLUP (G-BLUP), one incorporates a single random effect representing the linear regression on the markers, g ∼ N(0, Gσ_g²), where G is the genomic relationship matrix computed from the marker information. For practical purposes and ease of interpretation, we standardized the G matrix to have an average diagonal value of approximately one. Janss et al. [27] argued the equivalence between RKHS regression via a Gaussian process and random regression on principal components; in our implementation, we used the eigenvalue decomposition of the genomic matrix G to exploit this equivalence.

2.2.3 LASSONET
In linear models, the LASSO is a very popular tool that assigns zero weights to the most redundant features through l1 regularization, resulting in feature sparsity/feature selection. In the neural network setting, Lemhadri et al. [22] developed LASSONET, which can perform global feature selection by adding a residual (skip) layer and allowing a predictor to participate in the hidden layers only if its skip-layer weight is active. This method integrates feature selection directly with parameter learning, which delivers an entire regularization path with a range of feature sparsities. The objective function implemented in LASSONET is

\text{minimize}_{\theta, W}\; L(\theta, W) + \lambda \lVert \theta \rVert_1 \quad \text{subject to} \quad \lVert W_j^{(1)} \rVert_\infty \le M |\theta_j|, \; j = 1, \ldots, d. \qquad (2.7)

The advantages of this tool are that it uses only a subset of the features and that the linear and nonlinear components are optimized jointly, allowing the flexibility to capture nonlinearity. The key idea of the procedure is the constraint |W_{j,k}^{(1)}| ≤ M|θ_j|, which budgets the total amount of nonlinearity involving predictor j relative to the importance of X_j as a main effect. Training LASSONET involves two operations: first, a vanilla gradient step is applied to all model parameters, followed by a hierarchical proximal operator applied to the input-layer pair (θ, W^(1)). This also yields substantial computational efficiency: the authors argue that computing the entire LASSONET regularization path has a training cost equivalent to training a single model. They also suggest a default value of M = 10 for the hierarchy coefficient.

2.3 Experiments
In this section, we discuss the performance of the different methods in terms of prediction and selection accuracy on synthetic datasets. In the synthetic datasets, we simulated a single response generated through marker genotypes from a real dataset from the CIMMYT global wheat breeding program. This dataset, made publicly available by Crossa et al. [15], comprises phenotypic, genotypic, and pedigree information on 599 wheat lines. Each line was genotyped for 1279 Diversity Array Technology (DArT) markers. Similar to the RIL population for cowpea, at each marker there were two possible homozygous genotypes, and they were coded as 0 or 1.
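To make the construction of the genomic relationship matrix used by G-BLUP concrete, the following R sketch builds a standardized G from a 0/1 marker matrix and computes its eigenvalue decomposition, as described above. The marker matrix here is a random stand-in (the simulated X and object names are hypothetical, not the actual CIMMYT data); the scaling follows the convention of an average diagonal value of approximately one.

```r
# Minimal sketch (R), assuming a 0/1 marker matrix: build a standardized genomic
# relationship matrix G and its eigenvalue decomposition for G-BLUP / RKHS.
set.seed(1)
n <- 599; p <- 1279
X <- matrix(rbinom(n * p, 1, 0.5), nrow = n)   # random stand-in for the DArT markers

Z <- scale(X, center = TRUE, scale = TRUE)     # center and scale each marker column
G <- tcrossprod(Z) / p                         # G = ZZ'/p; average diagonal close to 1
mean(diag(G))                                  # check the standardization

# Eigenvalue decomposition of G, used to exploit the equivalence between RKHS
# regression with a Gaussian process and random regression on principal components.
EVD <- eigen(G, symmetric = TRUE)
PCs <- EVD$vectors %*% diag(sqrt(pmax(EVD$values, 0)))
```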
For our analysis, we considered three different cases: (1) significant markers at equispaced locations, (2) significant markers in one cluster, and (3) significant markers in two clusters. Simulation settings (2) and (3) are based on our experience with phenotypes, where the correlative pattern of nearby markers indicates a QTL region rather than a single QTL marker. Also, in real data analysis, many phenotypes of interest are very complex, and genetic and environmental fluctuations strongly influence them.

The proportion of the variation in a trait that is attributable to genetic variation among individuals is termed "heritability" (denoted h²). It should be noted that the estimate of heritability for a particular trait is conditional on a specific population and environment; it is highly dynamic and changes over time as circumstances change. Estimates of heritability range from zero to one. A value close to zero indicates that most of the variability in a trait is due to environmental factors, with very little influence from genetic variation. On the other hand, a heritability close to one indicates that genetic differences explain almost all of the variation in a trait, with little contribution from environmental factors. Many genetic disorders caused by variants (mutations) in a single gene have high heritability. In human genomics, many complex traits, such as intelligence and susceptibility to genetic diseases, have estimated heritabilities in the range of 0.4 to 0.55, suggesting that the variability of such traits is due to a combination of genetic and environmental factors. In our simulation, we used heritability scores of 0.2 and 0.5 for each of the three cases to account for differences in trait complexity. We also varied the number of significant markers as 10, 20, and 30 to replicate real data situations.

Below are the results from the three situations of interest. In each case, the number of individuals (n) is 599, the number of available markers (p) is 1279, the number of significant markers (p_0) is 10, 20, or 30, and the heritability (h²) is 0.2 or 0.5. For each combination of significant markers and heritability score under a given setup, the univariate response variable is generated as

y_i = \sum_{j=1}^{p} x_{i,j}\beta_j + \epsilon_i, \qquad (2.8)

where ε_i ∼ N(0, 1 − h²) and the marker effects β_j are modeled via the mixture

\beta_j \sim N(0, h^2/10) \;\text{ if } j \in \text{significant marker list}, \qquad \beta_j = 0 \;\text{ otherwise}.

This simulation design closely follows Perez and de los Campos [28]. To compare predictive performance, we divided the dataset into training and testing sets following the common convention of an 80-20 split. For prediction accuracy, we compared three measures: (1) the correlation of the predicted response with the signal, Cor(ŷ, signal); (2) the correlation of the predicted response with the actual y, Cor(ŷ, y); and (3) the mean square error (MSE). An ideal model should achieve high values for measures (1) and (2) and a low MSE. For the Bayesian methods, we ran 10,000 iterations, with the first 1,000 samples discarded as burn-in. For LASSONET, we chose the λ that minimizes the MSE on the training data along the regularization path and used it for the testing dataset. The value of M was set at 10.
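Before turning to the results, the following R sketch illustrates the simulation and evaluation loop described above: generating the response via Equation 2.8, holding out 20% of the lines, fitting one of the Bayesian models, and computing the three accuracy measures. The BGLR package is assumed here because it implements the prior specifications of Section 2.2 and the design follows Perez and de los Campos [28]; the marker matrix is again a random stand-in for the real DArT data, so the numbers it produces are not those reported in the tables.

```r
# Minimal sketch (R) of the simulation and evaluation loop; assumes the BGLR package.
library(BGLR)
set.seed(2)
n <- 599; p <- 1279; h2 <- 0.2; p0 <- 10
X <- matrix(rbinom(n * p, 1, 0.5), nrow = n)    # stand-in for the 599 x 1279 DArT matrix

sig  <- round(seq(1, p, length.out = p0))       # equispaced significant markers (setup 1)
beta <- rep(0, p); beta[sig] <- rnorm(p0, 0, sqrt(h2 / 10))
signal <- as.vector(X %*% beta)
y <- signal + rnorm(n, 0, sqrt(1 - h2))         # Equation 2.8

test <- sample(n, round(0.2 * n))               # 80-20 train-test split
yNA  <- y; yNA[test] <- NA                      # BGLR predicts the masked entries

fit  <- BGLR(y = yNA, ETA = list(list(X = X, model = "BayesB")),
             nIter = 10000, burnIn = 1000, verbose = FALSE)
yhat <- fit$yHat[test]

c(cor_signal = cor(yhat, signal[test]),         # measure (1)
  cor_actual = cor(yhat, y[test]),              # measure (2)
  MSE        = mean((y[test] - yhat)^2))        # measure (3)
```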
Below we present one of the regularization paths for the LASSONET solution, for the equispaced markers with h² = 0.2 and p_0 = 10.

Figure 2.1: Choice of λ and LASSONET path

2.3.1 Simulation setup 1: Significant markers in equispaced locations
In this simple setup, we chose the significant markers at equispaced locations in the genome for each choice of p_0 = 10, 20, and 30, and the effects of the selected markers were generated following Equation 2.8. Table 2.1 highlights the performance accuracy of the different methods.

Table 2.1: Prediction performance from simulation setup 1 with heritability scores of 0.2 and 0.5. Cor with signal: correlation of the predicted response with the signal; Cor with actual y: correlation of the predicted response with the actual response; MSE: mean square error. Within each method, the three rows correspond to p_0 = 10, 20, and 30 significant markers.

Method      p_0   Cor with signal      Cor with actual y    MSE
                  h2=0.2   h2=0.5      h2=0.2   h2=0.5      h2=0.2   h2=0.5
BRR         10    0.67     0.51        0.45     0.26        0.59     0.74
            20    0.68     0.53        0.44     0.23        0.77     1.03
            30    0.80     0.68        0.61     0.40        0.62     0.88
BayesA      10    0.93     0.86        0.78     0.61        0.31     0.54
            20    0.89     0.83        0.78     0.57        0.41     0.84
            30    0.86     0.70        0.85     0.70        0.32     0.61
BayesB      10    0.95     0.86        0.77     0.61        0.32     0.54
            20    0.93     0.65        0.73     0.59        0.44     0.83
            30    0.89     0.70        0.82     0.69        0.35     0.61
BayesC      10    0.96     0.76        0.76     0.65        0.33     0.53
            20    0.93     0.64        0.72     0.61        0.46     0.78
            30    0.88     0.70        0.82     0.71        0.36     0.60
BL          10    0.89     0.71        0.82     0.63        0.28     0.56
            20    0.85     0.62        0.79     0.55        0.41     0.86
            30    0.85     0.70        0.85     0.69        0.32     0.63
GBLUP       10    0.81     0.64        0.80     0.65        0.32     0.54
            20    0.81     0.63        0.76     0.63        0.47     0.77
            30    0.84     0.70        0.85     0.71        0.32     0.59
LASSONET    10    0.94     0.82        0.62     0.36        0.45     0.71
            20    0.83     0.51        0.55     0.20        0.66     1.02
            30    0.81     0.62        0.60     0.37        0.67     0.92

2.3.2 Simulation setup 2: Significant markers in one cluster
In this setup, the significant markers lie in one cluster of the genome, in the ranges (91, 100), (91, 110), and (91, 120) for p_0 = 10, 20, and 30 respectively, and the effects of the selected markers were generated following Equation 2.8. Table 2.2 highlights the performance accuracy of the different methods.

Table 2.2: Prediction performance from simulation setup 2 with heritability scores of 0.2 and 0.5. Cor with signal: correlation of the predicted response with the signal; Cor with actual y: correlation of the predicted response with the actual response; MSE: mean square error. Within each method, the three rows correspond to p_0 = 10, 20, and 30 significant markers.

Method      p_0   Cor with signal      Cor with actual y    MSE
                  h2=0.2   h2=0.5      h2=0.2   h2=0.5      h2=0.2   h2=0.5
BRR         10    0.56     0.73        0.25     0.46        0.73     0.56
            20    0.68     0.80        0.34     0.57        1.01     0.71
            30    0.59     0.76        0.34     0.57        0.94     0.67
BayesA      10    0.78     0.91        0.62     0.76        0.54     0.31
            20    0.74     0.89        0.61     0.80        0.77     0.39
            30    0.69     0.86        0.72     0.85        0.62     0.32
BayesB      10    0.81     0.95        0.61     0.72        0.54     0.35
            20    0.76     0.91        0.63     0.77        0.75     0.43
            30    0.71     0.88        0.71     0.82        0.62     0.36
BayesC      10    0.77     0.96        0.64     0.70        0.53     0.37
            20    0.75     0.92        0.67     0.76        0.72     0.45
            30    0.67     0.88        0.72     0.82        0.61     0.36
BL          10    0.73     0.88        0.63     0.81        0.56     0.28
            20    0.74     0.87        0.66     0.83        0.76     0.36
            30    0.66     0.85        0.71     0.86        0.64     0.30
GBLUP       10    0.70     0.85        0.66     0.80        0.51     0.29
            20    0.72     0.86        0.69     0.82        0.71     0.38
            30    0.63     0.82        0.72     0.86        0.61     0.32
LASSONET    10    0.86     0.94        0.37     0.64        0.68     0.44
            20    0.74     0.84        0.37     0.64        0.96     0.61
            30    0.61     0.77        0.42     0.26        0.88     0.60

2.3.3 Simulation setup 3: Significant markers in two clusters
In this setup, the significant markers lie in two clusters of the genome, at positions (91, ..., 95, 701, ..., 705), (91, ..., 100, 701, ..., 710), and (91, ..., 105, 701, ..., 715) for p_0 = 10, 20, and 30 respectively, and the effects of the selected markers were generated following Equation 2.8. Table 2.3 highlights the performance accuracy of the different methods.
Table 2.3: Prediction performance from simulation setup 3 with heritability scores of 0.2 and 0.5. Cor with signal: correlation of the predicted response with the signal; Cor with actual y: correlation of the predicted response with the actual response; MSE: mean square error. Within each method, the three rows correspond to p_0 = 10, 20, and 30 significant markers.

Method      p_0   Cor with signal      Cor with actual y    MSE
                  h2=0.2   h2=0.5      h2=0.2   h2=0.5      h2=0.2   h2=0.5
BRR         10    0.68     0.78        0.34     0.55        0.73     0.58
            20    0.77     0.86        0.33     0.56        0.97     0.65
            30    0.40     0.70        0.26     0.53        0.88     0.62
BayesA      10    0.89     0.94        0.62     0.79        0.53     0.31
            20    0.78     0.96        0.57     0.77        0.76     0.40
            30    0.48     0.81        0.70     0.84        0.64     0.31
BayesB      10    0.90     0.97        0.61     0.76        0.53     0.35
            20    0.81     0.92        0.59     0.74        0.74     0.43
            30    0.49     0.84        0.69     0.81        0.64     0.34
BayesC      10    0.85     0.97        0.66     0.74        0.53     0.37
            20    0.79     0.92        0.64     0.73        0.71     0.44
            30    0.49     0.81        0.71     0.84        0.63     0.32
BL          10    0.81     0.93        0.65     0.83        0.55     0.27
            20    0.77     0.88        0.63     0.80        0.75     0.36
            30    0.48     0.79        0.69     0.84        0.66     0.31
GBLUP       10    0.78     0.88        0.68     0.82        0.51     0.30
            20    0.74     0.86        0.67     0.80        0.69     0.37
            30    0.47     0.77        0.72     0.85        0.61     0.31
LASSONET    10    0.90     0.95        0.43     0.68        0.68     0.44
            20    0.80     0.91        0.34     0.59        0.92     0.62
            30    0.49     0.74        0.19     0.52        0.91     0.62

2.4 Discussion
The first noticeable difference when comparing heritability (h²) of 0.2 with 0.5 is the reduction in prediction accuracy across the three simulation setups: the correlations of the predicted response with the signal and with the actual response fell while the MSE increased. Biologically, this can be explained by viewing heritability as a measure of how complex the phenotype is; as the complexity increases, prediction performs more poorly. Next, in all three situations BRR performed the worst, while there was no conclusive evidence of a single best method that outperformed the others, at least for the cases considered.

For simulation setup 1, both Bayesian mixture models, BayesB and BayesC, performed better in terms of prediction accuracy. This result was consistent across the different numbers of significant markers considered. For simulation setup 2, we found that the mixture-based methods, BayesB and BayesC, had better accuracy in terms of the correlation of the predicted response with the signal, but we attained similar performance in terms of correlation with the actual response and MSE with the Bayesian LASSO and G-BLUP. For a heritability score of 0.5, the MSEs of the Bayesian LASSO and G-BLUP were consistently smaller than those of BayesB and BayesC. Note that we did not see reasonable predictions from LASSONET in simulation setups 1 and 2. For simulation setup 3, we reached conclusions similar to those for setup 2, but here LASSONET showed improved performance in terms of correlation with the signal relative to the competing methods. Overall, we found the Bayesian LASSO and Genomic-BLUP to be the most robust performers for prediction across the different data complexities.

In genetic studies, there are usually two major interests: (a) prediction and (b) selection. In this chapter, we have mainly focused on the prediction aspect, but we argue that, since we used Bayesian tools, hypothesis testing based on credible intervals can be used to find the selected markers. LASSONET is more interpretable in the sense that it performs prediction and selection simultaneously, so we cannot disregard it either. We intend to explore more situations to justify our arguments.
CHAPTER 3
BAYESIAN LATENT FACTOR MODELS TO DIFFERENTIATE GENETIC AND MECHANISTIC BASES OF PHOTOSYNTHESIS

3.1 Background
The term "natural variation" in photosynthetic processes refers to the ability of some phototrophs to outperform others under specific environmental conditions. The interdependency between the genetic architecture of an individual (genome) and its observable physical or physiological traits (phenome) provides an opportunity to harness these natural variations. With the advent of high-throughput phenotyping platforms, it is now feasible to rapidly measure multiple photosynthetic parameters (phenotypes), which allows us to compare genotypic variation across numerous traits. Rapid advancements in genotyping technologies have also permitted the cost-effective production of high-density genetic chips, making the connection from "genome to phenome" possible. Integrating such multi-omics data platforms creates data "hyper-cubes" that involve multidimensional, potentially linked traits, environmental variables, and genomic content. Using "co-association" (or co-segregation) maps of such hyper-cubic data sets, we dissect various functional and genetic linkages within the photosynthetic machinery. We propose to develop a new class of generative models based on dimensionality reduction methods in the form of a "colocalized" phenotype network. Representation of such dynamics across the Genotype × Phenotype × Environment space is crucial to understanding the mechanics of adaptation and facilitating agricultural yield improvement.

One way to associate observed responses (phenotypes) with certain genomic regions is through the familiar quantitative trait loci (QTL) maps. Using such an association tool on a population of cowpea recombinant inbred lines (RILs), we tested how the tolerance of plants differs when exposed to heat stress imposed under different lighting conditions (light or dark). Furthermore, as with all "omics" approaches, where correlations among multiple traits are informative under different conditions, this can enable the discovery of potentially causal phenotype interactions, which in turn may shed light on the functions of photosynthetic regulatory pathways. However, we found potential caveats in this analogy, as the system's dominant correlations can result from parallel transitive or indirect interactions.

Rapid advancements in multi-omics technologies have led to a great deal of interest in the integrated analysis of multi-modal datasets. As multi-modal datasets provide information on multiple subjects (genotypes) and features from different viewpoints (treatments), integrated analysis can help us understand the biological mechanisms of complex problems and develop tailored treatments for many diseases and health problems. In the past few years, several approaches for integrative analysis have been proposed and applied in diverse fields, e.g., brain imaging, chemical systems biology, and single-cell RNA-seq data. One class of models that has been used extensively is based on low-rank matrix factorization, such as nonnegative matrix factorization [29, 30], factor analysis [31, 32], and canonical correlation analysis methods [33, 34]. Other approaches use a clustering framework to obtain interpretable structure from the multi-omics data, e.g., hierarchical clustering [35], consensus clustering [36], and iCluster [37].
The basic concept underlying these approaches is to find low-dimensional latent factors, which are assumed to carry pertinent information about the underlying biological variation across different genotypes, phenotypes, and environments. However, these methods are largely unsupervised, which makes model estimation, inference, and interpretation difficult. Nonetheless, integrative analysis has proven far superior to individual (uni-modal) analysis, and there is room for improvement in both methodological and applied research. In this chapter, we adopt a factor analysis framework to assess differential linkages by generating networks of interactions defined by latent variables (LVs), each of which represents a distinct mode of action of photosynthesis. We then compare the behavior of these networks with the outcomes of hypothetical models operating under different conditions and assess potential associations of genetic components with specific modes of action.

3.2 Materials and methods
The RIL population used in this study was obtained from the University of California, Riverside, and derives from the cross between Yacine and 58-77. The Yacine × 58-77 RIL population consisted of 104 lines used to generate the population-specific linkage map, but only 90 RILs were used in the QTL mapping due to limitations in seed stocks for some lines. A total of five different treatments were used for the data analysis, namely: Control (Con), dark heat stress (DHS), recovery after dark heat stress (RecD), light heat stress (LHS), and recovery after light heat stress (RecL).

Figure 3.1: Variation in photosynthetic parameters and leaf temperature across the different treatments. A-H) Violin and box plots showing the distribution of various parameters among the RILs and parental lines. The red marker indicates the mean of all genotypes. Con = Control, DHS = dark heat stress, LHS = light heat stress, RecD = recovery after DHS, and RecL = recovery after LHS. I-J) Correlation and density plots between qL and φII under Control and I) DHS or J) LHS using the raw data for each treatment.

Data analyses were performed using R (R Core Team 2019). Subsequent analyses use the covariate-adjusted effects obtained via an analysis of covariance (ANCOVA) model.

3.2.1 Linkage maps using QTL mapping
This section assesses possible linkages between genomic variations in the RIL population and specific responses to LHS and DHS. For the linkage maps we used genomic BLUP, as discussed in earlier sections. Figures 3.2 and 3.3 show several striking features of the QTL maps for φII under control, LHS, DHS, and recovery conditions. First, the control shows significant QTLs on chromosomes 3, 6, 9, and 10 that completely disappeared and were replaced by distinct QTLs during LHS (chromosome 2) and DHS (chromosomes 2 and 6). The most likely basis for this "linkage swapping" is that, under control conditions, φII is modulated by one set of genetically controlled processes, and under stressful conditions by a different set of processes linked to other genetic components. This interpretation is consistent with genotype-by-environment interaction, whereby genotypes may behave differently depending on the environment, and with the roles of "ancillary" components of the organism, which control processes that are not essential under many conditions but critical under diverse and fluctuating environments.
There are many examples in photosynthesis research where knocking out well-conserved genes has little effect under (artificially static) laboratory conditions but shows emergent phenotypes under more severe or rapidly fluctuating environments.

3.3 Bayesian Latent Factor Models
To set up the models and notation, assume that the observed data contain N independent units. For each unit, Q traits (phenotypes) are observed under S different treatments. We use X^(1) ∈ R^{Q×N}, X^(2) ∈ R^{Q×N}, ..., X^(S) ∈ R^{Q×N} to denote the collection of S treatments with dimensionality Q on N independent observations. Let Y be their vertical concatenation, which is of size P × N, where P = SQ,

Y_{P \times N} = \big[X^{(1)\,T}, X^{(2)\,T}, \ldots, X^{(S)\,T}\big]^{T}.

Our goal is first to find K < P factors that describe the dependencies between the observed phenotypes across the data sets encompassing the different treatments. In other words, the problem can be described as finding a set of K latent factors, each of which contains a projection for each of the S treatments with a non-zero weight for each factor. Note that one would like to impose sparsity on the weights for added interpretability.

Figure 3.2: Genetic and phenotypic linkages among multiple photosynthetic processes. LOD scores for different parameters are presented for Control/Pre-stress (Con) (left panel), DHS (middle panel), and LHS (right panel). Chromosomes are separated by transparent colors with faint lines for borders.

Figure 3.3: Genetic and phenotypic linkages among multiple photosynthetic processes. LOD scores for different parameters are presented for Recovery after DHS (RecD) (left panel) and Recovery after LHS (RecL) (right panel). Chromosomes are separated by transparent colors with faint lines for borders.

Our data challenge is very similar to factor analysis (FA), which explains a multivariate dataset X ∈ R^{Q×N} in terms of K < Q latent factors defining the dependencies between the N observed samples of dimensionality Q. In FA, the underlying latent factors are connected to the observable variables through factor weights, which are collected in the loading matrix W ∈ R^{Q×K}. One can impose sparsity on the individual entries of W to obtain straightforward interpretations. In our setup, we can apply FA to each of the S treatment conditions and estimate the loading matrix and factor scores for every condition.
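For intuition, the sketch below shows what "applying FA to one treatment" looks like in R, using base R's factanal() as a simple non-Bayesian stand-in; the trait matrix, its dimensions, and the trait names are hypothetical. The models actually used in this chapter are the Bayesian formulations with ARD priors developed in the following subsections, which estimate the same two objects: a loading matrix W and factor scores.

```r
# Simple stand-in (R): factor analysis of one treatment's trait matrix via factanal().
# 'pheno' is a hypothetical N x Q matrix of covariate-adjusted phenotypes (rows = RILs).
set.seed(3)
N <- 90; Q <- 8
pheno <- matrix(rnorm(N * Q), nrow = N,
                dimnames = list(NULL, paste0("trait", 1:Q)))

K  <- 3                                        # number of latent variables retained
fa <- factanal(pheno, factors = K, scores = "regression")

W <- loadings(fa)                              # Q x K loading matrix (trait-to-LV weights)
F <- fa$scores                                 # N x K latent factor scores
head(F)
```

The columns of the score matrix can then be treated as derived traits and mapped with QTL analysis, which is how the LVs are used in the analyses below.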
Since the loading matrix W connects the set of observed phenotypes to a smaller group of latent variables (LVs), one can expect that each LV identified by FA can, under appropriate conditions, represent a physical mode of interaction among photosynthetic processes, which results in a distinct pattern of interactions between measured phenotypes. In a single genotype, the observed LVs will likely reflect environmental or developmental effects on mechanistic interconnections. Here, though, we consider a population of genetically distinct plants under a single environmental condition, where the LVs will likely reflect the actions of genetic polymorphisms that alter the behavior of the co-regulatory network among phenotypes.

A hypothetical illustration describes two genetic components that alter photosynthesis's responses to HS in distinct ways. Consider a functional mechanism (mechanism A) representing the correlations expected for pmf-controlled photoprotection. For example, a genetic variation that decreases the activity of the ATP synthase can result in decreases in gH+, increases in pmf, and decreases in φII. Using FA, one can describe this behavior by LV1 in Figure 3.4 as a network of correlations. In this visualization, if two measured parameters are connected by lines of the same color, they are positively correlated, whereas different colors signify a negative correlation. From our model, decreases in gH+ should increase NPQ(t) but decrease φII. Hence, gH+ is linked through LV1 to NPQ(t) with differently colored lines but to φII with lines of the same color. Suppose that an additional process modulates φII through photoinhibition and repair of PSII, independently of ATP synthase activity (mechanism B). For example, a genetic component that results in more rapid PSII repair should increase the content of active PSII and thus φII while decreasing NPQ(t). Here, parameters can be connected through multiple LVs, each describing a different set of correlations, as illustrated through LV2 in Figure 3.4.

Figure 3.4: A hypothetical illustration of expected relationships between latent variables, correlations among measured parameters, and genetic components

However, FA on individual treatments fails to capture the dependencies across multiple treatment conditions. Also, the LVs under each treatment are independent, so one cannot make connections or test separable models based on the individual FA analyses. One solution is canonical correlation analysis (CCA), which can simultaneously model the underlying associations between two sets of treatment conditions. CCA identifies linear combinations of variables from each modality that maximize their correlation. CCA also suffers from certain caveats. For example, it does not provide inherently robust inference for statistical associations between phenotypes, and the associations between data modalities need to be modeled to capture the relevant variation. One possible way to address these caveats is the probabilistic interpretation of CCA [38], which allows for uncertainty estimation of the model parameters. This approach has been extended to more complex situations by adding hierarchical prior distributions, as in Bayesian CCA [39]. Still, it can fail to recover associations among data modalities and is computationally challenging in high-dimensional problems.
One of these limitations is averted by Virtanen and colleagues [40, 41], who remove the irrelevant latent factors and further extend the model to more than two modalities, yielding Group Factor Analysis (GFA) [42, 41]. GFA is a simple extension of the Bayesian FA model with group-wise sparsity, which permits straightforward interpretation. GFA has proven its applicability in various domains, from genomics and drug discovery to task-based fMRI data [43]. To our knowledge, GFA has not been applied to reveal phenome-genome interactions from multi-omics data modalities. To illustrate the differences between the various methods, we applied Bayesian formulations of FA, CCA, and GFA to our dataset and discuss the implications and interpretations under the different setups. Our findings include multiple modes of mechanistic processes representing "positive-negative" associations that link phenotypes to specific patterns under various stress conditions. We conclude from our analysis that, owing to its flexibility and robustness, the integrative framework of Bayesian GFA can reveal meaningful biological mechanisms previously unknown under heat stress treatments.

3.3.1 Bayesian Factor Analysis (BFA)
The Bayesian version of FA assumes that the N observations of Q phenotypes stored in the data matrix X ∈ R^{Q×N} are generated by the latent variable matrix F ∈ R^{K×N}, where K represents the number of latent dimensions. Formally, suppose that the Q-dimensional data vector x_n follows a K-factor model:

x_n = W f_n + \mu + \epsilon_n, \qquad f_n \sim N(0, I_K), \qquad \epsilon_n \sim N(0, \Psi). \qquad (3.1)

Under Model 3.1, x_n ∼ N(μ, WW^T + Ψ). Without loss of generality, we assume zero-mean data and omit the μ parameter hereafter. To handle the FA model in a Bayesian context, we introduce a prior p(θ) over the model parameters θ = (W, Ψ) and base inference on the posterior distribution p(θ | X). For simpler inference, the prior distributions are selected to be conjugate, so that the posterior distribution has the same functional form as the prior. To determine the number of latent dimensions to include in the model, we place an Automatic Relevance Determination (ARD) prior on the loading matrix W. This is achieved through a hierarchical prior specification p(W | α) on the elements of W, where α = (α_1, α_2, ..., α_K). By pushing some α_k towards infinity, the corresponding elements of the loading matrix W are driven close to zero, which prunes the irrelevant latent components k during inference. Specifically,

p(W \mid \alpha) = \prod_{j=1}^{Q} \prod_{k=1}^{K} N(w_{j,k} \mid 0, \alpha_k^{-1}), \qquad
p(\alpha) = \prod_{k=1}^{K} \Gamma(\alpha_k \mid a_\alpha, b_\alpha), \qquad
p(\Psi) = \mathcal{W}^{-1}(\Psi \mid \Lambda_0, \nu_0),

where Γ(·) denotes a gamma distribution, Λ_0 is a symmetric positive-definite scale matrix, and ν_0 is the degrees of freedom of the inverse-Wishart distribution W^{-1}(·). The joint probability distribution of Model 3.1 is given by

p(X, F, W, \alpha, \Psi) = p(X \mid F, W, \Psi)\, p(W \mid \alpha)\, p(\alpha)\, p(\Psi)\, p(F).

To estimate the model parameters and the latent variables, we need to evaluate the posterior distribution p(F, W, α, Ψ | X) and marginalize over the nuisance variables. However, these marginalizations are very complex and often analytically intractable, so the posterior distribution needs to be approximated.

3.3.2 Bayesian Canonical Correlation Analysis (BCCA)
In Bayesian CCA, we assume that the N observations from two data modalities, X^(1) and X^(2), are generated from common latent variables F ∈ R^{K×N}.
Similar to Model 3.1, we can write

x_n^{(1)} = W^{(1)} f_n + \epsilon_n^{(1)}, \qquad x_n^{(2)} = W^{(2)} f_n + \epsilon_n^{(2)}, \qquad f_n \sim N(0, I_K), \qquad \epsilon_n^{(s)} \sim N(0, \Psi^{(s)}), \; s = 1, 2. \qquad (3.2)

In Model 3.2, W^(1) ∈ R^{Q×K} and W^(2) ∈ R^{Q×K} are the projection matrices that transform the latent variables f_n into the input spaces of the two separate treatments. The joint distribution is given by

p(X, F, W, \alpha, \Psi) = \Big[\prod_{n=1}^{N} p(f_n) \prod_{s=1}^{2} p(x_n^{(s)} \mid f_n, W^{(s)}, \Psi^{(s)})\Big] \prod_{s=1}^{2} p(W^{(s)} \mid \alpha^{(s)})\, p(\alpha^{(s)})\, p(\Psi^{(s)}),

with

p(W^{(s)} \mid \alpha^{(s)}) = \prod_{j=1}^{Q} \prod_{k=1}^{K} N(w_{j,k}^{(s)} \mid 0, \alpha_k^{(s)\,-1}), \qquad
p(\alpha^{(s)}) = \prod_{k=1}^{K} \Gamma(\alpha_k^{(s)} \mid a_\alpha^{(s)}, b_\alpha^{(s)}), \qquad
p(\Psi^{(s)}) = \mathcal{W}^{-1}(\Psi^{(s)} \mid \Lambda_0^{(s)}, \nu_0^{(s)}).

Here, the prior distributions are chosen so that the posterior distributions have the same functional form as the priors. The prior over the loading matrices is the ARD prior, as in the BFA setup, which helps recover the relevant latent factors. Inference on the model parameters and latent variables depends on computing the posterior distribution p(F, W, α, Ψ | X), which is analytically intractable and must be approximated. Following Wang (2007), one can use mean-field variational Bayes or Gibbs sampling. Even then, the inference becomes unusually cumbersome in the presence of high-dimensional data. To overcome this, Virtanen et al. (2011) proposed imposing modality-wise sparsity. A further extension proposed by the same authors generalizes the idea to more than two data modalities and is known as Bayesian group factor analysis.

3.3.3 Bayesian Group Factor Analysis (BGFA)
For group factor analysis, we assume that there are S data modalities, where the s'th modality is represented as X^(s) ∈ R^{Q×N}, s = 1, ..., S. As with the latent factor components discussed for BFA and BCCA, BGFA seeks an optimal set of K latent factors that separate between-group associations from within-group associations. Mathematically, the data from the s'th group are generated as

x_n^{(s)} = W^{(s)} f_n + \epsilon_n^{(s)}, \qquad f_n \sim N(0, I_K), \qquad \epsilon_n^{(s)} \sim N\!\big(0, (T^{(s)})^{-1}\big), \qquad (3.3)

where (T^(s))^{-1} denotes a diagonal covariance matrix, with T^(s) = diag(τ_1^(s), ..., τ_Q^(s)) collecting the inverse error variances (precisions) of the s'th group. The structure of the loading matrices W and the latent factors F is learned automatically by imposing group-wise sparsity through independent ARD priors. Automatic pruning of the unimportant latent components is achieved by placing a separate ARD prior on the elements of each W^(s),

p(W^{(s)} \mid \alpha^{(s)}) = \prod_{j=1}^{Q} \prod_{k=1}^{K} N(w_{j,k}^{(s)} \mid 0, \alpha_k^{(s)\,-1}), \qquad
p(\alpha^{(s)}) = \prod_{k=1}^{K} \Gamma(\alpha_k^{(s)} \mid a_\alpha^{(s)}, b_\alpha^{(s)}), \qquad
p(\tau^{(s)}) = \prod_{j=1}^{Q} \Gamma(\tau_j^{(s)} \mid a_\tau^{(s)}, b_\tau^{(s)}).

We have chosen the hyperparameters a_α^(s), b_α^(s), a_τ^(s), and b_τ^(s) to be very small numbers (e.g., 10^{-14}) in order to obtain uninformative priors. Finally, we can write the joint distribution as

p(X, F, W, \alpha, \tau) = \Big[\prod_{n=1}^{N} p(f_n) \prod_{s=1}^{S} p(x_n^{(s)} \mid f_n, W^{(s)}, \tau^{(s)})\Big] \prod_{s=1}^{S} p(W^{(s)} \mid \alpha^{(s)})\, p(\alpha^{(s)})\, p(\tau^{(s)}).

Note that the posterior is often analytically intractable and needs to be approximated through a mean-field variational approximation.

3.3.4 Mean Field Variational Approximation
In Bayesian settings, the calculations involving the posterior distribution are computationally challenging. The workaround is to approximate the true posterior with a suitable factorized distribution in the variational Bayes (VB) setting.
Let the model parameters be denoted by θ; our goal is to approximate the true posterior p(θ | X) with the help of a variational distribution q(θ). The main idea in VB is to minimize the dissimilarity D(q; p) between q(θ) and p(θ | X). The most commonly used dissimilarity measure in such cases is the Kullback–Leibler (KL) divergence, which makes this minimization tractable. The KL divergence is written as

D_{KL}(q \,\|\, p) = \int q(\theta)\, \ln \frac{q(\theta)}{p(\theta \mid X)}\, d\theta.

Following Bishop [44], the marginal log-likelihood can be decomposed using

\mathcal{L}(q) = \int q(\theta)\, \ln \frac{p(X, \theta)}{q(\theta)}\, d\theta, \qquad \ln p(X) = \mathcal{L}(q) + D_{KL}(q \,\|\, p),

where L(q) is a lower bound on the marginal log-likelihood. Since ln p(X) is constant, maximizing the evidence lower bound (ELBO) L(q) is equivalent to minimizing the KL divergence D_KL(q || p). We assume that q(θ) can be factorized as q(θ) = ∏_i q_i(θ_i), and L(q) is maximized with respect to each possible q_i(θ_i), giving

\ln q_i(\theta_i) = \langle \ln p(X, \theta) \rangle_{j \ne i} + \text{constant},

where ⟨·⟩_{j≠i} denotes the expectation taken with respect to ∏_{j≠i} q_j(θ_j). In BGFA, we approximate the full posterior by the variational distribution

q(\theta) = q(F) \prod_{s=1}^{S} q(W^{(s)})\, q(\alpha^{(s)})\, q(\tau^{(s)}),

where θ = {F, W, α, τ}. Since we have assigned conjugate priors, optimizing q(θ) yields the following analytically tractable factors:

q(F) = \prod_{n=1}^{N} N(f_n \mid \mu_{f_n}, \Sigma_{f_n}), \qquad
q(W^{(s)}) = \prod_{j=1}^{Q} N(W_{j,*}^{(s)} \mid \mu_{W_{j,*}^{(s)}}, \Sigma_{W_{j,*}^{(s)}}), \qquad
q(\alpha^{(s)}) = \prod_{k=1}^{K} \Gamma(\alpha_k^{(s)} \mid \tilde{a}_{\alpha}^{(s,k)}, \tilde{b}_{\alpha}^{(s,k)}), \qquad
q(\tau^{(s)}) = \prod_{j=1}^{Q} \Gamma(\tau_j^{(s)} \mid \tilde{a}_{\tau}^{(s,j)}, \tilde{b}_{\tau}^{(s,j)}), \qquad (3.4)

where W_{j,*}^(s) denotes the j'th row of W^(s). For the optimization, we follow the variational Bayes expectation–maximization (VBEM) scheme, a sequential procedure in which the parameters are updated one block at a time. For convergence, we require the relative change of the ELBO L(q) to fall below a preassigned small value (e.g., 10^{-5}). We have listed the optimization scheme only for BGFA; the procedures for BFA and BCCA follow similar strategies.

3.4 Results
Although the comparative LOD profiles and QTL linkages shown in Figures 3.2 and 3.3 reflect genetic linkage between the measured traits, one cannot comprehend from them the complex nature of the interactions between these photosynthesis regulatory partners. Furthermore, the question remains whether multiple interactions co-occur within a treatment condition and are modulated through different genetic components. For example, the QTL maps show apparent linkages between different subsets of measurable parameters under different conditions. Under DHS we observed one set of overlapping QTLs on chromosome 6 between φII, gH+, NPQt, φNO, and φNPQ, and a distinct set on chromosome 2 with linkages between φII and φNPQ, as well as NPQt (in LHS), but not the other parameters. These complexities likely reflect the pleiotropic, time- and condition-dependent interactions among processes and genetic loci. To mitigate this problem, we performed BFA on each treatment condition. We observed several trends in the BFA analyses and the associated QTL maps, as described in the following. First, different LVs were linked to distinct sets of QTLs. For example, under DHS, LV1 mapped to a QTL on chromosome 6, LV2 to a QTL on chromosome 1, and LV4 to a QTL on chromosome 4.
The segregation of LVs with distinct QTLs was consistently observed across all conditions, suggesting that BFA was able to partition the observed variation into distinct modes of behavior that are influenced by different sets of genetic components. Second, most QTLs associated with LVs were also observed in the QTL analyses of the individual parameters. When multiple parameters were linked through an LV, we also observed overlapping QTLs in the maps of the individual parameters (Figures 3.2 and 3.3), providing further support that BFA coupled with QTL mapping was able to identify possible mechanistic and genetic linkages between traits associated with distinct biochemical/physiological behaviors. For example, leaf temperature (Tleaf) and relative chlorophyll content (SPAD) showed genetic associations but only weak functional connections to other parameters, suggesting that genetic variations in these traits do not, under our conditions, strongly influence the photosynthetic control mechanisms. Third, some LVs showed functional trends but did not show measurable genetic linkages. For example, the linkage between φNO and gH+ on LV1 during DHS and the link between pmf and gH+ on LV3 during recovery from LHS (Figures 3.5 and 3.6) showed only small associations with genomic loci. This behavior may indicate that the observed variations were controlled by many small-effect loci that did not result in measurable associations using our current techniques.

Next, BFA revealed changes in mechanistic interactions and genetic control modes under different environmental challenges. Under each treatment, both the patterns of correlations among parameters and the QTL linkages of the contributing LVs were distinct. For example, a key distinction between Control and DHS was the change in the sign of the correlations between traits for LV1. In the control, φNO and qL were negatively and positively linked to φII on LV1, respectively. Under DHS, φNO and qL became positively and negatively linked, respectively, on LV1, with no change in the directionality of the linkage for φII (Figure 3.5). These changes imply that the effects of genetic variations on functional/regulatory interactions are distinct under the different treatments.

Figure 3.5: Bayesian Factor Analysis coupled with QTL mapping for genetic linkage between photosynthetic traits under Control/Pre-stress (Con) (left panel), DHS (middle panel), and LHS (right panel).

Figure 3.6: Bayesian Factor Analysis coupled with QTL mapping for genetic linkage between photosynthetic traits under recovery after dark heat stress (RecD) (left panel) and recovery after light heat stress (RecL) (right panel).

Moreover, even when the functional linkages among parameters were similar, e.g., comparing Control and recovery after LHS (Figure 3.6), the LV loading factors mapped to distinct sets of loci, suggesting that different processes and loci are involved in maintaining and re-establishing photosynthetic responses before and after LHS. Overall, these changes in functional and genomic linkages suggest that different sets of genetic components influence these behaviors under different conditions. Finally, BFA may resolve distinct but overlapping QTLs. When comparing the QTLs of individual parameters, we observed an apparent linkage between pmf and qL under the Control condition (Figure 3.2), whereas BFA suggested that these two parameters are controlled by different interaction networks (LVs) (Figure 3.5).
While we cannot rule out the possibility that this separation was caused by limitations of our approach, it suggests that BFA may distinguish between distinct but closely linked QTLs as long as they control distinct patterns of behavior.

One shortcoming of using BFA in our framework is that we cannot make inferences by combining two or more treatment conditions. Similar to a clustering approach, BFA can only explain within-group variation. This results in a lack of connection between the latent factors from different data modalities; for example, LV1 from the Control condition is not comparable with LV1 from the DHS condition. Consequently, the QTL maps from the LVs cannot fully resolve the colocalization between interlinked parameters from multiple treatments.

To capture between-group associations and possible mechanistic linkages among the different treatments, we incorporated between-group interactions into our analysis through BCCA and BGFA. In BCCA, we conducted pairwise comparisons among treatments with different combinations of photosynthetic parameters. In Figure 3.7, we compared the treatment combination Control and dark heat stress using the photosynthetic parameters φII, pmf, qL, and NPQt. The upper right panel shows the log of the estimated ARD matrix as a Hinton diagram, where blue segments correspond to active components and red segments to inactive ones. To choose the number of latent factors, we compared different Hinton diagrams and ground-truthed the choice against the QTLs obtained from each latent factor. Here, we found three latent factors to be optimal, with LV1 specific to DHS and LV3 specific to Control.

Figure 3.7: Bayesian Canonical Correlation analysis coupled with QTL mapping for genetic linkage between photosynthetic traits under Control (Con) and dark heat stress (DHS).

One interesting distinction in the interactions among the parameters was that, under the Control condition, φII and qL were positively correlated, whereas under DHS, φII and qL were negatively correlated, with a strong association with NPQt. Also, LV1 mapped to the QTLs found for pmf under DHS (Figure 3.2). One of our findings from this approach is that, when multiple traits interact in opposite directions (negative correlations), their effects can cancel out, and the resulting LV will not detect the genetic linkages found in the individual QTL maps. This is because the LVs are linear combinations of the individual weights obtained for each trait. For example, in our heterogeneous population, the observed negative correlations between φII and NPQt were reflected as a "damped out" effect in LV3, and the apparent linkage on chromosome 2 is missing. We further applied BCCA to the treatment conditions DHS and LHS with the same set of photosynthetic parameters (Figure 3.8). We found two latent factors to be optimal in this setup, with LV1 specific to DHS and LV2 specific to LHS. The notable difference in the mechanistic connection between the two treatments is the lack of a connection between NPQt and pmf under LHS. Also, LV2 mapped to chromosome 2, which co-localized with φII and NPQt under LHS (Figure 3.2).
To explore the differences and associations between a set of measured phenotypes across all five treatments, we applied BGFA by concatenating the observed data matrices vertically.

Figure 3.8: Bayesian Canonical Correlation analysis coupled with QTL mapping for genetic linkage between photosynthetic traits under dark heat stress (DHS) and light heat stress (LHS)

As discussed earlier, the resulting loading matrix (W) provides the connections between the measured phenotypes and the auxiliary latent variables, and the latent factors (f_n) were mapped with QTL mapping to show the different genetic linkages. The number of latent variables in this case was optimally chosen to be five. In Figure 3.9 we plot the loading matrices corresponding to each treatment, and Figure 3.10 shows the genetic linkages corresponding to each latent factor. We found that LV2 was specific to the Control condition and LV1 was specific to LHS. LV4 was shared by DHS and RecD, whereas LV3 was shared by LHS and RecD. LV5 was shared by all the treatment conditions barring LHS. One of the key mechanistic linkages we found is based on LV4, where the connection of pmf with LV4 is missing under RecD but present under DHS. Also, we found that the canonical connection between NPQt and pmf was missing under LHS, which is consistent across the BCCA and BGFA analyses. Since the LVs are comparable across treatments, we can test for the extent of interaction between traits modulated by any particular LV. For example, LV5, which is shared by Con, DHS, RecD, and RecL, has a different extent of interactions across treatments: the connection between φII, φNO, pmf, and qL is significantly stronger under Control than under the others. From the genetic linkages obtained from the LVs, we can confirm the colocalization of φII, φNO, and NPQt modulated by a QTL region on chromosome 2. We also found a QTL peak from LV4 that could mechanistically link the phenotypes of interest. In addition, with LV5 and LV3 we found QTL peaks on chromosome 7 that were not found in the individual QTL maps or with BFA or BCCA.

3.5 Discussion
We aimed to extend the analyses of biophysical measurements of photosynthesis to understand how nature has tweaked key processes to respond to changing environmental conditions. This is possible because of the availability of inexpensive genomic sequencing and the development of rapid and detailed phenotyping that combines measurements of photosynthetic regulatory networks at multiple points. The combined data can give a more resolved view of the interplay of biophysical processes in vivo and the genetic components that control them. However, methods to handle such hyperdimensional data sets are still being developed. While it is possible to make predictions from such data sets using ML, the need to generate and test specific mechanistic hypotheses is essential to the scientific method. The methods described here represent first-order attempts to use these combined tools to compress hyperdimensional data into usable forms (LVs) and to use these to generate and test hypothetical models for how genetic polymorphisms impact the regulatory network of photosynthesis. We also show that latent factors can provide a deeper analysis of more complex interacting networks by teasing apart distinct modes of interaction and the specific genetic components that control them.
The results strongly support the view that the regulatory network is highly flexible and controlled by distinct sets of ancillary genetic components depending on the specific environmental challenge, consistent with the genotype-by-environment interaction paradigm.

Figure 3.9: Bayesian Group Factor Analysis of photosynthetic traits under Control (Con), dark heat stress (DHS), light heat stress (LHS), recovery after dark heat stress (RecD), and recovery after light heat stress (RecL).

Figure 3.10: QTL analysis of the resulting LVs from BGFA

We conclude that, in the cowpea diversity panels we used and under the conditions of our experiments, genetic variations observed in leaf movements do not lead to measurable variations in photoprotection under low-temperature stress. Further, responses to DHS and LHS are governed by distinct genetic variations that broadly impact non-qE-dependent NPQ mechanisms, but not qE or transpiration rates. In addition, genetic variation in transpiration-induced cooling did not influence the tolerance of PSII to LHS and DHS. The methods will no doubt be advanced by increasing the diversity and resolution of genetic variants, the number of specific processes measured, and the sophistication of the modeling, including the use of machine learning.

These observations suggest that latent factors can be helpful in applications aimed at generating hypothetical models from genetic diversity experiments that measure multiple, functionally related phenotypes. Because we used the results of the latent factors for subsequent analyses, i.e., QTL mapping, combining the two approaches and interpreting one in light of the other provides some confidence in the conclusions. These approaches should also be useful for crop improvement efforts, especially in identifying specific mechanisms and genetic components that modulate photosynthetic efficiency and resilience under diverse environmental challenges.

The results also emphasize certain caveats that need to be considered for immediate applications and for the development of improved methods. Some of these issues can be alleviated by introducing functions to linearize parameters or by adding measurements that discriminate between possible mechanisms. Other issues, including the simplifying assumptions of linear interdependencies and compensation between parameters and the multiple possible interpretations of correlational data, will require the development of next-level approaches, such as clustering algorithms to determine and constrain possible LV structures. Also, using FA, we explored the possibility of identifying a latent space on the phenotypic side that can regulate specific phenotypic interactions. We intend to extend this work by backtracking the phenotypic interactions through latent factors in the genomic space. By incorporating the gene regulatory network into the desired data, we expect to observe gene-driven pathways controlling the phenotypic interactions.

LV structures help explain the mechanistic bases of biophysical mechanisms corresponding to (domain-specific) causal pathways in phenotype interactions. In fact, our empirically motivated LVs propose a new research theme for understanding the interdependence across the Genotype × Phenotype × Environment space. Methodological and practical innovations for quantifying such pathways provide scientific grounds for the functions of photosynthetic regulatory pathways.
However, dominant correlations in a system can result from parallel, transitive, or indirect interactions. We show that certain classes of hypotheses can be generated and tested using simple comparisons of QTL maps. Still, the question remains how we can model the interactions or correlations among the measured phenotypes with a given set of predictors. We model the correlations among multiple traits with a selected number of predictors in the following chapter.

CHAPTER 4
CMPLE TO DECODE PHOTOSYNTHESIS USING THE MINORIZE-MAXIMIZE ALGORITHM

4.1 Motivation

4.1.1 General background

Understanding photosynthesis, and how solar energy transduction enables and limits the energy productivity of crops, is critical for improving the quality and resilience of agriculture in a rapidly changing world. Abiotic stress factors, e.g., high light intensities, high or low temperatures, and lack of water, inhibit the ability to use light energy productively and lead to photodamage to the photosynthetic machinery [45]. Plants can maintain photochemistry and adapt to the challenges of non-ideal environments using a range of mechanisms; several photosynthetic responses can contribute to this maintenance of yield, and it is possible to harness these variations to improve crop performance. However, the dynamics of photosynthetic responses may include complex interactions among species, genotypes, developmental stages, or other environmental conditions. Recently developed high-throughput phenotyping platforms [46, 47] can rapidly and non-invasively measure multiple, potentially related, photosynthetic traits and environmental parameters. Analyzing such voluminous data with complex interactions among multiple traits, genotypes, and environmental variables requires computationally efficient and interpretable statistical models that can potentially explore the mechanistic bases of useful or adaptive photosynthetic processes.

Several methods have been suggested for investigating the statistical association between measured traits and genetic markers, including genome-wide association studies (GWAS) and whole-genome regression (WGR) approaches, which produce familiar quantitative trait loci (QTL) maps [48, 49]. Standard QTL mapping has mainly been used to analyze the genetic association with individual traits. Nevertheless, alterations in genetic loci can affect the associations between multiple characteristics, which can be classified into meaningful biological mechanisms. This is particularly important when addressing important but complex traits such as photosynthetic efficiency or crop yield, which can be affected by multiple processes under different conditions. It is thus essential to interpret associations between traits using genetic markers and to determine whether variations at different genetic markers affect the inter-relationships among traits through similar or distinct mechanisms. A natural choice for multiple-trait analysis is to extend single-trait GWAS or WGR methods directly to the multiple-trait domain [50, 51]. But the characterization of such methods to elucidate the association among multiple traits remains challenging [52, 53]. We address a few of the challenges below. Multiple-trait analysis tools do not exploit the information in the correlation matrix of related traits and thus cannot connect them with genetic and environmental predictors. Pleiotropy, the effect of genetic diversity on multiple traits, plays a significant role under different abiotic stresses [54, 55].
Without modeling the correlation matrix, one cannot fully express the occurrence of pleiotropy in real-world applications. Also, dimension reduction procedures, where a multivariate response is summarized into a univariate score using principal component (PC) analysis, have limited usage due to their lack of interpretability. To address this stated need, we propose an interpretable model of the variance-covariance matrix in terms of the predictor variables, along with related inference.

Pourahmadi [56] used the Cholesky decomposition to express the entries of the variance-covariance matrix in terms of unrestricted parameters and thereby guaranteed positive-definiteness of the variance-covariance matrix. Although one could model these unrestricted parameters in terms of the predictor variables, the regression parameters do not have any easy interpretation. Alternatively, one can model the covariance matrix as a parsimonious quadratic function of predictor variables [57]. For modeling the variance-covariance matrix in terms of predictor variables, Zou et al. [58] proposed to use a regression model for the second moments of the response variable. The authors then imposed a positivity restriction on the resulting eigenvalues to ensure positive definiteness of the variance-covariance matrix. Unfortunately, for these methods, the model parameters lack direct interpretation when correlations among responses are of utmost interest.

A downside of correlation modeling is the computational burden of estimating many parameters [59]. If there are p predictors and q traits, correlation and standard deviation modeling involve at least (p + 1)q(q + 1)/2 model parameters. The estimation of so many parameters is challenging and computationally expensive. These limitations motivate us to develop a framework to model the correlations and standard deviations among the responses in terms of several predictor variables. We use the pairwise composite likelihood method for statistical inference. For efficient estimation of the parameters, we develop a Minorize-Maximize (MM) algorithm. The method is abbreviated as CMPLE, for Correlation Modeling under Pairwise Likelihood Estimation. Specifically, by comparing the impacts of genetic variations on the correlations among a set of related phenotypes, we can distinguish between certain classes of (well-defined) hypothetical biological models and determine whether combinations of genetic variations and environmental conditions affect similar or distinct mechanisms. We show that it is possible to distinguish between classes of hypothetical models under certain conditions, leading to new biological discoveries. This analysis has direct application in plant breeding research. We predict that by applying CMPLE to diversity panels from different species, we can reveal additional mechanisms of adaptation and guide the breeding and engineering of photosynthesis for higher, more climate-resilient productivity.

4.1.2 Contributions to the literature

Finding the genetic variations and environmental conditions that dictate photodamage or photoprotection is a critical step in improving photosynthetic yield and productivity. We believe that modeling the pairwise correlations through the genetic and environmental predictors is the best way to explore the dynamic nature of the stated problem. With this goal in mind, we have developed CMPLE, where the correlations among different traits are directly modeled and estimated using a pairwise composite likelihood framework.
The pairwise-composite likelihood method has, in the past, been used in different contexts. For example, Lele and Taper [60] used it in the estimation of variance components, Gao et al. [61] used it in genome-wide association studies, and Bai et al. [62] used it for spatially clustered data. However, to the best of our knowledge, the pairwise-composite likelihood method has not been used to model pairwise correlations. Our work directly models the correlations and standard deviations in terms of predictor variables.

If, instead of the pairwise likelihood approach, one tries conventional full-likelihood-based inference using the q-variate response, then the parameters of the standard deviations and correlations need to be estimated in such a way that the resulting q × q variance-covariance matrix is positive definite. Without any doubt, this is an exceptionally hard optimization problem and difficult to interpret in practical situations. The pairwise likelihood approach allows modeling of the pairwise correlations among the q responses while avoiding the requirement that the q × q variance-covariance matrix be positive definite. In real-life settings, biologists need to address how the pairwise correlations are related to the predictors, not the entire variance-covariance matrix. Our estimated model parameters have easy interpretations, which can be directly applied in various situations.

Our approach also mitigates the computational burden. To alleviate the computational issue, we develop a Minorize-Maximize (MM) algorithm [63] for parameter estimation. Although the MM algorithm has been successfully used in different areas [64, 65, 66], it has never been used in correlation modeling. The critical aspect of the MM algorithm is to find a suitable minorizing function that helps optimize a complex objective function (the logarithm of the pairwise composite likelihood, in our case). There is no standard recipe to obtain a minorizing function; it is very much problem-specific and requires innovative use of mathematical inequalities. Nevertheless, our numerical studies show that the use of the MM algorithm can reduce the computation time manifold. It has also demonstrated superior performance while handling a large number of parameters. We have developed an R function, called CMPLE, which can be readily applied when modeling correlations between multiple responses in terms of predictors (both continuous and categorical).

4.2 Models and notations

4.2.1 Background

To set the models and notations, assume that the observed data are collected from n independent units/subjects. For each unit, q traits (phenotypes) and p features (candidate genes) are observed. Let Y_{i,j} and X_{i,r} be the jth trait and the rth feature of the ith unit, j = 1, ..., q, r = 1, ..., p, and i = 1, ..., n. The goal is to study the correlation between any pair of phenotypes and investigate how this correlation is regulated by a set of features. Let us assume that, conditional on the covariate X_i = (X_{i,1}, ..., X_{i,p})^T, Y_i = (Y_{i,1}, ..., Y_{i,q})^T follows a multivariate normal distribution with mean μ_i = 0 and variance-covariance matrix Σ_i. The goal is to understand the correlation and its behavior with respect to the features. The variance-covariance matrix can be written as Σ_i = Diag(σ_{i,1}, ..., σ_{i,q}) R_i Diag(σ_{i,1}, ..., σ_{i,q}), where R_i = ((ρ_{i,j,k})) is the q × q correlation matrix for the q phenotypes of the ith subject, and the variance of Y_{i,j} is denoted by σ²_{i,j}.
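This decomposition can be made concrete with a small numerical sketch. The following R snippet (illustrative values only, not data or code from this study) builds Σ_i from a vector of standard deviations and a correlation matrix, and reads off the (j, k) bivariate marginal parameters (σ_{i,j}, σ_{i,k}, ρ_{i,j,k}), which are the only quantities the pairwise approach of Section 4.3 requires.

```r
# Illustrative sketch: Sigma_i = Diag(sigma_i) %*% R_i %*% Diag(sigma_i)
# with made-up numbers (q = 4), and the (j, k) bivariate marginal parameters.
sigma_i <- c(0.8, 1.2, 0.5, 1.0)                 # sigma_{i,1}, ..., sigma_{i,q}
R_i <- matrix(c(1.0, 0.3, 0.2, 0.1,
                0.3, 1.0, 0.4, 0.2,
                0.2, 0.4, 1.0, 0.5,
                0.1, 0.2, 0.5, 1.0), nrow = 4)   # q x q correlation matrix
D <- diag(sigma_i)
Sigma_i <- D %*% R_i %*% D                       # unit-specific covariance matrix

j <- 1; k <- 3                                   # any pair of phenotypes
c(sd_j = sigma_i[j], sd_k = sigma_i[k], rho_jk = R_i[j, k])
```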
4.2.2 Correlation modeling

To achieve that goal, ρ_{i,j,k}, the pairwise correlation between Y_{i,j} and Y_{i,k}, is written as ρ_{i,j,k} = g^{-1}(η_{i,j,k}), where g : (−1, 1) → (−∞, ∞) is a known link function that transforms the correlation to the linear predictor, defined as η_{i,j,k} = δ_{j,k,0} + Σ_{r=1}^{p} δ_{j,k,r} X_{i,r}, where δ_{j,k} = (δ_{j,k,0}, δ_{j,k,1}, ..., δ_{j,k,p})^T is the regression parameter. Observe that η_{i,j,k} = g(ρ_{i,j,k}), and we require g to be a one-to-one function. There are many popular choices for the link function g. For convenience, we take g(•) = log{(1 + •)/(1 − •)}. This results in

\[
\rho_{i,j,k} = g^{-1}(\eta_{i,j,k}) = 1 - \frac{2}{1+\exp(\eta_{i,j,k})} = 1 - \frac{2}{1+\exp\bigl(\delta_{j,k,0} + \sum_{r=1}^{p}\delta_{j,k,r}X_{i,r}\bigr)}. \tag{4.1}
\]

The correlation is a monotone function of the linear predictor. Hence, we can interpret a predictor's effect on the correlation via the regression parameters δ_{j,k,r}. Specifically, if δ_{j,k,r} > 0 (δ_{j,k,r} < 0), then the correlation between the jth and kth phenotypes increases (decreases) with the rth feature while the other features remain unchanged.

Although any model is just an approximation of the truth, we can use the model to compute another interpretable measure, such as the average marginal effect (AME) [67, 68, 69]. In general, the AME on the mean is defined as the change in the conditional mean of an outcome variable with respect to a single feature. Likewise, the AME of the rth feature on the (j, k)th pairwise correlation can be defined as the average change of the correlation for a change in the rth feature. Let us denote the (p − 1)-component vector (X_{i,1}, ..., X_{i,r−1}, X_{i,r+1}, ..., X_{i,p})^T by X_{i,(−r)}. Then, for a binary feature X_r, the AME is defined as

\[
AME_r = E\{\rho_{i,j,k}\mid X_{i,r}=1, X_{i,(-r)}\} - E\{\rho_{i,j,k}\mid X_{i,r}=0, X_{i,(-r)}\} = E\{\varphi_{r,(j,k)}(X_i,\theta)\},
\]

where θ denotes all the parameters and

\[
\varphi_{r,(j,k)}(X_i,\theta) = 2\left[\frac{1}{1+\exp\bigl(\delta_{j,k,0} + \sum_{s\neq r}\delta_{j,k,s}X_{i,s}\bigr)} - \frac{1}{1+\exp\bigl(\delta_{j,k,0} + \delta_{j,k,r} + \sum_{s\neq r}\delta_{j,k,s}X_{i,s}\bigr)}\right].
\]

For a continuous feature X_r, AME_r = E{φ_{r,(j,k)}(X_i, θ)}, where

\[
\varphi_{r,(j,k)}(X_i,\theta) = \frac{\partial \rho_{i,j,k}}{\partial X_{i,r}} = 2\,\delta_{j,k,r}\,\frac{\exp\bigl(\delta_{j,k,0} + \sum_{s=1}^{p}\delta_{j,k,s}X_{i,s}\bigr)}{\bigl\{1+\exp\bigl(\delta_{j,k,0} + \sum_{s=1}^{p}\delta_{j,k,s}X_{i,s}\bigr)\bigr\}^{2}}.
\]

Let θ̂ be the estimator of θ and S denote the estimated variance-covariance matrix of θ̂. Then the estimator of AME_r is ÂME_r = (1/n) Σ_{i=1}^{n} φ_{r,(j,k)}(X_i, θ̂). Applying the delta method, we obtain the standard error of ÂME_r as

\[
\sqrt{\left[\frac{1}{n}\sum_{i=1}^{n}\nabla_{\theta}\varphi_{r,(j,k)}(X_i,\theta)\right]^{\!\top}_{\theta=\hat\theta} S \left[\frac{1}{n}\sum_{i=1}^{n}\nabla_{\theta}\varphi_{r,(j,k)}(X_i,\theta)\right]_{\theta=\hat\theta}},
\]

where ∇_θ(•) ≡ ∂(•)/∂θ.

4.2.3 Standard deviation modeling

A log-linear function is used to model the standard deviation of the phenotypes in terms of the features. Specifically, for the jth response and the ith experimental unit, the standard deviation is modeled as

\[
\log(\sigma_{i,j}) = \alpha_{j,0} + \sum_{r=1}^{p}\alpha_{j,r}X_{i,r}. \tag{4.2}
\]

The α parameters measure the effect of the features on the standard deviation. As for the correlation, the AME can be used to measure the effect of the features.

4.3 Estimation methodology

4.3.1 Composite likelihood

As mentioned previously, in our pairwise modeling there is no guarantee that the correlation matrix R_i is positive definite. Thus, the model parameters cannot be estimated by maximizing the multivariate normal density function. Instead, we propose to estimate the model parameters via the pairwise-composite likelihood method. Now define θ = (α^T, δ^T)^T, where α = (α_1^T, ..., α_q^T)^T and δ = (δ_{1,2}^T, δ_{1,3}^T, ..., δ_{q−1,q}^T)^T.
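Before turning to the likelihood itself, the following R sketch may help make the correlation model (4.1) and the AME of Section 4.2.2 concrete; the values of delta and X are made-up examples and are not part of the packaged CMPLE code.

```r
# Illustrative sketch of model (4.1) and the AME of a binary feature;
# delta_jk and X are made-up values, not estimates from this chapter.
g     <- function(rho) log((1 + rho) / (1 - rho))   # link: (-1, 1) -> (-Inf, Inf)
g_inv <- function(eta) 1 - 2 / (1 + exp(eta))       # inverse link of (4.1)

set.seed(1)
n <- 200; p <- 2
X <- matrix(rbinom(n * p, 1, 0.5), n, p)            # binary features
delta_jk <- c(0.4, -0.6, 0.3)                       # (delta_{j,k,0}, delta_{j,k,1}, delta_{j,k,2})

rho_jk <- g_inv(cbind(1, X) %*% delta_jk)           # modeled pairwise correlations

# AME of feature r = 1 on the (j, k) correlation: average of rho(X_r = 1) - rho(X_r = 0)
r <- 1
X1 <- X; X1[, r] <- 1
X0 <- X; X0[, r] <- 0
AME_r <- mean(g_inv(cbind(1, X1) %*% delta_jk) - g_inv(cbind(1, X0) %*% delta_jk))
AME_r
```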
The pairwise composite likelihood function for 𝑞 responses is 𝑞−1 Ö Ö 𝑞 𝐶 𝐿 (𝜃) = L 𝑗,𝑘 (𝜃), 𝑗=1 𝑘= 𝑗+1 Î𝑛 where L 𝑗,𝑘 (𝜃) = 𝑖=1 𝑓 (𝑌𝑖, 𝑗 , 𝑌𝑖,𝑘 |𝑋𝑖 ) denotes the pairwise likelihood function for the 𝑗th and 𝑘th responses, and 1 𝑓 (𝑌𝑖, 𝑗 , 𝑌𝑖,𝑘 |𝑋𝑖 ) = √︃ 2𝜋𝜎𝑖, 𝑗 𝜎𝑖,𝑘 1 − 𝜌𝑖,2 𝑗,𝑘   𝑌2 2  1 𝑖, 𝑗 2𝜌𝑖, 𝑗,𝑘 𝑌𝑖, 𝑗 𝑌𝑖,𝑘 𝑌𝑖,𝑘 × exp − − + 2 . (4.3) 2(1 − 𝜌𝑖, 𝑗,𝑘 ) 𝜎𝑖,2 𝑗 2 𝜎𝑖, 𝑗 𝜎𝑖,𝑘 𝜎𝑖,𝑘 The estimator of 𝜃 is defined as b 𝜃 = argmax𝜃 ℓ(𝜃), where ℓ(𝜃) = log{𝐶 𝐿(𝜃)}. Note that the length of the 𝜃-vector is 𝑛𝜃 = 𝑞 × ( 𝑝 + 1) + 𝑞2 × ( 𝑝 + 1) = ( 𝑝+1)𝑞(𝑞+1)  2 . For a scenario with two features (𝑝 = 2) and four phenotypes (𝑞 = 4), 𝑛𝜃 is 30. For the scenario of 𝑝 = 6 and 𝑞 = 4, 𝑛𝜃 is 70. Thus, applying the standard Newton-Raphson method or its variant is very time-consuming as it will require repeated inversion of a large matrix. Therefore, we develop an MM algorithm which is more computationally efficient than direct maximization of ℓ(𝜃) using the Newton-Raphson method. 4.3.2 The MM algorithm The MM algorithm squarely depends on finding a suitable minorization function for the log of the Í composite likelihood, ℓ(𝜃). Note that ℓ(𝜃) = 𝑗 <𝑘 ℓ 𝑗,𝑘 (𝛼, 𝛿), where ℓ 𝑗,𝑘 (𝛼, 𝛿) is the logarithm of 60 the pairwise likelihood function 𝑛  1 ∑︁ ℓ 𝑗,𝑘 (𝜃) = − log(𝜎𝑖,2 𝑗 ) + log(𝜎𝑖,𝑘 2 ) + log(1 − 𝜌𝑖,2 𝑗,𝑘 ) 2 𝑖=1  𝑌2 2  1 𝑖, 𝑗 2𝜌𝑖, 𝑗,𝑘 𝑌𝑖, 𝑗 𝑌𝑖,𝑘 𝑌𝑖,𝑘 + − + 2 . (1 − 𝜌𝑖,2 𝑗,𝑘 ) 𝜎𝑖,2 𝑗 𝜎𝑖, 𝑗 𝜎𝑖,𝑘 𝜎𝑖,𝑘 Now, we state the main result based on which our analysis is based on. Theorem 1. For any 𝜃 and 𝜃 0 in the parameter space, ∑︁ 𝑝 ∑︁ ∑︁ ∗ ℓ (𝜃|𝜃 (0) )= 𝑔1 (𝛼 𝑗 |𝜃 (0) ) + 𝑔2 (𝛿 𝑗,𝑘 |𝜃 (0 ) + 𝑔3 (𝜃 (0) ) 𝑗=1 𝑗 <𝑘 is a minorization function of ℓ(𝜃) such that and ℓ(𝜃) ≥ ℓ ∗ (𝜃|𝜃 (0) ) ∀𝜃, 𝜃 0 and ℓ(𝜃) = ℓ ∗ (𝜃|𝜃), where 𝑔1 (𝛼 𝑗 |𝜃 (0) ) = 𝑠:𝑠< 𝑗 𝜓1,𝑠, 𝑗 (𝛼 𝑗 − 𝛼 (0) (0) (0) (0) Í Í 𝑗 , 𝑗 |𝜃 ) + 𝑠:𝑠> 𝑗 𝜓1, 𝑗,𝑠 (𝛼 𝑗 − 𝛼 𝑗 , 𝑗 |𝜃 ) for 𝑗 = 1, . . . , 𝑞, 𝑔2 (𝛿 𝑗,𝑘 |𝜃 (0) ) = 𝜓2, 𝑗,𝑘 (𝜌 𝑗,𝑘 |𝜃 (0) ) for 𝑗 ≠ 𝑘, and 𝑔3 (𝜃 (0) ) = (0) ÍÍ 𝑗 <𝑘 𝜓3, 𝑗,𝑘 (𝜃 ), with  2   𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘 2 2 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘 ∑︁𝑛   𝜓1, 𝑗,𝑘 (𝛼𝑟 − 𝛼𝑟(0) , 𝑟 |𝜃 (0) ) = 1+ 2 + 𝑍𝑖𝑇 (𝛼𝑟(0) − 𝛼𝑟 ) 𝑖=1 2𝜎𝑖,(0) (0) 𝑗 𝜎𝑖,𝑘 (1 − 𝜌𝑖,(0)𝑗,𝑘 ) 2𝜎𝑖,(0) (0) 𝑗 𝜎𝑖,𝑘 (1 + 𝜌𝑖,(0)𝑗,𝑘 ) 2 𝑌𝑖,𝑟 − exp{4𝑍𝑖𝑇 (𝛼𝑟(0) − 𝛼𝑟 )} (0) 2 2 4𝜎𝑖,𝑟 (1 − 𝜌𝑖,(0)𝑗,𝑘 )  2  (𝑌𝑖,2𝑗 + 𝑌𝑖,𝑘 2 ) 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘   − 2 + exp{3𝑍𝑖𝑇 (𝛼𝑟(0) − 𝛼𝑟 )} , 6𝜎𝑖,(0) (0) 𝑗 𝜎𝑖,𝑘 (1 − 𝜌𝑖, 𝑗,𝑘 ) (0) 6𝜎𝑖,(0) (0) (0) 𝑗 𝜎𝑖,𝑘 (1 + 𝜌𝑖, 𝑗,𝑘 ) 61 𝑛  𝑌𝑖,2𝑗 2   1 𝑌𝑖,𝑘 ∑︁       (0) 2 𝜓2, 𝑗,𝑘 (𝜌 𝑗,𝑘 |𝜃 )= − log(1 − 𝜌𝑖, 𝑗,𝑘 ) − + × 2 2 (0) 2 (0) 2 (0) 2  𝑖=1  4𝜎𝑖,(0)  𝑗 (1 − 𝜌 𝑖, 𝑗,𝑘 ) 4𝜎 𝑖,𝑘 (1 − 𝜌 ) 𝑖, 𝑗,𝑘     2 (0) 2 2 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘 (0) 2 1 − 𝜌 𝑖, 𝑗,𝑘 ª 1 − 𝜌 𝑖, 𝑗,𝑘 ª ® + log ­ © © 2 2 1 − 𝜌𝑖,2 𝑗,𝑘 ­ (0) (0) (0) ® 1 − 𝜌𝑖, 𝑗,𝑘 2𝜎 𝜎 (1 − 𝜌 ) « ¬ 𝑖, 𝑗 𝑖,𝑘 𝑖, 𝑗,𝑘 « ¬    2 2 2 3 (0) 3 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘 1 − 𝜌 (0) 2 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘 1 + 𝜌 𝑖, 𝑗,𝑘 ª 𝑖, 𝑗,𝑘 ª − ® − © © (0) 2 2 (0) (0) (0) + ­ ­ ® (0) (0) 6𝜎𝑖, 𝑗 𝜎𝑖,𝑘 (1 − 𝜌𝑖, 𝑗,𝑘 ) « 1 − 𝜌𝑖, 𝑗,𝑘 ¬ 6𝜎𝑖, 𝑗 𝜎𝑖,𝑘 (1 + 𝜌𝑖, 𝑗,𝑘 ) 1 𝜌 𝑖, 𝑗,𝑘   « ¬ 𝑌𝑖,2𝑗 + 𝑌𝑖,𝑘2 (0) © 1 + 𝜌𝑖, 𝑗,𝑘 ª  + log ­ ® , 2𝜎𝑖,(0) (0) 𝑗 𝜎𝑖,𝑘 (1 + 𝜌𝑖, 𝑗,𝑘 ) (0) 1 + 𝜌𝑖, 𝑗,𝑘 « ¬ and  2   𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘 2 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘 2 𝑛   (0) ∑︁ 1 (0) 2 (0) 2 𝜓3, 𝑗,𝑘 (𝜃 )= + − log(𝜎𝑖, 𝑗 𝜎𝑖,𝑘 ) . (0) 2 (0) (0) (0) 2 𝑖=1 2𝜎𝑖,(0) 𝜎 𝑗 𝑖,𝑘 (0) (1 − 𝜌 𝑖, 𝑗,𝑘 ) 2𝜎 𝜎 𝑖, 𝑗 𝑖,𝑘 (1 + 𝜌 𝑖, 𝑗,𝑘 ) Proof of Theorem 1: Conditional on the covariate 𝑋𝑖 , 𝑌𝑖 = (𝑌𝑖,1 , . . . , 𝑌𝑖,𝑞 )𝑇 follows a multivariate normal distribution with mean 0 and variance-covariance matrix Σ𝑖 . As defined in Section 4.3.1, the Î𝑛 pairwise likelihood for the ( 𝑗, 𝑘)th response is L 𝑗,𝑘 (𝜃) = 𝑖=1 𝑓 (𝑌𝑖, 𝑗 , 𝑌𝑖,𝑘 |𝑋𝑖 ), with 𝑓 (𝑌𝑖, 𝑗 , 𝑌𝑖,𝑘 |𝑋𝑖 ) is given in (4.3). 
The logarithm of L 𝑗,𝑘 (𝜃) is 𝑛  1 ∑︁ ℓ 𝑗,𝑘 (𝛼, 𝛿) = − log(𝜎𝑖,2 𝑗 ) + log(𝜎𝑖,𝑘 2 ) + log(1 − 𝜌𝑖,2 𝑗,𝑘 ) 2 𝑖=1  𝑌2 2  1 𝑖, 𝑗 2𝜌𝑖, 𝑗,𝑘 𝑌𝑖, 𝑗 𝑌𝑖,𝑘 𝑌𝑖,𝑘 + − + 2 . (1 − 𝜌𝑖,2 𝑗,𝑘 ) 𝜎𝑖,2 𝑗 𝜎𝑖, 𝑗 𝜎𝑖,𝑘 𝜎𝑖,𝑘 To derive the minorization function for ℓ 𝑗,𝑘 , we consider each term separately. Consider the following term 2 2 𝑌𝑖,2𝑗 𝑌𝑖,2𝑗 𝜎𝑖,(0)𝑗 (1 − 𝜌𝑖, 𝑗,𝑘 ) (0) − =− × 𝜎𝑖,2 𝑗 (1 − 𝜌𝑖,2 𝑗,𝑘 ) 𝜎𝑖,(0) 2 (0) 2 𝜎𝑖,2 𝑗 (1 − 𝜌𝑖,2 𝑗,𝑘 ) 𝑗 (1 − 𝜌𝑖, 𝑗,𝑘 )  𝜎 (0) 4 (0) 2 2  𝑌𝑖,2𝑗 © 𝑖, 𝑗 ª © 1 − 𝜌 𝑖, 𝑗,𝑘 ª ≥− ® +­ 2 ® . (0) 2 (0) 2 ­ 2𝜎𝑖, 𝑗 (1 − 𝜌𝑖, 𝑗,𝑘 ) « 𝜎𝑖, 𝑗 ¬ « 1 − 𝜌𝑖, 𝑗,𝑘 ¬ 62 The above inequality follows from the AM-GM inequality. Similarly, we have 2 2 2 𝑌𝑖,𝑘 2 𝑌𝑖,𝑘 (0) 𝜎𝑖,𝑘 (1 − 𝜌𝑖,(0)𝑗,𝑘 ) − 2 (1 − 𝜌 2 ) =− 2 2 × 2 (1 − 𝜌 2 ) 𝜎𝑖,𝑘 𝑖, 𝑗,𝑘 (0) 𝜎𝑖,𝑘 (1 − 𝜌𝑖,(0)𝑗,𝑘 ) 𝜎𝑖,𝑘 𝑖, 𝑗,𝑘 2  𝜎 (0) 4 (0) 2 2  𝑌𝑖,𝑘 © 𝑖,𝑘 ª © 1 − 𝜌 𝑖, 𝑗,𝑘 ª ≥− (0) 2 (0) 2 ­ ® +­ 2 ® . 2𝜎𝑖,𝑘 (1 − 𝜌𝑖, 𝑗,𝑘 ) « 𝜎𝑖,𝑘 ¬ « 1 − 𝜌𝑖, 𝑗,𝑘 ¬ Í𝑝 Next, after replacing 𝜌𝑖, 𝑗,𝑘 by 1 − 2/{1 + exp(𝛿 𝑗,𝑘,0 + 𝑟=1 𝛿 𝑗,𝑘,𝑟 𝑋𝑖,𝑟 )}, in the term 𝜌𝑖, 𝑗,𝑘 𝑌𝑖, 𝑗 𝑌𝑖,𝑘 /𝜎𝑖, 𝑗 𝜎𝑖,𝑘 (1 − 𝜌𝑖,2 𝑗,𝑘 ), we obtain 𝜌𝑖, 𝑗,𝑘 𝑌𝑖, 𝑗 𝑌𝑖,𝑘 𝑌𝑖, 𝑗 𝑌𝑖,𝑘 = 𝜎𝑖, 𝑗 𝜎𝑖,𝑘 (1 − 𝜌𝑖,2 𝑗,𝑘 ) 𝜎𝑖, 𝑗 𝜎𝑖,𝑘 (1 − 𝜌𝑖,2 𝑗,𝑘 ) 2𝑌𝑖, 𝑗 𝑌𝑖,𝑘 − 𝜎𝑖, 𝑗 𝜎𝑖,𝑘 (1 − 𝜌𝑖,2 𝑗,𝑘 ){1 + exp(𝛿 𝑗,𝑘,0 + Í𝑝 𝑟=1 𝛿 𝑗,𝑘,𝑟 𝑋𝑖,𝑟 )}  2   2 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘 − 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘 2 = 2𝜎𝑖, 𝑗 𝜎𝑖,𝑘 (1 − 𝜌𝑖,2 𝑗,𝑘 )  2   𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘 − 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘 2 2 − (4.4) 𝜎𝑖, 𝑗 𝜎𝑖,𝑘 (1 − 𝜌𝑖,2 𝑗,𝑘 ){1 + exp(𝛿 𝑗,𝑘,0 + 𝑟=1 𝛿 𝑗,𝑘,𝑟 𝑋𝑖,𝑟 )} Í𝑝 = 𝐵1 + 𝐵2 + 𝐵3 + 𝐵4 . Now,  2  2 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘   𝜎 (0)   𝜎 (0)   1 − 𝜌 (0) 2  𝑖, 𝑗 𝑖,𝑘 𝑖, 𝑗,𝑘 𝐵1 = 2 ≥ 1 + log + + , 2𝜎𝑖, 𝑗 𝜎𝑖,𝑘 (1 − 𝜌𝑖, 𝑗,𝑘 ) (0) (0) 2𝜎𝑖, 𝑗 𝜎𝑖,𝑘 (1 − 𝜌𝑖, 𝑗,𝑘 )(0) 2 𝜎𝑖, 𝑗 𝜎𝑖,𝑘 1 − 𝜌𝑖,2 𝑗,𝑘 and this inequality follows due to the fact that for any generic 𝑥 > 0, 𝑥 ≥ {1 + log(𝑥)} and equality holds when 𝑥 = 1. Next, using the AM-GM inequality we have     2 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘2 2 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘2  𝜎 (0)  3  𝜎 (0)  3  1 − 𝜌 (0) 2  3  𝑖, 𝑗 𝑖,𝑘 𝑖, 𝑗,𝑘 𝐵2 = − 2 ≥− + + . 2𝜎𝑖, 𝑗 𝜎𝑖,𝑘 (1 − 𝜌𝑖, 𝑗,𝑘 ) (0) (0) 6𝜎𝑖, 𝑗 𝜎𝑖,𝑘 (1 − 𝜌𝑖, 𝑗,𝑘 ) (0) 2 𝜎 𝑖, 𝑗 𝜎 𝑖,𝑘 1 − 𝜌𝑖,2 𝑗,𝑘 Í𝑝  After replacing 1 + exp(𝛿 𝑗,𝑘,0 + 𝑟=1 𝛿 𝑗,𝑘,𝑟 𝑋𝑖,𝑟 ) by 2/ 1 − 𝜌𝑖, 𝑗,𝑘 in (4.4), we have  2   𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘 2 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘 2 𝐵3 + 𝐵4 = − + . 2𝜎𝑖, 𝑗 𝜎𝑖,𝑘 (1 + 𝜌𝑖, 𝑗,𝑘 ) 2𝜎𝑖, 𝑗 𝜎𝑖,𝑘 (1 + 𝜌𝑖, 𝑗,𝑘 ) 63 Now,  2 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘 𝐵3 = − 2𝜎𝑖, 𝑗 𝜎𝑖,𝑘 (1 + 𝜌𝑖, 𝑗,𝑘 )  2 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘  𝜎 (0) 3 𝜎 (0) 3 1 + 𝜌 (0) 3  © 𝑖, 𝑗 ª © 𝑖,𝑘 ª © 𝑖, 𝑗,𝑘 ª ≥ − (0) (0) (0) ® +­ ® +­ ® , 1 + 𝜌𝑖, 𝑗,𝑘 ­ 6𝜎𝑖, 𝑗 𝜎𝑖,𝑘 (1 + 𝜌𝑖, 𝑗,𝑘 ) 𝜎𝑖, 𝑗 𝜎𝑖,𝑘 « ¬ « ¬ « ¬ and   2 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘2 𝐵4 = 2𝜎𝑖, 𝑗 𝜎𝑖,𝑘 (1 + 𝜌𝑖, 𝑗,𝑘 )   2 2 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘 (0) (0) (0) © 1 + 𝜌𝑖, 𝑗,𝑘 ª   © 𝜎𝑖, 𝑗 ª © 𝜎𝑖,𝑘 ª ≥ 1 + log ­ ® + log ­ ® + log ­ ® , 2𝜎𝑖,(0) (0) (0) 𝑗 𝜎𝑖,𝑘 (1 + 𝜌𝑖, 𝑗,𝑘 ) 𝜎𝑖, 𝑗 𝜎𝑖,𝑘 1 + 𝜌𝑖, 𝑗,𝑘 « ¬ « ¬ « ¬ and these two inequalities follow from the AM-GM inequality and 𝑥 ≥ 1+log(𝑥) for any generic 𝑥 > 0. 64 We further define 𝜓1, 𝑗,𝑘 (𝛼𝑟 − 𝛼𝑟(0) , 𝑟 |𝜃 (0) )  2 𝑛  (0) ! 2 (0) ! 4 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘 (0) ! ∑︁ 𝜎𝑖,𝑟 𝑌𝑖,𝑟 𝜎𝑖,𝑟 𝜎𝑖,𝑟 = log − + log (0) 2 2 (0) 2 𝑖=1 𝜎𝑖,𝑟 4𝜎𝑖,𝑟 (1 − 𝜌𝑖,(0)𝑗,𝑘 ) 𝜎𝑖,𝑟 2𝜎𝑖,(0) 𝜎 𝑗 𝑖,𝑘 (0) (1 − 𝜌 𝑖, 𝑗,𝑘 ) 𝜎𝑖,𝑟  2 2 ) (0) !3 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘 (0) ! 3 (𝑌𝑖,2𝑗 + 𝑌𝑖,𝑘 𝜎𝑖,𝑟 𝜎𝑖,𝑟 − 2 − (0) 6𝜎𝑖,(0) (0) 𝑗 𝜎𝑖,𝑘 (1 − 𝜌𝑖, 𝑗,𝑘 ) (0) 𝜎𝑖,𝑟 6𝜎𝑖,(0) (0) 𝑗 𝜎𝑖,𝑘 (1 + 𝜌𝑖, 𝑗,𝑘 ) 𝜎𝑖,𝑟   2 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘 2 (0) ! 
𝜎𝑖,𝑟 + log , 2𝜎𝑖,(0) (0) 𝑗 𝜎𝑖,𝑘 (1 + 𝜌𝑖, 𝑗,𝑘 ) (0) 𝜎𝑖,𝑟 ∑︁𝑛  2 𝑌𝑖,𝑟 (0) = 𝑇 𝑍𝑖 (𝛼𝑟 − 𝛼𝑟 ) − exp{4𝑍𝑖𝑇 (𝛼𝑟(0) − 𝛼𝑟 )} (0) 2 (0) 2 𝑖=1 4𝜎𝑖,𝑟 (1 − 𝜌𝑖, 𝑗,𝑘 )  2 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘 + 2 𝑍𝑖𝑇 (𝛼𝑟(0) − 𝛼𝑟 ) 2𝜎𝑖,(0) (0) 𝑗 𝜎𝑖,𝑘 (1 − 𝜌𝑖, 𝑗,𝑘 ) (0)  2  (𝑌𝑖,2𝑗 + 𝑌𝑖,𝑘 2 ) 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘  − + exp{3𝑍𝑖𝑇 (𝛼𝑟(0) − 𝛼𝑟 )} (0) (0) (0) 2 (0) (0) (0) 6𝜎𝑖, 𝑗 𝜎𝑖,𝑘 (1 − 𝜌𝑖, 𝑗,𝑘 ) 6𝜎𝑖, 𝑗 𝜎𝑖,𝑘 (1 + 𝜌𝑖, 𝑗,𝑘 )   2 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘 2  𝑇 (0) + (0) (0) 𝑍𝑖 (𝛼𝑟 − 𝛼𝑟 ) 2𝜎𝑖, 𝑗 𝜎𝑖,𝑘 (1 + 𝜌𝑖,(0)𝑗,𝑘 )  2   𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘 2 2 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘 ∑︁𝑛   = 1+ + 𝑍𝑖𝑇 (𝛼𝑟(0) − 𝛼𝑟 ) (0) (0) (0) 2 (0) (0) (0) 𝑖=1 2𝜎𝑖, 𝑗 𝜎𝑖,𝑘 (1 − 𝜌𝑖, 𝑗,𝑘 ) 2𝜎𝑖, 𝑗 𝜎𝑖,𝑘 (1 + 𝜌𝑖, 𝑗,𝑘 ) 2 𝑌𝑖,𝑟 − exp{4𝑍𝑖𝑇 (𝛼𝑟(0) − 𝛼𝑟 )} (0) 2 2 4𝜎𝑖,𝑟 (1 − 𝜌𝑖,(0)𝑗,𝑘 )  2  (𝑌𝑖,2𝑗 + 𝑌𝑖,𝑘 2 ) 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘   − 2 + exp{3𝑍𝑖𝑇 (𝛼𝑟(0) − 𝛼𝑟 )} , 6𝜎𝑖,(0) (0) 𝑗 𝜎𝑖,𝑘 (1 − 𝜌𝑖, 𝑗,𝑘 ) (0) 6𝜎𝑖,(0) (0) (0) 𝑗 𝜎𝑖,𝑘 (1 + 𝜌𝑖, 𝑗,𝑘 ) 65 𝜓2, 𝑗,𝑘 (𝜌 𝑗,𝑘 |𝜃 (0) ) (0) 2 2 𝑛  𝑌𝑖,2𝑗 2 1 −   ∑︁ 1    𝑌    𝜌 𝑖, 𝑗,𝑘 ª − log(1 − 𝜌𝑖,2 𝑗,𝑘 ) − 𝑖,𝑘 = + © 2 2 2 2 2 ­ ® 𝑖=1 2   4𝜎 (0) (1 − 𝜌 (0) ) 4𝜎 (0) (1 − 𝜌 (0)  1 − 𝜌 )  𝑖, 𝑗,𝑘 ¬  𝑖, 𝑗 𝑖, 𝑗,𝑘 𝑖,𝑘 𝑖, 𝑗,𝑘  «  2 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘 (0) 2 © 1 − 𝜌𝑖, 𝑗,𝑘 ª + log ­ (0) 2 1 − 𝜌𝑖,2 𝑗,𝑘 ® 2𝜎𝑖,(0) 𝜎 𝑗 𝑖,𝑘 (0) (1 − 𝜌 𝑖, 𝑗,𝑘 ) « ¬    2 2 3 𝑌𝑖,2𝑗 + 𝑌𝑖,𝑘 (0) 2 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘 (0) 3 © 1 − 𝜌𝑖, 𝑗,𝑘 ª © 1 + 𝜌𝑖, 𝑗,𝑘 ª − ® − (0) 2 1 − 𝜌𝑖,2 𝑗,𝑘 (0) (0) (0) 1 + 𝜌𝑖, 𝑗,𝑘 ­ ­ ® 6𝜎𝑖,(0) 𝜎 𝑗 𝑖,𝑘 (0) (1 − 𝜌 ) 𝑖, 𝑗,𝑘 « ¬ 6𝜎 𝜎 𝑖, 𝑗 𝑖,𝑘 (1 + 𝜌 𝑖, 𝑗,𝑘 «) ¬   𝑌𝑖,2𝑗 + 𝑌𝑖,𝑘 2 (0) © 1 + 𝜌𝑖, 𝑗,𝑘 ª  + log ­ ® . 2𝜎𝑖,(0) 𝜎 𝑗 𝑖,𝑘 (0) (1 + 𝜌 (0) 𝑖, 𝑗,𝑘 ) 1 + 𝜌𝑖, 𝑗,𝑘 « ¬ Since, 𝜌𝑖, 𝑗,𝑘 = 1 − 2/{1 + exp(𝛿𝑇𝑗,𝑘 𝑍𝑖 )}, 1 + 𝜌𝑖, 𝑗,𝑘 = 2 exp(𝛿𝑇𝑗,𝑘 𝑍𝑖 )/{1 + exp(𝛿𝑇𝑗,𝑘 𝑍𝑖 )}, 𝜌𝑖,2 𝑗,𝑘 = {exp(𝛿𝑇𝑗,𝑘 𝑍𝑖 ) − 1}2 /{exp(𝛿𝑇𝑗,𝑘 𝑍𝑖 ) + 1}2 , 1 − 𝜌𝑖,2 𝑗,𝑘 = 4 exp(𝛿𝑇𝑗,𝑘 𝑍𝑖 )/{exp(𝛿𝑇𝑗,𝑘 𝑍𝑖 ) + 1}2 . 66 Now using these terms, we obtain 𝜓2, 𝑗,𝑘 (𝜌 𝑗,𝑘 |𝜃 (0) ) 𝑛  ∑︁ log(4) = − − 0.5𝛿𝑇𝑗,𝑘 𝑍𝑖 + log{exp(𝛿𝑇𝑗,𝑘 𝑍𝑖 ) + 1} 𝑖=1 2 2 2  {exp(𝛿𝑇𝑗,𝑘 𝑍𝑖 ) + 1}4 © 𝑌𝑖, 𝑗 𝑌𝑖,𝑘  (0) 2 −­ + ® 1 − 𝜌𝑖, 𝑗,𝑘 × ª (0) 2 (0) 2 64 exp(2𝛿𝑇𝑗,𝑘 𝑍𝑖 ) 𝜎 « 𝑖, 𝑗 𝜎 𝑖,𝑘 ¬  2 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘 h   i (0) 2 𝑇 𝑇 + log 1 − 𝜌 − log(4) − 𝛿 𝑋 𝑗,𝑘 𝑖 + 2log{exp(𝛿 𝑍 𝑗,𝑘 𝑖 ) + 1} (0) (0) 2 2𝜎𝑖,(0) 𝑖, 𝑗,𝑘 𝑗 𝜎𝑖,𝑘 (1 − 𝜌𝑖, 𝑗,𝑘 )   2 2 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘   2 {1 + exp(𝛿𝑇𝑗,𝑘 𝑍𝑖 )}6 (0) 2 − 1 − 𝜌𝑖, 𝑗,𝑘 × 6𝜎𝑖,(0) 𝜎 𝑗 𝑖,𝑘 (0) 64 exp(3𝛿𝑇𝑗,𝑘 𝑍𝑖 )  2 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘   2 {1 + exp(𝛿𝑇𝑗,𝑘 𝑍𝑖 )}3 (0) − 1 + 𝜌𝑖, 𝑗,𝑘 × 6𝜎𝑖,(0) 𝜎 𝑗 𝑖,𝑘 (0) 8 exp(3𝛿𝑇𝑗,𝑘 𝑍𝑖 )   2 2 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘     (0) 𝑇 𝑇 + log 1 + 𝜌𝑖, 𝑗,𝑘 − log(2) − 𝛿 𝑗,𝑘 𝑍𝑖 + log{1 + exp(𝛿 𝑗,𝑘 𝑍𝑖 )} . 2𝜎𝑖,(0) 𝜎 𝑗 𝑖,𝑘 (0) (1 + 𝜌 (0) 𝑖, 𝑗,𝑘 ) Also,  2   𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘 2 2 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘 𝑛   (0) ∑︁ 1 (0) 2 (0) 2 𝜓3, 𝑗,𝑘 (𝜃 )= + − log(𝜎𝑖, 𝑗 𝜎𝑖,𝑘 ) . (0) 2 (0) (0) (0) 2 𝑖=1 2𝜎𝑖,(0) 𝜎 𝑗 𝑖,𝑘 (0) (1 − 𝜌 𝑖, 𝑗,𝑘 ) 2𝜎 𝜎 𝑖, 𝑗 𝑖,𝑘 (1 + 𝜌 𝑖, 𝑗,𝑘 ) Now, the minorization of the composite log-likelihood is (𝑞−1) ∑︁ ∑︁𝑞  ∗ ℓ (𝜃|𝜃 (0) )= 𝜓1, 𝑗,𝑘 (𝛼 𝑗 − 𝛼 (0) (0) 𝑗 , 𝑗 |𝜃 ) + 𝜓1, 𝑗,𝑘 (𝛼 𝑘 − 𝛼 𝑘 , 𝑘 |𝜃 ) (0) (0) 𝑗=1 𝑘=( 𝑗+1)  (0) (0) + 𝜓2, 𝑗,𝑘 (𝜌 𝑗,𝑘 |𝜃 ) + 𝜓3, 𝑗,𝑘 (𝜃 ) ∑︁𝑞 ∑︁ ∑︁ = 𝑔1 (𝛼 𝑗 |𝜃 (0) ) + 𝑔2 (𝛿 𝑗,𝑘 |𝜃 (0) ) + 𝑔3 (𝜃 (0) ), 𝑗=1 𝑗 <𝑘 where 𝑔1 (𝛼 𝑗 |𝜃 (0) ) = 𝜓1,𝑠, 𝑗 (𝛼 𝑗 −𝛼 (0) (0) 𝜓1, 𝑗,𝑠 (𝛼 𝑗 −𝛼 (0) (0) Í Í 𝑠:𝑠< 𝑗 𝑗 , 𝑗 |𝜃 ) + 𝑠:𝑠> 𝑗 𝑗 , 𝑗 |𝜃 ) for 𝑗 = 1, . . . , 𝑞, 𝑔2 (𝛿 𝑗,𝑘 |𝜃 (0) ) = 𝜓2, 𝑗,𝑘 (𝜌 𝑗,𝑘 |𝜃 (0) ) for 𝑗 ≠ 𝑘, 𝑔3 (𝜃 (0) ) = 𝜓3, 𝑗,𝑘 (𝜃 (0) ). ÍÍ 𝑗 <𝑘 67 In the MM algorithm, we maximize the minorizing function ℓ ∗ rather than ℓ. The minorizing function ℓ ∗ is expressed as a summation of 𝑔1 (𝛼1 |𝜃 (0) ), . . . , 𝑔1 (𝛼 𝑝 |𝜃 (0) ), and 𝑔2 (𝛿1,2 |𝜃 (0) ), . . . , 𝑔2 (𝛿 𝑝−1,𝑝 |𝜃 (0) ), this results in the separation of the parameters. Separation of the parameter has a great advantage when optimizing a function with respect to a high-dimensional argument (𝜃 in our case). 
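Because the displayed construction is dense, it may help to restate separately the elementary inequalities on which the minorization rests; all three appear explicitly in the proof above (the two- and three-term AM–GM bounds and the logarithmic bound), and equality of each at θ = θ^{(0)} is what gives ℓ(θ) = ℓ*(θ | θ).

```latex
% Elementary inequalities used to build the minorizer \ell^*(\theta \mid \theta^{(0)}):
\begin{align*}
\text{(i)}\;\;  & ab \le \tfrac{1}{2}\bigl(a^{2}+b^{2}\bigr), & a,b &\ge 0,\\
\text{(ii)}\;\; & abc \le \tfrac{1}{3}\bigl(a^{3}+b^{3}+c^{3}\bigr), & a,b,c &\ge 0,\\
\text{(iii)}\;\;& x \ge 1+\log x, & x &> 0,\ \text{with equality iff } x=1.
\end{align*}
```

Roughly speaking, each negative term of ℓ_{j,k} is bounded below using (i) or (ii) and each positive term using (iii), which is what produces a surrogate that separates into the α_j and δ_{j,k} blocks described above.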
The functions 𝑔1 , 𝑔2 are all differentiable functions, and satisfy standard regularity conditions, and these function are used in updating the parameter values. The parameter estimates are obtained by the gradient MM algorithm [63]. Let 𝜃 (𝑡) be the parameter value at the 𝑡th iteration, then 𝜃 (𝑡+1) is obtained by one step Newton-Raphson method  −1 𝜕2 ∗    𝜕 ∗ 𝜃 (𝑡+1) =𝜃 (𝑡) − ℓ (𝜃|𝜃 (𝑡) ) ℓ (𝜃|𝜃 (𝑡) ) . (4.5) 𝜕𝜃𝜕𝜃 𝑇 𝜃=𝜃 (𝑡) 𝜕𝜃 𝜃=𝜃 (𝑡) The above step is repeated until the estimate converges. Specifically, we stop the above iteration when 1𝑇 (|𝜃 (𝑡+1) − 𝜃 (𝑡) |/|𝜃 (𝑡) |) < 𝜖0 , a prespecified small number. Observe that in Equation (4.5), rather than the log-composite likelihood ℓ(𝜃), the minorization function ℓ ∗ (𝜃|𝜃 (𝑡) ) is used. Next, 𝜕ℓ ∗ (𝜃|𝜃 (𝑡) )/𝜕𝛼 𝑗 = 𝜕𝑔1 (𝛼 𝑗 |𝜃 (𝑡) )/𝜕𝛼 𝑗 , a function of 𝛼 𝑗 only, and 𝜕ℓ ∗ (𝜃|𝜃 (𝑡) )/𝜕𝛿 𝑗,𝑘 = 𝜕𝑔2 (𝛿 𝑗,𝑘 |𝜃 (𝑡) )/𝜕𝛿 𝑗,𝑘 , a function of 𝛿 𝑗,𝑘 only. Consequently 𝜕 2 ℓ ∗ (𝜃|𝜃 (𝑡) )/𝜕𝜃𝜕𝜃 𝑇 is a block-diagonal matrix, and each block is a matrix of order ( 𝑝 + 1) × ( 𝑝 + 1) and this greatly enhances computational efficiency. Specifically, the complexity of the inversion of each block matrix is in the order 𝑂 (( 𝑝 + 1) 3 ). Thus, the complexity of one update of the MM algorithm is 𝑂 (𝑛𝑛𝜃 + 𝑛( 𝑝 + 1) 2 𝑞(𝑞 + 1)/2 + ( 𝑝 + 1) 3 𝑞(𝑞 + 1)/2). In other words, the complexity is 𝑂 (𝑛𝑛𝜃 + 𝑛( 𝑝 + 1)𝑛𝜃 + ( 𝑝 + 1) 2 𝑛𝜃 ), where 𝑛𝜃 = (1 + 𝑝)𝑞(1 + 𝑞)/2. On the other hand, the complexity of a direct optimization of ℓ(𝜃) using the Newton- Raphson method is 𝑂 (𝑛𝑛𝜃 + 𝑛𝑛2𝜃 + 𝑛3𝜃 ). Alternatively, the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm can be used to optimize ℓ(𝜃). This algorithm avoids large matrix inversion, so it has a square order complexity. Although the proportionality constant of the complexity order is unknown, the order of this complexity for BFGS is still larger than the complexity of the MM algorithm as long as ( 𝑝 + 1) < 𝑞(𝑞 + 1)/2, and this holds for our real data and the simulation scenarios. 68 The terms of Equation (4.5) are [𝜕ℓ ∗ (𝜃|𝜃 (𝑡) )/𝜕𝜃] 𝜃=𝜃 (𝑡) = [𝜕ℓ ∗ (𝜃|𝜃 (𝑡) )/𝜕𝛼1 , . . . , 𝜕ℓ ∗ (𝜃|𝜃 (𝑡) )/𝜕𝛼𝑞 , 𝜕ℓ ∗ (𝜃|𝜃 (𝑡) )/𝜕𝛿1,2 , . . . , 𝜕ℓ ∗ (𝜃|𝜃 (𝑡) )/𝜕𝛿 𝑞−1,𝑞 ] 𝜃=𝜃 (𝑡) , and  𝜕ℓ ∗ (𝜃|𝜃 (𝑡) )  ∑︁ ∑︁ 𝑛  𝑌𝑖,2𝑗 𝑌𝑖, 𝑗 𝑌𝑖,𝑠 = −1 + − 2 (𝑡) 2 (𝑡) 2 (𝑡) (𝑡) 𝜕𝛼 𝑗 𝜃=𝜃 (𝑡) 𝑠:𝑠< 𝑗 𝑖=1 𝜎𝑖, 𝑗 (1 − 𝜌𝑖,𝑠, 𝑗 ) 𝜎𝑖, 𝑗 𝜎𝑖,𝑠 (1 − 𝜌 (𝑡) 𝑖,𝑠, 𝑗 ) 𝑌𝑖, 𝑗 𝑌𝑖,𝑠  ∑︁ ∑︁𝑛  𝑌𝑖,2𝑗 + (𝑡) (𝑡) (𝑡) 𝑍 𝑖 + −1 + 2 2 𝜎𝑖, 𝑗 𝜎𝑖,𝑠 (1 + 𝜌𝑖,𝑠, 𝑗) 𝑠:𝑠> 𝑗 𝑖=1 𝜎𝑖,(𝑡)𝑗 (1 − 𝜌𝑖,(𝑡)𝑗,𝑠 )  𝑌𝑖, 𝑗 𝑌𝑖,𝑠 𝑌𝑖, 𝑗 𝑌𝑖,𝑠 − (𝑡) (𝑡) 2 + (𝑡) (𝑡) (𝑡) 𝑍𝑖 , for 𝑗 = 1, . . . , 𝑞, 𝜎𝑖, 𝑗 𝜎𝑖,𝑠 (1 − 𝜌 (𝑡) 𝑖, 𝑗,𝑠 ) 𝜎𝑖, 𝑗 𝜎𝑖,𝑠 (1 + 𝜌𝑖, 𝑗,𝑠 ) 𝑛  𝜌 (𝑡) 𝑌𝑖,2𝑗 𝜌𝑖,(𝑡)𝑗,𝑘 2 𝜌 (𝑡) 𝜕ℓ ∗ (𝜃|𝜃 (𝑡) )   ∑︁ 𝑌𝑖,𝑘 𝑖, 𝑗,𝑘 𝑖, 𝑗,𝑘 = 2 − 2 2 − 2 2 (𝑡) 𝜕𝛿 𝑗,𝑘 𝜃=𝜃 (𝑡) 𝑖=1 1 − 𝜌𝑖, 𝑗,𝑘 𝜎 (𝑡) 𝑖, 𝑗 (1 − 𝜌𝑖,(𝑡)𝑗,𝑘 ) 2 𝜎 (𝑡) 𝑖,𝑘 (1 − 𝜌𝑖,(𝑡)𝑗,𝑘 ) 2 2𝑌𝑖, 𝑗 𝑌𝑖,𝑘 𝜌𝑖,(𝑡)𝑗,𝑘  (1 − 𝜌 (𝑡) 2 ) 𝑌𝑖, 𝑗 𝑌𝑖,𝑘 𝑖, 𝑗,𝑘 + 2 + 𝑍𝑖 , 𝜎𝑖,(𝑡)𝑗 𝜎𝑖,𝑘 (𝑡) (1 − 𝜌𝑖,(𝑡)𝑗,𝑘 ) 2 𝜎𝑖,(𝑡)𝑗 𝜎𝑖,𝑘(𝑡) (1 + 𝜌𝑖,(𝑡)𝑗,𝑘 ) 2 2 for 𝑗 < 𝑘 = 1, . . . , 𝑞 and 𝑍𝑖 = (1 𝑋𝑖𝑇 )𝑇 . Furthermore, let 𝐴 = {𝜕 2 ℓ ∗ (𝜃|𝜃 (𝑡) )/𝜕𝜃𝜕𝜃 𝑇 } 𝜃=𝜃 (𝑡) , then † 𝐴 = Diag( 𝐴1 , . . . , 𝐴𝑞 , 𝐴1,2 , . . . 
, 𝐴†𝑞−1,𝑞 ), where 𝜕 2 ℓ ∗ (𝜃|𝜃 (𝑡) )   𝐴𝑗 = 𝜕𝛼 𝑗 𝜕𝛼𝑇𝑗 𝜃=𝜃 (𝑡) 𝑛  ∑︁  ∑︁ 4𝑌𝑖,2𝑗 3  (𝑌 2 + 𝑌 2 ) 𝑖, 𝑗 𝑖,𝑠 (𝑌𝑖, 𝑗 + 𝑌𝑖,𝑠 ) 2  =− + + (𝑡) 2 (𝑡) 2 (𝑡) 2 𝑖=1 𝑠:𝑠< 𝑗 𝜎𝑖, 𝑗 (1 − 𝜌𝑖,𝑠, 𝑗 ) 2𝜎𝑖,(𝑡)𝑗 𝜎𝑖,𝑠 (𝑡) (1 − 𝜌𝑖,𝑠, 𝑗 ) (1 + 𝜌𝑖,𝑠, (𝑡) 𝑗) ∑︁  4𝑌𝑖,2𝑗 3  (𝑌𝑖,2𝑗 + 𝑌𝑖,𝑠 2) (𝑌𝑖, 𝑗 + 𝑌𝑖,𝑠 ) 2   + + + 𝑍𝑖 𝑍𝑖𝑇 , (𝑡) 2 (𝑡) 2 (𝑡) (𝑡) (𝑡) 2 (𝑡) 𝑠:𝑠< 𝑗 𝜎𝑖, 𝑗 (1 − 𝜌𝑖, 𝑗,𝑠 ) 2𝜎𝑖, 𝑗 𝜎𝑖,𝑠 (1 − 𝜌𝑖, 𝑗,𝑠 ) (1 + 𝜌𝑖, 𝑗,𝑠 ) 69 𝜕 2 ℓ ∗ (𝜃|𝜃 (𝑡) )   𝐴†𝑗,𝑘 = 𝜕𝛿 𝑗,𝑘 𝜕𝛿𝑇𝑗,𝑘 𝜃=𝜃 (𝑡) 𝑛  1 + 𝜌 (𝑡) 2 2 2 (𝑡) 2 ∑︁ 𝑖, 𝑗,𝑘 © 𝑌𝑖, 𝑗 𝑌𝑖,𝑘 1 + 5𝜌𝑖, 𝑗,𝑘 = − + × ª (𝑡) 2 2 (𝑡) 2 (𝑡) 2 (𝑡) 2 3 ­ ® 𝑖=1 (1 − 𝜌𝑖, 𝑗,𝑘 ) 𝜎 «  𝑖, 𝑗 𝜎 𝑖,𝑘 ¬ (1 − 𝜌 𝑖, 𝑗,𝑘 )  1 2 (𝑡) 2 2 2 (𝑡) 2 + 2 (𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘 ) (1 + 𝜌𝑖, 𝑗,𝑘 ) − (𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘 )(1 + 7𝜌𝑖, 𝑗,𝑘 ) (𝑡) 𝜎𝑖,(𝑡)𝑗 𝜎𝑖,𝑘 (1 − 𝜌𝑖,(𝑡)𝑗,𝑘 ) 3 2 − 4(𝑌 + 𝑌 ) 2  (1 − 𝜌 (𝑡) 2 2 𝑌𝑖,2𝑗 + 𝑌𝑖,𝑘 𝑖, 𝑗 𝑖,𝑘 𝑖, 𝑗,𝑘 ) + 𝑍𝑖 𝑍𝑖𝑇 2𝜎𝑖,(𝑡)𝑗 𝜎𝑖,𝑘(𝑡) (1 + 𝜌𝑖,(𝑡)𝑗,𝑘 ) 3 4 𝑛  𝜌 (𝑡) 𝑌𝑖,2𝑗 2 𝜌𝑖,(𝑡)𝑗,𝑘 ! ∑︁ 𝑖, 𝑗,𝑘 𝑌𝑖,𝑘 + 2 − 2 + 2 2 𝑖=1 1 − 𝜌𝑖,(𝑡)𝑗,𝑘 𝜎 (𝑡) 𝑖, 𝑗 𝜎 (𝑡) 𝑖,𝑘 (1 − 𝜌𝑖,(𝑡)𝑗,𝑘 ) 2 (𝑡)  (𝑡) (𝑡) 2  2𝜌𝑖, 𝑗,𝑘  𝜌𝑖, 𝑗,𝑘 (1 − 𝜌𝑖, 𝑗,𝑘 )   𝑌𝑖, 𝑗 𝑌𝑖,𝑘  1    − (𝑡) (𝑡) + 𝑍𝑖 𝑍𝑖𝑇 . (𝑡) 2 2 (𝑡) 2  2 𝜎𝑖, 𝑗 𝜎𝑖,𝑘  (1 − 𝜌𝑖, 𝑗,𝑘 )  (1 + 𝜌𝑖, 𝑗,𝑘 )    The above expressions are derived in the following manner. Observe that 𝜕ℓ ∗ (𝜃|𝜃 (𝑡) )/𝜕𝛼 𝑗 = 𝜕𝑔1 (𝛼 𝑗 |𝜃 (𝑡) )/𝜕𝛼 𝑗 = 𝑠:𝑠< 𝑗 𝜕𝜓1,𝑠, 𝑗 (𝛼 𝑗 − 𝛼 (𝑡) (𝑡) Í 𝑗 , 𝑗 |𝜃 )/𝜕𝛼 𝑗 + 𝜕𝜓1, 𝑗,𝑠 (𝛼 𝑗 − 𝛼 (𝑡) (𝑡) Í 𝑠:𝑠> 𝑗 𝑗 , 𝑗 |𝜃 )/𝜕𝛼 𝑗 for 𝑗 = 1, . . . , 𝑞. Now,  2   𝑌𝑖, 𝑗 + 𝑌𝑖,𝑠 2 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑠 2 ∑︁ 𝑛   𝜓1, 𝑗,𝑠 (𝛼 𝑗 − 𝛼 (0) 𝑗 , 𝑗 |𝜃 (0) )= 1+ 2 + 𝑍𝑖𝑇 (𝛼 (0) 𝑗 − 𝛼𝑗) 𝑖=1 2𝜎𝑖,(0) (0) 𝑗 𝜎𝑖,𝑠 (1 − 𝜌𝑖,(0)𝑗,𝑠 ) 2𝜎𝑖,(0) (0) 𝑗 𝜎𝑖,𝑠 (1 + 𝜌𝑖,(0)𝑗,𝑠 ) 𝑌𝑖,2𝑗 − 2 2 exp{4𝑍𝑖𝑇 (𝛼 (0) 𝑗 − 𝛼 𝑗 )} 4𝜎𝑖,(0) 𝑗 (1 − 𝜌𝑖,(0)𝑗,𝑠 )  2  (𝑌𝑖,2𝑗 + 𝑌𝑖,𝑠 2) 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑠   − 2 + exp{3𝑍𝑖𝑇 (𝛼 (0) − 𝛼 𝑗 )} , 6𝜎𝑖,(0) (0) (0) 6𝜎𝑖,(0) (0) (0) 𝑗 𝑗 𝜎𝑖,𝑠 (1 − 𝜌𝑖, 𝑗,𝑠 ) 𝑗 𝜎𝑖,𝑠 (1 + 𝜌𝑖, 𝑗,𝑠 ) 70 so  2   𝑌𝑖, 𝑗 + 𝑌𝑖,𝑠 2 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑠 2 𝜕𝜓1, 𝑗,𝑠 (𝛼 𝑗 − 𝛼 (0) (0) 𝑗 , 𝑗 |𝜃 ) ∑︁ 𝑛    = − 1+ 2 + (0) (0) 𝜕𝛼 𝑗 𝑖=1 2𝜎𝑖,(0) (0) 𝑗 𝜎𝑖,𝑠 (1 − 𝜌𝑖, 𝑗,𝑠 ) (0) 2𝜎𝑖,(0) 𝑗 𝜎𝑖,𝑠 (1 + 𝜌𝑖, 𝑗,𝑠 ) 𝑌𝑖,2𝑗 + 2 2 exp{4𝑍𝑖𝑇 (𝛼 (0) 𝑗 − 𝛼 𝑗 )} 𝜎𝑖,(0) 𝑗 (1 − 𝜌𝑖,(0)𝑗,𝑠 )  2  (𝑌𝑖,2𝑗 + 𝑌𝑖,𝑠 2) 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑠  + + × (0) 2 (0) (0) (0) 2𝜎𝑖,(0) 𝜎 𝑗 𝑖,𝑠 (0) (1 − 𝜌 𝑖, 𝑗,𝑠 ) 2𝜎 𝑖, 𝑗 𝜎 𝑖,𝑠 (1 + 𝜌 𝑖, 𝑗,𝑠 )  𝑇 (0) exp{3𝑍𝑖 (𝛼 𝑗 − 𝛼 𝑗 )} 𝑍𝑖 , and  2   − 𝛼 (𝑡) (𝑡)  𝑌𝑖, 𝑗 + 𝑌𝑖,𝑠 𝑌𝑖,2𝑗 + 𝑌𝑖,𝑠 2 1, 𝑗,𝑠 (𝛼 𝑗 𝑗 , 𝑗 |𝜃 )  𝜕𝜓 ∑︁ 𝑛    = − 1+ 2 + 𝜕𝛼 𝑗 𝜃=𝜃 (𝑡) 𝑖=1 2𝜎𝑖,(𝑡)𝑗 𝜎𝑖,𝑠(𝑡) (1 − 𝜌𝑖,(𝑡)𝑗,𝑠 ) 2𝜎𝑖,(𝑡)𝑗 𝜎𝑖,𝑠 (𝑡) (1 + 𝜌𝑖,(𝑡)𝑗,𝑠 ) 𝑌𝑖,2𝑗 + 2 2 𝜎𝑖,(𝑡)𝑗 (1 − 𝜌𝑖,(𝑡)𝑗,𝑠 )  2  (𝑌𝑖,2𝑗 + 𝑌𝑖,𝑠 2) 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑠  + 2 + (𝑡) (𝑡) (𝑡) 𝑍𝑖 , 2𝜎𝑖,(𝑡)𝑗 𝜎𝑖,𝑠(𝑡) (1 − 𝜌𝑖,(𝑡)𝑗,𝑠 ) 2𝜎𝑖, 𝑗 𝜎𝑖,𝑠 (1 + 𝜌𝑖, 𝑗,𝑠 ) ∑︁ 𝑛  𝑌𝑖,2𝑗 𝑌𝑖, 𝑗 𝑌𝑖,𝑠 = −1 + 2 2 − (𝑡) (𝑡) 2 𝑖=1 𝜎𝑖,(𝑡)𝑗 (1 − 𝜌𝑖,(𝑡)𝑗,𝑠 ) 𝜎𝑖, 𝑗 𝜎𝑖,𝑠 (1 − 𝜌 (𝑡) 𝑖, 𝑗,𝑠 )  𝑌𝑖, 𝑗 𝑌𝑖,𝑠 + (𝑡) (𝑡) 𝑍𝑖 . 𝜎𝑖, 𝑗 𝜎𝑖,𝑠 (1 + 𝜌𝑖,(𝑡)𝑗,𝑠 ) Likewise,  𝜕𝜓 1,𝑠, 𝑗 (𝛼 𝑗 − 𝛼 (𝑡) (𝑡)  𝑗 , 𝑗 |𝜃 ) ∑︁𝑛  𝑌𝑖,2𝑗 𝑌𝑖, 𝑗 𝑌𝑖,𝑠 = −1 + − 2 (𝑡) 2 (𝑡) 2 𝜕𝛼 𝑗 𝜃=𝜃 (𝑡) 𝑖=1 𝜎𝑖, 𝑗 (1 − 𝜌𝑖,𝑠, 𝑗 ) 𝜎𝑖,(𝑡)𝑗 𝜎𝑖,𝑠(𝑡) (1 − 𝜌 (𝑡) 𝑖,𝑠, 𝑗 )  𝑌𝑖, 𝑗 𝑌𝑖,𝑠 + (𝑡) (𝑡) (𝑡) 𝑍𝑖 . 𝜎𝑖, 𝑗 𝜎𝑖,𝑠 (1 + 𝜌𝑖,𝑠, 𝑗 ) 71 Adding the above two expressions, we obtain  ∗ 𝜕ℓ (𝜃|𝜃 (𝑡) )  ∑︁ ∑︁ 𝑛  𝑌𝑖,2𝑗 𝑌𝑖, 𝑗 𝑌𝑖,𝑠 = −1 + 2 2 − (𝑡) (𝑡) 2 𝜕𝛼 𝑗 𝜃=𝜃 (𝑡) 𝑠:𝑠< 𝑗 𝑖=1 𝜎𝑖,(𝑡)𝑗 (1 − 𝜌𝑖,𝑠, (𝑡) 𝑗) 𝜎𝑖, 𝑗 𝜎𝑖,𝑠 (1 − 𝜌 (𝑡) 𝑖,𝑠, 𝑗 ) 𝑌𝑖, 𝑗 𝑌𝑖,𝑠  ∑︁ ∑︁ 𝑛  𝑌𝑖,2𝑗 + (𝑡) (𝑡) (𝑡) 𝑍𝑖 + −1 + 2 2 𝜎𝑖, 𝑗 𝜎𝑖,𝑠 (1 + 𝜌𝑖,𝑠, 𝑗 ) 𝑠:𝑠> 𝑗 𝑖=1 𝜎𝑖,(𝑡)𝑗 (1 − 𝜌𝑖,(𝑡)𝑗,𝑠 )  𝑌𝑖, 𝑗 𝑌𝑖,𝑠 𝑌𝑖, 𝑗 𝑌𝑖,𝑠 − (𝑡) (𝑡) 2 + (𝑡) (𝑡) (𝑡) 𝑍𝑖 . 
𝜎𝑖, 𝑗 𝜎𝑖,𝑠 (1 − 𝜌 (𝑡) 𝑖, 𝑗,𝑠 ) 𝜎𝑖, 𝑗 𝜎𝑖,𝑠 (1 + 𝜌𝑖, 𝑗,𝑠 ) Next consider, 𝜕 2 𝜓1, 𝑗,𝑠 (𝛼 𝑗 − 𝛼 (𝑡) 𝑗 , 𝑗 |𝜃 ) (𝑡) ∑︁𝑛  4𝑌𝑖,2𝑗 =− exp{4𝑍𝑖𝑇 (𝛼 (𝑡) 𝑗 − 𝛼 𝑗 )} 𝜕𝛼 𝑗 𝜕𝛼𝑇𝑗 (𝑡) 2 (𝑡) 2 𝑖=1 𝜎𝑖, 𝑗 (1 − 𝜌𝑖, 𝑗,𝑠 )  (𝑌 2 + 𝑌 2 ) (𝑌𝑖, 𝑗 + 𝑌𝑖,𝑠 ) 2   3 exp{3𝑍𝑖𝑇 (𝛼 (𝑡) 𝑖, 𝑗 𝑖,𝑠 + + − 𝛼 𝑗 )} 𝑍𝑖 𝑍𝑖𝑇 , (𝑡) (𝑡) (𝑡) 2 𝜌𝑖,(𝑡)𝑗,𝑠 ) 𝑗 2𝜎𝑖, 𝑗 𝜎𝑖,𝑠 (1 − 𝜌𝑖, 𝑗,𝑠 ) (1 + and  𝜕2𝜓 1, 𝑗,𝑠 (𝛼 𝑗 − 𝛼 (𝑡) 𝑗 , 𝑗 |𝜃 ) (𝑡)  ∑︁ 𝑛  4𝑌𝑖,2𝑗 3 =− + × (𝑡) 2 (𝑡) 2 𝜕𝛼 𝑗 𝜕𝛼𝑇𝑗 𝜃=𝜃 (𝑡) 𝑖=1 𝜎𝑖, 𝑗 (1 − 𝜌𝑖, 𝑗,𝑠 ) 2𝜎𝑖,(𝑡)𝑗 𝜎𝑖,𝑠 (𝑡)  (𝑌 2 + 𝑌 2 ) (𝑌𝑖, 𝑗 + 𝑌𝑖,𝑠 ) 2  𝑖, 𝑗 𝑖,𝑠 + 𝑍𝑖 𝑍𝑖𝑇 . (𝑡) 2 (𝑡) (1 − 𝜌𝑖, 𝑗,𝑠 ) (1 + 𝜌𝑖, 𝑗,𝑠 ) Similarly,  𝜕2𝜓 1,𝑠, 𝑗 (𝛼 𝑗 − 𝛼 (𝑡) 𝑗 , 𝑗 |𝜃 ) (𝑡)  ∑︁ 𝑛  4𝑌𝑖,2𝑗 3 =− + × (𝑡) 2 (𝑡) 2 𝜕𝛼 𝑗 𝜕𝛼𝑇𝑗 𝜃=𝜃 (𝑡) 𝑖=1 𝜎𝑖, 𝑗 (1 − 𝜌𝑖,𝑠, 𝑗 ) 2𝜎𝑖,(𝑡)𝑗 𝜎𝑖,𝑠 (𝑡)  (𝑌 2 + 𝑌 2 ) (𝑌𝑖, 𝑗 + 𝑌𝑖,𝑠 ) 2  𝑖, 𝑗 𝑖,𝑠 + 𝑍𝑖 𝑍𝑖𝑇 . (𝑡) 2 (𝑡) (1 − 𝜌𝑖,𝑠, 𝑗 ) (1 + 𝜌𝑖,𝑠, 𝑗 ) Combining the above two expressions, we obtain  2 ∗ 𝜕 ℓ (𝜃|𝜃 (𝑡) )  ∑︁ ∑︁ 𝑛  4𝑌𝑖,2𝑗 3  (𝑌 2 + 𝑌 2 ) 𝑖, 𝑗 𝑖,𝑠 =− 2 2 + (𝑡) 2 (𝑡) (𝑡) 2𝜎𝑖,(𝑡)𝑗 𝜎𝑖,𝑠 (𝑡) 𝑇 𝑠:𝑠< 𝑗 𝑖=1 𝜎𝑖, 𝑗 (1 − 𝜌𝑖,𝑠, 𝑗 ) (1 − 𝜌𝑖,𝑠, 𝑗) 𝜕𝛼 𝑗 𝜕𝛼 𝑗 𝜃=𝜃 (𝑡) (𝑌𝑖, 𝑗 + 𝑌𝑖,𝑠 ) 2  ∑︁ ∑︁ 𝑛  4𝑌𝑖,2𝑗 𝑇 + 𝑍𝑖 𝑍𝑖 − (𝑡) (𝑡) 2 (𝑡) 2 (1 + 𝜌𝑖,𝑠, 𝑗 ) 𝑠:𝑠< 𝑗 𝑖=1 𝜎𝑖, 𝑗 (1 − 𝜌𝑖, 𝑗,𝑠 )  (𝑌 2 + 𝑌 2 ) (𝑌𝑖, 𝑗 + 𝑌𝑖,𝑠 ) 2  3 𝑖, 𝑗 𝑖,𝑠 + + 𝑍𝑖 𝑍𝑖𝑇 . (𝑡) (𝑡) (𝑡) 2 (𝑡) 2𝜎𝑖, 𝑗 𝜎𝑖,𝑠 (1 − 𝜌𝑖, 𝑗,𝑠 ) (1 + 𝜌𝑖, 𝑗,𝑠 ) 72 Next, observe that 𝜕ℓ ∗ (𝜃|𝜃 (𝑡) )/𝜕𝛿 𝑗,𝑘 = 𝜕𝑔2 (𝛿 𝑗,𝑘 |𝜃 (𝑡) )/𝜕𝛿 𝑗,𝑘 = 𝜕𝜓2, 𝑗,𝑘 (𝛿 𝑗,𝑘 |𝜃 (𝑡) )/𝜕𝛿 𝑗,𝑘 . Recall that, 𝑛  (𝑡) ∑︁ log(4) 𝜓2, 𝑗,𝑘 (𝜌 𝑗,𝑘 |𝜃 ) = − − 0.5𝛿𝑇𝑗,𝑘 𝑍𝑖 + log{exp(𝛿𝑇𝑗,𝑘 𝑍𝑖 ) + 1} 𝑖=1 2 2 2  {exp(𝛿𝑇𝑗,𝑘 𝑍𝑖 ) + 1}4 © 𝑌𝑖, 𝑗 𝑌𝑖,𝑘  (𝑡) 2 −­ + ® 1 − 𝜌𝑖, 𝑗,𝑘 × ª (𝑡) 2 (𝑡) 2 64 exp(2𝛿𝑇𝑗,𝑘 𝑍𝑖 ) « 𝜎 𝑖, 𝑗 𝜎 𝑖,𝑘 ¬  2 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘    (𝑡) 2 + 2 log 1 − 𝜌 𝑖, 𝑗,𝑘 − log(4) 2𝜎𝑖,(𝑡)𝑗 𝜎𝑖,𝑘 (𝑡) (1 − 𝜌𝑖,(𝑡)𝑗,𝑘 )  𝑇 𝑇 − 𝛿 𝑗,𝑘 𝑍𝑖 + 2log{exp(𝛿 𝑗,𝑘 𝑍𝑖 ) + 1}   2 2 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘   2 {1 + exp(𝛿𝑇𝑗,𝑘 𝑍𝑖 )}6 (𝑡) 2 − 1 − 𝜌𝑖, 𝑗,𝑘 × 6𝜎𝑖,(𝑡)𝑗 𝜎𝑖,𝑘(𝑡) 64 exp(3𝛿𝑇𝑗,𝑘 𝑍𝑖 )  2 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘   2 {1 + exp(𝛿𝑇𝑗,𝑘 𝑍𝑖 )}3 (𝑡) − 1 + 𝜌𝑖, 𝑗,𝑘 × 6𝜎𝑖,(𝑡)𝑗 𝜎𝑖,𝑘(𝑡) 8 exp(3𝛿𝑇𝑗,𝑘 𝑍𝑖 )   𝑌𝑖,2𝑗 + 𝑌𝑖,𝑘 2    (𝑡) + log 1 + 𝜌𝑖, 𝑗,𝑘 − log(2) 2𝜎𝑖,(𝑡)𝑗 𝜎𝑖,𝑘 (𝑡) (1 + 𝜌𝑖,(𝑡)𝑗,𝑘 )  𝑇 𝑇 − 𝛿 𝑗,𝑘 𝑍𝑖 + log{1 + exp(𝛿 𝑗,𝑘 𝑍𝑖 )} 73 Further simplifying, we get 𝑛  (𝑡) ∑︁ log(4) 𝜓2, 𝑗,𝑘 (𝜌 𝑗,𝑘 |𝜃 ) = − − 0.5𝛿𝑇𝑗,𝑘 𝑍𝑖 + log{exp(𝛿𝑇𝑗,𝑘 𝑍𝑖 ) + 1} 𝑖=1 2 2 2  4 © 𝑌𝑖, 𝑗 𝑌𝑖,𝑘 1   (𝑡) 2 𝑇 𝑇 −­ + ® 1 − 𝜌𝑖, 𝑗,𝑘 × exp(𝛿 𝑗,𝑘 𝑍𝑖 /2) + exp(−𝛿 𝑗,𝑘 𝑍𝑖 /2) ª (𝑡) 2 (𝑡) 2 64 𝜎 « 𝑖, 𝑗 𝜎 𝑖,𝑘 ¬  2 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘    (𝑡) 2 + 2 log 1 − 𝜌 𝑖, 𝑗,𝑘 − log(4) 2𝜎𝑖,(𝑡)𝑗 𝜎𝑖,𝑘 (𝑡) (1 − 𝜌𝑖,(𝑡)𝑗,𝑘 )  𝑇 𝑇 − 𝛿 𝑗,𝑘 𝑍𝑖 + 2log{exp(𝛿 𝑗,𝑘 𝑍𝑖 ) + 1}   2 2 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘  6 2 1  (𝑡) 𝑇 𝑇 − 1 − 𝜌𝑖, 𝑗,𝑘 × exp(𝛿 𝑗,𝑘 𝑍𝑖 /2) + exp(−𝛿 𝑗,𝑘 𝑍𝑖 /2) 6𝜎𝑖,(𝑡)𝑗 𝜎𝑖,𝑘 (𝑡) 64  2 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘  2 1  3 (𝑡) 𝑇 − 1 + 𝜌𝑖, 𝑗,𝑘 × 1 + exp(−𝛿 𝑗,𝑘 𝑍𝑖 ) 6𝜎𝑖,(𝑡)𝑗 𝜎𝑖,𝑘 (𝑡) 8   𝑌𝑖,2𝑗 + 𝑌𝑖,𝑘 2    (0) + log 1 + 𝜌 𝑖, 𝑗,𝑘 − log(2) 2𝜎𝑖,(𝑡)𝑗 𝜎𝑖,𝑘 (𝑡) (1 + 𝜌𝑖,(𝑡)𝑗,𝑘 )  𝑇 𝑇 − 𝛿 𝑗,𝑘 𝑍𝑖 + log{1 + exp(𝛿 𝑗,𝑘 𝑍𝑖 )} . 
74 Now, 𝜕𝜓2, 𝑗,𝑘 (𝜌 𝑗,𝑘 |𝜃 (𝑡) ) 𝜕𝛿 𝑗,𝑘 𝑛 ∑︁  exp(𝛿𝑇𝑗,𝑘 𝑍𝑖 ) = −0.5 + 𝑖=1 1 + exp(𝛿𝑇𝑗,𝑘 𝑍𝑖 ) 2 2  3 © 𝑌𝑖, 𝑗 𝑌𝑖,𝑘 1   (𝑡) 2 𝑇 𝑇 −­ + ® 1 − 𝜌𝑖, 𝑗,𝑘 × exp(𝛿 𝑗,𝑘 𝑍𝑖 /2) + exp(−𝛿 𝑗,𝑘 𝑍𝑖 /2) ª (𝑡) 2 (𝑡) 2 32 𝜎 « 𝑖, 𝑗 𝜎 𝑖,𝑘 ¬  𝑇 𝑇 × exp(𝛿 𝑗,𝑘 𝑍𝑖 /2) − exp(−𝛿 𝑗,𝑘 𝑍𝑖 /2)  2 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘 ( 2 exp(𝛿𝑇𝑗,𝑘 𝑍𝑖 ) ) + 2 −1 + 2𝜎𝑖,(𝑡)𝑗 𝜎𝑖,𝑘 (𝑡) (1 − 𝜌𝑖,(𝑡)𝑗,𝑘 ) 1 + exp(𝛿𝑇𝑗,𝑘 𝑍𝑖 )   2 2 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘  5 2 3  (𝑡) 2 𝑇 𝑇 − 1 − 𝜌𝑖, 𝑗,𝑘 × exp(𝛿 𝑗,𝑘 𝑍𝑖 /2) + exp(−𝛿 𝑗,𝑘 𝑍𝑖 /2) 6𝜎𝑖,(𝑡)𝑗 𝜎𝑖,𝑘 (𝑡) 64   𝑇 𝑇 × exp(𝛿 𝑗,𝑘 𝑍𝑖 /2) − exp(−𝛿 𝑗,𝑘 𝑍𝑖 /2)  2 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘  2  2   3 − 1+ 𝜌𝑖,(𝑡)𝑗,𝑘 𝑇 × 1 + exp(−𝛿 𝑗,𝑘 𝑍𝑖 ) 1 − exp(−𝛿 𝑗,𝑘 𝑍𝑖 ) 𝑇 6𝜎𝑖,(𝑡)𝑗 𝜎𝑖,𝑘 (𝑡) 8   2 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘 2  exp(𝛿𝑇𝑗,𝑘 𝑍𝑖 )  + −1 + 𝑍𝑖 , 2𝜎𝑖,(𝑡)𝑗 𝜎𝑖,𝑘 (𝑡) (1 + 𝜌𝑖,(𝑡)𝑗,𝑘 ) 1 + exp(𝛿𝑇𝑗,𝑘 𝑍𝑖 ) 75 and  𝜕𝜓2, 𝑗,𝑘 (𝜌 𝑗,𝑘 |𝜃 (𝑡) )  ∑︁𝑛  exp(𝑍𝑖𝑇 𝛿 (𝑡) 𝑗,𝑘 ) = −0.5 + 𝜕𝛿 𝑗,𝑘 𝜃=𝜃 (𝑡) 𝑖=1 1 + exp(𝑍𝑖𝑇 𝛿 (𝑡) 𝑗,𝑘 ) 2 2  © 𝑌𝑖, 𝑗 𝑌𝑖,𝑘 1   (𝑡) 2 −­ + ® 1 − 𝜌𝑖, 𝑗,𝑘 × exp(𝑍𝑖𝑇 𝛿 (𝑡) 𝑗,𝑘 /2) ª (𝑡) 2 (𝑡) 2 32 𝜎 « 𝑖, 𝑗 𝜎 𝑖,𝑘 ¬ 3   𝑇 (𝑡) 𝑇 (𝑡) 𝑇 (𝑡) + exp(−𝑍𝑖 𝛿 𝑗,𝑘 /2) exp(𝑍𝑖 𝛿 𝑗,𝑘 /2) − exp(−𝑍𝑖 𝛿 𝑗,𝑘 /2)  2 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘 2 exp(𝑍𝑖𝑇 𝛿 (𝑡) 𝑗,𝑘 )         + 2 −1 + 2𝜎𝑖,(𝑡)𝑗 𝜎𝑖,𝑘 (𝑡) (1 − 𝜌𝑖,(𝑡)𝑗,𝑘 )   1 + exp(𝑍𝑖𝑇 𝛿 (𝑡) 𝑗,𝑘 )       𝑌𝑖,2𝑗 + 𝑌𝑖,𝑘 2  2 (𝑡) 2 − 1 − 𝜌𝑖, 𝑗,𝑘 6𝜎𝑖,(𝑡)𝑗 𝜎𝑖,𝑘(𝑡)  5 3 𝑇 (𝑡) 𝑇 (𝑡) × exp(𝑍𝑖 𝛿 𝑗,𝑘 /2) + exp(−𝑍𝑖 𝛿 𝑗,𝑘 /2) 64   𝑇 (𝑡) 𝑇 (𝑡) × exp(𝑍𝑖 𝛿 𝑗,𝑘 /2) − exp(−𝑍𝑖 𝛿 𝑗,𝑘 /2)  2 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘  2 − 1 + 𝜌𝑖,(𝑡)𝑗,𝑘 6𝜎𝑖,(𝑡)𝑗 𝜎𝑖,𝑘 (𝑡)  2   3 𝑇 (𝑡) 𝑇 (𝑡) × 1 + exp(−𝑍𝑖 𝛿 𝑗,𝑘 ) 1 − exp(−𝑍𝑖 𝛿 𝑗,𝑘 ) 8   2 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘 2  exp(𝑍𝑖𝑇 𝛿 (𝑡) 𝑗,𝑘 )  + −1 + 𝑍𝑖 . 2𝜎𝑖,(𝑡)𝑗 𝜎𝑖,𝑘 (𝑡) (1 + 𝜌𝑖,(𝑡)𝑗,𝑘 ) 1 + exp(𝑍𝑖𝑇 𝛿 (𝑡) 𝑗,𝑘 ) Simplifying further, we obtain 𝑛  𝜌 (𝑡) 𝑌𝑖,2𝑗 𝜌𝑖,(𝑡)𝑗,𝑘 2 𝜌 (𝑡) 𝜕ℓ ∗ (𝜃|𝜃 (𝑡) )   ∑︁ 𝑌𝑖,𝑘 𝑖, 𝑗,𝑘 𝑖, 𝑗,𝑘 = 2 − 2 2 − 2 2 𝜕𝛿 𝑗,𝑘 𝜃=𝜃 (𝑡) 𝑖=1 1 − 𝜌𝑖,(𝑡)𝑗,𝑘 𝜎 (𝑡) 𝑖, 𝑗 (1 − 𝜌𝑖,(𝑡)𝑗,𝑘 ) 2 𝜎 (𝑡) 𝑖,𝑘 (1 − 𝜌𝑖,(𝑡)𝑗,𝑘 ) 2 2𝑌𝑖, 𝑗 𝑌𝑖,𝑘 𝜌𝑖,(𝑡)𝑗,𝑘  (1 − 𝜌 (𝑡) 2 ) 𝑌𝑖, 𝑗 𝑌𝑖,𝑘 𝑖, 𝑗,𝑘 + 2 + 𝑍𝑖 . 𝜎𝑖,(𝑡)𝑗 𝜎𝑖,𝑘 (𝑡) (1 − 𝜌𝑖,(𝑡)𝑗,𝑘 ) 2 𝜎𝑖,(𝑡)𝑗 𝜎𝑖,𝑘 (𝑡) (1 + 𝜌𝑖,(𝑡)𝑗,𝑘 ) 2 2 76 Next, 𝜕 2 𝜓2, 𝑗,𝑘 (𝜌 𝑗,𝑘 |𝜃 (𝑡) ) ∑︁𝑛  exp(𝛿𝑇𝑗,𝑘 𝑍𝑖 ) = 𝜕𝛿 𝑗,𝑘 𝜕𝛿𝑇𝑗,𝑘 𝑖=1 {1 + exp(𝛿𝑇𝑗,𝑘 𝑍𝑖 )}2 2 2  © 𝑌𝑖, 𝑗 𝑌𝑖,𝑘 1   (𝑡) 2 −­ + ® 1 − 𝜌𝑖, 𝑗,𝑘 × 3{exp(𝛿𝑇𝑗,𝑘 𝑍𝑖 /2) + exp(−𝛿𝑇𝑗,𝑘 𝑍𝑖 /2)}2 ª (𝑡) 2 (𝑡) 2 64 « 𝜎𝑖, 𝑗 𝜎𝑖,𝑘 ¬  × {exp(𝛿𝑇𝑗,𝑘 𝑍𝑖 /2) − exp(−𝛿𝑇𝑗,𝑘 𝑍𝑖 /2)}2 + {exp(𝛿𝑇𝑗,𝑘 𝑍𝑖 /2) + exp(−𝛿𝑇𝑗,𝑘 𝑍𝑖 /2)}4  2 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘 2 exp(𝛿𝑇𝑗,𝑘 𝑍𝑖 ) + 2 × 2 2𝜎𝑖,(𝑡)𝑗 𝜎𝑖,𝑘(𝑡) (1 − 𝜌𝑖,(𝑡)𝑗,𝑘 ) {1 + exp(𝛿 𝑗,𝑘 𝑍𝑖 )} 𝑇   2 2 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘   2 3 (𝑡) 2 − (𝑡) (𝑡) 1 − 𝜌𝑖, 𝑗,𝑘 × 5{exp(𝛿𝑇𝑗,𝑘 𝑍𝑖 /2) + exp(−𝛿𝑇𝑗,𝑘 𝑍𝑖 /2)}4 6𝜎𝑖, 𝑗 𝜎𝑖,𝑘 128  × {exp(𝛿𝑇𝑗,𝑘 𝑍𝑖 /2) − exp(−𝛿𝑇𝑗,𝑘 𝑍𝑖 /2)}2 + {exp(𝛿𝑇𝑗,𝑘 𝑍𝑖 /2) + exp(−𝛿𝑇𝑗,𝑘 𝑍𝑖 /2)}6  2 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘  2  3 − (𝑡) (𝑡) 1+ 𝜌𝑖,(𝑡)𝑗,𝑘 × 2{1 + exp(−𝛿𝑇𝑗,𝑘 𝑍𝑖 )}{1 − exp(−𝛿𝑇𝑗,𝑘 𝑍𝑖 )}2 6𝜎𝑖, 𝑗 𝜎𝑖,𝑘 8  𝑇 𝑇 3 + {exp(𝛿 𝑗,𝑘 𝑍𝑖 ) + exp(−𝛿 𝑗,𝑘 𝑍𝑖 )}   2 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘2 exp(𝛿𝑇𝑗,𝑘 𝑍𝑖 )  + (𝑡) (𝑡) (𝑡) × 2 𝑍𝑖 𝑍𝑖𝑇 . 
2𝜎 𝜎 (1 + 𝜌 ) {1 + exp(𝛿 𝑗,𝑘 𝑍𝑖 )} 𝑇 𝑖, 𝑗 𝑖,𝑘 𝑖, 𝑗,𝑘 77 Subsequently,  𝜕 2 𝜓2, 𝑗,𝑘 (𝜌 𝑗,𝑘 |𝜃 (𝑡) )  ∑︁𝑛  exp(𝑍𝑖𝑇 𝛿 (𝑡) 𝑗,𝑘 ) = 𝜕𝛿 𝑗,𝑘 𝜕𝛿𝑇𝑗,𝑘 𝜃=𝜃 (𝑡) 𝑖=1 {1 + exp(𝑍𝑖𝑇 𝛿 (𝑡) 𝑗,𝑘 )} 2 2 2  © 𝑌𝑖, 𝑗 𝑌𝑖,𝑘 1   (𝑡) 2 −­ + ® 1 − 𝜌𝑖, 𝑗,𝑘 × 3{exp(𝑍𝑖𝑇 𝛿 (𝑡) 𝑗,𝑘 /2) ª (𝑡) 2 (𝑡) 2 64 𝜎 « 𝑖, 𝑗 𝜎 𝑖,𝑘 ¬ 𝑇 (𝑡) 𝑇 (𝑡) + exp(−𝑍𝑖𝑇 𝛿 (𝑡) 2 𝑗,𝑘 /2)} × {exp(𝑍𝑖 𝛿 𝑗,𝑘 /2) − exp(−𝑍𝑖 𝛿 𝑗,𝑘 /2)} 2  𝑇 (𝑡) 𝑇 (𝑡) 4 + {exp(𝑍𝑖 𝛿 𝑗,𝑘 /2) + exp(−𝑍𝑖 𝛿 𝑗,𝑘 /2)}  2 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘 2 exp(𝑍𝑖𝑇 𝛿 (𝑡) 𝑗,𝑘 ) + 2 × (𝑡) 2𝜎𝑖,(𝑡)𝑗 𝜎𝑖,𝑘 (𝑡) (1 − 𝜌𝑖,(𝑡)𝑗,𝑘 ) {1 + exp(𝑍𝑖𝑇 𝛿 𝑗,𝑘 )}2   𝑌𝑖,2𝑗 + 𝑌𝑖,𝑘 2 2   (𝑡) 2 3 − (𝑡) (𝑡) 1 − 𝜌𝑖, 𝑗,𝑘 × 5{exp(𝑍𝑖𝑇 𝛿 (𝑡) 𝑗,𝑘 /2) 6𝜎𝑖, 𝑗 𝜎𝑖,𝑘 128 + exp(−𝑍𝑖𝑇 𝛿 (𝑡) 4 𝑇 (𝑡) 𝑗,𝑘 /2)} × {exp(𝑍𝑖 𝛿 𝑗,𝑘 /2) − exp(−𝑍𝑖 𝛿 𝑗,𝑘 /2)} 𝑇 (𝑡) 2  𝑇 (𝑡) 𝑇 (𝑡) 6 + {exp(𝑍𝑖 𝛿 𝑗,𝑘 /2) + exp(−𝑍𝑖 𝛿 𝑗,𝑘 /2)}  2 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘  2  3 − 1+ 𝜌𝑖,(𝑡)𝑗,𝑘 × 2{1 + exp(−𝑍𝑖𝑇 𝛿 (𝑡) 𝑗,𝑘 )} 6𝜎𝑖,(𝑡)𝑗 𝜎𝑖,𝑘 (𝑡) 8 × {1 − exp(−𝑍𝑖𝑇 𝛿 (𝑡) 𝑗,𝑘 )} 2    2 2 𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘 + {exp(𝑍𝑖𝑇 𝛿 (𝑡) 𝑗,𝑘 ) + exp(−𝑍𝑖𝑇 𝛿 (𝑡) 𝑗,𝑘 )} 3 + 2𝜎𝑖,(𝑡)𝑗 𝜎𝑖,𝑘 (𝑡) (1 + 𝜌𝑖,(𝑡)𝑗,𝑘 ) exp(𝑍𝑖𝑇 𝛿 (𝑡) 𝑗,𝑘 )  × 𝑍𝑖 𝑍𝑖𝑇 . {1 + exp(𝑍𝑖𝑇 𝛿 (𝑡) 𝑗,𝑘 )} 2 78 Simplifying further, we obtain, 2 2  𝜕 2 ℓ ∗ (𝜃|𝜃 (𝑡) )  𝑛  1 + 𝜌 (𝑡) ∑︁ 𝑖, 𝑗,𝑘 © 𝑌𝑖, 𝑗 2 2 𝑌𝑖,𝑘 1 + 5𝜌𝑖,(𝑡)𝑗,𝑘 = −­ + ®× ª (𝑡) 2 2 (𝑡) 2 (𝑡) 2 2 𝜕𝛿 𝑗,𝑘 𝜕𝛿𝑇𝑗,𝑘 𝜃=𝜃 (𝑡) 𝑖=1 (1 − 𝜌𝑖, 𝑗,𝑘 ) 𝜎 « 𝑖, 𝑗 𝜎𝑖,𝑘 ¬ (1 − 𝜌𝑖,(𝑡)𝑗,𝑘 ) 3 1 2 (𝑡) 2 + 2 (𝑌 𝑖, 𝑗 + 𝑌 𝑖,𝑘 ) (1 + 𝜌 𝑖, 𝑗,𝑘 ) 𝜎𝑖,(𝑡)𝑗 𝜎𝑖,𝑘 (𝑡) (1 − 𝜌𝑖,(𝑡)𝑗,𝑘 ) 3  2 2 (𝑡) 2 −(𝑌𝑖, 𝑗 + 𝑌𝑖,𝑘 )(1 + 7𝜌𝑖, 𝑗,𝑘 ) 2 − 4(𝑌 + 𝑌 ) 2  (1 − 𝜌 (𝑡) 2 2 𝑌𝑖,2𝑗 + 𝑌𝑖,𝑘 𝑖, 𝑗 𝑖,𝑘 𝑖, 𝑗,𝑘 ) + 𝑍𝑖 𝑍𝑖𝑇 2𝜎𝑖,(𝑡)𝑗 𝜎𝑖,𝑘 (𝑡) (1 + 𝜌𝑖,(𝑡)𝑗,𝑘 ) 3 4 𝑛  𝜌 (𝑡) 𝑌𝑖,2𝑗 2 𝜌𝑖,(𝑡)𝑗,𝑘 ! ∑︁ 𝑖, 𝑗,𝑘 𝑌𝑖,𝑘 + − + 1 − 𝜌 (𝑡) 2 𝜎 (𝑡) 2 2 𝜎 (𝑡) 𝑖,𝑘 (1 − 𝜌𝑖,(𝑡)𝑗,𝑘 ) 2 2 𝑖=1 𝑖, 𝑗,𝑘 𝑖, 𝑗 (𝑡)  (𝑡) (𝑡) 2  2𝜌𝑖, 𝑗,𝑘  𝜌𝑖, 𝑗,𝑘 (1 − 𝜌𝑖, 𝑗,𝑘 )   𝑌𝑖, 𝑗 𝑌𝑖,𝑘  1    − (𝑡) (𝑡) + 𝑍𝑖 𝑍𝑖𝑇 . (𝑡) 2 2 (𝑡) 2  2 𝜎𝑖, 𝑗 𝜎𝑖,𝑘  (1 − 𝜌𝑖, 𝑗,𝑘 )  (1 + 𝜌𝑖, 𝑗,𝑘 )    4.4 Inference Let 𝜃 0 be the true parameter lies in an open subset of multidimensional Euclidean space. Assume 𝑋𝑖 𝑋𝑖⊤ has full Í𝑛 that all predictors are in a compact subset of multidimensional Euclidean space, 𝑖=1 rank, and other regularity conditions hold. Then following the standard asymptotic results [70], we √ obtain 𝑛( b 𝜃 − 𝜃 0 ) −→ 𝑁𝑛 𝜃 (0, G −1 ) in distribution, where 𝑁𝑛 𝜃 denotes the 𝑛𝜃 -variate multivariate normal distribution. The Godambe information G is G(𝜃 0 ) = H (𝜃 0 )J −1 (𝜃 0 )H 𝑇 (𝜃 0 ), where H (𝜃) = 𝐸 [−∇𝜃 U (𝜃; 𝐷)], J (𝜃) = var[U (𝜃; 𝐷)], where 𝐷 denotes the data from randomly chosen subject or experimental unit, and U (𝜃; 𝐷) denotes the score function corresponding to this randomly chosen subject. The information G(𝜃 0 ) is consistently estimated by H bJb−1 H b𝑇 , where 79 Hb = −(1/𝑛) Í𝑛 ∇𝜃 U (𝜃; 𝐷 𝑖 )| b, J b = (1/𝑛) Í𝑛 U (𝜃; 𝐷 𝑖 )U (𝜽; 𝐷 𝑖 )𝑇 | b, 𝑖=1 𝜃 𝑖=1 𝜃 𝜕 ∑︁ ∑︁ U (𝜃; 𝐷 𝑖 ) = log{ 𝑓 (𝑌𝑖, 𝑗 , 𝑌𝑖,𝑘 |𝑋𝑖 )} 𝜕𝜃 𝑗 𝑗 <𝑘  𝑞 𝜕 1 ∑︁ 1 ∑︁ ∑︁ = − (𝑞 − 1)log(𝜎𝑖,2 𝑗 ) − log(1 − 𝜌𝑖,2 𝑗,𝑘 ) 𝜕𝜃 2 𝑗=1 2 𝑗 <𝑘  𝑌2 2  1 ∑︁ ∑︁ 1 𝑖, 𝑗 2𝜌𝑖, 𝑗,𝑘 𝑌𝑖, 𝑗 𝑌𝑖,𝑘 𝑌𝑖,𝑘 − − + 2 , 2 𝑗 <𝑘 (1 − 𝜌𝑖,2 𝑗,𝑘 ) 𝜎𝑖,2 𝑗 𝜎𝑖, 𝑗 𝜎𝑖,𝑘 𝜎𝑖,𝑘 and detailed expressions of H and J are given below. Thus, the standard error of the 𝑟th component 𝜃 is the square root of the 𝑟th diagonal element of the inverse of H of b bJ b−1 H b𝑇 . This standard error helps compute the Wald confidence interval for the parameter and can also be used for hypothesis Í Í testing. Let ℓ𝑖, 𝑗,𝑘 (𝛼, 𝛿) = log{ 𝑓 (𝑌𝑖, 𝑗 , 𝑌𝑖,𝑘 |𝑋𝑖 )}, and ℓ𝑖 (𝛼, 𝛿) = 𝑗 𝑗 <𝑘 log{ 𝑓 (𝑌𝑖, 𝑗 , 𝑌𝑖,𝑘 |𝑋𝑖 )}. 
Then,  1 ℓ𝑖, 𝑗,𝑘 (𝛼, 𝛿) = − log(𝜎𝑖,2 𝑗 ) + log(𝜎𝑖,𝑘 2 ) + log(1 − 𝜌𝑖,2 𝑗,𝑘 ) 2  𝑌2 2  1 𝑖, 𝑗 2𝜌𝑖, 𝑗,𝑘 𝑌𝑖, 𝑗 𝑌𝑖,𝑘 𝑌𝑖,𝑘 + − + 2 (1 − 𝜌𝑖,2 𝑗,𝑘 ) 𝜎𝑖,2 𝑗 𝜎𝑖, 𝑗 𝜎𝑖,𝑘 𝜎𝑖,𝑘 and 𝑞−1 𝑞  1 ∑︁ ∑︁ ℓ𝑖 (𝛼, 𝛿) = − log(𝜎𝑖,2 𝑗 ) + log(𝜎𝑖,𝑘 2 ) + log(1 − 𝜌𝑖,2 𝑗,𝑘 ) 2 𝑗=1 𝑘= 𝑗+1  𝑌2 2  1 𝑖, 𝑗 2𝜌𝑖, 𝑗,𝑘 𝑌𝑖, 𝑗 𝑌𝑖,𝑘 𝑌𝑖,𝑘 + − + 2 (1 − 𝜌𝑖,2 𝑗,𝑘 ) 𝜎𝑖,2 𝑗 𝜎𝑖, 𝑗 𝜎𝑖,𝑘 𝜎𝑖,𝑘 𝑞 1 ∑︁ 1 ∑︁ ∑︁ =− (𝑞 − 1)log(𝜎𝑖,2 𝑗 ) − log(1 − 𝜌𝑖,2 𝑗,𝑘 ) 2 𝑗=1 2 𝑗 <𝑘  𝑌2 2  1 ∑︁ ∑︁ 1 𝑖, 𝑗 2𝜌𝑖, 𝑗,𝑘 𝑌𝑖, 𝑗 𝑌𝑖,𝑘 𝑌𝑖,𝑘 − − + 2 . 2 𝑗 <𝑘 (1 − 𝜌𝑖,2 𝑗,𝑘 ) 𝜎𝑖,2 𝑗 𝜎𝑖, 𝑗 𝜎𝑖,𝑘 𝜎𝑖,𝑘 80 We need to calculate the score functions U (𝜃; 𝐷 𝑖 ) = 𝜕ℓ𝑖 (𝛼, 𝛿)/𝜕𝜃. For this derivation we use 𝜕𝜎𝑖, 𝑗 /𝜕𝛼 𝑗 = 𝜎𝑖, 𝑗 𝑍𝑖 and 𝜕 𝜌𝑖, 𝑗,𝑘 /𝜕𝛿 𝑗,𝑘 = 0.5(1 − 𝜌𝑖,2 𝑗,𝑘 )𝑍𝑖 . For 𝑗 = 1, . . . , 𝑞, 𝜕ℓ𝑖 (𝛼, 𝛿) (𝑞 − 1) 𝜕𝜎𝑖, 𝑗 ∑︁𝑞 𝑌𝑖,2𝑗 𝜕𝜎𝑖, 𝑗 ∑︁𝑞 𝜌𝑖, 𝑗,𝑘 𝑌𝑖, 𝑗 𝑌𝑖,𝑘 𝜕𝜎𝑖, 𝑗 = − + − 𝜕𝛼 𝑗 𝜎𝑖, 𝑗 𝜕𝛼 𝑗 𝑘=1,𝑘≠ 𝑗 (1 − 𝜌𝑖,2 𝑗,𝑘 )𝜎𝑖,3 𝑗 𝜕𝛼 𝑗 𝑘=1,𝑘≠ 𝑗 (1 − 𝜌𝑖,2 𝑗,𝑘 )𝜎𝑖,𝑘 𝜎𝑖,2 𝑗 𝜕𝛼 𝑗 ∑︁𝑞 𝑌𝑖,2𝑗 ∑︁𝑞 𝜌𝑖, 𝑗,𝑘 𝑌𝑖, 𝑗 𝑌𝑖,𝑘 = −(𝑞 − 1)𝑍𝑖 + 𝑍𝑖 − 𝑍𝑖 , 𝑘=1,𝑘≠ 𝑗 (1 − 𝜌𝑖,2 𝑗,𝑘 )𝜎𝑖,2 𝑗 𝑘=1,𝑘≠ 𝑗 (1 − 𝜌𝑖,2 𝑗,𝑘 )𝜎𝑖,𝑘 𝜎𝑖, 𝑗  𝑌2 2  𝜕ℓ𝑖 (𝛼, 𝛿) 𝜌𝑖, 𝑗,𝑘 𝜕 𝜌𝑖, 𝑗,𝑘 𝑖, 𝑗 𝑌𝑖,𝑘 𝜌𝑖, 𝑗,𝑘 𝜕 𝜌𝑖, 𝑗,𝑘 = − + 𝜕𝛿 𝑗,𝑘 (1 − 𝜌𝑖,2 𝑗,𝑘 ) 𝜕𝛿 𝑗,𝑘 𝜎𝑖,2 𝑗 2 𝜎𝑖,𝑘 (1 − 𝜌𝑖,2 𝑗,𝑘 ) 2 𝜕𝛿 𝑗,𝑘  2𝜌𝑖,2 𝑗,𝑘 𝜕 𝜌𝑖, 𝑗,𝑘  𝑌𝑖, 𝑗 𝑌𝑖,𝑘 2 −1 𝜕 𝜌𝑖, 𝑗,𝑘 + (1 − 𝜌𝑖, 𝑗,𝑘 ) + 𝜎𝑖, 𝑗 𝜎𝑖,𝑘 𝜕𝛿 𝑗,𝑘 (1 − 𝜌𝑖,2 𝑗,𝑘 ) 2 𝜕𝛿 𝑗,𝑘 𝜌𝑖, 𝑗,𝑘  2 1 𝑌𝑖, 𝑗 𝑌𝑖,𝑘 2  𝜌𝑖, 𝑗,𝑘  𝑌𝑖, 𝑗 𝑌𝑖,𝑘 𝑍𝑖 𝜌𝑖,2 𝑗,𝑘  = 𝑍𝑖 − + 2 𝑍𝑖 + + 𝑍𝑖 , 2 2 𝜎𝑖,2 𝑗 𝜎𝑖,𝑘 (1 − 𝜌𝑖,2 𝑗,𝑘 ) 𝜎𝑖, 𝑗 𝜎𝑖,𝑘 2 (1 − 𝜌𝑖,2 𝑗,𝑘 ) for 𝑗 < 𝑘. To calculate the sensitivity matrix, we need to calculate the double derivatives of the above two expressions. That is for 𝑗 = 1, . . . , 𝑞 𝜕 2 ℓ𝑖 (𝛼, 𝛿) ∑︁ 𝑞  𝑌𝑖,2𝑗  𝜕𝜎𝑖, 𝑗 ∑︁𝑞  𝜌𝑖, 𝑗,𝑘 𝑌𝑖, 𝑗 𝑌𝑖,𝑘  𝜕𝜎𝑖, 𝑗 = −2 3 𝑍 𝑖 + 2 𝑍 𝑖 𝜕𝛼 𝑗 𝜕𝛼 𝑗 𝑇 𝑘=1,𝑘≠ 𝑗 (1 − 𝜌𝑖,2 𝑗,𝑘 )𝜎𝑖, 𝑗 𝜕𝛼 𝑗𝑇 𝑘=1,𝑘≠ 𝑗 𝜎𝑖, 𝑗 𝜎𝑖,𝑘 𝜕𝛼𝑇𝑗 ∑︁𝑞  𝑌𝑖,2𝑗  ∑︁𝑞  𝜌𝑖, 𝑗,𝑘 𝑌𝑖, 𝑗 𝑌𝑖,𝑘  𝑇 = −2 2 )𝜎 2 𝑍𝑖 𝑍𝑖 + 𝑍𝑖 𝑍𝑖𝑇 , 𝑘=1,𝑘≠ 𝑗 (1 − 𝜌 𝑖, 𝑗,𝑘 𝑖, 𝑗 𝑘=1,𝑘≠ 𝑗 𝜎 𝑖, 𝑗 𝜎𝑖,𝑘 𝜕 2 ℓ𝑖 (𝛼, 𝛿) 𝜌𝑖, 𝑗,𝑘 𝑌𝑖, 𝑗 𝑌𝑖,𝑘 = 2 𝑍𝑖 𝑍𝑖𝑇 , 𝜕𝛼 𝑗 𝜕𝛼 𝑘 𝑇 (1 − 𝜌𝑖, 𝑗,𝑘 )𝜎𝑖, 𝑗 𝜎𝑖,𝑘 and for 𝑗 < 𝑘 𝜕 2 ℓ𝑖 (𝛼, 𝛿)  𝑌𝑖,2𝑗 𝜌𝑖, 𝑗,𝑘 𝑌𝑖, 𝑗 𝑌𝑖,𝑘 1  𝜌𝑖,2 𝑗,𝑘  = 2 2 − + 2 𝑍𝑖 𝑍𝑖𝑇 , 𝜕𝛼 𝑗 𝜕𝛿𝑇𝑗,𝑘 𝜎𝑖, 𝑗 (1 − 𝜌𝑖, 𝑗,𝑘 ) 𝜎𝑖, 𝑗 𝜎𝑖,𝑘 2 1 − 𝜌𝑖, 𝑗,𝑘 𝜕 2 ℓ𝑖 (𝛼, 𝛿) (1 − 𝜌𝑖,2 𝑗,𝑘 )𝑍𝑖 𝑍𝑖𝑇 𝑍𝑖  𝑌𝑖,2𝑗 𝑌𝑖,𝑘 2  2𝜌𝑖,2 𝑗,𝑘 𝜕 𝜌𝑖, 𝑗,𝑘  2 −1 𝜕 𝜌𝑖, 𝑗,𝑘 = − + 2 (1 − 𝜌𝑖, 𝑗,𝑘 ) + 𝜕𝛿 𝑗,𝑘 𝜕𝛿𝑇𝑗,𝑘 4 2 𝜎𝑖,2 𝑗 𝜎𝑖,𝑘 𝜕𝛿𝑇𝑗,𝑘 (1 − 𝜌𝑖,2 𝑗,𝑘 ) 2 𝜕𝛿𝑇𝑗,𝑘  2𝜌 3  𝑌𝑖, 𝑗 𝑌𝑖,𝑘 𝑍𝑖 𝑖, 𝑗,𝑘 𝜕 𝜌𝑖, 𝑗,𝑘 2𝜌𝑖, 𝑗,𝑘 𝜕 𝜌𝑖, 𝑗,𝑘 + + 𝜎𝑖, 𝑗 𝜎𝑖,𝑘 (1 − 𝜌𝑖,2 𝑗,𝑘 ) 2 𝜕𝛿𝑇𝑗,𝑘 (1 − 𝜌𝑖,2 𝑗,𝑘 ) 𝜕𝛿𝑇𝑗,𝑘  (1 − 𝜌 2 )  2 2  𝜌𝑖,2 𝑗,𝑘  𝑖, 𝑗,𝑘 1 𝑌𝑖, 𝑗 𝑌𝑖,𝑘 1 = − + 2 + 4 2 𝜎𝑖,2 𝑗 𝜎𝑖,𝑘 2 (1 − 𝜌𝑖,2 𝑗,𝑘 )  𝜌3  𝑌𝑖, 𝑗 𝑌𝑖,𝑘 𝑖, 𝑗,𝑘 + + 𝜌𝑖, 𝑗,𝑘 𝑍𝑖 𝑍𝑖𝑇 . 𝜎𝑖, 𝑗 𝜎𝑖,𝑘 (1 − 𝜌𝑖,2 𝑗,𝑘 ) 81 4.5 Simulation studies 4.5.1 Simulation design Three different scenarios were considered. The number of phenotypes, 𝑞, was set to four for all scenarios. For scenarios 1 and 2, the number of predictors 𝑝 was set to 2, and the sample size 𝑛 was set to 261 and 600, respectively. For the third scenario, 𝑝 was set to 6 and 𝑛 to 500. All these numbers were chosen by closely following real datasets [11]. Each dataset contained information on 𝑋 and 𝑌 from 𝑛 independent units. The predictor 𝑋 had 𝑝 components for every unit, and each component was independently simulated from the Bernoulli(0.5) distribution. Next, 𝑌 was generated from 𝑁4 (0, Σ𝑖 ), where Σ𝑖 = Diag(𝜎𝑖,1 , . . . , 𝜎𝑖,𝑞 )𝑅𝑖 Diag(𝜎𝑖,1 , . . . , 𝜎𝑖,𝑞 ), where 𝑅𝑖 = ((𝜌𝑖, 𝑗,𝑘 )). The true values of the parameter 𝜃 are given in the simulation tables. 4.5.2 Method of analysis For each scenario 𝜌𝑖, 𝑗,𝑘 ’s and 𝜎𝑖, 𝑗 ’s were modelled according to Equations (4.1) and (4.2) with respectively. Under each scenario 500 datasets were generated. 
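The data-generating mechanism just described can be sketched in R as follows. The true α and δ values used here are placeholders (the actual true values are listed in the simulation tables), and the chosen values must keep each Σ_i positive definite so that the joint q-variate response can be drawn.

```r
# Sketch of the simulation design: Bernoulli(0.5) features, standard deviations
# from model (4.2), pairwise correlations from model (4.1), and Y_i ~ N_q(0, Sigma_i).
# alpha: q x (p+1) matrix; delta: choose(q, 2) x (p+1) matrix, rows ordered (1,2), (1,3), ..., (q-1,q).
library(MASS)  # mvrnorm()

simulate_dataset <- function(n, p, q, alpha, delta) {
  X <- matrix(rbinom(n * p, 1, 0.5), n, p)
  Z <- cbind(1, X)
  pairs <- t(combn(q, 2))                      # rows (1,2), (1,3), ..., (q-1,q)
  Y <- matrix(NA_real_, n, q)
  for (i in seq_len(n)) {
    sig <- exp(as.numeric(alpha %*% Z[i, ]))   # model (4.2)
    R <- diag(q)
    for (m in seq_len(nrow(pairs))) {
      rho <- 1 - 2 / (1 + exp(sum(delta[m, ] * Z[i, ])))   # model (4.1)
      R[pairs[m, 1], pairs[m, 2]] <- R[pairs[m, 2], pairs[m, 1]] <- rho
    }
    Sigma_i <- diag(sig) %*% R %*% diag(sig)
    Y[i, ] <- mvrnorm(1, mu = rep(0, q), Sigma = Sigma_i)   # requires Sigma_i to be positive definite
  }
  list(Y = Y, X = X)
}
```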
Each dataset was analyzed by two approaches, 1) the proposed MM algorithm, and 2) the direct method where parameter estimates were obtained by directly maximizing the log-composite likelihood function. Under approach 2), we used the optim function of R and chose to optimize using the L-BFGS-B, a variant of the Broyden–Fletcher–Goldfarb–Shanno algorithm. For both the approaches, the initial values for 𝜃 parameters were randomly generated from Normal(0, 0.10). Since our proposed method is an iterative optimization, we used the sum of the absolute relative difference between the parameter estimates in subsequent iterations to be less than 0.001 as the stopping criteria for the convergence. 4.5.3 Results Results for scenarios 1 and 2 are presented in Tables 4.1 and 4.2. Results for scenario 3 are presents in Tables 4.3 and 4.4. 82 Table 4.1: Results of the simulation study for scenario 1 with 𝑛 = 261, 𝑝 = 2, 𝑞 = 4. All entries of the table except for the true parameter values are multiplied by 100. Par: Parameter, SD: standard deviation, SE: standard error, CP: 95% coverage probability, RMSE: root mean squared error Par True Bias SD SE CP RMSE Par True Bias SD SE CP RMSE 𝛼1,0 −1.9 −0.9 8.0 7.4 93.8 8.0 𝛼3,0 −1.3 −0.1 7.8 7.5 93.2 7.8 𝛼1,1 −0.4 0.1 8.9 8.5 93.6 8.9 𝛼3,1 −0.2 0.1 8.6 8.6 96.2 8.6 𝛼1,2 0.3 −0.1 9.1 8.5 93.0 9.1 𝛼3,2 0 −0.9 9.1 8.6 94.0 9.1 𝛼2,0 −1.7 0.3 7.8 7.4 92.4 7.8 𝛼4,0 −1.4 −0.2 7.9 7.4 92.2 7.9 𝛼2,1 −0.4 −1.0 8.8 8.5 93.8 8.9 𝛼4,1 0 0 9.2 8.5 92.4 9.2 𝛼2,2 0 −0.6 8.9 8.5 93.6 8.9 𝛼4,2 0 −0.4 8.9 8.6 93.4 8.9 𝛿1,2,0 −0.7 −0.4 21.2 20.9 94.2 21.2 𝛿2,3,0 0 −0.1 22.3 21.2 93.6 22.3 𝛿1,2,1 −0.8 0.2 24.6 24.5 95.8 24.6 𝛿2,3,1 0 0.5 26.9 24.5 92.0 26.8 𝛿1,2,2 0 0.1 25.0 24.1 94.8 25.0 𝛿2,3,2 0 0.3 25.2 24.6 93.8 25.2 𝛿1,3,0 1.2 1.3 21.3 21.4 94.6 21.3 𝛿2,4,0 1.1 1.5 20.9 20.9 93.8 20.9 𝛿1,3,1 0 −0.9 24.9 24.6 94.6 24.9 𝛿2,4,1 0 −1.5 24.8 24.0 95.0 24.8 𝛿1,3,2 0 −0.1 25.5 24.6 94.4 25.5 𝛿2,4,2 −0.9 −0.2 24.9 24.5 95.6 24.9 𝛿1,4,0 0 −0.1 22.2 21.1 93.6 22.1 𝛿3,4,0 0 0.8 22.4 21.2 94.4 22.4 𝛿1,4,1 0 0.7 25.5 24.3 91.8 25.4 𝛿3,4,1 0 −1.3 26.1 24.6 93.0 26.1 𝛿1,4,2 0.6 0.1 24.9 24.6 95.0 24.9 𝛿3,4,2 0 −0.2 25.4 24.6 95.0 25.4 Table 4.2: Results of the simulation study for scenario 2 with 𝑛 = 600, 𝑝 = 2, 𝑞 = 4. All entries except for the true parameter values of the table are multiplied by 100. 
Par: Parameter, SD: standard deviation, SE: standard error, CP: 95% coverage probability, RMSE: root mean squared error Par True Bias SD SE CP RMSE Par True Bias SD SE CP RMSE 𝛼1,0 −1.0 −0.1 5.1 4.9 94.8 5.1 𝛼3,0 1.0 −0.6 5.1 4.9 94.8 5.1 𝛼1,1 1.0 0.1 5.7 5.6 94.2 5.7 𝛼3,1 0.3 0.4 5.4 5.7 97.0 5.4 𝛼1,2 0.2 0 5.7 5.7 94.6 5.7 𝛼3,2 0.1 0.3 5.7 5.7 95.0 5.7 𝛼2,0 2.0 −0.7 5.0 4.9 93.6 5.1 𝛼4,0 −1.0 −0.4 5.3 5.0 92.4 5.3 𝛼2,1 0.2 0.2 5.7 5.6 94.4 5.7 𝛼4,1 −0.5 0.2 5.4 5.7 96.4 5.4 𝛼2,2 −0.5 0.5 5.8 5.7 94.2 5.9 𝛼4,2 1.0 0 5.8 5.7 95.4 5.8 𝛿1,2,0 0.2 0.1 14.0 13.9 93.4 14.0 𝛿2,3,0 −0.1 0.4 13.7 14.0 94.4 13.7 𝛿1,2,1 0.5 −0.7 15.6 15.9 95.0 15.6 𝛿2,3,1 1.0 1.0 15.9 16.2 94.8 16.0 𝛿1,2,2 1.0 1.2 16.8 16.2 95.6 16.8 𝛿2,3,2 −0.2 −1.0 15.9 15.8 94.4 15.9 𝛿1,3,0 0.2 0.7 13.7 14.1 95.6 13.7 𝛿2,4,0 0.2 0.4 15.1 14.1 94.2 15.1 𝛿1,3,1 0.2 −1.1 16.3 16.2 94.2 16.3 𝛿2,4,1 0.5 −0.5 16.6 15.9 95.0 16.6 𝛿1,3,2 0.5 −0.7 16.4 16.2 94.8 16.4 𝛿2,4,2 −1.0 0 16.6 16.2 94.2 16.6 𝛿1,4,0 0.2 0.1 14.0 14.2 95.4 14.0 𝛿3,4,0 −0.1 0.1 14.2 14.1 94.2 14.2 𝛿1,4,1 0.2 −0.4 16.2 16.2 94.4 16.2 𝛿3,4,1 0.2 −0.1 16.5 16.2 95.6 16.5 𝛿1,4,2 −0.5 1.2 15.9 16.3 96.6 16.0 𝛿3,4,2 −0.2 0.7 16.4 16.2 94.8 16.4 83 Table 4.3: Results of 𝛼 parameters from the simulation study for scenario 3 with 𝑛 = 500, 𝑝 = 6, 𝑞 = 4. All entries except for the true parameter values of the table are multiplied by 100. Par: Parameter, SD: standard deviation, SE: standard error, CP: 95% coverage probability, RMSE: root mean squared error Par True Bias SD SE CP RMSE Par True Bias SD SE CP RMSE 𝛼1,0 −1.0 −0.8 8.7 8.0 92.4 8.8 𝛼3,0 1.0 −1.7 8.7 8.1 92.6 8.8 𝛼1,1 1.0 0.5 6.0 6.0 94.2 6.0 𝛼3,1 0.3 0.4 6.5 6.1 92.4 6.5 𝛼1,2 0.2 0.2 6.1 6.0 94.8 6.1 𝛼3,2 0.1 −0.3 6.5 6.1 93.8 6.5 𝛼1,3 −0.4 −0.4 6.2 6.0 95.2 6.2 𝛼3,3 0 0.3 6.4 6.1 93.2 6.4 𝛼1,4 0.3 −0.1 6.0 6.0 95.4 6.0 𝛼3,4 −0.2 0.1 6.6 6.1 92.6 6.6 𝛼1,5 0 0.5 6.2 6.0 93.4 6.2 𝛼3,5 0.1 0.5 6.5 6.1 93.8 6.5 𝛼1,6 −0.5 −0.1 6.5 6.0 93.8 6.5 𝛼3,6 −0.2 0.6 6.3 6.1 93.8 6.3 𝛼2,0 2.0 −0.4 8.4 8.1 93.4 8.4 𝛼4,0 −1.0 −0.7 8.4 8.0 92.8 8.4 𝛼2,1 0.2 −0.1 6.3 6.0 93.8 6.2 𝛼4,1 −0.5 −0.2 6.5 6.0 92.0 6.5 𝛼2,2 −0.5 0 6.2 6.0 94.6 6.2 𝛼4,2 0 0 6.6 6.1 91.2 6.6 𝛼2,3 −0.4 −0.3 6.7 6.2 92.8 6.7 𝛼4,3 0.3 −0.6 6.3 6.0 93.4 6.3 𝛼2,4 0 0 6.4 6.0 93.8 6.4 𝛼4,4 −0.2 0.3 6.2 6.0 93.6 6.2 𝛼2,5 0.3 −0.5 6.1 6.0 93.4 6.1 𝛼4,5 −0.2 0.1 6.1 6.0 95.0 6.1 𝛼2,6 0 −0.2 6.2 6.1 93.8 6.2 𝛼4,6 0.2 0.2 6.2 6.0 94.4 6.2 We present the bias, the standard deviation of the estimates (SD), the estimated standard error (SE), the empirical coverage probability of the 95% Wald ’s confidence intervals, and the root mean squared error (RMSE) of the estimates for the proposed MM algorithm. The second approach’s results are qualitatively similar to the MM algorithm. Hence they are not presented in the tables. The important take-way messages are 1) the biases of the parameters are negligible for different sample sizes and different 𝑝, 2) the SEs are very close to the SDs, indicating that the asymptotic standard deviation of the estimators is captured well by the SE, 3) the empirical coverage probabil- ities are pretty close to 0.95. All of these indicate that the method of estimation works well, and asymptotic properties of the estimator hold. The SD and SE decrease with the sample size (Tables 4.1 and 4.2 in the Supplementary Materials). Even for the scenario of a large number of parameters (Table 4.3, 4.4), the performance of the MM algorithm is extremely satisfactory. 
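For reference, the summaries reported in Tables 4.1–4.4 (bias, empirical SD, average SE, coverage of the 95% Wald intervals, and RMSE) can be computed from simulation output as in the sketch below; EST and SE are assumed to hold one row of parameter estimates and estimated standard errors per simulated dataset.

```r
# Sketch: summarize simulation output.  EST and SE are n_sim x n_theta matrices
# (one row per replicated dataset); truth is the vector of true parameter values.
summarize_sims <- function(EST, SE, truth, level = 0.95) {
  z    <- qnorm(1 - (1 - level) / 2)
  err  <- sweep(EST, 2, truth)                 # estimate minus truth
  bias <- colMeans(err)
  sd_  <- apply(EST, 2, sd)                    # empirical SD of the estimates
  se_  <- colMeans(SE)                         # average estimated standard error
  cp   <- colMeans(abs(err) <= z * SE)         # empirical coverage of Wald CIs
  rmse <- sqrt(colMeans(err^2))
  cbind(True = truth, Bias = bias, SD = sd_, SE = se_, CP = cp, RMSE = rmse)
}
```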
In general, the bias and SD (also RMSE) are considerably larger for the 𝛿 parameters than the 𝛼 parameters, indicating more uncertainties (less information) in the correlation parameters than the standard deviations. 84 Table 4.4: Results of 𝛿 parameters from the simulation study for scenario 3 with 𝑛 = 500, 𝑝 = 6, 𝑞 = 4. All entries except for the true parameter values of the table are multiplied by 100. Par: Parameter, SD: standard deviation, SE: standard error, CP: 95% coverage probability, RMSE: root mean squared error Par True Bias SD SE CP RMSE Par True Bias SD SE CP RMSE 𝛿1,2,0 0.2 1.2 23.5 23.1 94.8 23.5 𝛿2,3,0 0 1.1 24.5 23.5 92.8 24.5 𝛿1,2,1 0.5 −2.4 17.7 17.3 95.6 17.8 𝛿2,3,1 0 0.3 19.3 17.8 93.4 19.3 𝛿1,2,2 0 −4.4 17.2 17.2 94.6 17.8 𝛿2,3,2 −0.2 3.1 18.5 17.9 93.6 18.7 𝛿1,2,3 −0.7 3.3 18.3 17.4 94.4 18.6 𝛿2,3,3 0 −2.5 18.9 17.8 93.8 19.0 𝛿1,2,4 0.3 −2.7 16.9 17.2 95.2 17.1 𝛿2,3,4 0 1.7 17.0 17.8 95.0 17.1 𝛿1,2,5 0 1.8 17.6 17.2 93.0 17.7 𝛿2,3,5 0 −1.1 17.8 17.8 94.4 17.8 𝛿1,2,6 −0.8 3.0 18.0 17.5 93.4 18.2 𝛿2,3,6 0 −2.6 18.4 17.8 93.0 18.6 𝛿1,3,0 0.5 −3.4 25.0 23.4 93.0 25.2 𝛿2,4,0 0.3 1.7 23.3 23.0 94.4 23.4 𝛿1,3,1 0.2 −0.5 17.9 17.6 94.2 17.9 𝛿2,4,1 0 2.7 18.6 17.2 93.0 18.8 𝛿1,3,2 0.5 −1.4 17.7 17.7 95.0 17.8 𝛿2,4,2 −1.0 4.1 18.14 17.6 93.6 18.6 𝛿1,3,3 0 1.9 19.3 17.5 92.0 19.3 𝛿2,4,3 0.4 −4.5 18.0 17.2 93.2 18.5 𝛿1,3,4 0.2 −3.9 19.2 17.5 92.4 19.6 𝛿2,4,4 −0.3 1.4 17.0 17.2 94.4 17.1 𝛿1,3,5 −0.5 5.4 18.8 17.7 93.6 19.5 𝛿2,4,5 0.5 −1.4 17.6 17.2 93.6 17.7 𝛿1,3,6 0 1.0 18.4 17.5 93.8 18.4 𝛿2,4,6 0.3 −6.0 18.4 17.2 92.4 19.3 𝛿1,4,0 1.0 −1.4 26.8 23.5 90.8 26.8 𝛿3,4,0 −1.0 0.5 24.4 23.4 92.6 24.4 𝛿1,4,1 0.2 −2.1 19.6 17.7 92.0 19.7 𝛿3,4,1 0.2 1.3 18.3 17.6 95.2 18.3 𝛿1,4,2 −0.5 −0.1 18.1 17.8 94.8 18.1 𝛿3,4,2 −0.2 2.2 16.9 17.7 95.4 17.0 𝛿1,4,3 −0.2 0.1 18.9 17.7 93.8 18.9 𝛿3,4,3 −0.1 1.0 18.5 17.6 94.0 18.5 𝛿1,4,4 0.5 −2.8 18.3 17.8 94.2 18.5 𝛿3,4,4 0.2 2.9 17.8 17.7 95.2 18.0 𝛿1,4,5 0.1 4.6 18.5 17.7 92.6 19.1 𝛿3,4,5 0.6 −2.8 17.9 17.7 95.2 18.1 𝛿1,4,6 −0.1 −0.9 18.0 17.7 95.0 18.0 𝛿3,4,6 −0.1 −1.4 18.6 17.6 92.8 18.6 4.5.4 Computational advantage We have extended Scenario 3 from the simulation design by varying the number of predictor variables. Specifically, we set the number of phenotypes, 𝑞 to 4, and the sample size, 𝑛 to 500. We used four different values of the predictor variable, 𝑝= 2, 3, 4, 5. This resulted in the number of unknown parameters in our setting as 30, 40, 50, 60 respectively. The multivariate phenotype response, 𝑌 , and the design matrix, 𝑋 were generated exactly as Section 4.5. Under each scenario, we performed 100 simulations. Figure 4.1 shows the average computation time (in seconds) of the MM algorithm 85 1e+06 DOP CMPLE 8e+05 Computation time in seconds 6e+05 4e+05 2e+05 0e+00 30 40 50 60 Number of parameters Figure 4.1: Average computational time comparison between CMPLE and direct optimization method (DOP) for 100 simulations and direct optimization (DOP) via the optim function with the "L-BFGS-B" method. Both tech- niques were used to maximize the pairwise composite likelihood function. The numerical results seem to indicate that the DOP method has an exponential time complexity, and CMPLE has linear time complexity with respect to the number of parameters. Our method has a clear advantage in terms of computation time over the direct optimization method. The proposed method is at least four times faster than the direct method (see Table 4.5). All simulations were done on a 2.8Ghz Intel Xeon E5-1603 processor. 
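The timing comparison underlying Figure 4.1 and Table 4.5 can be reproduced with a simple harness such as the one below; fit_cmple_mm() and fit_direct_optim() are stand-in names for the MM-based and optim("L-BFGS-B")-based fitting routines, not functions defined in the text.

```r
# Illustrative timing harness; fit_cmple_mm() and fit_direct_optim() are placeholders.
time_fit <- function(dat, fit_fun) {
  as.numeric(system.time(fit_fun(dat$Y, dat$X))["elapsed"])
}
# Average elapsed time over a list of replicated datasets:
# mean(sapply(datasets, time_fit, fit_fun = fit_cmple_mm))
# mean(sapply(datasets, time_fit, fit_fun = fit_direct_optim))
```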
Table 4.5: Average computation time (in seconds) using the MM algorithm and direct optimization (DOP) via the optim function with the "L-BFGS-B" method for 100 simulations under different scenarios.

Simulation scenario      1        2        3
MM                       5021     16394    33576
DOP                      23782    56970    204036

4.6 Data Example

4.6.1 Background

We analyzed a population of cowpea (Vigna unguiculata (L.) Walp.) recombinant inbred lines (RILs), which has a high level of genetic diversity and significantly variable phenotypic responses to fluctuating environments. Previous studies have demonstrated strong genetic variation in photosynthetic responses in cowpea that co-regulates the light reactions of photosynthesis [3]. We were particularly interested in assessing the phenotype associations in terms of previously identified candidate genes under two environmental conditions: (1) CT, control temperature, 29°C/19°C (day/night), and (2) LT, low or suboptimal temperature (chilling stress), 19°C/13°C (day/night). The responses consisted of q = 4 phenotypes, namely (1) steady-state PS II quantum yield, φ_II, (2) non-photochemical quenching, NPQ_t, (3) Q_A redox state (PS II centers opened), q_L, and (4) thylakoid pmf (proton motive force), ECS_t. These phenotypes were measured using MultispeQ 2.0 hand-held instruments as described in [47]. For this experiment, n = 470 observations were used, which originated from a cross between a tolerant cultivar, California Blackeye 27 (CB27), bred by the University of California, Riverside, and a sensitive breeding line, 24-125B-1, developed by the Institute de Recherche Agricole pour le Développement (IRAD, Cameroon). Single nucleotide polymorphism (SNP) marker genotype data of CB27 × 24-125B-1 were based on EST sequences produced by [71]. Individuals of the RIL population are homozygous for each marker in the two parental lines, as indicated by the designations of either AA, having the allele from CB27 (tolerant, maternal line), or BB, having the allele from 24-125B-1 (sensitive, paternal line). To incorporate the markers in our analysis, we used dummy coding to transform them into binary (0, 1) features, where 0 (1) characterizes the AA (BB) allele at a given marker locus.

First, we performed individual QTL analysis on these four phenotypes using the Multiple QTL Mapping (MQM) model in the Rqtl package [72]. LOD thresholds were determined using a permutation analysis implemented with the mqmpermutation and mqmscan functions, with the number of permutations set at 1000 and a nominal significance cutoff of p < 0.05. Results from the QTL analysis are presented in Figure 4.2. We found two candidate loci under QTL peaks at chromosome 4 (59.64 cM) and chromosome 9 (86.93 cM) that are the common significant SNPs under both conditions. These loci were also predicted by pseudomolecules through BLAST in early-release genomes in Phytozome, and they are annotated by Pfam, Panther, EuKaryotic Orthologous Groups (KOG), Kyoto Encyclopedia of Genes and Genomes (KO), Gene Ontology (GO), and best-hit Arabidopsis genes. For the subsequent analysis, we used these q = 4 phenotypes with p = 3 predictors (two candidate loci and one environmental variable).
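The per-phenotype QTL step just described can be sketched with the R/qtl MQM functions named above; exact argument names may differ by package version, and cowpea.cross is a placeholder for a cross object loaded beforehand (e.g., with read.cross()), so this is a sketch rather than the analysis script.

```r
# Hedged sketch of the MQM scans and permutation-based LOD thresholds;
# "cowpea.cross" is a placeholder cross object, and argument details may
# vary with the installed version of the qtl package.
library(qtl)

pheno_cols <- 1:4   # phi_II, NPQ_t, q_L, ECS_t for one temperature condition
for (ph in pheno_cols) {
  scan_ph <- mqmscan(cowpea.cross, pheno.col = ph)
  perm_ph <- mqmpermutation(cowpea.cross, scanfunction = mqmscan,
                            pheno.col = ph, n.perm = 1000)
  lod_thr <- summary(mqmprocesspermutation(perm_ph), alpha = 0.05)
  print(lod_thr)            # 5% genome-wide LOD threshold
  print(summary(scan_ph))   # peaks to compare against the threshold
}
```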
Figure 4.2: QTL plots (LOD score versus chromosome) of the phenotypes used in the cowpea RIL data (φ_II, NPQ_t, q_L, and ECS_t under CT and LT). The LOD threshold for each phenotype is marked by the bold horizontal line; QTL with a LOD higher than the threshold can be considered significant. Chromosomes are marked by the vertical lines.

4.6.2 Method of analyses

We fit the following model to σ_{φ_II}, the standard deviation of phenotype φ_II, in terms of the predictors:

σ_{φ_II} = exp(α_{1,0} + α_{1,1} Marker 1 + α_{1,2} Marker 2 + α_{1,3} Environment).

A similar model was fit to the standard deviations of the other phenotypes. Simultaneously, we fit the following model to the pairwise correlation between phenotypes φ_II and NPQ_t:

ρ_{φ_II & NPQ_t} = 1 − 2/{1 + exp(δ_{1,2,0} + δ_{1,2,1} Marker 1 + δ_{1,2,2} Marker 2 + δ_{1,2,3} Environment)}.

Similarly, the remaining five pairwise correlations, ρ_{φ_II & q_L}, ρ_{φ_II & ECS_t}, ρ_{NPQ_t & q_L}, ρ_{NPQ_t & ECS_t}, and ρ_{q_L & ECS_t}, were modeled in terms of the predictors. Before applying the MM algorithm, we subtracted the respective means from the four phenotypes so that each has mean zero. For the models described above, there were a total of 40 parameters, including the α parameters and the δ parameters. Note that θ = (α^⊤, δ^⊤)^⊤. We set the initial value of θ to random numbers generated from Normal(0, 0.15), and used the sum of the absolute relative differences between subsequent iterations being less than ε_0 = 0.001 as the stopping criterion for the iterative algorithm (4.5).

4.6.3 Interpretation

The results of our analyses are placed in Tables 4.6 and 4.7. Specifically, Table 4.6 contains the estimates and 95% CIs of the α parameters involved in the standard deviation modelling, whereas Table 4.7 corresponds to the δ parameters involved in the pairwise correlation modelling.

Table 4.6: Parameter estimates and the 95% confidence interval in parentheses of the parameters of the standard deviation model for the measured phenotypes from the cowpea dataset.

               φ_II                   NPQ_t                  q_L                    ECS_t
Intercept      −2.61 (−2.73, −2.49)   −0.37 (−0.51, −0.23)   −2.17 (−2.29, −2.05)   −3.18 (−3.43, −2.93)
Marker 1       −0.04 (−0.16, 0.08)    −0.43 (−0.61, −0.25)   0.03 (−0.13, 0.19)     0.02 (−0.16, 0.20)
Marker 2       0.08 (−0.04, 0.20)     0.45 (0.29, 0.61)      −0.07 (−0.21, 0.07)    −0.24 (−0.42, −0.06)
Environment    0.19 (0.07, 0.31)      0.45 (0.29, 0.61)      −0.14 (−0.30, 0.02)    0.16 (0.01, 0.32)

Table 4.7: Parameter estimates and 95% confidence interval in parentheses of the parameters of pairwise correlation among the measured phenotypes from the cowpea dataset.

                   Intercept              Marker 1               Marker 2               Environment
φ_II & NPQ_t       −1.35 (−1.72, −0.98)   −0.45 (−0.82, −0.08)   0.12 (−0.25, 0.49)     −0.71 (−1.10, −0.32)
φ_II & q_L         1.82 (1.49, 2.15)      0.29 (−0.02, 0.60)     −0.32 (−0.63, −0.01)   −0.30 (−0.63, 0.03)
φ_II & ECS_t       0.16 (−0.23, 0.55)     −0.65 (−1.02, −0.28)   0.38 (−0.01, 0.77)     0.17 (−0.18, 0.52)
NPQ_t & q_L        0.35 (−0.12, 0.82)     −0.54 (−0.95, −0.13)   0.23 (−0.18, 0.64)     −0.42 (−0.75, −0.09)
NPQ_t & ECS_t      0.82 (0.45, 1.19)      0.13 (−0.24, 0.50)     −0.14 (−0.55, 0.27)    −1.05 (−1.44, −0.66)
q_L & ECS_t        1.01 (0.58, 1.44)      −0.52 (−0.95, −0.09)   −0.01 (−0.42, 0.40)    −0.47 (−0.88, −0.06)

We made several key observations from our analysis.
We made several key observations from our analysis.

• The α parameters measure how the standard deviation of an individual phenotype changes with the candidate loci or the environmental factor. The α_j parameters (j indexes the phenotype) can be viewed as the conditional effect of each predictor variable on the j-th phenotype. For example, the conditional effect of Marker 1 on the standard deviation of NPQt was estimated to be −0.43. This means that, in our population, a change from allele AA to allele BB at Marker 1 decreases the conditional standard deviation of NPQt by 35% while all other predictors remain unchanged. Likewise, a change from allele AA to allele BB at Marker 2 increases the conditional standard deviation of NPQt by 57% while all other predictors remain unchanged. Similarly, if the temperature changes from control (CT) to low (LT), the conditional standard deviation of NPQt increases by 57% while all other predictors remain unchanged. These are the most noteworthy changes in the standard deviations of the phenotypes. The standard deviations of φII and ECSt appear to be affected by the environment, and by Marker 2 and the environment, respectively.

Similarly, Table 4.7 collects the δ_{j,k} estimates, which can be used to calculate the conditional effect of each predictor on the correlation between the (j, k) phenotype pair. For example, the estimated regression parameter of Marker 1 on the pairwise correlation of φII and NPQt was −0.45. This means that, in our population, if the Marker 1 allele changes from AA to BB, the conditional pairwise correlation between φII and NPQt decreases by 0.13 (using Equation 4.1) when Marker 2 is at allele AA and the environment is set at the control condition. Quantifying how the correlations change with the predictors has profound significance in photosynthetic experiments, as it indicates a change in the biological processes that the plant adopts. For example, the estimated regression parameter of the Environment variable on the correlation of NPQt and ECSt was −1.05. This indicates that, in our population, if the Environment variable changes from control temperature (CT) to low temperature (LT), the conditional pairwise correlation between NPQt and ECSt decreases by 0.50 (using Equation 4.1) when both markers are at allele AA.

• The intercept terms, after the appropriate transformation, represent the baseline conditional standard deviation and the baseline pairwise correlation among phenotypes when both markers are fixed at allele AA and the environment variable is fixed at the control temperature. For example, in Table 4.6, the intercept of −2.61 under the column φII implies that the standard deviation of the phenotype φII is exp(−2.61) = 0.07 when all the predictors are at their baseline. Likewise, in Table 4.7, the estimated intercept of −1.35 under the column φII & NPQt implies that the estimated correlation between the phenotypes φII and NPQt is 1 − 2/{1 + exp(−1.35)} = −0.59 when all the predictors are at their baseline. Consequently, the 95% CI of the intercept, (−1.72, −0.98), implies that the 95% CI of the correlation between φII and NPQt is (−0.70, −0.45) when all the predictors are at their baseline.

• We have also estimated the average marginal effects, with the corresponding 95% confidence intervals, of the predictors on the correlations between phenotypes in Table 4.8.
For example, the average marginal effect of Marker 1 on the correlation between φII and ECSt is estimated to be −0.32 (95% CI: −0.50, −0.13) when the marker allele changes from AA to BB. Likewise, the average marginal effect of the environment variable on the correlation between NPQt and ECSt is estimated to be −0.51 (95% CI: −0.69, −0.32).

Table 4.8: Average marginal effect estimates and 95% confidence intervals (in parentheses) for the pairwise correlations between phenotypes with respect to the predictors.

              Marker 1                Marker 2                Environment
φII & NPQt    −0.11 (−0.27, 0.04)      0.03 (−0.06, 0.12)     −0.16 (−0.25, −0.07)
φII & qL       0.08 (−0.07, 0.23)     −0.08 (−0.17, −0.01)    −0.08 (−0.17, 0.01)
φII & ECSt    −0.32 (−0.50, −0.13)     0.18 (0.02, 0.37)       0.08 (−0.11, 0.27)
NPQt & qL     −0.26 (−0.48, −0.05)     0.11 (−0.09, 0.31)     −0.20 (−0.41, −0.01)
NPQt & ECSt    0.06 (−0.11, 0.23)     −0.06 (−0.23, 0.11)     −0.50 (−0.69, −0.32)
qL & ECSt     −0.24 (−0.40, −0.08)     0.01 (−0.20, 0.19)     −0.22 (−0.41, −0.03)

• The signs of the coefficients in Tables 4.6 and 4.7 indicate the direction in which the conditional (or marginal) standard deviations and correlations of the phenotypes change with respect to the predictors. Different directionalities can be biologically explained as different regulatory pathways within the photosynthetic system. For example, the estimated regression parameter for Marker 1 on the correlation between φII and qL is 0.29, whereas the regression parameter for Marker 2 on the same correlation is −0.32. This indicates two different relationships between φII and qL induced by Marker 1 and Marker 2, respectively.

Table 4.9: Pairwise correlation estimates and 95% confidence intervals (in parentheses) of the measured phenotypes for all genetic combinations of Marker 1 and Marker 2 from the cowpea dataset under the control temperature.

              AAAA                    AABB                    BBAA                    BBBB
φII & NPQt    −0.59 (−0.70, −0.45)    −0.55 (−0.68, −0.37)    −0.72 (−0.78, −0.64)    −0.69 (−0.78, −0.57)
φII & qL       0.72 (0.63, 0.79)       0.64 (0.53, 0.72)       0.78 (0.73, 0.83)       0.71 (0.64, 0.78)
φII & ECSt     0.08 (−0.11, 0.27)      0.27 (0.08, 0.43)      −0.24 (−0.38, −0.09)    −0.06 (−0.26, 0.15)
NPQt & qL      0.17 (−0.06, 0.39)      0.28 (0.13, 0.42)      −0.10 (−0.23, 0.04)      0.02 (−0.19, 0.23)
NPQt & ECSt    0.39 (0.22, 0.54)       0.33 (0.11, 0.52)       0.44 (0.34, 0.54)       0.38 (0.18, 0.55)
qL & ECSt      0.47 (0.29, 0.62)       0.46 (0.33, 0.58)       0.24 (0.04, 0.42)       0.23 (0.01, 0.43)

Table 4.10: Pairwise correlation estimates and 95% confidence intervals (in parentheses) of the measured phenotypes for all genetic combinations of Marker 1 and Marker 2 from the cowpea dataset under the low temperature.

              AAAA                    AABB                    BBAA                    BBBB
φII & NPQt    −0.77 (−0.84, −0.68)    −0.75 (−0.81, −0.67)    −0.85 (−0.90, −0.78)    −0.83 (−0.88, −0.77)
φII & qL       0.64 (0.53, 0.73)       0.54 (0.40, 0.65)       0.72 (0.63, 0.79)       0.63 (0.51, 0.73)
φII & ECSt     0.16 (−0.04, 0.36)      0.34 (0.18, 0.49)      −0.16 (−0.31, 0.03)      0.03 (−0.16, 0.21)
NPQt & qL     −0.04 (−0.25, 0.18)      0.03 (−0.12, 0.18)     −0.30 (−0.42, −0.16)    −0.19 (−0.38, −0.02)
NPQt & ECSt   −0.19 (−0.30, −0.09)    −0.18 (−0.31, −0.05)    −0.05 (−0.24, 0.15)     −0.03 (−0.25, 0.19)
qL & ECSt      0.26 (0.04, 0.46)       0.26 (0.04, 0.45)       0.01 (−0.15, 0.16)      0.01 (−0.23, 0.23)

• Using the results presented in Tables 4.6 and 4.7, we estimated the correlations among the different pairs of phenotypes, together with their 95% confidence intervals, for all possible combinations of the genetic variants and environmental conditions (see Tables 4.9 and 4.10; a sketch of how such quantities follow from the fitted coefficients is given below). This resulted in eight possible combinations and revealed biologically relevant patterns among the phenotypes.
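Both the configuration-specific correlations in Tables 4.9 and 4.10 and the average marginal effects in Table 4.8 follow from plugging predictor settings into the correlation link. The sketch below illustrates an approximate average marginal effect of Marker 1; the helper function and the toy design dat are ours (illustrative only), so the result will differ somewhat from Table 4.8, which averages over the observed RIL data.

    corr_link <- function(eta) 1 - 2 / (1 + exp(eta))

    # Approximate average marginal effect of Marker 1 on one pairwise correlation:
    # average, over the predictor settings in `dat`, of the change in the fitted
    # correlation when Marker 1 is switched from AA (0) to BB (1).
    ame_marker1 <- function(delta, dat) {
      X_aa <- cbind(1, 0, dat$Marker2, dat$Environment)   # Marker 1 forced to AA
      X_bb <- cbind(1, 1, dat$Marker2, dat$Environment)   # Marker 1 forced to BB
      mean(corr_link(X_bb %*% delta) - corr_link(X_aa %*% delta))
    }

    delta_phi_ecst <- c(0.16, -0.65, 0.38, 0.17)           # Table 4.7, phi_II & ECSt row
    dat <- expand.grid(Marker2 = 0:1, Environment = 0:1)   # toy design, not the RIL data
    ame_marker1(delta_phi_ecst, dat)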
For the row corresponding to NPQt & ECSt, a positive association was found under the control temperature (CT), whereas a negative association was predominant under the low temperature (LT). Under the control temperature, the genetic variations on chromosomes 4 and 9 modulated photochemistry mainly through the qE mechanism, while under the low temperature they modulated photochemistry predominantly through the qI mechanism. Also, under LT, we found that the combinations AAAA and AABB produced negative correlations between NPQt and ECSt, while the combinations BBAA and BBBB resulted in uncorrelated NPQt and ECSt. This suggests that the genetic variations on chromosome 4 are more likely to modulate the qI mechanism.

To illustrate further, we looked into the estimated correlations between NPQt and qL for the different groups under the low temperature. For the AAAA and AABB combinations, NPQt and qL were uncorrelated, but for the BBAA and BBBB combinations, NPQt and qL were negatively correlated. This can be explained as follows. If the downstream processes are blocked, electrons accumulate in QA, so that QA becomes more reduced (qL goes down), which increases qI. As qI builds up, the slope of the relationship between NPQt and qL gradually increases to a point where the negative association between NPQt and qL breaks down; for example, under low temperature, within AAAA and AABB we found no association between NPQt and qL.

4.7 Discussion

Analyzing the high-dimensional, voluminous datasets generated by high-throughput phenotyping and genome sequencing is of paramount interest for adaptive plant breeding. However, as the process involves complex interactions among multiple traits, genotypes, and environmental variables, suitable statistical models and efficient computational techniques are required to identify the relevant mechanisms. Therefore, we have developed the CMPLE workflow (presented in Figure B.1) to bridge the gap in phenotype-genotype-environment association studies by exploiting the correlation structure among phenotypes as a function of genetic and environmental variables. This is an important step toward solving the different applications arising from the integration of multi-omics datasets.

Standard quantitative genomics experiments aim to determine which genetic variations contribute to individual phenotypes. In contrast, interactions among various phenotypes signify pleiotropy, i.e., markers having multi-trait effects. Our method, CMPLE, is possibly the first tool in the quantitative genetics literature that explains pleiotropy by incorporating the pairwise correlations of multiple traits. The proposed methodology helps recover pertinent information regarding the different regulatory pathways associated with genetic variations.

With our experimental data on photosynthesis, we have put forward a possible hypothesis that genetic variations alone are not responsible for photodamage. Instead, they condition the photosynthetic system to respond differently, favoring photoprotection or photodamage. For the given population, we identified a trade-off in the photosynthetic machinery between these two processes, regulated through combinations of genetic and environmental predictors. We parse out the genetic marker effects of specific SNPs on chromosomes 4 and 9, which under the control temperature favor photoprotective mechanisms, whereas under the low temperature they are more consistent with modulation of qI.
Also, under low-temperature conditions, where the photosynthetic machinery favors the qI mechanism, we have identified that the correlations between NPQt and qL change with specific genetic configurations, e.g., uncorrelated NPQt and qL (for the combinations AAAA and AABB) and negatively correlated NPQt and qL (for the combinations BBAA and BBBB). This provides evidence for a subsequent hypothesis that the gradient between NPQt and qL can further modulate the qI mechanism under low temperatures. This can be interpreted as one example reflecting epistasis, where multiple genetic components interact in complex ways to further modulate regulatory pathways inside a system.

CHAPTER 5

IMPACT OF THIS DISSERTATION

In this thesis, we have proposed novel statistical machine learning methodology and computationally efficient algorithms, tools, and techniques to detect genetic markers for multivariate phenotypes and to estimate network structure from high-dimensional genomics datasets in multidisciplinary research. Through the analysis of "massive" datasets consisting of multiple phenotypes and many genetic markers, we were able to reveal new insights into how genetic diversity may have tuned biological processes to enhance fitness under diverse conditions. The thesis tackled numerous applications from the perspective of plant physiology and genetics as a whole.

In a nutshell, we have explored model-based clustering tools to identify environmental conditions affecting different phenotypes and assessed their interactions to reveal a new limiting behavior that plants adapt to in the real world. We provided a comparison of different statistical tools for genome-enabled analysis. Next, we implemented Bayesian latent factor analysis to discover and test possible mechanistic bases of such variations by assessing co-segregation (or lack thereof) between genetic diversity and multiple traits. We found that these latent factors, under appropriate conditions, represent the physical modes of interaction among phenotypes, which led to the identification of quantitative trait loci (QTLs), i.e., genetic polymorphisms altering the co-regulatory network among phenotypes.

A significant conclusion from our work is that standard QTL mapping on individual traits fails to address the associations between multivariate phenotypes. One should model the interactions/correlations among phenotypes through genetic markers to affirm meaningful biological mechanisms. To this end, we proposed modeling the correlations among multiple complex phenotypes as a function of genetic and environmental explanatory variables (weighted graph estimation through a correlation regression model). We have developed a state-of-the-art estimation methodology called Correlation Modeling under Pairwise Likelihood Estimation (CMPLE), aided by a novel Minorize-Maximize (MM) algorithm, and provided a technique for statistical inference.

In plant breeding, a key aspect is to evaluate the genetic merit of candidate markers for artificial selection and to predict the expected yield for phototrophs. Using CMPLE, we can provide genome-enabled predictions of the correlations between multiple traits. In practice, plant breeders can use our tool to screen plants and detect the participation of distinct response mechanisms in different species, under diverse environments, and at different developmental stages.
Further, it can guide the breeding of varieties with improved responses to other environmental conditions, most notably for application to climate-resilient agriculture.

We want to stress that mean regression modeling/analysis is not a substitute for correlation modeling/analysis; the two capture different aspects of association in the phenotype-genotype space. Accordingly, irrespective of whether we work with the residual responses (residuals obtained after regressing the phenotypes on the candidate genes) or the mean-zero responses (obtained after subtracting the crude means from the respective phenotypes), the results of the correlation analysis remain largely unchanged. This work represents a significant advance in modeling pairwise correlations and standard deviations in terms of predictor variables. The modeling is also accompanied by a novel estimation technique that speeds up the optimization problem involving many parameters. Besides the methodological development, we have shown that joint inference on the standard deviations and correlations among phenotypes can be used to test co-segregation of genetically resolved associations between different traits and improves the precision of the phenotype network structure (Figures B.2 and B.3).

The approach can be extended to different applications. The focus of this work was purely on the modeling of the correlations and standard deviations. This can be relaxed by modeling both the mean and the variance-covariance structure and developing problem-specific MM algorithms and minorizing functions. The current proof-of-concept approach was developed for a moderate number of predictors. Generally, regularized estimation is recommended when there are many predictors, and creating a statistical method for a regularized analysis of the correlations will be an exciting topic of future research. Another possible way of extending our work is through the simultaneous selection (of genetic predictors) and estimation of the pairwise correlations in the context of high-dimensional datasets.

In a nutshell, this dissertation has argued for the uniqueness of photosynthetic mechanisms under abiotic stress (heat and cold temperatures). Next, we have demonstrated new genetic controls by incorporating the interactions between biological traits. Finally, we have offered a novel statistical methodology and a computationally efficient algorithm, CMPLE, for multi-omics platforms. Taken together, these can be used for creating climate-adaptive plants for the betterment of mankind.

This work was supported by the DOE Office of Science, Basic Energy Sciences under Awards DE-FG02-91ER20021 and DE-SC0007101, and NSF-DMS 1945824 and 1924724.

APPENDICES

APPENDIX A

SUPPLEMENT FOR PHENOME-BY-GENOME-BY-ENVIRONMENT INTERACTIONS AND THE SCOPE OF DATA SCIENCE

A.1 Clustering on the Light-potential experiment

The experiment examines light-induced changes in chlorophyll fluorescence and absorbance at ambient photosynthetically active radiation (PAR), following 10 s of PAR equivalent to full sunlight, and following 10 s of darkness, yielding estimates of the rapid light potentials of linear electron flow (LEF), non-photochemical quenching (NPQ), and related processes (Figure A.1).

Figure A.1: Light and temperature effects on LEF and photosystem II quantum efficiency (φII). Each parameter is plotted as a function of the square root of the ambient photosynthetically active radiation (PARamb, X-axis) and leaf temperature (Tleaf, coloration of points).
(a) Dependencies of LEF measured at PARamb; (b) LEF measured at 10 s of high light (LEF_high); (c) the high-light-induced differences in LEF (LEF_high−amb); (d) the PSII quantum efficiencies measured under ambient PAR (Phi2_amb, points coloured by Tleaf) and at 10 s of high light (Phi2_high, grey points).

As shown in Figure A.2, GMM analysis of LEFamb, PARamb, and Tleaf found six distinct, compact clusters that differed in the mode of interaction among the photosynthetic and environmental parameters. Clusters encompassing points with lower PARamb showed moderate (Cluster 5) to strong (Clusters 1, 2, and 4) dependence of LEFamb on PARamb, with little contribution from Tleaf.

Figure A.2: Gaussian Mixture Model (GMM) clustering of LEFamb (Panel A) and correlation matrices between LEFamb, PARamb, and leaf temperature (Tleaf) for each cluster (Panel B).

By contrast, two clusters (3 and 6), which included points at higher PARamb, showed substantial dependencies on both PARamb and Tleaf. These results are consistent with LEF being predominantly light-limited at low ambient PAR but increasingly limited by temperature-dependent processes at higher PAR. These two cluster classes indicate that PARamb and Tleaf are likely to affect LEFamb in independent ways. The fact that the shapes of the clusters were determined not by slicing along the individual parameters PARamb and Tleaf, but by a co-dependence on both, suggests that, under some conditions, these effects interact; e.g., Tleaf may affect the dependence of LEFamb on PARamb.

GMM identified five distinct clusters for the interactions among LEFhigh, PARamb, and Tleaf (Figure A.3). In contrast to the results on LEFamb, clusters at lower PARamb (1, 2, and 4) showed LEFhigh dependencies on both Tleaf and PARamb, while Cluster 3 showed correlations with Tleaf but not with PARamb. The stronger dependence of LEFhigh on Tleaf compared to LEFamb implies that exposure to high light revealed additional rate limitations in LEFhigh that were more strongly controlled by both Tleaf and PARamb and that, at least under some conditions, these effects were independent of each other.

Figure A.3: Gaussian Mixture Model (GMM) clustering of LEFhigh (Panel A) and correlation matrices between LEFhigh, PARamb, and leaf temperature (Tleaf) for each cluster (Panel B).

APPENDIX B

SUPPLEMENT FOR CMPLE TO DECODE PHOTOSYNTHESIS USING THE MINORIZE-MAXIMIZE ALGORITHM

B.1 CMPLE workflow

[Figure B.1 flowchart: phenotypes, gene markers, and environmental conditions → Std(Pheno) = f(Gene, Env) and Corr(Pheno) = f(Gene, Env) → pairwise composite likelihood → optimization via the MM algorithm → maximum composite-likelihood estimate → statistical inference → phenotype network.]

Figure B.1: Correlation Modeling under Pairwise Likelihood Estimation (CMPLE) workflow.

B.2 Additional Simulation

We have also performed an additional simulation with n = 1000, p = 10, and q = 4. The total number of parameters estimated here is 110.

Table B.1: Simulation results for the α parameters under scenario 4 with n = 1000, p = 10, q = 4. All entries except the true parameter values are multiplied by 100.
Par: Parameter, SD: standard deviation, DOP: direct optimization, MM: minorize-maximize DOP MM DOP MM Par True Bias SD Bias SD Par True Bias SD Bias SD 𝛼1,0 −1.0 −0.2 9.2 −0.3 9.2 𝛼3,0 0.0 −0.3 9.8 −0.3 9.8 𝛼1,1 1.0 0.6 6.0 0.6 6.0 𝛼3,1 0.3 0.8 6.2 0.8 6.2 𝛼1,2 0.2 0.6 6.5 0.7 6.5 𝛼3,2 0.1 −0.2 7.3 −0.2 7.3 𝛼1,3 −0.4 0.3 6.5 0.3 6.5 𝛼3,3 0.0 0.4 6.3 0.4 6.3 𝛼1,4 0.3 −0.5 7.2 −0.5 7.2 𝛼3,4 −0.2 −0.2 6.4 −0.1 6.4 𝛼1,5 0.0 0.0 6.6 0.0 6.6 𝛼3,5 0.1 0.2 5.8 0.2 5.8 𝛼1,6 −0.5 0.5 5.4 0.6 5.4 𝛼3,6 −0.2 −0.6 6.6 −0.6 6.6 𝛼1,7 −0.3 0.5 7.0 0.5 6.9 𝛼3,7 1.0 0.3 6.7 0.3 6.7 𝛼1,8 0.1 −0.3 6.3 −0.2 6.3 𝛼3,8 −1.0 0.4 5.5 0.4 5.5 𝛼1,9 −0.4 0.6 6.5 0.6 6.5 𝛼3,9 −0.1 1.0 6.1 1.0 6.1 𝛼1,10 1.0 0.3 6.7 0.3 6.7 𝛼3,10 0.2 0.1 6.9 0.2 6.9 𝛼2,0 2.0 1.6 9.1 1.6 9.1 𝛼4,0 −1.0 0.0 9.1 0.0 9.1 𝛼2,1 0.2 −0.7 5.4 −0.7 5.4 𝛼4,1 −0.5 1.3 5.7 1.2 5.7 𝛼2,2 −0.5 0.0 7.4 0.1 7.4 𝛼4,2 0.0 0.3 6.0 0.3 5.9 𝛼2,3 −0.4 −0.3 6.3 −0.3 6.3 𝛼4,3 0.3 1.2 6.0 1.2 6.0 𝛼2,4 0.0 −0.9 5.9 −0.9 5.9 𝛼4,4 −0.2 0.1 7.4 0.1 7.4 𝛼2,5 0.3 −0.4 5.9 −0.4 5.9 𝛼4,5 −0.2 0.2 6.4 0.2 6.4 𝛼2,6 0.0 0.0 6.1 0.0 6.1 𝛼4,6 0.2 0.0 5.7 0.0 5.7 𝛼2,7 0.3 0.4 5.8 0.4 5.8 𝛼4,7 0.0 −0.2 6.8 −0.1 6.8 𝛼2,8 0.2 0.6 6.1 0.6 6.1 𝛼4,8 0.0 0.8 5.9 0.8 5.8 𝛼2,9 −0.4 −0.2 6.3 −0.2 6.3 𝛼4,9 0.6 −0.7 6.2 −0.7 6.3 𝛼2,10 0.0 0.2 5.5 0.2 5.6 𝛼4,10 −0.5 −0.5 6.6 −0.5 6.6 103 Table B.2: Simulation results for 𝛿 parameters under scenario 4 with 𝑛 = 1000, 𝑝 = 10, 𝑞 = 4. All entries except for the true parameter values of the table are multiplied by 100. Par: Parameter, SD: standard deviation, DOP: direct optimization, MM: minorize-maximize DOP MM DOP MM Par True Bias SD Bias SD Par True Bias SD Bias SD 𝛿1,2,0 0.2 0.1 30.3 −0.2 29.9 𝛿2,3,0 0.0 0.8 30.6 0.6 30.5 𝛿1,2,1 0.5 0.4 19.0 0.5 19.0 𝛿2,3,1 0.0 −4.6 19.3 −4.5 19.4 𝛿1,2,2 0.0 2.2 18.8 2.3 18.7 𝛿2,3,2 −0.2 −0.1 19.3 −0.1 19.3 𝛿1,2,3 −0.7 −2.3 18.2 −2.2 18.2 𝛿2,3,3 0.0 −0.2 19.4 −0.2 19.5 𝛿1,2,4 0.3 1.1 17.5 1.2 17.5 𝛿2,3,4 0.0 −0.1 18.8 −0.1 18.9 𝛿1,2,5 0.0 −3.4 17.3 −3.4 17.5 𝛿2,3,5 0.0 −3.6 17.7 −3.6 17.7 𝛿1,2,6 −0.8 0.3 20.8 0.3 20.9 𝛿2,3,6 0.0 5.3 19.9 5.4 19.9 𝛿1,2,7 0.1 5.2 19.7 5.2 19.7 𝛿2,3,7 −0.3 2.8 19.3 2.6 19.5 𝛿1,2,8 0.0 −0.4 15.5 −0.4 15.5 𝛿2,3,8 0.2 −3.8 16.3 −3.6 16.4 𝛿1,2,9 0.1 −1.5 19.9 −1.5 20.0 𝛿2,3,9 0.0 0.1 20.3 0.2 20.3 𝛿1,2,10 0.2 −1.1 20.7 −0.9 20.7 𝛿2,3,10 0.0 −1.0 20.1 −0.9 20.1 𝛿1,3,0 0.5 1.8 30.3 1.4 29.9 𝛿2,4,0 0.3 0.5 30.6 0.4 30.8 𝛿1,3,1 0.2 −0.5 20.9 −0.4 21.0 𝛿2,4,1 0.0 −1.8 19.5 −1.9 19.6 𝛿1,3,2 0.5 5.1 17.7 5.2 17.7 𝛿2,4,2 1.0 −2.9 19.9 −3.0 19.7 𝛿1,3,3 0.0 −4.5 19.3 −4.3 19.2 𝛿2,4,3 0.4 0.2 18.0 0.3 17.8 𝛿1,3,4 0.2 1.9 21.0 2.1 21.0 𝛿2,4,4 −0.3 −1.5 15.6 −1.6 15.7 𝛿1,3,5 −0.6 −5.9 19.1 −5.8 19.2 𝛿2,4,5 0.5 2.9 17.6 3.0 17.5 𝛿1,3,6 0.0 −3.4 17.8 −3.3 17.7 𝛿2,4,6 0.3 0.5 19.9 0.6 20.1 𝛿1,3,7 −0.3 0.0 19.2 0.1 19.2 𝛿2,4,7 0.1 1.9 18.4 1.9 18.4 𝛿1,3,8 0.2 −3.0 18.8 −2.9 18.8 𝛿2,4,8 −0.6 −2.1 17.6 −1.9 17.5 𝛿1,3,9 0.4 2.1 18.4 2.3 18.2 𝛿2,4,9 0.6 −1.6 18.4 −1.5 18.3 𝛿1,3,10 −0.2 2.0 18.2 2.0 18.2 𝛿2,4,10 −0.1 1.0 17.3 0.8 17.1 𝛿1,4,0 1.0 1.2 30.3 1.1 29.9 𝛿3,4,0 −1.0 2.7 30.4 3.2 30.0 𝛿1,4,1 0.2 0.0 18.9 −0.1 18.9 𝛿3,4,1 0.2 −4.9 19.4 −5.1 19.1 𝛿1,4,2 −0.5 4.5 21.1 4.3 21.0 𝛿3,4,2 −0.2 0.1 19.4 −0.2 19.5 𝛿1,4,3 −0.2 −0.4 20.2 −0.3 20.0 𝛿3,4,3 −0.1 0.0 18.2 −0.1 18.2 𝛿1,4,4 0.5 3.1 16.5 3.3 16.7 𝛿3,4,4 0.2 −3.1 18.0 −3.2 18.0 𝛿1,4,5 0.1 −0.5 19.0 −0.5 19.2 𝛿3,4,5 0.6 1.1 19.4 1.0 19.4 𝛿1,4,6 −0.1 −0.4 20.8 −0.4 21.1 𝛿3,4,6 −0.1 0.1 21.6 0.1 21.5 𝛿1,4,7 −0.1 −1.2 20.5 −1.1 20.5 𝛿3,4,7 0.0 1.3 19.9 0.9 19.4 𝛿1,4,8 0.0 1.5 17.9 1.5 17.9 𝛿3,4,8 0.2 −1.0 20.3 −0.8 20.0 𝛿1,4,9 −0.1 −3.5 18.6 −3.5 18.6 
𝛿3,4,9 0.3 −0.9 20.3 −0.9 20.1 𝛿1,4,10 −0.2 0.3 19.1 0.4 19.1 𝛿3,4,10 −0.2 1.6 19.0 1.4 18.8

B.3 CMPLE application on Heat Stress treatments

Using CMPLE on the DHS dataset, based on the two selected SNP markers on chromosome 2, we found distinguishable correlation patterns among the selected set of phenotypes.

Figure B.2: Correlation network under DHS

Again, for the genetic configuration BBAA of the two selected SNPs under DHS and LHS, we can identify distinct phenotypic networks under the various conditions specified.

Figure B.3: Correlation network under DHS and LHS

BIBLIOGRAPHY

[1] R. E. Blankenship, Molecular Mechanisms of Photosynthesis. John Wiley & Sons, 2021.
[2] D. M. Kramer, G. Johnson, O. Kiirats, and G. E. Edwards, “New fluorescence parameters for the determination of QA redox state and excitation energy fluxes,” Photosynthesis Research, vol. 79, no. 2, pp. 209–218, 2004.
[3] T. J. Avenson, J. A. Cruz, A. Kanazawa, and D. M. Kramer, “Regulating the proton budget of higher plant photosynthesis,” Proc. Natl. Acad. Sci. USA, vol. 102, pp. 9709–9713, Jul 2005.
[4] A. Kanazawa, A. Chattopadhyay, S. Kuhlgert, H. Tuitupou, T. Maiti, and D. M. Kramer, “Light potentials of photosynthetic energy storage in the field: what limits the ability to use or dissipate rapidly increased light energy?,” R. Soc. Open Sci., vol. 8, p. 211102, 2021.
[5] N. Keren and A. Krieger-Liszkay, “Photoinhibition: molecular mechanisms and physiological significance,” Physiol Plant., vol. 142, no. 1, pp. 1–5, 2011.
[6] E. Tyystjärvi, “Photoinhibition of photosystem II,” Int Rev Cell Mol Biol., vol. 300, pp. 243–303, 2013.
[7] B. Demmig-Adams, “Carotenoids and photoprotection in plants: a role for the xanthophyll zeaxanthin,” Biochim Biophys Acta., vol. 1020, no. 1, pp. 1–24, 1990.
[8] K. K. Niyogi, O. Björkman, and A. R. Grossman, “The roles of specific xanthophylls in photoprotection,” Proc. Natl. Acad. Sci. USA, vol. 94, no. 25, pp. 14162–14167, 1997.
[9] S. Tietz, C. C. Hall, J. A. Cruz, and D. M. Kramer, “NPQ(T): a chlorophyll fluorescence parameter for rapid estimation and imaging of non-photochemical quenching of excitons in photosystem-II-associated antenna complexes,” Plant Cell Environ., vol. 40, pp. 1243–1255, 2017.
[10] N. R. Baker, J. Harbinson, and D. M. Kramer, “Determining the limitations and regulation of photosynthetic energy transduction in leaves,” Plant, Cell & Environment, vol. 30, no. 9, pp. 1107–1125, 2007.
[11] D. Hoh, I. Osei-Bonsu, A. Chattopadhyay, et al., “Genetic variation in photosynthetic responses to chilling modulates proton motive force, cyclic electron flow and photosystem II photoinhibition,” Authorea, https://doi.org/10.22541/au.163422290.08126533/v1, October 14, 2021.
[12] J. A. Raven, “The cost of photoinhibition,” Physiol Plant., vol. 142, no. 1, pp. 87–104, 2011.
[13] D. M. Kramer, J. A. Cruz, and A. Kanazawa, “Balancing the central roles of the thylakoid proton gradient,” Trends Plant Sci., vol. 8, no. 1, pp. 27–32, 2003.
[14] T. H. Meuwissen, B. J. Hayes, and M. Goddard, “Prediction of total genetic value using genome-wide dense marker maps,” Genetics, vol. 157, no. 4, pp. 1819–1829, 2001.
[15] J. Crossa, G. d. l. Campos, P. Pérez, D. Gianola, J. Burgueno, J. L. Araus, D. Makumbi, R. P. Singh, S. Dreisigacker, J. Yan, et al., “Prediction of genetic values of quantitative traits in plant breeding using pedigree and molecular markers,” Genetics, vol. 186, no. 2, pp. 713–724, 2010.
[16] P. VanRaden, C. Van Tassell, G. Wiggans, T. Sonstegard, R. Schnabel, J. Taylor, and F. Schenkel, “Invited review: Reliability of genomic predictions for North American Holstein bulls,” Journal of Dairy Science, vol. 92, no. 1, pp. 16–24, 2009.
[17] P. M. Visscher, J. Yang, and M. E. Goddard, “A commentary on ‘Common SNPs explain a large proportion of the heritability for human height’ by Yang et al. (2010),” Twin Research and Human Genetics, vol. 13, no. 6, pp. 517–524, 2010.
[18] G. de Los Campos, J. M. Hickey, R. Pong-Wong, H. D. Daetwyler, and M. P. Calus, “Whole-genome regression and prediction methods applied to plant and animal breeding,” Genetics, vol. 193, no. 2, pp. 327–345, 2013.
[19] R. Bellman, “On the approximation of curves by line segments using dynamic programming,” Communications of the ACM, vol. 4, no. 6, p. 284, 1961.
[20] J. Fan and J. Lv, “Sure independence screening for ultrahigh dimensional feature space (with discussion),” Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 70, no. 5, pp. 849–911, 2008.
[21] G. Wu, Y. Xie, H. Chen, M. Zhong, R. Liu, B. Shi, Q. Li, X. Wang, T. Wu, Y. Yan, et al., “Superconductivity at 56 K in samarium-doped SrFeAsF,” Journal of Physics: Condensed Matter, vol. 21, no. 14, p. 142203, 2009.
[22] I. Lemhadri, F. Ruan, and R. Tibshirani, “LassoNet: Neural networks with feature sparsity,” in International Conference on Artificial Intelligence and Statistics, pp. 10–18, PMLR, 2021.
[23] A. E. Hoerl and R. W. Kennard, “Ridge regression: Biased estimation for nonorthogonal problems,” Technometrics, vol. 12, no. 1, pp. 55–67, 1970.
[24] T. Park and G. Casella, “The Bayesian lasso,” Journal of the American Statistical Association, vol. 103, no. 482, pp. 681–686, 2008.
[25] D. Habier, R. L. Fernando, K. Kizilkaya, and D. J. Garrick, “Extension of the Bayesian alphabet for genomic selection,” BMC Bioinformatics, vol. 12, no. 1, pp. 1–12, 2011.
[26] D. Gianola, S. Van Petegem, M. Legros, S. Brandstetter, H. Van Swygenhoven, and K. Hemker, “Stress-assisted discontinuous grain growth and its effect on the deformation behavior of nanocrystalline aluminum thin films,” Acta Materialia, vol. 54, no. 8, pp. 2253–2263, 2006.
[27] L. Janss, G. de Los Campos, N. Sheehan, and D. Sorensen, “Inferences from genomic models in stratified populations,” Genetics, vol. 192, no. 2, pp. 693–704, 2012.
[28] P. Pérez and G. de Los Campos, “Genome-wide regression and prediction with the BGLR statistical package,” Genetics, vol. 198, no. 2, pp. 483–495, 2014.
[29] D. D. Lee and H. S. Seung, “Learning the parts of objects by non-negative matrix factorization,” Nature, vol. 401, no. 6755, pp. 788–791, 1999.
[30] J. Vendrow, J. Haddock, E. Rebrova, and D. Needell, “On a guided nonnegative matrix factorization,” in ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3265–32369, IEEE, 2021.
[31] H. H. Harman, Modern Factor Analysis. University of Chicago Press, 1976.
[32] R. J. Rummel, “Understanding factor analysis,” Journal of Conflict Resolution, vol. 11, no. 4, pp. 444–480, 1967.
[33] D. R. Hardoon, S. Szedmak, and J. Shawe-Taylor, “Canonical correlation analysis: An overview with application to learning methods,” Neural Computation, vol. 16, no. 12, pp. 2639–2664, 2004.
[34] G. Andrew, R. Arora, J. Bilmes, and K. Livescu, “Deep canonical correlation analysis,” in International Conference on Machine Learning, pp. 1247–1255, PMLR, 2013.
[35] K. J. Han, S. Kim, and S. S. Narayanan, “Strategies to improve the robustness of agglomerative hierarchical clustering under data source variation for speaker diarization,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 16, no. 8, pp. 1590–1601, 2008.
[36] V. Y. Kiselev, K. Kirschner, M. T. Schaub, T. Andrews, A. Yiu, T. Chandra, K. N. Natarajan, W. Reik, M. Barahona, A. R. Green, et al., “SC3: consensus clustering of single-cell RNA-seq data,” Nature Methods, vol. 14, no. 5, pp. 483–486, 2017.
[37] R. Shen, Q. Mo, N. Schultz, V. E. Seshan, A. B. Olshen, J. Huse, M. Ladanyi, and C. Sander, “Integrative subtype discovery in glioblastoma using iCluster,” PLoS ONE, vol. 7, no. 4, p. e35236, 2012.
[38] F. R. Bach and M. I. Jordan, “Learning spectral clustering, with application to speech separation,” The Journal of Machine Learning Research, vol. 7, pp. 1963–2001, 2006.
[39] S. Virtanen, A. Klami, and S. Kaski, “Bayesian CCA via group sparsity,” in ICML, 2011.
[40] A. Klami, G. Bouchard, and A. Tripathi, “Group-sparse embeddings in collective matrix factorization,” arXiv preprint arXiv:1312.5921, 2013.
[41] S. Virtanen, A. Klami, S. Khan, and S. Kaski, “Bayesian group factor analysis,” in Artificial Intelligence and Statistics, pp. 1269–1277, PMLR, 2012.
[42] A. Klami, “Polya-gamma augmentations for factor models,” in Asian Conference on Machine Learning, pp. 112–128, PMLR, 2015.
[43] X. Zhuang, Z. Yang, K. R. Sreenivasan, V. R. Mishra, T. Curran, R. Nandy, and D. Cordes, “Multivariate group-level analysis for task fMRI data with canonical correlation analysis,” NeuroImage, vol. 194, pp. 25–41, 2019.
[44] C. M. Bishop and N. M. Nasrabadi, Pattern Recognition and Machine Learning, vol. 4. Springer, 2006.
[45] G. R. Cramer, K. Urano, S. Delrot, M. Pezzotti, and K. Shinozaki, “Effects of abiotic stress on plants: a systems biology perspective,” BMC Plant Biol, vol. 11, no. 1, pp. 1–14, 2011.
[46] J. A. Cruz, L. J. Savage, R. Zegarac, C. C. Hall, M. Satoh-Cruz, G. A. Davis, W. K. Kovac, J. Chen, and D. M. Kramer, “Dynamic environmental photosynthetic imaging reveals emergent phenotypes,” Cell Syst., vol. 2, no. 6, pp. 365–377, 2016.
[47] S. Kuhlgert, G. Austic, R. Zegarac, I. Osei-Bonsu, D. Hoh, M. I. Chilvers, M. G. Roth, K. Bi, D. TerAvest, P. Weebadde, and D. M. Kramer, “MultispeQ Beta: a tool for large-scale plant phenotyping connected to the open PhotosynQ network,” R. Soc. Open Sci., vol. 3, p. 160592, 2016.
[48] M. Ritchie, E. Holzinger, R. Li, S. Pendergrass, and D. Kim, “Methods of integrating data to uncover genotype-phenotype interactions,” Nat Rev Genet, vol. 16, pp. 85–97, 02 2015.
[49] D. Gianola and R. L. Fernando, “A multiple-trait Bayesian lasso for genome-enabled analysis and prediction of complex traits,” Genetics, vol. 214, no. 2, pp. 305–331, 2020.
[50] Y. Jia and J.-L. Jannink, “Multiple-trait genomic selection methods increase genetic value prediction accuracy,” Genetics, vol. 192, no. 4, pp. 1513–1522, 2012.
[51] T. E. Galesloot, K. Van Steen, L. A. Kiemeney, L. L. Janss, and S. H. Vermeulen, “A comparison of multivariate genome-wide association methods,” PLoS ONE, vol. 9, no. 4, p. e95923, 2014.
[52] E. Schadt, J. Lamb, X. Yang, et al., “An integrative genomics approach to infer causal associations between gene expression and disease,” Nat Genet, vol. 37, pp. 710–717, 2005.
[53] D. C. Kulp and M. Jagalur, “Causal inference of regulator-target pairs by gene mapping of expression phenotypes,” BMC Genomics, vol. 7, p. 125, May 2006.
[54] F. W. Stearns, “One hundred years of pleiotropy: A retrospective,” Genetics, vol. 186, no. 3, pp. 767–773, 2010.
[55] E. D. Schifano, L. Li, D. C. Christiani, and X. Lin, “Genome-wide association analysis for multiple continuous secondary phenotypes,” Am J Hum Genet., vol. 92, no. 5, pp. 744–759, 2013.
[56] M. Pourahmadi, “Joint mean-covariance models with applications to longitudinal data: Unconstrained parameterisation,” Biometrika, vol. 86, no. 3, pp. 677–690, 1999.
[57] P. D. Hoff and X. Niu, “A covariance regression model,” Stat Sin., vol. 22, no. 2, pp. 729–753, 2012.
[58] T. Zou, W. Lan, H. Wang, and C.-L. Tsai, “Covariance regression analysis,” J Am Stat Assoc., vol. 112, pp. 266–281, 2017.
[59] K. Meyer and M. Kirkpatrick, “Better estimates of genetic covariance matrices by “bending” using penalized maximum likelihood,” Genetics, vol. 185, no. 3, p. 1097, 2010.
[60] S. Lele and M. L. Taper, “A composite likelihood approach to (co)variance components estimation,” J Stat Plan Inference, vol. 103, no. 1, pp. 117–135, 2002.
[61] B. Gao, C. Yang, J. Liu, and X. Zhou, “Accurate genetic and environmental covariance estimation with composite likelihood in genome-wide association studies,” PLoS Genet, vol. 17, pp. 1–25, 01 2021.
[62] Y. Bai, J. Kang, and P. X. Song, “Efficient pairwise composite likelihood estimation for spatial-clustered data,” Biometrics, vol. 70, no. 3, pp. 661–670, 2014.
[63] D. R. Hunter and K. Lange, “A tutorial on MM algorithms,” Am Stat., vol. 58, pp. 30–37, 2004.
[64] X. Huang, J. Xu, and G. Tian, “On profile MM algorithms for Gamma frailty survival models,” Stat Sin., vol. 29, pp. 895–916, 2019.
[65] H. Zhou, L. Hu, J. Zhou, and K. Lange, “MM algorithms for variance components models,” J Comput Graph Stat., vol. 28, pp. 350–361, 2019.
[66] D. R. Hunter and L. Runze, “Variable selection using MM algorithms,” Ann Stat., vol. 33, pp. 1617–1642, 2005.
[67] T. Leeper, “Interpreting regression results using average marginal effects with R’s margins.” https://cran.r-project.org/web/packages/margins/vignettes/TechnicalDetails.pdf, 2021.
[68] A. Hugues, “A perspective on interaction effects in genetic association studies,” Genet Epidemiol., vol. 40, pp. 678–688, 2016.
[69] W. H. Greene, Econometric Analysis. New York, NY: Pearson, 1997.
[70] C. Varin, N. Reid, and D. Firth, “An overview of composite likelihood methods,” Stat Sin., vol. 21, pp. 5–42, 2011.
[71] W. Muchero, N. N. Diop, P. R. Bhat, R. D. Fenton, S. Wanamaker, M. Pottorff, S. Hearne, N. Cisse, C. Fatokun, J. D. Ehlers, P. A. Roberts, and T. J. Close, “A consensus genetic map of cowpea [Vigna unguiculata (L) Walp.] and synteny based on EST-derived SNPs,” Proc. Natl. Acad. Sci. USA, vol. 106, pp. 18159–18164, Oct 2009.
[72] K. W. Broman and S. Sen, A Guide to QTL Mapping with R/qtl. New York, NY: Springer, 2009.