Decode phenome-genome interactions : a data science approach
The responses of plants to their environments are determined by multiple interacting genetic factors, which may themselves operate through numerous biological mechanisms. Disentangling these complex genome-by-environment interactions is a significant challenge for understanding the underlying biology and for developing more robust crops. This dissertation integrates high-throughput phenotyping with genome sequencing and aims to harness these multidimensional interactions to test whether different genetic components affect biological processes through similar or distinct mechanisms. First, we compare methods that can be used in practice for genome-enabled prediction and selection, using synthetic datasets with varying levels of difficulty and variability. Using such tools, we find that multiple traits are modulated by similar genomic regions, a pattern termed "co-localization." The question remains, however, of how to test for co-segregation, or co-linkage, of multiple phenotypes to specific genetic polymorphisms. Domain knowledge suggests that photosynthetic processes interact through various physical modes, which produce distinct patterns of interaction among the measured parameters. We propose a Bayesian latent variable (LV) approach that imitates these physical modes of interaction by projecting the multivariate phenotypes onto lower-dimensional latent factors. The entries of the loading matrix (which connects the multidimensional phenotypes to the LVs) are estimated under an Automatic Relevance Determination (ARD) prior, which automatically removes irrelevant latent factors and makes the result immediately interpretable. For a single genotype, the estimated latent factors will therefore likely reflect environmental or developmental effects on mechanistic interconnections.
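For orientation, a latent factor model with an ARD prior is commonly written in the textbook form below. This is a generic sketch, not necessarily the specification used in the dissertation; the symbols (y_i for the P-dimensional phenotype vector of observation i, z_i for its K latent factors, W for the loading matrix with columns w_k, and the hyperparameters a_0, b_0) are assumptions introduced here.

```latex
\begin{aligned}
y_i &= W z_i + \mu + \varepsilon_i, &
z_i &\sim \mathcal{N}(0, I_K), &
\varepsilon_i &\sim \mathcal{N}(0, \sigma^2 I_P), \\
w_k \mid \alpha_k &\sim \mathcal{N}\!\left(0, \alpha_k^{-1} I_P\right), &
\alpha_k &\sim \mathrm{Gamma}(a_0, b_0), &
k &= 1, \dots, K .
\end{aligned}
```

Under this form, a large posterior precision \alpha_k shrinks the whole column w_k toward zero, which is how an irrelevant latent factor is effectively switched off.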
These low-dimensional latent factors can also be genetically mapped using quantitative trait loci (QTL) mapping and validated against the linkages among co-localized traits obtained from univariate QTL analysis. An added advantage of our approach is that it describes specific classes of relationships among multiple phenotypes that are governed by specific genetic regions, whether shared across environments or specific to one; these relationships can then be used to distinguish functional and genetic linkages among a range of photosynthetic regulatory processes. We extended our setup to integrate multiple environments and showed that the latent variables, whether specific to one treatment or shared across treatments, can be mapped to distinct genetic loci, revealing specific genetic polymorphisms that alter the co-regulatory network among phenotypes in Genotype x Phenotype x Environment space. The final part of this work models the association (correlation) between phenotypes as a function of genetic and environmental explanatory variables in order to pin down distinct mechanisms. We develop an efficient estimation methodology called Correlation Modeling under Pairwise Likelihood Estimation (CMPLE), aided by a novel Minorize-Maximize (MM) algorithm, and provide statistical inference techniques. Simulation studies mimicking biological data show that the method recovers pertinent information, including different regulatory pathways, and remains computationally efficient with many parameters. We illustrate the approach on a motivating dataset from recombinant inbred cowpea lines. Using CMPLE, we identify specific genetic variations affecting distinct biological mechanisms, namely "Photoprotection" and "Photoinhibition," under various environmental conditions.
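The idea of modeling a correlation as a function of explanatory variables can be illustrated with a small toy sketch. It is not the CMPLE implementation: it assumes standardized, bivariate-normal phenotype pairs, a tanh (Fisher z) link with pair-specific coefficients, and plain gradient ascent swapped in for the MM algorithm, whose details the abstract does not give. All function names and the simulated data are hypothetical.

```python
import itertools
import math
import random


def fit_pair(y1, y2, X, step=0.25, n_iter=400):
    """Fit rho_i = tanh(x_i . beta) for one phenotype pair by gradient ascent.

    Assumption: both phenotypes are standardized (mean 0, variance 1) and
    bivariate normal given the covariates. Not the thesis's MM algorithm.
    """
    n, q = len(X), len(X[0])
    a = [u * u + v * v for u, v in zip(y1, y2)]  # y1^2 + y2^2
    b = [u * v for u, v in zip(y1, y2)]          # y1 * y2
    beta = [0.0] * q
    for _ in range(n_iter):
        grad = [0.0] * q
        for x, ai, bi in zip(X, a, b):
            rho = math.tanh(sum(xj * bj for xj, bj in zip(x, beta)))
            # Derivative of the bivariate-normal log-likelihood with respect
            # to the linear predictor eta = atanh(rho).
            score = rho + (bi * (1 + rho * rho) - rho * ai) / (1 - rho * rho)
            for j in range(q):
                grad[j] += x[j] * score
        # Ascent step on the mean log-likelihood.
        beta = [bj + step * gj / n for bj, gj in zip(beta, grad)]
    return beta


def fit_pairwise(Y, X):
    """Maximize a pairwise likelihood over all phenotype pairs.

    Y is a list of phenotype columns. With pair-specific coefficients the
    pairwise likelihood splits into one term per pair, so each pair is
    fitted separately.
    """
    pairs = itertools.combinations(range(len(Y)), 2)
    return {(j, k): fit_pair(Y[j], Y[k], X) for j, k in pairs}


def simulate_toy_data(n=2000, beta=(0.3, 0.8), seed=0):
    """Hypothetical data: phenotypes 0 and 1 are correlated, with a
    correlation that depends on a binary genotype g; phenotype 2 is
    independent of both."""
    rng = random.Random(seed)
    X, Y = [], [[], [], []]
    for i in range(n):
        g = i % 2
        rho = math.tanh(beta[0] + beta[1] * g)
        z0, z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        X.append([1.0, float(g)])
        Y[0].append(z0)
        Y[1].append(rho * z0 + math.sqrt(1 - rho * rho) * z1)
        Y[2].append(z2)
    return Y, X


if __name__ == "__main__":
    Y, X = simulate_toy_data()
    for pair, coef in fit_pairwise(Y, X).items():
        print(pair, [round(v, 2) for v in coef])
```

In this toy setup, a clearly nonzero genotype coefficient for a pair indicates that the genotype changes how strongly the two phenotypes are coupled, which is the kind of signal the abstract describes using to separate distinct mechanisms.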
    
- In Collections: Electronic Theses & Dissertations
- Copyright Status: In Copyright
- Material Type: Theses
- Authors: Chattopadhyay, Abhijnan
- Thesis Advisors: Maiti, Tapabrata
- Committee Members: Kramer, David Mark; Bhattacharya, Shrijita; Sung, Chih-Li
- Date Published: 2022
- Program of Study: Statistics - Doctor of Philosophy
- Degree Level: Doctoral
- Language: English
- Pages: xii, 111 pages
- ISBN: 9798845409478
- Permalink: https://doi.org/doi:10.25335/vt8w-b646