LEVERAGING LOCAL GENETIC INFORMATION IN HIGH-DIMENSIONAL BAYESIAN REGRESSION : METHODS AND COMPUTATION TOOLS

I present three projects that propose computationally efficient Bayesian methods and applications for analyzing high-dimensional genetic data, focusing on incorporating local SNP information, such as linkage disequilibrium, to elucidate genetic variability across the genome. Genome-wide information offers valuable insights, but its vast scale presents significant statistical, computational, and interpretation challenges. Focusing on local genomic segments can help address these challenges by providing a more refined approach to understanding genetic variation, particularly across different ancestry groups.

In Chapter 1, I propose an approach to map the contribution of short chromosome segments to the genetic correlation between traits. While genome-wide genetic correlations offer an overall estimate for comorbid traits, they mask local regions whose genetic correlations run in opposing directions, making it challenging to untangle the overall strength of the relationship. Chapter 1 addresses this limitation by estimating local genetic correlations. Hyperuricemia/gout and chronic kidney disease are comorbid conditions for which the biological roots of the comorbidity remain unknown. Using this approach, I disentangled the shared genetic regions contributing to both conditions. The results presented in this chapter validate several previously suggested pleiotropic loci and identify new ones, with about a third showing genetic correlation estimates opposite in sign to the overall correlation.

Chapter 2 focuses on estimating the portability of local polygenic scores (PGS) in cross-ancestry prediction. The vast majority of genetic data comes from individuals of European ancestry. As a result, many investigators attempt cross-ancestry prediction, using European data to predict the risk of diseases or traits among underrepresented non-European ancestries. In most cases, cross-ancestry prediction remains more accurate than within-ancestry prediction because of the limited sample sizes available for non-European groups, but its accuracy is still low. This shortcoming is largely due to differences in allele frequencies and linkage disequilibrium patterns between ancestry groups, as well as gene-by-environment interactions involving environmental exposures that are not independent of ancestry. In this study, I propose a method, MC-ANOVA, to estimate the relative loss of accuracy in cross-ancestry prediction due to local differences in linkage disequilibrium and allele frequencies. I implemented the proposed algorithm and developed maps of the relative accuracy of cross-ancestry prediction for four non-European ancestry groups. Furthermore, I developed an interactive R Shiny app that can be used to visualize the results in each portability map. My findings revealed significant variability in the portability of local PGS across genomic regions, reflecting varying degrees of genetic similarity between ancestries across regions. This study highlights the potential for improving cross-ancestry predictions by taking local genetic differences into account.

The advent of big data has had a remarkable impact on PGS prediction accuracy. Sample size affects both the power to detect significant associations between SNPs and phenotypes and the accuracy of SNP effect estimates. For homogeneous populations, PGS prediction accuracy grows monotonically with sample size. However, when using multi-ancestry data, the relative proportion of each ancestry group can greatly impact prediction accuracy. Therefore, in Chapter 3, using data from individuals of European ancestry from the UK Biobank and of African ancestry from All of Us, I investigate how sample size and the relative proportion of each ancestry group influence PGS prediction accuracy. This study sheds light, through empirical results, on the relative benefits of increasing within- and across-ancestry sample sizes in cross-ancestry genetic prediction, ultimately highlighting the importance of prioritizing the collection of non-European ancestry data.

In Collections: Electronic Theses & Dissertations

Copyright Status: In Copyright

Material Type: Theses

Authors: Lupi, Alexa S.

Thesis Advisors: Vazquez, Ana I.

Committee Members: de los Campos, Gustavo
Reynolds, Richard
Todem, David

Date Published: 2024

Subjects: Biometry

Program of Study: Biostatistics - Doctor of Philosophy

Degree Level: Doctoral

Language: English

Pages: 164 pages

Permalink: https://doi.org/doi:10.25335/9czc-r951
