NEURAL NETWORKS MODELS WITH APPLICATIONS TO GENETIC STUDIES
Artificial intelligence (AI) is a thriving research field with many successful applications in areas such as computer vision and speech recognition. Machine learning methods, such as neural networks (NN), play a central role in modern AI technology. While NN also hold great promise for human genetic research, high-dimensional genetic data and complex genetic structure pose tremendous challenges. The vast majority of genetic variants in the genome have small or no effects on diseases, and fitting NN to a large number of variants without considering the underlying genetic structure (e.g., linkage disequilibrium) can lead to serious overfitting. Furthermore, while a classic genetic study often examines a single disease phenotype, researchers in emerging fields (e.g., imaging genetics) need to deal with different types of disease phenotypes. To address these challenges, I propose a functional neural networks (FNN) method, which combines the functional linear model with a neural network structure. FNN uses a series of basis functions to model genetic effects within gene sequences and phenotype data with spatial-temporal structure, and further builds a multi-layer functional neural network to capture the complex relationship between genetic variants and disease phenotypes. Through simulations and real data applications, I demonstrate the advantages of FNN for high-dimensional genetic data analysis in terms of robustness and accuracy. The source code of FNN is available at https://github.com/szhang0629/FNN.

As FNN is a new statistical model, its statistical properties (e.g., consistency) are investigated. For FNN, the domain of the random variable of interest changes from Euclidean space to a functional space. Therefore, the theoretical results for NN do not apply here. The related theorems for FNN are derived in the dissertation. After showing that the FNN model is a universal approximator, I obtain the covering number of a multi-layer FNN model. Finally, the conditions for the consistency property of the FNN model are provided for application.

Transfer learning has been widely used in text and image classification, where it has demonstrated outstanding performance. Nonetheless, it has rarely been used in genetic research, and its performance on genetic data is still largely unknown. In Chapter 4, I use vanilla neural networks to investigate whether the knowledge learned from biobank databases or Caucasian samples can be used to facilitate genetic research in small-scale studies or in minority populations.
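
To make the basis-function idea in the abstract more concrete, the following is a minimal illustrative sketch, not the dissertation's implementation (for that, see the repository linked above). It assumes a Fourier basis, variant positions rescaled to [0, 1], a crude equal-weight Riemann sum for the integral, and a tanh activation; the function names `fourier_basis` and `functional_layer` are hypothetical. Each hidden unit's weight is a smooth function of position, so the layer has `n_basis * n_hidden` coefficients instead of one weight per variant.

```python
import numpy as np


def fourier_basis(t, n_basis):
    """Evaluate the first n_basis Fourier basis functions at points t in [0, 1]."""
    t = np.asarray(t, dtype=float)
    cols = [np.ones_like(t)]  # constant term
    k = 1
    while len(cols) < n_basis:
        cols.append(np.sqrt(2) * np.sin(2 * np.pi * k * t))
        if len(cols) < n_basis:
            cols.append(np.sqrt(2) * np.cos(2 * np.pi * k * t))
        k += 1
    return np.stack(cols, axis=1)  # shape: (len(t), n_basis)


def functional_layer(genotypes, positions, coef, bias):
    """Hidden layer whose weights are functions of variant position.

    genotypes: (n_samples, n_variants) minor-allele counts
    positions: (n_variants,) variant positions rescaled to [0, 1]
    coef:      (n_basis, n_hidden) basis coefficients (the trainable part)
    bias:      (n_hidden,)
    """
    basis = fourier_basis(positions, coef.shape[0])  # (n_variants, n_basis)
    weights = basis @ coef                           # beta_h(t_j) for each unit h
    # Equal-weight Riemann sum standing in for the integral of X_i(t) * beta_h(t).
    z = genotypes @ weights / len(positions) + bias
    return np.tanh(z)


# Toy data (simulated, for shape-checking only).
rng = np.random.default_rng(0)
genotypes = rng.integers(0, 3, size=(5, 200))
positions = np.sort(rng.uniform(size=200))
coef = rng.normal(size=(7, 4))
bias = np.zeros(4)

hidden = functional_layer(genotypes, positions, coef, bias)
print(hidden.shape)
```

In this toy setup the layer has 7 x 4 = 28 coefficients, whereas a fully connected layer on the same 200 variants would need 200 x 4 = 800 weights; this reduction is one way to read the abstract's concern about overfitting on high-dimensional data.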
- In Collections: Electronic Theses & Dissertations
- Copyright Status: In Copyright
- Material Type: Theses
- Authors: Zhang, Shan
- Thesis Advisors: Cui, Yuehua; Lu, Qing
- Committee Members: Sakhanenko, Lyudmila; Weng, Haolei
- Date Published: 2021
- Subjects: Statistics
- Program of Study: Statistics - Doctor of Philosophy
- Degree Level: Doctoral
- Language: English
- Pages: 97 pages
- Permalink: https://doi.org/doi:10.25335/38yv-6037