DEEP LEARNING REGULARIZATION : THEORY AND DATA PERSPECTIVES

Generalization is a central research topic in deep learning. To improve the test performance of trained models on unseen data, it is essential to apply regularization techniques that refine the model's expressive capabilities and the training process. This thesis categorizes regularization into theory-driven and data-driven approaches. Theory-driven regularization encompasses methods that are broadly applicable across contexts, including conventional techniques such as weight decay and dropout. Data-driven regularization, by contrast, involves techniques designed for particular data sets and applications; for instance, different neural network architectures can be developed to capture the patterns in data that are useful for a specific application. This dissertation explores both types of regularization, from the development of new training algorithms with theoretical guarantees to the design of deep learning architectures for data-driven approaches.

For theory-driven regularization, this dissertation discusses a training algorithm based on PAC-Bayes bounds. A PAC-Bayes bound gives an upper bound on the test error that can be evaluated using only training data. However, minimizing this upper bound with existing PAC-Bayes bounds, which are theoretically tight and should intuitively benefit generalization, often yields worse test performance than empirical risk minimization (ERM) with commonly used regularization techniques such as weight decay, large learning rates, and small batch sizes. The designed algorithm seeks to bridge the gap between theoretical tightness and practical effectiveness in boosting test performance for classification tasks.

For data-driven regularization, this dissertation discusses graph neural networks specifically designed for directed graphs and spatial-temporal seismic data. It also introduces a physics-informed deep learning framework for full-waveform inversion, which aims to estimate subsurface structures from seismic data by integrating the governing acoustic wave equation with convolutional neural networks. Additionally, data augmentation is considered a specialized form of regularization. This thesis explores the design of generative neural networks for time-lapse full-waveform inversion to obtain more training samples and achieve lower test errors in the target inversion task.

In Collections: Electronic Theses & Dissertations

Copyright Status: Attribution-NonCommercial 4.0 International

Material Type: Theses

Authors: Zhang, Xitong

Thesis Advisors: Wang, Rongrong

Date Published: 2024

Subjects: Computer science

Program of Study: Computational Mathematics, Science and Engineering - Doctor of Philosophy

Degree Level: Doctoral

Language: English

Pages: 196 pages

Permalink: https://doi.org/doi:10.25335/ksns-yg29
