Predictive model building for utilizing word embedding models : applications in insurance data

Textual data contain a vast amount of information, yet for many researchers it has not been clear how this information can be used in empirical analysis. Textual data are often ignored or discarded in statistical analyses because regression and other statistical methods require numeric covariates. This dissertation demonstrates how cutting-edge text mining technologies can improve empirical analyses by transforming textual data into numeric explanatory variables, which allows textual data to be incorporated into a statistical analysis. Once the text has been transformed, the number of explanatory variables often exceeds the number of observations. For this reason, we explore the application of generalized additive models in tandem with the adaptive lasso. In addition, we construct an algorithm for fitting a Gamma double generalized linear model with a group lasso penalty. Through these methods, we show how useful information can be extracted from textual data, and we illustrate them with several insurance claims examples. We believe that our work can be widely used by other researchers working with observational data in economics, business, statistical science, and social science.
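The abstract does not spell out how text becomes covariates or how the high-dimensional model is fitted. The following is therefore only a generic sketch, not the dissertation's method. It assumes that each claim description is represented by the average of its word vectors, and it fits a plain linear model (no GAM smooth terms, no Gamma double GLM, no group penalty) with a two-stage adaptive lasso: a ridge pilot estimate supplies weights 1/|b_j|^gamma, followed by weighted-lasso coordinate descent. The embedding dictionary, the function names, and all tuning values (`lam`, `ridge`, `gamma`) are hypothetical.

```python
import numpy as np

def text_to_features(docs, embeddings, dim):
    """Average the word vectors of the known tokens in each document.

    `embeddings` is a hypothetical token -> vector dict standing in for a
    pretrained word embedding model. Unknown tokens are skipped; a document
    with no known tokens maps to the zero vector.
    """
    X = np.zeros((len(docs), dim))
    for i, doc in enumerate(docs):
        vecs = [embeddings[t] for t in doc.lower().split() if t in embeddings]
        if vecs:
            X[i] = np.mean(vecs, axis=0)
    return X

def soft_threshold(z, t):
    return np.sign(z) * max(abs(z) - t, 0.0)

def weighted_lasso_cd(X, y, lam, weights, n_iter=500, tol=1e-8):
    """Coordinate descent for (1/(2n))||y - Xb||^2 + lam * sum_j weights[j]|b_j|.

    Assumes y and the columns of X are centered (no intercept is fitted).
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float).copy()  # current residual y - X b
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (b_j added back).
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam * weights[j]) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                r -= X[:, j] * delta
                b[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b

def adaptive_lasso(X, y, lam, ridge=1.0, gamma=1.0):
    """Two-stage adaptive lasso with a ridge pilot estimate (defined even if p > n)."""
    p = X.shape[1]
    b_pilot = np.linalg.solve(X.T @ X + ridge * np.eye(p), X.T @ y)
    weights = 1.0 / (np.abs(b_pilot) ** gamma + 1e-8)  # small offset avoids division by zero
    return weighted_lasso_cd(X, y, lam, weights)

# Usage on synthetic data with more covariates than observations (p > n).
rng = np.random.default_rng(0)
n, p = 40, 100
X = rng.standard_normal((n, p))
X -= X.mean(axis=0)
beta = np.zeros(p)
beta[:3] = [3.0, -2.0, 1.5]
y = X @ beta + 0.1 * rng.standard_normal(n)
y -= y.mean()
b_hat = adaptive_lasso(X, y, lam=0.05)
print("selected covariates:", np.flatnonzero(np.abs(b_hat) > 1e-6))
```

The ridge pilot is used because ordinary least squares is not identifiable when p > n. In practice `lam` would be chosen by cross-validation rather than fixed.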

In Collections: Electronic Theses & Dissertations

Copyright Status: In Copyright

Material Type: Theses

Authors: Manski, Scott

Thesis Advisors: Lee, Gee Y.
Maiti, Tapabrata

Committee Members: Hong, Hyokyoung
Guo, Chenhui

Date Published: 2020

Subjects: Statistics

Program of Study: Statistics - Doctor of Philosophy

Degree Level: Doctoral

Language: English

Pages: 112 pages

ISBN: 9798662418776

Permalink: https://doi.org/doi:10.25335/nrfc-y046
