Predictive model building for utilizing word embedding models : applications in insurance data
Textual data contain a vast amount of information, yet for many researchers it has not been clear how that information could be used in an empirical analysis. Oftentimes textual data are ignored or discarded in statistical analyses because regression and other statistical methods require numeric covariates. This dissertation demonstrates how cutting-edge text mining technologies can improve empirical analyses by transforming textual data into numeric explanatory variables, thus allowing textual data to be incorporated into a statistical analysis. Once the textual data are transformed, the number of explanatory variables often becomes larger than the number of observations. For this reason, we explore the application of generalized additive models in tandem with adaptive lasso. In addition, we construct an algorithm for fitting a Gamma double generalized linear model with a group lasso penalty. Through this, we show how useful information can be extracted from textual data, and we illustrate our methods with several insurance claims examples. We believe that our work can be widely used by other observational researchers in economics, business, statistical science, and social science.
- In Collections: Electronic Theses & Dissertations
- Copyright Status: In Copyright
- Material Type: Theses
- Authors: Manski, Scott
- Thesis Advisors: Lee, Gee Y.; Maiti, Tapabrata
- Committee Members: Hong, Hyokyoung; Guo, Chenhui
- Date Published: 2020
- Subjects: Statistics
- Program of Study: Statistics - Doctor of Philosophy
- Degree Level: Doctoral
- Language: English
- Pages: 112 pages
- ISBN: 9798662418776
- Permalink: https://doi.org/doi:10.25335/nrfc-y046