Development and validation of risk stratification models in a cohort of community-living homebound older adults, comparison of three methods : logistic regression, random forest, and Cox proportional hazard regression
Risk stratification (RS) models predict an outcome from the observed values of predictor variables. Classifying a population into groups based on their risk of an outcome provides the opportunity to deliver targeted services to each group according to its needs and priorities. Different RS tools have been developed for older adults, but few RS studies have been designed for use in community-living older adults. This dissertation aims to develop and validate risk stratification models in a cohort of community-living homebound older adults. The study population consisted of older homebound adults who received home-based medical services from the Visiting Physician Association (VPA), which is a part of the United States Medical Management (USMM) Corporation. USMM provides a range of services, including home-based primary care and medical visits, senior home care, palliative care, and hospice services. The cohort had several features indicative of high risk: the average age was 82 years, 50% had ≥ 5 comorbidities, and 45% had a severe disability (defined by a Karnofsky Performance Score, KPS, ≤ 40). The population had very high rates of mortality and hospice admission (1-year rates were 32% and 10%, respectively). Given the unique and high-risk nature of this population, an RS approach was developed to help provide USMM patients with appropriate services aligned with their priorities, as guided by a recent conceptual framework for the care of older adults with multiple comorbidities (Table 1.2). We developed and validated prediction models for two outcomes (death and hospice admission) using three alternative statistical approaches: logistic regression (LR), random forest (RF), and Cox regression. The performance of these models was compared using discrimination ability, measured by the area under the receiver operating characteristic curve (AUC).
When developing the LR model, we applied different variable selection methods (stepwise, backward, forward, adaptive lasso, elastic net, and manual). We developed a prediction model using an RF algorithm and used Cox regression to model time-to-event for each outcome separately (using the same variable selection methods as in logistic regression). All three models were developed in a derivation dataset (a random 50% of the cohort) and validated by applying them to the validation dataset. Because of the large amount of missing data among the predictor variables, we applied multiple imputation (MI) procedures and compared the performance of the LR and RF models in the original and imputed data. For the prediction of mortality, all of the variable selection methods used in the LR model showed similar predictive performance (AUC 0.762-0.769). Random forest had the best discrimination ability (AUC = 0.83), whereas the LR and Cox models had comparable AUCs (0.76 and 0.74, respectively). We determined that the higher AUC of the RF model was mainly due to its ability to include subjects with missing data: when these subjects were excluded from the RF cohort, the AUC of the model was similar to that of the LR model. Also, when the RF model was applied to imputed data, it had predictive performance similar to the LR model, which indicated that the basic assumption of multiple imputation (i.e., missing at random) was not met in these data. For hospice admission, all three models had similar discriminative ability (AUCs for RF, LR, and Cox were 0.70, 0.73, and 0.72, respectively). The variables age, race, KPS, serum albumin, surprise question (SQ), and hyperlipidemia were consistently selected as important predictors of both outcomes in all three approaches. We concluded that the RF approach can significantly improve the predictive performance of the RS model, but this advantage comes from its ability to include observations with missing data.
When data are missing not at random, the use of MI had a limited effect on improving model prediction, because the basic assumption of the MI procedure is that data are missing at random. The quality of data from large electronic health record datasets remains a limitation in developing RS models.
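The discrimination measure used throughout the abstract, the AUC, can be read as the probability that a randomly chosen subject who had the event receives a higher predicted risk than a randomly chosen subject who did not. The following is a minimal sketch of that pairwise definition in plain Python. It is not the dissertation's code: the function name `auc`, the variable names, and the toy data are all hypothetical and chosen only for illustration.

```python
def auc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    labels: 1 for subjects with the event (e.g. death), 0 otherwise.
    scores: predicted risks from any model (LR, RF, Cox, ...).
    Ties between an event and a non-event count as half a win.
    """
    events = [s for y, s in zip(labels, scores) if y == 1]
    non_events = [s for y, s in zip(labels, scores) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")

    wins = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                wins += 1.0   # event ranked above non-event
            elif e == n:
                wins += 0.5   # tie
    return wins / (len(events) * len(non_events))


# Hypothetical toy data, NOT taken from the study cohort.
died = [1, 1, 0, 1, 0, 0]
risk = [0.9, 0.6, 0.4, 0.3, 0.2, 0.6]
toy_auc = auc(died, risk)
print(round(toy_auc, 3))
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect separation, which is the scale on which the reported AUCs (for example 0.83 for RF versus 0.76 for LR on mortality) are compared. The double loop is quadratic in cohort size, so for large datasets a rank-based implementation would be the more practical choice.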
- In Collections
-
Electronic Theses & Dissertations
- Copyright Status
- Attribution-NoDerivatives 4.0 International
- Material Type
-
Theses
- Authors
-
Nasiriahmadabadi, Mojdeh
- Thesis Advisors
-
Reeves, Mathew J.
- Committee Members
-
Gardiner, Joseph C.
Sarzynski, Erin M.
Todem, David
- Date Published
-
2019
- Subjects
-
Older people
Health risk assessment--Statistical methods
Mortality
Forecasting
Hospice care
- Program of Study
-
Epidemiology - Doctor of Philosophy
- Degree Level
-
Doctoral
- Language
-
English
- Pages
- xv, 232 pages
- ISBN
-
9781392435236
1392435234
- Permalink
- https://doi.org/doi:10.25335/e49h-7662