DEVELOPMENT AND VALIDATION OF RISK STRATIFICATION MODELS IN A COHORT OF COMMUNITY-LIVING HOMEBOUND OLDER ADULTS, COMPARISON OF THREE METHODS: LOGISTIC REGRESSION, RANDOM FOREST, AND COX PROPORTIONAL HAZARD REGRESSION

By

Mojdeh Nasiriahmadabadi

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Epidemiology - Doctor of Philosophy

2019

ABSTRACT

DEVELOPMENT AND VALIDATION OF RISK STRATIFICATION MODELS IN A COHORT OF COMMUNITY-LIVING HOMEBOUND OLDER ADULTS, COMPARISON OF THREE METHODS: LOGISTIC REGRESSION, RANDOM FOREST, AND COX PROPORTIONAL HAZARD REGRESSION

By

Mojdeh Nasiriahmadabadi

Risk stratification (RS) models make predictions of an outcome based on the observed information from predictor variables. Classification of a population into different groups based on their risk of an outcome provides the opportunity to deliver targeted services to each group based on its needs and priorities. Different RS tools have been developed for older adults, but there is a limited number of RS studies developed for use in community-living older adults. This dissertation aims to develop and validate risk stratification models in a cohort of community-living homebound older adults. The study population consisted of older homebound adults who received home-based medical services from the Visiting Physicians Association (VPA), which is a part of the United States Medical Management (USMM) Corporation. USMM provides a range of services, including home-based primary care and medical visits, senior home care, palliative care, and hospice services. The cohort had several features indicative of high risk: the average age was 82 years, 50% had five or more comorbidities, and 45% had a severe disability (defined by a Karnofsky Performance Score [KPS] of 40 or lower). The population had very high rates of mortality and hospice admission (1-year rates were 32% and 10%, respectively). Given the unique and high-risk nature of this population, a RS approach was developed to help provide USMM patients with appropriate services aligned with their priorities, as guided by a recent conceptual framework for the care of older adults with multiple comorbidities (Table 1.2). We developed and validated prediction models for two outcomes (death and hospice admission) using three alternative statistical approaches: logistic regression (LR), random forest (RF), and Cox regression. The performance of these models was compared using discrimination ability, measured by the area under the receiver operating characteristic curve (AUC). When developing the LR model, we applied different variable selection methods (stepwise, backward, forward, adaptive lasso, elastic net, and manual). We developed a prediction model using an RF algorithm and used Cox regression to model time-to-event for each outcome separately (using the same variable selection methods as in logistic regression). All three models were developed in a derivation dataset (consisting of a random 50% of the cohort) and validated by applying them to the validation dataset. Because of the large amount of missing data among the predictor variables, we applied multiple imputation (MI) procedures and compared the performance of the LR and RF models in the original data and the imputed data. For the prediction of mortality, all of the variable selection methods used in the LR model showed similar predictive performance (AUC 0.762-0.769).
Random forest had the best discrimination ability (AUC = 0.83), whereas the LR and Cox models had comparable AUCs (0.76 and 0.74, respectively). We determined that the higher AUC of the RF model was mainly due to its ability to include subjects with missing data, because when the subjects with missing data were excluded from the RF cohort, the AUC of the model was similar to that of the LR model. Also, when the RF model was applied to imputed data it had similar predictive performance to the LR model, which indicated that the basic assumption of multiple imputation (i.e., missing at random) was not met in these data. For hospice admission, all three models had similar discriminative ability (AUCs for RF, LR, and Cox were 0.70, 0.73, and 0.72, respectively). The variables age, race, KPS, serum albumin, surprise question (SQ), and hyperlipidemia were consistently selected as important predictors of both outcomes in all three approaches. We concluded that the RF approach can significantly improve the predictive performance of the RS model, but this advantage comes from its ability to include observations with missing data. When data are missing not at random, the use of MI has a limited effect on improving model prediction because the basic assumption of the MI procedure is that data are missing at random. The quality of data from large electronic health record datasets remains a limitation in developing RS models.

This dissertation is lovingly dedicated to my mom for her thoughts and prayers, to my family for their unconditional love and support, to Pooya for relentlessly pushing me to work on it, to Negar for asking every single day if I'm done yet, also to my fingernails for surviving the many years of stressful chewing, and to my many sleepless nights, for making my PhD a truly Permanent Head Damage.

ACKNOWLEDGMENTS

Undertaking this PhD has been a truly rewarding and life-altering experience for me, and it would not have been possible without all the support and guidance that I received. I would like to express my sincere gratitude to my advisor, Prof. Mathew Reeves, for all the support and encouragement throughout my Ph.D. study and related research. His immense knowledge and experience helped me through the research and writing of this dissertation. I cannot imagine having completed this dissertation without his guidance and support. I would like to also thank the rest of my dissertation committee, Prof. Joseph Gardiner, Dr. David Todem, and Dr. Erin Sarzynski, for their insightful comments and encouragement, and also for the tough questions which incented me to widen my research and perspective. My sincere thanks goes to Dr. John Strandmark, corporate medical director of Grace Hospice and representative of the US Medical Management Corporation, for the continuous support of this research, and for his patience, motivation, and knowledge. His guidance helped me through each stage of this process. I gratefully acknowledge the funding towards my Ph.D. dissertation and the data for this project from the US Medical Management Corporation. It has been a privilege and an honor to work with each and every one of these distinguished people. I would like to say a heartfelt thanks to my family, my mom, and my brothers and sisters for supporting me spiritually throughout my PhD study, as well as my life in general; and to my late father, for instilling the love of reading in me in those early years. I am also very grateful to Dr. Negar Salehi for all the support and encouragement she gave me.
Last but not least, a very special thank you to Pooya for his invaluable advice and feedback on my dissertation, and for always being so supportive of my work. This work would not have been possible without you, Pooya.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
CHAPTER 1. Introduction
    Current care services available for older adults in the US
    Description of the USMM Corporation
    USMM patient population
    Risk stratification approaches proposed by USMM providers
    Importance of Risk Stratification
    Overview of population-based disease management
    Current guideline for care management of geriatric population with multi-morbidity
    Overview of literature relevant to the RS in community-living older adults
    Statistical analysis of prediction models
    Overall Analysis plan
    Objectives
CHAPTER 2. Logistic Regression Model
    Introduction
    Literature review
    Methods and materials
        Data source
        Study population
        Outcome and exposure
        Statistical analysis
            Variable selection methods
            Model performance assessment
            Multiple imputation
        Alternative risk stratification approaches
    Results
        Study population
        Outcome: One-year mortality
            Available case analysis
            Imputed data analysis
            Comparison of the risk stratification models
            Final model selection
            Calibration plots
        Outcome: Hospice admission
            Available data analysis
            Imputed data analysis
            Comparison of the risk stratification models
            Final model selection
    Discussion
        Strengths
        Limitations
    Conclusion
CHAPTER 3. Random Forest Model
    Introduction
    Main concepts and definitions
        Machine learning
        Machine learning in prediction models
            Decision tree
        Random forest
            Random forest construction parameters
            Variable importance
    Literature review
    Methods and materials
        Statistical analysis
    Results
        Study population
        Outcome: one-year mortality
            Random forest development
            Variable importance
            Comparison to the logistic regression model
            Applying the RF model to imputed data
            Model's goodness-of-fit
        Outcome: one-year hospice admission
            Random forest development
            Variable importance
            Comparison to the logistic regression
            Applying the RF model to imputed data
            Model's goodness-of-fit
    Discussion
        Strengths
        Limitations
    Conclusion
    APPENDIX
CHAPTER 4. Cox Proportional Hazard Model and Comparison between the Three Models
    Introduction
    Main concepts and definitions
        Survival analysis methods and the Cox PH model
        Definitions
        Performance evaluation
        Proportional hazard assumption
    Literature review
    Methods and materials
        Statistical analysis
    Results
        Study population
        Outcome: one-year mortality
            Model development
            Model performance
            Proportionality assumption
            Comparison between the alternative approaches (Cox, LR, and RF)
        Outcome: one-year hospice admission
            Model development
            Model performance
            Proportionality assumption test
            Comparison between the alternative approaches (LR, RF, and Cox)
    Discussion
        Limitations
        Future direction
    Conclusion
CHAPTER 5. Conclusion
    Population
    Data source
    The importance of the missing data
    Using multiple imputation method in management of missing data
    Variable selection methods
    Using random forest method
    Important predictors of mortality and hospice
    Limitations
    Future direction
    Potential implementation of new RS approach for USMM
    Conclusion
BIBLIOGRAPHY

LIST OF TABLES

Table 1.1. Definition of homebound patient determined by the Centers for Medicare and Medicaid Services
Table 1.2. Conceptual framework for the care of older adults with multiple chronic conditions
Table 1.3. Summary of previous studies that developed a prognostic index for use in community-living older adult populations
Table 2.1. Inclusion and exclusion criteria in this study patient population
Table 2.2. Patients with <1-year care received from USMM (N=2182)
Table 2.3. Definition and values of the functional status variables and surprise question
Table 2.4. Cohort population description, by the outcome rates and unadjusted odds ratios (N=7445)
Table 2.5. Outcomes and follow-up duration
Table 2.6. Association between missing observations on predictor variables and the outcomes, age, gender, and Medicare/Medicaid dual-eligibility; p-values, magnitude, and direction of the effect
Table 2.7. Model development using alternative variable selection methods for 1-year mortality in available case data
Table 2.8. Different gamma - adaptive lasso variable selection for 1-year mortality
Table 2.9. Parameter estimates for the continuous variables from multiple imputation procedure - comparison of 20 and five imputations
Table 2.10. Variance information for the continuous variables from multiple imputation procedure - comparison of 20 and 5 imputations
Table 2.11. Model development using alternative variable selection methods for 1-year mortality using imputed data, AUCs for both derivation and validation data sets
Table 2.12. Prevalence of the risk levels determined by the USMM risk-stratification approaches (N=7445)
Table 2.13. Comparison of the alternative risk stratification approaches for 1-year mortality (N=3723, validation)
Table 2.14. Final model parameter estimates and odds ratios for 1-year mortality using derivation dataset (N=3722)
Table 2.15. Model development using alternative variable selection methods for hospice admission using available case data
Table 2.16. Using different gamma in adaptive lasso variable selection for 1-year hospice admission
Table 2.17. Model development using alternative variable selection methods for 1-year hospice admission using imputed data, AUC and 95% confidence limits for both derivation and validation data sets
Table 2.18. Comparison of the alternative risk stratification approaches for 1-year hospice admission
Table 2.19. Final model parameter estimates and odds ratios for 1-year hospice admission using derivation data set (N=3722)
Table 3.1. AUC from random forest model in derivation and validation data sets using different depth and number of trees - mortality outcome
Table 3.2. The first ten ranked important variables in the random forest model - Mortality outcome
Table 3.3. The variable importance in the logistic regression model (by estimates and significance) - Mortality outcome
Table 3.4. Comparison of the model performance for prediction of 1-year mortality, logistic regression and random forest models (validation N=3723)
Table 3.5. ROC and 95% confidence intervals from the RF and LR models (N=2312)
Table 3.6. ROC contrast between the two models, RF and LR
Table 3.7. AUC and the 95% confidence intervals from the RF model in the imputed data
Table 3.8. AUC from random forest model in derivation and validation data sets using different depth and number of trees - hospice outcome
Table 3.9. The first ten ranked important variables in the random forest model - Hospice outcome
Table 3.10. The variable importance in the logistic regression model (by estimates and significance) - Hospice outcome
Table 3.11. Comparison of the model performance for prediction of 1-year hospice admission, logistic regression and random forest models (validation cohort)
Table 3.12. AUC and 95% confidence intervals from the two models, LR and RF, applied to the same population (N=2590) - Hospice outcome
Table 3.13. AUC and 95% confidence intervals from the RF model applied to the imputed data (20 replications) - Hospice outcome
Table 3A.1. Ranked importance of predictor variables in the random forest model, RBA method - Mortality outcome
Table 3A.2. Ranked importance of the explanatory variables in the random forest model, the loss reduction method - Mortality outcome
Table 3A.3. Ranked importance of predictor variables in the random forest model, RBA method - Hospice outcome
Table 3A.4. Ranked importance of predictor variables in the random forest model, loss reduction method - Hospice outcome
Table 3A.5. Sample of fit statistics from the RF model for hospice outcome
Table 4.1. Inclusion and exclusion criteria for the Cox cohort
Table 4.2. Study population characteristics and association of predictors with the outcomes (N=7441) over an average of 459 days of follow-up
Table 4.3. Follow-up time and outcomes in the Cox study cohort (N=7441)
Table 4.4. Comparison of alternative variable selection methods in the derivation data (N=3721)
Table 4.5. Parameter estimates, hazard ratios, and 95% CL for predictors of the MV Cox model for mortality outcome - derivation data (N=2289)
Table 4.6. Concordance (C-index) of the Cox MV model for mortality in the validation data (N=2312)
Table 4.7. Parameter estimates and p-values for the interaction terms between time and key predictors - derivation data
Table 4.8. Overall test for proportionality assumption for all interaction terms together
Table 4.9. Comparison of the model performance between the three models, Cox, LR, and RF, using validation dataset
Table 4.10. Alternative variable selection methods for hospice outcome - derivation data (N=3721)
Table 4.11. Parameter estimates and hazard ratios from the Cox model for hospice outcome, derivation data (N=2055)
Table 4.12. Concordance of the Cox MV model for hospice outcome - validation data (N=2498)
Table 4.13. Parameter estimates and p-values for the interaction terms between time and key predictors - derivation data
Table 4.14. Overall test for proportionality assumption for all interaction terms together
Table 4.15. Comparison of the Cox model performance with the LR and RF models - Hospice outcome

LIST OF FIGURES

Figure 2.1. Flow diagram of the study cohort
Figure 2.2. Manual variable selection in the imputed data
Figure 2.3. The USMM proposed 3-level risk stratification approach
Figure 2.4. Adaptive lasso variable selection process using GLMSELECT for the mortality outcome (gamma=1.0 and validation dataset)
Figure 2.5. Elastic net variable selection process using GLMSELECT for the mortality outcome (validation dataset)
Figure 2.6. Loess-based calibration plot for the multivariable logistic model in the validation data for the outcome of 1-year mortality
Figure 2.7. Decile-based calibration plot for the multivariable logistic model in the validation data for the outcome of 1-year mortality
Figure 2.8. Decile-based calibration plot for the multivariable logistic model in the derivation data for the outcome of 1-year mortality
Figure 2.9. Adaptive lasso variable selection process using GLMSELECT for the hospice admission outcome (gamma=1.0 and validation dataset)
Figure 2.10. Elastic net variable selection process using GLMSELECT for the hospice admission outcome (validation dataset)
Figure 3.1. The schematic structure of a decision tree
Figure 3.2. Random forest algorithm for regression and classification
Figure 3.3. Impact of RF hyper-parameters on the AUCs of the random forest model applied to the validation dataset - 1-year mortality
Figure 3.4. The average squared error of the RF model by the number of trees for both OOB (top line) and full data (lower line)
Figure 3.5. Correlation of the predicted probability of death between the two models (N=2312)
Figure 3.6. Comparison of ROCs between the two models, RF and LR: logistic regression (N=2312) and random forest model (N=3723)
Figure 3.7. Comparison of ROCs between the logistic regression and random forest models when using the same validation cohort in both models (N=2312)
Figure 3.8. ROC from the random forest model applied to the imputed validation data (average of 20 predictions for each individual was generated from 20 imputed datasets)
Figure 3.9. Loess-based calibration plot for RF model - mortality outcome - validation cohort (N=3723)
Figure 3.10. Decile-based calibration plot for RF model - mortality outcome - validation cohort (N=3723)
Figure 3.11. The average squared error of the RF model by the number of trees - Hospice outcome
Figure 3.12. Correlation of the predicted probability of hospice admission between the two models, logistic regression and random forest (N=2590)
Figure 3.13. Comparison of ROCs between the two models - Hospice outcome, logistic regression (N=2590) and random forest model (N=3723)
Figure 3.14. Comparison of ROCs between the two models, logistic regression and random forest, when using the same validation cohort in both models (N=2590)
Figure 3.15. ROC from the random forest model applied to the imputed validation data - Hospice outcome (N=3723)
Figure 3.16. Loess-based calibration plot for the RF model - Hospice outcome
Figure 3.17. Decile-based calibration plot for the RF model - Hospice outcome
Figure 3A.1. Correlation between predicted probability in LR and random forest
Figure 3A.2. Correlation between predicted probability in LR and random forest - Hospice admission
Figure 4.1. Flow diagram of the study population
Figure 4.2. KM survival plot for the whole data (N=7441)
Figure 4.3. Hazard rate estimates for the whole data (N=7441)
Figure 4.4. ROC for the mortality outcome at time=365 days and AUC(365) from Cox MV model - validation data (N=2312)
Figure 4.5. Time-dependent AUC (stepwise selection, validation data) (N=2312)
Figure 4.6. KM survival curve stratified by age - derivation data
Figure 4.7. KM survival curve stratified by sex - derivation data
Figure 4.8. KM survival curve stratified by race - derivation data
Figure 4.9. KM survival curve stratified by albumin - derivation data
Figure 4.10. KM survival curve stratified by cholesterol - derivation data
Figure 4.11. KM survival curve stratified by SQ - derivation data
Figure 4.12. KM survival curve stratified by KPS - derivation data
Figure 4.13. KM survival curve stratified by ADL decline - derivation data
Figure 4.14. KM survival curve stratified by hyperlipidemia - derivation data
Figure 4.15. KM plot for time-to-hospice admission in the whole cohort (N=7441)
Figure 4.16. Hazard rate for hospice admission from the first USMM visit - whole cohort (N=7441)
Figure 4.17. KM plot for time-to-death from the first USMM visit stratified by hospice admission status (N=7441)
Figure 4.18. Estimated hazard rates for time-to-death stratified by hospice admission status (N=7441)
Figure 4.19. KM survival among hospice admitted patients (N=1389)
Figure 4.20. Estimated hazard rate for mortality among hospice admitted patients (N=1389)
Figure 4.21. ROC at day 365 from the Cox MV model for the hospice outcome - validation data (N=2498)
Figure 4.22. Integrated AUC from the Cox MV model for hospice outcome - validation data (N=2498)
Figure 4.23. KM survival curve stratified by age - derivation data
Figure 4.24. KM survival curve stratified by race - derivation data
Figure 4.25. KM survival curve stratified by SQ - derivation data
Figure 4.26. KM survival curve stratified by living alone - derivation data
Figure 4.27. KM survival curve stratified by albumin - derivation data
Figure 4.28. KM survival curve stratified by KPS - derivation data
Figure 4.29. KM survival curve stratified by hip fracture - derivation data
Figure 4.30. KM survival curve stratified by hyperlipidemia - derivation data

CHAPTER 1. Introduction

This dissertation aims to develop risk stratification (RS) models using a cohort of the US Medical Management (USMM) patient population, which is a unique population of community-living homebound older adults. The USMM organization approached my advisor and me in search of a collaboration with academic partners to develop RS models that could optimally improve their pre-existing RS approaches. The RS models were needed to identify patients at high risk of death and those at high risk of hospice admission in the near future and to provide them with appropriate customized palliative care. The ultimate goal was to improve the quality and timing of the healthcare they provide to patients at different risk levels, e.g., different palliative care services, including hospice referral.
The collaboration started in 2014, and a cohort of USMM patients cared for in the calendar year 2015 was assembled for this study. The need to develop RS models specific to the USMM population is based on the unique characteristics of the USMM population and the intention of the USMM organization to implement the most accurate RS in its population. This chapter is organized around the following sections: a summary of older adult care options (as currently implemented in the US), a description of the USMM Corporation, its patient population, and the alternative RS approaches proposed by the USMM providers. The role of RS in population-based disease management is also explained, and a summary of relevant RS literature is provided. Finally, the three specific aims and the analysis plan for each are described.

Current care services available for older adults in the US

As the population of older adults grows, different types of older adult care services have been developed. There are several ways to categorize these services, for example by the location of residence (home or institution), by the services offered (skilled vs. custodial), or by the purpose of the health care (long-term care facilities, palliative care, or hospice). A continuum of care for older adults can be described as home services (such as ADL and IADL assistance), home health services, adult day care, assisted living and retirement communities, skilled nursing facilities and long-term care units, palliative care (at home or institutional), and hospice care. Many of these services are offered as specialized care for a specific disease or condition; for example, assisted living communities specialized for dementia or Alzheimer's disease are called memory care. As the preferences of older adults and their families shift from nursing home admission to living in the community, a range of community-based care programs have been developed. (1,2) The Program of All-Inclusive Care for the Elderly (PACE) and home- and community-based services for older adults (HCBS) are two examples of programs implemented nationwide to provide care for older adults who are living at home. (1,3) The goal of these programs is to help participants live in the community as long as it is medically, socially, and financially feasible. (4) Home health care consists of skilled medical services offered to older adults in their home and can include physician visits, nursing or nursing aide visits, medications, physical therapy, and other services. Patients who are confined to home temporarily (e.g., for a medical reason such as surgery) or permanently (e.g., disability, old age) often use home health services. Palliative care, in contrast to medical services that aim to cure and treat a condition, aims to relieve the patient's pain and suffering. Palliative services are offered to patients who are dying and are therefore in the last few months of their life. Hospice care is a kind of palliative care; however, palliative care is not limited to hospice, i.e., it can also be offered at home according to patient and caregiver preferences.

Description of the USMM Corporation

United States Medical Management, LLC (USMM) is a management services organization that provides home-based medical services to homebound patients through its Visiting Physicians Association (VPA).
USMM provides medical care to patients across 11 US states: Michigan, Ohio, Texas, Florida, Kansas, Virginia, Illinois, Kentucky, Missouri, Washington, and Wisconsin. There are more than 100 USMM offices in these 11 states; the headquarters of the company is located in Troy, Michigan. In December 2011, VPA, in conjunction with the Detroit Medical Center (DMC), was selected as a Pioneer Accountable Care Organization (ACO). This Pioneer ACO was one of only 32 selected from over 4,500 applications in 2012. The USMM Corporation specializes in home-based health care for homebound older adults and other, mostly disabled, patients unable to access health care through traditional means. Homebound adults are defined as patients who are confined to home according to the criteria in Table 1.1; a patient must meet both the first and the second criterion to be considered homebound. (5)

Table 1.1. Definition of homebound patient determined by the Centers for Medicare and Medicaid Services

First Criterion (one of the following must be met):
1. Because of illness or injury, the individual needs the aid of supportive devices such as crutches, canes, wheelchairs, and walkers; the use of special transportation; or the assistance of another person to leave their place of residence.
2. The individual has a condition such that leaving his or her home is medically contraindicated.

Second Criterion (both of the following must be met):
1. There must exist a normal inability to leave home.
2. Leaving home must require a considerable and taxing effort.

For example, a patient who is blind or senile and needs the assistance of another person to leave home, or a patient who recently had surgery and whose activities are restricted by their physician to specified and limited activities, is considered homebound. Also, a patient with a psychiatric illness of such a nature that it would not be considered safe for the patient to leave home unattended is another example of a homebound patient.

USMM provides comprehensive clinical management, administrative and support services, and has specific expertise in physician house call medicine. The USMM providers include physicians, nurse practitioners, and other allied health professionals who assist in the provision of home-based primary care. These providers include clinical educators as well as personnel from certified home health agencies, hospice services, and durable medical equipment companies. USMM also owns several health properties and organizations, including hospices and home health agencies. USMM maintains a large and rich clinical dataset on its population that is drawn from the electronic medical record (EMR) system named APRIMA. This EMR data includes information on demographics, socioeconomic factors (i.e., living alone, smoking, and insurance), functional status, comorbidities, laboratory tests, and utilization. The EMR data is collected by USMM clinical staff, including physicians, nurse practitioners, and clinical educators. Another database, named Status Scope, also supplements the APRIMA data. It contains supplemental medical information collected by allied health professionals or their assistants during regular home visits. For example, the 'surprise question' and 'living alone' are variables collected in the Status Scope data. These databases, therefore, contain extensive clinical details from the home visits that USMM patients receive.
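The two-part CMS test in Table 1.1 reduces to a simple boolean rule: at least one item of the first criterion and both items of the second. The short Python sketch below is offered only as an illustration of that logic; the argument names are hypothetical flags, not variables drawn from the USMM dataset.

```python
def is_homebound(
    needs_device_transport_or_assistance: bool,    # Criterion 1, item 1
    leaving_home_medically_contraindicated: bool,  # Criterion 1, item 2
    normal_inability_to_leave_home: bool,          # Criterion 2, item 1
    leaving_home_requires_taxing_effort: bool,     # Criterion 2, item 2
) -> bool:
    """Illustrative encoding of the CMS homebound definition in Table 1.1."""
    # Criterion 1: at least one of its two items must hold.
    criterion_one = (
        needs_device_transport_or_assistance
        or leaving_home_medically_contraindicated
    )
    # Criterion 2: both of its items must hold.
    criterion_two = (
        normal_inability_to_leave_home
        and leaving_home_requires_taxing_effort
    )
    # A patient must satisfy both criteria to be considered homebound.
    return criterion_one and criterion_two


# Example: a post-surgical patient who needs help to leave home and for whom
# leaving home is a considerable effort would be classified as homebound.
print(is_homebound(True, False, True, True))  # True
```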
In addition to the USMM APRIMA (EMR) and Status Scope (supplemental clinical) data, claims data were also available through a third-party corporation, E-solution, (6) which provides processed claims data from the Centers for Medicare and Medicaid Services (CMS). The processed claims data contained limited information on only five types of events: death (date of death), hospice utilization (first and last dates of hospice services specified in 12-week intervals), home health (HH) utilization (first and last dates of HH services specified in 8-week intervals), the most recent hospitalization (admission and discharge dates), and the hospitalization prior to the most recent one. Dates of death and hospice utilization were used as the outcomes of interest in this study.

USMM patient population

The USMM patient population is unique in terms of its demographics and functional characteristics. In 2015, more than 50,000 patients in the 11 states received services from USMM. This population is older (mean age of 71 and median of 73 years; 86% are 65 years and older) and has a more complex comorbidity profile compared to typical Medicare populations. The prevalence of common comorbidities such as hypertension (81%), hyperlipidemia (50%), chronic kidney disease (40%), and diabetes (34%) in this cohort is much higher than in the US population aged 65 and older (Table 2.3). (7) The functional status of this population also differs from the typical geriatric population; functional status variables such as the Karnofsky Performance Scale (KPS) and Timed Up and Go (TUG) indicate that the USMM population has more severely impaired function and a greater need for assistance and special care. The higher prevalence of comorbidities and impaired functional status is explained by the fact that USMM patients are homebound by definition. The CMS criteria (Table 1.1) for a patient to be eligible for home health services include, but are not limited to, being confined to the home, needing skilled services, and being under the care of a physician. (1) The USMM population, because of its old age, multiple comorbidities, and significant functional impairments, has high levels of vulnerability. The unique and high-risk spectrum of the USMM patient population implies the need to develop a tailored risk stratification tool to effectively and efficiently manage their care and maximize their health outcomes (such as mortality, hospice admission, patient/caregiver satisfaction, and symptom management). In this thesis, the two outcomes of interest are 1-year mortality and 1-year hospice admission. Further details of the USMM population characteristics and the variables used for model development are presented in Chapter 2.

Risk stratification approaches proposed by USMM providers

USMM providers proposed two approaches for risk stratification: the surprise question (SQ) and a 3-level risk stratification approach. The surprise question is a simple question answered by the provider: "Would you be surprised if this patient died in the next 6-12 months?" (8) The answer to the SQ is used to find high-risk patients (i.e., those for whom the answer is no). The second proposed approach is a decision tree that categorizes patients into three risk levels (high, intermediate, or low) based on five variables: SQ, albumin, a recent fall, hospitalization, or ER visit since the last USMM visit. These two RS approaches are simple and easy to use, but their performance in predicting adverse outcomes has not been assessed. The USMM intention in conducting this study was to develop a more refined statistical approach to improve its RS process.

Importance of Risk Stratification

Risk stratification is a process of using observable and measurable characteristics to predict the risk of an event. It can help to classify a cohort into different levels of risk and then provide each group with appropriate care. For example, a patient at high risk of death should prompt early referrals for palliative care or hospice, whereas an older patient at low risk of death should be considered for services, such as home health or other community programs, designed to maintain and improve their functional abilities and their physical and mental health status. Similar to many developed countries, the US population is aging faster than at any other time in history. (1,2) People are living longer and experiencing more comorbidities. The number of patients with multiple conditions has increased significantly in the past few decades. (3,4) Chronic diseases often require long-term health care and result in frequent utilization of services. The combination of an aging population and a higher prevalence of multi-morbidity in older adults imposes a considerable burden of increased health care expenditure on governments, especially in developed countries. (5-8) The increasing need for health care services and limited resources has created an essential need to identify patients who have the greatest need for different types of services and to allocate services to those who will benefit the most. Risk stratification methods are commonly used for this purpose. Many studies have developed and evaluated risk stratification approaches in different cohorts of older patients, for example patients with atrial fibrillation, syncope, older adults discharged from the emergency department, and patients with acute coronary syndrome. These studies often illustrated that the performance of the developed risk stratification model was superior to the prior approaches. (16-21) As the population ages, RS (i.e., prognostic) models are becoming increasingly important in clinical decision making. (22) Clinicians, researchers, and policymakers use prognostic models to make decisions about preventive services or treatment strategies. Mortality-based prognostic models are used to influence decisions about screening procedures in the population. An example of these decision-making tools is the ePrognosis calculator by UCSF, which serves as a repository of published geriatric prognostic indices where clinicians can obtain estimates of their patients' prognosis. For example, one can get an evidence-based suggestion for cancer screening based on a set of questions covering the patient's demographics, comorbidities, functional status, and mental health. (23) Also, the prediction of mortality affects the type and intensity of treatments to be offered to older patients. For example, the use of screening tests such as colonoscopy and mammography, or the intensity of therapy for diabetes mellitus in older adults, can be completely different depending on the prognosis. (24-29) Likewise, intensive treatment of diabetes for the prevention of long-term complications may not have any benefit, or may even cause harm, in a patient with <12 months life expectancy.
Patients with limited life expectancy may benefit more from palliative care to ease their symptoms and to improve their end of life experience when time to benefit from the screening test or intensive treatment exceeds their life expectancy. The overall goal of risk stratification for clinical populations like those served by USMM is to accurately predict adverse outcomes such as mortality and medical service utiliz ation which then allows delivering more appropriate levels of clinical services to patients with different risk levels and to align services with the patients™ needs and priorities. These services can include a change in medications, nutritional support, a dditional home visit, offering palliative care and advanced care planning, or hospice referral. Different prognostic tools have been developed to identify high risk patients for palliative care, for example Palliative Performance Scale (30,31) and Palliative Prognostic Score (32) . The palliative care tools were summarized in a publication by the National Hospice and Palliative Care Organization. (33) Hospice eligibility criteria was developed by CMS. Additionally there are different disease -specific 8 hospice admission criteria and guidelines for cancer, cardiac d isease, pulmonary disease, dementia, etc. (34,35) Another example of the risk stratification tool for clinical population is the PRIME REGISTRY which is a platform for clinical d ata registry with tools for risk stratification and care planning for family medicine physicians in addition to evaluation of practice performance. (36,37) The purpose of this dissertation is to develop a risk prediction model specific to the USM M population, which is characterized by large numbers of patients with advanced age, multi -morbidity, and functional impairment. The ultimate goal of this risk stratification approach is to improve the quality and efficiency of home -based medical services, which also can include the appropriate and timely referral to other care settings, including nursing home and hospice. Overview of population -based disease management Health care organizations are working to change their cost structure and to improve thei r outcomes. The fact that 20% of the patients with chronic conditions are responsible for about 80% of health care expenditure has brought to attention the need for improvements in the disease management of such populations. Disease management is a system of coordinated healthcare interventions for populations with specific conditions. It emphasizes prevention of exacerbations and complications of the condition, evaluation of the clinical and economic outcomes, and developing a case management plan. (38) Population -based disease management is becoming today's optimal practice pattern and is replacing the former approach of episode -based disease management. It means that instead of managing patients who are seeking treatm ent at a given time, all people in a target population (e.g., insurance enrollee with a particular disease) are considered targets for case management interventions designed to prevent the complications of the disease and unnecessary medical utilization. B y identifying high -risk, high -cost patients, this approach can result in more timely delivery of appropriate interventions in a cost -effective manner that has the potential to save money. (33) The fundamental step in population diseases management is risk stratification in order to identify the high -risk, high -cost pati ents. 
(40) For 9 example, Lavery et al., evaluated a risk stratification approach for population based disease management of diabetes mellitus . (41) Also , Haas et al. in a study used several risk stratification instruments in predicting health care utilization among all adult patients in a primary care practice. (42) Although population health management for older adults who often have multiple chronic conditions is more complex than the disease -specific population health management. Tkatch et al. reviewed the literature for the population health management for older adults and found that interventions to promote health among target populations tend to be disease specific rather than based on a global concept of older adults™ health. (43) Current guideline for care management of geriatric population with multi -morbidity The Ameri can Geriatrics Society (AGS) Expert Panel has developed guidelines for the care of older adults with multi -morbidity. (44) Boyd et al. then developed a framework of actions to translate the AGS guideline to actions steps for decision making. (45) They provided three decisional actions and then action steps for each one. Table 1.2 contains the action steps. The first action requires the estimation of life expectancy and patients™ health trajectory. The RS mode l in this study is going to serve as an instrument for estimation of the patient™s life expectancy. Risk stratification is one of the many action steps needed for improvement in the health care for older adults with multi -morbidity. The risk stratification should be used and aligned in accordance with other action steps such as identifying patient™s priorities and communicating the information between clinicians, caregivers and patient. 10 Table 1. 2. Conceptual framework for the care of older adults with multiple chronic conditions MCC ACTION: IDENTIFY AND COMMUNICATE PATIENTS™ HEALTH PRIORITIES AND HEALTH TRAJECTORY o Use a validated approach to i dentifying patients™ health priorities o Transmit patients™ health priorities o Estimate life expectancy, trajectory, and lag time (time horizon) to benefit o Determine patients™ readiness to discuss their tra jectory or prognosis o Assess patients™ perceptions of their prognosis and trajectory MCC ACTION: STOP, START, OR CONTINUE CARE BASED ON HEALTH PRIORITIES, POTENTIAL BENEFIT VS HARM AND BURDEN, AND HEALTH TRAJECTORY Acknowledge uncertainty and variable health priorities in decision making and communication Stop or do not start medications for which harm or burden may outweigh bene fit o Stop medications deemed inappropriate in older adults o Avoid medication cascades o Perform serial trials if treatments may be contributing to bothersome symptoms o Discontinue treatments no longer indicated or needed o Review and adjust self -management tasks Consider whether the patient has advanced illness or limited life expectancy that affects bene fits and harms of treatments o Consider health trajectory and time to bene fit for preventive interventions o Explain cessation of screening and prevention as a shift in priorities and use positive messaging MCC ACTION: ALIGN DECISIONS AND CARE AMONG PATIENTS, CAREGIVERS, AND OTHER CLINI CIANS WITH PATIENTS ™ HEALTH PRIORITIES AND HEALTH TRAJECTORY Af firm shared understanding of patients ™ health priorities and the information that informs decision making o Agree on the factors and information that will inform decision making and care o Encourage patients and family/caregivers to participate in decision making Align decisions 
when patient and clinician have different perspectives o Link decision to something meaningful to the patient o Ensure that patients ™ health outcome goals are consiste nt with their healthcare preferences o Identify and change bothersome aspects of treatment o Accept patients ™ decisions Align decisions when clinicians have different perspectives or recommendations o Focus discussion on patients ™ health priorities, not only o n diseases o Acknowledge absence of one firight answer fl for patients with MCCs o Use collaborative negotiation to arrive at shared recommendations MCC, multiple chronic condition. Table adapted from the ‚Decision Making for Older Adults with Multiple Chronic Conditions: Executive Summary for the American Geriatrics Society Guiding Principles on the Care of Older Adults With Multi -morbidity™ (45) . This conceptual framework is to illustrate how RS is used to inform clinical care and in turn disease management at the population level. 11 Overview of literature relevant to the RS in community living older adults To find the previous studies related to the subject of this thesis, we searched Pubmed and google scholar for risk stratific ation in older adults for mortality and also for hospice. The results were reviewed for the study population and the outcomes; the relevant studies were also reviewed for forward and backward reference searching. The results of literature review are summar ized in this section. Previous studies have developed prognostic models in different geriatric populations, such as hospitalized older adults (46,47) , or nursing home patients. (13) Other studies have investigated the effect of specific comorbidities or functional status on mortality in elders. (49 Œ51) These studies are differen t from our homebound study population since their populations are not community -living, or are limited to those with a specific condition, or evaluate only a specific predictor in association with the outcomes. There are fewer studies that developed a prog nostic index for the community -living population regardless of a specific disease or chronic condition. Table 1.3 summarizes nine relevant studies that utilized prediction models to develop a prognostic index for mortality outcome in the community -living o lder adults. These study populations are mostly similar to our data population; although differ in one critical feature; none of them were described as homebound, while USMM patients are all confined to home. Yourman et al., in a systematic review for prog nostic models in older adults. Their study was the main paper that contributed to the Table 1.3 on the literature review. Yourman et al. reported on 16 studies, of which only six were performed in community -living older adults. (13) The other ten prognostic indices were developed in institutionalized population s, often nursing home residents. These six models from Yourman™s review, along with other applicable studies, are summarized in Table 1.3. These models are discussed in more detail in the next three chapters when the results are compared to my findings. Overall these previous models have been built in community -living older adults with different levels of multi -morbidity and functional impairment. The investigators used different databases for their study, 12 including Medicare administrative data (52) , population health surveys (53 Œ55) , retrospective chart review from VA hospital patients (56) , or epidemiologic cohorts (49,57,58) . 
Therefore they might include the oldest -old adults (49) or a much younger cohort of elderly (50 years or older). (16) They can be nursing home eligible population that are living at home, like the study population in Carey et al. research. (57) The mortality rates among these study populations ranged from 7.5% a year in Gagne study conducted in a cohort of Medicare beneficiaries who enrolled for th e drug coverage programs, to 26% a year in Fischer study conducted through retrospective chart review for all patients who were admitted and discharged from the Denver Veteran™s Administration Medical Center (DVAMC). (52,56) Han et al. reported a 6 -month mortality rate of 15% (a grossly estimated one -year mortalit y of 30%) in their study population which is higher than the 2% mortality in the total Medicare Health Outcome Survey population (MHOS). (53) The reason is that they only included MHOS participants with declining health (i.e. patients who reported their health fimuch worsefl compared to their last year health). The investigators constructed a prognostic index using regression coefficients of the multivariable models (Table 1.3). Exc ept for the two studies by Carey and Fried (57,58) , which modeled their data using Cox proportional hazard model, other studies used logistic regression models to develop the prognostic models. The study conducted by Carey in 2008 used the members of the Program of All -inclu sive Care for the Elderly (PACE) which is probably the most similar study population to the USMM patient population. They were older frail adults, eligible for nursing home, but still living in their homes. PACE is a Medicare program for adults aged 55 and older who are living with disabilities and need a nursing home level of care but can safely continue to live in the community. PACE services can include home care if needed but there is not necessarily a home -based physician visit and health services. The USMM population was also vulnerable and includes frail older adults with underlying conditions that made them homebound. However, these two population had a critical difference, which is their mortality 13 rates. The USMM population had a much higher mortali ty rate (one -year 32%) than Carey's study population (one -year 13%). Indeed, the mortality rate in the Carey™s study is lower than the expected mortality rate in a population of older adults who are eligible for nursing home. In the literature, the one -yea r mortality rate of nursing home residents has been reported between 17.4% and 35.0%. (59 Œ62) The lower rate of mortality in th e PACE patient population studied in Carey™s paper may be explained by the definition of the PACE eligibility criteria. Adults with age 55 and older who are eligible for nursing home care are participants of PACE programs; therefore patients at relatively younger age (i.e. 55 to 65) who need long -term care (e.g. due to a disability) may have a longer life expectancy which contributes to the lower overall mortality rate in the study cohort. In summary, comparing the studies in Table 1.3, the most important observation is that although all of the studies are among the community -living older adults, but the study populations are extensively heterogeneous. The heterogeneity can be best seen in the mortality rate of different studies. Consequently, these studies are not really comparable to the USMM patient population. 14 Table 1. 3. 
Summary of previous studies that developed a prognostic index for use in community -living older adult populations Study First author Date of publication Study population Country and Time interval Outcome and Predictors Development of a Prognostic Mode l for Six -Month Mortality in Older Adults With Declining Health (PROMPT) Paul K.J. Han (53) 2012 -N=21,870 -Medicare beneficiaries from the Medicare Health Outcome Survey (MHOS), an annual nationwide survey by CMS -Medicare beneficiaries randomly sampled each year, aged over 65, with self -reported declining health in the past year -Institutionalized and disable beneficiaries are included -6-month mortality of 15% -US -MHOS surveys from 1998 -2000, 1999 -2001, 2000 -2002, 2001 -2003 Outcome: 6 -month mortality Predictors (11): age, gender, cancer, CHF, COPD, Smoking status, proxy status, ADLs, General health perceptions, social functioning, energy/fatigue A combined c omorbidity score predicted mortality in elderly patients better than existing scores Joshua J. Gagne (52) 2011 -N=120679 d erivatio n -Medicare beneficiaries who had complete drug coverage through the Pharmacy Assistance Contract for Elderly (PACE) that provides medications at minimal expense to low -income elderly -1-year mortality of 8.9% -N= 123855 validation -Medicare enrollees who had complete drug coverage through the Pharmacy Assistance for the Aged and Disabled (PAAD) -1-year mortality of 7.5% -US -Jan 2004 - Dec 2005 Outcome: 1 -year mortality Predictors (20): metastatic cancer, CHF, dementia, renal failure, weight loss, hemiple gia, alcohol abuse, any tumor, cardiac arrhythmias, chronic pulmonary diseases, coagulopathy, complicated diabetes, anemias, fluid and electrolyte disorders, liver disease, peripheral vascular disorder, psychosis, pulmonary circulation disorders, HIV/AIDS, hypertension Index to Predict 5 -Year Mortality of Community - Dwelling Adults Aged 65 and Older Using Data from the National Health Interview Survey Schonberg (55) 2009 -N=24115 -Non -institutionalized adults aged >65 who responded to the 1997 -2000 National Health Interview Survey (NHIS) with follow up from the National Death Index (NDI) -5-year mortality of 17% (estimated 1 -year mortality of 3.4%) -US -1997 -2002 Outcome: 5 -year mortality Predictors (11): age, gender, BMI, perceived health, emphysema, cancer, diabetes, dependency in IADLs, difficulty walking, smoking, past year hospitalization 15 Table 1. 3 . (cont™d) Prediction of Mortality in Community -Living Frail Elderly People with Long -Term Care Needs Elise C. 
Carey (57) 2008 -N=3,899 -Community -based, frail, chronically ill older adults who are eligible for nursing home -A cohort of community -living participants enrolled in the Program of All -Inclusive Care for the elderly (PACE), (63) the program operates under Medicare and Medicaid waiver to deliver services to the elderly who are certified by the state™s Medicaid staff as eligible for nursing home -1-year mortality of 13% and 3 -year mortality of 36% -US -1988 - 1996 (participants enrolled in PACE) Outcome: Time -to-death from the time of initial enrollment in PACE (3 -year follow up) predictors (8): age, gender, dependence in the 2 ADL (toileting and dressing), CHF, COPD, Cancer, Renal failure Screening of Older Community -Dwelling People at Risk for Death and Hospitalization: The Assis tenza Socio -Sanitaria in Italia Project Giampiero Mazzaglia (64) 2007 -N=5396 -Community -dwelling, aged 65, randomly sampled from the roster of 98 Primary Care Physicians -15-month mortality of 4.7% in de rivation and 3.9% in validation cohorts -Italy, Florence -Jan 2003 - Mar 2004 Outcome: 15 -months mortality Predictors (5): age, gender, hospitalization in the past 6 months, use of 5 medications, score from a 7-item questionnaire (need help for ADLs, need help for IADLs, poor vision, poor hearing, self -perceived inadequacy of income, absence of home care services, weight loss>3 kg) A Practical Tool to Identify Patients Who May Benefit from a Palli ative Approach: The CARING Criteria Stacy M. Fischer (56) 2006 -N=895 -All patients admitted to general medical wards or medical ICU of the Denver Veterans' Administration Medical Center (DVAMC) -1-year mortality of 26% (from t he index hospitalization) -US (Colorado, Denver) -Feb - Jun 1999 (retrospective chart review) Outcome: one -year mortality residence in a nursing home, intensive care unit admit with multi - non -cancer hospice guidelines 16 Table 1. 3. (cont™d) Development and validation of a prognostic index for 4 -year mortality in older adults Sei J. Lee (54) 2006 -N=19710 -community -dwelling adults aged >50 , participants of the 1998 wave of the Health and Retirement Survey(HRS); data primarily collected through a telephone interview w ith a participation rate of 81% -4-year mortality of 12% in d erivation and 13% in validation cohorts -US -1998 -2002 Outcome: 4-year mortality Predictors (12): age, gender, diabetes, cancer, lung disease, heart failure, smoking, body mass index, difficulty with - bathing, walking several blocks, managing money, pushing large objects Development and Validation of a Functional Mor bidity Index to Predict Mortality in Community -dwelling Elders Elise C. Carey (49) 2004 -N=7393 -Community -dwelling, age 70, Participants of the Asset and Health Dynamics Among the Ol dest Old (AHEAD) study, a prospective national study that sampled community -dwelling U.S. elders age 70 -2-year mortality of 10% in derivation and 12% in validation cohorts -US -AHEAD Participants who were interviewed in 1993 Outcome: 2 -year mortality Predictors (6): age, gender, dependence in bathing, dependence in shopping, difficulty walking several blocks, difficulty pulling or pushing heavy objects Risk factors for 5 -year mortality in older adults Linda P. 
Fried (58) 1998 -N=5886 derivation -Participants of the Cardiovascular Health Study (CHS) aged ≥65, a prospective cohort randomly sampled from age-stratified Health Care Financing Administration (HCFA) Medicare enrollment lists -5-year mortality of 12% -US (4 counties: Sacramento, CA; Washington, MD; Forsyth, NC; Allegheny, PA) -Derivation 1989-1990, validation 1992-1993 Outcome: 5-year mortality Predictors: age, gender, income, weight, exercise, smoking, systolic blood pressure, diuretic use, fasting blood sugar, albumin, creatinine, forced vital capacity, aortic stenosis, EF, ECG abnormality, carotid artery stenosis, CHF, difficulty in IADLs, low cognitive function

Statistical analysis of prediction models

Prediction models address problems of either estimation or hypothesis testing. For example, the question "What is the risk of a patient dying in the next 30 days?" is an estimation problem that needs a prediction model to estimate the probability of death, while the questions "Is gender a predictor of a certain complication after surgery?" or "What are the important predictors of hospitalization in older adults?" are problems of hypothesis testing. Statistical models can address both types of questions. There are three main classes of statistical models used in prediction: regression, classification, and neural networks. Regression models are the most commonly used models for prediction. (24) Different regression models are used in the literature; the two most commonly used are logistic regression and time-to-event (Cox regression) analysis. Logistic regression is the most commonly used model for prognostic models; it is used when the outcome is binary (yes/no), such as death, hospital admission, ER visit, or occurrence of a complication. Most of the studies in Table 1.3 utilized logistic regression analysis to build the prognostic model. Fewer studies used Cox proportional hazards analysis to model time-to-event as the outcome. The two studies by Carey and Fried are examples of Cox model utilization in the development of a prediction model. (57,58)

In the past few decades, because of the increasing size and complexity of biological data, the limits of traditional modeling approaches have begun to be reached, and there is a need for innovative statistical analysis of the ever-growing data. (39) Advanced methods such as machine learning algorithms, which can detect patterns and make predictions in big data with complex relationships, are becoming an increasingly important tool for the development of prediction models. (40-42) Random forest is one of the machine learning algorithms that has occasionally been used in biomedical research. (65,68,69) Machine learning algorithms have been shown to outperform traditional statistical models in prediction, (70-77) although some studies showed no difference in performance between traditional models and machine learning methods. (78)

In this dissertation, prediction models are developed for two primary outcomes (death and hospice admission) using both traditional statistical models and a machine learning algorithm. First, a logistic model is developed for each of the two binary outcomes. Second, a random forest algorithm is used for the same outcomes to obtain comparable models. Third, a Cox PH model is developed to account for the time-to-event for both outcomes.
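To make the parallel setup of the binary-outcome and time-to-event approaches concrete, the following is a minimal SAS sketch of how a logistic model and a Cox model could be specified for the mortality outcome. The dataset name (derivation) and the variable names (death1yr, futime, age, kps, albumin) are illustrative placeholders rather than the actual USMM variables, and the predictor list is abbreviated.

/* Logistic regression: binary outcome within the fixed 1-year window.  */
/* Dataset and variable names are placeholders for illustration only.   */
proc logistic data=derivation;
   class kps / param=ref;
   model death1yr(event='1') = age kps albumin;
run;

/* Cox proportional hazards: time to death in days, censored at 1 year. */
proc phreg data=derivation;
   class kps / param=ref;
   model futime*death1yr(0) = age kps albumin;
run;

The same skeleton applies to the hospice admission outcome by swapping in the corresponding indicator and follow-up time.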
The results of the developed models are compared to the RS approaches proposed by USMM. The performance of the three models is also compared in order to find the best model in terms of predictive performance. The ultimate goal of this study is to find the best model that can be integrated into the USMM database.

The RS process is a necessary step in older adult care according to the framework of actions for the care of older adults with multi-morbidity (Table 1.2). The optimal process for using the RS model in the USMM patient population would be that, for each patient, a predicted probability is calculated from the base model, and a risk level is then assigned to the patient based on their probability of death (or hospice admission). The high-risk patients would be flagged and brought to the attention of the provider team for appropriate and timely intervention. This intervention can include a range of services such as a change in medications, nutritional support, an additional home visit, hospice referral, or offering palliative care and advanced care planning. Lower risk patients can be targeted for other levels of services according to USMM policies and care plans, such as screening for prevention of morbidities, rehabilitation and other programs to preserve and enhance functional ability, and lifestyle modifications to improve physical and mental health.

To assign risk levels based on the predicted probabilities, the thresholds for the different levels of risk must be decided. An arbitrary cut point of the highest 20% of predictions is used in Chapter two to calculate the performance of the model. This cut point must be determined based on the number of patients in the system and the resources that USMM can allocate for services at the different levels of risk. A more liberal threshold for identification of the high-risk patients (for example, the top 30% of the predicted probabilities) results in a larger number of high-risk patients who need to be evaluated for an intervention. Consequently, more human and financial resources are required to take care of these additional cases. On the other hand, a more stringent threshold, while reducing the need for resources, may result in more false-negative cases (i.e., those who are truly at high risk of death or hospice admission are classified as low risk), which in turn means missing adverse events in truly high-risk patients. To summarize, there is a tradeoff in determining the thresholds for the different risk levels. The cut points should be determined by the USMM providers based on their objectives for risk stratification and the available resources. More formal approaches for determining the optimal RS thresholds for an organization like USMM might involve cost-effectiveness analysis. It is critical to remember that final decision making involves the patient's and caregiver's priorities and preferences. Therefore the clinician's goals of care must be aligned with the patient's and caregiver's goals. (44)

o Overall Analysis plan

The USMM clinical database (APRIMA) and claims data will be used to construct a cohort of USMM patients who were first registered with USMM in the calendar year 2015 and had at least one visit in that year. After data preparation and necessary recoding of variables, the available potential predictor variables (including demographics, functional status, comorbidities, laboratory tests, and socioeconomic factors) that have less than 20% missing will be considered as predictors in the analyses.
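As a minimal illustration of the missingness screen described above, the following SAS sketch tabulates the number of missing values for a handful of numeric candidate predictors; the dataset name (usmm2015) and the variable list are placeholders, and categorical candidates (for example, the TUG category) could be screened analogously with PROC FREQ and its MISSING option.

/* Sketch of the <20% missingness screen for candidate predictors.    */
/* Variables whose NMiss/N ratio is 0.20 or more are dropped from the */
/* candidate predictor list before model building.                    */
proc means data=usmm2015 n nmiss;
   var age albumin cholesterol kps n_medications;
run;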
There will be two outcomes of interest, death, and hospice admission that will be i dentified based on the presence of a date of death or date of hospice in the claims data. For validation of the models, the dataset will be randomly divided into two subgroups named derivation and validation. Then three different statistical approaches wil l be applied to the derivation data to develop predictive models and to generate performance metrics used to compare the different models. Each model will be then validated using the 20 validation data set. The discrimination of the models which described the ability of the models to accurately distinguish those with and without outcome will be measured by the area under the ROC curve (AUC) will be used as the primary measure of prediction accuracy and model performance. Calibration plots as a measure of goodn ess of fit will be generated when is applicable. All the analysis will be done for both outcomes separately. Details of the model development and results are provided in chapters 2 -4. Objectives The three objectives of this dissertation are: 1. To develop a nd validate multivariable logistic models for prediction of 12 -month mortality and 12-month hospice admission among the USMM population of community -living homebound older adults. The models will be compared to the alternative risk stratification approache s used by USMM, including the surprise question (in isolation) and the existing USMM 3 -level risk stratification method. 2. To develop and validate a random forest (RF) algorithm for prediction of 12 -month mortality and hospice admission. The model performanc e will be evaluated compared to the logistic regression (LR) model from aims 1. 3. To develop and validate a multivariable failure time model (Cox proportional hazard) to model time -to-event for mortality and hospice admission separately. These models will also be compared to the logistic regression and random forest models developed in aims 1 and 2. 21 CHAPTER 2. Logistic Regression Model Introduction The US population is aging faster than any other time in history. (1,2) Causes of mortality have shifted from communicable infectious diseases to chronic conditions and their complications. Diseases that used to be lethal now can be treated or managed for years. People are living longer; therefore the prevalence of chronic diseases, cancers, and persons with multiple comorbidities has significantly increased in the population. (11,12) Chronic diseases often require long -term health care, and frequent utilization of services; consequently, health care costs are growing fast as the population is aging. The combination of the aging populati on and higher prevalence of multi -comorbidity in older adults imposes a considerable burden of increased health care expenditure on governments, especially in developed countries. (7,13 Œ15) About one -fifth of Medicare beneficiaries have five or more chronic conditions, and two-thirds of Medicare expenditures are related to this group. (79) The increasing need for health care services and limited resources has brought about an essential need to identify patients who have the most need and to allocate services to those who will benefit the most. Risk stratification methods are commonly used for this purpose. Using statistical methods, one can develop a risk stratification model to predict the risk of an adverse event based on observed variab les. 
The model then can be applied to classify patients into different risk levels and to identify the most appropriate services for each level. Risk stratification is playing an increasingly important role in public health and clinical care. Health care organizations are working to change their cost structure and to improve their outcomes. The fact that 20 percent of the patients with chronic conditions are responsible for about 80 percent of health care expenditure has brought to attention the need for i mprovements in the disease management of such populations. Disease management is a system of coordinated healthcare interventions for populations with specific conditions. It emphasizes prevention of exacerbation or complications of the condition, 22 evaluati on of the clinical and economic outcomes, and developing a case management plan. (80,81,38) Population -based disease management is becomin g today™s optimal practice pattern and is replacing the former approach of episode -based disease management. This means that instead of managing patients who are seeking treatment at a given time, all people in a target population (e.g., insurance enrollee with a particular disease) are considered targets for case management interventions designed to prevent the complications of the disease and unnecessary medical utilization. By identifying high -risk, high -cost patients, this approach results in more timel y delivery of appropriate interventions in a cost -effective manner that has the potential to save money. (39) Risk stratification can help to identify high -risk, high -cost patients. Prediction models are used by researchers, health care providers, and policymakers to predict patient outcomes such as mortality and health care utilizations. (82) Prognostic indices can be used to target different services appropriately to older patients. For example, prediction of mortality in a target population can identify patients at high risk of death with consi deration of palliative care programs or advanced care planning. It also helps to prevent the allocation of resources to the services that are costly and not beneficial; for example, screening for slow -growing cancer in older adults with a high risk of 1 -ye ar mortality is not reasonable. Additionally hospice care can be offered to the terminally -ill patients in order to improve the quality of life for the patients and caregivers. According to Medicare criteria, a patient is eligible for hospice services, if determined to have a terminal illness (defined as having a prognosis of 6 months or less if the disease or illness runs its normal course). (35) Risk stratification can identify patients with limited life who are eligible to be evaluated for hospice services. A risk stratification approach that predicts probability of death for a group of patient can h elp to identify those at high risk of death in close future (e.g., 6 months) and so can help to identify the potential candidates for hospice services. 23 Objectives of the research are to develop alternative risk stratification models in a unique population of community -dwelling, home -bound older adults who receive home -based medical services from the United States Medical Management (USMM) Corporation. The outcomes of interest are mortality and hospice admission. 
Three different statistical approaches will b e applied to develop predictive models: Chapter 2 (current chapter): Binary outcomes in a fixed time interval (i.e., one -year mortality and hospice admission) will be modeled using a logistic regression approach Chapter 3:Binary outcomes in a fixed time i nterval (i.e., one -year mortality and hospice admission) will be modeled using a random forest model Chapter 4: Time -to-event will be modeled using a Cox proportional hazard model In this chapter, the first approach is presented, namely, logistic regressio n analysis. In the development of the prediction model, several variable selection methods including forward, backward, and stepwise selection are applied as well as more advanced variable selection methods, including Adaptive lasso and elastic net variabl e selection techniques. A conventional variable selection method is also used (called manual variable selection). To handle the missing data problem, a multiple imputation approach is applied and different variable selection methods are used to develop mod els using the imputed data. These models are compared by their discrimination ability indicated as c -statistic (AUC - area under the ROC curve). The models are also compared to the pre -existing risk stratification approaches that are already in -use by USMM providers. The contribution of this research to the mortality risk stratification literature are: 1. the use of community -dwelling homebound older adults, 2. incorporation of advanced variable selection techniques, 3. implementation of multiple imputation technique to manage missing data, 4. prediction of hospice admission in addition to mortality. The rest of this chapter is organized as follows: background and literature review, methods and materials, empirical results, discussion, and conclusion. 24 Liter ature review As discussed in chapter one, most of the previous prognostic models have been developed in a specific setting such as nursing home, emergency department, or hospital. Other authors have developed models in populations of older adults with spec ific conditions such as cancer, chronic kidney diseases, and cardiovascular diseases. There are a limited number of studies that focuses on the risk stratification in the community -dwelling older population. Yourman et al., in a systematic review of progno stic indices for older adults, evaluated the accuracy and generalizability of such indices. (48) They found 16 validated indices, but only six included community -dwelling patients. The rest of them used patients in a nursing home or hospitals. Table 1.3 in chapter one, summarizes these studies along with three other relevant stud ies. Following is a brief summary of the seven studies that used logistic regression in model development. (53) Of the six studies in community -dwelling patients, the only model t hat evaluated 1 -year mortality was developed by Gagne et al. The model consisted of 20 comorbidities and resulted in a c -statistic of 0.788 (95% cl, 0.786 -0.791). However, their study population included both community -dwelling and nursing home residence p atients (9%). Mortality rate of this population was 7.5% a year. Their index also showed better discrimination for 30 -day and 180 -day than 1 -year mortality. (52) Carey et al., in two separate studies developed prognostic indices for 2 -year and 3 -year mortality in older adults. In their first study, an index was developed for 2 -year mortality in community -living frail older adults. 
The y included variables age and sex plus 16 functional variables in the predictive model. The final index comprised of 6 variables including age, sex, dependence in bathing, dependence in shopping, difficulty walking several blocks, and difficulty pulling/pus hing heavy objects. The prognostic index had discrimination (C -statistic) of 0.74. Mortality rate in this population was 12% over 2 years. (49) The second study was to develop a prognost ic model for 3 -year mortality in a cohort of nursing home eligible older adults. The study population consisted of the participants in the Program of All -Inclusive 25 Care for the Elderly (PACE). The PACE program provides comprehensive medical and social serv ices to frail, community -dwelling older adults, most of them are dually eligible for Medicare and Medicaid. The program enables most of the participants to remain in the community rather than receiving nursing home care. (3) The study population were chronically ill elderly who met the criteria for nursing home placement. Using a Cox model for time to death, Carey et al., defined a score made of variables age, sex, dependency on toileting and dressing, and four comorbidities. The index showed a c -statistic of 0.69 for 3-year mortality in the validation data. Mortality rate in this population was 13% over a year. (57) Mazzaglia et al., developed a prognostic index for 15 -month mortality and hospitalization in a cohort of community -dwelling ol der adults in Italy. This index was developed to be used mainly by primary care physicians and consisted of age, sex, previous hospitalization, dependency on basic ADLs and IADLs, poor vision, poor hearing, use of home health services, and inadequate incom e. This index stratified elders to 4 risk groups; the c -statistic of this model was 0.75 for 15 -month mortality. Mortality rate in this cohort was 4% over a 15 months period. (64) Lee et al., proposed a prognostic index for 4 -year mortality among older adults using 12 predictor variables. Their study population was adults older than 50 years who participa ted in the 1998 wave of Health and Retirement Study. The Participation rate in their study was 81%. Significant predictors in this index included age, sex, six comorbidities, and four functional status indicators such as walking several blocks and managing money. The discrimination of this index was 0.82 in the validation data. Because they included patients as young as 50 years old who were generally healthy, the authors suggested that the optimal model for an older and sicker population would include othe r predictor variables. Mortality rate in this cohort was 12% over 4 years. (54) Han et al., developed a prognostic model using 11 predictors of 6 -month mortality among community -living older adults with a s elf -reported decline in health. The outcome of 6 -month mortality was chosen because the 6 -month prognosis is essential in hospice referral. They used the data from the Medicare 26 Health Outcome Survey (MHOS), and they did not exclude institutionalized and di sabled Medicare beneficiaries. Significant predictors included age, sex, smoking status, any cancer, congestive heart failure, COPD, ADLs, proxy status, and health -related quality of life (general health perception, social functioning, and energy/fatigue). Their model had a c -statistic of 0.75. Mortality rate in this population was 15% over 6 months (grossly estimated one -year mortality of 30%) Œ much greater than these other studies. 
This mortality rate was also much greater than the 2% mortality rate of t he total MHOS population. An explanation is the selection of this particular study population which included only MHOS participants with declining health (i.e. patients who reported their health fimuch worsefl compared to their last year health). (53) Schonberg et al., studied predictors of 5 -year mortality among a population aged 65 and older who participated in National Health Interview Survey and responded to annual follow up surv eys for five years from 1997 to 2002 (74% mean participation rate). The 5 -year mortality rate in this population was 17% during the study period. They used a multivariable Cox proportional hazard model with 11 predictors including age, sex, smoking status, BMI (<25 kg/m 2), dependence in IADL, difficulty walking several blocks, general health perception, past year hospitalization, and three comorbidities. The model had a c -statistic of 0.75. Mortality rate was 17% over 5 years. (55) Review of the studies that developed a prognostic model in community -living older adults revealed that the USMM population is different from the other study populations presented in table 1.3. The difference can be seen in average age, the number of comorb idities, functional status, and the setting (institution, nursing home, community). USMM patients are homebound, also they are older and have higher rates of comorbidities compared to the previous study populations. But, most importantly the USMM populatio n mortality rate (32% over 12 months ) is substantially higher than these other study populations Œ where the estimated annual mortality rates were in the range of 4 -8% but varied from as low as 3.0% (54,55) to as high 30% (53) . Moreover, selected predictors used in the previous prognostic 27 models, were not found in the USMM database. For example itemized ADL and IADL information, past year hospital ization, income and BMI are variables that were not available in this dataset. These differences made the pre -existing prognostic models not to be appropriate for the USMM population. This study aims to develop a model suitable for this population and othe r similar older cohorts. Variable selection is the basis of developing a prediction model. Variable selection can be made by including and excluding rules for predictor variables, or by built -in automated methods in the statistical software. Most of the st udies cited above used logistic regression to model the outcome and identify the important predictors of it. Carey and Schonberg made use of a Cox proportional hazard model as well. None of these studies mentioned any application of any particular variable selection method; instead they included all significant (usually P<.05) variables in a final MV model. Statistical software offers different options for variable selection methods as part of the model development process including stepwise, backward, forw ard methods, as well as more advanced methods such as lasso, adaptive lasso, elastic net, and ridge regression. Although the newer selection methods such as adaptive lasso and elastic net are not directly available in SAS for binary outcomes (Logistic or H PLOGISTIC procedures), there are methodological papers that explain the use of the GLMSELECT procedure to make use of these methods. Lund and Cohen suggested that although GLMSELECT procedure fits an ordinary regression model, it can be used to select a go od set of predictors for a logistic model. 
(83 Œ85) Missing data is a persistent problem in epidemiological studies. Some of the common reasons for missing data are patients™ refusal to answer, lack of knowledge, loss of contact in longitudinal studies (due to death or relocation), and failure of routine documentation in the EMR by clinical staff. We could not find a specific approach for management of missing data in any of the nine previously mentioned studies that developed prognostic indices in community -living older adults. When the missing data was described, either the observations with partly missing data were excluded from the analysis, or missingness was included as a dummy variable in the analysis. (53,57) In this analysis, missing data is 28 observed in about one -third of the cohort; therefore the missing data problem is addressed in this chapter. One of the commonly known approaches to the missing data problem is multiple imputation (MI). The SAS procedure, multiple imputation uses the assumption of missing at random (MAR) for the missing data, however the use of PROC MI can be extended to the MNAR conditions. (86) Although, it is impossible by definition, to distinguish between MAR and MNAR mechanisms. (87) To build a prediction model, variable selection is applied to a complete case data when there is no missing observation. However, excluding the cases with partly missing data can induce bias into the results. Wood et al., proposed four different approaches for variable selection in the imputed data. (88) The first method involves developing the model in the complete case data and using the same variables in the imputed data. The second method is to develop a single model in the first set of imputed data. The th ird method is to develop separate models in each imputed dataset and then combined the selected variables from all models to form the final model. The fourth method is to use stacked imputed datasets with weighted regression. The first and third methods ar e used in this study for variable selection in the imputed data. The most common methods for evaluating the accuracy of a predictive model for binary outcomes are discrimination, which is measured by area under the ROC (AUC), and calibration. The AUC (also called the concordance or C -statistic) is the most commonly used measure of discrimination of a model. It indicates how good the model classifies those with and without the outcome of interest. For a binary outcome, ROC is a plot of sensitivity against 1 - specificity for all the consecutive cutoffs in the probability of an outcome. (89) AUC values can be roughly interpreted as excellent (AUC above 0.80), good (between 0.70 and 0.80), and weak (between 0.50 and 0.70). Calibration compares the predicted and observed probability of the outcome in different risk groups. Calibration plots provide a qualitative visualization of the goodness of fit, while the Hosmer -Lemeshow is a statistical test of goodness of fit. The Brier score is another measure to evaluate the goodness of fit and performance of a predictive model. (90) Brier score is an equivalent of R -square when the outcome is binary. In this study, the AUC, 29 calibration plots, Hosmer -Lemeshow test, and Brier score ar e utilized to evaluate the model performance. (90) Methods and materials o Data source I conducted this study utilizing the United States Medical Management (USMM) dataset. 
USMM is a family of companies that provide home-based medical care to patients across 11 US states, including Michigan, Ohio, Texas, Florida, Kansas, Virginia, Illinois, Kentucky, Missouri, Washington, and Wisconsin. USMM specializes in home-based health care for homebound elderly individuals and other patients with complex medical issues. The USMM providers include physicians, nurse practitioners, clinical educators, and people with other specialties. USMM also owns several health properties, including hospices and home-health agencies. USMM maintains a database of all patients visited in its 100 offices across the 11 states. This database includes demographic, social, functional status, clinical, laboratory, and utilization data. The database consists of the USMM electronic medical record, named APRIMA, in addition to other data sources. The USMM clinical database for the calendar year 2015 was used for this analysis. Claims data were also available through a third-party corporation, E-Solution, which provides processed claims data from CMS. The processed claims data contained limited information on only 5 events: death (date of death), hospice utilization (first and last dates of hospice in 12-week intervals), home-health utilization (first and last dates of HH services in 8-week intervals), the most recent hospitalization (admission and discharge dates), and the prior hospitalization (admission and discharge dates). Dates of death and hospice enrollment were used as the outcomes of interest in this study. Therefore the USMM EMR data and the claims data together were used to define the study population.

o Study population

The 2015 cohort was defined as all patients who had their first ever home-based medical visit between January 1st and December 31st, 2015. The date of the first visit was recorded in the APRIMA EMR. The data were then linked to the claims data, and patients who did not have claims data available were excluded. Patients aged <65 years were excluded. Table 2.1 contains the inclusion and exclusion criteria for the patient population in this chapter.

Table 2.1. Inclusion and exclusion criteria in this study patient population
Inclusion criteria:
- Registered in the USMM system in the calendar year 2015
- Had at least one visit between January 1st and December 31st, 2015
Exclusion criteria:
- Claims data not available
- Age <65 years old
- Followed up for less than 1 year

Since the purpose of this chapter is to analyze 1-year mortality and hospice admission, the cohort was limited to patients who had been followed for at least 365 days or had one of the outcomes within a year of their first USMM visit. Follow-up time was determined by counting the days between the first visit date and the date of the outcome (i.e., death or hospice admission), or the date of the last visit if the outcomes did not occur. Figure 2.1 displays a flow diagram of the patient population in this study.

Figure 2.1. Flow diagram of the study cohort. [Flow diagram: 20,424 patients had their first ever USMM visit in 2015; 12,634 remained after excluding those with no claims data available; 9,627 remained after excluding those aged <65 years; the final cohort comprised 7,445 patients after excluding those with <1 year of USMM care.]

Among the 2,182 patients who were excluded because of <12 months of USMM care, 88.5% (n=1,932) became inactive in the USMM database for various reasons, including: the patient opted out of the program, relocation to a nursing home, loss to follow-up (hospitalization, no response to phone calls, bad address), discharge by the provider (e.g., because the patient was not homebound), the patient moved, and other reasons. These reasons for withdrawal are summarized in Table 2.2.
The majority of these reasons were related to the patient's preference (i.e., the 31% who opted out), such as the patient changing their primary care physician (PCP), choosing another house-call program, or refusing the services. The remaining 11.5% (n=250) were patients who did not have a documented reason for withdrawal from USMM, but whose total registered time was less than 12 months (Table 2.2). Many of these patients (n=177, 71%) were visited at the end of 2015 (i.e., December 2015), and their last recorded visit in the USMM database occurred before December 2016. Thus their total documented time of care in the USMM system was less than 12 months, although these patients were still active in the USMM database.

Table 2.2. Patients with <1 year of care received from USMM (N=2,182), N (%)
Became inactive in the USMM system:
- Patient opt-out: 686 (31.4%)
- Nursing home admission: 380 (17.4%)
- Loss to follow up: 206 (9.4%)
- Provider excluded patient: 204 (9.3%)
- Patient moved: 202 (9.3%)
- Missing reason: 189 (8.9%)
- Insurance issues: 65 (3.0%)
<1 year of documented care: 250 (11.5%)
Total: 2,182 (100%)

o Outcome and exposure

There are two outcomes of interest: mortality and hospice admission. One-year mortality was determined if a date of death was recorded in the claims data within 12 months of the first USMM visit. Likewise, 1-year hospice admission was determined according to the recorded date of the first hospice service in the claims data. Claims data were processed data provided by E-solutions, a commercial medical billing and claims processing company. (6) The claims data provided the dates of death, hospice, and/or home-health services (in 8-week periods); therefore the first date of the earliest hospice service was considered the date of the outcome. If a date of death or hospice was not reported within a year from the first visit date, then the case was counted as censored at one year. If the death occurred in hospice, both outcomes (death and hospice admission) were analyzed as separate outcomes in each respective analysis.

Variables with less than 20% missing observations were considered as exposure variables for the analysis. This information was collected from the baseline visit for each patient. Table 2.3 in the results section displays the frequency of missing data for each variable. A total of 41 potential predictor variables had <20% missing, including demographic and social factors: age, gender, race, and insurance status representing whether a patient has dual eligibility for both Medicaid and Medicare; lifestyle factors: living alone and smoking; functional status factors: functional decline in ADLs, Timed Up and Go (TUG), and Karnofsky Performance Scale (KPS value); serum measures: serum albumin and cholesterol; and other factors: having a pressure ulcer, the surprise question answer, the number of medications, and the number of lab tests ordered by the provider.
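The following is a minimal SAS sketch of how the 1-year outcome indicators and follow-up time could be derived from the claims dates; the dataset and variable names (cohort, first_visit_dt, death_dt, hospice_dt, last_visit_dt) are illustrative placeholders, not the actual USMM field names.

/* Derive 1-year outcome flags and follow-up time from SAS date values. */
data outcomes;
   set cohort;
   /* 1-year mortality: a death date recorded within 365 days of the first visit */
   death1yr   = (not missing(death_dt)   and death_dt   - first_visit_dt <= 365);
   /* 1-year hospice admission: earliest hospice date within 365 days            */
   hospice1yr = (not missing(hospice_dt) and hospice_dt - first_visit_dt <= 365);
   /* Follow-up time in days: to the outcome if it occurred, otherwise to the    */
   /* last recorded visit, capped at one year                                    */
   if death1yr = 1 then futime = death_dt - first_visit_dt;
   else futime = min(last_visit_dt - first_visit_dt, 365);
run;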
There are 24 medical history variables in the APRIMA EMR that document whether, at the time of the current visit, the patient had an active diagnosis of the condition as defined by the CMS Chronic Condition Warehouse (CCW). (91) These 24 variables are reported as binary (yes/no) data and include: history of hypothyroidism, asthma, atrial fibrillation, cataract, chronic kidney disease, osteoporosis, hyperlipidemia, hypertension, anemia, breast cancer, colorectal cancer, benign prostatic hyperplasia, COPD, depression, diabetes, endometrial cancer, glaucoma, heart failure, hip/pelvic fracture, ischemic heart disease, lung cancer, prostate cancer, stroke/TIA, and rheumatoid arthritis/osteoarthritis. Diagnosis count is a variable that counts the number of CCW conditions present for each patient. Another variable, cancer, was generated to indicate whether a patient had any of the four types of cancer listed among the CCW variables. History of Alzheimer's disease and acute MI were also among the CCW variables; however, the numbers of patients who had these conditions were too small to analyze, so they were dropped from consideration.

Three variables in this dataset represent the functional status of patients: functional decline in activities of daily living (ADLs), Timed Up and Go (TUG answer), and the Karnofsky Performance Scale (KPS). These variables are documented by the visiting physician in APRIMA and supplemented by Status Scope. The functional status variables and the surprise question are defined in Table 2.3. A decline in ADLs can be an indicator of developing frailty or of other medical events that need attention for timely prevention of an adverse outcome. (92) Activities of daily living (ADLs) include six daily activities: self-feeding, bathing, dressing, toileting, transferring, and getting in/out of a bed or chair. (93,94) Instrumental activities of daily living (IADLs) include activities such as shopping, housekeeping, keeping track of finances, and food preparation. Two variables in the APRIMA database indicate a decline in ADLs and IADLs compared to the previous year. The visiting physician evaluates the functional status compared to the last visit (for ADLs) or the last Annual Wellness Visit (for IADLs). Unfortunately, the variable measuring decline in IADLs was excluded from the analysis because of its high rate of missing values.

The Timed Up and Go (TUG) test is a simple test used to assess a person's mobility that includes both static and dynamic balance. (95) The test involves measuring the time that a person takes to rise from a standard arm chair, walk three meters at their normal pace, turn around, walk back to the chair, and sit down again. It is reported in seconds, and <30 seconds is considered normal. The results of this test were recorded as <30 seconds, >30 seconds, or non-ambulatory.

The Karnofsky Performance Status Scale (KPS) is another tool used to quantify patients' general well-being and functional status. (96) The score ranges from 100 to 0, where 100 is perfect health and function and 0 is death. The score is usually reported in intervals of 10. A KPS score of 80-100 indicates the ability to carry on normal activity and to work. A score of 50-70 indicates an inability to work, but these patients are able to live at home and care for their personal needs. A score of 40 or less indicates functional disability and an inability to care for oneself. (36)
Since only 0.4% of this population had a score of 80-100, we re-categorized the KPS into two levels: mild/moderate disability (KPS 50-100) and severe disability (KPS 10-40).

The surprise question (SQ) is a simple question answered by the provider: "Would you be surprised if this patient died in the next 6-12 months?" This question provides a valuable piece of information that has been shown, in many different settings, to be a strong predictor of mortality. (98,99) The predictive value of the surprise question has been evaluated explicitly in diseases such as cancer and kidney disease, but it has not been well assessed in a general population of older adults without specific diseases or conditions. A recent study evaluated the performance of the SQ in predicting two-year mortality in patients with serious illness from primary care clinics in Boston, MA. The patients were screened by the primary care physicians (PCPs) and enrolled in the study if they were eligible for the serious illness care program. (100) The goal of the study was to improve access to palliative care among patients who are approaching the end of life. The performance of the SQ in predicting two-year mortality among these chronically ill, complex patients, measured by the area under the curve, was 0.74 when the question was asked of the primary care physicians. (100) The key features of these four measures are summarized in Table 2.3.

Table 2.3. Definition and values of the functional status variables and surprise question

Variable | Definition | Values
ADL-decline | Functional decline in activities of daily living | Decline, improve, no change
TUG | Timed up and go, a measure of the patient's mobility and balance | <30 seconds, ≥30 seconds, non-ambulatory
KPS | Karnofsky performance scale, which quantifies the patient's general well-being and functional status | Values range from 10-100, with lower values indicating worse functional status
Surprise question | Answer to the question "Would you be surprised if this patient died in the next 6-12 months?" | Yes/No

o Statistical analysis

The statistical analyses for this paper were done using SAS software, version 9.4 (SAS Institute Inc., Cary, NC). The data were randomly split into two equal-sized cohorts to create derivation and validation datasets. Logistic regression was applied to develop a prediction model in the derivation dataset. The model parameters were then applied to the validation cohort, and the predicted probability of the outcome was calculated for each patient. A logistic regression model fits a binary response and provides several variable selection methods to identify important predictor variables among many potential independent variables. Logistic regression is used to explain the effect of explanatory variables x on the response Y:

\[ \mathrm{logit}\{\Pr(Y=1 \mid x)\} \;=\; \log\!\left\{\frac{\Pr(Y=1 \mid x)}{1-\Pr(Y=1 \mid x)}\right\} \;=\; \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k , \]

where Y is a binary response (i.e., 1 when death occurred and 0 when it did not), \(x = (x_1, \dots, x_k)\) is a vector of explanatory variables, \(\beta_0\) is the intercept parameter, and \(\beta_1, \dots, \beta_k\) are the regression coefficients. (101)

A receiver operating characteristic (ROC) curve was generated for each model, and the area under the curve (AUC) was reported as an indicator of the discrimination of the model in both the derivation and validation datasets. The AUC (also referred to as the c-statistic) was used as the primary measure to compare the alternative prediction models. Sensitivity and specificity of the models were also provided for comparisons between the alternative RS models (i.e., the multivariable logistic regression model, the SQ-only model, and the USMM proposed RS approach).
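A minimal SAS sketch of this step (the random 50:50 split, a stepwise logistic model fit in the derivation half, and scoring of the validation half) is given below; all dataset and variable names (cohort, death1yr, kps_cat, albumin_q, and so on) are placeholders rather than the actual analysis variables, and the covariate list is illustrative only.

   proc surveyselect data=cohort out=cohort_split samprate=0.5 method=srs
                     outall seed=2015;
   run;                                            /* Selected = 1 marks the derivation half */

   data deriv valid;
      set cohort_split;
      if Selected = 1 then output deriv;
      else output valid;
   run;

   proc logistic data=deriv;
      class race sq kps_cat adl_decline tug / param=ref;
      model death1yr(event='1') = age_cat sex race dual_elig sq lives_alone
            smoking albumin_q chol_q kps_cat adl_decline tug pressure_ulcer
            n_meds n_labs dx_count
            / selection=stepwise slentry=0.2 slstay=0.05;
      /* apply the selected model to the validation half; FITSTAT requests fit
         statistics for the scored data, including the validation AUC          */
      score data=valid out=valid_scored fitstat;
   run;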
Calibration plots were generated to show the goodness of fit of the final model graphically. Further details are provided below in the model assessment section.

- Variable selection methods

Several variable selection methods were applied to the derivation dataset and then validated by applying the resulting models to the validation dataset. Both outcomes, 1-year mortality and hospice admission, were modeled using different automated variable selection methods, including forward, backward, and stepwise selection. These selection methods are built-in options in PROC LOGISTIC. The stepwise selection method, with an entry level of p < 0.2 and a stay level of p < 0.05, was applied to select significant predictors of the outcome. A total of 41 predictor variables that had <20% missing observations were included in the model building process.

There are newer variable selection methods, including the lasso, adaptive lasso, ridge, elastic net, and group lasso methods. (102-105) We applied two of these methods, adaptive lasso and elastic net, using the SAS procedure PROC GLMSELECT. These methods have advantages over the pre-existing stepwise selection methods in specific circumstances, especially when the dataset includes a large number of predictors and a limited number of observations, and when the predictor variables are highly correlated with each other. (42) Adaptive lasso and elastic net allow the model to include more than one predictor from a group of correlated predictor variables. In the adaptive lasso method, a weight vector is defined for the parameter estimates as

\[ w_j = \frac{1}{|\hat{\beta}_j|^{\gamma}}, \qquad j = 1, \dots, m, \]

where \(\hat{\beta}_1, \hat{\beta}_2, \dots, \hat{\beta}_m\) are initial estimates of the regression coefficients, and the adaptive lasso coefficients are then generated under the constrained optimization problem weighted by \(w_j\). The parameter gamma (\(\gamma\)) in the above equation is the power transformation of the parameters used to form the adaptive weights. Gamma can be specified in the model statement, but the default in SAS PROC GLMSELECT is 1.0 (which represents no power transformation). I applied the adaptive lasso option with seven different values of gamma between 0 and 1. The elastic net method develops a parsimonious model by solving the least squares regression problem with constraints on both the sum of the absolute coefficients and the sum of the squared coefficients:

\[ \min_{\beta}\; \lVert y - X\beta \rVert^{2} \quad \text{subject to} \quad \sum_{j=1}^{m} |\beta_j| \le t_1 \ \ \text{and} \ \ \sum_{j=1}^{m} \beta_j^{2} \le t_2 , \]

where \(t_1\) and \(t_2\) are the constraints applied to the sum of the absolute coefficients and the sum of the squared coefficients, respectively. (105,107)

Two different options in the model statement were specified to determine the optimal model: validation data and k-fold cross-validation. For both the adaptive lasso and elastic net methods, a 4-fold cross-validation option was specified. The selected variables were then included in a logistic regression model, and c-statistics were generated for both the derivation and validation datasets. The GLMSELECT procedure was used for variable selection only, not for development of the logit model. The underlying assumption of PROC GLMSELECT is that the outcome is continuous; however, it is accepted practice to use the GLMSELECT procedure with a categorical outcome for variable selection only. (83) I applied the GLMSELECT procedure because SAS does not support the adaptive lasso and elastic net options in the LOGISTIC or HPLOGISTIC procedures. The logistic model was then developed by including the variables that were selected in GLMSELECT.
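A minimal sketch of this two-stage approach is shown below, again with placeholder variable names and an illustrative covariate list; the exact GLMSELECT suboptions used in the dissertation (e.g., the specific gamma values for the adaptive lasso) are not reproduced here, and the selected set passed to PROC LOGISTIC is purely illustrative.

   /* stage 1: penalized selection in GLMSELECT, treating the 0/1 outcome as
      continuous for variable selection only                                    */
   proc glmselect data=deriv plots=coefficients seed=2019;
      class race sq kps_cat adl_decline tug;
      model death1yr = age_cat sex race dual_elig sq lives_alone smoking
                       albumin_q chol_q kps_cat adl_decline tug pressure_ulcer
                       n_meds n_labs dx_count
            / selection=elasticnet(choose=cv) cvmethod=random(4);
      /* the adaptive lasso can be requested instead via selection=lasso(adaptive ...),
         with suboptions controlling the adaptive weights and the selection rule */
   run;

   /* stage 2: refit the GLMSELECT-selected variables as a logistic model and
      obtain the c-statistic in the derivation and validation data              */
   proc logistic data=deriv;
      class race sq kps_cat / param=ref;
      model death1yr(event='1') = age_cat race dual_elig sq albumin_q chol_q kps_cat;
      score data=valid out=valid_scored_en fitstat;
   run;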
A manual variable selection approach was also used, by running univariate logistic regressions for all the predictor variables and then entering those with a significance level of p < 0.2 into the multivariable model. The variables with a p-value ≤ 0.05 in the multivariable model were included in the final model, in addition to demographic variables (i.e., age and sex), which were forced in regardless of significance.

- Model performance assessment

The most common measure of a predictive model's performance is the AUC, or c-statistic. Calibration plots are another way to evaluate the performance of a predictive model. Calibration indicates the degree of agreement between observed and predicted probabilities and is therefore a measure of model fit. By plotting the predicted probability of the outcome against the observed probability of the event for groups of patients (often deciles), calibration plots provide diagnostic graphs that help to qualitatively evaluate how well a model predicts the outcome. Two methods are commonly used to generate calibration plots: loess-based and decile-based. (108) In the loess-based method, the observed and predicted probabilities of the event for each observation are plotted, and a loess function is used to smooth the plot over all observations. In the decile-based method, the data are sorted by the predicted probabilities and then grouped into deciles; the average observed and predicted probabilities for each decile are calculated and plotted. A study by Austin and Steyerberg concluded that loess-based plots have several advantages over decile-based plots, (109) since decile-based calibration depends on the number of groups into which the data are partitioned. In this chapter a calibration plot was generated in the validation data; a plot was also made in the derivation data for comparison.

The Hosmer-Lemeshow goodness-of-fit test is a statistical test of GOF and is another metric used to evaluate the prediction model. To perform this test, the data are sorted and divided into deciles, similar to the method used for calibration. The Hosmer-Lemeshow test statistic is obtained by calculating a chi-square statistic from a 2 x g table of observed and expected frequencies, where g is the number of groups (ten in this case):

\[ \chi^2_{HL} = \sum_{i=1}^{g} \frac{(O_i - N_i \bar{\pi}_i)^2}{N_i \,\bar{\pi}_i \,(1-\bar{\pi}_i)} , \]

where \(O_i\) is the number of event outcomes in the ith group, \(N_i\) is the number of observations in the ith group, and \(\bar{\pi}_i\) is the average predicted probability of the outcome in the ith group. This statistic is compared to the \(\chi^2\) distribution with (g - 2) degrees of freedom. A large value of \(\chi^2\) and a small p-value indicate a lack of fit of the model. (101,110)

In the evaluation of predictive model performance, measurement of the distance between the predicted and observed outcomes is essential. R-square is the measure of this distance when the outcome is continuous and is calculated as

\[ R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2} , \]

where \(y_i\) is the observed outcome, \(\bar{y}\) is the mean of the observed outcome, and \(\hat{y}_i\) is the predicted outcome. \(R^2\) represents the proportion of the variation that can be explained by the model; therefore, a larger \(R^2\) indicates a better model. The Brier score is another measure, which calculates the squared difference between the actual binary outcome and the prediction, averaged over all observations. (90) It is calculated from the terms \((Y - \hat{p})^2\), where \(\hat{p}\) is the predicted probability of the binary outcome Y. The Brier score is lower when the model fits better; it is 0 for a perfectly fitting model, whereas the maximum value indicates a non-informative model. The maximum value for a non-informative model depends on the outcome incidence and is calculated as \(P(1-P)^2 + P^2(1-P)\), where P is the outcome incidence. (90) We calculated the maximum value for a non-informative model in this cohort and generated Brier scores for comparison between the different models.
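A minimal SAS sketch of these assessment steps, applied to the scored validation data from the earlier sketch (valid_scored, with placeholder variables p_1 and death1yr), is given below; the covariate list in the Hosmer-Lemeshow model is again illustrative only.

   /* decile-based calibration: group by predicted risk and compare the mean
      predicted and observed probabilities within each decile                  */
   proc rank data=valid_scored groups=10 out=ranked;
      var p_1;
      ranks decile;
   run;
   proc means data=ranked noprint nway;
      class decile;
      var p_1 death1yr;
      output out=calib mean(p_1)=mean_pred mean(death1yr)=obs_rate;
   run;
   proc sgplot data=calib;
      scatter x=mean_pred y=obs_rate;
      lineparm x=0 y=0 slope=1 / lineattrs=(pattern=shortdash);  /* 45-degree reference */
   run;

   /* loess-based calibration plot over all observations                       */
   proc sgplot data=valid_scored;
      loess x=p_1 y=death1yr / smooth=0.75;
      lineparm x=0 y=0 slope=1 / lineattrs=(pattern=shortdash);
   run;

   /* Hosmer-Lemeshow test via the LACKFIT option of PROC LOGISTIC             */
   proc logistic data=deriv;
      class race sq kps_cat adl_decline / param=ref;
      model death1yr(event='1') = age_cat race dual_elig sq albumin_q chol_q
                                  kps_cat adl_decline / lackfit;
   run;

   /* Brier score: mean squared difference between the outcome and prediction  */
   data brier_terms;
      set valid_scored;
      sq_err = (death1yr - p_1)**2;
   run;
   proc means data=brier_terms mean;
      var sq_err;                      /* the reported mean is the Brier score */
   run;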
- Multiple imputation

To handle the missing data on the predictor variables, I used a multiple imputation procedure. To choose a sufficient number of imputations, the dataset was imputed twice, first using 5 imputations and then using 20. The SAS procedure PROC MI was used to impute missing data on the categorical and continuous variables. The parameter estimates and variances from the two MI procedures (5 or 20 imputations) were compared. Although the parameter estimates and their standard errors were similar, 20 imputations were chosen to maximize relative efficiency. Furthermore, the increase in computation time was trivial when the number of imputations was increased from 5 to 20, so computation time was not a limitation in this dataset.

There were no missing observations on the two outcomes (i.e., death and hospice), but there were missing observations on 15 independent variables. As mentioned above, 6 variables with missing observations were excluded initially (Table 2.4), resulting in the inclusion of 41 predictor variables. Nine of the 41 predictors had differing proportions (0.4-20%) of missing observations: race, TUG answer, ADL decline, living alone, surprise question, tobacco use, KPS, albumin, and cholesterol. All 41 predictor variables were used in the imputation procedure. The 28 binary variables that had no missing observations and all continuous variables were included in the imputation model as continuous variables. The six categorical factors with some missing observations (race, TUG answer, decline in ADLs, living alone, surprise question, and tobacco use) were imputed using a CLASS statement. Age, albumin, cholesterol, and KPS are recorded as continuous variables in the data and so were included as continuous factors in the imputation model, although in the logit model they were included as categorical variables based on their quartiles.

The multiple imputation procedure is typically followed by the MIANALYZE procedure, which combines the results of all imputations and provides summary measures of effect such as a relative risk, odds ratio, or hazard ratio. Variable selection for multivariable models based on multiply imputed data differs from available-case methods, since the variables selected in one imputation can differ from those selected in another, and there is no standard procedure to summarize the selection results across imputations. For model selection in the imputed data, several methods have been suggested in the literature, including using the model selected in the available-case data, or developing the model in the first imputation and then applying it to the other imputations. (88) I used two of the four methods proposed by Wood et al. (88) to develop a model and generate its c-statistic in the imputed data. The first method used the model that was developed in the available-case data: using the same set of variables selected by manual variable selection, the model was developed in each derivation set of the imputed data and applied to the corresponding validation set. The predicted probability of the outcome was then generated for individual patients in each imputed validation dataset; the average of the 20 probabilities for each patient was calculated, and the ROC and AUC were generated for both the derivation and validation datasets by modeling the averaged probabilities against the observed outcomes.
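A minimal sketch of the imputation step and of averaging the imputation-specific predictions into a single AUC is shown below; cohort, patient_id, p_1, and mi_valid_scored are placeholder names, and the FCS methods are left at their defaults rather than reproducing the exact specification used here.

   proc mi data=cohort nimpute=20 out=mi_out seed=20190401;
      class race tug adl_decline lives_alone sq tobacco;   /* categorical factors to impute */
      fcs;                              /* fully conditional specification, default methods  */
      var split death1yr hospice1yr race tug adl_decline lives_alone sq tobacco
          albumin cholesterol kps age n_meds n_labs dx_count; /* ...plus the remaining predictors */
   run;

   /* after fitting the chosen model in each imputed derivation set and scoring the
      corresponding validation set (mi_valid_scored: _Imputation_, patient_id, p_1,
      death1yr), average the 20 predictions per patient                               */
   proc sort data=mi_valid_scored;
      by patient_id;
   run;
   proc means data=mi_valid_scored noprint;
      by patient_id;
      var p_1 death1yr;
      output out=avg_pred mean(p_1)=p_avg max(death1yr)=death1yr;
   run;

   /* the c-statistic of a logistic model with the averaged probability as its only
      covariate equals the AUC of that averaged prediction                            */
   proc logistic data=avg_pred;
      model death1yr(event='1') = p_avg;
   run;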
As an alternative model development method, I used the following steps to select the variables that were consistently selected across the different imputations. In the first step, as described by Wood et al., a separate model was developed in each imputed derivation dataset. The three variable selection options (forward, backward, and stepwise) that were applied in the available-case analysis were also applied to each of the 20 imputed datasets. A logistic regression model was developed in each imputed derivation dataset and then applied to the corresponding imputed validation dataset. A predicted probability was generated for each individual in the imputed data, and the average of the 20 predictions for each person was used to generate a single AUC for each selection method. The selected variables were counted over the 20 imputations for each selection method (forward, backward, and stepwise). Only the variables that were selected in all 20 imputations were included in a final model for that variable selection method, which was then used to calculate the AUC from the validation data.

Figure 2.2. Manual variable selection in the imputed data

For manual variable selection in the imputed data, variables that were selected at least 15 times under all three selection methods were considered the final model selected from the imputed data analysis (Figure 2.2). This set of variables was then applied to the original data to generate c-statistics for both the derivation and validation datasets.

To compare the performance of the alternative approaches (the final model developed in this chapter, the surprise question model, and the proposed USMM model), the sensitivity and specificity of the different models were calculated in addition to the AUC. For the multivariable model, sensitivity, specificity, and predictive values were calculated at two different thresholds. All observations were sorted by their predicted probability of the outcome, and the top 10% and top 20% thresholds were then used to calculate the sensitivity and specificity of the model. These two thresholds were chosen arbitrarily to identify the high-risk and low-risk groups. However, the selection of the optimal threshold for risk groups depends on multiple factors, including the cost of false positive cases versus false negative cases; the services and resources that can be allocated to each risk-level group also influence the selection of a threshold for RS. The selection of a threshold is discussed in more detail in the discussion section of this chapter.

o Alternative risk stratification approaches

The AUCs for the various multivariable logistic regression models were compared to each other as well as to the alternative risk stratification approaches. The USMM researchers proposed two approaches for risk stratification: the SQ and a 3-level risk stratification approach. The answer to the SQ is used to identify high-risk patients (answer = "No"). The three-level approach can be operationalized as a decision tree that categorizes patients into three risk levels (referred to as levels 3, 4, and 5) based on five variables: the SQ, albumin, and an episode of fall, hospitalization, or ER visit since the last USMM visit. If serum albumin is <2.5 g/dl, the patient is considered high risk.
If the SQ is answered "No" and the patient has a history of a fall, hospitalization, or ER visit since their last visit, then the patient is high risk and is assigned to risk level 5. If the SQ is answered "No" without any fall or hospitalization/ER visit, then the patient is at the intermediate risk level, level 4. If albumin is >2.5 g/dl and the SQ answer is "Yes", then the patient is low risk (level 3) in this approach (Figure 2.3).

Figure 2.3. The USMM proposed 3-level risk stratification approach

To have comparable measures between the different models, univariate logistic regression analyses were performed for the SQ and for the USMM proposed risk levels. The AUC, sensitivity, and specificity of these two approaches in the validation data were generated and compared to the different multivariable logit models.

Results

o Study population

The final study population consisted of 7,445 patients who had their first USMM visit in the calendar year 2015, had available claims data, and were followed up for at least one year (Figure 2.1). The minimum and maximum follow-up times for this cohort were 1 and 794 days, respectively, with a mean (standard deviation) of 459 (239) days and a median (interquartile range) of 517 (q1=246, q3=658) days (Table 2.5). In the final cohort of 7,445 patients (Table 2.4), 66% were female, 63% were white, the average age was 82 years, 99% had Medicare coverage, and 27% were dual eligible (both Medicare and Medicaid); 54% of the cohort had a KPS ≤40, indicating severe disability with the need for assistance and specialized care. The prevalences of hypertension, hyperlipidemia, diabetes, and cancer were 81%, 50%, 34%, and 8%, respectively. Over 50% of patients had 5 or more medical conditions. Overall, 45% (n=3,345) of the cohort died and 19% were admitted to hospice over the total follow-up time; however, the 1-year mortality and hospice admission rates within the first year of follow-up were 32% (n=2,408) and 10% (n=752), respectively (Table 2.5). Among hospice-admitted patients, 765 (55%) died within three months of their admission. Overall, 2,680 deaths (80% of all deaths) occurred outside of hospice. Table 2.4 summarizes the population characteristics and Table 2.5 displays the outcome events.

Table 2.4.
Cohort population description, by the outcome rates and unadjusted odds ratios (N=7445) Variable N (%) Missing N (%) Death % Unadjusted OR Hospice % Unadjusted OR Baseline characteristics Age -65 -74 -75 Œ 84 -85 Œ 94 -95+ 1826 (24.5) 2249 (30.2) 2796 (37.6) 574 (7.7) 0 21.0 30.5 39.6 40.2 Ref 1.65* 2.47* 2.53* 4.5 8.5 13.9 15.5 Ref 1.95* 3.39* 3.85* Sex -Male -Female 2513 (33.7) 4932 (66.3) 0 36.7 30.1 1.34* Ref 10.8 9.8 1.12 Ref Race -White -Black -Other 4684 (62.9) 1148 (15.4) 201 (2.7) 1412 (19.0) 27.4 18.5 19.9 Ref 0.6* 0.66* 11.3 6.6 6.0 Ref 0.56* 0.5* Tobacco use (current vs not) -Yes -No 645 (8.7) 6412 (86.1) 388 (5.2) 21.7 31.1 0.61* Ref 7.4 10.1 0.71* Ref Dual -eligible -Yes -No 2024 (27.2) 5421 (72.8) 0 23.1 35.8 0.54 Ref 6.0 11.6 0.49* Ref Lives alone -Yes -No 884 (11.9) 5511 (74.0) 1050 (14.1) 18.0 30.3 0.5* Ref 5.4 10.7 0.48* Ref S.Q - No -No -Yes 1045 (14.0) 5381 (72.3) 1019 (13.7) 44.4 25.3 2.36* Ref 19.1 8.0 2.7* Ref KPS -Mild /moderate (50 -100) -Severe disability (10 -40) 3376 (44.9) 4042 (54.3) 27 (0.4) 22.3 40.5 Ref 2.38* 5.7 13.8 Ref 2.66* TUG -<30 sec -30 sec -Non ambulatory 2538 (34.1) 1377 (18.5) 2027 (27.2) 1503 (20.1) 17.5 22.9 30.3 Ref 1.4* 2.1* 7.2 10.0 10.9 Ref 1.43* 1.58* Decline in ADLs -Decline -Improve -No change 1063 (14.3) 311 (4.2) 4889 (65.7) 1182 (15.9) 30.3 1.6 26.8 1.19* 0.05* Ref 13.2 2.3 9.6 1.43* 0.22* Ref Pressure ulcer -Yes -No 940 (12.6) 6505 (87.4) 0 37.3 31.6 1.29* Ref 13.2 9.7 1.42* Ref 46 Table 2. 4. (cont™d) cancer -Yes -No 566 (7.6) 6879 (92.4) 0 38.0 31.9 1.31* Ref 12.9 9.9 1.35* Ref Cholesterol result (mg/dl) Quartiles -<136 -136 - <164 -164 - <195 - 195+ 1554 (20.9) 1625 (21.8) 1589 (21.3) 1621 (21.8) 1056 (14.2) 38.3 27.5 24.4 21.5 2.27* 1.39* 1.18 Ref 10.9 9.4 9.3 9.6 1.16 0.98 0.97 Ref Albumin result (g/dl) Quartiles -<3.2 -3.2 Œ <3.5 -3.5 Œ <3.8 -3.8+ 1669 (22.4) 1610 (21.6) 1820 (24.5) 1709 (23.0) 637 (8.6) 50.5 30.4 22.3 15.3 5.66* 2.43* 1.59* Ref 13.4 10.9 9.3 6.5 2.22* 1.77* 1.47* Ref Medical history (CCW variables) Hypothyroidism -Yes -No 2050 (27.5) 5395 (72.5) 0 30.4 33.1 0.89* Ref 9.5 10.3 0.91 Ref Myocardial infarction -Yes -No 3 (0.04) 7442 (99.9) 0 33.3 32.3 1.05 Ref 0 10.1 -- Ref Anemia -Yes -No 2243 (30.1) 5202 (69.9) 0 26.4 34.9 0.67* Ref 10.7 9.9 1.1 Ref Asthma -Yes -No 309 (4.2) 7136 (95.9) 0 20.1 32.9 0.51* Ref 5.8 10.3 0.54* Ref Atrial fibrillation -Yes -No 1233 (16.6) 6212 (83.4) 0 37.4 31.3 1.31* Ref 11.4 9.9 1.17 Ref BPH -Yes -No 504 (6.8) 6941 (93.2) 0 30.8 32.5 0.92 Ref 10.7 10.1 1.1 Ref Breast cancer -Yes -No 224 (3.0) 7221 (97.0) 0 29.9 32.4 0.89 Ref 8.9 10.1 0.87 Ref Cataract -Yes -No 184 (2.5) 7261 (97.5) 0 14.7 32.8 0.35* Ref 3.3 10.3 0.29* Ref Chronic kidney diseases -Yes -No 3006 (40.4) 4439 (59.6) 0 24.6 37.6 0.54* Ref 10.3 10.0 1.03 Ref Colorectal cancer -Yes -No 95 (1.3) 7350 (98.7) 0 36.8 32.3 1.22 Ref 9.5 10.1 0.93 Ref 47 Table 2. 4. 
(cont™d) COPD -Yes -No 1946 (26.1) 5499 (73.9) 0 29.2 33.5 0.82* Ref 8.7 10.6 0.81* Ref Depression -Yes -No 1615 (21.7) 5830 (78.3) 0 23.5 34.8 0.58* Ref 9.7 10.2 0.95 Ref Diabetes -Yes -No 2519 (33.8) 4926 (66.2) 0 29.3 33.9 0.81* Ref 8.1 11.1 0.7* Ref Endometrial cancer -Yes -No 27 (0.4) 7418 (99.6) 0 25.9 32.4 0.73 Ref 14.8 10.1 1.55 Ref Glaucoma -Yes -No 337 (4.5) 7108 (95.5) 0 30.9 32.4 0.93 Ref 9.8 10.1 0.97 Ref Heart failure -Yes -No 2542 (34.1) 4903 (65.9) 0 29.0 34.1 0.79* Ref 10.0 10.1 0.98 Ref Hip fracture -Yes -No 81 (1.1) 7364 (98.9) 0 35.8 32.3 1.17 Ref 9.9 10.1 0.98 Ref Hyperlipidemia -Yes -No 3686 (49.5) 3759 (50.5) 0 24.1 40.4 0.47* Ref 8.5 11.7 0.7* Ref Hypertension -Yes -No 6056 (81.3) 1389 (18.7) 0 29.7 44.1 0.54* Ref 9.5 12.7 0.72* Ref Ischemic heart diseases -Yes -No 1270 (17.1) 6175 (82.9) 0 31.6 32.5 0.96 Ref 11.3 9.9 1.16 Ref Lung cancer -Yes -No 70 (0.9) 7375 (99.1) 0 52.9 32.2 2.37* Ref 17.1 10.0 1.86 Ref Osteoporosis -Yes -No 819 (11.0) 6626 (89.0) 0 21.1 33.7 0.53* Ref 8.6 10.3 0.82 Ref Prostate cancer -Yes -No 175 (2.4) 7270 (97.7) 0 43.4 32.1 1.63* Ref 17.1 9.9 1.88* Ref Osteoarthritis -Yes -No 2761 (37.1) 4684 (62.9) 0 24.5 37.0 0.55* Ref 9.5 10.4 0.9 Ref TIA/stroke -Yes -No 800 (10.8) 6645 (89.3) 0 29.6 32.7 0.87 Ref 12.5 9.8 1.31* Ref 48 Table 2. 4. (cont™d) Continuous variables ƒ Age (mean ± sd) 82.2 ± 9.3 0 -- 1.04 -- 1.05* Number of lab tests (Median, IQR) 0 (0 Œ 5) 0 -- 1.02* -- 0.97* Number of medications (Median, IQR) 9 (5 Œ 13) 0 -- 0.98* -- 0.97* Comorbidity count (Median, IQR) 5 (3 -6) 0 -- 0.81* -- 0.95* Variables that were not included in the analysis due to >20% missing observations Decline IADLs -Decline -Improve -No change 730 (9.8) 524 (7.0) 984 (13.2) 5207 (69.9) 2.7 1.2 2.0 1.36 0.56 Ref 2.1 2.1 3.3 0.62 0.64 Ref Global health compared to a year ago -Better -Worse -The same 55 (0.7) 316 (4.2) 1185 (15.9) 5889 (79.1) 21.8 54.4 28.3 0.71 3.03* Ref 10.9 15.2 7.3 1.57 2.29* Ref Fall since last visit -Yes -No 184 (2.5) 1546 (20.8) 5715 (76.8) 35.9 34.2 1.08 Ref 8.2 9.1 0.89 Ref Hospitalization since last visit -Yes -No 872 (11.7) 1565 (21.0) 5008 (67.3) 45.1 52.3 o.75* Ref 9.4 5.1 1.93* Ref ER since last visit -Yes -No 790 (10.6) 1649 (22.2) 5006 (67.2) 32.2 54.4 0.4* Ref 8.2 5.5 1.55* Ref Lost weight -Yes -No 1243 (16.7) 2431 (32.7) 3771 (50.7) 22.1 1.8 15.4* Ref 12.8 4.5 3.1* Ref IQR: interquartile range; sd: standard deviation; S.Q: surprise question; KPS: Karnofsky performance scale; TUG: timed up and go; ADL: activities of daily living; IADL: instrumental activities of daily living; TIA: transient ischemic attack; FU: f ollow -up; mg/dl: milligram per deciliter; g/dl: gram per deciliter; * P-value < 0. 05 in univariate analysis with the outcomes; ƒ The unadjusted OR for continuous variables were generated for 1 unit change in the independent variable; Age was included as categorical variable in the analyses; 49 Table 2. 5. Outcomes and follow up duration Variable N (%) Missing N FU time in days -mean ± sd* -median (q1 - q3) 459 ± 239 517 (246 - 658) 0 Death -over the total follow up time -one -year 3345 (44.9) 2408 (32.3) 0 Hospice admission -over the total follow up time -one -year 1391 (18.7) 752 (10.1) 0 * sd: standard deviation; Nine of the 41 patient -level independent variables that were included in the analysis, have missing data. To explore the importance of the missing data, the association between predictor™s missingness and five key variables without missing (i.e. death, hospice, age, sex, and dual eligibility) were evaluated. 
A dummy variable was generated for missing data on each of the seven predictors (1 = missing and 0 = non-missing). Table 2.6 contains the p-values from the univariate regression models; in the original table, the direction and magnitude of each association were also indicated. Although nine predictor variables had missing data, KPS and tobacco use had a small percentage missing (0.4% and 5%, respectively) and were not included in Table 2.6. The fact that missingness on all seven predictors was consistently and significantly associated with a higher rate of mortality suggests that missingness was not at random in these data. In contrast, missingness on the predictors was generally not significantly associated with hospice admission, the exceptions being two variables, TUG and cholesterol. Additionally, older age and male gender were often associated with missingness on the predictors. A conclusion from the findings shown in Table 2.6 is that missing data can be very informative in this study, and exclusion of the observations with missing data (as occurs automatically in regression procedures) could negatively affect the validity of the model.

Table 2.6. Association between missing observations on predictor variables and the outcomes, age, gender, and Medicare/Medicaid dual eligibility: p-values, with the magnitude and direction of the effect indicated in the original table

Variable* | Missing (N=7445) | Death (all) | Death 1-yr | Hospice (all) | Hospice 1-yr | Age | Male | Dual-eligible
Race | 19% | <.0001 | <.0001 | 0.29 | 0.58 | 0.13 | 0.001 | 0.06
SQ | 14% | <.0001 | <.0001 | 0.70 | 0.06 | <.0001 | 0.05 | 0.17
TUG | 20% | <.0001 | <.0001 | 0.0008 | <.0001 | <.0001 | 0.01 | <.0001
Lives alone | 14% | <.0001 | <.0001 | 0.26 | 0.23 | 0.02 | 0.03 | 0.43
ADL decline | 16% | <.0001 | <.0001 | 0.57 | 0.06 | 0.005 | 0.0006 | 0.4
Cholesterol | 14% | <.0001 | <.0001 | 0.46 | 0.03 | <.0001 | 0.60 | 0.0002
Albumin | 9% | <.0001 | <.0001 | 0.46 | 0.23 | 0.23 | 0.66 | <.0001

*A missingness indicator (1 = missing, 0 = not missing) for each listed variable was modeled against the outcomes, age, gender, and insurance status; in the original table, shaded cells showed the statistically significant associations, and arrows indicated the direction and magnitude of the association (e.g., whether the outcome rate was higher when the variable was missing than when it was not missing).

o Outcome: One-year mortality

For each of the two outcomes (mortality and hospice admission), the analyses were done in two parts: first using the available-case data (the original data with missing observations), and then using the imputed data.

- Available case analysis

The alternative variable selection approaches (automatic and manual selection) were applied to the derivation dataset using a logistic regression model. A total of 41 independent variables were included in the model building process. All variables were included in the model as categorical variables except for comorbidity count, number of medications, and number of lab tests (shown at the bottom of Table 2.4). Age, albumin, and cholesterol were categorized as illustrated in Table 2.4. More than one-third of the observations were excluded from the analysis due to missing data on one or more predictors. The number of observations included in the various models differed when the models were based on different sets of variables (which have different numbers of missing observations). For example, in the stepwise, backward, and forward methods, all 41 variables were in the model statement, so observations with missing values on any of the predictors were excluded right at the beginning.
Whereas, when using adaptive lasso method, the variable selection was first made using PROC GLMSELECT and then the variables were included in the logistic regression model in both derivation and va lidation data sets, therefore the number of observations which are excluded from the analysis is different from the one in stepwise selection methods. The results of different variable selection methods are demonstrated in Table 2.7. The SAS built -in selec tion methods are reported at first, following by the adaptive lasso and elastic net selection methods (each with two selection rules), and manual selection method. At the bottom of the table, the best model that was developed in imputed data (later in this chapter) was also applied to the available data for comparison. Brier score was generated for each model as a measure of the overall goodness of fit. As mentioned in the method section, the lower Brier score means the model fits better. However the maximu m limit for the Brier score is not a constant and is calculated based on the incidence of the outcome. The incidence rate of mortality (33%) was used in the equation P*(1 - P)2 + P2 *(1 - P ), and the maximum limit of 0.18 was calculated for the Brier score of a non -informative model. 52 Table 2. 7. Model development using alternative variable selection methods for 1 -year mortality in available case data AUC and 95% confidence limits for both derivation and validation data sets , Brier score in validation and final variable selected (N=3722 derivation and 3723 validation) Variable selection N analyzed * Derivation AUC Derivation AUC Validation Brier Score Validation Selected variables in the final model Automatic variable selection methods Stepwise selection 2055 0.7522 (0.7231 - 0.7813) 0.7697 (0.7476 -0.7919) 0.1473 13 variables: age, sex, race, dual -eligible, SQ, albumin, cholesterol, KPS, ADL decline, anemia, depression, hyperlipidemia, number of meds Forward 2055 0.7458 (0.7162 - 0.7754) 0.7636 (0.7411 -0.7861) 0.1473 11 variables: race, dual - eligible, SQ, albumin, cholesterol, KPS, ADL -decline, anemia, depression, hyperlipidemia Backward 2055 0.7453 (0.7166 - 0.7740) 0.7624 (0.7402 -0.7846) 0.1479 10 variables: race, dual -eligible, SQ, albumin, cholesterol, KPS, ADL -decline, AF, IHD, dx -count Adaptive lasso ƒ (validation data, Gamma=1.0) 2089 0.7631 (0.7351 - 0.7911) 0.7673 (0.7427 -0.7918) 0.1277 24 variables: age, sex, race, dual -eligible, SQ, albumin, cholesterol, KPS, ADL - decline, TUG, number of meds, hypothyroidism, anemia, AF, BPH, cataract, CKD, depression, diabetes, hyperlipidemia, hypertension, IHD, RA/OA, stroke/TIA Adaptive Lasso ƒ (4-fold CV Gamma=0.1) 2081 0.7645 (0.7365 -0.7924) 0.7616 (0.7368 -0.7863) 0.1290 27 variables: age, sex, race, dual -eligible, SQ, living -alone, albumin, cholesterol, KPS, ADL -decline, TUG, number of meds, number of labs, diagnosis -count, cancer, anemia, asthma, AF, BPH, cataract, CKD, colorectal cancer, depression, hyperlipidemi a, hypertension, IHD, stroke/TIA 53 Table 2. 7. 
(cont™d) Elastic Net ƒ (validation data) 2081 0.7644 (0.7364 -0.7923) 0.7631 (0.7385 -0.7876) 0.1287 32 variables: age, sex, race, dual -eligible, SQ, living -alone, albumin, cholesterol, KPS, ADL -decline, TUG, number of meds, number of labs, diagnosis -count, cancer, pressure -ulcer, hypothyroidism, anemia, asthma, AF, BPH, cataract, CKD, colorectal cancer, depression, endometrial -ca, glaucoma, hyperlipidemia, hypertension, IHD, RA/OA, stroke/TIA Elastic Net ƒ (4-fold CV) 2055 0.7653 (0.7371 -0.7935) 0.7668 (0.7420 -0.7916) 0.1270 33 variables: age, sex, race, dual -eligible, SQ, living -alone, smoking, albumin, cholesterol, KPS, ADL -decline, TUG, number of meds, number of labs, diagnosis -count, cancer, pressure -ulcer, hypothyroidism, anemia, asthma, AF, BPH, cataract, CKD, colorectal cancer, depression, endometrial -ca, glaucoma, hyperlipidemia, hypertension, IHD, RA/OA, stroke/TIA Manual variable selection Full model 2055 0.7653 (0.7370 - 0.7935) 0.7664 (0.7415 - 0.7912) 0.1270 All 41 variables included, no selection method Manual variable selection - final model 2290 0.7719 (0.7476 - 0.7962) 0.7634 (0.7410 -0.7859) 0.1437 11 variables: age, race, dual - eligible, SQ, albumin, cholesterol, KPS, ADL -decline, hyperlipidemia, depression Forced to the model: sex 54 Table 2. 7. (cont™d) Model developed in imputed data and applied to the available data Backward variable selection - in the imputed data 2636 0.7854 (0.7648 - 0.8060) 0.7624 (0.7422 - 0.7826) 0.1564 18 variables: age, dual -eligible, SQ, albumin, cholesterol, KPS, ADL -decline, anemia, CKD, hyperlipidemia, depression, hypertension, rheumatoid arthritis, pressure -ulcer, Cataract, osteoporosis, number of meds, number of labs S.Q: surprise question; KPS: Ka rnofsky performance scale; TUG: timed up and go; ADL: activities of daily living; AF: atrial fibrillation; HF: heart failure; CKD: chronic kidney disease; RA/OA: rheumatoid arthritis/osteoarthritis; IHD: ischemic heart diseases; BPH: benign prostatic hyper plasia; TIA: transient ischemic attack; *The numbers are different because first the variable selection was done in PROC GLMSELECT and then variables included in PROC LOGISTIC to generate AUCs, so not all the variables included in the final model ƒAdaptive lasso and elastic net methods were conducted using two methods of validation and weighting parameters (Table 2.8) When applied to the validation data, the different variable selection strategies resulted in AUCs that were very similar for all mo dels. The confidence intervals around the c -statistic are also comparable in width, so the precision of the C -statistics are also similar. The stepwise selected model (AUC=0.7697) had the highest c -statistic, although the difference between it and other mo dels is trivial and of no practical importance. Likewise the difference in the Brier score between different models is small, although this metric indicates slightly better fit in the models that were based on adaptive lasso and elastic net selection metho ds. Although the advanced variable selection methods made minimal differences in the discrimination of the model, the number of selected variables was much more than with the stepwise and manual methods. 
Thus there was no evidence that any of the variable selection approaches had significantly better performance than the other methods in terms of discrimination ability (c-statistic); however, the manually selected model had good performance (c-statistic of 0.7634), is parsimonious (only 11 variables), and is clinically logical compared to the other models (it includes demographics, functional status, and indicators of nutritional status, including albumin and cholesterol).

Some variables were consistently selected in the different models regardless of the variable selection method, including albumin, cholesterol, ADL decline, SQ, KPS, race, and dual eligibility for Medicare and Medicaid. This emphasizes the central importance of these variables in the prediction of mortality in this cohort of older adults. Functional status variables were also shown to be important in the prediction of the adverse outcomes. Other variables, including age, sex, and TUG, were frequently selected but not in all models. The most variation between the different models was observed for the medical history variables: hyperlipidemia and depression were often selected, but other CCW variables, such as endometrial and colorectal cancer, were only occasionally selected. The number of medications was also selected by multiple variable selection methods; it can represent the general health of the patient as well as the frequency and severity of different conditions.

Table 2.8 displays the results of specifying different gamma values and different selection rules in the adaptive lasso variable selection method for 1-year mortality. Gamma is a parameter in the adaptive weight calculation, and the alternative selection rules in the adaptive lasso method are k-fold cross-validation or the use of validation data. Table 2.8 was generated to help determine the appropriate gamma to be used in adaptive lasso variable selection. The number of effects is the total number of selected effects, counting each level of a classification variable as a separate dummy variable; the number of variables is therefore often smaller than the number of effects shown in Table 2.8. This table indicates that the optimal gamma differs between the two selection rules (cross-validation or a separate validation dataset), although the difference in the optimal model criteria (ASE and CV PRESS) between the different gammas is minimal. The average squared error (ASE) and the cross-validation predicted residual sum of squares statistic (CV PRESS) are the model fit summary statistics used for variable selection; a lower score on either criterion means a better fit of the model.

Table 2.8. Different gamma values for adaptive lasso variable selection for 1-year mortality

Selection rule: validation data (optimal model criterion ƒ: ASE)
Gamma=0 | 38 effects | 0.1298
Gamma=0.1 | 36 effects | 0.1298
Gamma=0.3 | 34 effects | 0.1297
Gamma=0.5 | 31 effects | 0.1296
Gamma=0.7 | 31 effects | 0.1295
Gamma=0.9 | 31 effects | 0.1294
Gamma=1.0 * | 29 effects | 0.1294

Selection rule: 4-fold CV (optimal model criterion ƒ: CV PRESS)
Gamma=0 | 34 effects | 0.1117
Gamma=0.1 * | 33 effects | 0.1117
Gamma=0.3 | 34 effects | 0.1118
Gamma=0.5 | 32 effects | 0.1118
Gamma=0.7 | 31 effects | 0.1119
Gamma=0.9 | 31 effects | 0.1119
Gamma=1.0 | 30 effects | 0.1120

*Selected gamma based on the criteria and the number of variables; ƒ Average squared error (ASE) and CV PRESS are error measures that represent the goodness of model fit.
Figures 2.4 and 2.5 demonstrate the process of adding and removing variables using adaptive lasso and elastic net variable selection methods, respectively. The bottom panel in each figure shows the average squared error (ASE) of each model. It illustrates the lowest ASE of the selected model that can be correlated to the predictors in the model in the top panel. Both figures show that a few steps before step 40, the minimum ASE was achieved and after that it is a plateau with no more gain from adding or rem oving variables. SAS output provides a table of details of the variable selection process at each step. 57 Figure 2. 4. Adaptive lasso variable selection process using GLMSELECT for the mortality outcome (gamma=1.0 and validatio n dataset) Figure 2. 5. Elastic net variable selection process using GLMSELECT for the mortality outcome (validation dataset) 58 - Imputed data analysis To compare and choose the optimal number of imputations multiple -imputation was performed with 5 and 20 imputations. Tables 2.8 and 2.9 show the parameter estimates and variances from the multiple imputation procedure. These tables contain information on the continuous variables only, despite the fact that both continuous and clas sification variables were included in the model and imputed. PROC MI (SAS version 9.4) does not report the summary statistics for classification variables. Therefore the summary tables in SAS results (Tables 2.8 and 2.9) include information on only continu ous variables that have missing data, although the model includes all the variables with and without missing data. As described in methods, parameter estimates, variances, and confidence intervals for the variables were similar between the 5 and 20 imputat ions; the number 20 was selected for imputation to maximize the relative efficiency. Table 2. 9. Parameter estimate s for the continuous variables from multiple imputation procedure - compari son of 20 and five imputations Variable Mean Std Error 95% Confidence Limits DF Min Max Mu0 t for H0: Mean= Mu0 Pr > |t| Parameter Estimates (20 Imputations) Albumin Result 3.41 0.01 3.40 3.42 3260.5 3.41 3.41 0 585.70 <.0001 Cholesterol Result 167.7 2 0.54 166.66 168.77 1547.9 167.3 8 167.98 0 312.15 <.0001 KPS 44.43 0.12 44.19 44.66 7407.8 44.42 44.44 0 371.49 <.0001 Parameter Estimates (5 Imputations) Albumin Result 3.41 0.01 3.40 3.42 892.26 3.41 3.41 0 584.03 <.0001 Cholesterol Result 167.7 0 0.53 166.66 168.74 674.25 167.5 0 167.85 0 315.79 <.0001 KPS 44.43 0.12 44.19 44.66 7352.4 44.42 44.44 0 371.48 <.0001 Only continuous variables that includes missing values are outputs of the multiple imputation procedure 59 Table 2. 10. 
Variance information for the continuous from multiple imputation procedure - compari son of 20 and 5 imputations Variable Variance DF Relative Increase in Variance Fraction Missing Information Relative Efficiency Between Within Total Variance Information (20 Imputations) Albumin Result 0.000002 0.00003 0.00003 3260.5 0.062 0.056194 0.997 Cholesterol Result 0.03 0.26 0.29 1547.9 0.11 0.098090 0.995 KPS 0.00003 0.01 0.01 7407.8 0.002 0.002386 0.999 Variance Information (5 Imputations) Albumin Result 0.000002 0.00003 0.00003 892.26 0.07 0.06 0.993 Cholesterol Result 0.02 0.26 0.28 674.25 0.08 0.08 0.999 KPS 0.00003 0.01 0.01 7352.4 0.002 0.002 0.999 Only continuous variables that includes missing values are outputs of the multiple imputation procedure The original data set with 7445 observations were used in multiple imputation, with 20 imputations the imputed data consisted of 148900 (7445 *20) observations. An indicator of the subgroups (derivation or validation) for each patient was added to the dataset before imputati ons, thus the derivation and validation subgroups are fixed across the 20 imputations. The alternative variable selection methods were applied to the imputed data following the same steps described previously in the method section. AUCs were generated for derivation and validation data by analyzing the individual predicted probabilities of outcome (average of predictions in 20 imputations) against the observed outcome. Table 2.11 displays the results of variable selection methods for the 1 -year mortality in the imputed data. 60 Table 2. 11. Model development using alternative variable selection methods for 1 -year mortality using imputed data, AUCs for both derivation and validation data sets Variable selection Derivation AUC (N=3722) Validation AUC N=(3723) Variables * Automatic selection methods Stepwise 0.7880 0.7730 15 variables: age, dual -eligibility, SQ, albumin, cholesterol, KPS, ADL -decline, anemia, CKD, hyperlipidemia, pressure -ulcer, cancer, number of meds, number of labs, diagnosis -count Forward 0.7879 0.7728 15 variables: age, dual -eligibility, SQ, albumin, cholesterol, KPS, ADL -decline, anemia, CKD, hyperlipidemia, pressure -ulcer, cancer, number of meds, number of labs, diagnosis -count Backward 0.7877 0.7756 18 variables: age, dual -eligibility, SQ, albumin, cholesterol, KPS, ADL decline, anemia, cataract, CKD, depression, hyperlipidemia, hypertension, osteoporosis, rheumatoid arthritis, pressure ulcer, number of meds, number of labs Manual selection Manual variable selection from Imputed data & 0.7812 0.7663 15 variables: age, dual -eligibility, SQ, albumin, cholesterol, KPS, ADL decline, anemia, CKD, hyperlipidemia, rheumatoid arthritis, pressure ulcer, cancer, number of meds, number of labs Manual variable selection - from available case data, applied to the imputed data # 0.7634 0.7541 11 variables: age, race, dual -eligible, SQ, albumin, cholesterol, KPS, ADL decline, hyperlipidemia, depression Forced to the model: sex S.Q: surprise question; KPS: Karnofsky performance scale; TUG: timed up and go; ADL: activities of daily living; AF: atrial fibrillation; HF: heart failure; CKD: chronic kidney disease; AUCs in the imputed data are based on the average of 20 predictions for each individual from the 20 imputations; *Variables that are selected in all 20 imputations built the final model; &Variables that are selected >15 times in all three methods (forward, backward, stepwise); # From Table 2.7; 61 - Comparison of the risk strati fication models 
Comparing the models developed in the available data (Table 2.7), the best model was the manually selected model: it is parsimonious, while its discrimination is similar to that of the other, much more complex models (i.e., models with twice as many variables). The models developed in the imputed data did not improve the AUC compared to the available data; backward selection had the best AUC among the variable selection methods in the imputed data (Table 2.11). These two models (manual selection in the available data and backward selection in the imputed data) were both applied to the available-case data and were compared to the two alternative approaches proposed by the USMM providers: the SQ and the 3-level risk stratification. Table 2.12 describes the prevalence of each risk level in the cohort under these two approaches.

Table 2.12. Prevalence of the risk levels determined by the USMM risk stratification approaches (N=7,445)

Risk stratification approach | Risk level | N (%) ƒ
SQ* | High risk (answer=No) | 1045 (14.0)
SQ* | Low risk (answer=Yes) | 5381 (72.3)
3-level risk approach | High risk (level 5) | 532 (7.2)
3-level risk approach | Intermediate risk (level 4) | 678 (9.1)
3-level risk approach | Low risk (level 3) | 4817 (64.7)

*Surprise question; ƒ There are missing values on the SQ and the other variables used in the 3-level risk approach (as reported in Table 2.4), hence the totals do not add to 100%; the USMM risk stratification approaches were proposed by USMM providers.

Table 2.13 displays the AUCs of the four alternative risk stratification models in this cohort (i.e., SQ, 3-level, manually selected logistic model, and backward selection in the imputed data). Sensitivity and specificity of the models are also provided, obtained by defining high-risk and low-risk groups and comparing the observed events in each group. The high-risk group in the manually selected model was identified as the top 10% and top 20% of the predicted probabilities from the model. These cutoff points were selected arbitrarily to show the impact of different cutoffs on the number of patients who are falsely categorized. The final decision about the appropriate cutoff value for risk stratification must be made considering the resources that the company can allocate to the interventions for the different risk levels. For example, if the planned interventions for the high-risk group are costly and resources (including money, facilities, and human resources) are very limited, then a more stringent cutoff such as the top 10% seems practical; whereas if the cost of interventions for the high-risk group is relatively low and resources can support them for a larger number of patients, then the cutoff can be more relaxed (top 20%). Another approach would be to use a cutoff at a predicted probability of 0.50; this approach is not suitable for our data, since the predicted probability of the outcome in this study has a mean and median of about 0.15, which makes a probability of 0.50 a very high bar for the high-risk definition in this cohort. The high-risk definitions in the two approaches currently in use by the USMM providers are given in the methods section. In the surprise question approach, high-risk patients are those who answer "No" to the surprise question. For the 3-level USMM risk stratification approach (Figure 2.3), the high-risk group was defined twice, first as level 5 only and then as levels 4 and 5, as shown in Table 2.13.
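As an illustration, a minimal SAS sketch of how these two high-risk definitions can be operationalized is given below; all variable names (sq, albumin, fall_since_last, p_1, death1yr, and so on) are placeholders, and the assignment of low albumin directly to level 5 is an assumption about the decision tree rather than a documented rule.

   /* (a) the USMM 3-level approach written as a decision tree                   */
   data usmm_levels;
      set cohort;
      recent_event = (fall_since_last = 1 or hosp_since_last = 1 or er_since_last = 1);
      if albumin < 2.5 then usmm_level = 5;                        /* assumed: low albumin -> high risk */
      else if sq = 'No'  and recent_event     then usmm_level = 5; /* high risk          */
      else if sq = 'No'  and not recent_event then usmm_level = 4; /* intermediate risk  */
      else if sq = 'Yes' and albumin > 2.5    then usmm_level = 3; /* low risk           */
      else usmm_level = .;                                         /* missing inputs left unclassified */
   run;

   /* (b) the multivariable model: flag the top 20% of predicted probabilities as
      high risk and cross-tabulate against the observed outcome                  */
   proc rank data=valid_scored groups=100 out=ranked_pct;
      var p_1;
      ranks pct;                               /* percentile groups 0-99          */
   run;
   data classified;
      set ranked_pct;
      high_risk = (pct >= 80);                 /* top 20% of predicted risk       */
   run;
   proc freq data=classified;
      tables high_risk*death1yr;               /* counts give sensitivity, specificity, PVP, NPV */
   run;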
Sensitivity and specificity of each model can help policymakers in the corporation make a better decision about the risk cutoff point based on the cost of false positive versus false negative cases.

Table 2.13. Comparison of the alternative risk stratification approaches for 1-year mortality (N=3,723 validation)

Model | AUC validation* | N analyzed | High-risk group (prevalence) | Sensitivity | Specificity | PVP ⁄ | NPV §
SQ only | 0.5552 (0.5400-0.5705) | 3227 | Answer "No" (14%) | 24.1% | 86.9% | 43.6% | 73.2%
USMM 3-level risk stratification | 0.5994 ƒ (0.5814-0.6173) | 3043 | Level 5 (7%) | 18.1% | 95.0% | 58.3% | 75.0%
 | | | Levels 4 and 5 (17%) | 33.9% | 84.9% | 46.5% | 76.9%
Manual selection (from available) | 0.7634 (0.7410-0.7859) | 2312 | Top 10% | 25.1% | 94.0% | 52.8% | 82.6%
 | | | Top 20% | 44.5% | 86.5% | 46.7% | 85.5%
Backward selection (from imputed) | 0.7624 (0.7422-0.7826) | 2694 | Top 10% | 24.7% | 94.8% | 60.7% | 79.4%
 | | | Top 20% | 43.2% | 87.6% | 53.3% | 82.5%

*To make the results comparable, the AUCs for the SQ model and the USMM model were also generated from the validation data; ƒ The USMM risk level was included in the model as a 3-level predictor for the AUC calculation; Sensitivity is the proportion of deaths that are classified as high risk by the model; Specificity is the proportion of non-deaths that are classified as low risk by the model; ⁄ PVP: predictive value positive, the proportion of model-identified high-risk cases who are truly high risk; § NPV: predictive value negative, the proportion of model-identified low-risk cases who are truly low risk.

Both multivariable models have much higher AUCs than the current USMM approaches. The fact that the additional variables in the multivariable model are already being routinely collected and recorded in the USMM database makes this model an excellent approach for risk stratification. As mentioned above, the choice of a cutoff point depends on multiple factors, including the company's resources, the cost of the interventions for each risk level, and the cost of misclassification of patients. Sensitivity and specificity show the proportions of high-risk and low-risk patients that are correctly classified by each model. For example, consider applying the model to a group of 1,000 patients. The mortality rate is 32% in the USMM population, which means 320 of the 1,000 patients would die within 12 months. The manual selection model's sensitivity of 45% (at the 20% cutoff) means that about 144 of the 320 patients who died would be classified as high risk by the model. Also, of the 680 who did not die within a year, about 592 would be classified as low risk by the model (specificity) and 88 as high risk. The predictive value positive and negative (PVP and NPV) represent the percentage of the high-risk group who actually died (PVP) and the percentage of the low-risk group who survived (NPV). Predictive values change depending on the prevalence of the outcome. The mortality rate was 32% in this cohort, and the PVP and NPV of the model were 47% and 86%. This means that when the model is applied to the 1,000 patients using the 20% cutoff, of the 200 patients classified as high risk, about 94 would die and 106 would survive after a year; and of the 800 patients classified as low risk, about 688 would survive and 112 would die. Although the sensitivity and specificity of the multivariable models are better than those of the current approaches, the overall sensitivity is still low, which means that none of these models are very good when used for screening older adults to identify high-risk patients.
However, altering the cutoff value to classify more patients in the high -risk group increases the sensitivity. The predictive values of different models do not differ vastly, although the PVN of the manual selection model is slightly better than the other models. Finally when the appropriate cutoff was determined, the model could be programmed and integrated into the USMM database system. The high -risk patients that are identified based on the model, can be flagged and brought to the attention of the providers for reevaluation and interventions if indicated. - Final model selection When comparing the alternative approaches in predicting 1 -year mortality among this cohort, the manually selected mul tivariable model has the highest c -statistic. Sensitivity and specificity of this model can be optimized by changing the cutoff point that divides population to high and low -risk levels. Table 2.14 contains odds ratios and parameter estimates of the final predictive model for 1 -year mortality among this cohort of older adults. Examining the odds ratios and confidence intervals of the variables, the strongest predictors of mortality are ADL -decline and low albumin. Both of these variables are clinically esse ntial indicators of the patient's global health. Albumin level can serve as a surrogate for inflammatory status and also the nutritional status of a patient; decline in ADLs indicates functional impairment. Low cholesterol was also associated with higher o dds of death. Surprisingly, having a history of hyperlipidemia showed a protective effect for mortality. Increasing KPS, being dual eligible, and black race were all associated with a lower risk of death in this population. Dual eligibility and black race both are more common in age groups <75 years old than older. Thus the residual confounding can be the main reason for this relationship. The most important 65 predictors of death in this model are functional and nutritional indicators which are known clinica lly relevant and robust components of general health status. Table 2. 14. Final model parameter estimates and odds ratios for 1 -year mortality using derivation dataset (N=3722) Odds Ratio Estimates Parameter estimates Predictor variables Point Estimate 95% Wald Confidence Limits Parameter estimate P-value ADL -decline, Decline vs. No -change 0.790 0.577 1.081 -0.2356 0.1407 ADL -decline, Improve vs. No -change 0.096 0.023 0.397 -2.3422 0.0012 Albumin, <3.2 vs 3.8+ g/dl 3.750 2.613 5.382 1.3218 <.0001 Albumin, 3.2 -<3.5 vs 3.8+ g/dl 1.884 1.303 2.725 0.6336 0.0008 Albumin, 3.5 -<3.8 vs 3.8+ g/dl 1.486 1.015 2.175 0.3959 0.0417 Race, Black vs. White 0.588 0.415 0.833 -0.5306 0.0028 Race, Other vs. White 0.442 0.197 0.991 -0.8156 0.0475 Surprise question, No vs. Yes 2.073 1.533 2.803 0.7289 <.0001 Cholesterol, <136 vs 195+ mg/dl 1.959 1.384 2.772 0.6724 0.0001 Cholesterol, 136 -<164 vs 195+ mg/dl 1.191 0.839 1.690 0.1747 0.3285 Cholesterol, 164 -<195 vs 195+ mg/dl 1.304 0.923 1.843 0.2658 0.1317 CCW -Hyperlipidemia Yes vs. No 0.531 0.417 0.676 -0.6334 <.0001 Age, 75 -84 years vs. 65 -74 years 1.711 1.180 2.481 0.5372 0.0046 Age, 85 -94 years vs. 65 -74 years 1.804 1.259 2.584 0.5898 0.0013 Age, 95+ years vs. 65 -74 years 1.602 0.953 2.693 0.4712 0.0755 KPS, Severe vs. Moderate disability* 1.543 1.199 1.986 0.4340 0.0007 CCW -Depression, Yes vs. No 0.654 0.478 0.896 -0.4244 0.0082 Dual -eligibility, Yes vs. No 0.687 0.509 0.929 -0.3751 0.0146 Sex, Male vs. 
Female 1.151 0.886 1.497 0.1411 0.2917

IQR: interquartile range; sd: standard deviation; S.Q: surprise question; KPS: Karnofsky performance scale; TUG: timed up and go; ADL: activities of daily living; IADL: instrumental activities of daily living; TIA: transient ischemic attack; FU: follow-up; mg/dl: milligram per deciliter; g/dl: gram per deciliter; *KPS was included in the final model as a categorical variable based on the clinical application of the KPS value; ƒ Sex was included in the final logistic model although the Wald test for its coefficient was not statistically significant.

- Calibration plots

Calibration plots were generated for the final multivariable model applied to the validation dataset using two methods: loess-based and decile-based. Figures 2.6 and 2.7 display the loess-based and decile-based calibration plots, respectively. Both plots show a small deviation from the 45-degree (diagonal) line, which represents perfect prediction. The maximum deviation is around a predicted probability of 0.25, while at lower and higher probabilities the model shows a better fit to the data. The calibration plot in the derivation data was also generated as a reference. Figure 2.8 shows the decile-based calibration plot generated from the derivation data, where the calibration curve aligns very well with the diagonal line (almost perfect prediction). The prediction model in the validation data slightly underestimates the probability of 1-year mortality, especially in the deciles representing an intermediate range of death (observed risk of 0.2-0.4).

Figure 2.6. Loess-based calibration plot for the multivariable logistic model in the validation data for the outcome of 1-year mortality

Figure 2.7. Decile-based calibration plot for the multivariable logistic model in the validation data for the outcome of 1-year mortality

Figure 2.8. Decile-based calibration plot for the multivariable logistic model in the derivation data for the outcome of 1-year mortality

I also generated Hosmer-Lemeshow goodness-of-fit test statistics. The test evaluates the lack of fit of the model; a small p-value indicates lack of fit. The Hosmer-Lemeshow goodness-of-fit test for the final multivariable logistic model resulted in p-values of 0.372 and <0.0001 for the derivation and validation datasets, respectively, indicating a statistically significant lack of fit for the model in the validation data. This result is consistent with the calibration plots, where the prediction model underestimated the probability of events compared to the observed events, especially at the lower probabilities of death.

o Outcome: Hospice admission

In the following section I show the results for the outcome of hospice admission, using the same modeling strategy used above for mortality. Hospice admission in this cohort was defined according to the date of the first hospice service documented in the claims data. A total of 1,391 (18.7%) patients were admitted to hospice over the follow-up time. The hospice admission rate within a year of the first visit was 10% in this cohort. Death occurred within six months of admission in 492 (65%) of those admitted to hospice within 12 months of their first visit. Of all 1,124 hospice deaths, 68% happened in the first three months after admission. Overall, 2,221 deaths (66% of all deaths) in this cohort occurred without hospice.

- Available data analysis

The same modeling approaches were utilized as for the mortality outcome.
o Outcome: Hospice admission
In the following section I show the results for the outcome of hospice admission, using the same modeling strategy used above for mortality. Hospice admission in this cohort was defined by the date of the first hospice service documented in the claims data. A total of 1391 (18.7%) patients were admitted to hospice over the follow-up time, and the hospice admission rate within a year of the first visit was 10%. Among those admitted to hospice within 12 months of their first visit, 492 (65%) died within six months of admission. Of all 1124 hospice deaths, 68% occurred in the first three months after admission. Overall, 2221 deaths (66% of all deaths) in this cohort occurred without hospice.

- Available data analysis
The same modeling approaches were used as for the mortality outcome, and the independent variables were also the same. Automatic variable selection methods (stepwise, forward, backward, adaptive lasso, and elastic net) and manual selection were applied. With each selection method, a model was developed in the derivation dataset and applied to the validation dataset, and the area under the ROC curve and the Brier score were generated for each model. The results are provided in Table 2.15.

Table 2.15. Model development using alternative variable selection methods for hospice admission using available case data; AUC and 95% confidence limits for the derivation and validation datasets (N=3722 derivation and 3723 validation)

Variable selection                              N analyzed*   Derivation AUC            Validation AUC            Validation Brier score   Selected variables in the final model
Automatic variable selection methods
Stepwise                                        2055          0.7819 (0.7502-0.8137)    0.6981 (0.6699-0.7262)    0.0886                   4 variables: age, dual eligibility, SQ, KPS
Forward                                         2055          0.8091 (0.7795-0.8387)    0.7272 (0.6976-0.7568)    0.0874                   9 variables: age, race, dual eligibility, SQ, lives alone, KPS, ADL decline, cataract, heart failure
Backward                                        2055          0.7962 (0.7648-0.8276)    0.7295 (0.7006-0.7585)    0.0858                   8 variables: age, race, dual eligibility, SQ, lives alone, KPS, heart failure, number of lab tests
Adaptive lasso† (validation data, gamma=1.0)    2199          0.8173 (0.7881-0.8465)    0.7440 (0.7101-0.7779)    0.0764                   18 variables: age, race, dual eligibility, SQ, TUG, ADL decline, KPS, albumin, number of meds, number of labs, pressure ulcer, cataract, osteoporosis, RA/OA, hyperlipidemia, hypertension, hip fracture, diagnosis count
Adaptive lasso† (4-fold CV, gamma=0.1)          2229          0.8036 (0.7737-0.8335)    0.7276 (0.6939-0.7614)    0.0776                   17 variables: age, race, dual eligibility, SQ, TUG, KPS, lives alone, albumin, number of meds, number of labs, pressure ulcer, cataract, hip fracture, hyperlipidemia, hypertension, HF, AF
Elastic net† (validation data)                  2191          0.8181 (0.7891-0.8471)    0.7339 (0.7003-0.7674)    0.0779                   18 variables: age, race, dual eligibility, SQ, lives alone, TUG, ADL decline, KPS, albumin, number of meds, number of labs, pressure ulcer, AF, HF, cataract, hyperlipidemia, hypertension, hip fracture
Elastic net† (4-fold CV)                        2081          0.8241 (0.7947-0.8536)    0.7313 (0.6956-0.7671)    0.0772                   20 variables: age, race, dual eligibility, SQ, TUG, ADL decline, KPS, lives alone, albumin, cholesterol, number of meds, number of labs, pressure ulcer, AF, HF, cataract, hyperlipidemia, hypertension, hip fracture, colorectal cancer
Manual variable selection
Full model                                      2055          0.8276 (0.7983-0.8569)    0.7090 (0.6709-0.7471)    0.0783                   All 41 variables; no variable selection
Manual selection, available data                2601          0.7749 (0.7473-0.8026)    0.7351 (0.7055-0.7646)    0.0864                   7 variables: age, race, dual eligibility, SQ, KPS, ADL decline; forced into the model: sex
Model developed in imputed data and applied to the available data
Manual selection, imputed data                  3051          0.7602 (0.7335-0.7868)    0.7090 (0.6803-0.7376)    0.0877                   6 variables: age, race, dual eligibility, SQ, KPS, ADL decline

SQ: surprise question; KPS: Karnofsky performance scale; TUG: timed up and go; ADL: activities of daily living; AF: atrial fibrillation; HF: heart failure; RA/OA: rheumatoid arthritis/osteoarthritis.
*The number analyzed differs across methods because variable selection was performed first and the selected variables were then entered into PROC LOGISTIC to generate the AUCs; since not all variables were included in each final model, not all observations with missing data were excluded.
†Adaptive lasso and elastic net methods were conducted using two validation options and weighting parameters (Table 2.16).

Overall, the alternative variable selection methods resulted in comparable c-statistics and Brier scores. A larger c-statistic and a smaller Brier score both indicate a better-fitting model. To set a benchmark, the maximum Brier score for a non-informative model was based on the incidence of the outcome in the validation data (10%), and a maximum value of 0.25 was used as the Brier score of a non-informative model. The largest validation AUCs were seen with the adaptive lasso (0.7440) and manual (0.7351) selection methods; however, the manually selected model included far fewer variables than the adaptive lasso model (7 vs. 18). The Brier score was slightly better for the adaptive lasso model, indicating a marginally better fit to the data, although the difference between the models was small. Similarly, the gain in AUC with adaptive lasso selection was tiny and of no practical importance, while the resulting model was larger, with more than twice as many predictors as the manually selected model.

A few variables were consistently selected regardless of the selection method, including age, dual eligibility, SQ, and KPS. Increasing age and a "No" answer to the SQ were associated with higher hospice admission, whereas dual eligibility and higher KPS decreased the risk of hospice admission, as they did for the mortality outcome. The protective association for dual eligibility may reflect residual confounding by age, but it could also be due to selection bias: the dual-eligible group may differ from other patients in unobserved ways that lead to lower rates of the outcomes. Race and ADL decline were also frequently selected. Black race was associated with a lower rate of the outcome than white race, which is likely explained in part by residual confounding by age. As expected, the functional status of patients is an important predictor of hospice admission, and the surprise question, which captures the physician's assessment of the patient's prognosis, was also a good predictor of hospice referral. Interestingly, sex was not a predictor of hospice admission, and the nutritional status indicators (albumin and cholesterol) were not significant predictors of hospice admission, although they were essential predictors of mortality.

As in the mortality analysis, the weighting parameter gamma for the adaptive lasso was selected by testing different levels of gamma under two validation options (k-fold cross-validation or a separate validation dataset) in PROC GLMSELECT. The results are reported in Table 2.16; the number of effects is the total number of variables selected, counting each level of a classification variable as a separate dummy variable.
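The penalized selections summarized in Table 2.15 were run in PROC GLMSELECT; a minimal sketch of the two tuning set-ups (separate validation data vs. 4-fold cross-validation) is shown below. Dataset and variable names are placeholders, only a subset of the 41 candidate predictors is listed, and the options used to vary the adaptive-weight power (gamma) are not reproduced here, so this illustrates the general structure rather than the exact specification behind Table 2.16.

/* adaptive lasso tuned on a held-out validation data set */
proc glmselect data=derivation valdata=validation plots=coefficients seed=2015;
   class race sex dual_elig sq kps_cat adl_decline / param=ref;
   model hospice_1yr = age race sex dual_elig sq kps_cat adl_decline tug
                       albumin cholesterol n_meds n_labs   /* ...remaining predictors */
         / selection=lasso(adaptive choose=validate stop=none);
run;

/* elastic net tuned by 4-fold cross-validation */
proc glmselect data=derivation plots=coefficients seed=2015;
   class race sex dual_elig sq kps_cat adl_decline / param=ref;
   model hospice_1yr = age race sex dual_elig sq kps_cat adl_decline tug
                       albumin cholesterol n_meds n_labs   /* ...remaining predictors */
         / selection=elasticnet(choose=cv stop=none) cvmethod=random(4);
run;

Coefficient-progression output of this kind, with the selection criterion plotted beneath the coefficient paths, is what underlies displays such as Figures 2.9 and 2.10.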
Table 2.16. Different gamma values in adaptive lasso variable selection for 1-year hospice admission

Validation data (optimal model criterion: ASE†)
Gamma        Number of selected effects   ASE
Gamma=0      21 effects                   0.0759
Gamma=0.1    21 effects                   0.0758
Gamma=0.3    20 effects                   0.0757
Gamma=0.5    24 effects                   0.0755
Gamma=0.7    23 effects                   0.0754
Gamma=0.9    22 effects                   0.0754
Gamma=1.0*   21 effects                   0.0754

4-fold cross-validation (optimal model criterion: CV PRESS†)
Gamma        Number of selected effects   CV PRESS
Gamma=0      18 effects                   0.0717
Gamma=0.1*   19 effects                   0.0717
Gamma=0.3    15 effects                   0.0718
Gamma=0.5    20 effects                   0.0718
Gamma=0.7    9 effects                    0.0719
Gamma=0.9    8 effects                    0.0720
Gamma=1.0    8 effects                    0.0720

*Selected gamma, based on the optimality criterion and the number of variables.
†Average square error (ASE) and CV PRESS are error measures that represent the goodness of model fit.

The optimal gamma for each of the two methods was chosen to minimize the deviation from the true outcome (i.e., ASE or CV PRESS). The selected gammas were then used to generate the adaptive lasso results shown in Table 2.15.

Figures 2.9 and 2.10 illustrate the process of adding and removing variables with the adaptive lasso and elastic net selection methods, respectively. The lower panel in each figure shows the average squared error of the model at each step, so the lowest ASE can be matched to the set of predictors shown in the top panel. Figure 2.9 shows that, with adaptive lasso selection, the optimal model was reached at step 21, corresponding to the lowest ASE value of 0.075; the corresponding model consists of 18 variables. Figure 2.10 shows the analogous optimality criteria for the elastic net selection method. The same graphical output can be generated for both methods when 4-fold cross-validation is specified in the selection option of the MODEL statement.

Figure 2.9. Adaptive lasso variable selection process using GLMSELECT for the hospice admission outcome (gamma=1.0 and validation dataset)
Figure 2.10. Elastic net variable selection process using GLMSELECT for the hospice admission outcome (validation dataset)

- Imputed data analysis
With the same considerations as in the mortality analysis, a multiple imputation procedure with 20 imputations was performed for the hospice admission analysis, and the same modeling approach was applied as for 1-year mortality. The c-statistics from the different models were generated for the derivation and validation data and are provided in Table 2.17. In addition, the variables selected in the manual selection available case analysis (Table 2.15) were applied to the imputed dataset, and the AUC was generated for the imputed validation data.
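The imputation step itself can be sketched as follows. This is a minimal illustration assuming placeholder dataset and variable names, default fully conditional specification (FCS) settings, and the 20 imputations used in the analysis; it is not the exact imputation model applied to the USMM data.

/* 20 imputations by fully conditional specification (default methods) */
proc mi data=derivation nimpute=20 seed=20150101 out=imp_derivation;
   class race sex dual_elig sq adl_decline kps_cat;
   fcs;
   var age sex race dual_elig sq adl_decline kps_cat tug albumin cholesterol
       n_meds n_labs;
run;

/* fit the selected hospice model in each completed data set */
proc logistic data=imp_derivation;
   by _Imputation_;
   class race dual_elig sq adl_decline kps_cat / param=ref;
   model hospice_1yr(event='1') = age race dual_elig sq kps_cat adl_decline;
   output out=imp_pred p=p_hat;
run;

/* average each patient's predicted probability over the 20 imputations before
   computing the AUC; the validation data would be imputed and scored analogously */
proc means data=imp_pred noprint nway;
   class patient_id;
   var p_hat;
   output out=avg_pred mean=p_hospice;
run;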
Table 2.17. Model development using alternative variable selection methods for 1-year hospice admission using imputed data; AUC and 95% confidence limits for the derivation and validation datasets

Variable selection                                         Derivation AUC (N=3722)   Validation AUC (N=3723)   Selected variables*
Automatic selection
Stepwise                                                   0.7359                    0.7001                    15 variables: age, dual eligibility, SQ, KPS, ADL decline, albumin, cholesterol, anemia, CKD, hyperlipidemia, pressure ulcer, cancer, number of meds, number of lab tests, diagnosis count
Forward                                                    0.7373                    0.6992                    6 variables: age, dual eligibility, SQ, KPS, AF, number of lab tests
Backward                                                   0.7339                    0.6885                    9 variables: age, dual eligibility, SQ, KPS, AF, depression, heart failure, number of lab tests, diagnosis count
Manual selection
Manual variable selection, imputed data†                   0.7227                    0.7027                    6 variables: age, dual eligibility, SQ, KPS, ADL decline, number of lab tests
Manual variable selection (from available case analysis)‡  0.7204                    0.6934                    7 variables: age, race, dual eligibility, SQ, KPS, ADL decline; forced into the model: sex

SQ: surprise question; KPS: Karnofsky performance scale; TUG: timed up and go; ADL: activities of daily living; AF: atrial fibrillation; HF: heart failure; RA/OA: rheumatoid arthritis/osteoarthritis.
*Variables that were selected in all 20 imputations.
†Includes variables that were selected 15 times in all three methods (forward, backward, stepwise).
‡The model is presented in Table 2.15.

Overall, the models developed in the imputed data showed no improvement over the available case models; in fact, the AUCs of the different models in the imputed data were generally smaller than those in the available case data. Because the focus of this study is developing prediction models, the primary performance measure for comparing the models is discrimination as measured by AUC. Sensitivity and specificity are also reported, in Table 2.18, as an additional measure for comparing the alternative models.

- Comparison of the risk stratification models
The models developed manually from the available case data and the imputed data were compared with two alternative approaches: SQ only, and the USMM 3-level risk stratification. Considering the AUCs, the manually selected model in the available case data has the best discrimination, and its sensitivity and specificity at the cutoff of the top 20% of predicted probability are the best combination. However, the selection of the cutoff point depends on the costs and savings associated with false positive and false negative cases for the provider system.
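The sensitivity and specificity figures reported in Table 2.18 follow from a simple cross-tabulation of predicted risk group against the observed outcome; a sketch for the top-20% cutoff is shown below, again with placeholder names (val_scored, P_1, hospice_1yr).

/* find the 80th percentile of predicted risk in the validation data */
proc univariate data=val_scored noprint;
   var P_1;
   output out=cut pctlpts=80 pctlpre=p;
run;

/* flag the top 20% of predicted probabilities as the high-risk group */
data classified;
   if _n_ = 1 then set cut;           /* brings in the cutoff value p80 */
   set val_scored;
   predicted_high = (P_1 >= p80);
run;

/* 2x2 table: sensitivity = TP/(TP+FN), specificity = TN/(TN+FP) */
proc freq data=classified;
   tables predicted_high*hospice_1yr / nocol nopercent;
run;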
Table 2.18. Comparison of the alternative risk stratification approaches for 1-year hospice admission

Model                             N validation (total=3723)   Validation AUC*           High-risk group (prevalence)   Sensitivity   Specificity
SQ only                           3227                        0.5895 (0.5633-0.6157)    Answer "No" (14%)              32.4%         85.5%
Current USMM model                3043                        0.5875† (0.5591-0.6158)   Level 5 (7%)                   12.4%         91.7%
                                                                                        Levels 4 and 5 (17%)           35.5%         81.4%
Manual selection                  2590                        0.7351 (0.7055-0.7646)    Top 10%                        25.9%         91.8%
                                                                                        Top 20%                        45.9%         81.4%
Manual selection, imputed data‡   3070                        0.7090 (0.6803-0.7376)    Top 10%                        25.6%         91.8%
                                                                                        Top 20%                        42.4%         79.9%

*To make the results comparable, the AUCs for the SQ and USMM models were also generated from the validation data.
†The USMM risk level was included in the model as a 3-level predictor for the AUC calculation.
‡The model was developed in the imputed data and applied to the available case data.

- Final model selection
Comparing the alternative variable selection methods in this study, the manually selected model shows the best discrimination while remaining parsimonious. Penalized selection methods such as the adaptive lasso and elastic net had comparable c-statistics, although the number of variables in those models is much larger than in the manually selected model. Table 2.19 shows the parameter estimates and odds ratios from the manually selected model. Considering the significance and magnitude of the odds ratios of the different variables in the final model, ADL decline is the most informative predictor of hospice admission; as with the mortality prediction, functional status is the main predictor of the need for hospice care. Age and dual eligibility for Medicare and Medicaid are the next most important variables. Surprisingly, dual eligibility decreases the probability of hospice admission. Unlike the mortality outcome, the nutritional status indicators (albumin and cholesterol) did not contribute to the prediction of hospice admission.

Table 2.19. Final model parameter estimates and odds ratios for 1-year hospice admission using the derivation dataset (N=3722)

Predictor variable                        Odds ratio (95% Wald CI)   Parameter estimate   P-value
ADL decline, Decline vs. No change        1.034 (0.735-1.454)         0.0335              0.8473
ADL decline, Improve vs. No change        0.086 (0.012-0.628)        -2.4477              0.0155
Age, 75-84 years vs. 65-74 years          2.392 (1.386-4.127)         0.8720              0.0017
Age, 85-94 years vs. 65-74 years          3.345 (1.986-5.633)         1.2073              <.0001
Age, 95+ years vs. 65-74 years            3.870 (2.055-7.286)         1.3531              <.0001
KPS, Severe vs. moderate disability*      3.125 (2.239-4.361)         1.1393              <.0001
Surprise question, No vs. Yes             2.131 (1.547-2.934)         0.7566              <.0001
Dual eligibility, No vs. Yes              2.023 (1.336-3.064)         0.7045              0.0009
Race, Black vs. White                     0.654 (0.426-1.004)        -0.4242              0.0522
Race, Other vs. White                     0.547 (0.213-1.404)        -0.6035              0.2095
Sex, Male vs. Female†                     1.165 (0.866-1.567)         0.1525              0.3137

*KPS was included in the final model as a categorical variable based on the clinical application of the KPS value.
†Sex was included in the final logistic model although the Wald test for its coefficient was not statistically significant.

Discussion
The study population - The USMM patient population is unique in terms of demographics and functional status. These patients are older and sicker, with more comorbidity and disability, than most of the similar study populations referred to in the background section.
At the same time, these patients remain community-dwelling and so differ from institutionalized or hospitalized patients. They are homebound by the CMS definition (5), which means the patient needs the help of another person or of medical equipment, such as a walker or a wheelchair, to leave their home, or their doctor believes that the patient's health or illness could worsen if they leave their home. (111) Therefore, many of the previously developed prognostic indices are not applicable to this population. (17-19, 21-24) For example, indices that include physical activities such as walking several blocks are probably not as relevant in this population as in a healthier older population. (49) Also, because this population differs from those who are institutionalized, prognostic models developed in hospital or nursing home settings may not be as accurate here. (62) The population most similar to the USMM population is the PACE participants, who are nursing home eligible but community-living older adults; however, the 1-year mortality rate in the PACE cohort studied by Carey et al. (13%) is much lower than in the USMM population. USMM patients have a high frequency of comorbidities and multimorbidity compared with similar studies of community-dwelling older patients. (7,49,52,57) Their functional status measures, including TUG (45% non-ambulatory or >30 seconds) and KPS (54% with a severe need of assistance), indicate a high prevalence of impaired function, which makes this group of patients especially vulnerable and prone to adverse outcomes. The 1-year mortality rate of 32% in this cohort is much higher than in comparable studies, which reported 1-year mortality between 9% and 13%. (52,57) The high mortality rate in this cohort is instead comparable to that of nursing home populations, reported at 17-35% in different studies, (59-61) indicating that the USMM population resembles nursing home residents in terms of mortality and other adverse outcomes. The study reported by Carey et al. used the PACE patient population, which by definition is nursing home eligible, yet the 1-year mortality rate was 13%. (57) As discussed in chapter one, the lower mortality rate of that population may be explained by the fact that PACE patients are adults aged 55 years and older who need a nursing home level of care; the PACE population might therefore include younger adults with disabilities that made them eligible for long-term care but did not necessarily increase their risk of 1-year mortality. These unique characteristics of the USMM population make them prone to adverse outcomes and denote a need for a risk stratification approach developed specifically for this population. Additionally, other RS indices often involve variables that are not available in this population; for example, level of income and detailed information on dependency in functional status (e.g., bathing, dressing) are not available in the USMM data. This model has been developed for use in the USMM patient population, although it could be tested for use in other similar populations.

Important predictor variables - We tested all available predictor variables, including demographics, socioeconomic status, comorbidity, functional status, laboratory tests, and other variables such as the surprise question, smoking status, the number of lab tests ordered, and the number of medications.
However, we should note that some previously established predictors of adverse outcomes (such as the number of hospitalizations in the past year, decline in IADLs, and recent falls) were not available for analysis in this cohort of USMM data. ADL decline, age, race, SQ, KPS, and dual eligibility were important predictors of both outcomes, mortality and hospice admission. Functional impairment in ADLs has been shown to predict adverse outcomes in hospitalized older adults. (112,113) Our findings are consistent with previous studies in showing that ADL decline is an important predictor of death and hospice admission in the USMM patient population.

Examining the parameter estimates and p-values of the final model, ADL decline, serum albumin, and cholesterol were the strongest predictors of the mortality outcome in this cohort. This information can be useful for designing interventions that focus on nutrition, inflammatory status, and functional empowerment of older adults while they are still at lower risk of mortality. These findings can also help USMM modify their policies on the timing and frequency of lab tests and assessments of patient function. Interestingly, serum albumin and cholesterol were not selected in the final model for the hospice outcome, whereas ADL decline was an important predictor of hospice admission. This observation suggests that impaired functional status is more likely than biochemical laboratory results to prompt a hospice referral. Although low levels of albumin and cholesterol were associated with a higher mortality rate, a history of hyperlipidemia was, surprisingly, correlated with a lower mortality rate in this cohort. A plausible explanation is that a low cholesterol level in this population of older and/or frail patients represents worse global health status than a history of hyperlipidemia, which might be mild or appropriately treated. In particular, statins, a lipid-lowering class of medication, have been shown to increase survival in CVD patients. (114,115) A low cholesterol level can also reflect poor nutritional status, due either to poor general health or to an underlying disease. (116) Surprisingly, a history of depression in this cohort also had a protective effect on the mortality outcome; unobserved confounders may explain this observation. Dual eligibility was likewise associated with lower rates of death and hospice admission.

Variable selection methods - We tested and compared different methods of variable selection. We applied commonly used automated methods such as stepwise, backward, and forward selection, as well as more advanced penalized methods including the adaptive lasso and elastic net; however, the advanced methods did not show superiority over the conventional selection methods in this dataset. One of the main benefits of penalized selection methods (like adaptive lasso and elastic net selection) arises with high-dimensional datasets that have numerous predictors and a relatively small number of observations. Another advantage of these methods is in datasets with highly correlated predictor variables. (102) Neither of these conditions was present in our dataset.

Importance of missing data - We evaluated the association of missingness on several variables with the outcomes. Variables with less than 20% missing observations were analyzed in the univariate and multivariable analyses for model development.
We found a strong association between missingness in these variables (race, TUG, SQ, ADL, living alone, albumin, and cholesterol) and mortality (Table 2.6). A possible reason is that patients who died were too sick at the time of the visit to be interviewed and evaluated thoroughly, so some of these variables were left missing. To assess the impact of missingness on variable selection, we applied a multiple imputation approach and repeated the model development. We used different variable selection methods in the imputed data, but the models' performance in the imputed data was not as good as in the available case data. As noted previously, multiple imputation assumes that data are missing at random, whereas in the USMM data there is evidence suggesting that the data are not missing at random; this can explain why the models perform worse in the imputed data than in the available case data. Variable selection in the imputed data resulted in a larger number of selected variables (15 vs. 9) but only a slight improvement in discrimination compared with the available case data. Also, when the model developed from the imputed data was applied to the available case data, the AUC was actually slightly lower than that of the final manually selected model developed in the available case dataset (0.755 vs. 0.763).

Application of the developed model - Comparing the results of this study with the approaches currently practiced by the USMM providers showed the superiority of our multivariable models, regardless of their exact specification. Using the two multivariable logistic models, the AUCs for mortality and hospice admission were substantially higher (0.763 and 0.735) than for the two alternative approaches: the USMM proposed 3-level risk model (0.599 and 0.588) and the surprise-question-only model (0.555 and 0.589). Comparing different cutoff points for our model and the USMM proposed 3-level model, the sensitivity and specificity estimates of our model are similar to or higher than those of the current model for both outcomes. Consequently, at any cutoff point this model can help providers better manage the cost of services and patient benefits by reducing the number of false positives or false negatives. The optimal cutoff point can be selected by providers and policymakers based on the costs of misclassification: if misclassifying a high-risk patient is more costly than misclassifying a low-risk patient, then a more relaxed cutoff (the top 20% of predicted probability instead of the top 10%) is appropriate, and vice versa. The costs and benefits of services depend on several factors, such as the patient's benefit from receiving, or harm from forgoing, a given service; the cost of that service for the provider and the insurance companies; the alternative options for that service; and the available resources. Thus, the cutoff point should be explicitly selected by the USMM providers. Furthermore, the population can be grouped into more than two risk levels when needed, especially if different levels of services are available, for example palliative care, hospice referral, home health care, and preventive services. This approach can also be useful in a clinical setting; for example, the predicted 1-year mortality risk can inform the decision to offer a screening procedure to an older adult patient.
Our risk stratification model can be especially advantageous for advance care planning by identifying patients who are at high risk of mortality or hospice admission. Considering the patients' and caregivers' goals, different services can be offered to patients at different levels of risk. To use the developed model in practice, statistical software (e.g., SAS, R) will be used to integrate the final model into the USMM database. By programming the model into APRIMA, the model can be run on all observations each time new data are entered, generating a predicted probability of death and of hospice admission for each patient. Patients will then be stratified into risk groups based on thresholds determined by the USMM providers, and high-risk patients will be flagged and brought to the attention of their physician for further assessment. Clinicians can discuss different services with the patient and caregiver to reach a decision that is aligned with the patient's goals of treatment.

o Strengths
This study was conducted in the unique population of USMM patients. The richness of the database allowed us to use a broad array of potential predictor variables (e.g., demographics, clinical characteristics, functional status, medical history, and lab tests) to develop the model, whereas such information is not accessible in studies that rely on billing data alone to build a prognostic model. Moreover, we used and compared several alternative variable selection methods, including newer methods such as adaptive lasso and elastic net selection. We also used multiple imputation to manage the missing data and to evaluate the impact of missing observations on model selection.

o Limitations
Although the USMM database is a relatively rich dataset in both the quality and quantity of the information collected from this patient population, some variables were not usable because of missing observations. Valuable information was lost on functional status, including decline in IADL function since the last visit, decline in global health since last year, falls, hospitalizations, and ER events. Another limitation of this research was missing data more generally. One assumption of the MI procedure is that data are missing at random. We confirmed that the missingness mechanism is not MCAR (missing completely at random), and although it is not possible to statistically distinguish between MAR and MNAR, the strong association between missingness on the predictors and the outcomes suggests an MNAR mechanism. We nonetheless applied multiple imputation to these data despite the questionable MAR assumption. Two comorbidity variables were excluded from the analysis because the number of patients with the comorbidity was too small or zero. Finally, we validated our model using validation data originating from the same USMM database; we did not use an external population to evaluate the accuracy of the model.

Conclusion
We developed prognostic models for the prediction of two adverse outcomes, mortality and hospice admission, in a population of community-living homebound older adults. Both models performed significantly better than the risk stratification approaches currently used in this population, and they demonstrated comparable or better discrimination than similar prognostic models published in the literature.
These two models can be used for risk stratification among older adults in different settings (i.e., community-living, nursing home, rehabilitation centers, and hospice) and can also be useful in other epidemiological studies to adjust for baseline risk among such populations of older adults. Future studies are required to validate the models in external populations. Furthermore, other risk stratification models can be developed in this population in an attempt to improve on these prognostic models; survival analysis of time to event, and machine learning techniques that can reveal possible nonlinear relationships between the outcome and the predictors, are included in the subsequent chapters.

CHAPTER 3. Random Forest Model

Introduction
The population is aging faster than at any other time in history. (9,10) Increasing age is associated with a high prevalence of chronic diseases and multiple comorbid conditions, which often require long-term care and frequent utilization of health care. (11,12) Health care expenditures are disproportionately higher in the older population than in working-age patients, (14,15) and the cost of health care for older adults imposes a considerable burden on health systems and government through the Medicare program. (13) The increasing number of older adults, their growing need for services, and limited resources together necessitate allocating services to those who will benefit the most. Risk stratification methods are required to align the appropriate levels of services with patients' needs. Using statistical methods, one can develop a prediction model for an outcome, such as 5-year mortality, and stratify patients based on their probability of that outcome; appropriate services can then be allocated to each level of risk. Different risk stratification approaches have been developed for the prediction of various outcomes, including mortality, readmission, relapse, and complications of specific diseases. There are also risk stratification models developed for older populations with a specific condition (e.g., diabetes, cancer, cardiovascular disease) or in a specific setting (e.g., emergency room, surgery, nursing home), often to predict mortality, readmission, or complications. (19,46,117,118) A few risk stratification models have been developed in community-living older populations regardless of any specific condition, (48,49,52,57,64) some of which were reviewed in the background section of chapter two.

In chapter two, logistic regression was used to develop a risk stratification model in a subset (the derivation data) of a cohort of community-living homebound older adults to predict the risk of mortality and hospice admission from a set of explanatory variables. The accuracy of the model was evaluated using the area under the ROC curve (c-statistic), and the model was validated using a validation subset derived from the same database. In this chapter, I use a machine learning (ML) technique to develop a risk stratification model in the same population and compare its performance to the previous logistic model. The derivation dataset is often called the training dataset in ML terminology; however, to be consistent with the other chapters, the same term ('derivation' data) is used for the subset of data in which the model is developed. In the next section, I provide a brief introduction to the random forest method, which is the ML algorithm used in this chapter. It is followed by literature review, methods and materials, results, and discussion sections.

Main concepts and definitions
The development and use of big data are growing rapidly in medicine and public health, as in many other industries. (119) Traditional statistical methods may not be sufficient for the analysis of these big data. The enormous sample sizes and high dimensionality of big data bring new statistical challenges, including noise accumulation (i.e., too much unexplained variability within a data sample), spurious correlation, incidental endogeneity (i.e., predictor variables that are correlated with the error term), and measurement error. (120) Likewise, big medical data introduce problems such as multicollinearity (i.e., correlation among predictors or independent variables), model complexity, the computational cost of fitting models, and model overfitting (i.e., decreased generalizability of the model). (121) Machine learning algorithms are becoming popular in big data analysis and are increasingly used in biomedical research as well; some examples are described in the following sections.

o Machine learning
Machine learning (ML) is an application of artificial intelligence that allows computers to learn and improve an algorithm automatically, without being explicitly programmed. The process of learning begins with the input data; the algorithm searches for patterns and makes predictions using an iterative approach
It is followed by a literature review, methods and materials, results, and discussion se ctions. Main concepts and definitions The development and use of big data are rapidly growing in medicine and public health, like many other industries. (119) Traditional statistical methods may not be sufficient for the analysis of these big data. The enormous sample size and high dimensionality of big data bring new statistical challenges, including noise accumulation (i.e., too much unexplained variability within a data sample), spurious correlation, incidental endo geneity (i.e., when the predictor variable is correlated with the error term), and measurement errors. (120) Likewise, big medical data introduce problems such as multicollinearity (i.e., multiple correlation between predictors or independent variables), model complexity , the comp utational cost to fit models, and model overfitting (i.e., decreased generalizability of the model). (121) Machine learning algorithms are becoming popular in big data analysis and are increasingly used in biomedical research as well. In the following sections some examples will be described. o Machine learning Machine learning (ML) is an application of artificial intelligence that allows computers to automaticall y learn and improve the algorithm without being explicitly programmed. The process of learning begins with the input data; the algorithm searches patterns and makes predictions using an iterative approach 86 in order to improve future decisions. ML techniques include a wide range of statistical methods that can be used to describe associations, search for patterns, and make predictions. ML algorithms are being increasingly used in biomedical research. There are two main methods in ML: supervised and unsupervis ed learning. Predicting an outcome based on a set of explanatory variables that are specified by data scientists is referred to as supervised learning. Whereas unsupervised learning refers to the exploration of associations or detection of patterns among v ariables regardless of a specific outcome. There are numerous different ML algorithms; neural networks, random forests, Bayes net, and support vector machines are a few examples. Random forests have been previously used in biomedical studies for the devel opment of prediction models. (65,119,122) o Machine learning in prediction models Prediction models are central in medicine and are utilized in everyday decision -making by physicians for prediction of diagnostic or clinical outcomes. (22) These models ar e often used in medical research to predict the outcome of a disease, result of a diagnostic test, the outcome of a new treatment, complications of an illness, or survival of the patients. Risk prediction usually relies on parametric regression methods, su ch as logistic regression or generalized linear model. However, new approaches such as ML techniques have been introduced in epidemiologic studies as well as in many other medical and non -medical disciplines. (74) Machine learning is often used without having any specific hypothesis regarding the association of the variables or the pattern of the associations; thus it is an excellent approach to explore the important predictors of an outcome in scenarios where they are many explanatory variables. ML algorithms are specifically preferred when the number of explanatory variables in the data is considerably larger compared to the number of observations - also referred to as big data. 
Several studies have compared different machine learning algorithms for developing prediction models and determined that random forest algorithms perform better than other machine learning approaches such as support vector machines (SVM) and Bayes nets. (73,124-126)

- Decision tree
Decision trees (also known as classification and regression trees) are recognized as powerful tools for prediction models. (22,127) Recursive partitioning is the core idea in constructing a decision tree: the dataset is divided into subsets based on several independent variables, or rules, in order to correctly classify members of the dataset. Each tree is made of nodes, branches, and leaves; the structure of an example decision tree is shown in Figure 3.1.

Figure 3.1. The schematic structure of a decision tree. Variable X is the first variable that best splits the study population; variable Y is the next best variable; classes I-IV are the leaves and represent different classes or groups of predicted risk for the outcome of interest; branches are rules that connect a node to its child nodes (internal nodes and leaves) based on the value of a predictor variable.

Nodes are decision points in a decision tree, where a predictor variable splits the study population into subgroups (child nodes) based on their observed data for that predictor; in other words, each node tests the data on an attribute (predictor), and the branches are the outcomes of that test. The first node, called the root node, uses the best predictor to split the cohort into two or more child nodes based on the optimal separation (maximum separation between subgroups and minimum variability within subgroups with respect to the outcome). (22) Internal nodes (child nodes) are then split again using the next best predictor at each node, and the splitting is repeated for each child node until the algorithm reaches the final decision or classification nodes, also called leaves.

o Random forest
Random forest is a data mining algorithm first proposed by Leo Breiman in 2001. (68,69) It combines several to potentially many decision trees (ranging from 10 to thousands) and generates predictions by averaging over all the trees in the forest; (124) the term forest represents the numerous replications of decision trees. Each decision tree is developed in a randomly selected subsample (a bootstrap sample) of the derivation population. Figure 3.2 displays the random forest algorithm as presented in the machine learning literature. (68)

Bagging, or bootstrap aggregation, is an ML technique for reducing the variance of an estimated prediction: several bootstrapped samples of the original data are constructed, separate decision trees are trained in each subsample, and the predictions are averaged over these trees. Random forest is a fundamental modification of bagging. In the bagging technique, all predictors are searched at each split point when constructing the decision trees and the best splitting variable is selected; in the random forest method, building the trees involves an additional step of randomly sampling predictors at each node. In other words, the difference between random forest and bagging is that in a random forest, at each split point of each tree, the optimal splitting variable is selected from a random subset of all predictors. This additional randomization minimizes the correlation between trees and increases the accuracy of bagging.
The number of predictors tested at each split point can be specified as a model parameter and is often set to m = sqrt(p), where m is the number of randomly selected predictors and p is the total number of predictors (for the 41 candidate predictors in this study, m would be about 6). For each bootstrap sample drawn from the derivation dataset, some observations are left out and not used in constructing that tree; these are called out-of-bag (OOB) samples. The performance of each tree, averaged over its OOB samples, is a good estimate of model accuracy, and the OOB average square error is a good estimate of the test error rate; it is generated for each tree by default in the SAS output.

Figure 3.2. Random forest algorithm for regression and classification. Source: figure adapted from 'The Elements of Statistical Learning' by Hastie, Tibshirani, and Friedman. (68)

The bootstrap samples (Z) are randomly drawn with replacement from the derivation dataset, and a decision tree is developed in each Z subsample using a randomly selected set of predictors. Once the random forest is built, a prediction for a new observation x (e.g., a patient in our study) is made by passing the observation through all trees and aggregating the predictions. When the outcome is interval (continuous), the prediction is the average over all trees; when the outcome is a classification variable, the aggregate is determined by majority vote, meaning that the class predicted most often for observation x across the trees becomes the RF prediction for that observation.

Random forests have become popular over the past decade as a statistical method in many scientific fields. (69,128) RF-based methods are used for risk stratification and for identifying important variables among a large number of potential predictors. Parametric linear regression models are powerful statistical methods for exploring linear relationships between explanatory variables and the response, and they generate a single model to fit the full dataset. However, when the data contain many explanatory variables with complex interactions, building a single best linear model can be very difficult. Random forests have been proposed as an excellent alternative for datasets with a large number of predictors and the potential for complex interactions between them. (68,128) The structure of the decision tree permits a variable to be selected at multiple splitting nodes at different depths of the tree, and a single variable can split different nodes using different rules; this structure gives the RF the capacity to handle complex interactions. Nonlinearity in the data, which requires polynomial terms in parametric models, can also be handled by a random forest.

Overfitting often occurs in a single decision tree when the tree grows deep: the model becomes so specific to the derivation data that its generalizability to external data is weakened, and overfitted models typically perform poorly when applied to validation data. Random forest overcomes this problem by averaging over hundreds of different de-correlated trees, which prevents overfitting. Averaging over the trees also diminishes their sensitivity to noise (meaningless variation) so long as the trees are not correlated, and the combination of bootstrap aggregation and random predictor selection in the random forest algorithm keeps the correlation between the trees low. (68)
Unlike regression models, the construction of a random forest does not exclude observations that are missing data on one or more independent variables. There are different ways to manage missing data in random forest development; the default is that missing values are treated as a separate category in the model-building process, so no observations are excluded from the analysis. Machine learning methods, including the random forest approach, are designed to make the most accurate predictions possible, and they have demonstrated high predictive accuracy. (128) However, to gain this accuracy, random forest models do not output the same metrics as regression models. (129) A logistic regression model, for example, provides beta coefficients (and odds ratios) that indicate the magnitude and direction of each predictor's effect on the outcome. Random forests, on the other hand, do not provide conceptual equivalents of regression coefficients or measures of effect for each predictor; in fact, because the model is nonlinear, the sensitivity of a random forest's output to an independent variable is not straightforward to formulate. Instead, random forest models output ranked importance tables for the predictors, with importance ranked by different methods according to the frequency and order with which each predictor is selected. Therefore, to compare random forest models with linear prediction models such as logistic regression, discrimination metrics (i.e., AUC) and the misclassification rate (defined as the fraction of the study population that is misclassified, [false positives + false negatives]/total) are used as standard measures of model performance. Compared with a single decision tree, random forests generalize better to new data, because the most influential predictors are selected by growing a large number of trees. However, unlike a single decision tree, the results of a random forest are not interpretable as a defined set of decision nodes, but rather as a ranked importance of predictor variables; in other words, a RF does not output a single tree that can be used manually to classify an observation based on splitting nodes. Random forests are used to rank variable importance; thus, one can identify the essential predictors but not the relationships between them. The main strengths of the random forest approach can be summarized as follows:
1. Suitable for nonlinear data (where no single linear model fits the data appropriately)
2. Avoids overfitting the data, which is a drawback of a single decision tree
3. Provides a ranking of the importance of explanatory variables
4. Is robust to noise
5. Avoids excluding observations with partly missing data
6. Excellent for complex interactions and highly correlated data (69,128)

- Random forest construction parameters
In random forest models, both the outcome and the explanatory variables can be categorical or quantitative (continuous). A random forest model can also handle missing observations on explanatory variables as legitimate values, so unlike logistic regression, observations with partly missing data on the explanatory variables are not excluded from the analysis; observations with a missing outcome, however, are still excluded. The number of trees in the forest and the depth to which each tree grows can be specified at the model-building phase.
In addition to these two parameters, other optional parameters can be specified to control the characteristics of the RF; for example, the minimum number of observations in the leaves can be set. The two main determinants of the model, however, are the number of trees and their depth. Increasing the number of trees increases the accuracy of the estimated probability because the prediction is an average over all the trees. The depth of a tree determines how far the model can grow, that is, how many times a tree can be split sequentially before reaching the final nodes (leaves); in other words, the depth of a tree indicates how closely the model fits the data. As the depth increases, the leaves contain fewer observations and the model risks overfitting the derivation data. A common mistake is to grow shallow trees (i.e., with very few splitting levels, such as 3 or 4 consecutive splits) in order to avoid overfitting; in fact, deep trees with some degree of overfitting are preferred to shallow ones, because averaging over all trees prevents the forest as a whole from overfitting the derivation data.

- Variable importance
As explained previously, random forest models do not provide coefficient estimates for the explanatory variables, and consequently there are no p-values in the RF output to measure the significance of a predictor. An alternative measure is therefore needed to evaluate the contribution of the independent variables in a RF model; this alternative is the ranked importance table of the independent variables. The importance of a variable is its contribution to the model's success, where success is defined as the accuracy of the predictions. Prediction models generally rely mostly on a few predictor variables, even though they may include many independent variables; a good measure of importance is one that identifies those few essential predictors among all candidates. Identifying variable importance helps in understanding the relationships between the predictors and the outcome, and a random forest can also serve as an initial step for selecting relatively influential predictors from a list of all possible predictors, which can then be used in other model development strategies. (23)

There are different measures for ranking variable importance, for instance Breiman's method, loss reduction, Strobl's method, and random branch assignment (RBA). The Breiman and Strobl methods are computationally intensive and often have long running times. The loss reduction method is less intensive, but it is biased towards correlated predictors and so inflates the importance of correlated variables at the expense of other independent variables. The importance of a variable under the loss reduction method is proportional to the sum of the impurity measures, summed over all the nodes that the variable splits. Impurity is often measured by the Gini splitting criterion, so this method is also called Gini impurity or Gini increase. Impurity represents how well the tree splits the data: Gini impurity measures how often a randomly chosen subject from the derivation dataset would be incorrectly labeled (with respect to the outcome) if it were labeled randomly according to the distribution of labels in the dataset. PROC HPFOREST replaces the word 'impurity' with 'loss' to denote the reduction in loss from using the model.

The random branch assignment (RBA) method is the most recently developed approach to ranking variable importance and has advantages over the other methods without their drawbacks. It was introduced in 2014 by Neville and Tan (131) and was claimed to satisfy the objectives of the previously developed methods (Breiman and Strobl) (128,132) while avoiding both the inflation of importance for correlated variables and the intensive computation. Compared with the loss reduction method, the RBA method diminishes the inflation of correlated variables and so results in the most accurate ranking of the predictors. RBA measures the importance of each variable by replacing the splitting rule with a randomized rule in the nodes that involve that variable. When the model is built in the derivation data, the proportion of observations in each node is saved; when evaluating variable importance with RBA, the observations that reach a node are randomized to its branches with probabilities proportional to the observed proportions in the derivation data. The model fit is then compared with the fit statistics of the model with the variable included, and the importance measure is proportional to the randomized fit minus the model fit without randomization. In this study, the table of ranked importance of predictors was generated using both the loss reduction and RBA methods.
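In SAS, the forest, its two main tuning parameters, and the loss-reduction importance ranking are requested through PROC HPFOREST; the sketch below shows the general form. The dataset and variable names, the binary model file path, and the ODS output table names are assumptions for illustration and may need adjusting to the installed version of the procedure.

proc hpforest data=derivation maxtrees=200 maxdepth=20 seed=2015;
   /* vars_to_try (the m predictors sampled at each split) is left at its default */
   target death_1yr / level=binary;
   input age albumin cholesterol tug n_meds n_labs / level=interval;
   input race sex dual_elig sq kps_cat adl_decline   /* ...CCW comorbidity flags */
         / level=nominal;
   ods output FitStatistics=rf_fit VariableImportance=rf_importance;
   save file='/models/rf_mortality.bin';   /* binary model file for PROC HP4SCORE */
run;

The RBA ranking is obtained afterwards from the saved model file with PROC HP4SCORE, as described in the statistical analysis section of this chapter.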
Literature review
The machine learning literature has expanded over the past few decades, and biomedical research has increasingly used ML techniques. Studies comparing machine learning methods with traditional parametric regression have often found that the performance of machine learning methods in risk prediction was superior to that of the parametric regression methods. Among machine learning techniques, random forest has been used frequently in biomedical research (65) because of the strengths summarized above. The overall goal of the literature search in this chapter was to find examples of studies that used ML techniques to develop prediction models for adverse outcomes (specifically mortality) in populations of older adults.

Search methods - Searching PubMed for the term 'random forest' returned about 8500 hits (with no other limits), and searching for 'machine learning' and 'prediction' returned a similar number. Limiting the search to 'random forest' and 'risk stratification' in the title or abstract reduced the results to 50 hits. Searching for 'risk stratification', 'random forest', and 'mortality' found 7 results, none of them relevant to community-living older adults, and adding 'older adult' or 'elderly' to the search returned no findings. Searching for the three terms 'random forest', 'risk stratification', and 'elderly' in all fields returned 34 results. Searching the Google Scholar database for 'random forest', 'risk stratification', 'mortality', and 'elderly' returned more than 20000 hits. Reviewing the first 50 hits in Google Scholar and the 34 PubMed results, along with forward and backward reference searching of any relevant article, I found four studies that fit broadly into my original goal for the literature review, i.e., to identify studies that used a random forest algorithm to develop a prediction model for an adverse outcome.
I did not find any studies exactly comparable to this thesis topic in terms of predicting mortality in community-living older adults. Khalilia et al. (75) used the Healthcare Cost and Utilization Project (HCUP) dataset to develop prediction models. They compared the performance of random forest and three other ML methods (specifically SVM, bagging, and boosting) in predicting the risk of eight disease categories: breast cancer, diabetes without complications, diabetes with complications, hypertension, coronary atherosclerosis, peripheral atherosclerosis, other circulatory diseases, and osteoporosis. The disease categories were developed by HCUP and are based on a combination of diagnosis and procedure ICD codes. When comparing AUCs, random forest outperformed the other three models for seven of the eight disease categories. (75) Schneider et al. (133) studied mortality risk in patients with acute cholangitis. They developed eleven different risk prediction models, including logistic regression with stepwise variable selection, generalized linear models with lasso penalties, and a random forest model, and found that the random forest model had the best predictive performance (AUC=0.92). Weng et al. (77) studied the performance of four machine learning techniques for predicting cardiovascular risk and compared them with the American College of Cardiology guidelines for predicting 10-year cardiovascular events. (134,135) They concluded that the machine learning algorithms performed better than the established approach, which predicts the risk of future CVD from well-known risk factors such as hypertension, cholesterol, diabetes, and smoking (using coefficients from a proportional hazards model). Chong et al. (76) compared the performance of a machine learning approach and multivariable logistic regression for the diagnosis of pediatric traumatic brain injury among emergency room patients; their results demonstrated that the machine learning model had a better AUC (0.98 vs. 0.93), sensitivity, specificity, and predictive values than the logistic regression model. Rose et al. (74) developed a super learner algorithm to predict mortality in a population of adults 54 years and older. The super learner is an ensemble machine learning approach that combines multiple machine learning algorithms into a single algorithm and gives the prediction with the lowest mean square error; they demonstrated that this super learner algorithm performed better than any single algorithm. Peng et al. (136) studied the performance of random forest in predicting 30-day mortality in patients diagnosed with spontaneous intracerebral hemorrhage (ICH). They found that the RF model (AUC=0.87) outperformed the other models, including a logistic regression model (AUC=0.78), an artificial neural network (AUC=0.81), a support vector machine (AUC=0.79), and the ICH score (AUC=0.72). (136)

In chapter two of this dissertation, a logistic regression model was developed for risk stratification in a cohort of USMM patients. About one-third of the observations were excluded from the logistic regression model due to missing data on one or more explanatory variables. A multiple imputation
However, the model developed in the imputed data did not improve the predictive performance (AUC); in fact, it was a slight decrease in the AUC of the model fo r 1-year mortality when applied to the imputed data (AUC in imputed data=0.75 vs. AUC in available data= 0.76). In this chapter, a random forest algorithm is used to develop a risk stratification model with the intent of improvement in the model performanc e. My hypothesis is that a random forest model will have a better performance than the logistic regression model because unlike the logistic model, random forest: 1. handles the missing data and so uses many more observations, 2. accounts for the potenti al non -linearity in the relationships between the explanatory and/or outcome variables, 3. able to fit data with complex interactions (which is common in the biomedical data). Methods and materials Data source - we developed the random forest model utili zing the same dataset and the same study population that were used in Chapter two. Study population - the 2015 cohort was defined as all patients who had their first ever medical visit by a USMM provider between January 1 st and December 31 st, 2015. The USM M data was linked to the claims data, and those patients who had claims data were included. To have comparable results to the logistic regression model (Chapter 2) the cohort was limited to the patients who have been followed up for at least 365 days or wh o had an outcome (death or hospice admission) within a year of their first USMM visit. Figure 2.1 displays the flow diagram of the study population. Outcome - 1-year mortality was determined if a date of death was recorded in the claims data within 12 month s of the first USMM visit. Likewise, 1 -year hospice admission was determined according to the recorded date of first hospice service in the claims data. Claims data was processed data and included the intervals of hospice or home -health services (2 weeks p eriod). Therefore the first date of earliest 98 hospice service was considered as the date of the outcome. If death happened in hospice, both outcomes (death and hospice admission) were analyzed in the respective analysis. Exposure - Variables with less than 2 0% missing observations were considered for the analysis. These data were collected from the baseline visit for each patient. Random forest model can handle missing values on explanatory variables; however, I limited the independent variables to those with less than 20% missing to have a comparable data set for both random forest and logistic regression models. A total number of 41 variables (the same as Chapter two) were analyzed as predictors, including demographics : age, gender, race; socioeconomic status: insurance status representing if a patient has dual eligibility for both Medicaid and Medicare, living alone, smoking; functional status : functional decline in ADLs, timed up and go (TUG), Karnofsky performance sc ale (KPS value); lab tests : serum albumin, cholesterol; and other variables: having a pressure ulcer, surprise question answer, number of medications, and number of lab test ordered by the provider. There are 24 medical history variables (CCW variables) as listed in Chapter two. These 41 variables are the same predictor variables that were used in chapter 2. o Statistical analysis The analyses for this study was done using SAS software (SAS Institute Inc., Cary, NC, version 9.4). The data were randomly split into two equal size cohorts, derivation and validation. 
The two main SAS procedures used for random forest modeling are PROC HPFOREST and PROC HP4SCORE. The HPFOREST procedure generates the random forest in the derivation data; the number of trees and the maximum depth are specified in this procedure. The HP4SCORE procedure is used both for scoring the validation dataset ('score' statement) and for ranking the variables' importance ('importance' statement). Scoring the validation dataset means applying the model that was developed in the derivation dataset to the validation dataset. The model is applied to all observations (even those with missing data), and a prediction is generated for each. The two statements, 'score' and 'importance', cannot be specified at the same time in PROC HP4SCORE; therefore, the HP4SCORE procedure is run separately for each of the two statements. In this analysis, the random forest model was developed in the derivation data using PROC HPFOREST. The developed model was then applied to the validation data using PROC HP4SCORE. The predicted probability of the outcome is computed individually for each patient. The receiver operating characteristic (ROC) curve and area under the curve (AUC) were generated as indicators of the discrimination of the model in both the derivation and validation data sets.

A random forest model has two main parameters that can be specified in the model development phase: the number of trees (MAXTREES) and their depth (MAXDEPTH). The number of trees determines the maximum number of decision trees developed in the forest; the default is 100 trees. The MAXDEPTH option specifies the maximum depth of a node in each tree of the forest, that is, the number of splitting rules needed to define a node. The root node therefore has a depth of 0, the children of the root node have a depth of 1, and so on. The default depth is 20. (130) To find the optimal number and depth of trees in the random forest analysis, the model was refit with different numbers (1, 10, 50, 100, and 200) and depths (2, 10, 20, and 30) of trees, and the ROC curve and respective AUC for each model were generated. The number of trees and the depth that resulted in the highest AUC were selected as the model parameters. In the derivation data set, increasing the depth and the number of trees always increases the AUC; however, when the model is applied to the validation data, there is an inflection point after which the AUC does not increase anymore. At this point, the model begins to overfit the derivation data; thus, the discrimination decreases in the validation data. Table 3.1 and Figure 3.2 in the results section demonstrate the AUCs of the random forest model with different numbers and depths of trees. Additionally, to demonstrate the effect of an increasing number of trees on the accuracy of the model, a fit statistic (average squared error) was also plotted against the different numbers of trees for the full data and for out-of-bag observations (Figure 3.5). The out-of-bag average squared error is computed among the observations that were not used to train each decision tree.

To validate the trained predictive model, in addition to generating a validation AUC, I evaluated the model accuracy by applying it to the validation data and calculating the misclassification rate (test error rate) of the predicted outcomes. A 2x2 table is generated from the predicted and observed outcomes, and the misclassification rate is calculated by adding the number of false positives and false negatives and dividing the sum by the total number of cases.
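A minimal sketch of this two-step HPFOREST/HP4SCORE workflow is given below. The data set names, the abbreviated input list, the seed, and the saved file name are illustrative assumptions, not the exact code used in this analysis (the real model used all 41 predictors).

/* Step 1: grow the forest in the derivation data and save it to a binary file. */
proc hpforest data=derivation maxtrees=200 maxdepth=10 seed=1234;
   target death_1yr / level=binary;
   input age albumin_result cholesterol_result kps_value / level=interval;
   input race sex dual_eligible surprise_question / level=nominal;
   save file="rf_mortality.bin";
run;

/* Step 2: apply the saved forest to the validation data.
   HP4SCORE scores every record, including those with missing predictors. */
proc hp4score data=validation;
   id patient_id death_1yr;
   score file="rf_mortality.bin" out=rf_scored;
run;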
The tables of ranked importance were generated for all predictors using both methods: loss reduction and random branch assignment (RBA). Similar to what was done for the logistic regression model (second chapter), calibration plots and the Hosmer-Lemeshow goodness-of-fit test were produced using the predicted probabilities generated by the random forest model in the validation dataset.

The random forest model was compared to the logistic regression model that was developed previously in the same cohort. The model performances were compared using the AUC and the misclassification rate, and the ROC curves for the two models were drawn in a single plot to make the comparison easier. Additionally, to evaluate the performance of the RF model relative to the logistic regression model regardless of the number of observations included in the analysis, the RF model was also applied to the imputed data. In the second chapter of this dissertation, multiple imputation was used to impute the missing data, and the logistic regression model was then applied to the imputed data. In this chapter, the RF model was applied to the same imputed data used in the logistic regression chapter, and the AUCs of the two models in the imputed data were compared. Moreover, to ensure a fair comparison between the RF and logistic models, two more analysis steps were done. First, the RF model that was developed in the derivation data was applied to the exact same patients in the validation cohort that were used in the logistic model (i.e., those with no missing observation on any of the model predictor variables). Second, the logistic model was applied to the validation cohort with missing observations on the predictor variables recoded as a legitimate category. The AUCs of the models were then compared to assess whether the inclusion of missing observations induced the difference between the two models' performance.
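When both models' predicted probabilities are available for the same validation patients, their ROC curves can be overlaid and their AUCs contrasted in PROC LOGISTIC, as sketched below; the data set SCORED and the variables P_RF and P_LR are assumed names used only for illustration.

/* Sketch: overlay the two ROC curves and test the AUC difference,
   assuming SCORED holds the observed outcome and both predictions. */
proc logistic data=scored;
   model death_1yr(event='1') = / nofit;         /* no refitting; compare supplied predictions */
   roc 'Random Forest'       pred=p_rf;
   roc 'Logistic Regression' pred=p_lr;
   roccontrast reference('Logistic Regression'); /* chi-square test of the AUC difference */
run;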
Results

o Study population
The study cohort consisted of 7445 patients who had their first medical visit by a USMM provider in the calendar year 2015 and were followed for at least 12 months. Figure 2.1 displays the flow diagram of the study population. The patients were 66% female and 63% white, the average age was 82 ± 9 years, 99% had Medicare coverage, and 27% were dual eligible (both Medicare and Medicaid). Functional status measured by KPS demonstrated severe disability (KPS 40, defined as the need for essential assistance and specialized care) in 54% of the patients. The prevalences of hypertension, hyperlipidemia, diabetes, and cancer were 81%, 50%, 34%, and 8%, respectively. Over 50% of patients had 5 or more medical conditions. The study population is the same as that used in chapter two (logistic regression); Table 2.4 presents the population characteristics. The minimum and maximum follow-up times for this cohort were 1 and 865 days, respectively, with a mean (standard deviation) of 413 (210) days and a median (interquartile range) of 444 (q1=244, q3=581) days (Table 2.5). Overall, during the total follow-up time, 45% of the cohort died and 19% were admitted to hospice; the mortality and hospice admission rates within the first year of follow-up were 32% and 10%, respectively. Among hospice-admitted patients, 765 (55%) died within three months of their admission. Overall, 2680 deaths (80% of all deaths) occurred outside of hospice.

o Outcome: one-year mortality

- Random forest development
Figure 3.4 and Table 3.1 demonstrate the sensitivity of the model's AUC to the two random forest hyper-parameters, the number of trees and the depth of trees. In the derivation data set, increasing the depth and the number of trees always increases the AUC (Table 3.1).

Figure 3.3. Impact of RF hyper-parameters on the AUCs of the random forest model applied to the validation dataset - 1-year mortality. The number of trees varies between 1 and 200 and is indicated by colored lines; the depth of trees varies between 1 and 50 and is indicated on the X axis.

Applying the model to the validation data reveals the inflection point, which corresponds to the optimal number of trees and depth of the forest. Figure 3.4 shows the AUCs for the different numbers of trees (1, 10, 50, 100, and 200) and depths (2, 10, 20, and 30). The vertical axis is the AUC, and the horizontal axis is the depth of trees; patterns represent the different numbers of trees. It shows the inflection point at a depth of 10; however, when the number of trees is greater than 100, there is no meaningful change in AUC beyond a depth of 10. As shown in Table 3.1, in this data set the optimal number of trees and depth are 200 and 10, respectively, because they resulted in the highest AUC, although the differences in AUC between depths 10 and 20 and between 100 and 200 trees are small. Therefore the selection of the depth and number of trees also depends on computational considerations; for example, for a large data set on a low-capacity machine, 50 trees with a depth of 10 will be as sufficient as a model with 100 trees with a depth of 20.

Table 3.1. AUC from the random forest model in the derivation and validation data sets using different depths and numbers of trees - mortality outcome

AUC           Max-Depth=2           Max-Depth=10          Max-Depth=20           Max-Depth=30
Derivation
N-Trees=1     0.6883 (0.67 - 0.70)  0.8132 (0.80 - 0.83)  0.8519 (0.84 - 0.87)   0.8524 (0.84 - 0.87)
N-Trees=10    0.8214 (0.81 - 0.84)  0.9226 (0.91 - 0.93)  0.9809 (0.978 - 0.984) 0.9820 (0.979 - 0.985)
N-Trees=50    0.8340 (0.82 - 0.85)  0.9409 (0.93 - 0.95)  0.9924 (0.991 - 0.994) 0.9933 (0.992 - 0.995)
N-Trees=100   0.8352 (0.82 - 0.85)  0.9453 (0.94 - 0.95)  0.9943 (0.993 - 0.996) 0.9949 (0.994 - 0.996)
N-Trees=200   0.8342 (0.82 - 0.85)  0.9453 (0.94 - 0.95)  0.9951 (0.994 - 0.996) 0.9957 (0.995 - 0.997)
Validation
N-Trees=1     0.6696 (0.65 - 0.68)  0.7074 (0.69 - 0.73)  0.6664 (0.65 - 0.68)   0.6647 (0.65 - 0.68)
N-Trees=10    0.7922 (0.78 - 0.81)  0.8101 (0.80 - 0.82)  0.7990 (0.78 - 0.81)   0.7990 (0.78 - 0.81)
N-Trees=50    0.8077 (0.79 - 0.82)  0.8251 (0.81 - 0.84)  0.8224 (0.81 - 0.84)   0.8227 (0.81 - 0.84)
N-Trees=100   0.8109 (0.80 - 0.83)  0.8286 (0.81 - 0.84)  0.8266 (0.81 - 0.84)   0.8268 (0.81 - 0.84)
N-Trees=200   0.8106 (0.80 - 0.83)  0.8299 (0.82 - 0.84)  0.8291 (0.82 - 0.84)   0.8292 (0.82 - 0.84)
AUC: area under the ROC curve; Max-Depth: maximum depth of decision trees in the random forest model; N-Trees: number of decision trees in the random forest model.

Given the size of this data set, this analysis was not computationally intensive; thus, I chose to perform the analysis with 200 trees and a depth of 30. The validation AUC of about 83% in the model with N-Trees=200 and depth 10 indicates that the model has good discrimination ability. Compared to the validation AUC from the logistic regression model, the random forest shows a 7% increase in the AUC (AUC 83% in random forest vs. 76% in the logistic regression model).
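One way to produce the grid of fits summarized in Table 3.1 is a simple macro loop over the two hyper-parameters, as sketched below; the macro, data set, and file names are illustrative assumptions, and the abbreviated input list stands in for the full 41-variable list.

/* Sketch of the tuning grid: refit the forest for each (MAXTREES, MAXDEPTH)
   pair and save each fit so it can be scored on the validation data. */
%macro tune_rf(trees_list=1 10 50 100 200, depth_list=2 10 20 30);
   %local i j nt md;
   %do i = 1 %to %sysfunc(countw(&trees_list));
      %let nt = %scan(&trees_list, &i);
      %do j = 1 %to %sysfunc(countw(&depth_list));
         %let md = %scan(&depth_list, &j);
         proc hpforest data=derivation maxtrees=&nt maxdepth=&md seed=1234;
            target death_1yr / level=binary;
            input age albumin_result kps_value / level=interval;
            input race surprise_question / level=nominal;
            save file="rf_m&nt._d&md..bin";
         run;
         /* Each saved forest is then scored on the validation data with
            PROC HP4SCORE and its validation AUC recorded (see earlier sketch). */
      %end;
   %end;
%mend tune_rf;
%tune_rf;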
To demonstrate the effect of an increasing number of trees on the accuracy of the model, a fit statistic (average squared error) was also plotted against the different numbers of trees for the full data and for out-of-bag observations (Figure 3.5). As expected, the average squared error for the out-of-bag sample is higher than that for the full data. The out-of-bag average squared error is computed among the observations that were not used to train each decision tree. The average squared error becomes stable at about 40-50 trees for the OOB sample; after this point, adding trees to the forest does not decrease the prediction error any further. The conclusion from Table 3.1 and Figures 3.4 and 3.5 is that 200 trees are more than enough for building the forest in these data. A depth of 10 can also be enough, although when the depth was increased to 30 there was no evidence of overfitting to the derivation data, i.e., the AUC in the validation data did not decrease meaningfully. Thus the parameters for the construction of the RF were selected as 200 trees and a depth of 30.

Figure 3.4. The average squared error of the RF model by the number of trees for both OOB (top line) and full data (lower line).

- Variable importance
All explanatory variables were evaluated for importance, and the ranked importance table was generated using both the random branch assignment (RBA) and loss reduction methods. Table 3.2 contains the first ten ranked important variables in the random forest model based on the RBA and loss reduction methods; complete tables of importance measures for all variables using both methods are found in the appendix. The RBA results are provided for both the derivation and validation groups, whereas the loss reduction importance is reported only for the derivation data. The reason is that loss reduction is a product of PROC HPFOREST, which develops the model in the derivation data, whereas RBA importance is computed in PROC HP4SCORE, which applies the developed RF algorithm to any given data, including the validation and derivation sets. Among the highest ranked variables, TUG, albumin, race, age, KPS, and cholesterol are consistently among the first ten important variables in the RBA importance table. The variables ADL decline and SQ ranked high in the loss reduction method; however, as discussed in the background section, the loss reduction method can be biased toward the importance of correlated variables. The medical history (CCW) variables appeared among the first ten variables only inconsistently. The number ten was selected arbitrarily to compare the results of the random forest 'Importance' statement with the final logistic regression model results.
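As an implementation aside, the loss-reduction importance table produced by PROC HPFOREST can be captured to a data set through ODS, as sketched below. The ODS table name VariableImportance and the column name Gini are assumptions (ODS TRACE prints the exact names to the log), and the input list is abbreviated.

/* Sketch: capture the loss-reduction variable importance table to a data set
   so it can be sorted and compared against the logistic regression results. */
ods trace on;                                      /* prints ODS table names to the log */
ods output VariableImportance=vi_lossreduction;    /* assumed ODS table name */
proc hpforest data=derivation maxtrees=200 maxdepth=10 seed=1234;
   target death_1yr / level=binary;
   input age albumin_result kps_value / level=interval;
   input race surprise_question / level=nominal;
run;
ods trace off;

proc sort data=vi_lossreduction;
   by descending Gini;                             /* assumed column name for the Gini reduction */
run;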
Table 3.2. The first ten ranked important variables in the random forest model - Mortality outcome

Rank  RBA - Validation data          RBA - Derivation data      Loss reduction - Derivation data
1     TUG Answer                     Albumin result             TUG Answer
2     Albumin result                 Cholesterol result         Race
3     Race                           TUG Answer                 ADL decline
4     ADL decline                    Age                        Surprise question
5     KPS value                      Diagnosis-count            CCW-Hyperlipidemia
6     Age                            Race                       Lives alone
7     Cholesterol result             KPS value                  Tobacco use
8     Dual eligible                  CCW-Hyperlipidemia         Albumin result
9     CCW-Depression                 Number of Medications      CCW-Cataract
10    CCW-Chronic Kidney Disease     Surprise question          CCW-Alzheimer's

- Comparison to the logistic regression model
There are different methods to evaluate the importance of variables in logistic regression; standardization of the coefficients, odds ratios, and Wald test results are a few examples. (137,138) None of these methods is universally agreed upon by data scientists. I used the odds ratios and Wald test p-values to evaluate the importance of predictors in the logistic regression model developed in the second chapter. Considering the odds ratios and parameter estimates in the logistic model, ADL-decline had the largest effect among all the variables, followed by albumin, race, SQ, and cholesterol (Table 3.3). TUG was not selected in the final logistic model because it was not significantly associated with the outcome. It is noteworthy that in the logistic regression analysis, more than 20% of the observations had a missing value on the TUG variable and were therefore excluded from the analysis, whereas the random forest model uses all observations regardless of missing values on the explanatory variables. This may explain why TUG was not a significant predictor of the outcome in the logistic regression model but is the first-ranked important variable in the RF.

Table 3.3. The variable importance in the logistic regression model (by estimates and significance) - Mortality outcome
Effect: Odds ratio point estimate (95% Wald confidence limits); Parameter estimate; P-value
ADL-decline, Decline vs. No-change: 0.790 (0.577 - 1.081); -0.2356; 0.1407
ADL-decline, Improve vs. No-change: 0.096 (0.023 - 0.397); -2.3422; 0.0012
Albumin, 3.2-<3.5 vs. 3.8+ g/dL: 1.884 (1.303 - 2.725); 0.6336; 0.0008
Albumin, 3.5-<3.8 vs. 3.8+ g/dL: 1.486 (1.015 - 2.175); 0.3959; 0.0417
Albumin, <3.2 vs. 3.8+ g/dL: 3.750 (2.613 - 5.382); 1.3218; <.0001
Race, Black vs. White: 0.588 (0.415 - 0.833); -0.5306; 0.0028
Race, Other vs. White: 0.442 (0.197 - 0.991); -0.8156; 0.0475
Surprise question, No vs. Yes: 2.073 (1.533 - 2.803); 0.7289; <.0001
Cholesterol, 136-<164 vs. 195+ mg/dL: 1.191 (0.839 - 1.690); 0.1747; 0.3285
Cholesterol, 164-<195 vs. 195+ mg/dL: 1.304 (0.923 - 1.843); 0.2658; 0.1317
Cholesterol, <136 vs. 195+ mg/dL: 1.959 (1.384 - 2.772); 0.6724; 0.0001
CCW-Hyperlipidemia, Yes vs. No: 0.531 (0.417 - 0.676); -0.6334; <.0001
Age, 75-84 years vs. 65-74 years: 1.711 (1.180 - 2.481); 0.5372; 0.0046
Age, 85-94 years vs. 65-74 years: 1.804 (1.259 - 2.584); 0.5898; 0.0013
Age, 95+ years vs. 65-74 years: 1.602 (0.953 - 2.693); 0.4712; 0.0755
KPS, Severe vs. Moderate disability: 1.543 (1.199 - 1.986); 0.4340; 0.0007
CCW-Depression, Yes vs. No: 0.654 (0.478 - 0.896); -0.4244; 0.0082
Dual-eligibility, Yes vs. No: 0.687 (0.509 - 0.929); -0.3751; 0.0146
Sex, Male vs. Female: 1.151 (0.886 - 1.497); 0.1411; 0.2917*
*Sex was included in the final logistic model, although the Wald test for its coefficient was not statistically significant.
A correlation plot was generated to evaluate the correlation of the predicted probabilities between the two models, logistic regression and RF (Figure 3.6). The correlation was strongly positive, with a coefficient of 0.6512 and a p-value <0.0001, when comparing the two models' predicted values among the 2312 patients who were included in both analyses. Notably, the correlation is strongest at the lowest predicted values (i.e., probabilities below 0.4), meaning that the two models agree more closely on the risk of the outcome at lower probabilities. In other words, the two models are consistent in identifying the low-risk group, but they are inconsistent in assigning patients to the high-risk category.

Figure 3.5. Correlation of the predicted probability of death between the two models (N=2312). LR predictions are on the vertical axis, and RF predictions are on the horizontal axis.

AUC, sensitivity, specificity, and misclassification (test error) rates were calculated for each model (LR and RF) at two different cut points: the top 10% and top 20% of predicted probability.

Table 3.4. Comparison of the model performance for prediction of 1-year mortality, logistic regression and random forest models (validation N=3723)
Logistic regression model: AUC validation 0.7634 (0.74 - 0.79); N analyzed 2312*; Death=485 (21%)
  Top 10% high-risk group: sensitivity 25.1%; specificity 94.0%; PVP 52.8%; PVN 82.6%; test error rate& 20.4%
  Top 20% high-risk group: sensitivity 44.5%; specificity 86.5%; PVP 46.7%; PVN 85.5%; test error rate 22.3%
Random forest model: AUC validation 0.8292 (0.82 - 0.84); N analyzed 3723; Death=1241 (33.3%)
  Top 10% high-risk group: sensitivity 24.9%; specificity 97.4%; PVP 82.8%; PVN 72.2%; test error rate 26.8%
  Top 20% high-risk group: sensitivity 46.7%; specificity 93.3%; PVP 77.7%; PVN 77.8%; test error rate 22.2%
PVP: predictive value positive; PVN: predictive value negative. *The number of observations analyzed in the logistic regression is less than the total due to missing data. &Test error rate (misclassification rate) calculated as the number of misclassified predictions divided by the total number of observations.

The AUC of the prediction model increased by 9% from the LR to the RF model (Table 3.4). Figure 3.7 shows the two ROC curves from the RF and logistic regression models. Comparison of the 95% confidence intervals (CI) of the AUC between the two models demonstrated better precision (i.e., a narrower CI) of the AUC for the random forest model than for logistic regression. The sensitivity of the two models was similar, whereas the specificity of the random forest was better than that of the LR; thus, both models had a similar ability to identify patients who died, but the RF model was better at identifying patients who lived. Comparison of predictive values is complicated by the higher prevalence of death in the RF population, which increases PVP and decreases PVN. So despite similar sensitivity values, the RF model had a lower PVN because of the higher mortality rate. The higher specificity of the RF model results in a higher PVP, but the difference in PVP between the LR and RF models is much greater because of the combination of a lower false positive rate and higher prevalence. In summary, the RF results in a substantially better PVP but a slightly lower PVN compared to the LR model; however, neither model has sufficient PVP or PVN to rule in or rule out mortality with confidence. The misclassification rate of the RF was higher than that of the LR (27% vs. 20%) when the 10% threshold was used, whereas with the 20% threshold both models had similar misclassification rates (22%). A possible explanation is that the population analyzed in the LR model is smaller than that analyzed in the RF model, because patients with partly missing data were excluded from the LR model; these excluded patients had a higher rate of mortality (Table 3.4). Again, the selection of the threshold depends on different factors, including the cost of the interventions and services for the different risk groups and the resources that the company can allocate to them.
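A short sketch of how such threshold-based measures can be produced is shown below: the 90th percentile of the predicted probability defines the top-10% high-risk group, and the resulting 2x2 table yields sensitivity, specificity, predictive values, and the test error rate. The data set and variable names (RF_SCORED, PRED, DEATH_1YR) are assumptions for illustration.

/* Find the cutoff that defines the top 10% of predicted probabilities. */
proc univariate data=rf_scored noprint;
   var pred;
   output out=cutpt pctlpts=90 pctlpre=p_;   /* creates P_90 = 90th percentile */
run;

/* Flag the high-risk group and cross-tabulate against the observed outcome. */
data classified;
   if _n_ = 1 then set cutpt;                /* carry P_90 onto every record */
   set rf_scored;
   high_risk = (pred >= p_90);
run;

proc freq data=classified;
   tables high_risk*death_1yr / norow nocol nopercent;
   /* sensitivity, specificity, PVP, PVN, and the misclassification rate
      are computed from the four cells of this 2x2 table */
run;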
Figure 3.6. Comparison of ROCs between the two models - RF and LR, logistic regression (N=2312) and random forest model (N=3723).

As shown in Table 3.4, the total number of observations in the validation data differs between the two models because of missing observations. In the logistic regression procedure, observations with a missing value on any of the variables are excluded, whereas the random forest can handle observations with missing explanatory variables; therefore, there are 1411 fewer observations in the logistic regression than in the random forest analysis. For a better understanding of the two models, the random forest was also applied to the same 2312 patients used in the logistic regression model, and the difference between the AUCs was tested using the ROCCONTRAST statement. Figure 3.8 shows the small difference between the two AUCs when the same observations are used to generate the ROCs (AUC=0.77 for random forest vs. 0.76 for logistic regression). The chi-square test showed that the difference in AUC between the two models was not statistically significant (chi-square=0.563, p-value=0.45). This finding suggests that the gain in the random forest's AUC compared to the logistic regression model is mainly due to the inclusion of all patients; in fact, including the observations with partly missing data is what increases the AUC of the random forest model in these data. Nevertheless, it is an essential advantage of the random forest algorithm that all observations, with and without missing data, can be included in the analysis.

Figure 3.7. Comparison of ROCs between the logistic regression and random forest models when using the same validation cohort in both models (N=2312).

Table 3.5. ROC and 95% confidence intervals from the RF and LR models (N=2312)
Random Forest: Mann-Whitney area 0.7709; standard error 0.0122; 95% Wald confidence limits 0.7469 - 0.7948
Logistic Regression: Mann-Whitney area 0.7634; standard error 0.0114; 95% Wald confidence limits 0.7410 - 0.7859

Table 3.6. ROC contrast between the two models, RF and LR
ROC Contrast Test Results - Contrast: Reference = Logistic Regression; DF=1; Chi-Square=0.5632; Pr > ChiSq=0.4530

In another attempt to confirm that the main advantage of the RF model in these data is due to the inclusion of missing data, the missing observations were recoded and included in the analysis as a new category. For example, a binary variable with levels 0=No and 1=Yes was given an additional level, 2=missing; this way no observation is excluded from the analysis. The LR model was applied to this population, and the AUC was much higher (N=3723, AUC=0.8379) than that of the LR model in the available data (N=2312, AUC=0.7634). This sensitivity analysis again confirms that, in this cohort, missing data carry valuable information for the prediction of adverse outcomes.

- Applying the RF model to imputed data
Additionally, the RF model was applied to the imputed dataset. The imputed data were the same data used in chapter two (logistic regression); multiple imputation was used to generate the imputed data with 20 imputations. The RF model was applied to all 20 datasets, and predictions were generated for all observations. An average of the predicted probabilities for each observation was calculated across the 20 imputations, and the ROC for the validation cohort was then generated (Figure 3.9).
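The averaging step can be done with a single PROC MEANS call once the 20 imputation-specific scored data sets are stacked, as sketched below; the stacked data set IMP_SCORED and the variable names are assumed for illustration.

/* Sketch: average each patient's 20 imputation-specific predictions into a
   single predicted probability, which is then used to build the ROC curve. */
proc means data=imp_scored noprint nway;
   class patient_id;                     /* one group per patient */
   var pred;
   output out=avg_pred mean=pred_avg;    /* PRED_AVG = mean over the 20 imputations */
run;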
The AUC for the RF model in the imputed data was slightly lower than that of the LR model in the imputed data (AUC=0.7605 vs. 0.7756) and was notably lower than the AUC of the RF model in the available data (0.8292). These results indicate that observations with partially missing data are informative and cannot simply be excluded from the analysis. Imputation of these missing observations also did not result in better discrimination in either the RF or LR model; the most probable reason is that the missingness mechanism in these data is not random. The results of modeling in the imputed data confirm again that missingness on the predictor variables is itself important in the prediction of the outcomes.

Figure 3.8. ROC from the random forest model applied to the imputed validation data (the average of 20 predictions for each individual was generated from the 20 imputed datasets). The model's ROC is compared to the null model (red line).

Table 3.7. AUC and the 95% confidence intervals from the RF model in the imputed data
ROC Association Statistics - Model: Mann-Whitney area 0.7605; standard error 0.00822; 95% Wald confidence limits 0.7444 - 0.7766

- Model's goodness-of-fit
To evaluate the model's goodness of fit, calibration plots and the Hosmer-Lemeshow test were generated. Similar to the logistic regression chapter, two methods were used to generate the calibration plots: loess based and decile based. As observed in Figures 3.10 and 3.11, the model fit is best at the lower predicted probabilities; the deviation from the perfect-fit model begins at predicted probabilities of 0.50, where the model underestimates the mortality risk.

Figure 3.9. Loess-based calibration plot for the RF model - mortality outcome - validation cohort (N=3723). The vertical axis is the observed outcome, and the horizontal axis is the predicted probability.

Figure 3.10. Decile-based calibration plot for the RF model - mortality outcome - validation cohort (N=3723). The vertical axis is the observed outcome, and the horizontal axis is the predicted probability.

The Hosmer-Lemeshow test for the random forest model was performed using the observed and predicted outcomes in the validation data set and resulted in a test statistic of 54.32 with a p-value <0.0001, indicating statistically significant lack of fit in these data. This is consistent with the calibration plots, which show loss of fit at probabilities higher than 0.50.
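A rough sketch of the decile-based calibration check is given below: patients are grouped into deciles of predicted risk, and the mean predicted probability in each decile is plotted against the observed event rate. The data set RF_SCORED and the variables PRED and DEATH_1YR are assumed names.

/* Group the validation predictions into deciles of predicted risk. */
proc rank data=rf_scored out=ranked groups=10;
   var pred;
   ranks decile;                         /* 0 = lowest risk decile, 9 = highest */
run;

/* Mean predicted probability and observed event rate within each decile. */
proc means data=ranked noprint nway;
   class decile;
   var pred death_1yr;
   output out=calib mean(pred)=mean_pred mean(death_1yr)=obs_rate;
run;

/* Decile-based calibration plot; points on the 45-degree line indicate perfect calibration. */
proc sgplot data=calib;
   scatter x=mean_pred y=obs_rate;
   lineparm x=0 y=0 slope=1;
run;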
o Outcome: one-year hospice admission
To develop a random forest model for the prediction of 1-year hospice admission and to evaluate the importance of the predictors, the same methods that were applied for the mortality outcome were used.

- Random forest development
To find the optimal parameters for the random forest in the prediction of hospice admission, different numbers (1, 10, 50, 100, and 200) and depths (2, 10, 20, and 30) of trees were tested, and the AUCs of the models are provided in Table 3.8. The developed RF was applied to both the derivation and validation datasets.

Table 3.8. AUC from the random forest model in the derivation and validation data sets using different depths and numbers of trees - hospice outcome

AUC           Max-Depth=2           Max-Depth=10          Max-Depth=20          Max-Depth=30
Derivation
N-Trees=1     0.5907 (0.56 - 0.62)  0.7884 (0.76 - 0.81)  0.8096 (0.78 - 0.83)  0.8095 (0.78 - 0.83)
N-Trees=10    0.7078 (0.68 - 0.73)  0.9503 (0.94 - 0.96)  0.9909 (0.98 - 0.99)  0.9913 (0.98 - 0.99)
N-Trees=50    0.7431 (0.72 - 0.77)  0.9866 (0.98 - 0.99)  0.9999 (0.99 - 1.0)   0.9999 (0.99 - 1.0)
N-Trees=100   0.7500 (0.73 - 0.77)  0.9893 (0.98 - 0.99)  1.00 (0.99 - 1.00)    1.00 (0.99 - 1.00)
N-Trees=200   0.7503 (0.73 - 0.77)  0.9907 (0.98 - 0.99)  1.00 (1.00 - 1.00)    1.00 (1.00 - 1.00)
Validation
N-Trees=1     0.5415 (0.51 - 0.57)  0.5648 (0.54 - 0.59)  0.5392 (0.51 - 0.57)  0.5394 (0.51 - 0.57)
N-Trees=10    0.6641 (0.64 - 0.69)  0.6556 (0.63 - 0.68)  0.6371 (0.61 - 0.67)  0.6378 (0.61 - 0.67)
N-Trees=50    0.6953 (0.67 - 0.72)  0.6885 (0.66 - 0.71)  0.6727 (0.65 - 0.70)  0.6721 (0.64 - 0.70)
N-Trees=100   0.6997 (0.68 - 0.72)  0.6986 (0.67 - 0.72)  0.6919 (0.67 - 0.72)  0.6915 (0.66 - 0.72)
N-Trees=200   0.7022 (0.68 - 0.73)  0.7028 (0.68 - 0.73)  0.6973 (0.67 - 0.72)  0.6971 (0.67 - 0.72)
AUC: area under the ROC curve; Max-Depth: maximum depth of decision trees in the random forest model; N-Trees: number of decision trees in the random forest model.

The validation AUC of about 70% in the model with N-Trees=200 and depth 10 indicates moderate to good discrimination ability. Compared to the validation AUC from the logistic regression model, the random forest shows a slightly lower AUC for the hospice admission outcome (AUC=0.7251 for logistic regression vs. 0.7028 for random forest). Figure 3.12 shows the effect of increasing the number of trees on model performance: the average squared error (a fit statistic) was plotted against the different numbers of trees for the full data and for out-of-bag observations. The conclusion from Table 3.8 and Figure 3.12 is that 200 trees are more than enough for building the forest in these data. A depth of 10 can also be enough, although when the depth was increased to 30 there was no evidence of overfitting to the derivation data, i.e., the AUC in the validation data did not decrease meaningfully. Thus the parameters for the construction of the RF were selected as 200 trees and a depth of 30.

Figure 3.11. The average squared error of the RF model by the number of trees - Hospice outcome.

- Variable importance
To evaluate the importance of the predictors for 1-year hospice admission, the same methods that were applied for the mortality outcome were used.

Table 3.9. The first ten ranked important variables in the random forest model - Hospice outcome

Rank  RBA - Validation data     RBA - Derivation data    Loss reduction - Derivation data
1     Surprise question         Age                      KPS (category)
2     Age                       Albumin result           Surprise question
3     KPS (category)            Cholesterol result       CCW-Hip fracture
4     Number of lab tests       Number of Medications    Age CCW-Cataract
5     Albumin result            Diagnosis-count          CCW-endometrial Ca
6     Living alone              KPS (category)           CCW-Lung Ca
7     Race                      Number of lab tests      CCW-Asthma
8     Dual eligible             Surprise question        Dual eligible
9     TUG Answer                TUG Answer               CCW-Prostate Ca
10    Cholesterol result        CCW-Hyperlipidemia       CCW-Glaucoma

Since RBA is the most commonly recommended method for the evaluation of variable importance in the RF model, the predictors selected by this method are compared to the LR model.
Interestingly, SQ is the first-ranked important variable in the prediction of hospice admission in this model. This is consistent with the literature on the importance of the SQ (100) and with the findings in chapter two. Age and the KPS functional indicator are the next-ranked predictors. Living alone, worse functional status, and poor nutritional condition (indicated by albumin and cholesterol levels) are all variables that indicate a possible need for hospice care. The main difference between the predictors of mortality and hospice admission is that hospice admission must be ordered by a physician to be eligible for insurance reimbursement (Medicare in this population). Therefore, factors that point to a patient's short life expectancy and inability to live at home are flags for physicians and subsequently predictors of a hospice order. It is noteworthy that in the Medicare criteria for hospice eligibility, the first requirement is that a physician confirms a life expectancy of less than 6 months, but for a patient to be covered for hospice services, they then have to give up traditional Medicare coverage (e.g., curative treatment for cancer patients). (35,139) It is notable that the loss reduction method ranked the predictors differently from the RBA method. Since the first 10 ranked variables in the RBA method are more similar to the variables in the logistic regression model for hospice, this suggests that the RBA method was better at identifying the important predictors of hospice admission.

- Comparison to the logistic regression
The logistic regression model that was developed for hospice admission in chapter 2 included 7 variables; the results of the model are presented in Table 3.10. Odds ratios and Wald test p-values were used to evaluate the importance of the predictors in the logistic regression model. Similar to the LR model for mortality, ADL-decline was the most influential predictor of 1-year hospice admission, followed by age and KPS level. Five of the seven variables in the final logistic model (surprise question, age, KPS, race, and dual eligibility) are among the 10 first-ranked important variables in the RF model by the RBA method in the validation data. Surprisingly, ADL-decline was not among the first 10 variables in the RF. This might be explained by the fact that ADL-decline was missing in 16% of observations and the missingness on ADL was not associated with hospice admission (as shown in the second chapter, Table 2.6); therefore, including all of the observations in the RF model caused the importance of ADL-decline to be ranked much lower (number 11), whereas it ranked first according to the odds ratio and p-value from the LR model. Age and KPS are ranked second and third in both the RF and LR models. Overall, the importance of the predictors of hospice admission was consistent between the LR and RF models.

Table 3.10. The variable importance in the logistic regression model (by estimates and significance) - Hospice outcome
Effect: Odds ratio point estimate (95% Wald confidence limits); Parameter estimate; P-value
ADL-decline, Decline vs. No change: 1.034 (0.735 - 1.454); 0.0335; 0.8473
ADL-decline, Improve vs. No change: 0.086 (0.012 - 0.628); -2.4477; 0.0155
Age, 75-84 years vs. 65-74 years: 2.392 (1.386 - 4.127); 0.8720; 0.0017
Age, 85-94 years vs. 65-74 years: 3.345 (1.986 - 5.633); 1.2073; <.0001
Age, 95+ years vs. 65-74 years: 3.870 (2.055 - 7.286); 1.3531; <.0001
KPS, Severe vs. Moderate disability: 3.125 (2.239 - 4.361); 1.1393; <.0001
Surprise question, No vs. Yes: 2.131 (1.547 - 2.934); 0.7566; <.0001
Dual-eligibility, No vs. Yes: 2.023 (1.336 - 3.064); 0.7045; 0.0009
Race, Black vs. White: 0.654 (0.426 - 1.004); -0.4242; 0.0522
Race, Other vs. White: 0.547 (0.213 - 1.404); -0.6035; 0.2095
Sex, Male vs. Female: 1.165 (0.866 - 1.567); 0.1525; 0.3137*
*Sex was included in the final logistic model, although the Wald test for its coefficient was not statistically significant.
A correlation plot was generated to show the correlation of the predicted probabilities of the outcome between the two models (Figure 3.13). The correlation coefficient was 0.7424 with a p-value <0.0001, using the total of 2590 observations that were analyzed in both the LR and RF. The correlation was strong and positive; however, similar to the mortality models, the best agreement is seen where the predicted probabilities are low in both models. The two models (RF and LR) showed a higher correlation in their predictions for hospice admission than for mortality (correlation 0.74 vs. 0.65).

Figure 3.12. Correlation of the predicted probability of hospice admission between the two models, logistic regression and random forest (N=2590). LR predictions are on the vertical axis, and RF predictions are on the horizontal axis.

To compare the two prediction models, their discrimination ability was compared by generating the AUC for each model. Sensitivity, specificity, predictive values, and misclassification rates were also calculated at two thresholds (top 10% and top 20%), and the results are provided in Table 3.11.

Table 3.11. Comparison of the model performance for prediction of 1-year hospice admission, logistic regression and random forest models (validation cohort)
Logistic regression model: AUC validation 0.72 (0.70 - 0.75); N analyzed 2590*; Hospice=266 (10.3%)
  Top 10% high-risk group: sensitivity 30.1%; specificity 90.5%; PVP 26.5%; PVN 91.9%; test error rate& 15.8%
  Top 20% high-risk group: sensitivity 34.6%; specificity 87.2%; PVP 23.7%; PVN 92.1%; test error rate 23.3%
Random forest model: AUC validation 0.70 (0.67 - 0.72); N analyzed 3723; Hospice=384 (10.3%)
  Top 10% high-risk group: sensitivity 25.3%; specificity 91.8%; PVP 26.1%; PVN 91.4%; test error rate 15.1%
  Top 20% high-risk group: sensitivity 41.2%; specificity 82.5%; PVP 21.2%; PVN 92.4%; test error rate 21.8%
PVP: predictive value positive; PVN: predictive value negative. *The number of observations analyzed in the logistic regression is less than the total due to missing data. &Test error rate (misclassification rate) calculated as the number of misclassified predictions divided by the total number of observations.

Unlike the analysis of the mortality outcome, the RF model did not have better discrimination than the logistic regression model in the prediction of hospice admission. Although the differences are small, at the 20% threshold the RF has a higher sensitivity but a lower specificity and a lower misclassification rate than the LR model. Predictive values were roughly similar between the two models. There is no meaningful superiority of either model in the prediction of the hospice outcome: both models are good at classifying the low-risk groups but poor at identifying the high-risk patients. The most practical point for clinicians relates to the predictive values. Looking at the PVP and PVN, when the models classify a patient as low risk, there is a more than 90% probability that the patient will not go to hospice in the next year, whereas among the patients who are classified as high risk for hospice, only about 25% will actually go to hospice. So the models have utility in ruling out the need for hospice care (with 90% certainty) but are not useful in identifying patients who will be referred to hospice (ruling in).
Figure 3.14 displays the two ROCs from the LR and RF models for the prediction of hospice admission. The two ROCs line up closely, although the AUC of the LR model is slightly larger than that of the RF model (0.7251 vs. 0.6971), and the confidence intervals imply no significant difference between the two ROCs (Table 3.12).

Figure 3.13. Comparison of ROCs between the two models - Hospice outcome, logistic regression (N=2590) and random forest model (N=3723).

The two ROCs in Figure 3.14 were generated from different numbers of observations in the validation data, i.e., 2590 observations were analyzed in the LR model and 3723 in the RF model. To assess whether applying the models to the same observations would make any difference in model discrimination, both models were applied to the 2590 observations that were included in the LR analysis. Figure 3.15 shows the ROCs from the two models in the same population. The two AUCs are very close (Table 3.12), and the statistical test for the ROC contrast is non-significant for the difference between the two model AUCs, with a p-value of 0.43.

Figure 3.14. Comparison of ROCs between the two models, logistic regression and random forest, when using the same validation cohort in both models (N=2590).

Table 3.12. AUC and 95% confidence intervals from the two models, LR and RF, applied to the same population (N=2590) - Hospice outcome
ROC Association Statistics
Logistic Regression: Mann-Whitney area 0.7251; standard error 0.0151; 95% Wald confidence limits 0.6955 - 0.7547
Random Forest: Mann-Whitney area 0.7345; standard error 0.0154; 95% Wald confidence limits 0.7044 - 0.7646

- Applying the RF model to imputed data
The RF model that was developed in the available data was also applied to the imputed dataset. The imputed data were the same data used in the logistic regression. Using the same methods described for the mortality outcome in the imputed data, a ROC was generated for the RF model in the imputed validation data (Figure 3.16 and Table 3.13). The AUC was similar to that from the RF model in the available data (AUC=0.6936 in the imputed data vs. 0.6971 in the available data). Indeed, imputation of the missing data did not improve discrimination in the RF model compared to when missing observations were included as missing. As discussed in chapter 2, imputation did not improve the discrimination of the logistic regression model either. This confirms that the missing observations on the predictors in these models were not associated with the hospice outcome (as was already shown in chapter 2). Unlike the mortality outcome, missingness on the predictors did not predict the hospice outcome. Thus, although the random forest model allows the inclusion of all observations in the analysis, it did not improve model discrimination in the prediction of the hospice outcome in these data, mostly because the missing observations were not predictive of the outcome.

Figure 3.15. ROC from the random forest model applied to the imputed validation data - Hospice outcome (N=3723).
Table 3.13. AUC and 95% confidence intervals from the RF model applied to the imputed data (20 replications) - Hospice outcome
ROC Association Statistics - Model: Mann-Whitney area 0.6936; standard error 0.0133; 95% Wald confidence limits 0.6675 - 0.7197; Somers' D 0.3872; Gamma 0.3872; Tau-a 0.0717

Additionally, to confirm that applying the RF model developed in the available data (with missing data included as a legitimate category) to the imputed data did not bias this comparison, a random forest model was also developed directly in the imputed derivation data and applied to the imputed validation data. The AUC of this model was almost the same as the AUC from the previous model in the same data (AUC=0.6939 in the imputed data vs. AUC=0.6971 in the available data).

- Model's goodness-of-fit
Calibration plots and the Hosmer-Lemeshow test were generated for the RF model. Similar to the logistic regression chapter, two methods were used to generate the calibration plots: loess based (Figure 3.17) and decile based (Figure 3.18). Although the overall fit does not appear good, the model fits better at the lower predicted probabilities than at higher probabilities, as was observed in the mortality model; in other words, the model under-predicts the risk at higher risk levels.

Figure 3.16. Loess-based calibration plot for the RF model - Hospice outcome. The vertical axis is the observed outcome, and the horizontal axis is the predicted probability.

Figure 3.17. Decile-based calibration plot for the RF model - Hospice outcome. The vertical axis is the observed outcome, and the horizontal axis is the predicted probability.

The Hosmer-Lemeshow goodness-of-fit test resulted in a test statistic of 10.17 with a p-value of 0.2532, which means there is no statistical evidence of lack of fit for this model. As discussed before, however, the interpretation of this test is limited because it depends on the number of groups selected for dividing the population.

Discussion
In the validation data, the random forest mortality model had a much higher AUC than the logistic regression mortality model; the model's AUC was 9% higher than that of the logistic regression model (Table 3.4). The better discrimination ability of the RF in these data could be explained by the presence of underlying complex interactions and non-linear relationships between the predictors and outcomes, or by the inclusion of the approximately 30% of observations that were excluded from the LR model because of missing data. Our sensitivity analysis showed that almost all of the improvement in the performance of the RF compared to the LR model was due to the latter reason, i.e., the inclusion of the observations with partly missing data. The misclassification rate of the RF model was higher than that of the LR when the top 10% of predicted probabilities was used as the high-risk group (27% vs. 20%); however, at the top 20% cut point, the misclassification rates were the same (22%) for both models. These results imply that the RF estimation of mortality risk in the highest-risk patients (i.e., the top 10% of predicted probabilities) was not as good as the LR estimation, a fact that was also confirmed in the correlation plot. The random forest model shows better sensitivity and specificity than the logistic regression model, except for sensitivity at the 10% threshold, where the sensitivities of the two models are almost equal.
In chapter two, I evaluated the performance of the logistic regression model and demonstrated its superiority compared to the alternative clinical risk prediction models currently used in this population. The random forest model for the mortality outcome performs even better than the logistic regression model, although when the higher threshold is selected for classification (i.e., the top 10% of predicted probabilities), it has a worse misclassification rate. In other words, to take advantage of the better discrimination, sensitivity, and specificity of the RF model without adding to the misclassification rate, the threshold of the top 20% of predictions should be used for classification in this cohort. The predictive value positive is substantially higher for the RF model than for the LR model, which indicates that in this population with a mortality rate of 32%, about 80% of those identified by the RF as high risk actually died within the next year, whereas the PVP for the LR model is about 50% (for both cutoff points). The key point from the performance measures is that both models (RF and LR) have PVNs higher than their PVPs, which implies that the negative results of the model (i.e., predicted low risk) are more reliable than the positive results (i.e., predicted high risk). So the models have utility in ruling out the need for hospice care (with 90% certainty) but are not useful in identifying patients who will be referred to hospice (ruling in).

Unlike the mortality outcome, the RF model for the hospice outcome did not show any improvement in discrimination; in fact, the LR model shows a slightly better AUC. Other performance measures, including sensitivity, specificity, and predictive values, were similar between the two models. Again, there is no evidence of superiority of the RF model in the prediction of the hospice outcome in these data. Although the RF has many advantages, especially when the data contain complex interactions and non-linear relationships between the variables, in this study the main benefit of the RF model came from its ability to include all observations, including those with missing data on some predictors. We confirmed that the gain in the AUC of the RF model for mortality was mostly due to the inclusion of missing data, because when the RF was applied to the complete-case data (i.e., the same population used in the LR model), the improvement in AUC vanished; both the LR and RF models had similar AUCs when applied to the complete-case data. It is notable that almost one third of the observations were excluded from the LR because of missing data, and as observed in chapter two, missingness on the predictors was associated with the mortality outcome but not with hospice admission; thus, for the mortality outcome, we have evidence that the data are MNAR. The association between missing data on predictors (e.g., TUG, ADL-decline) and the outcome can be explained by survival bias: for instance, a terminally ill patient is more likely to die before the doctor can complete the medical records, and in the case of a very sick patient the doctor may decide not to perform and record a given test or procedure, particularly one that requires the patient's ability to move and cooperate (such as the TUG). The analysis of the association between missingness on the seven key variables and the two outcomes in chapter two revealed trivial to no association between missingness and hospice admission, whereas a consistent and strong association with mortality was seen for all 7 predictors. These findings are in agreement with the results of the RF model, where a substantial improvement in AUC was observed for the mortality outcome but not for hospice admission. Since missingness was not associated with hospice admission, the inclusion of the missing data in the RF analysis did not improve the AUC of the RF compared to the LR model for the hospice outcome. Interestingly, when the RF model was applied to the imputed data, the AUCs for both outcomes were similar to the values from the LR model (and notably lower than the RF in the available data). These observations can again be explained by the fact that missingness in these data is related to the mortality outcome, so including observations with missing variables (as the random forest does) results in better performance compared with imputing the missing data.

In conclusion, the RF model had better discrimination than logistic regression in this population for mortality. Taking account of the observations with partly missing data is the most important advantage of the RF model in these data. It seems that complex interactions and non-linearity of the relationships among the predictors and the outcome are not a problem in these data, because exclusion of the missing observations from the cohort resulted in similar AUCs for the RF and LR models.

The developed RF model may be incorporated into the routine data management programs of healthcare systems to facilitate the identification of patients with different needs based on their risk levels and to support providers' decision making. The developed random forest algorithm can be programmed into the USMM data system, and the results can be embedded in the EMR to produce a predicted probability for each new observation in the dataset. Then, according to a predetermined cutoff value, patients at different risk levels are flagged for further attention. The model can be updated when further research results in a better model (e.g., more complete data could allow more predictors to be included in model development and change the final model; the changes can then be programmed into the database).
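A hedged sketch of how such routine scoring might look is shown below; the file, data set, variable, and cutoff names are placeholders rather than the actual USMM implementation.

/* Score newly added patients with the saved forest and flag those above
   a predetermined risk cutoff for further attention. */
proc hp4score data=new_patients;
   id patient_id;
   score file="rf_mortality.bin" out=new_scored;
run;

data flagged;
   set new_scored;
   /* PRED_DEATH stands in for whatever name the scoring output gives the
      predicted probability of the event; 0.45 is an illustrative cutoff only. */
   risk_flag = (pred_death >= 0.45);
run;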
o Strengths
The first and most important strength of the random forest analysis was the inclusion of all observations in the analysis (i.e., observations with missing values on the predictor variables were allowed in the random forest analysis, whereas they were excluded by default from the logistic regression model). We compared the RF and LR models and showed that the performance of the two models is the same when they are applied to the same cohort. Another strength is our use of MI methods to account for missing data: we compared the results of the RF models in the imputed data and in the data with missing observations.

o Limitations
The most important limitation of these data was missing information. We excluded predictors with more than 20% missing values from model development, so important variables such as hospitalization in the past year or weight loss were excluded. There were still 9 variables with <20% missing included in model development, which resulted in the subsequent exclusion of observations with partly missing data from the logistic regression model; the random forest, however, overcame the problem of excluding observations with missing data. The strong association of missing values with mortality, in addition to the disappearance of the AUC gain in the imputed data, indicates an MNAR mechanism for the missing data.
Another limitation is that the RF output does not provide a tangible single model with familiar parameters that can be applied manually to any new observation. The algorithm saves hundreds of decision trees in the developed RF model, runs any new data through all of the trees, and averages their predictions.

Conclusion
The use of machine learning techniques can improve the discrimination of predictive models compared to standard parametric regression models. In this study, the random forest model demonstrated better accuracy (i.e., a higher AUC and a lower test error rate) than the logistic regression model for mortality but not for hospice admission. The gain in the discrimination of the RF model in these data is mostly due to the inclusion of observations with partly missing data and the fact that missingness in these data was related to the mortality outcome (MNAR). Further analysis is needed to evaluate the performance of this model in an external database. The use of more complex machine learning methods, such as an ensemble of different algorithms (e.g., random forest, support vector machine, neural networks), could improve the risk stratification performance of the models even more.

APPENDIX

Figure 3A.1. Correlation between predicted probability in LR and random forest

Table 3A.1. Ranked importance of predictor variables in the random forest model, RBA method - mortality outcome
Random Branch Assignments Variable Importance (Variable: Margin, MSE)
TUG Answer 0.02621 0.00725
Albumin Result 0.02361 0.00688
Race 0.01657 0.00491
ADL decline 0.01471 0.00319
Cholesterol Result 0.00973 0.00177
KPS value 0.00809 0.00344
Surprise question 0.00769 0.00060
Age 0.00542 0.00209
CCW-Hyperlipidemia 0.00518 0.00044
Diagnosis count 0.00509 0.00010
Tobacco Use 0.00437 0.00073
Lives alone 0.00435 0.00061
CCW-Rheumatoid Arthritis, Osteoarthritis 0.00371 0.00132
Dual eligible 0.00366 0.00143
Pressure ulcer 0.00338 0.00066
Number of lab orders 0.00302 0.00050
CCW-Chronic Kidney Disease 0.00302 0.00085
CCW-Anemia 0.00289 0.00106
CCW-Depression 0.00281 0.00110
Number of Medications 0.00278 0.00078
CCW-Atrial Fibrillation 0.00264 0.00045
Sex 0.00264 0.00060
Cancer 0.00253 0.00058
CCW-Heart Failure 0.00220 0.00039
CCW-Hypertension 0.00216 0.00027
CCW-Ischemic heart disease 0.00215 0.00049
CCW-COPD 0.00210 0.00045
CCW-Diabetes 0.00201 0.00046
CCW-Benign prostatic hyperplasia 0.00197 0.00054
CCW-Asthma 0.00190 0.00028
CCW-Colorectal Cancer 0.00187 0.00046
CCW-Osteoporosis 0.00183 0.00021
CCW-Cataract 0.00181 0.00038
CCW-Hip/pelvic fracture 0.00180 0.00042
CCW-Breast cancer 0.00178 0.00037
CCW-Glaucoma 0.00174 0.00033
CCW-Stroke/TIA 0.00172 0.00018
CCW-Lung cancer 0.00165 0.00018
CCW-Prostate cancer 0.00165 0.00032

Table 3A.2 shows the ranked importance based on the loss reduction method. In this table, the number of rules represents the number of times that each variable has been used in decision nodes in the forest. Table 3A.2.
Ran ked importance of the explanatory variables in the random forest model, the loss reduction method - Mortality outcome Loss Reduction Variable Importance Variable Number of Rules Gini OOB Gini Margin OOB Margin TUG Answer 1727 0.030819 0.02574 0.061639 0.05698 Race 1061 0.017981 0.01423 0.035962 0.03208 ADL decline 858 0.014258 0.01134 0.028517 0.02581 Surprise question 894 0.008971 0.00569 0.017942 0.01468 CCW -Hyperlipidemia 1471 0.006207 0.00241 0.012414 0.00834 Lives alone 586 0.004102 0.00212 0.008204 0.00618 Tobacco use 592 0.003850 0.00200 0.007699 0.00582 Albumin Result 8229 0.044973 0.00063 0.089947 0.04620 CCW -Cataract 60 0.000158 0.00000 0.000316 0.00012 Alzheimer 0 0.000000 0.00000 0.000000 0.00000 MI 0 0.000000 0.00000 0.000000 0.00000 CCW -Endometrial cancer 0 0.000000 0.00000 0.000000 0.00000 CCW -Lung cancer 11 0.000045 -0.00003 0.000090 0.00000 CCW -Hip/pelvic fracture 19 0.000027 -0.00004 0.000055 -0.00002 CCW -Colorectal cancer 37 0.000163 -0.00006 0.000325 0.00010 CCW -Breast cancer 72 0.000133 -0.00008 0.000266 0.00006 CCW -Prostate cancer 47 0.000134 -0.00012 0.000269 -0.00003 CCW -Asthma 135 0.000253 -0.00016 0.000507 0.00005 CCW -Glaucoma 166 0.000346 -0.00021 0.000692 0.00013 CCW -Benign prostatic hyperplasia 225 0.000470 -0.00035 0.000940 0.00005 Cancer 297 0.000573 -0.00043 0.001146 0.00003 Dual eligible 1213 0.002908 -0.00046 0.005815 0.00225 CCW -Osteoporosis 557 0.000908 -0.00051 0.001817 0.00027 137 Table 3A. 2. (cont™d) CCW -Depression 923 0.001987 -0.00069 0.003974 0.00124 CCW -Stroke/TIA 469 0.000692 -0.00069 0.001385 -0.00006 CCW -Atrial fibrillation 658 0.001462 -0.00089 0.002923 0.00057 Pressure ulcer 570 0.001271 -0.00095 0.002543 0.00043 CCW -Ischemic heart disease 761 0.001176 -0.00109 0.002351 0.00002 CCW -Hypertension 1094 0.001912 -0.00129 0.003824 0.00076 CCW -Chronic kidney disease 1375 0.002707 -0.00140 0.005414 0.00129 CCW -Rheumatoid arthritis/Osteoarthritis 1760 0.003110 -0.00140 0.006220 0.00163 CCW -Anemia 1123 0.002231 -0.00146 0.004461 0.00056 Sex 1260 0.001983 -0.00159 0.003966 0.00050 CCW -COPD 1172 0.001756 -0.00162 0.003512 0.00028 CCW -Acquired hypothyroidism 1001 0.001578 -0.00165 0.003156 -0.00006 CCW -Diabetes 1303 0.001823 -0.00192 0.003645 -0.00020 CCW -Heart failure 1435 0.002131 -0.00206 0.004263 -0.00002 KPS value 6385 0.017562 -0.00672 0.035123 0.01103 Number of lab orders 6468 0.015026 -0.01314 0.030051 0.00174 Diagnosis count 9035 0.021521 -0.01899 0.043041 0.00340 Age 8496 0.025198 -0.02146 0.050396 0.00374 Cholesterol result 9707 0.036575 -0.02203 0.073149 0.01478 Number of Medications 9857 0.024768 -0.02569 0.049536 -0.00013 138 Figure 3A. 2. Correlation between predicted probability in LR and random forest - Hospice admission Table 3A. 3. Ranked importance of predictor variables in the random forest model, RBA method - Hospice outcome Random Branch Assignments Variable Importance Variable Margin MSE sq_n 0.00557 0.00232 AgeAtVisit 0.00585 0.00180 KPS_Cat 0.00215 0.00162 NumLabOrders 0.00026 0.00111 Albumin_Result 0.00113 0.00055 Lives_Alone 0.00023 0.00053 race_n 0.00044 0.00051 dual_eligible 0.00042 0.00048 TUG_Answer 0.00211 0.00025 Chol_Result -0.00037 0.00024 139 Table 3A. 3. 
(cont™d) adl_decline 0.00084 0.00023 CCW_Hyperlipidemia 0.00235 0.00016 CCW_Stroke_TIA 0.00012 0.00014 CCW_Depression 0.00097 0.00011 CCW_Ischemic_Heart_Disease 0.00006 0.00009 CCW_Diabetes 0.00106 0.00006 CCW_Asthma -0.00001 0.00005 CCW_Osteoporosis -0.00007 0.00005 Number of meds 0.00010 0.00004 tobacco use 0.00021 0.00004 CCW_Prostate_Cancer -0.00004 0.00003 CCW_Lung_Cancer 0.00004 0.00002 CCW_Acquired_Hypothyroidism 0.00026 0.00001 CCW_Hip_Pelvic_Fracture -0.00003 0.00001 CCW_Hypertension 0.00165 0.00001 CCW_Cataract -0.00003 0.00000 CCW_Breast_Cancer 0.00005 -0.00000 CCW_Endometrial_Cancer -0.00000 -0.00001 CCW_Rheumatoid_Arthritis_Osteoar 0.00011 -0.00002 CCW_Glaucoma 0.00001 -0.00003 CCW_Colorectal_Cancer -0.00005 -0.00004 Cancer -0.00025 -0.00004 CCW_Anemia 0.00017 -0.00007 CCW_COPD 0.00040 -0.00007 CCW_Atrial_Fibrillation -0.00024 -0.00009 CCW_Benign_Prostatic_Hyperplasia -0.00010 -0.00011 CCW_Chronic_Kidney_Disease 0.00024 -0.00012 sex 0.00032 -0.00014 CCW_Heart_Failure 0.00042 -0.00017 140 Table 3A. 3. (cont™d) Pressure_Ulcer -0.00009 -0.00017 DX_count 0.00043 -0.00054 Table 3A. 4. Ranked importance of predictor variables in the random forest model, Loss reduction method - Hospice outcome Loss Reduction Variable Importance Variable Number of Rules Gini OOB Gini Margin OOB Margin KPS_Cat 741 0.002112 0.00109 0.004223 0.00317 sq_n 771 0.002094 0.00024 0.004189 0.00273 CCW_Hip_Pelvic_Fracture 40 0.000039 0.00001 0.000077 0.00002 CCW_Cataract 37 0.000029 0.00001 0.000059 0.00002 CCW_Endometrial_Cancer 2 0.000002 -0.00000 0.000004 -0.00000 CCW_Lung_Cancer 17 0.000027 -0.00002 0.000054 0.00001 CCW_Breast_Cancer 63 0.000067 -0.00006 0.000135 0.00002 CCW_Colorectal_Cancer 33 0.000042 -0.00006 0.000084 -0.00002 CCW_Asthma 77 0.000074 -0.00007 0.000148 -0.00006 dual_eligible 701 0.001024 -0.00007 0.002048 0.00084 CCW_Prostate_Cancer 69 0.000128 -0.00011 0.000256 0.00004 CCW_Glaucoma 103 0.000105 -0.00011 0.000209 0.00001 CCW_Osteoporosis 320 0.000298 -0.00022 0.000595 -0.00004 CCW_Stroke_TIA 321 0.000344 -0.00029 0.000689 0.00006 CCW_Benign_Prostatic_Hyperplasia 225 0.000291 -0.00031 0.000582 -0.00003 TobaccoUse 377 0.000413 -0.00033 0.000826 0.00017 Cancer 241 0.000303 -0.00034 0.000606 -0.00009 Lives_Alone 528 0.000623 -0.00035 0.001247 0.00020 race_n 691 0.001134 -0.00054 0.002269 0.00047 Pressure_Ulcer 548 0.000629 -0.00060 0.001258 0.00004 CCW_Hypertension 744 0.000853 -0.00062 0.001705 0.00021 141 Table 3A. 4. 
(cont™d) CCW_Ischemic_Heart_Disease 530 0.000639 -0.00063 0.001278 -0.00000 CCW_Depression 701 0.001008 -0.00065 0.002016 0.00035 CCW_Diabetes 837 0.000844 -0.00070 0.001688 0.00002 CCW_Atrial_Fibrillation 643 0.000818 -0.00073 0.001636 0.00014 adl_decline 728 0.001109 -0.00079 0.002219 0.00002 CCW_Acquired_Hypothyroidism 769 0.000868 -0.00083 0.001736 0.00015 sex 964 0.000955 -0.00085 0.001911 0.00000 CCW_COPD 789 0.000795 -0.00085 0.001590 -0.00021 CCW_Anemia 796 0.000856 -0.00090 0.001712 -0.00005 CCW_Chronic_Kidney_Disease 983 0.000898 -0.00093 0.001797 -0.00004 CCW_Hyperlipidemia 1163 0.001262 -0.00095 0.002523 0.00034 CCW_Heart_Failure 1082 0.001013 -0.00104 0.002027 -0.00009 CCW_Rheumatoid_Arthritis_Osteoar 1120 0.000966 -0.00115 0.001932 -0.00031 TUG_Answer 1302 0.002177 -0.00123 0.004355 0.00093 NumLabOrders 3484 0.006394 -0.00601 0.012788 0.00019 DX_count 5715 0.010744 -0.01101 0.021487 0.00039 AgeAtVisit 5506 0.013971 -0.01138 0.027941 0.00247 Albumin_Result 5381 0.011902 -0.01254 0.023805 -0.00022 NumMeds 6729 0.014074 -0.01565 0.028148 -0.00057 Chol_Result 6148 0.015780 -0.01700 0.031561 -0.00090 142 Table 3A. 5. Sample of fit statistics from the RF model for hospice outcome Fit Statistics Number of Trees Number of Leaves MSE (Train) MSE (OOB) Misclassification Rate (Train) Misclassificatio n Rate (OOB) Log Loss (Train) Log Loss (OOB) 1 265 0.0764 0.1297 0.0881 0.1336 0.972 2.241 2 513 0.0600 0.1267 0.0760 0.1317 0.368 2.029 3 738 0.0543 0.1204 0.0750 0.1260 0.240 1.787 4 988 0.0504 0.1187 0.0690 0.1250 0.183 1.644 5 1250 0.0482 0.1123 0.0690 0.1216 0.166 1.380 6 1497 0.0475 0.1074 0.0731 0.1145 0.166 1.191 7 1775 0.0460 0.1048 0.0720 0.1137 0.163 1.107 8 2016 0.0454 0.1017 0.0723 0.1106 0.161 0.989 9 2293 0.0443 0.0994 0.0717 0.1087 0.159 0.881 10 2553 0.0436 0.0966 0.0715 0.1059 0.157 0.771 100 26097 0.0399 0.0860 0.0905 0.0989 0.150 0.308 101 26390 0.0398 0.0859 0.0905 0.0989 0.150 0.308 102 26664 0.0398 0.0860 0.0903 0.0989 0.150 0.308 103 26938 0.0398 0.0859 0.0905 0.0989 0.150 0.308 104 27190 0.0398 0.0860 0.0911 0.0989 0.150 0.308 195 50953 0.0399 0.0852 0.0943 0.0989 0.150 0.304 196 51181 0.0399 0.0853 0.0940 0.0989 0.150 0.304 197 51439 0.0399 0.0853 0.0940 0.0989 0.150 0.304 198 51706 0.0399 0.0852 0.0940 0.0989 0.150 0.304 199 51939 0.0399 0.0852 0.0946 0.0989 0.150 0.304 200 52219 0.0399 0.0853 0.0948 0.0989 0.150 0.304 143 CHAPTER 4. Cox Proportional H azard Model and Comparison between the Three Models Introduction The need for accurate risk stratification approaches in the population of community -living older adults was discussed in previous chapters. In the second and third chapters, respectively, we developed prediction models by applying the logistic regression (LR) and random forest (RF) modeling approaches to the USMM database. These two models were developed to predict 1 -year mortality and the risk of hospice admission over a similar 1 -year period. However, the maximum possible follow -up time for the 2015 USMM cohort exceeds two years (maximum follow up is 794 da ys between Jan 1 st, 2015 and Mar 6th, 2017 which is the date of the claims data inquiry). When follow -up was restricted to 12 months there were 2408 (32%) deaths, and 752 (10%) hospice admissions, whereas using all of the available follow -up time (max 794 days), the number of events and associated cumulative incidence rates increased to 3341 (45%) deaths and 1389 (19%) hospice admissions. 
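As an illustration of how these two outcome definitions translate into analysis variables, a follow-up time and event indicator for each outcome can be derived in a single data step. The sketch below is illustrative only; the dataset and variable names (cohort2015, first_visit_date, death_date, hospice_start_date) are hypothetical placeholders rather than the actual USMM field names.

/* Minimal sketch: derive time-to-event and event indicators, with
   administrative censoring at the claims inquiry date (06MAR2017).
   Dataset and variable names are hypothetical placeholders. */
data cox_cohort;
   set cohort2015;
   admin_end = '06MAR2017'd;

   /* Death outcome: days from first USMM visit to death or censoring */
   if not missing(death_date) then do;
      death = 1;
      days_to_death = death_date - first_visit_date;
   end;
   else do;
      death = 0;
      days_to_death = admin_end - first_visit_date;
   end;

   /* Hospice outcome: days from first USMM visit to first hospice admission */
   if not missing(hospice_start_date) then do;
      hospice = 1;
      days_to_hospice = hospice_start_date - first_visit_date;
   end;
   else do;
      hospice = 0;
      days_to_hospice = admin_end - first_visit_date;
   end;

   /* Fixed 12-month outcome definitions used in the LR and RF chapters */
   death_1yr   = (death = 1 and days_to_death <= 365);
   hospice_1yr = (hospice = 1 and days_to_hospice <= 365);
run;

The 12-month indicators correspond to the fixed-time outcomes analyzed in chapters two and three, while the day counts are used for the time-to-event analyses in this chapter.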
To capture the experience of patients who had the outcomes beyond the first year of their USMM care, and to take into account the time-to-event, we used a Cox proportional hazard model to analyze the two outcomes as time-to-death and time-to-hospice. The same cohort of USMM patients as used in the LR (chapter two) and RF (chapter three) models was used in this analysis as well. The Cox model's performance metrics were compared to the two alternative approaches, i.e., the LR and RF models. The objective of this chapter is, therefore, to develop and validate multivariable time-to-event (also known as failure time) Cox models for the two outcomes, death and hospice admission, and to compare their performance to the alternative models.

Main concepts and definitions

o Survival analysis methods and the Cox PH model

Survival analysis is a branch of statistics that involves modeling the time to an event. It attempts to answer questions such as what proportion of a population will survive past a certain time point, or what the failure rate (hazard rate) is among subjects who have survived up to a certain time. Analysis of survival time requires special techniques because of the nature of the follow-up time. (140) In survival data, subjects are followed until the outcome occurs, but the data are almost always incomplete: some subjects withdraw from the study before an event happens, and others do not experience the outcome before the end of the study. These partially observed subjects are called censored observations. Different methods have been developed for survival analysis, but the two most popular models are the accelerated failure time model (141) and the Cox proportional hazard model. (142) Both models assume a parametric form for the effect of the independent predictor variables. The difference between the two models lies in the assumption about the underlying survival function: the accelerated failure time model assumes a parametric distribution for the underlying survival function, whereas the Cox PH model leaves the baseline survival function unspecified. The parametric form of the predictor variables also enters the two models in different ways. Because of these assumptions, the Cox PH model is considered a semi-parametric model. Survival analysis can be used in several ways. For example, Kaplan-Meier (KM) curves estimate the survival function from censored data and provide a summary of the survival experience overall and in subgroups. The log-rank test can be used to compare KM curves across subgroups. Regression analyses of time-to-event based on the accelerated failure time model or the Cox proportional hazard model quantify the effect of one or more variables on survival time.

o Definitions

Let T denote a non-negative random variable representing the failure time for an individual in the study population. The survival function is defined as

$$S(t) = P[T > t],$$

the probability of being event-free at time t. The corresponding hazard function, denoted by $\lambda(t)$, is the instantaneous rate at which the event of interest occurs just after time t, given survival to time t. The formal definition of $\lambda(t)$, for a continuous time T, is

$$\lambda(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \frac{f(t)}{S(t)},$$

where f(t) is the probability density function of T. (142,143) The Cox PH model is widely used in survival data analysis to evaluate the effect of explanatory variables Z on the hazard function.
For each subject (index i) in the population, his or her hazard function is expressed as

$$\lambda_i(t) = \lambda_0(t)\exp(\beta' Z_i),$$

where $\lambda_0(t)$ is an unspecified function called the baseline hazard, $Z_i$ is the vector of fixed (i.e., not time-dependent) covariates for the ith subject, and $\beta$ is the vector of coefficients associated with $Z_i$, assumed to be the same for all subjects. The terminology "proportional hazards" comes from the fact that for any two subjects the hazard ratio $\lambda_i(t)/\lambda_j(t)$ is constant in time. A hazard ratio is the ratio of two hazard rates corresponding to two levels of an explanatory variable. For example, patients with severe functional impairment may die at twice the rate per unit time as patients with no functional impairment. The Cox model allows both continuous and categorical explanatory variables and supports multivariable models, whereas the KM method is inconvenient when faced with continuous explanatory variables. (143,144) Consider the following Cox model with three explanatory variables:

$$\lambda(t) = \lambda_0(t)\exp(\beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3).$$

Holding $x_2, x_3$ constant and increasing $x_1$ by d units gives the hazard ratio (HR) $\exp(\beta_1 d)$. This gives the interpretation of $\exp(\beta_1)$ as the hazard ratio associated with a one-unit increase in $x_1$ with the other two covariates held fixed. If $x_1$ were binary, then $\exp(\beta_1)$ is the HR comparing the group coded $x_1 = 1$ with the group coded $x_1 = 0$.

o Performance evaluation

Survival analysis models such as the Cox PH model are a useful approach for developing prediction models. In the LR model, the performance of a prediction model is assessed using discrimination measures such as the AUC, as well as calibration plots to evaluate the accuracy of the model's predictions. Equivalent discrimination measures have been developed for Cox PH analysis. Three summary statistics can be generated for the Cox model: concordance (also known as the C-index), the AUC at a specific time, and the integrated AUC (iAUC). Concordance is defined as the proportion of all usable pairs of subjects for which the greater event risk was predicted for the one that experienced the event earlier. The concordance statistic in the Cox model is called the C-index and can be calculated in PROC PHREG with the 'concordance' option. (1,4) With some modification, measures equivalent to the ROC and AUC generated for the LR and RF models can also be produced from the Cox model if the measures are generated for a specific time point during the follow-up period. The definition of the AUC at each time point is the same as the concordance definition, but it is limited to the events that occurred up to the specific time point at which the AUC is generated (e.g., 6-month, 1-year). It is called the time-dependent AUC, and it changes at each event time point. The changes in the AUC over the study time can also be plotted and integrated. The integrated AUC (iAUC) is an average of the AUC over all possible time points in the study period. The C-index, time-dependent AUC, and iAUC are generated for the Cox model using PROC PHREG. (146) Unlike for the LR model, calibration for the Cox model is sparsely discussed in the literature. Calibration is a way to validate a predictive model by evaluating its predictive accuracy; however, assessing the calibration of the Cox model is not straightforward because the predictions have to be made relative to an unspecified baseline function. (147)

o Proportional hazard assumption

The proportionality assumption is the central assumption of the Cox model; hence, the model is often called the Cox proportional hazard (PH) model.
The PH assumption means that for any two individuals, the hazard ratio is constant over the follow-up time and does not depend on time. Comparing two covariate profiles, $x^{(1)}$ and $x^{(2)}$, the ratio of their hazards is

$$\frac{\lambda(t \mid x^{(1)})}{\lambda(t \mid x^{(2)})} = \exp\!\left(\beta'(x^{(1)} - x^{(2)})\right),$$

which does not involve t. To evaluate the coefficient of $x_1$, given that $x_2$ and $x_3$ are the same for both subjects, the hazard ratio is

$$\exp\!\left(\beta_1 (x_1^{(1)} - x_1^{(2)})\right).$$

To test the PH assumption in the Cox model, different methods have been proposed. (7) The most common way is to generate the KM survival function for the predictor variables and observe whether the curves cross each other; non-parallel curves suggest a violation of the PH assumption that can then be examined using a more formal statistical test. If x1 and x2 represent two groups, say, a treatment and a control group, the Kaplan-Meier plots of their survival functions should not cross if the PH assumption holds. This is NOT a formal test of the PH assumption, but it gives a quick graphical check of whether the assumption is plausible. A formal test of the PH assumption can be approached in two ways: [1] as suggested by Cox in his 1972 article, proportionality in a covariate x is tested by including a time-dependent term in the Cox model. This term is x times g(t), where g(t) is often the function g(t) = log(t/c), and c is a constant (e.g., the median follow-up time). If the regression coefficient of the interaction of x with g(t) is significant (P-value < 0.05), a violation of the PH assumption for x is indicated. There is also an overall Wald test of all the interaction coefficients together; the TEST statement in PROC PHREG conducts this test. (8) [2] A more sophisticated test of the PH assumption is the supremum test. It involves generating a few simulated realizations of the score process for a covariate x and comparing them to the observed process. The supremum test is similar in spirit to the Kolmogorov-Smirnov test, comparing the maximal departure between the observed and expected processes; SAS PROC PHREG implements the procedure via the ASSESS statement. This method takes a long time to generate the plots for each predictor level. Therefore, to test the PH assumption of the Cox model in this chapter, I used the KM survival plots and tested the two-way interaction terms between each covariate and time. If the proportionality assumption is violated, there are alternative approaches for handling non-proportionality in the Cox model, for example, including a time interaction term in the model or using an accelerated failure time modeling approach. (149)

Literature review

Two comparable studies used the Cox PH model to develop a prognostic model in a population of community-living older adults. In 1998, Fried et al. developed a prognostic score in a cohort of 5201 adults aged 65 years and older to predict 5-year mortality using a Cox PH model. (58) These adults participated in the Cardiovascular Health Study (CHS) in four states: California, Maryland, North Carolina, and Pennsylvania. The 5-year mortality rate in this population was 12%. The CHS cohort and the USMM cohort differ in their exclusion criteria: in the CHS cohort, patients were excluded if they were wheelchair-bound, were unable to participate in the examination at the field center, or were under cancer treatment; none of these groups were excluded from the USMM cohort. In fact, USMM patients were all homebound based on the CMS definition (Chapter 1). The major difference between these two cohorts was their mortality rate (32% per year in the USMM population vs. 2.5% per year in the CHS cohort).
Fried et al., assessed 78 characteristics including demographics, social, functional, physical examination, and comorbidity variables and found 20 v ariables to be predictors of 5 -year mortality including demographics (age, gender, income), lifestyle (physical activity, smoking), comorbidities (heart failure), physical examination (systolic blood pressure, body weight), lab tests (albumin, creatinine, fasting blood sugar), respiratory test, ECG abnormalities, and echocardiography findings (Table 1.1). They included missing data on predictors as a legitimate level of the variable. They validated the model in a separate cohort of the same study by computi ng a risk score for each individual and then comparing the mortality rate between quantiles of the prognostic score in both the derivation and validation data. This study found a 149 significant difference in the mortality rates in quantiles of the prognostic score in the validation data; however, it did not provide a discrimination measure or any other performance metric for the model. (58) The proportional hazard assumption was assessed by testing the interaction between time and covariates. In 2008, Carey et al. con ducted a multi -State US -based study and developed a prognostic index to predict mortality in a cohort of community -based, chronically -ill, frail older adults. (57) This cohort had 1-year and 3 -year mortality rates of 13% and 37% respectively. Carey et al. used a cox model to develop the prognostic index in the derivation cohort (n=2232) and then validate the index by applying it to the validation cohort (n=1667). They found eight variables (two demographics [age, gender], two functional [dependence in bathing, and dressing], and four comorbidities [cancer, hear t failure, COPD, and chronic kidney insufficiency]) in the Cox model as significant predictors of mortality. They then developed a risk score by assigning different points to the predictor variables based on the coefficient from the Cox model. The risk sco re ranged from 0 -14, and they assigned a 3 -level risk value to each patient based on their score (i.e., 0 -3 low risk, 4 -5 intermediate -risk, and >5 high -risk). They compared the 1 - and 3 - year mortality rates between the different risk levels. They reporte d a good calibration based on the similarity of the mortality rates between the derivation and validation cohorts. They also reported the AUC of 0.66 and 0.69 for derivation and validation data. Both study populations described above have differences with the USMM cohort. The first population was generally healthier, and younger (mean age=73 years) than the USMM cohort (mean age=82), and unlike USMM patients, they were not homebound. The 5 -year mortality rate in this cohort was 12% (Grossly estimated one -ye ar mortality of 2.5%) vs. 32% one year mortality rate in the USMM cohort. The second study population is more similar to the USMM cohort in terms of the age (mean age=79 years) and overall vulnerability, however, the mortality rate in this population is 13 % a year, whereas it is 32% in USMM cohort. The PACE study population are also eligible for nursing -home by confirmation from the 150 State's Medicare staff. Both data developed the Cox model to select the predictors and then generate a risk score from those v ariables. To validate the model, both studies applied the risk score to the validation data and then reported the mortality rate in order to evaluate the accuracy of the model. Fried et al. 
did not provide discrimination measures from the Cox model, but th ey evaluated the proportionality assumption by testing the interaction be tween time and each predictor. Methods and materials Data source - A Cox proportional hazard model was developed utilizing the same USMM dataset as used in the LR and RF model chapt ers. The main difference is that all of the available follow -up time was used to identify outcomes in this analysis, whereas in the LR and RF, only outcomes that occurred within 12 months of the first visit were analyzed. Therefore outcome events (deaths a nd hospice admissions) that occurred after the first year of patients™ registration up until the end of follow -up (median=1.4 years, max = 2.2 years) were included in this analysis. The USMM claims data was again used to determine if an outcome (death or h ospice admission) occurred and if so, to identify its date. The final date of Claims data inquiry (Mar 6 th, 2017) was used as the administrative end date of the study period; all subjects still in follow -up were censored at this point. Hospice coverages we re reported in 3 -month intervals; therefore, the date of first hospice admission was used to calculate the outcome of time -to-hospice. Study population - As with the prior analyses, the 2015 cohort was defined as all patients who had their first -ever medica l visit by a USMM provider between January 1 st and December 31 st, 2015. Since the outcomes of interest were recorded in the claims data, the USMM EMR data was linked to the claims data, and those patients who had claims data available were included. Patien ts with age<65 years were excluded. Like the previous chapters of this dissertation (LR and RF models), the cohort was limited to those who received care from the USMM for at least 12 months. In other words, if a patient was withdrawn from USMM care within the first year of registration, they were excluded. Also, four patients 151 were excluded due to time -to-event of zero or negative (incorrect date of event). The study population in this chapter is the same as the chapters 2 and 3, except for the four patient s that were excluded in this chapter due to time to event less than 1 day. Table 4.1 contains the inclusion and exclusion criteria for this population, and Figure 4.1 shows the flow diagram of the study population. Table 4. 1. Inclusion and exclusion criteria for the Cox cohort Inclusion criteria - Register in the USMM system in the calendar year 2015 - Had at least one USMM visit January 1 st and Dec 31 st, 2015 Exclusion criteria - Claims data not available - Age <65 years old - Less than 12 months care by USMM (withdrawal in the first year) - Time -to-event <1 day 152 Figure 4. 1. Flow diagram of the study population A total of 2182 patients were excluded because they had been followed up for less than 12 months; the reasons for their withdrawal have been summarized in chapter two (Table 2.2). There were additional four patients excluded due to time -to-event of zero (the event occurred at the same date as the first -ever visit) or negative (incorrect date of event). Outcomes - Time -to-death for the patients who deceased, was calculated as the number of days between the date of the first visit (recorded in the USMM 2015 data) and the date of death (rec orded in the claims data). 
For those who survived, the follow-up time was calculated as the number of days between the date of their first visit and the end of follow-up, defined as the date of the claims data inquiry (March 6th, 2017). If a patient had no outcome reported before the end of the study, then the follow-up time was the number of days between the first visit date and 03/06/2017, which is referred to as administrative censoring. Therefore, the longest possible follow-up time was 794 days or 2.2 years (the number of days between Jan 1st, 2015 and Mar 6th, 2017). Time-to-hospice was calculated for the patients who were admitted to hospice as the number of days between the date of their first visit and the date of first hospice admission (in the claims data). For those who did not have a hospice admission, the follow-up time was again calculated as the number of days between the first visit and the end of follow-up (i.e., Mar 6th, 2017).

Exposure variables - As with our previous approach, only variables with less than 20% missing observations were considered for the analysis. The same 41 variables as used in the LR and RF models were also included in this analysis. These were: demographics: age, gender, race; socioeconomic status: insurance status representing whether a patient has dual eligibility for both Medicaid and Medicare, living alone, smoking; functional status: functional decline in ADLs, timed up and go (TUG), Karnofsky performance scale (KPS value); lab tests: serum albumin, cholesterol; and other variables: having a pressure ulcer, surprise question answer, number of medications, and number of lab tests ordered by the provider. There are 24 medical history variables as listed in the Chronic Condition Warehouse (CCW) variables: history of hypothyroidism, asthma, atrial fibrillation, cataract, chronic kidney disease, osteoporosis, hyperlipidemia, hypertension, anemia, breast cancer, colorectal cancer, benign prostatic hyperplasia, COPD, depression, diabetes, endometrial cancer, glaucoma, heart failure, hip/pelvic fracture, ischemic heart disease, lung cancer, prostate cancer, stroke/TIA, and rheumatoid arthritis/osteoarthritis. Diagnosis count is a variable that counted the number of existing CCW conditions for each patient. Another variable, cancer, was generated if a patient had one or more of the four cancers listed in the CCW variables.

o Statistical analysis

The statistical analyses for this study were done using SAS software (SAS Institute Inc., Cary, NC, version 9.4). The dataset of 7441 patients was split into two equal-size datasets, termed derivation (n=3721) and validation (n=3720), using the SAS procedure SURVEYSELECT. These derivation and validation groups are the same as were used in chapters two and three. Kaplan-Meier (KM) survival plots were generated for the total population for both outcomes; PROC LIFETEST was used to generate the KM plots. The Cox model was developed for each outcome using the derivation data and then applied to the validation data. Time-to-death and time-to-hospice were analyzed using PROC PHREG to develop a Cox regression model that examined all 41 predictor variables. Different variable selection methods were examined, including automatic and manual selection methods.
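For orientation before the selection methods are described in detail, the split and an initial Cox fit might look like the following minimal SAS sketch. The dataset, variable names, and seed are hypothetical placeholders, only a subset of the 41 candidate predictors is written out, and the concordance and ROC options assume a recent SAS/STAT release of PROC PHREG.

ods graphics on;

/* 50/50 random split into derivation and validation sets (seed is arbitrary) */
proc surveyselect data=cox_cohort out=split samprate=0.5 outall seed=2015 method=srs;
run;

data derivation validation;
   set split;
   if Selected = 1 then output derivation;
   else output validation;
run;

/* Kaplan-Meier survival plot for the whole cohort, mortality outcome */
proc lifetest data=cox_cohort plots=survival(atrisk);
   time days_to_death*death(0);
run;

/* Cox model in the derivation data with stepwise selection, Harrell's
   concordance (with its standard error), and a time-dependent ROC/AUC
   evaluated at day 365 plus the integrated AUC */
proc phreg data=derivation concordance=harrell(se)
           plots=roc rocoptions(at=365 iauc);
   class age_cat sex race sq kps_cat adl_cat albumin_cat chol_cat
         hyperlipidemia / param=ref;
   model days_to_death*death(0) = age_cat sex race sq albumin_cat chol_cat
         kps_cat adl_cat hyperlipidemia   /* ... remaining candidates ... */
         / selection=stepwise slentry=0.20 slstay=0.05;
run;

An analogous PROC PHREG step with days_to_hospice*hospice(0) as the response gives the corresponding model for the hospice outcome.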
Automatic (SAS built-in) variable selection methods, including stepwise, forward, and backward selection, were specified in separate models, and the model performance and the number of selected variables were compared. Also, the same manual selection method that was described for the LR models in chapter two was utilized; briefly, variables that were significant in univariate analysis (p-value < 0.2) were entered into a multivariable Cox model, and those with p-value < 0.05 were included in the final manually selected model. The performance measures were generated for each model for comparison between the different variable selection methods. To compare these Cox models to the predictive models developed in the previous two chapters, I generated AUC statistics. To have comparable metrics from the Cox model, three summary statistics were generated for the final models: concordance (also known as the C-index), the AUC at day 365, and the integrated AUC (iAUC). (146) Concordance in the Cox model has an interpretation equivalent to the C-statistic in the LR model, except that the Cox model considers the timing of the event. For the Cox model, concordance is the proportion of all usable subject pairs in which the case with the higher risk prediction had an event before the case with the lower risk prediction. (146,147) In other words, the concordance is the fraction of all pairs in which the predictor score is higher for the individual with the earlier event. Usable pairs are pairs in which one or both subjects had an event. There are different methods to generate the concordance statistic in survival analysis, namely Uno's and Harrell's; I used Harrell's method, which is the default in SAS. (140) Harrell's option in PROC PHREG also provides the standard error of the concordance, which can be used to calculate the confidence limits. The ROC for the Cox model is sensitive to time and can be generated at any time point in the study period, hence it is called the time-dependent ROC. (146) The time-dependent ROC and its respective time-dependent AUC vary only at the event times; i.e., AUCs at time points between two event times are the same. For comparison with the results for 1-year mortality in the LR and RF models, I generated the AUC and its 95% confidence limits at day 365 for the Cox models using PROC PHREG options. (150) The AUC (365) has the same definition as concordance, except that in the AUC (365) only events that happened between day 0 and day 365 are counted. The changes in the time-dependent AUC generate a plot that shows the AUCs and their confidence limits at all possible time points. The integrated AUC (iAUC) is an average of the AUCs over all time points. (146) To generate the ROC at day 365 and the integrated AUC (iAUC) for the model, the ROC options were specified in the PROC PHREG statement. The proportionality assumption is the central assumption of the Cox model; hence, this assumption must be satisfied for the Cox model to be an appropriate model. The proportional hazard assumption means the hazard ratio between two individuals is independent of time. The statistical details of the assumption were presented in the background section. The PH assumption in this model was tested for all variables of the final model using two methods: KM survival plots (examined for non-parallel curves) and testing the 2-way interaction between each covariate and time. The KM survival curve was generated in the derivation data, stratified by the key covariates.
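A stratified KM plot of this kind can be requested from PROC LIFETEST; the short sketch below (again with hypothetical variable names) stratifies the derivation data by KPS as one example, and the same step can be repeated for each of the other key covariates.

/* Graphical PH check: KM curves in the derivation data stratified by one
   key covariate (here KPS); crossing curves would suggest non-proportional
   hazards. Variable names are hypothetical placeholders. */
ods graphics on;
proc lifetest data=derivation plots=survival(atrisk nocensor);
   time days_to_death*death(0);
   strata kps_cat / test=logrank;
run;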
Also, interaction terms were generated in the PROC PHREG by multiplication of the predictors by the log function of the f ollow -up time divided by the median of it. For example, the interaction of age by time was made by this formula 156 Age*Time=Age* log (Time/median of Time). This method of making interaction terms between the covariates and a log function of time was first int roduced in the original paper by David Cox. (142) The time variable is divided by a constant (often its median) in order to stabilize the estimates in the model. Then a log function of this product will be used in interaction terms. The interaction terms were added to the model one by one and evaluated for significance at the p< 0.05 level. Additionally, an overall PH test was done for all the interaction terms together in PROC PHREG. The PH test is testing the hypothesis that all interaction™s coefficien ts are zero. If the PH assumption is violated, the analysis will be performed stratified by the predictor that has significant interaction term. The model performance metrics (i.e., AUC and C -index) were then compared to the results of LR and RF models usi ng the validation data. The importance of explanatory variables in the final model was assessed by their coefficient estimates and then were compared with the alternative models. Although calibration plots were generated for the LR and RF model to validat e the accuracy of predictions, they are not as useful in the Cox model as in the previous approaches. Generation and interpretation of the calibration plots in the Cox model is not straightforward because the predictions are time -dependent. Therefore, a co mparison of the calibration plots between the Cox and other two models is not conducted. Results o Study population The starting population for this analysis consisted of 20424 patients who joined the USMM in 2015 and had at least one visit in 2015. Since the outcomes of interest were reported in the claims data, those with no claims data available (n=7790) were exclude d. Also, 3007 patients with age <65 years old were excluded from the analysis, because the objective of this study was to develop a risk stratification model for the older adults who live in the community. 157 Additionally, 2182 patients were excluded because they were under USMM care for less than 12 months. These patients did not have an outcome, and their last documented visit with the USMM was less than 1 -year from their first visit. Finally, the four patients who had time to event < 1 day were removed. Ex clusion of these four patients are the only difference between the study population in this chapter and in chapters two and three. Two of these four patients died at the same date as the first visit, and the other two had negative follow up time (which is assumed to be due to a mistake in data entry). Figure 4.1 shows the flow diagram of the study cohort. The final cohort consisted of 7441 subjects. Table 4.2 displays the baseline characteristics of the patient population, as well as the unadjusted hazard r atios for both outcomes (time -to-death and time -to-hospice). In this cohort of 7441 patients, the average age was 82 years with a standard deviation of 9, 66% were female, 63% white, 99% had Medicare coverage, and 27% were dual -eligible. Prevalence of como rbidities in this population included 81% hypertension, 51% hyperlipidemia, 34% diabetes, 26% COPD, and 7% cancer. 
Impaired functional status was documented in this population by three variables: KPS (54% severe need for assistance), TUG (45% abnormal test or non -ambulatory), and ADL (14% decline in ADL). In the univariate analysis of the CCW comorbidities, 13 variables (for mortality outcome) and nine variables (for hospice outcome) had the significant unadjusted hazard ratio less than 1.0, which means tha t a positive history of the disease was associated with a lower hazard of the outcomes (death or hospice admission). Overall the characteristics of this cohort were the same as the cohort of 7445 patients analyzed in the prior two models, LR a nd RF (see Ta bles 2.3 and 3.1). 158 Table 4. 2. Study population characteristics and association of predictors with the outcomes (N=7441) over an average of 459 days of follow -up Variable N (%) Missing N (%) Death % Unadjusted HR Hospice % Unadjusted HR Baseline characteristics Age -65 -74 -75 Œ 84 -85 Œ 94 -95+ 1826 (24.5) 2247 (30.2) 2794 (37.6) 574 (7.7) 0 29.4 42.7 54.2 58.0 Ref 1.60* 2.22* 2.38* 9.0 16.8 24.3 29.3 Ref 2.17* 3.69* 4.47* Sex -Male -Female 2512 (33.8) 4929 (66.2) 0 49.8 42.4 1.25* Ref 18.9 18.6 1.11 Ref Race -White -Black -Other 4681 (62.9) 1148 (15.4) 201 (2.7) 1411 (19.0) 40.4 29.3 28.4 Ref 0.67* 0.65* 21.0 11.8 11.0 Ref 0.50* 0.46* Tobacco use (current vs not) -Yes -No 644 (8.7) 6410 (86.1) 387 (5.2) 32.0 44.0 0.67* Ref 13.8 19.0 0.65* Ref Dual -eligible -Yes -No 2024 (27.2) 5417 (72.8) 0 34.2 48.9 0.62* Ref 13.4 20.6 0.54* Ref Lives alone -Yes -No 884 (11.9) 5508 (74.0) 1049 (14.1) 27.7 43.7 0.56* Ref 11.2 20.1 0.46* Ref S.Q - No -No -Yes 1044 (14.0) 5380 (72.3) 1017 (13.7) 61.6 37.3 2.01* Ref 29.7 16.6 2.6* Ref KPS -Mild /moderate (50 -100) -Severe disability (10 -40) 3376 (45.4) 4038 (54.3) 27 (0.4) 32.2 55.3 Ref 2.08* 12.8 23.7 Ref 2.54* TUG -<30 sec -30 sec -Non -ambulatory 2538 (34.1) 1377 (18.5) 2027 (27.2) 1499 (20.1) 28.6 35.3 44.9 Ref 1.28* 1.80* 15.2 19.2 20.4 Ref 1.35* 1.61* Decline in ADLs -Decline -Improve -No change 1062 (14.3) 311 (4.2) 4887 (65.7) 1181 (15.9) 46.4 10.0 39.4 1.22* 0.21* Ref 24.3 9.3 18.2 1.43* 0.39* Ref Pressure ulcer -Yes -No 940 (12.6) 6501 (87.4) 0 53.1 43.7 1.28* Ref 22.1 18.2 1.35* Ref 159 Table 4. 2 . 
(cont™d) cancer -Yes -No 565 (7.6) 6876 (92.4) 0 49.6 44.5 1.18* Ref 19.3 18.6 1.13 Ref Cholesterol result (mg/dl) Quartiles -<136 -136 - <164 -164 - <195 - 195+ 1554 (20.9) 1623 (21.8) 1589 (21.3) 1621 (21.8) 1054 (14.2) 53.0 38.9 37.4 33.0 1.92* 1.25* 1.17* Ref 19.8 18.1 18.3 18.0 1.40* 1.08 1.05 Ref Albumin result (g/dl) Quartiles -<3.2 -3.2 Œ <3.5 -3.5 Œ <3.8 -3.8+ 1667 (22.4) 1609 (21.6) 1820 (24.5) 1709 (23.0) 636 (8.6) 67.1 44.0 34.2 23.7 4.19* 2.15* 1.55* Ref 23.5 20.0 18.5 12.5 3.33* 2.01* 1.66* Ref Medical history Hypothyroidism -Yes -No 2050 (27.5) 53915 (72.5) 0 43.1 45.6 0.90* Ref 18.4 18.8 0.92 Ref Myocardial infarction -Yes -No 3 (0.04) 7438 (99.9) 0 66.7 44.9 1.65 Ref 33.3 18.7 1.87 Ref Anemia -Yes -No 2243 (30.1) 5198 (69.9) 0 40.2 46.9 0.78* Ref 19.6 18.3 0.96 Ref Asthma -Yes -No 309 (4.2) 7132 (95.9) 0 33.3 45.4 0.65* Ref 14.9 18.8 0.65* Ref Atrial fibrillation -Yes -No 1231 (16.5) 6210 (83.5) 0 53.0 43.3 1.29* Ref 21.9 18.0 1.31* Ref Benign prostatic hyperplasia -Yes -No 504 (6.8) 6937 (93.2) 0 45.2 44.9 0.99 Ref 20.0 18.6 1.05 Ref Breast cancer -Yes -No 224 (3.0) 7217 (97.0) 0 39.7 45.1 0.86 Ref 16.5 18.7 0.84 Ref Cataract -Yes -No 184 (2.5) 7257 (97.5) 0 21.2 45.5 0.39* Ref 7.6 19.0 0.31* Ref Chronic kidney diseases -Yes -No 3005 (40.4) 4436 (59.6) 0 38.0 49.6 0.67* Ref 19.3 18.2 0.88* Ref 160 Table 4. 2. (cont™d) Colorectal cancer -Yes -No 94 (1.3) 7347 (98.7) 0 51.1 44.8 1.22 Ref 20.2 18.7 1.20 Ref COPD -Yes -No 1945 (26.1) 5496 (73.9) 0 42.1 45.9 0.88* Ref 17.0 19.3 0.82* Ref Depression -Yes -No 1615 (21.7) 5826 (78.3) 0 36.6 47.2 0.69* Ref 19.0 18.6 0.86* Ref Diabetes -Yes -No 2518 (33.8) 4923 (66.2) 0 40.8 47.0 0.82* Ref 16.3 20.0 0.74* Ref Endometrial cancer -Yes -No 27 (0.4) 7414 (99.6) 0 40.7 44.9 0.84 Ref 22.2 18.7 1.11 Ref Glaucoma -Yes -No 337 (4.5) 7104 (95.5) 0 41.0 45.1 0.87 Ref 16.3 18.8 0.81 Ref Heart failure -Yes -No 2541 (34.1) 4900 (65.9) 0 41.5 46.7 0.84* Ref 18.5 18.8 0.90 Ref Hip fracture -Yes -No 81 (1.1) 7360 (98.9) 0 48.2 44.9 1.06 Ref 22.2 18.6 1.16 Ref Hyperlipidemia -Yes -No 3686 (49.5) 3755 (50.5) 0 35.9 53.7 0.56* Ref 17.0 20.3 0.64* Ref Hypertension -Yes -No 6055 (81.4) 1386 (18.6) 0 42.2 56.8 0.63* Ref 18.3 20.5 0.69* Ref Ischemic heart diseases -Yes -No 1269 (17.1) 6172 (82.9) 0 45.2 44.9 0.99 Ref 21.3 18.1 1.15* Ref Lung cancer -Yes -No 70 (0.9) 7371 (99.1) 0 66.7 44.7 1.88* Ref 21.4 18.6 1.59 Ref Osteoporosis -Yes -No 818 (11.0) 6623 (89.0) 0 33.3 46.3 0.63* Ref 19.2 18.6 0.86 Ref Prostate cancer -Yes -No 175 (2.4) 7266 (97.7) 0 56.0 44.6 1.38* Ref 18.6 21.1 1.34 Ref Osteoarthritis -Yes -No 2760 (37.1) 4681 (62.9) 0 37.0 49.5 0.65* Ref 18.7 18.7 0.82* Ref 161 Table 4. 2. 
(cont™d) TIA/stroke -Yes -No 799 (10.7) 6642 (89.3) 0 45.3 44.9 0.97 Ref 23.7 18.1 1.28* Ref Continuous variables ƒ Age (mean ± sd) 82.2 ± 9.3 0 -- 1.03* -- 1.06* Albumin g/dl (mean ± sd) 3.4 ± 0.5 636 (8.6) 0.32* 0.41* Cholesterol mg/dl (mean ± sd) 167.7 ± 44.0 1054 (14.2) 0.99* 0.99* Number of lab tests (Median, IQR) 0 (0 Œ 5) 0 -- 1.01* -- 0.97* Number of medications (Median, IQR) 9 (5 Œ 13) 0 -- 0.98* -- 0.98* Diagnosis count (Median, IQR) 5 (3 -6) 0 -- 0.86* -- 0.92* Variables that were not included in the analysis due to >20% missing observations Decline IADLs -Decline -Improve -No change 730 (9.8) 524 (7.0) 984 (13.2) 5203 (69.9) 10.7 7.3 12.0 0.90 0.60* Ref 7.7 7.6 12.3 0.62* 0.61* Ref Global health compared to a year ago -Better -Worse -The same 55 (0.7) 315 (4.2) 1185 (15.9) 5886 (79.1) 30.9 72.4 38.5 0.74 2.50* Ref 14.6 27.6 15.1 0.86 2.96* Ref Fall since last visit -Yes -No 184 (2.5) 1545 (20.8) 5712 (76.8) 49.5 46.1 1.08 Ref 15.8 17.8 0.86 Ref Hospitalization since last visit -Yes -No 870 (11.7) 1564 (21.0) 5007 (67.3) 57.9 65.8 o.84* Ref 18.2 14.5 1.18 Ref ER since last visit -Yes -No 788 (10.6) 1648 (22.2) 5005 (67.3) 45.1 67.4 0.54* Ref 17.1 14.8 0.83 Ref Lost weight -Yes -No 1243 (16.7) 2431 (32.7) 3767 (50.6) 37.4 13.1 3.53* Ref 23.7 12.9 2.35* Ref IQR: interquartile range; sd: standard deviation; S.Q: surprise question; KPS: Karnofsky performance scale; TUG: timed up and go; ADL: activities of daily living; IADL: instrumental activities of daily living; TIA: transient ischemic attack; FU: follow -up; mg/dl: milligram per deciliter; g/dl: gram per deciliter; * P-value < 0.05 in univariate analysis with the outcomes; ƒ The unadjusted HR for continuous variables were generated for 1 unit change in the independent variable; however, only three variables were included as continuous in the analyses: Number of meds, number of labs, and diagnosis count; 162 The maximum and minimum follow up time for the mortality outcome were 1 and 794 days, respectively; with a median of 517 days (q1=246, and q3=658) and mean of 459 days or 1.25 years. The median and mean follow -up time for hospice outcome were 497 and 440 days, respectively. From the 7441 patients, 45% (n=3341) died over the FU period, and 19% (n= 1389) were admitted to hospice. Of those a dmitted to hospice 1122 (81%) died in hospice by the end of follow -up in March 2017. Overall 2219 deaths (66% of all deaths) occurred outside of hospice. A total of 3833 patients were censored at the end of the study without experiencing the outcomes. Tabl e 4.3 displays the follow -up time and the frequency of outcomes. Table 4. 3. Follow -up time and outcomes in the Cox study cohort ( N=7441) Variable N (%) Outcome: death Number of deaths over the total follow -up time 3341 (44.9) Follow -up time in days -mean ± sd -median (q1 - q3) 459 ± 239 517 (246 - 658) Outcome: hospice admission Number of hospice admissions over the follow -up time 1389 (18.7) Follow -up time in days -mean ± sd -median (q1 - q3) 440 ± 242 497 (207 - 647) Hospice admitted patients (n=1389) Number of deaths in the hospice over the follow -up time 1122 (80.8) Follow up time from hospice admission (Time to death or censoring) (days) -mean ± sd -median (q1 - q3) 104 ± 116 58 (10 - 169) 163 o Outcome: one -year m ortality Figure 4.2 illustrates the Kaplan Meier (KM) survival curve for the total cohort (n=7441). 
In this cohort, 3341 (45%) events (deaths) happened over the follow -up time, and 4100 (55%) observations were censored at the end of follow -up which means they were alive at the administrative end date. Figure 4. 2. KM survival plot for the whole data (N=7441) The number of at -risk patients is shown inside the plot over the time axis Figure 4.3 is the estimated hazard rate over the follow -up time for the whole population. The time unit in hazard rates analyses, is day. The hazard rate is the highest at the beginning of the follow -up and decreases over time with two spikes of increase at about 450 and 65 0 days. Overall the hazard rates are not constant over time; however, the difference between the maximum and minimum hazard rates are relatively small (0.04% vs. 0.15%). 164 Figure 4. 3. Hazard rate estimates for the whole data ( N=7441) - Model development To develop the cox model in the derivation data, four methods of variable selection were used, including stepwise, forward, backward, and manual selection. A full model that included all the predictors was also presented. The d eveloped models were then applied to the validation data, and performance metrics were generated for comparison between the alternative variable selection models. The C -index and AUC (365) from each method were reported for both derivation and validation d atasets (Table 4.4). The number of observations and predictors for each model are shown in Table 4.4. 165 Table 4. 4. Comparison of alternative variable selection methods in the derivation data ( N= 3721) Model selection Derivation Validation N analyzed validation Variables C-index * AUC at 365 days ƒ C-index AUC at 365 days Full model (all) 0.7168 (0.70 - 0.74) 0.7480 0.7035 (0.69 - 0.72) 0.7475 (0.58 - 0.92) 2073 41 variables Stepwise 0.7004 (0.68 - 0.72) 0.7347 0.6961 (0.68 - 0.71) 0.7404 (0.71 - 0.77) 2312 9 variables: age, sex, race, SQ, albumin, cholesterol, KPS, ADL -decline, hyperlipidemia Forward 0.7004 (0.68 - 0.72) 0.7347 0.6961 (0.68 - 0.71) 0.7404 (0.71 - 0.77) 2312 9 variables: age, sex, race, SQ, albumin, cholesterol, KPS, ADL -decline, hyperlipidemia Backward 0.7107 (0.69 - 0.73) 0.7389 0.7059 (0.69 - 0.72) 0.7504 (0.68 - 0.82) 2312 32 variables: age, race, SQ, albumin, cholesterol, KPS, ADL -decline, hypothyroidism, anemia, asthma, AF, BPH, breast ca, cataract, CKD, Colorectal ca, COPD, depression, DM, endometrial ca, glaucoma, HF, hip fx, hyperlipidemia, hypertension, IHD, lung ca, osteoporosis, prostate ca, RA/OA, stroke/TIA, diagnosis -count Manual selection 0.7163 (0.70 - 0.73) 0.7570 0.6924 (0.68 - 0.71) 0.7346 (0.71 -0.76) 2312 8 variables: age, race, SQ, albumin, cholesterol, KPS, ADL -decline, hyperlipidemia S.Q: surprise question; KPS: Karnofsky performance scale; TUG: timed up and go; ADL: activities of daily living; AF: atrial fibrillation; BPH: benign prost atic hyperplasia; ca: cancer; fx: fracture; CKD: chronic kidney diseases; COPD: chronic obstructive pulmonary diseases; DM: diabetes mellitus; HF: heart failure; IHD: ischemic heart diseases; RA/OA: rheumatoid arthritis/osteoarthritis; TIA: transient ische mic attack; *Confidence intervals for the C -index was calculated by using the standard error for Harrell™s estimate of the concordance; ƒConfidence intervals for the AUC (365) in the derivation cohort are not provided because using variable selection meth ods and multiple iterations of the model cause a very wide CL for the AUC; The performance of the models developed with different variable selection methods was similar, 
so the model that was selected through stepwise variable selection was chosen as the final best model because it has slightly better AUC (365) and C -index based on the validation data, while also being parsimonious with only nine variables. The backward selection model resulted in a tiny increase in AUC (365) compared to the stepwise model; however, the confidence interval for this statistic is not attained, and 166 the number of variables is much higher than the stepwise model (32 vs. 9). Manual variable selection resulted in a model with eight variables, but performance measures were slightly lower than stepwise selection. Unlike the results of LR variable selection methods, the variables selected in the Cox model with different selection method are also very consistent. Indeed, selected variables are precisely the same in stepwise, forward, and manual selection methods, except for the variable sex that was not select ed in the manual selection model. The eight variables included in all the models (including backward selection), were demographics (age, race), SQ, nutritional status indicators (albumin and cholesterol), history of hyperlipidemia and functional status ind icators (KPS, ADL -decline). These results are consistent with the important variables selected from the other approaches developed in chapters two and three (Table 4.9, comparison of the three approaches). Final Selected model - The model developed with s tepwise variable selection was the best model. The C -index and AUC (365) of the model in the derivation data were 0.7004 and 0.7140, respectively. To evaluate the importance of variables in the Cox multivariable model, the parameter estimates, hazard rati os, and 95% confidence limits from the stepwise model are presented in Table 4.5. Albumin, age and surprise questions have the largest hazard ratios. Similar to what was observed in the LR model, the lowest levels of albumin and cholesterol resulted in the highest hazard ratios for mortality and hospice. As expected, male sex and older ages are also associated with increased HRs, although age does not show a dose -response relationship. It means that the HR for age 95+ years was not higher than the HR for ag e 85 -95. The direction and magnitude of the hazard ratios in the Cox model is similar to the odds ratios from the LR model. ADL (improve vs. no change) also has a relatively large HR; however, the prevalence of this value (ADL=improve) is very low (4%) and so of little clinical importance in this population. For the ADL variable, fino changefl had a higher hazard compared to the fideclinefl, although it is not statistically significant. The relationship between ADL and mortality is also similar between the Cox and LR models. 167 Table 4. 5. Parameter estimates, hazard ratios, and 95% CL for predictors of the MV Cox model for mortality outcome - derivation data (N=2289) Variable Parameter Estimate P-value Hazard Ratio 95% HR Confidence Limits Age, 75 -84 years vs. 65 -74 years 0.37319 0.0070 1.452 1.107 1.905 Age, 85 -94 years vs. 65 -74 years 0.61698 <.0001 1.853 1.438 2.388 Age, 95+ years vs. 65 -74 years 0.40875 0.0311 1.505 1.038 2.182 Age, 65 -74 years Ref Sex, Male vs. Female 0.23499 0.0137 1.265 1.049 1.525 Race, Black vs. White -0.35441 0.0042 0.702 0.550 0.894 Race, Other vs. White -0.37761 0.1660 0.685 0.402 1.170 Race, White Ref Surprise question, No vs. 
Yes 0.51093 <.0001 1.667 1.344 2.067 Albumin, <3.2 vs 3.8+ gr/dl 1.14039 <.0001 3.128 2.372 4.124 Albumin, 3.2 -<3.5 vs 3.8+ gr/dl 0.66669 <.0001 1.948 1.474 2.573 Albumin, 3.5 -<3.8 vs 3.8+ gr/dl 0.43304 0.0032 1.542 1.156 2.056 Albumin, 3.8+ gr/dl Ref Cholesterol, <136 vs 195+ gr/dl 0.44608 0.0005 1.562 1.218 2.004 Cholesterol, 136 -<164 vs 195+ gr/dl -0.07845 0.5551 0.925 0.712 1.200 Cholesterol, 164 -<195 vs 195+ gr/dl 0.16740 0.1857 1.182 0.923 1.515 Cholesterol, 195+ gr/dl Ref KPS, Severe vs. Moderate disability 0.40052 <.0001 1.493 1.239 1.799 ADL -decline, Decline vs. No -change -0.05349 0.6268 0.948 0.764 1.176 ADL -decline, Improve vs. No -change -0.82404 0.0050 0.439 0.247 0.780 ADL -decline, No -change Ref CCW -Hyperlipidemia, No vs. Yes 0.39815 <.0001 1.489 1.251 1.772 *KPS values 0 -40 indicate severe disability, while values 50 -100 shows moderate/mild and no disability; 168 - Model performance To validate the model, it was applied to the validation dataset, and the three performance metrics were generated. Table 4.6 shows the C -index of th e model in the validation data, and Figure 4.4 presents the AUC at 365 days. Table 4. 6. Concordance (C -index) of the Cox MV model for mortality in the validation data ( N=2312) Harrell's Concordance Statistic Source Estimate Standard Error Comparable Pairs Concordance Discordance Tied in Predictor Tied in Time Model 0.6961 0.0086 1053996 459642 1441 554 Figure 4. 4. ROC for the mortality outcome at time=365 days and AUC (365) from Cox MV model - validation data ( N=2312) Time -dependent AUC was generated for the Cox MV model in the validation data and resulted in an iAUC of 0.7318. This drop in the AUC at the end of the study period is related to the censored subjects at the end of the stu dy for which no event has been reported. 169 Figure 4. 5. Time dependent AUC (stepwise selection, validation data) ( N=2312) Integrated Time -Dependent AUC Source Estimate Tau Model 0.7318 750 - Proportionality assumption To test the proportionality assumption in this data, KM survival plots were generated in the derivation cohort and stratified by all nine predictors. Also, the interaction terms for these variables by the time were included in the PHREG procedure to evalua te the effect of each level of predictors over time. Figures 4.6 Œ 4.14 illustrate the KM survival curves stratified by the five key predictors. None of the KM survival plots show curves that cross each other over time. That is to say there is no graphical evidence of non -proportionality. 170 Figure 4. 6. KM survival curve stratified by age - derivation data Figure 4. 7. KM survival curve stratified by sex - derivation data 171 Figure 4. 8. KM survival curve stratified by race - derivation data Figure 4. 9. KM survival curve stratified by albumin - derivation data 172 Figure 4. 10. KM survival curve stratified by cholesterol - derivation data Figure 4. 11. KM survival curve stratified by SQ - derivation data 173 Figure 4. 12. KM survival curve stratified by KPS - derivation da ta Figure 4. 13. KM survival curve stratified by ADL decline - derivation data 174 Figure 4. 14. KM survival curve stratified by hyperlipidemia - derivation data Additionally, the 2 -way interactions between time and predictors in the final model were also tested by adding them into the final main effects Cox model. The significance of the coefficient of interaction terms indicates the violation of the proportionality assumption for that p redictor. 
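For reference, interaction terms of the form covariate*log(t/median) can be created with programming statements inside PROC PHREG, and the overall Wald test can be requested with the TEST statement. The sketch below is illustrative only: it assumes the covariates have been pre-coded as 0/1 indicators with hypothetical names, shows only two of the nine interactions (the remaining main effects and interactions follow the same pattern), and uses the median follow-up of 517 days reported above.

/* Minimal sketch of the covariate-by-time PH test for the mortality model.
   Covariates are assumed to be pre-coded 0/1 indicators (hypothetical names);
   other main effects and interaction terms are omitted for brevity. */
proc phreg data=derivation;
   model days_to_death*death(0) = kps_severe hyperlipid kps_t lipid_t;

   /* Time-dependent terms: covariate multiplied by log(t / median follow-up),
      evaluated at each event time */
   g_t     = log(days_to_death / 517);
   kps_t   = kps_severe * g_t;
   lipid_t = hyperlipid * g_t;

   /* Joint Wald test that the interaction coefficients are all zero;
      with all nine terms included, this is the overall PH test of Table 4.8 */
   PH_test: test kps_t, lipid_t;
run;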
Table 4.7 contains the estimates and p -values for the nine interaction terms. The main effects in the model, are not shown in this table. Three of the interaction terms (Cholesterol, ADL, and hyperlipidemia) were statistically significant at the P<0.05 level; however, stratified KM survival curves did not show any evidence of the significant violation of proportionality assumption. 175 Table 4. 7. Parameter estimates and p -values for the interaction terms between time and key predictors - derivation data Parameter DF Parameter Estimate P-value for interaction Hazard Ratio 95% Hazard Ratio Confidence Limits Age*Time 1 0.02164 0.6850 1.022 0.920 1.135 Sex*Time 1 0.13337 0.1647 1.143 0.947 1.379 Race*Time 1 0.13021 0.2119 1.139 0.928 1.397 Albumin*Time 1 0.06108 0.1745 1.063 0.973 1.161 Cholesterol*Time 1 0.08470 0.0419 1.088 1.003 1.181 SQ*Time 1 0.00529 0.9592 1.005 0.821 1.231 KPS*Time 1 0.04411 0.6497 1.045 0.864 1.264 ADL -decline*Time 1 0.29245 0.0091 1.340 1.075 1.669 Hyperlipidemia*Time 1 0.20372 0.0311 1.226 1.019 1.475 An overall test for proportionality was performed using the statement TEST in PROC PHREG when all the interaction terms included in the model and the test statement. The result of this test is consistent with the results in Table 4.7 and rejected the null hypothesis that overall none of the interaction coefficients are statistically different from zero (Ta ble 4.8). Considering the KM survival curves (Figures 4.6 -4.14) which do not show a significant violation of the PH assumption, the Cox model can be appropriately used to model the mortality in this cohort. Table 4. 8. Overall te st for proportionality assumption for all interaction terms together Linear Hypotheses Testing Results Label Wald Chi -Square DF P-value PH- test 20.8861 9 0.0132 176 - Comparison between the alternative approaches (Cox, LR, and RF) To compare the performance of this model to the previous approaches, the AUC of the best model in each of the LR and RF approaches were compared to the Cox model results (Table 4.9). Because the outcome of interest for LR and RF models was 1 -year mortality , the AUC at day 365 for Cox MV model was reported for comparison. Also, C -index as an overall measure of discrimination over the study time was displayed in Table 4.9. Table 4. 9. Comparison of the model performance between th e three models, Cox, LR, and RF models using validation dataset Model N analyzed, Validation AUC at 1 -year Validation Variables Cox Model 2312* 0.7404 (0.71 - 0.77) 0.6961 ƒ (0.68 - 0.71) 9 variables: age, sex, race, SQ, albumin, cholesterol, KPS, ADL -decline, hyperlipidemia, Logistic regression 2312* 0.7634 (0.74 - 0.79) 11 variables: age, sex, race, dual -eligible, SQ, albumin, cholesterol, KPS, ADL -decline, hyperlipidemia, depression Random forest 3723 0.8292 (0.82 - 0.84 ) 15 first ranked important variables: TUG, albumin, race, ADL -decline, cholesterol, KPS, SQ, age, hyperlipidemia, diagnosis -count, tobacco, living -alone, RA/OA, dual -eligible, pressure -ulcer S.Q: surprise question; KPS: Karnofsky performance scale; TUG: timed up and go; ADL: activities of daily living; RA/OA: rheumatoid arthritis/ osteoarthritis; *Observations with partly missing data were excluded by default in LR and Cox procedures; ƒC-index is the concordance obt ained by applying the developed Cox model to the validation data; Compared to the LR and RF models, Cox MV model had the lowest discrimination ability in this data. 
Therefore, the analysis of time -to-event instead of the fixed time event analysis (1 -year mortality) does not seem to improve the accuracy of the predictions. Similar to the variable selection in the LR model, TUG was not selected in the Cox model; however, it is high -ranked in the variable importance in the RF model. Missing data on TUG resul ted in the exclusion of 20% of observation from the LR and Cox analysis. 177 o Outcome: one -year h ospice admission The second outcome, time -to-hospice, was analyzed following the same methods used for the mortality outcome. Figure 4.15 is the KM survival curve for time to hospice in the total cohort (n=7441). During the study time, 1389 (19%) events (hospice admissions) occurred and 6052 (81%) of patients were censored, 30% were censored due to death and the rest of 51% were censored at the administrative end date of the study. Figure 4. 15. KM plot for time -to-hospice admission in the whole cohort (N=7441) Figure 4.16 illustrates the estimated hazard rates for hospice admission over follow -up time. Unlike the hazard rate for mortality, the hazard rate for hospice admission is low at the beginning but increases until around a year from the first USMM visit, then decreases over time. 178 Figure 4. 16. Hazard rate for hospice admission f rom the first US MM visit - whole cohort (N=7441) Figure 4.17 shows the KM survival plots for mortality stratified by the hospice status. The red color showed the hospice admitted group, and it shows very few events at the beginning of the curve. It seems that patients w ho were admitted to hospice had their first few months (about 180 days) free of death, which is not correct. In fact, 59 % of patients who were admitted to hospice died within the first three months of their admission. Note that the time -to-death in Figure 4.17 represents the number of days between the first -ever visit and the date of death. Therefore the slow slope at the left end of the hospice curve does not show the number of deaths at the beginning of hospice stay. It shows the patients who finally adm itted to hospice had a lower number of deaths within the first six months of their joining USMM services. Likewise, Figure 4.18 displays the hazard rates for mortality in the total cohort stratified by hospice admission. Interestingly the hazard rate is in creasing in the hospice admitted group, whereas it is slowly decreasing in those without hospice admission. 179 Figure 4. 17. KM plot for time -to-death from the first USMM visit stratified by hospice admission status (N=7441) Figure 4. 18. Estimated hazard rates for time -to-death stratified by hospice admission status ( N=7441) 180 Figure 4.19 illustrates the KM survival curve for the time from hospice admission to death among the 1389 hospice admitted patients. In this group, 1122 (81%) deaths occurred during the follow -up time, and 267 (19%) subjects were censored at the end of the study. Figure 4. 19. KM survival among hospice admitted patients (N=1389) As mentioned abo ve, 59% of patients who were admitted to hospice died within the first three months of their admission, and 76% died within the first six months. The steep slope of the KM curve in Figure 4.19 confirms the high rate of death within the first 100 days in th e hospice admitted population. Figure 4.20 displays the estimated hazard rate for mortality among the hospice admitted patients over time; it is another illustration of the high rate of death at the beginning of hospice admission in this cohort. 
The fact that life expectancy was <6 months for 76% of the hospice-admitted group suggests that screening for hospice against the CMS eligibility criteria was done with a reasonable estimation of patients' prognosis.

Figure 4.20. Estimated hazard rate for mortality among hospice-admitted patients (N=1389)

- Model development
Similar to the methods for the mortality outcome, four variable selection methods were applied to develop the models in the PHREG procedure. Table 4.10 compares the performance results of these methods.

Table 4.10. Alternative variable selection methods for the hospice outcome - derivation data (N=3721)
Model selection    Derivation C-index*    Derivation AUC at 365 days**   Validation C-index     Validation AUC at 365 days   N analyzed, validation   Variables
Full model (all)   0.7075 (0.69 - 0.73)   0.7502                         0.6837 (0.67 - 0.70)   0.7207 (0.49 - 0.95)         2073                     41 variables
Stepwise           0.6947 (0.67 - 0.71)   0.7396                         0.6750 (0.66 - 0.69)   0.7199 (0.68 - 0.76)         2498                     9 variables: age, race, SQ, living-alone, albumin, KPS, hip fx, hyperlipidemia, number of labs
Forward            0.6947 (0.67 - 0.71)   0.7396                         0.6750 (0.66 - 0.69)   0.7199 (0.68 - 0.76)         2498                     9 variables: same as stepwise (age, race, SQ, living-alone, albumin, KPS, hip fx, hyperlipidemia, number of labs)
Backward           0.7006 (0.68 - 0.72)   0.7426                         0.6730 (0.66 - 0.69)   0.7152 (--)                  2498                     29 variables: age, race, dual-eligible, SQ, living-alone, albumin, KPS, cancer, hypothyroidism, anemia, asthma, AF, BPH, cataract, CKD, COPD, depression, DM, glaucoma, HF, hip fx, hyperlipidemia, hypertension, IHD, osteoporosis, RA/OA, stroke/TIA, number of labs, diagnosis-count
Manual selection   0.6866 (0.67 - 0.71)   0.7330                         0.6827 (0.67 - 0.70)   0.7212 (0.66 - 0.78)         2227                     10 variables: age, race, dual-eligible, SQ, living-alone, albumin, KPS, TUG, hyperlipidemia, number of labs
SQ: surprise question; KPS: Karnofsky performance scale; TUG: timed up and go; ADL: activities of daily living; AF: atrial fibrillation; BPH: benign prostatic hyperplasia; fx: fracture; CKD: chronic kidney disease; COPD: chronic obstructive pulmonary disease; DM: diabetes mellitus; HF: heart failure; IHD: ischemic heart disease; RA/OA: rheumatoid arthritis/osteoarthritis; TIA: transient ischemic attack.
*Confidence intervals for the C-index were calculated using the standard error of Harrell's estimate of the concordance. **Confidence intervals for the AUC (365) in the derivation cohort are not provided because the variable selection methods and multiple iterations of the model produce very wide confidence limits for the AUC.

The performance measures of the models were very similar across the different variable selection methods. However, the stepwise selection method produced the most parsimonious model, with nine variables. The manually selected model had one additional variable (TUG) that did not meaningfully change the AUC or C-index compared with the stepwise model, yet 271 more observations were excluded because of missing TUG values. Therefore the best Cox MV model for the hospice admission outcome is the one selected through the stepwise variable selection method. The variables that were consistently selected by all four selection methods are age, race, SQ, living alone, KPS, albumin, hyperlipidemia, and the number of lab tests.

Final selected model - The stepwise-selected model with nine variables had an AUC (365) of 0.7291 and a C-index of 0.6947 in the derivation data. The parameter estimates, p-values, and hazard ratios for the predictors in this model are shown in Table 4.11.
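Before turning to Table 4.11, note that the selection runs summarized in Table 4.10 can be coded with PROC PHREG's built-in selection options, as sketched below; the dataset and variable names are hypothetical and the candidate list is abbreviated, so this is an outline of the approach rather than the exact specification used.

proc phreg data=deriv_hosp;
   class race(ref='White') sq(ref='Yes') / param=ref;
   /* stepwise selection among the candidate predictors; the forward and backward
      runs differ only in the SELECTION= value */
   model days*hospice(0) = age race sq living_alone albumin kps_severe hip_fx
                           hyperlip n_labs dual_eligible tug adl_decline cholesterol
         / selection=stepwise slentry=0.25 slstay=0.05;
run;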
Table 4.11. Parameter estimates and hazard ratios from the Cox model for the hospice outcome - derivation data (N=2055)
Variable                               Parameter Estimate   P-value   Hazard Ratio   95% HR Confidence Limits
Age, 75-84 years vs. 65-74 years       0.96481              <.0001    2.624          1.703 - 4.043
Age, 85-94 years vs. 65-74 years       1.33531              <.0001    3.801          2.527 - 5.719
Age, 95+ years vs. 65-74 years         1.27587              <.0001    3.582          2.167 - 5.921
Age, 65-74 years                       Ref
Race, Black vs. White                  -0.72244             <.0001    0.486          0.345 - 0.684
Race, Other vs. White                  -0.69036             0.0720    0.501          0.236 - 1.064
Race, White                            Ref
Surprise question, No vs. Yes          0.77141              <.0001    2.163          1.677 - 2.789
Living-alone, Yes vs. No               -0.63107             0.0032    0.532          0.350 - 0.809
Albumin, <3.2 vs. 3.8+ g/dl            0.90295              <.0001    2.467          1.759 - 3.459
Albumin, 3.2-<3.5 vs. 3.8+ g/dl        0.51051              0.0033    1.666          1.185 - 2.342
Albumin, 3.5-<3.8 vs. 3.8+ g/dl        0.34028              0.0560    1.405          0.991 - 1.992
Albumin, 3.8+ g/dl                     Ref
KPS, Severe vs. Moderate disability*   0.83124              <.0001    2.296          1.788 - 2.949
CCW-Hip/pelvic fracture, No vs. Yes    1.00489              0.0472    2.732          1.013 - 7.369
CCW-Hyperlipidemia, No vs. Yes         0.33907              0.0023    1.404          1.129 - 1.745
Number of labs (continuous)            -0.03736             0.0022    0.963          0.941 - 0.987
*KPS values of 0-40 indicate severe disability, while values of 50-100 indicate moderate/mild or no disability.

Based on the parameter estimates, age, hip fracture, albumin, and KPS had the strongest impact on time-to-hospice. Hip fracture has a large coefficient estimate, but its prevalence in this population is very low (1%); therefore the effect of this variable would not be clinically meaningful. Surprisingly, the direction of the association between hip fracture and hospice admission shows that patients with a history of hip fracture had fewer hospice admissions than those without hip fracture. A possible explanation is that older patients with a hip fracture are more likely to die before hospice referral or before making a decision about hospice admission (e.g., death in the hospital while hospitalized for the index fracture). The mortality rate of patients with a hip fracture is 15-30% in the first year after the fracture, and only a small proportion of these patients are discharged to hospice. (151,152) Additionally, we observed in this cohort that many comorbidities had a reverse association with both outcomes, death and hospice admission; the possible explanations are discussed in Chapter 5. Age, albumin, and KPS were also among the most important predictors of mortality across the different variable selection methods.

- Model performance
The predictive performance of the Cox MV model for the hospice outcome was evaluated by applying the model to the validation data. The C-index and the AUC at day 365 for the model in the validation data are shown in Table 4.12 and Figure 4.21, respectively.

Table 4.12. Concordance of the Cox MV model for the hospice outcome - validation data (N=2498)
Harrell's Concordance Statistic
Source   Estimate   Standard Error   Comparable Pairs   Concordance   Discordance   Tied in Predictor   Tied in Time
Model    0.6750     0.0080           1337345            642072        6604          881

Figure 4.21. ROC at day 365 from the Cox MV model for the hospice outcome - validation data (N=2498)

A time-dependent AUC and its summary measure (iAUC) were also generated for the model in the validation data. Similar to the results for the mortality outcome, the AUC was around 0.70 for most of the follow-up time, except at the end of follow-up, where there was a steep drop in the AUC.
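For completeness, one way to apply the derivation-model coefficients to the validation data and obtain a validation C-index of the kind shown in Table 4.12 is sketched below. The dataset and variable names are hypothetical, the coefficients are the rounded values from Table 4.11 (the small 'Other race' term is omitted), and the CONCORDANCE= option is assumed to be available in the SAS/STAT release in use; treat this as an illustration, not the exact program behind the table.

/* 1) linear predictor (risk score) in the validation data from the derivation coefficients */
data valid_lp;
   set valid_hosp;
   lp = 0.96*(75 <= age <= 84) + 1.34*(85 <= age <= 94) + 1.28*(age >= 95)
      - 0.72*(race = 'Black')  + 0.77*(sq = 'No')       - 0.63*(living_alone = 1)
      + 0.90*(albumin < 3.2)   + 0.51*(3.2 <= albumin < 3.5)
      + 0.34*(3.5 <= albumin < 3.8)
      + 0.83*(kps <= 40)       + 1.00*(hip_fx = 0)      + 0.34*(hyperlip = 0)
      - 0.037*n_labs;
run;

/* 2) Harrell's C for the score; refitting a one-covariate Cox model on LP
      does not change the concordance of the score itself */
proc phreg data=valid_lp concordance=harrell(se);
   model days*hospice(0) = lp;
run;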
Figure 4.22. Integrated AUC from the Cox MV model for the hospice outcome - validation data (N=2498)
Integrated Time-Dependent AUC
Source   Estimate   Tau
Model    0.7028     750

- Proportionality assumption test
The proportionality assumption was tested for the predictors in the final Cox MV model for hospice admission, namely age, race, SQ, living alone, albumin, KPS, hip fracture, and hyperlipidemia. KM survival curves stratified by these predictors were generated for the patients in the derivation dataset (Figures 4.23-4.30). None of the KM curves showed a meaningful violation of the proportionality assumption. There was no clear crossing between the lines for the different levels of each predictor, except for albumin, where the curves for the two middle categories (i.e., 3.2-<3.5 and 3.5-<3.8) crossed; however, these curves track each other closely, so the crossing does not necessarily indicate a violation of the proportionality assumption.

Figure 4.23. KM survival curve stratified by age - derivation data
Figure 4.24. KM survival curve stratified by race - derivation data
Figure 4.25. KM survival curve stratified by SQ - derivation data
Figure 4.26. KM survival curve stratified by living alone - derivation data
Figure 4.27. KM survival curve stratified by albumin - derivation data
Figure 4.28. KM survival curve stratified by KPS - derivation data
Figure 4.29. KM survival curve stratified by hip fracture - derivation data
Figure 4.30. KM survival curve stratified by hyperlipidemia - derivation data

Additionally, the two-way interactions between time and the predictors in the final model were tested by adding them to the final main-effects Cox model. Table 4.13 contains the estimates and p-values for the eight interaction terms. All of the interaction terms were non-significant at the 0.05 level, which implies that none of the interaction coefficients are statistically different from zero.

Table 4.13. Parameter estimates and p-values for the interaction terms between time and key predictors - derivation data
Parameter             DF   Parameter Estimate   P-value for interaction   Hazard Ratio   95% Hazard Ratio Confidence Limits
Age*Time              1    0.07382              0.4732                    1.077          0.880 - 1.317
Race*Time             1    0.06076              0.7513                    1.063          0.730 - 1.547
SQ*Time               1    0.26372              0.1374                    1.302          0.919 - 1.843
Albumin*Time          1    -0.14643             0.0523                    0.864          0.745 - 1.001
KPS*Time              1    -0.24868             0.2212                    0.780          0.524 - 1.162
Lives-alone*Time      1    0.05288              0.8620                    1.054          0.581 - 1.914
Hip-fracture*Time     1    0.64143              0.5364                    1.899          0.249 - 14.509
Hyperlipidemia*Time   1    0.28888              0.0917                    1.335          0.954 - 1.867

An overall test of proportionality was performed using the TEST statement in PROC PHREG, with all of the interaction terms included in both the model and the TEST statement (Table 4.14). The overall PH test was also statistically non-significant, meaning there is no statistical evidence against the null hypothesis that all of the interaction coefficients are zero. In other words, there is no evidence of a violation of the proportionality assumption in these data, and the Cox model can therefore be appropriately applied to model the outcomes of interest.
Table 4.14. Overall test for the proportionality assumption for all interaction terms together
Linear Hypotheses Testing Results
Label     Wald Chi-Square   DF   P-value
PH-test   10.3797           8    0.2394

- Comparison between the alternative approaches (LR, RF, and Cox)
To compare the performance of this model to the previous approaches (the LR and RF models), the AUC of the best model from each of the LR and RF approaches was compared to the Cox model results (Table 4.15). The performance of the Cox model for the prediction of hospice admission was comparable to that of the previous approaches. Interestingly, the LR model, with only seven predictors, had the best discrimination among the three approaches for the hospice outcome. Four of the selected predictors were common to the LR and Cox models, namely age, race, SQ, and KPS. Age, SQ, and KPS were also the three highest-ranked important variables in the RF model, and race was the seventh.

Table 4.15. Comparison of the Cox model performance with the LR and RF models - hospice outcome
Model                 N analyzed, validation   AUC at 1 year, validation                Variables
Cox model             2498                     0.7199 (0.68 - 0.76); C-index* 0.6750    9 variables: age, race, SQ, living-alone, albumin, KPS, hip-fx, hyperlipidemia, number of labs
Logistic regression   2590                     0.7251 (0.70 - 0.75)                     7 variables: age, sex, race, dual-eligible, SQ, KPS, ADL-decline
Random forest         3723                     0.6971 (0.67 - 0.72)                     15 first-ranked important variables: SQ, age, KPS, number of labs, albumin, living-alone, race, dual-eligible, TUG, cholesterol, ADL-decline, hyperlipidemia, stroke/TIA, depression, IHD
SQ: surprise question; KPS: Karnofsky performance scale; TUG: timed up and go; ADL: activities of daily living; RA/OA: rheumatoid arthritis/osteoarthritis; TIA: transient ischemic attack; IHD: ischemic heart disease; hip-fx: hip fracture.
*The C-index is the concordance measure for the Cox model over the study time.

Discussion
Overall, the Cox PH models developed in these data for the two outcomes had good performance in terms of prediction accuracy (AUC at 365 of 0.74 for mortality and 0.72 for the hospice outcome), using the usual rule of thumb for interpreting the AUC: values can be roughly interpreted as excellent (above 0.80), good (0.70 to 0.80), and weak (0.50 to 0.70). However, compared to the LR models (AUC of 0.76 for mortality and 0.73 for hospice) and the RF models (AUC of 0.83 for mortality and 0.70 for hospice) developed in the previous two chapters, the Cox model's performance was worse for the mortality outcome and comparable for the hospice outcome (Tables 4.9 and 4.15). The RF model outperformed the other two models for mortality, but not for hospice admission. A possible explanation for the poorer performance of the RF in the prediction of hospice admission is that missingness on the predictors was significantly associated with mortality but not with hospice admission (Table 2.6). In Chapter 3, we discussed that the gain in AUC of the RF model for mortality was mainly due to including the incomplete cases. Because there was no association between missingness and the hospice outcome, including the observations with missing values did not increase the accuracy of the RF model for hospice. The Cox model did not show any improvement in prediction accuracy compared to the other two models in these data. One reason might be that the maximum follow-up time in this study was about two years, and the mean was only 1.25 years.
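The 1-year AUC comparisons in Tables 4.9 and 4.15 can also be made formally on the validation data by treating each model's predicted 1-year risk as a scalar marker. The sketch below follows the standard ROC/ROCCONTRAST pattern in PROC LOGISTIC, with hypothetical variable names (p_lr and p_cox hold each model's stored predicted probabilities); it is not part of the dissertation's original programs.

/* compare the ROC curves of two pre-computed risk scores for 1-year mortality */
proc logistic data=valid_preds;
   model death1yr(event='1') = p_lr p_cox / nofit;
   roc 'Logistic regression'    p_lr;
   roc 'Cox model at day 365'   p_cox;
   roccontrast reference('Cox model at day 365') / estimate;
run;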
Thus the follow-up time was not long enough for the inclusion of the time component in the analysis to make a difference in model performance. Mortality and hospice admission are similar outcomes, so their predictors are expected to be similar when developing the models. Six variables predicted both outcomes across the different models (age, race, SQ, albumin, KPS, and hyperlipidemia). However, the performance of the Cox model in the prediction of mortality was somewhat better than in the prediction of hospice admission. This difference may be due in part to the fact that, unlike death, hospice admission is not completely a result of the patient's risk level; it also depends on other factors, including patient and family preferences. For example, a patient who is at high risk of hospice admission (as identified by the model) can refuse hospice admission and therefore die at home or in the hospital. This scenario results in a false-positive case (the patient is classified as high risk by the model but is not actually admitted). In fact, all three approaches (the LR, RF, and Cox models) had higher accuracy in the prediction of the mortality outcome than of hospice admission. This supports the hypothesis that the nature of the hospice admission outcome, and not the model specification itself, is the reason for the poorer performance of the models for hospice admission.

The importance of variables in the Cox model was appraised by the magnitude of the coefficient estimates and the corresponding hazard ratios in the MV model. Comparing the importance of variables between the Cox model and the other two models revealed a few predictors that were consistently selected in all approaches. These variables, age, race, albumin, SQ, and KPS, were selected in all three approaches and for both outcomes (death and hospice admission). Older age was associated with an increasing rate of adverse outcomes. However, the hazard ratio for the oldest old (95+ years) was lower than for the age group 85-94 years (1.5 vs. 1.8). This paradoxical effect might be due to survival bias, (153) meaning that those who survived to ages >95 years had better health than the other group. Surprisingly, black patients had a lower rate of adverse outcomes than white patients. On average, black patients were younger than whites in this cohort (mean 79 vs. 83 years); however, the association of race with the outcomes persisted after adjustment for age. There might, therefore, be unobserved characteristics of the population that made the black patients overall healthier than the whites. These five variables strongly predict the adverse outcomes in this population of older adults. Albumin is a surrogate for the patient's nutritional status, (154) and low albumin is associated with impaired functional status and disability. (155) Low albumin and low cholesterol have both been shown to be associated with an increased rate of death in older adults. (156-159) Different factors can explain the effect of low albumin on the mortality rate; for example, poor nutrition and a low albumin concentration can be indicators of an underlying disease or of the patient's inability to take care of themselves. Albumin level decreases with increasing age, independent of health status. (160) Additionally, low levels of albumin have been shown to be indicators of inflammation and inadequate nutrition in patients with chronic conditions. (161)
As discussed in Chapter 2, the importance of the answer 'No' to the SQ in the prediction of mortality has been shown frequently in cancer and chronic kidney disease (98,99,162); however, its prognostic value in older adults has not been well evaluated. KPS is an indicator of the patient's functional status and disability, and a lower KPS, indicating severe disability, was associated with a higher rate of adverse outcomes. The KPS score has often been used to determine the prognosis of cancer patients. (97,163,164) Its performance in the prediction of adverse outcomes among older adults was as good as or better than that of the ADL and IADL measures. (165) In this cohort of community-living older adults, lower KPS was an essential predictor of the adverse outcomes (Tables 4.10 and 4.16). Other variables, such as ADL-decline and cholesterol, were consistently selected in all three models (Cox, LR, and RF) only for the mortality outcome, whereas for the hospice outcome, living alone, dual eligibility, and the number of lab tests ordered were also important in the prediction of the outcome. Interestingly, a history of hyperlipidemia usually had a protective effect against the adverse outcomes. This might partly be due to the known protective effect of lipid-lowering medications, particularly statins, which increase survival in cardiovascular disease. (12-14) Many of the chronic conditions in these data had an inverse association with the outcomes, including diabetes, hyperlipidemia, hypertension, depression, cataract, and chronic kidney disease. In univariate analysis of the CCW variables and the two outcomes (Table 4.2), all of these variables were significantly associated with a lower hazard of the outcomes; however, in the adjusted analysis, nearly all of these effects became statistically non-significant.

Among the 24 CCW comorbidities, the presence of the comorbidity often had a protective effect against the outcomes; in univariate analysis, 13 and nine comorbidities had statistically significant hazard ratios <1.0 (a protective effect) for the mortality and hospice outcomes, respectively. However, almost all of these associations became non-significant after adjustment, except for hyperlipidemia, which was consistently significant and entered almost all of the final models of the three approaches, for both outcomes. A possible explanation for the protective effect of hyperlipidemia is treatment that can affect patient survival; for example, lipid-lowering medications, especially statins, have been shown to decrease mortality in cardiovascular disease. (166,167) Their effect in many other conditions and diseases has also been studied. (169-171) Another potential explanation is inconsistency in the documentation of the comorbidities. For instance, a provider may not be able to complete the documentation of all comorbidities in a very sick patient. Because these CCW variables are recorded as binary (yes/no) variables in APRIMA, the default value is likely 'No' unless otherwise documented; therefore, if for some reason the information was not attainable, the comorbidity is recorded as absent. The same reasoning can explain the strong association between missingness on the predictors and mortality: in this scenario, the EMR of sicker patients with a poorer prognosis is more likely to be incomplete on comorbidities than that of healthier patients with a better prognosis.
However, the prevalence of most chronic conditions in this population is higher than in the US population aged 65 years and older. The prevalence of chronic conditions in the US population was evaluated in a study using administrative claims data for a population-based cohort of over 31 million Medicare fee-for-service beneficiaries. (7) For example, the prevalence of the following conditions in this cohort vs. the general elderly population is: hypertension (81% vs. 60%), hyperlipidemia (50% vs. 45%), heart failure (34% vs. 18%), COPD (26% vs. 11%), chronic kidney disease (40% vs. 13%), and cancer (8% vs. 7%). These findings weaken the earlier assumption that there is a lack of documentation of chronic conditions in the APRIMA data; however, it is still plausible that for those at the highest risk of mortality, the documentation of comorbidities is less than optimal. Lastly, non-lethal comorbidities such as cataract may be identified and treated more often in older patients who survive longer because of good general health. In any case, it is not possible to confirm any of these potential explanations for some of the paradoxical associations between the CCW comorbidities and the outcomes.

In the analysis of time from hospice admission to death, the average survival time in hospice was 104 days, and the median was 58 days (Table 4.3). According to the Medicare criteria, a patient is eligible for hospice services if determined to have a terminal illness (defined as having a prognosis of 6 months or less if the disease or illness runs its normal course). (35) In this cohort, 76% of patients who were admitted to hospice died within the first six months. This indicates that the screening and referral process accurately identified and referred hospice-eligible patients, in that the criterion of a life expectancy of <6 months was met for about three-quarters of the patients who were admitted to hospice. Mortality after hospice admission in these data was very high soon after admission: 21% died within seven days of admission, 59% died within three months, and 24% lived beyond six months. A large hospice study of Medicare beneficiaries enrolled in hospice programs in five US states showed that 15% of patients died within 7 days and 15% lived beyond six months of their enrollment date; the median survival in hospice was 36 days. (172) The higher rate of early death (<7 days) in the USMM cohort compared to that study (21% vs. 15%) implies that hospice referral was delayed until the very end of life for about one-fifth of those who were ultimately admitted to hospice. On the other hand, the higher rate of long stays (>6 months) in the USMM cohort (24% vs. 15%) indicates that the screening and referral process requires improvement, to avoid potential over-use of hospice facilities by patients for whom life expectancy was underestimated.

o Limitations
Patient turnover in the USMM system is high; therefore, about 23% of the 9627 patients who met the primary inclusion criteria for this study were excluded because the total time they were under USMM care was <1 year. Another limitation of this study was missing data. Some key variables, such as a decline in IADLs, recent hospitalization, and recent falls, were left out of the analysis because of a large number of missing observations. Moreover, some other variables that were included in the analysis had missing observations.
Missingness on those variables was strongly associated with the mortality outcome. Moreover, in the SAS procedures PHREG (and LOGISTIC), observations with a missing value on any predictor are excluded by default at the beginning of model development, which means that some valuable information from observations with partly missing data was lost in this analysis. Another limitation of this analysis is that the advanced variable selection methods used in the logistic regression model development, such as the adaptive lasso and elastic net, are not available options for survival model development. However, I applied the commonly used variable selection methods, namely stepwise, backward, and forward, in addition to a manual selection method; additionally, there was no evidence that these advanced variable selection methods improved model performance in the LR models in these data. In this analysis, the dataset was constructed by linking the USMM EMR database (APRIMA) to processed claims data provided by a third-party company named eSolution. The claims data contained information on 62% of the 2015 cohort, which means that 7790 patients were not linked to the claims data and were excluded from the analyses. It is not clear whether these patients did not appear in the claims data because they did not have any events, or because for some reason their information was not obtained by eSolution. If the first explanation is correct, the event rates in this cohort, and consequently the results of the analyses, would change dramatically from what is reported here. Lastly, the lack of information about the patients' enrollment in the USMM programs limited our ability to understand and interpret the paradoxical findings. Answers to questions such as "How and when does a patient enroll in USMM care?", "How long did they meet the definition of homebound?", "What are the motivations for joining the USMM program?", and "Where did the patients receive care before?" would help to better understand the models and explain the findings.

o Future direction
Having USMM data with a longer follow-up time and more complete data would help to improve the prediction model for survival analysis. The maximum follow-up time in this cohort was about two years, with an average of 1.25 years. In this study, separate Cox models were developed for each outcome, but since death and hospice admission can be competing risks, a future analysis accounting for competing risks (173,174) might be useful to assess the joint effect of the two outcomes on survival. Finally, in this research only the baseline values of the independent variables were considered. Most of the independent variables did not change over the study period; however, if the data were available for a longer follow-up time, and documentation were improved to reduce missing data, predictors that may change over time, especially functional measures (KPS, ADL, TUG), lab tests (albumin, cholesterol), and body weight, could be evaluated as time-varying covariates. This trajectory-based analysis (with time-varying predictors) could be one of the future analyses when the required data are available. The Cox model developed in this chapter can later be used to develop a prognostic index using the same methods applied by Fried and Carey. (57,58) A prognostic index is generated by assigning different points to the predictors (based on their Cox regression coefficients) and is easily usable for risk determination in different settings.
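As a rough illustration of what such a point-based index could look like (this is not an index developed or validated in this dissertation), the sketch below assigns integer points roughly proportional to the rounded Cox coefficients in Table 4.11 and groups patients into risk strata; all dataset and variable names, point values, and cut-offs are hypothetical.

data prog_index;
   set cohort;
   length risk_group $6;
   points = 0;
   /* points scaled so that roughly 0.33 on the log-hazard scale equals 1 point */
   if      75 <= age <= 84 then points = points + 3;
   else if age >= 85       then points = points + 4;
   if sq = 'No'            then points = points + 2;
   if albumin < 3.2        then points = points + 3;
   else if albumin < 3.5   then points = points + 2;
   else if albumin < 3.8   then points = points + 1;
   if kps <= 40            then points = points + 2;
   if race = 'Black'       then points = points - 2;
   /* stratify for reporting observed event rates per stratum */
   if      points <= 2 then risk_group = 'Low';
   else if points <= 6 then risk_group = 'Medium';
   else                     risk_group = 'High';
run;

proc freq data=prog_index;
   tables risk_group*hospice1yr / nocol nopercent;   /* observed event rate by stratum */
run;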
The RF model was the best of the three models (compared with LR and Cox) for the mortality outcome, although its accuracy for the hospice outcome was poor compared with the other two models. A survival tree is a concept similar to a decision tree, but with survival time as the outcome, and a survival random forest is an extension of the survival tree. (175) It develops multiple survival trees using randomly selected subsamples of the data, and the survival time is estimated by averaging over all of the survival trees. Software packages for survival trees and survival random forests are available in the R statistical software and could be used in a future study to evaluate whether this approach can improve the survival analysis in this cohort.

Conclusion
The survival analysis of these data for the two outcomes of mortality and hospice admission did not indicate any essential superiority over the LR or RF models. Despite the inclusion of additional outcome events and accounting for the time to event, the Cox model performance measures (C-index and AUC at 365) were worse than those of the other two models for the mortality outcome and comparable for the hospice admission outcome. However, the most important predictors of both outcomes in this analysis were consistent with the variables selected in the other two models. Age, race, KPS, albumin, and SQ were among the most important predictors in all three approaches. This means that collecting data on these variables is essential for the prediction of mortality and hospice admission among homebound older adults.

CHAPTER 5. Conclusion
This research aimed to develop, validate, and compare three different prediction models to be used for risk stratification in the USMM patient population. The USMM database was used to construct a cohort of community-living homebound older adults for this study. The three objectives of the study were:
1. To develop and validate multivariable logistic models for the prediction of 12-month mortality and hospice admission among the USMM population of community-living homebound older adults.
2. To develop and validate a random forest (RF) algorithm for the prediction of 12-month mortality and hospice admission among the USMM population. The model performance was evaluated against the logistic regression (LR) model from aim 1 and the Cox model from aim 3.
3. To develop and validate a multivariable failure-time model (Cox proportional hazards) to model time-to-event for mortality and hospice admission separately. These models were also compared to the logistic regression and random forest models developed in aims 1 and 2.
The prediction models developed for the three aims were compared primarily by their discrimination ability. The area under the receiver operating curve (AUC) and its equivalents for the Cox model were generated for the models. Calibration methods were also applied to evaluate and compare the goodness of fit of the LR and RF models. Additionally, the specific variables selected in the final model of each approach were compared to evaluate the importance of individual predictors in the different models. The important aspects of this study include:
1. Using a unique clinical population of community-living homebound older adults
2. Using a rich database that includes a wide range of different types of EMR-based information, including demographics, socioeconomic variables, comorbidities, functional status, and laboratory test results, linked to claims data to obtain information on outcome events and utilization
3. Using multiple imputation for missing data and applying the models developed in the available data to the imputed data, in addition to developing models in the imputed data
4. Applying different variable selection methods, including advanced methods such as the adaptive lasso and elastic net, to build multivariable models
5. Utilizing a machine learning algorithm (random forest) for model development, in order to handle missing data and to account for potential non-linear relationships in the data
6. Comparing the different models by generating discrimination metrics for all three models

Population
This cohort of USMM patients is a group of older adults that differs from most of the other comparable study populations summarized in Chapter 1. (48,49,53,54,56,57) Unlike institutionalized older patients, USMM patients live in the community; importantly, however, they were homebound based on the CMS definition. (111) These patients needed to receive health services at home because they were unable to leave home to seek medical services or because, in a physician's judgment, leaving home would be associated with an unacceptably high level of risk to them. These characteristics make this cohort different from other populations commonly studied in the literature, such as nursing home patients, (48) hospitalized older adults, (46,47) older patients who visit the ER, (176,177) and community-living non-homebound elderly. (53,54) More importantly, the one-year mortality rate in the USMM cohort (32%) was much higher than the mortality rates reported in these other populations (Table 1.3) and was more comparable to the mortality rates reported in nursing home populations, which range from 17% to 34%. (59-61,178) Therefore, because of the uniqueness of the USMM population and the fact that existing RS models are likely not applicable to it, there was a knowledge gap regarding the most appropriate RS models for the USMM patient population. This dissertation aimed to develop alternative risk stratification models for this cohort.

Data source
The USMM database is a rich data source that includes a wide variety of variables for each patient. The data were obtained from two different sources: the USMM electronic medical record, named APRIMA, and the claims data processed by a third party called eSolution. (6) The USMM dataset includes many variables; however, a drawback of such a large dataset is missing data. There are other problems in data collection (e.g., data on some variables were not collected at every visit; rather, a previously collected value was repeated for the next few visits), documentation (information on a single variable was documented in different datasets, so there was no unique source of data for a single variable), and storage in each of the different sources that may cause inaccurate inferences. For example, event rates in this population were calculated based on the events reported in the claims data, which are not available for about one-third of the cohort. Therefore the accuracy of the analysis results that depend on the event rate cannot be confirmed.
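The event-rate concern above stems from incomplete linkage between APRIMA and the claims data. One simple way to identify which APRIMA patients have no linked claims record is a keyed merge on the patient ID, sketched below with hypothetical dataset and variable names; the dissertation's actual linkage was performed on the processed files supplied by eSolution.

/* one de-duplicated ID per claims patient */
proc sort data=claims(keep=patient_id) out=claim_ids nodupkey;
   by patient_id;
run;
proc sort data=aprima;
   by patient_id;
run;

/* split the EMR cohort into linked and unlinked groups */
data linked unlinked;
   merge aprima(in=in_emr) claim_ids(in=in_claims);
   by patient_id;
   if in_emr and in_claims then output linked;
   else if in_emr          then output unlinked;   /* candidates for the ~7790 unlinked IDs */
run;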
One of the main issues affecting the source data in this study is the uncertainty about those patients who were not linked to the claims data. There were 7790 patients (38%) in the USMM 2015 cohort who did not have any claims data reported, and these subjects were excluded from all analyses. Each patient in the USMM database has a unique ID number, and this ID links the APRIMA and claims databases together. The reason a patient ID is not found in the claims data is unclear, and there are two possible scenarios. The first possibility is that claims data for the 7790 IDs were missed for some reason; for example, there was a delay between an event and its reporting in the claims data, or claims from patients with only private insurance are not reported to the Centers for Medicare and Medicaid Services (CMS). However, only 15% of the excluded group had commercial insurance, so this explanation seems unlikely; coverage by private insurance is not a plausible reason for the absence of any claims data for the 7790 IDs. The second possible explanation for the absence of claims data is that the 7790 subjects did not experience any outcome event. If this scenario is correct, then excluding these observations from the analysis substantially inflates the event rates in the remainder of the cohort, because all of the excluded subjects were actually event-free (i.e., were alive and were not admitted to hospice) during the study period. There is some evidence that the second scenario is false: a substantial number of observations (n=6201, 49%) in the claims data did not have any outcome event. However, because the claims data we received were processed data, it is possible that the 7790 IDs who did not experience death or hospice admission had some other type of claims information that was not included in the processed claims data we received. If this is true, then they should have been retained in the analysis and assumed to be still alive and not in hospice. This uncertainty about the origin of the claims data, and the reason for the missing claims for 7790 patients, remains a major limitation of this dataset.

The importance of the missing data
Missing data are a persistent problem in biomedical studies. (179,180) In this dataset, different numbers of missing observations were found for different variables. Variables with more than 20% missing observations were not included in the model development phase of this study, so some valuable information has certainly been overlooked. Variables such as 'decline in IADLs', 'general health reported by patient', 'fall', and 'hospitalization' were among those excluded. The high rates of missing data likely reflect USMM's approach to data collection and documentation. For example, some variables listed in APRIMA require medical examination and documentation by USMM staff at each home visit, while others are documented only annually at the annual wellness visit. In the latter category are variables such as 'IADL decline' and 'general health', which are part of the annual wellness visit; however, other variables, such as 'hospitalization since last visit' and 'fall since last visit', should have been recorded at each medical visit. In other words, when a variable is evaluated annually (e.g., change in general health compared to the prior year), it will be recorded as missing at the routine visits that are conducted every 4 weeks or so between the annual wellness visits.
However, missing data on variables that explicitly indicate an incident event since the last visit (i.e., fall, hospital visit) cannot be explained by this same mechanism. Furthermore, nine variables that were included in the analysis had missing rates between 0.4% and 20%: race, surprise question, Timed Up and Go (TUG), living alone, decline in activities of daily living (ADL), albumin level, cholesterol level, smoking status, and KPS (Table 2.5). In univariate analyses of these variables with the outcomes, in which missing values were counted as a legitimate category, missingness was significantly associated with at least one of the outcomes, which suggests that the data are missing not at random (MNAR). Typically, in most SAS statistical procedures, observations with any missing values are excluded from the analysis by default. (86) Therefore, in this study, about one-third of the patients, those with partly missing observations, were excluded automatically at the beginning of model development for the LR and Cox models. Given the strong association found between the missingness of some predictors and the outcomes (Table 2.5), the exclusion of these data can introduce bias into the results. In other words, the missingness in these data is informative, and ignoring it can potentially undermine the validity of the results. (181) To further evaluate the influence of missing observations, a multiple imputation procedure was applied to these data. The assumption was that, with the inclusion of all observations, the risk of bias due to missing data would be reduced; also, with an increased number of observations included in the analysis, the model would not lose power and precision. (88,181) Surprisingly, using the imputed data did not improve prediction model performance in these data. For the LR model, the model developed in the imputed data did not have better discrimination than the original model developed in the available data. Application of the Cox model to the imputed data is not straightforward, and the results from the imputed sets cannot be summarized with a single measure of performance or discrimination; (182) therefore the Cox model was not applied to the imputed data. In the RF model, missing data are not excluded from the analysis; as explained in Chapter 3, the random forest procedure lets missing values be included in the analysis as a legitimate category. The performance of the random forest model was remarkably better than that of the LR model for the mortality outcome. The RF model developed in the available data was also applied to the imputed data; its discriminative performance in the imputed data was similar to the LR model and notably worse than the RF in the available data, which means that imputation of the missing values in these data cannot capture the information that the missing values represent. The reason is that the basic assumption of the multiple imputation method used in SAS is that the data are MAR (missing at random), whereas the associations between the missingness of the independent variables and the outcomes suggest that MNAR is the more likely mechanism of missing data. To conclude, missing data in this study were an important predictor of the outcomes.
In the RF model, excluding the observations with missing values (i.e., limiting the cohort to the same members as in the LR model) or imputing the missing values (applying the RF model to the imputed data) diminished the performance of the model to a similar degree. These results therefore support the hypothesis that missing data in this dataset are missing not at random (MNAR). The multiple imputation procedure uses the assumption of missing at random (MAR) for imputation, (86,88) which is probably the main reason why model performance was worse in the imputed data than in the available data. The better performance of the RF model for the mortality outcome when missing observations were included suggests a clear advantage of the RF when data are MNAR.

Using the multiple imputation method in the management of missing data
Missing data in this analysis resulted in the exclusion of almost one-third of the observations from the LR and Cox analyses. Multiple imputation (MI) is a commonly used method for dealing with missing data. (181) The SAS procedure PROC MI builds a specified number of imputed datasets; PROC MIANALYZE applies a specified model to each dataset and summarizes the results from all imputations to generate measures of interest such as regression coefficients and effect sizes (e.g., odds ratios or relative risks). The MIANALYZE procedure reads and combines the coefficients and standard errors generated from the model in each imputed set; these statistics are stored in tables and covariance matrices produced by the regression model in each imputation. There are two sources of variance when multiple imputation is used: the 'within-imputation variance', which results from the variation between observations within each imputed dataset, and the 'between-imputation variance', which results from the variation between the different imputed datasets. Using the between and within covariance matrices, PROC MIANALYZE derives valid multivariable inferences based on Wald tests. (183) The MIANALYZE procedure does not support an AUC option or its equivalent, so to summarize the AUCs from the imputed datasets in the LR model we applied a manual method described in Chapter 2: we took the average of the predicted probabilities from the 20 imputed datasets for each patient and then generated an estimate of the AUC from these averaged probabilities. This method cannot be applied to the Cox model results, because generating the AUC from the averaged survival is not an option in PROC PHREG. Lastly, as discussed above, the underlying assumption of missing at random for the multiple imputation procedure was not satisfied in these data. The exact mechanism of missingness cannot be identified in these data; although we can reject the idea that the data are missing completely at random (MCAR), it is not possible to distinguish between missing at random (MAR) and missing not at random (MNAR). However, as evidenced by the significant association between missingness on the predictors and the mortality outcome, MNAR is likely the primary mechanism of missingness in these data. Thus multiple imputation may not be an appropriate method to handle the missing data if they are MNAR. A sensitivity analysis could test the appropriateness of the MI procedure in these data by using a pattern-mixture model approach, which models the distribution of a response as a mixture of the distribution of the observed responses and the distribution of the missing responses. (86)
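The MI workflow described above follows a standard PROC MI / PROC MIANALYZE pattern; the sketch below is a simplified illustration with hypothetical dataset and variable names (all predictors assumed to be continuous or 0/1 coded), not the dissertation's exact program.

/* Step 1: create 20 imputed datasets (the procedure assumes MAR) */
proc mi data=avail nimpute=20 seed=20190101 out=imputed;
   var age albumin cholesterol kps tug adl_decline sq_no race_black living_alone smoker;
run;

/* Step 2: fit the logistic model separately within each imputation */
proc logistic data=imputed;
   by _imputation_;
   model death1yr(event='1') = age albumin cholesterol kps tug adl_decline sq_no
                               race_black living_alone smoker;
   ods output ParameterEstimates=lgsparms;
   output out=preds p=phat;
run;

/* Step 3: combine coefficients and standard errors across the 20 imputations */
proc mianalyze parms=lgsparms;
   modeleffects Intercept age albumin cholesterol kps tug adl_decline sq_no
                race_black living_alone smoker;
run;

/* Step 4: average each patient's 20 predicted probabilities and estimate a single
   AUC from the averaged probability (the manual approach described in the text) */
proc means data=preds noprint nway;
   class patient_id death1yr;
   var phat;
   output out=avg_pred mean=phat_avg;
run;
proc logistic data=avg_pred;
   model death1yr(event='1') = phat_avg;   /* the c statistic for PHAT_AVG is the AUC */
run;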
As a conclusion, the multiple imputation procedure used in this analysis did not improve model performance compared to the model based on only the available data. The most likely explanation is that MI relies on the MAR assumption, whereas the evidence suggests that MNAR is the mechanism of missingness in the USMM dataset.

Variable selection methods
In this analysis, different variable selection methods, including stepwise, backward, and forward selection, were applied to develop the LR and Cox models. For the LR models, the more advanced methods of the adaptive lasso and elastic net were also used in variable selection. Although SAS does not support these methods in the LOGISTIC procedure, they are supported in the GLMSELECT procedure, which was used to select the variables for the LR model. However, using these variable selection methods did not improve the performance of the models in these data. As explained in Chapter 2, the adaptive lasso and elastic net are useful methods in big-data analyses where the number of predictors is very large and the number of observations is relatively small (high-dimensional data), such as genetic data. (102,184) That was not the case in this study, where the number of observations was almost 200 times the number of predictors.

Using the random forest method
The use of machine learning (ML) algorithms has been increasing in many disciplines, including biomedical research. (65,122) Some studies have found that ML-based analyses outperform traditional methods in finding risk predictors and improving predictive model accuracy. (185) However, the accuracy of any predictive model, ML-based or not, depends on the quality of the data. Thus, common problems with EMR data (such as missing data, the timeliness of the available data, and poor data quality) affect ML-based methods just as they affect traditional methods. (186) Random forest is the machine learning algorithm that was used in this dissertation. Two key advantages of random forest are its ability to include incomplete (partly missing) observations and its ability to capture non-linear relationships and complex interactions. (69) Using random forest to develop a prediction model in these data allowed the missing values to be included as legitimate values in the analyses; in other words, all observations can contribute to model development without the need for imputation of missing values. The random forest resulted in substantially improved discrimination compared to the LR model for the mortality outcome. Although the primary reason for using the RF model in these data was to explore and capture any non-linear (higher-degree) relationships and complex interactions, the improvement in RF model performance for the mortality outcome was mainly due to the inclusion of missing data. We concluded this because, when the RF model was applied to the subjects with no missing data (those analyzed in the LR model), the model's AUC was very similar to the LR model's AUC, and when the RF was applied to the imputed data, the AUC was again similar to the LR model. Therefore RF improved discrimination only when the missing observations were included as missing. Additionally, when the missing values were recoded as a legitimate category and included in the LR model, the LR model's performance was comparable to (and slightly better than) the random forest model. This again confirms that the gain in the AUC of the RF is almost completely due to the inclusion of missing data.
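Recoding missing values as an explicit category, as described above for the LR comparison, can be done with a simple DATA step; the variable names, category labels, and the TUG cut-off below are hypothetical and only illustrate the idea.

data recoded;
   set avail;
   length albumin_cat tug_cat $8;
   if missing(albumin)      then albumin_cat = 'Missing';
   else if albumin < 3.2    then albumin_cat = '<3.2';
   else if albumin < 3.5    then albumin_cat = '3.2-3.5';
   else if albumin < 3.8    then albumin_cat = '3.5-3.8';
   else                          albumin_cat = '3.8+';
   if missing(tug)          then tug_cat = 'Missing';
   else if tug >= 14        then tug_cat = 'Slow';      /* illustrative cut-off */
   else                          tug_cat = 'Normal';
run;

/* the 'Missing' level now enters the logistic model as its own category,
   so incomplete observations are no longer dropped by default */
proc logistic data=recoded;
   class albumin_cat(ref='3.8+') tug_cat(ref='Normal') / param=ref;
   model death1yr(event='1') = albumin_cat tug_cat age kps_severe sq_no;
run;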
In contrast to the mortality outcome, the RF performance in the analysis of the hospice outcome was not notably different from the LR model. It can be concluded that missingness on the predictors is not associated with hospice admission, as shown in Table 2.5. Missingness was itself a predictor of mortality, which again reinforces the possibility of an MNAR mechanism for the missing data: for example, when a patient is very sick and at the end of life, it is more likely that physicians or other health professionals do not evaluate all of the predictors and complete the EMR. The conclusion from the comparison of the RF and LR models is that the RF model had a substantially improved AUC for the mortality outcome compared to the LR model; the main advantage of the RF model in these data was due to the inclusion of missing observations, and the same AUC gain was observed when the missing values were recoded and included in the LR model.

Important predictors of mortality and hospice
The importance of variables in the prediction of the outcomes was evaluated by the magnitude of their effect in the LR and Cox models (adjusted odds ratio and adjusted hazard ratio, respectively). Unlike the LR and Cox models, RF does not provide coefficients for the predictors; rather, it generates a table of the ranked importance of the variables. A few variables were among the most important variables in all three approaches and for both outcomes: age, race, SQ, albumin, KPS, and hyperlipidemia. ADL-decline and cholesterol level were also selected in multiple models. Older age and male sex were associated with a higher rate of both outcomes. African American patients in our study had a lower risk of mortality (adjusted OR=0.59, 95% CL=0.42 - 0.83) and hospice admission (adjusted OR=0.65, 95% CL=0.43 - 1.0) than whites. A study that evaluated racial differences in mortality among Medicare beneficiaries demonstrated substantially higher mortality among Black older adults. (187) The lower mortality rate in African Americans compared to whites in this study could have been explained by the age difference between the two race groups, as black patients were younger than whites (mean 79 vs. 83 years); however, the association of race with mortality persisted after adjustment for age. There might therefore be unobserved factors (such as socioeconomic status or education) that made the black patients in this cohort less susceptible to death and hospice admission. As expected, a 'No' answer to the surprise question was also strongly related to both outcomes. The validity of the answer 'No' to the SQ in the prediction of mortality has been shown frequently in cancer and chronic kidney disease (98,99,162); however, its prognostic value in older adults has not been well evaluated.
Albumin level decreases with increasing age independent of health status. (160) In our study, lower level of albumin and cholesterol were associated with higher ri sk of death and hospice admission. Albumin is consistently was among the most important variables in different models for both outcomes (Tables 4.8 and 4.14). KPS is an indicator of patient functional status and disability. Lower values of KPS indicate mo re severe disability and is associated with higher rate of adverse outcomes. KPS score has been often used to determine the prognosis of cancer patients, (97,163,164) and is used as part of hospice eligibility criteria in some diseases such as cancer. (189) Its pe rformance in prediction of adverse outcomes among a population of elderly veterans (who were referred to geriatric care clinic) was better or equally good as the use of ADL and IADL measures. (165) In the USMM cohort of community -living older adults, lower KPS was an essential predictor of adverse outcomes (Tables 4.8 and 4.14). Routine documentation of KPS is valuable approach for health care programs in old er adults. With respect to changes in ADL compared to the prior assessment, 66% of the USMM had no -change whereas 14% declined and 4% improved (data were missing in 16%). Improvement in ADL compared to ‚no -change™ was, as expected, associated with lower risk of death. However, unexpectedly, a decline in ADLs also had slightly lower risk compared to ‚no -change™; the latter association was not statistically significant. We do not have an explanati on for this paradoxical finding. 212 Timed up and go (TUG) variable had 20% missing values but was included in the model development. This variable was the first ranked important variable in RF analysis for the mortality and the 9 th for the hospice outcome. As mentioned before, in LR and Cox models, observations w ith any missing value are left out from the analysis by default, whereas in RF the partly -missing observations are also included. One can conclude that missing on TUG is again an important predictor of the outcomes in this data. To measure TUG, the patient needs to understand the test and have motivation and ability to do the test. It is very likely that the doctor or other health professionals who visited the patients overlook testing TUG for terminally ill patients, bed -bound patients, or when patient™s s afety can be a concern. Consequently, missing on TUG will be a strong predictor of mortality or hospice admission. In this dataset, 24 variables representing the CCW comorbidities were evaluated as the predictors of the outcomes. Interestingly, often the presence of the comorbidity had a protective effect against the outcomes. However, almost all of these associations became non -significant after adjustment, except for hyperlipidemia which remained consistently significantly associated with better outcome s (and was included in most of the final models). As discussed in chapter four, the reason for this protective effect of hyperlipidemia may be in part due to the treatments that can affect the survival of patients. For example, lipid -lowering medications, especially statins, have been shown to decrease mortality in cardiovascular diseases. (166,167) The protective effect of statins have been reported in many other conditions and diseases. (169 Œ171) Another potential explanation is the inconsistency in the documentation of the comorbiditie s. 
For instance, it is expected that the provider does not complete the documentation of all comorbidities in a very sick patient. (190) Since these CCW variables are recorded as binary (yes/no) variables in the APRIMA, it is likely that the default value is ‚No™ unless otherwise documented. In this scenario, the EMR for sicker patients with a poorer prognosis, are more likely to be incomplete on comorbidities than the 213 healthier patients with better prognosis. However, the prevalence of most chronic conditions in this population is higher than the US population of age 65 years old. The prevalence of chronic conditions in the US population was evaluated in a study using administrative claims data for a population -based cohort of over 31 mil lion Medicare Fee -for -service beneficiaries. (7) For example prevalence of following con ditions in this cohort vs. the general elderly are: hypertension (81% vs. 60%), hyperlipidemia (50% vs. 45%), heart failure (34% vs. 18%), TIA/stroke (11% vs. 5%), diabetes (34% vs. 27%), atrial fibrillation (17% vs. 9%), COPD (26% vs. 11%), chronic kidney disease ( 40% vs. 13%), and cancer (8% vs. 7%). On the other hand, the comorbidity rates for some variables are lower in the USMM cohort: ischemic heart diseases (17% vs. 35%), osteoporosis (11% vs. 14%), and Alzheimer™s disease (0% in this cohort vs. 13 % in the US population). These results weaken the previous assumption that there is a lack of documentation for chronic conditions in the APRIMA data, however it still is plausible that for those at highest risk of mortality the documentation of comorbidit ies is less than optimal. Unfortunately it is not possible to confirm any of these potential explanations for some of the paradoxical associations between CCW comorbidities and outcomes. Difference between the two o utcomes - The two outcomes of interest, d eath, and hospice admission are clearly related variables. According to the Medicare criteria, a patient is eligible for hospice services, if determined to have a terminal illness (defined as having a prognosis of 6 months or less if the disease or illness runs its normal course). (35) Therefore the models for the two outcomes are expected to be simil ar in terms of selected predictors and the performance of the models. However, in this data using the same set of potential predictors, the model performance for the two outcomes was different in terms of the AUC and selected variables in the final models. Mortality after hospice admission in this data was very high soon after admission i.e., 21% died within seven days of admission, 59% died within three months of their admission and 25% lived beyond 6 months of their admission. Median survival after hospi ce admission was 58 days. A large hospice study 214 of Medicare beneficiaries who were enrolled in hospice program in 5 US states showed that 15% of patients died in 7 days and 15% lived beyond 6 months of their enrollment date. The median survival in hospice was 36 days. (17 2) In the USMM cohort, early death ( 7 days) occurred in higher proportion of the patients than the previous study which implies the patients were referred to hospice late. However, it is noticeable that unlike death, hospice admission is dependent on factors other than the patient™s clinical condition. For instance, unobserved variables such as patient or family preferences can influence the admission and its timing. So it is probable that some caregivers preferred to take care of the patient at home u ntil the very end of life. 
However, it is notable that, unlike death, hospice admission depends on factors other than the patient's clinical condition. For instance, unobserved variables such as patient or family preferences can influence admission and its timing, so it is probable that some caregivers preferred to care for the patient at home until the very end of life. In addition, racial and ethnic disparities in end-of-life care have been shown in the literature: Black patients are more likely to receive higher-intensity care (e.g., intensive care unit stays) and higher-cost care (frequent hospitalizations and ER visits) instead of hospice enrollment at the end of life. (191-193) Therefore, late admission to hospice does not necessarily mean that the original hospice referral by USMM providers was late. On the other hand, about 25% of the hospice-admitted patients lived beyond six months, compared with 15% in the national study, which implies that the screening and referral process requires improvement to avoid potential over-use of hospice services by patients whose life expectancy was underestimated.

Limitations
This study had limitations. Details were provided in chapters 2-4; a summary of the limitations of this dissertation follows.
1. Although the USMM database is a rich dataset with a wide range of information collected, missing data are a serious problem. Several potential predictors were not included in the analysis because of high rates of missingness: decline in IADL function since the last visit, decline in global health since last year, falls, hospitalizations, and ER events.
2. We used independent variables collected at baseline (the first visit in the USMM system) because changes in variables over time were not well documented.
3. Another limitation is the assumption about the mechanism of missing data. The multiple imputation procedure assumes data are missing at random, yet we applied multiple imputation even though there was evidence that the missingness mechanism in these data is MNAR (missing not at random).
4. Two comorbidity variables were excluded from the analysis because the number of patients with the comorbidity was very small or zero.
5. To evaluate the accuracy of the models, we used a validation dataset that originated from the same database as the derivation cohort; an external validation dataset would be needed to confirm the external validity of the models.

Future direction
The models developed in this study were validated using data from the same source as the derivation data. To assess the external validity of the models, future application of the models to other cohorts of community-living homebound older adults is needed. It would also be of interest to evaluate the validity of the models among older adults who are not homebound. Because the included functional-status variables cover the full range from normal to severely impaired (i.e., KPS, TUG, and ADL), these models could potentially be used to predict outcomes in the general older population. In the RF model, a single machine learning algorithm was used to predict the mortality outcome, and it resulted in remarkably improved discrimination. Researchers commonly use an ensemble of different machine learning algorithms to obtain a better model. (74) Future studies could incorporate different machine learning algorithms to attain a prediction model with higher discrimination. In this study, separate Cox models were developed for each outcome, but since death and hospice admission can be competing risks, a future analysis accounting for competing risks (173,174) might be useful to assess the joint effect of the two outcomes on survival (a sketch of this idea follows).
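To make the competing-risks suggestion concrete, the following minimal R sketch fits a Fine-Gray subdistribution-hazard model in which hospice admission is the event of interest and death is treated as a competing event rather than as censoring. It uses simulated data and hypothetical variable names, and is offered only as an illustration of the approach cited above (173,174), not as an analysis of the USMM data.

```r
# Sketch of a competing-risks analysis: hospice admission with death as a competing event.
library(survival)
set.seed(3)
n   <- 400
dat <- data.frame(time    = rexp(n, 1 / 200),                      # follow-up time in days
                  event   = factor(sample(c("censor", "death", "hospice"), n,
                                          replace = TRUE, prob = c(0.5, 0.35, 0.15))),
                  kps     = sample(seq(20, 90, 10), n, replace = TRUE),
                  albumin = rnorm(n, 3.5, 0.5))
# The first factor level ("censor") is treated as censoring in the multi-state Surv object.

# Expand the data for the Fine-Gray model of the subdistribution hazard of hospice admission.
fg  <- finegray(Surv(time, event) ~ ., data = dat, etype = "hospice")
fit <- coxph(Surv(fgstart, fgstop, fgstatus) ~ kps + albumin, weights = fgwt, data = fg)
summary(fit)   # covariate effects on the cumulative incidence of hospice admission
```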
Finally, in this research only the baseline values of the independent variables were considered. Most of the independent variables did not change over the study period; however, if data were available for a longer follow-up time and documentation were improved to reduce missing data, predictors that may change over time, especially functional measures (KPS, ADL, TUG), laboratory tests (albumin, cholesterol), and body weight, could be evaluated as time-varying covariates. Such a trajectory-based analysis (with time-varying predictors) could be undertaken in the future once the required data are available.

Learning about the quality of the USMM data was an important result of this dissertation. The quality of the USMM database needs substantial improvement; developing a protocol that regulates the data collection process would significantly improve the quality of the database.

Potential implementation of a new RS approach for USMM
Our models can be programmed into and integrated with the electronic medical databases to stratify patients and provide them with targeted care. The developed models can be handed to USMM computer programmers, using statistical software that supports them (e.g., both SAS and R support all three models: logistic, random forest, and Cox PH). The programmer can then code the model into the data system so that it runs on all observations at the time of each new data entry. Logistic and Cox regression models can be programmed into the USMM database directly from the regression coefficients for each predictor (a minimal sketch of this scoring step appears at the end of this section), whereas the RF model requires statistical software to generate its predictions. The fact that users of ML-based algorithms cannot directly see how the predictions are generated remains a limitation of utilizing them. (185) Ultimately, a predicted probability is calculated for each patient from the model, and a risk level is then assigned based on the probability of death (or hospice admission). High-risk patients would be flagged and brought to the attention of the provider team for appropriate and timely intervention. The intervention can include a range of services such as a change in medications, nutritional support, an additional home visit, hospice referral, or offering palliative care and advance care planning. Lower-risk patients can be targeted for other levels of service according to USMM policies and care plans. For example, preventive services such as providing medical equipment to reduce fall risk, screening tests, more intensive treatment regimens to prevent complications of diabetes, and rehabilitation referral may be offered to patients with estimated long survival and low risk of adverse outcomes. As the conceptual framework from the American Geriatrics Society Guiding Principles indicates (Table 1.2), estimation of life expectancy and health trajectory is part of the suggested care for older adults with multimorbidity. All decisions and care options must be aligned with the patient's priorities and health trajectory, and must be communicated with the patient, caregiver, and other clinicians. Finally, the results of this research can be used by USMM to improve the quality of the database in terms of data collection and documentation.
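To illustrate the coefficient-based scoring step described above, the following minimal R sketch applies made-up logistic regression coefficients (not the dissertation's estimates) to a hypothetical new patient record and maps the predicted probability onto a risk tier. The variable names, coefficient values, and cut-points are assumptions for illustration only; in practice the fitted coefficients and USMM-defined thresholds would be used.

```r
# Minimal sketch of scoring a new record from known logistic regression coefficients.
# Coefficients and risk-tier cut-points below are invented for illustration.
coefs <- c(intercept = -3.0, kps = -0.03, albumin = -0.6, sq_yes = 1.1, age = 0.02)

score_patient <- function(kps, albumin, sq_yes, age) {
  lp <- coefs["intercept"] + coefs["kps"] * kps + coefs["albumin"] * albumin +
        coefs["sq_yes"] * sq_yes + coefs["age"] * age           # linear predictor
  p  <- 1 / (1 + exp(-lp))                                      # predicted probability of the outcome
  tier <- cut(p, breaks = c(0, 0.15, 0.40, 1),
              labels = c("low", "moderate", "high"), include.lowest = TRUE)
  list(probability = unname(p), risk_tier = as.character(tier))
}

score_patient(kps = 40, albumin = 2.8, sq_yes = 1, age = 85)    # "high" would flag the record for review
```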
Conclusion
In conclusion, the different statistical approaches to developing a prediction model in these data resulted in similar model discrimination, except for the random forest model for the mortality outcome, which had remarkably better discrimination than the other models. A few variables, such as SQ, KPS, and albumin, were consistently associated with both outcomes. We think that these variables should be considered by researchers who are working on prognostic indices for older populations. SQ and KPS are simple but valuable pieces of information that can be quickly evaluated and documented by physicians or other health providers.

BIBLIOGRAPHY

1. Sonnega A, Robinson K, Levy H. Home and community-based service and other senior service use: Prevalence and characteristics in a national sample. Home Health Care Serv Q. 2017;36(1):16-28.
2. Siegler EL, Lama SD, Knight MG, et al. Community-Based Supports and Services for Older Adults: A Primer for Clinicians. J Geriatr [electronic article]. 2015. (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4339950/). (Accessed October 29, 2019)
3. Program of All-Inclusive Care for the Elderly. Centers for Medicare and Medicaid Services. (https://www.medicaid.gov/medicaid/ltss/pace/index.html). (Accessed November 5, 2019)
4. Anderson KA, Dabelko-Schoeny HI, Fields NL. Home- and Community-Based Services for Older Adults: Aging in Context. Columbia University Press; 2018.
5. Medicare Bulletin - January 2019. (https://www.cgsmedicare.com/hhh/pubs/mb_hhh/2019/j15_hhh_01-19.pdf). (Accessed July 25, 2019)
6. eSolutions. (https://www.esolutionsinc.com/). (Accessed July 14, 2019)
7. Salive ME. Multimorbidity in Older Adults. Epidemiol Rev. 2013;35(1):75-83.
8. White N, Kupeli N, Vickerstaff V, et al. How accurate is the 'Surprise Question' at identifying patients at the end of life? A systematic review and meta-analysis. BMC Medicine. 2017;15(1):139.
9. Werner CA. The Older Population: 2010. 2010. (https://digitalcommons.unomaha.edu/cparpublications/60). (Accessed May 2, 2019)
10. Public Health and Aging: Trends in Aging --- United States and Worldwide. (https://www.cdc.gov/mmwr/preview/mmwrhtml/mm5206a2.htm). (Accessed May 2, 2019)
11. Barnett K, Mercer SW, Norbury M, et al. Epidemiology of multimorbidity and implications for health care, research, and medical education: a cross-sectional study. The Lancet. 2012;380(9836):37-43.
12. Guralnik JM. Assessing the impact of comorbidity in the older population. Annals of Epidemiology. 1996;6(5):376-380.
13. Alemayehu B, Warner KE. The Lifetime Distribution of Health Care Costs. Health Services Research. 2004;39(3):627-642.
14. de Meijer C, Wouterse B, Polder J, et al. The effect of population aging on health expenditure growth: a critical review. Eur J Ageing. 2013;10(4):353-361.
15. Payne G, Laporte A, Deber R, et al. Counting Backward to Health Care's Future: Using Time-to-Death Modeling to Identify Changes in End-of-Life Morbidity and the Impact of Aging on Health Care Expenditures. The Milbank Quarterly. 2007;85(2):213-257.
16. Gage BF, van Walraven C, Pearce L, et al. Selecting Patients With Atrial Fibrillation for Anticoagulation. Circulation. 2004;110(16):2287-2292.
17. Hustey FM, Mion LC, Connor JT, et al. A Brief Risk Stratification Tool to Predict Functional Decline in Older Adults Discharged from Emergency Departments. Journal of the American Geriatrics Society. 2007;55(8):1269-1274.
18. Martin TP, Hanusa BH, Kapoor WN. Risk Stratification of Patients With Syncope. Annals of Emergency Medicine. 1997;29(4):459-466.
19. Meldon SW, Mion LC, Palmer RM, et al.
A Brief Risk -stratification Tool to Predict Repeat Emergency Department Visits and Hospitalizationsin Older Patients Discharged from the Emergency Department. Academic Emergency Medicine . 2003;10(3):224 Œ232. 20. Levy D, Wilson PWF, Anderson KM, et al. Stratifyin g the patient at risk from coronary disease: New insights from the framingham heart study. American Heart Journal . 1990;119(3, Part 2):712 Œ717. 21. Sanchis J, Bonanad C, Ruiz V, et al. Frailty and other geriatric conditions for risk stratification of old er patients with acute coronary syndrome. American Heart Journal . 2014;168(5):784 -791.e2. 22. Steyerberg EW. Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating. Springer Science & Business Media; 2008 508 p. 23. ePrognosis. (https://eprognosis.ucsf.edu/). (Accessed October 19, 2019) 24. Huang ES, Zhang Q, Gandra N, et al. The effect of comorbid illness and functional status on the expected benefits of intensive glucose control in older patients with type 2 diabetes : a decision analysis. Ann Intern Med . 2008;149(1):11 Œ19. 25. Mor V, Pacala JT, Rakowski W. Mammography for older women: Who uses, who benefits? Journal of Gerontology . 1992;47(Spec Issue):43 Œ49. 26. Schonberg MA, McCarthy EP, Davis RB, et al. Breast C ancer Screening in Women Aged 80 and Older: Results from a National Survey. Journal of the American Geriatrics Society . 2004;52(10):1688 Œ1695. 27. Meissner HI, Tiro JA, Haggstrom D, et al. Does Patient Health and Hysterectomy Status Influence Cervical Ca ncer Screening in Older Women? J GEN INTERN MED . 2008;23(11):1822. 28. Wee CC, McCarthy EP, Phillips RS. Factors associated with colon cancer screening: the role of patient factors and physician counseling. Preventive Medicine . 2005;41(1):23 Œ29. 29. Go ldberg TH, Chavin SI. Preventive Medicine and Screening in Older Adults. Journal of the American Geriatrics Society . 1997;45(3):344 Œ354. 221 30. Harrold J, Rickerson E, Carroll JT, et al. Is the Palliative Performance Scale a Useful Predictor of Mortality in a Heterogeneous Hospice Population? Journal of Palliative Medicine . 2005;8(3):503 Œ509. 31. Lau F, Downing GM, Lesperance M, et al. Use of Palliative Performance Scale in End -of -Life Prognostication. Journal of Palliative Medicine . 2006;9(5):1066 Œ1075. 32. Glare P, Eychmueller S, Virik K. The use of the palliative prognostic score in patients with diagnoses other than cancer. Journal of Pain and Symptom Management . 2003;26(4):883 Œ885. 33. Arenella C. The Importance of Risk Stratification for Referrals to Palliative Care Programs. National Hospice and Palliative Care Organization . 2016; 34. Hospice_Card__JSR_SSR_JMH_20.pdf. (https://cdn.ymaws.com/www.nmnpc.org/resource/resmgr/2018_annual_conf -_presentations -handouts/6_johnson/Hospice_Card__JSR_SSR_JMH_ 20.pdf). (Accessed October 22, 2019) 35. Casarett DJ. Rethinking Hospice Eligibility Criteria. JAMA . 2011;305(10):1031 Œ1032. 36. PRIME Registry . (https://primeregistry.o rg/). (Accessed October 21, 2019) 37. Risk_Stratification_Care_QuickStart_Guide.pdf. (https://primeregistry.org/wp -content/uploads/2019/08/Risk_Stratification_Care_QuickStart_Guide.pdf). (Accessed October 21, 2019) 38. Steenkamer BM, Drewes HW, Heijink R, et al. Defining Population Health Management: A Scoping Review of the Literature. Population Health Management . 2016;20(1):74 Œ85. 39. Sprague L. Disease Management to Population -Based Health: Steps in the Right Direct ion? NHPF Issue Brief . 2003;(791):16. 40. 
Action -Guide_Pop -Health_Models -of -Care -Sept -2017.pdf. (http://www.nachc.org/wp -content/uploads/2017/09/Action -Guide_Pop -Health_Models -of -Care -Sept -2017.pdf). (Accessed October 21, 2019) 41. Lavery LA, Armstrong DG, Wunderlich RP, et al. Predictive Value of Foot Pressure Assessment as Part of a Population -Based Diabetes Disease Management Program. Diabetes Care . 2003;26(4):1069 Œ1073. 42. Haas LR, Takahashi PY, Shah ND, et al. Risk -stratification methods for iden tifying patients for care coordination. Am J Manag Care . 2013;19(9):725 Œ732. 43. Tkatch R, Musich S, MacLeod S, et al. Population Health Management for Older Adults. Gerontol Geriatr Med [electronic article]. 2016;2. (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5486489/). (Accessed October 21, 2019) 44. Guiding Principles for the Care of Older Adults with Multimorbidity: An Approach for Clinicians. Journal of the American Geriatrics Soc iety . 2012;60(10):E1 ŒE25. 222 45. Boyd C, Smith CD, Masoudi FA, et al. Decision Making for Older Adults With Multiple Chronic Conditions: Executive Summary for the American Geriatrics Society Guiding Principles on the Care of Older Adults With Multimorbidity . Journal of the American Geriatrics Society . 2019;67(4):665 Œ673. 46. Inouye SK, Peduzzi PN, Robison JT, et al. Importance of Functional Measures in Predicting Mortality Among Older Hospitalized Patients. JAMA . 1998;279(15):1187 Œ1193. 47. Pilotto A, Fe rrucci L, Franceschi M, et al. Development and Validation of a Multidimensional Prognostic Index for One -Year Mortality from Comprehensive Geriatric Assessment in Hospitalized Older Patients. Rejuvenation Research . 2008;11(1):151 Œ161. 48. Yourman LC, Lee SJ, Schonberg MA, et al. Prognostic Indices for Older Adults: A Systematic Review. JAMA . 2012;307(2):182 Œ192. 49. Carey EC, Walter LC, Lindquist K, et al. Development and Validation of a Functional Morbidity Index to Predict Mortality in Community -dwell ing Elders. Journal of General Internal Medicine . 2004;19(10):1027 Œ1033. 50. Cappola AR, Fried LP, Arnold AM, et al. Thyroid Status, Cardiovascular Risk, and Mortality in Older Adults. JAMA . 2006;295(9):1033 Œ1041. 51. Studenski S, Perera S, Patel K, et al. Gait Speed and Survival in Older Adults. JAMA . 2011;305(1):50 Œ58. 52. Gagne JJ, Glynn RJ, Avorn J, et al. A combined comorbidity score predicted mortality in elderly patients better than existing scores. Journal of Clinical Epidemiology . 2011;64(7): 749 Œ759. 53. Han PKJ, Lee M, Reeve BB, et al. Development of a Prognostic Model for Six -Month Mortality in Older Adults With Declining Health. Journal of Pain and Symptom Management . 2012;43(3):527 Œ539. 54. Lee SJ, Lindquist K, Segal MR, et al. Develop ment and Validation of a Prognostic Index for 4 -Year Mortality in Older Adults. JAMA . 2006;295(7):801 Œ808. 55. Schonberg MA, Davis RB, McCarthy EP, et al. Index to Predict 5 -Year Mortality of Community -Dwelling Adults Aged 65 and Older Using Data from th e National Health Interview Survey. J GEN INTERN MED . 2009;24(10):1115. 56. Fischer SM, Gozansky WS, Sauaia A, et al. A Practical Tool to Identify Patients Who May Benefit from a Palliative Approach: The CARING Criteria. Journal of Pain and Symptom Manag ement . 2006;31(4):285 Œ292. 57. Carey EC, Covinsky KE, Lui L -Y, et al. Prediction of Mortality in Community -Living Frail Elderly People with Long -Term Care Needs. Journal of the American Geriatrics Society . 2008;56(1):68 Œ75. 58. Fried LP, Kronmal RA, Ne wman AB, et al. 
Risk Factors for 5 -Year Mortality in Older Adults: The Cardiovascular Health Study. JAMA . 1998;279(8):585 Œ592. 223 59. Tabue -Teguo M, Kelaiditi E, Demougeot L, et al. Frailty Index and Mortality in Nursing Home Residents in France: Results Fr om the INCUR Study. Journal of the American Medical Directors Association . 2015;16(7):603 Œ606. 60. Li S, Middleton A, Ottenbacher KJ, et al. Trajectories Over the First Year of Long -Term Care Nursing Home Residence. Journal of the American Medical Direct ors Association . 2018;19(4):333 Œ341. 61. study over three years. PLoS One [electronic article]. 2018;13(9). (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC614 3238/). (Accessed October 22, 2019) 62. Flacker JM, Kiely DK. Mortality -Related Factors and 1 -Year Survival in Nursing Home Residents. Journal of the American Geriatrics Society . 2003;51(2):213 Œ221. 63. Eng C, Pedulla J, Eleazer GP, et al. Program of Al l-inclusive Care for the Elderly (PACE): An Innovative Model of Integrated Geriatric Care and Financing. Journal of the American Geriatrics Society . 1997;45(2):223 Œ232. 64. Mazzaglia G, Roti L, Corsini G, et al. Screening of Older Community -Dwelling Peop le at Risk for Death and Hospitalization: The Assistenza Socio -Sanitaria in Italia Project. Journal of the American Geriatrics Society . 2007;55(12):1955 Œ1960. 65. Koohy H. The rise and fall of machine learning methods in biomedical research. F1000Res [el ectronic article]. 2018;6. (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5760972/). (Accessed May 28, 2019) 66. Kording KP, Benjamin AS, Farhoodi R, et al. The Roles of Machine Learning in Biomedical Science. National Academies Press (US); 2018 (Accessed April 18, 2019).(https://www.ncbi.nlm.nih.gov/books/NBK481619/). (Accessed April 18, 2019) 67. Luo W, Phung D, Tran T, et al. Guidelines for Developing and Reporting Machine Learning Predictive Models in Biomedical Research: A Multidisciplinary View. J Med Internet Res [electronic article]. 2016;18(12). (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5238707/). (Accessed February 28, 2019) 68. Hastie T, Tibshirani R, Friedman J. Random Forests. In: The Elements of Statistical Learning . New York, NY: Sprin ger New York; 2009 (Accessed February 28, 2019):1 Œ18.(http://www.springerlink.com/index/10.1007/b94608_15). (Accessed February 28, 2019) 69. Breiman L. Random Forests. Machine Learning . 2001;45(1):5 Œ32. 70. Maniruzzaman Md, Rahman MdJ, Al -MehediHasan Md , et al. Accurate Diabetes Risk Stratification Using Machine Learning: Role of Missing Value and Outliers. J Med Syst . 2018;42(5):92. 71. Xu W, Zhang J, Zhang Q, et al. Risk prediction of type II diabetes based on random forest model. In: 2017 Third Inte rnational Conference on Advances in Electrical, Electronics, Information, Communication and Bio -Informatics (AEEICB) . 2017:382 Œ386. 224 72. Ion Titapiccolo J, Ferrario M, Cerutti S, et al. Artificial intelligence models to stratify cardiovascular risk in inci dent hemodialysis patients. Expert Systems with Applications . 2013;40(11):4679 Œ4686. 73. Chen Y, Cao W, Gao X, et al. Predicting postoperative complications of head and neck squamous cell carcinoma in elderly patients using random forest algorithm model. BMC Medical Informatics and Decision Making . 2015;15(1):44. 74. Rose S. Mortality Risk Score Prediction in an Elderly Population Using Machine Learning. Am J Epidemiol . 2013;177(5):443 Œ452. 75. Khalilia M, Chakraborty S, Popescu M. Predicting disease risks from highly imbalanced data using random forest. 
BMC Med Inform Decis Mak . 2011;11:51. 76. Chong S -L, Liu N, Barbier S, et al. Predictive modeling in pediatric traumatic brain injury using machine learning. BMC Med Res Methodol [electronic article] . 2015;15. (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4374377/). (Accessed March 31, 2019) 77. Weng SF, Reps J, Kai J, et al. Can machine -learning improve cardiovascular risk prediction using routine clinical data? PLOS ONE . 2017;12(4):e0174944. 78. Kattan MW. Comparison of Cox Regression With Other Methods for Determining Prediction Models and Nomograms. The Journal of Urology . 2003;170(6, Supplement):S6 ŒS10. 79. Horvath J, Berenson R. Developing the Right Approaches to Chronic Care in Medicare. Medicare Policy Brief . 2004;3. 80. Wagner EH, Austin BT, Von Korff M. Organizing Care for Patients with Chronic Illness. The Milbank Quarterly . 1996;74(4):511 Œ544. 81. Ellrodt G, Cook DJ, Lee J, et al. Evidence -Based Disease Management. JAMA . 1997;278(20 ):1687 Œ1692. 82. DeSalvo KB, Fan VS, McDonell MB, et al. Predicting Mortality and Healthcare Utilization with a Single Question. Health Serv Res . 2005;40(4):1234 Œ1246. 83. Robert A. Cohen. SAS Global Forum 2009 Statistics and Data Analysis. 84. Cohen RA. Introducing the GLMSELECT procedure for model selection. In: Proceedings of the Thirty -First Annual SAS Users Group International Conference . SAS Institute Inc.; 2006:Paper 207. 85. Lund B. Logistic Model Selection with SAS® PROC™s LOGISTIC, HPLOGIST IC,. MWSUG 2017 - Paper AA02 . :18. 86. SAS® Help Center: PROC MI Statement. (https://documentation.sas.com/?docsetId=statug&docsetTarget=statug_mi_syntax01.htm&doc setVersion=14.3&locale=en). (Accessed February 24, 2019) 225 87. Perkins NJ, Cole SR, Harel O, et al. Principled Approaches to Missing Data in Epidemiologic Studies. Am J Epidemiol . 2018;187(3):568 Œ575. 88. Wood AM, White IR, Royston P. How should variable selection be performed with multiply imputed data? Statistics in Medicine . 2008;27(17):3227 Œ3246. 89. Fluss R, Faraggi D, Reiser B. Estimation of the Youden Index and its Associated Cutoff Point. Biometrical Journal . 2005;47(4):458 Œ472. 90. Steyerberg EW, Vickers AJ, Cook NR, et al. Assessing the performance of prediction models: a framework for some traditional and novel measures. Epidemiology . 2010;21(1):128 Œ138. 91. Chronic Conditions Data Warehouse. (https://www2.ccwdata.org/web/guest/home/). (Accessed November 12, 2019) 92. Lunney JR, Lynn J, Foley DJ, et al. PAtterns of functional de cline at the end of life. JAMA . 2003;289(18):2387 Œ2392. 93. Centers for Medicare and Medicaid Services Releases 2012 MCBS Access to Care Research Files. (https://www.cms.gov/Research -Statistics -Data -and -Systems/Research/MCBS/Downloads/Data_Brief_002.pdf ). (Accessed September 13, 2019) 94. Millán -Calenti JC, Tubío J, Pita -Fernández S, et al. Prevalence of functional disability in activities of daily living (ADL), instrumental activities of daily living (IADL) and associated factors, as predictors of morb idity and mortality. Archives of Gerontology and Geriatrics . 2010;50(3):306 Œ310. 95. Shumway -Cook A, Brauer S, Woollacott M. Predicting the Probability for Falls in Community -Dwelling Older Adults Using the Timed Up & Go Test. Physical Therapy . 2000;80(9):896 Œ903. 96. Friendlander AH, Ettinger RL. Karnofsky performance status scale. Special Care in Dentistry . 2009;29(4):147 Œ148. 97. Schag CC, Heinrich RL, Ganz PA. Karnofsky performance status revisited: reliability, validity, and guidelines. JCO . 
1984;2(3):187 Œ193. 98. Moss AH, Ganjoo J, Sharma S, et al. Utility of the fiSurprisefl Question to Identify Dialysis Patients with High Mort ality. CJASN . 2008;3(5):1379 Œ1384. 99. Moss AH, Lunney JR, Culp S, et al. Prognostic Significance of the fiSurprisefl Question in Cancer Patients. Journal of Palliative Medicine . 2010;13(7):837 Œ840. 100. Lakin JR, Robinson MG, Obermeyer Z, et al. Priorit izing Primary Care Patients for a Communication Intervention Using the fiSurprise Questionfl: a Prospective Cohort Study. J Gen Intern Med . 2019;34(8):1467 Œ1474. 101. Hosmer DW, Lemeshow S. Applied logistic regression. New York: Wiley; 1989. 102. Zou H, H astie T. Regularization and Variable Selection via the Elastic Net. Journal of the Royal Statistical Society. Series B (Statistical Methodology) . 2005;67(2):301 Œ320. 226 103. Zou H. The Adaptive Lasso and Its Oracle Properties. Journal of the American Statis tical Association . 2006;101(476):1418 Œ1429. 104. Jacob L, Obozinski G, Vert J -P. Group lasso with overlap and graph lasso. In: Proceedings of the 26th Annual International Conference on Machine Learning - ICML ™09 . Montreal, Quebec, Canada: ACM Press; 20 09 (Accessed September 16, 2019):1 Œ8.(http://portal.acm.org/citation.cfm?doid=1553374.1553431). (Accessed September 16, 2019) 105. Model - (http://support.sas.com/documentation/cdl/en/statug/68162/HTML/def ault/viewer.htm#statug _glmselect_details01.htm). (Accessed September 16, 2019) 106. (http://support.sas.com/documentation/cdl/en/statug/67523/HTML/default/viewer.htm#statug _glmselect_det ails12.htm). (Accessed May 21, 2019) 107. (http://support.sas.com/documentation/cdl/en/statug/68162/HTML/default/viewer.htm#statug _glmselect_details12.htm). (Accessed September 16, 2019) 108. Calibration plots in SAS. The DO Loop . (https://blogs.sas.com/content/iml/2018/05/14/calibration -plots -in-sas.html). (Accessed May 22, 2019) 109. Austin PC, Steyerberg EW. Graphical assessment of internal and external calibration of logistic regres sion models by using loess smoothers. Statistics in Medicine . 2014;33(3):517 Œ535. 110. PROC LOGISTIC: The Hosmer -Lemeshow Goodness -of - (https://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer .htm#statu g_logistic_sect046.htm). (Accessed April 12, 2019) 111. The homebound requirement. Medicare Interactive . (https://www.medicareinteractive.org/get -answers/medicare -covered -services/home -health -services/the -homebound -requirement). (Accessed June 1 5, 2019) 112. Covinsky KE, Justice AC, Rosenthal GE, et al. Measuring Prognosis and Case Mix in Hospitalized Elders: The Importance of Functional Status. J GEN INTERN MED . 1997;12(4):203 Œ208. 113. DePalma G, Xu H, Covinsky KE, et al. Hospital Readmissio n Among Older Adults Who Return Home With Unmet Need for ADL Disability. Gerontologist . 2013;53(3):454 Œ461. 114. Hebert PR, Gaziano JM, Chan KS, et al. Cholesterol Lowering With Statin Drugs, Risk of Stroke, and Total Mortality: An Overview of Randomized Trials. JAMA . 1997;278(4):313 Œ321. 115. Mills EJ, Rachlis B, Wu P, et al. Primary Prevention of Cardiovascular Mortality and Events With Statin Treatments: A Network Meta -Analysis Involving More Than 65,000 Patients. J Am Coll Cardiol . 2008;52(22):1769 Œ1781. 227 116. Omran ML, Morley JE. Assessment of protein energy malnutrition in older persons, part II: laboratory evaluation. Nutrition . 2000;16(2):131 Œ140. 117. Whellan DJ, Cox M, Hernandez AF, et al. 
Utilization of Hospice and Predicted Mortality Risk Among Older Patients Hospitalized With Heart Failure: Findings From GWTG -HF. Journal of Cardiac Failure . 2012;18(6):471 Œ477. 118. O™Hare AM, Bertenthal D, Covinsky KE, et al. Mortality Risk Stratification in Chronic Kidney Disease: One Size for All Ages? JASN . 2006;17(3):846 Œ853. 119. Larrañaga P, Calvo B, Santana R, et al. Machine learning in bioinformatics. Brief Bioinform . 2006;7(1):86 Œ112. 120. Fan J, Han F, Liu H. Challenges of Big Data analysis. Natl Sci Rev . 2014;1(2):293 Œ314. 121. Lee CH, Yo on H -J. Medical big data: promise and challenges. Kidney Res Clin Pract . 2017;36(1):3 Œ11. 122. Genuer R, Poggi J -M, Tuleau -Malot C. Variable selection using random forests. Pattern Recognition Letters . 2010;31(14):2225 Œ2236. 123. Singh SP, Jaiswal UC. Machine Learning for Big Data: A New Perspective. 2018;13(5):10. 124. Xu W, Zhang J, Zhang Q, et al. Risk prediction of type II diabetes based on random forest model. In: 2017 Third International Conference on Advances in Electrical, Electronics, Informa tion, Communication and Bio -Informatics (AEEICB) . 2017:382 Œ386. 125. Maniruzzaman Md, Rahman MdJ, Al -MehediHasan Md, et al. Accurate Diabetes Risk Stratification Using Machine Learning: Role of Missing Value and Outliers. J Med Syst [electronic article]. 2018;42(5). (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5893681/). (Accessed March 31, 2019) 126. Zhang W, Zeng F, Wu X, et al. A Comparative Study of Ensemble Learning Approaches in the Classification of Breast Cancer Metastasis. In: 2009 International Joint Conference on Bioinformatics, Systems Biology and Intelligent Computing . Shanghai, China: IEEE; 2009 (Accessed November 8, 2019):242 Œ245.(http://ieeexplore.ieee.org/document/5260680/). (Accessed November 8, 2019) 127. SONG Y, LU Y. Decision tree methods: applications for classification and prediction. Shanghai Arch Psychiatry . 2015;27(2):130 Œ135. 128. Strobl C, Boulesteix A -L, Kneib T, et al. Conditional variable importance for random forests. BMC Bioinformatics . 2008;9(1):3 07. 129. Luo W, Phung D, Tran T, et al. Guidelines for Developing and Reporting Machine Learning Predictive Models in Biomedical Research: A Multidisciplinary View. J Med Internet Res [electronic article]. 2016;18(12). (https://www.ncbi.nlm.nih.gov/pmc/a rticles/PMC5238707/). (Accessed March 31, 2019) 228 130. SAS Enterprise Miner 14.2: High -Performance Procedures. :273. 131. Neville, P. G., and Tan, P. -Y. A Forest Measure of Variable Importance Resistant to Correlations. Alexandria, VA: American Statistica l Association; 2014 132. Breiman, L., and Cutler, A. Manual Œsetting up, using, and understanding random forests V4. 0. 2003;(https://www. stat. berkeley. edu/~ breiman/Using_random_forests_v4. 0. pdf.) 133. Schneider J, Hapfelmeier A, Thöres S, et al. Mo rtality Risk for Acute Cholangitis (MAC): a risk prediction model for in -hospital mortality in patients with acute cholangitis. BMC Gastroenterol [electronic article]. 2016;16. (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4746925/). (Accessed March 31, 20 19) 134. Goff DC, Lloyd -Jones DM, Bennett G, et al. 2013 ACC/AHA Guideline on the Assessment of Cardiovascular Risk: A Report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines. Journal of the American Colle ge of Cardiology . 2014;63(25 Part B):2935 Œ2959. 135. D™Agostino RB, Vasan RS, Pencina MJ, et al. 
General Cardiovascular Risk Profile for Use in Primary Care: The Framingham Heart Study. Circulation . 2008;117(6):743 Œ753. 136. Peng S -Y, Chuang Y -C, Kang T-W, et al. Random forest can predict 30 -day mortality of spontaneous intracerebral hemorrhage with remarkable discrimination. European Journal of Neurology . 2010;17(7):945 Œ950. 137. Koslowsky S, Consultant SA, Hanks H. On Variable Importance in Logistic Regression - Predictive Analytics Times - machine learning & data science news. Predictive Analytics Times . 2018;(https://www.predictiveanalyticsworld.com/patimes/on -variable -importance -in-logistic -regression/9649/). (Accessed May 3, 2019) 138. Thompson D. Ranking predictors in logistic regression. Paper D10 -2009. Online available at http://www. mwsug. org/proceedings/2009/stats/MWSUG -2009 -D10. pdf.(visited 2015, June 25) . 2009; 139. Casarett DJ, Fishman JM, Lu HL, et al. The Terrible Choice: Re -Evaluati ng Hospice Eligibility Criteria for Cancer. J Clin Oncol . 2009;27(6):953 Œ959. 140. Harrell FE, Lee KL, Mark DB. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med . 1996;15(4):361 Œ387. 141. Kalbfleisch JD, Prentice RL. The Statistical Analysis of Failure Time Data. John Wiley & Sons; 2011 464 p. 142. Cox DR. Regression Models and Life -Tables. Journal of the Royal Statistical Society: Series B (Methodological) . 19 72;34(2):187 Œ202. 229 143. (https://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer.htm#statu g_phreg_sect001.htm). (Accessed July 8, 2019) 144. Cox DR. Analysis of Survival Data. Ch apman and Hall/CRC; 2018 (Accessed July 15, 2019).(https://www.taylorfrancis.com/books/9781315137438). (Accessed July 15, 2019) 145. Liu L, Forman S, Barton B. 236 -2009: Fitting Cox Model Using PROC PHREG and Beyond in SAS®. 2009;10. 146. Guo C. Evaluat ing Predictive Accuracy of Survival Models with PROC PHREG. :16. 147. Royston P, Altman DG. External validation of a Cox prognostic model: principles and methods. BMC Medical Research Methodology . 2013;13(1):33. 148. Testing the proportional hazard ass umption in Cox models. (https://stats.idre.ucla.edu/other/examples/asa2/testing -the -proportional -hazard -assumption -in-cox -models/). (Accessed July 8, 2019) 149. Patel K, Kay R, Rowell L. Comparing proportional hazards and accelerated failure time models: an application in influenza. Pharmaceutical Statistics . 2006;5(3):213 Œ224. 150. Gardiner JC. Evaluating the accuracy of clinical prediction models for binary and survival outcomes. 2018 151. Hannan EL, Magaziner J, Wang JJ, et al. Mortality and Locomoti on 6 Months After Hospitalization for Hip Fracture: Risk Factors and Risk -Adjusted Hospital Outcomes. JAMA . 2001;285(21):2736 Œ2742. 152. Nguyen -Oghalai TU, Kuo Y, Zhang DD, et al. Discharge Setting for Patients with Hip Fracture: Trends from 2001 to 2005 . Journal of the American Geriatrics Society . 2008;56(6):1063 Œ1068. 153. Glesby MJM. Survivor Treatment Selection Bias in Observational Studies: Examples from the AIDS Literature. Ann Intern Med . 1996;124(11):999. 154. Ritchie CS, Burgio KL, Locher JL, et al. Nutritional status of urban homebound older adults. Am J Clin Nutr . 1997;66(4):815 Œ818. 155. Salive ME, Cornoni -Huntley J, Phillips CL, et al. Serum albumin in older persons: Relationship with age and health statu s. Journal of Clinical Epidemiology . 1992;45(3):213 Œ221. 156. Manolio T A, Ettinger W H, Tracy R P, et al. 
Epidemiology of low cholesterol levels in older adults. The Cardiovascular Health Study. Circulation . 1993;87(3):728 Œ737. 157. Forette B, Tortrat D, Wolmark Y. Cholesterol as Risk Factor for Mortality in Elderly Women. The Lancet . 1989;333(8643):868 Œ870. 158. Goldwasser P, Feldman J. Association of serum albumin and mortality risk. Journal of Clinical Epidemiology . 1997;50(6):693 Œ703. 230 159. Sahyoun NR, Jacques PF, Dallal G, et al. Use of albumin as a predictor of mortality in community -dwelling and institutionalized elderly populations. Journal of Clinical Epidemiology . 1996;49(9):981 Œ988. 160. Klonoff -Cohen H, Barrett -Connor EL, Edelstein SL. Albumin levels as a predictor of mortality in the healthy elderly. Journal of Clinical Epidemiology . 1992;45(3):207 Œ212. 161. Don BR, Kaysen G. Poor Nutritional Status and Inflamation: Serum Albumin: Relationship to Inflammation and Nutrition. Semina rs in Dialysis . 2004;17(6):432 Œ437. 162. Ouchi K, Jambaulikar G, George NR, et al. The fiSurprise Questionfl Asked of Emergency Physicians May Predict 12 -Month Mortality among Older Emergency Department Patients. Journal of Palliative Medicine . 2017;21(2): 236Œ240. 163. Weizer Alon Z., Joshi Daya, Daignault Stephanie, et al. Performance Status is a Predictor of Overall Survival of Elderly Patients With Muscle Invasive Bladder Cancer. Journal of Urology . 2007;177(4):1287 Œ1293. 164. Yates JW, Chalmer B, Mc Kegney FP. Evaluation of patients with advanced cancer using the karnofsky performance status. Cancer . 1980;45(8):2220 Œ2224. 165. Crooks V, Waller S, Smith T, et al. The Use of the Karnofsky Performance Scale in Determining Outcomes and Risk in Geriatric Outpatients. J Gerontol . 1991;46(4):M139 ŒM144. 166. Brugts JJ, Yetgin T, Hoeks SE, et al. The benefits of statins in people without established cardiovascular disease but with cardiovascular risk factors: meta -analysis of randomised controlled trials. BMJ. 2009;338:b2376. 167. Taylor F, Ward K, Moore TH, et al. Statins for the primary prevention of cardiovascular disease. Cochrane Database of Systematic Reviews [electronic article]. 2011;(1). (https://www.cochranelibrary.com/cdsr/doi/10.1002/14651858.C D004816.pub4/abstract). (Accessed July 7, 2019) 168. Mann D, Reynolds K, Smith D, et al. Trends in Statin Use and Low -Density Lipoprotein Cholesterol Levels Among US Adults: Impact of the 2001 National Cholesterol Education Program Guidelines. Ann Pharmac other . 2008;42(9):1208 Œ1215. 169. Vaughan CJ, Murphy MB, Buckley BM. Statins do more than just lower cholesterol. The Lancet . 1996;348(9034):1079 Œ1082. 170. Almog Yaniv, Shefer Alexander, Novack Victor, et al. Prior Statin Therapy Is Associated With a Decreased Rate of Severe Sepsis. Circulation . 2004;110(7):880 Œ885. 171. Søyseth V, Brekke PH, Smith P, et al. Statin use is associated with reduced mortality in COPD. European Respiratory Journal . 2007;29(2):279 Œ283. 172. Christakis NA, Escarce JJ. Sur vival of Medicare Patients after Enrollment in Hospice Programs. New England Journal of Medicine . 1996;335(3):172 Œ178. 231 173. So Y. Using the PHREG Procedure to Analyze Competing -Risks Data. :9. 174. Satagopan JM, Ben -Porat L, Berwick M, et al. A note on competing risks in survival data analysis. British Journal of Cancer . 2004;91(7):1229. 175. Ishwaran H, Kogalur UB, Blackstone EH, et al. Random survival forests. Ann. Appl. Stat. 2008;2(3):841 Œ860. 176. Carpenter CR, Shelton E, Fowler S, et al. 
Risk Factors and Screening Instruments to Predict Adverse Outcomes for Undifferentiated Older Emergency Department Patients: A Systematic Review and Meta -analysis. Academic Emergency Medicine . 2015;22(1):1 Œ21. 177. Hastings SN, Purser JL, Johnson KS, et al. F railty Predicts Some but Not All Adverse Outcomes in Older Adults Discharged from the Emergency Department. Journal of the American Geriatrics Society . 2008;56(9):1651 Œ1657. 178. Flacker JM, Kiely DK. Mortality -Related Factors and 1 -Year Survival in Nurs ing Home Residents. Journal of the American Geriatrics Society . 2003;51(2):213 Œ221. 179. Berglund PA. 265 -2010: An Introduction to Multiple Imputation of Complex Sample Data Using SAS® 9.2. 2010;12. 180. Rubin DB. Inference and missing data. :12. 181. Sterne JAC, White IR, Carlin JB, et al. Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ . 2009;338:b2393. 182. Moscovici JL. Combining Survival Analysis Results after Multiple Imputation of Cens ored Event Times. :11. 183. SAS/STAT MIANALYZE Procedure. (https://support.sas.com/rnd/app/stat/procedures/mianalyze.html). (Accessed July 22, 2019) 184. (http://support.sas.com/documentation/cdl/en/statug/66859/HTML/default/viewer.htm#statug _glmselect_syntax01.htm). (Accessed February 24, 2019) 185. Peterson ED. Machine Learning, Predictive Analytics, and Clinical Practice: Can the Past Inform the Present ? JAMA [electronic article]. 2019;(https://jamanetwork.com/journals/jama/fullarticle/2756195). (Accessed November 24, 2019) 186. Goldstein BA, Navar AM, Pencina MJ. Risk Prediction With Electronic Health Records: The Importance of Model Validation and Cli nical Context. JAMA Cardiol . 2016;1(9):976 Œ977. 187. Gornick ME, Eggers PW, Reilly TW, et al. Effects of Race and Income on Mortality and Use of Services among Medicare Beneficiaries. New England Journal of Medicine . 1996;335(11):791 Œ799. 232 188. Gómez -Ba tiste X, Martínez -Muñoz M, Blay C, et al. Utility of the NECPAL CCOMS -ICO© tool and the Surprise Question as screening tools for early palliative care and to predict mortality in patients with advanced chronic conditions: A cohort study. Palliat Med . 2017; 31(8):754 Œ763. 189. Hospice Eligibility Criteria & Requirements: Crossroads. Crossroad Hospice and Palliative Care . (https://www.crossroadshospice.com/hospice -care/hospice -eligibility -criteria/##targetText=Hospice%20eligibility%20requirements%3A,taking%2 0into%20consideratio n%20edema%20weight)&targetText=Specific%20decline%20in%20condition). (Accessed November 5, 2019) 190. Sharafoddini A, Dubin JA, Maslove DM, et al. A New Insight Into Missing Data in Intensive Care Unit Patient Profiles: Observational S tudy. JMIR Med Inform [electronic article]. 2019;7(1). (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6329436/). (Accessed November 15, 2019) 191. Byhoff E, Harris JA, Langa KM, et al. An Examination of Racial and Ethnic Differences in End -of -Life Medicare Expenditures. J Am Geriatr Soc . 2016;64(9):1789 Œ1797. 192. Hanchate A, Kronman AC, Young -Xu Y, et al. Racial and Ethnic Differences in End -Of-Life Costs: Why Do Minorities Cost More Than Whites? Arch Intern Med . 2009;169(5):493 Œ501. 193. Rizzuto J, Al dridge MD. Racial Disparities in Hospice Outcomes: A Race or Hospice -Level Effect? J Am Geriatr Soc . 2018;66(2):407 Œ413.