{vi-iii 324%,...” nu’ 2. d) .2 My: . 4 31?. . . .— . p. 4.. - vu... .3” 3 .. .l m . M. w. I‘ mm 3;. i C ". wt 7 '1\ (r«)__3t:’l7/L/(J(0 I This is to certify that the dissertation entitled ASSESSING MEDICAL COSTS FROM A LONGITUDINAL MODEL presented by CORINA MIHAELA SIRBU has been accepted towards fulfillment of the requirements for the Doctoral degree in Statistics and Probability Ma'or Profisor’s Signature Q ICMM / 27 2001/ Date / MSU is an Affirmative Action/Equal Opportunity Institution _ LIBRARY Michigan State University PLACE IN RETURN BOX to remove this checkout from your record. y TO AVOID FINES return on or before date due. MAY BE RECALLED with earlier due date if requested. DATE DUE DATE DUE DATE DUE 6/01 cJClRC/Dateouepes-ols ASSESSING MEDICAL COSTS FROM A LONGITUDINAL MODEL By Corina Mihacla Sirbu A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Department of Statistics and Probability 2004 ABSTRACT ASSESSING MEDICAL COSTS FROM A LONGITUDINAL MODEL By Corina Mihacla Sirbu The United States spends a larger share of its gross domestic product (GDP) on health care than any other major industrialized country. Expenditures for health care represent nearly one-seventh of the Nation's GDP, and they continue to be one of the fastest growing components of the Federal budget. In 1960, for example, health care expenditures accounted for about 5 percent of the GDP; by 2000, that figure had grown to more than 13 percent. Although the rate of growth in health care costs slowed somewhat in the mid-19908, it has once again started to rise at' a rate that exceeds other sectors of the economy. Thus, identifying methods to accurate estimate health care costs continues to be a priority for policymakers and public and private payers. In medical follow up studies incomplete observation due to censoring would preclude ascertainment of outcomes in some subjects. Standard assumptions used in survival analysis do not apply to medical costs because the cumulative cost at the endpoint of interest will generally be correlated with the cumulative cost at the time of censoring. We use a dynamic regression model in which costs are incurred in random amounts at transition times between and during sojourn in health states. A Markov model describes the unfolding over time of individual patient event histories, with transition intensities depending on patient specific demographic and clinical characteristics through a multiplicative intensity model. A random effects model is used for transition and sojourn costs. We then estimate the net present value of expenditures incurred over a finite time horizon. While incorporating explanatory variables, the joint model can accommodate heteroscedasticity, skewness and censoring in cost and health outcome data and provides a flexible approach to analyses of health care costs and outcomes. Our transition model can be viewed as an extension of the simpler two state model, case in which we obtain and revise already developed techniques for regression '7 analysis of medical costs with the focus being on estimation in the presence of time censoring that might result in incomplete costs data on some patients. Using the 2000 Nationwide Inpatient Sample data set of Health Care Utilization Project we focus on estimating costs for patients admitted in the hospital with acute myocardial infarction (AMI), a common high-mortality condition whose outcomes are affected by the process of care. Our methods provide flexible approaches to estimating medical costs. Estimates from cost studies are not only needed to determine the economic burden of disease, to predict the economic consequences of new medical interventions, but also for comparative purposes such as cost-effectiveness analysis. Other possible extensions of our methods are in this area. Cepyright by CORINA MIHAELA SIRBU 2004 To my husband, parents and brother for all his love and support! ACKNOWLEDGEMENTS I would like to thank my advisor, Dr. Joseph C. Gardiner, for having the patience to allow me to pursue my own interests, with the occasional push when required to keep me on track. I felt very privileged to have his guidance and support during my graduate studies. I also want to express my thanks to the other committee members for their friendly support. My research was supported by the Agency for healthcare Research & Quality, under Grant 1R01HSO9514. I thank my family for always believing in me. My parents, Rodica and Ion Radut, my brother, Radu have continually offered me support and encouragement. I also want to thank my fi‘iends at Michigan State University for always helping me to turn worries into laughs. Last and most of all, I thank my dear husband, George. This last year was trying for both of us, but his unwavering support and encouragement certainly made my struggles more bearable and I am forever gratefirl to him. vi TABLE OF CONTENTS LIST OF TABLES ..................................................................................................... LIST OF FIGURES ..... ' .............................................................................................. ABBREVIATIONS ................................................................................................... INTRODUCTION ..................................................................................................... CHAPTER 1 ESTIMATING MEDICAL COSTS FROM A TRANSITION MODEL .................. 1.1 A Markov model for describing patient health histories ............................ 1.1.1 Marked point processes ................................................................... 1 .1 .2 Incorporating censoring ................................................................... 1.2 Incorporating costs in the Markov model ................................................... 1.3 Estimation of transition probabilities .......................................................... 1.4 Estimation of average transition costs ........................................................ 1.4.1 Modification for censoring .............................................................. 1.4.2 Consistency and asymptotic normality of ,3“, ................................. 1.4.3 Computation of AW and Bw ............................................................ 1.4.4 Estimation of G(- | z) ...................................................................... 1.5 Asymptotic distribution of the mean transition cost ................................... CHAPTER 2 ESTIMATING MEAN COST FOR THE TWO-STATE CASE .............................. 2.1 Introduction and Background ..................................................................... 2.2 Applying general transition model methods to the 2-state model ............... 2.2.1 Mean cost in the minimal cost data case ......................................... 2.2.2 Mean cost in the interval cost data case .......................................... 2.3 Asymptotic normality of the mean cost ...................................................... 2.3.1 Regression model for log-cost ........................................................ 2.4 Parametric estimation of the survival distribution for censoring time ......... vii ix xi 13 14 18 20 22 31 51 53 56 57 58 61 78 79 86 87 97 101 110 114 2.4.1 Minimal cost data ............................................................................ 114 2.4.2 Interval cost data ............................................................................. 121 CHAPTER 3 130 ESTIMATING HOSPITAL COST FOR AMI PATIENTS IN THE N18 2000 ........ 3.1 Description of the data set and background ................................................. 130 3.2 Creating a working subset data set ............................................................... 144 3.3 Estimation methods ...................................................................................... 148 3.3.1 Unconditional means model .............................................................. 149 3.3.2 Including effects of hospital level (level 2) correlates ....................... 150 3.3.3 Including effects of discharge level (level 1) correlates .................... 151 3.3.4 Including both effects of discharge level (level 1) and hospital 1 53 level (level 2) correlates ..................................................................... 3.3.5 Inverse probability weighting ............................................................ 155 3.4 Results of estimation ..................................................................................... 159 CHAPTER 4 CONCLUSIONS AND FUTURE WORK ................................................................ 176 APPENDIX Mathematical background .......................................................................................... 1 84 BIBLIOGRAPHY ...................................................................................................... 192 viii Table 2.1 Table 3.1 Table 3.2 Table 3.3 Table 3.4 LIST OF TABLES Estimating the survival distribution for censoring time ............................... 92 Primary procedures .................................................................................... 164 Descriptive statistics (N =58469) ............................................................... 166 Parameter estimates in multilevel regression ............................................. 170 Estimates of total charges at median LOS by procedure types, comorbidity levels (0, 1+) and regions. .......................................................................... 173 ix Figure 1.1 Figure 3.1 Figure 3.2 Figure 3.3 Figure 3.4 Figure 3.5 Figure 3.6 Figure 3.7 LIST OF FIGURES Transition diagram for a multi-state Markov model ..................................... 16 Creating a working subset data set .............................................................. 165 Histogram of Log (TOTCHG) ..................................................................... 167 Graph of Log (TOTCHG) versus LOS ........................................................ 168 Estimation of G: Graphs of the Kaplan—Meier estimates of the cumulative hazard versus the regression model estimates of the cumulative hazard from both log-normal and gamma models ........................................................... 169 Estimation of S: Graphs of the Kaplan-Meier estimates of the cumulative hazard versus the regression model estimates of the cumulative hazard from both log-logistic and gamma models .......................................................... 172 Estimates of total charges at median LOS — no comorbidities .................... 174 Estimates of total charges at median LOS — at least one comorbidity ........ 175 CEA CER NHC NHB NPV QALY LE CABG PTCA CATH LOS TOTCHG AMI DRG HMO NIS HCUP AHRQ CCS CCI 1DC-9-CM OECD WHO iid CLT El ABBREVIATIONS -Cost-Effectiveness Analysis -Cost-Effectiveness Ratio -Net Health Cost -Net Health Benefit -Net Present Value -Quality Adjusted Life Years -Life Expectancy -Coronary Artery Bypass Grafting -Percutanerous Transluminal Coronary Angioplasty -Cardiac Catheterization -Length of Stay -Total Charges -Acute Myocardial Infarction -Diagnosis Related Group -Health Maintenance Organization -Nationwide Inpatient Sample data -Health Care Utilization Project -Agency for Healthcare, Research and Quality -Clinical Classifications Software -Charlson Comorbidity Index. -Intemational Classification of Diseases 9th Revision, Clinical Modification ~Organization for Economic Co-operation and Development -World Health Organization -Independent Identically Distributed -Central Limit Theorem -End of Statement xi INTRODUCTION Economic evaluations of health care interventions are increasingly important in an era of constrained health care budgets. As policymakers seek to prioritize health care expenditures, an accurate assessment of costs and health benefits of competing interventions and treatments is critical in informing resource allocation decisions in health care. A recent report1 from the Office of the Actuary at the Center for Medicare and Medicaid Services projects that the national health expenditures would reach $3.4 trillion in 2013, growing at an average annual rate of 7.3 percent during the forecast period 2002-2013. As a share of gross domestic product (GDP), health care spending is projected to reach 18.4 percent by 2013, up from its 2002 level of 14.9 percent. This demands extraordinary restructuring of the organization and financing of US health services. Over the past decade there has been an explosion of research on methodology for economic evaluations in health care. With increasing availability of large databases on patient outcomes and costs, statistical methods for comparing outcomes with costs need to be developed. The field is young, however, and there are many important and challenging problems that remain unresolved. In this dissertation we will address several statistical issues with analysis of medical cost data. We adopt a longitudinal framework in which costs incurred as the individual level are random quantities associated with events that occur as an individual’s health history unfolds over time. A Markov process is used to describe the dynamics of movement of an individual through different health states. Costs are incurred at transition times, and in sojourn in health states. Total expenditures over a finite period of time is then defined as an ‘expected value’ called a net present value (NPV). Because individual characteristics such as demographics (age, gender, race) and clinical factors (treatments, comorbidities) can influence NPV, we incorporate covariate effects into our model for estimation of NPV. Our longitudinal framework provides a natural setting for estimating medical costs. We will demonstrate how some recent approaches to analysis of costsz'll can be subsumed into this framework. Importance of Cost-Effectiveness Analysis In addition to evidence of clinical effectiveness of treatments, evidence of their cost-effectiveness has become an important consideration as policy-makers world-wide face decisions in allocating resources for health care services. In Australia, the Pharmaceutical Benefits Advisory Committee makes recommendations, based on effectiveness and cost-effectiveness evidence, on drug products that should be subsidized and placed in the Pharmaceutical Benefits Scheme”. The National Institute of Clinical Excellence13 in the UK makes similar requirements for use of new healthcare technologies in the National Health Service, and in Ontario, Canada, the Drug Benefits Plan uses economic data when supporting new additions to its formulary”. The Phenomenal penetration of HMOs into the US health care market has heightened aWar-eness of cost-effectiveness among providers and consumers of healthcare services. The US Preventive Services Task Force and the Panel of Cost-Effectiveness in Health and Medicine have urged consideration of cost-effectiveness in addition to clinical effectiveness to help inform investment of health care dollars15 . Measures used in cost-effectiveness analysis Cost-eflectiveness analysis (CEA) has been promoted as a useful tool in the effort to prioritize expenditure on health care programs15 . By quantifying the trade-offs between resources that need to be deployed and health benefits that accrue from use of alternative interventions, CEA offers guidance in decision-making by structuring comparisons between these interventions. A cost-identification analysis is often conducted for treatments and procedures that are believed to be equivalent in their clinical efficacy. For example, if two competing programs do not differ on average in their health benefits, then the one with the lower average cost will be preferred. On the other hand, if the costs of two programs are judged equivalent, the intervention with the greater health benefit will be preferred. An intervention that delivers higher benefit at lower cost than its competitor is said to be dominant. A decision has to be made when one program has both higher cost and greater benefit than does its competitor. Is there a critical value below which society would consider the more costly intervention still “cost-effective”? In this situation, the cost-effectiveness ratio (CER) becomes a useful summary statistic for ranking competing interventions. It is the ratio of the incremental cost relative to the incremental benefit. With costs measured in dollars and health benefits measured in their natural units such as life expectancy, number of lives saved, or preferably quality- adjusted life years (QALYs), the CER is stated in dollars per unit of effectiveness. In CEAs conducted with a societal perspective that accounts for all costs of the interventions, whether borne by the recipient of care, the provider or the insurer, the critical value of a CER is the upper limit of what society is willing to pay for an additional unit of health benefit. Other summary statistics used in CEA are the net health benefit (NHB) or net health cost (NHC)”’. Suppose the incremental health benefit is monetized using a value for each additional unit of health benefit. This could be the upper limit of the CER as judged by what society is willing to pay for adopting the competing intervention. The NHC, expressed in dollars, is the difference between the incremental cost and the monetized incremental benefit. The net health benefit can be defined in an entirely analogous manner, and would be expressed in units of effectiveness. Many researchers have pointed out that the CER has undesirable properties that make its use in decision making problematic. Analysis of medical costs Incomplete data are likely to arise in longitudinal studies, because patient follow up will not be complete in all subjects. In survival analysis, censoring occurs when the time to event variable T is not observed in some individuals because a censoring event occurs first at time U, that is U < T. Most survival analysis models assume T, U ‘ independent, or, when covariates z are present, that conditionally on 2, T and U are independent. This is the usual random censorship model. With accumulating costs this assumption is untenable. If y(t) is the accumulated cost up to time t, then y(T) and y(U ) are generally correlated. Therefore analyzing costs by traditional survival analysis techniques is not possible. In this dissertation, we first consider the situation in which a single cost variable y is observed together with covariate information 2 in a sample of subjects. In our general framework costs can potentially be accrued over a fixed time period [0, 2'] with expenditure terminating at some event time T so that complete cost observatiOn occurs if a patient is followed through time T’ = rnin(T, 7). Suppose y(t) is a right-continuous process that represents the cumulative cost up to time t (including time t) for a typical patient in the population under study. If lifetime cost is of interest then Tdenotes survival time. Since costs do not accumulate after T, y(t) = y(T) for all t 2 T. The cumulative cost y(r) at time 2' is the principal random variable of interest, so inference focuses on the mean cost, ,u = E ( y(T)) = E ( y(T' )). With lifetime medical cost, the cumulative cost is y(T) and estimating the average E ( y(T)) is of interest. Because of possible censoring at time U, y(T) is not observed if T > U . If this is the case we observe y(U). A simple sample average of the observed costs in the patient sample would underestimate the true expected medical cost for the treatment under study. Also using the average in the sub—sample of patients with complete costs would be inefficient. Even if complete costs were available, standard regression techniques for assessing the influence of covariates on costs can not be directly applied. Cost data are often very skewed, usually to the right. They also exhibit considerable heterogeneity across patients. Standard assumptions used in ordinary least squares (OLS) for example can not be applied. To mitigate the effects of skewness, the log—transformation of costs might be considered. However, this too has adherents and non-adherents as explained in Manning (2001)”. Even if a transformation were feasible, a retransforrnation would be needed to obtain estimates of mean costs (and other statistics) across specified covariate profiles: retransforrnation itself presents some methodological challenges18'2'. Extreme form of skewness occurs in cost data when proportion of subjects in the sample have zero costs. For example if we examine costs of office visits to a doctor or other health professional in the year 2000, 20% of adults 18 years of age and over did not make any office visit22 and therefore incurred zero expense. This creates a 2-part distribution for costs, one part for the sub-population with an expense, and the second part for those without. These groups differ considerably in their demographic characteristics and medical history. In a two-part model, one has a model for the likelihood of expense, for example, a probit or logit model for P( y > 0 | z) , where z is a vector of covariates, and then a second model for E ( g( y)| y > 0,z), where g is a transformation, such as the logarithmic. Debate continues on the proper analysis of the two-part models and comparisons to other models such as the Heckman model and sample selection modelsB’ 24. Cox regression has been the mainstay for analysis of censored time to event data. However using this method directly with costs is not possible. As noted earlier, cost at censoring time and cost at event time are correlated. In this dissertation we maintain the traditional use of Cox regression for time to event analysis. It is used to model covariate effects on the transition intensities as patients move from one health state to another. We then combine this with a linear mixed effects model for costs (incurred at transition times or sojourn in states) conditional on event times. Finally we derive estimators of NPV given a covariate profile and develop the asymptotic theory of these estimators. A transition model for analysis of medical costs: Outline of Dissertation When an intervention is deployed costs are incurred in random amounts at random points in time. Typically these costs are associated with health states that a patient might visit in the course of the intervention, and the different lengths of time spent in each state. The probabilistic mechanism that governs transition between these states and the distribution sojourn times in health states vary at the individual level depending on patient specific demographic and clinical characteristics. This thesis concerns the development of new statistical methodologies for estimating medical costs with censored data in both this multiple states setting and the two-state case. In Chapter 1 we describe the evolution of a patient’s health as the unfolding in time of a finite state stochastic process. A non-homogeneous Markov process X = { X(t) :IE 7'} with finite state space E = {l,2,...,k} , provides a natural setting to describe the probabilistic mechanisms that govern transitions between states, where X (t) is the patient status or health state occupied at time t e T = [0,2'], and 1's 00. Transition probabilities are denoted by PM (s,t) and transition intensities by ah]- (I). The state space of X typically consists of several transient states, such as “well”, “recovery”, “relapse” and one or more absorbing states such as “dead” or “disabled”. Over the follow up period the typical patient would transit to other health states, X I = X (T1), X 2 = X (T2 ),... at random times T1 ,T2 and these transition times and health states describe the event history of each patient. If observation of X is ceased after some random time U, independent of X, then we will need to account for censoring accordingly. The survival model is an example of a two-state process with a single transient state “alive” and a single absorbing state “dead” with survival time T = T, and T, = 00 for n 2 2. The multi-state analog of survival time is the time to absorption in state k given byz'k =inf{t >0:X(t)=k}. Having described the evolution of a patient history by the finite state space non- homogeneous Markov process X, we now consider two types of costs that might be incurred in the course of follow up, costs at transition between health states, and costs of sojoums in a health state. Incorporating costs in the model enlarges the usual 0' -field used in the multiple states survival theory, namely ff =a{ (Tn , X n ) :0 _<. T, S t A U } , by adding cost information. Under specified assumptions the martingale theory still obtains and the compensators remain unchanged. The estimation of PM (s,t | Z0) from a Cox regression model (multiplicative intensity model) for the a,,,- (t | 20 ), has been very well developed by Andersen et al (1993)25. Numerous applications of this method are published regularly in the medical and epidemiologic literature. To analyze costs incurred at transition times, we adopt a mixed model approach. If 7}, ,7}2 ,...,7},,_ denotes the observed sequence of n, transition times in the ith individual and Y, = ( y” , y,2 ym )' the associated vector of costs (or transformed costs), then the random-effects model Y,- = X,,B + Z,v, +u, is the basis for estimation of ,6. Here the covariate matrix X, will include terms for the times 7}, ,T,2,...,7},,, , individual patient characteristics, and the matrix Z,- will include a subset of these factors, most likely variables for modeling the effect of transition times such as 7},- and Ti]? . The unobserved heterogeneity is the vector v,, inducing dependence among the ya. '5 and u,- is the residual error. We will derive the NPV for all transition costs in the interval [0, 21, in the form NPV(Z0)= Z Ee-"chj(t|ZO)a'hj-(tIZO)P,h(0,t|Zo)dt conditional on the initial h¢j state X 0 =i and a specified covariate profile Z0, where r denotes the discount rate. Here ch, (t I Z0 ) is the expected cost incurred at time t if the transition h—-> j occurred and it can be obtained from the components of E (Y, |X, ) . Similarly we can define NPV for all sojourn costs. The mathematical form for NPV depends on the underlying transition probabilities and intensities. Following Andersen et al (1993)25 we account for heterogeneity across patients by semiparametric modeling of the transition intensities of the process through patient- specific covariates. We estimate transition probabilities using a Cox regression model. We combine the two parts to form an estimate of the mean present value of all expenditures and use the inverse-probability of censoring-weighted (IPCW) technique to account for censored observations. The estimators are obtained conditional on an initial state and given a covariate profile. By applying the delta-method to functionals that arise in the estimation of the NPV, we obtain large sample properties of these estimators. This approach is new and builds upon a similar idea used by Praestgaard (1991)26 to estimate actuarial values in life insurance. In that context the benefit (cost) is fixed and the stochastic elements are the sojoums in policy states or transitions between policy states. The ‘cost’ at transitions (called assurances in the life-insurance literature) are fixed by the terms of the policy. Also the ‘unit cost Of sojourn’ (called annuity payments) are also fixedzng. In our context these quantities are no longer fixed. With longitudinal data these costs are observed and vary across patients. In Chapter 2, we Show that our transition model described in Chapter 1 also captures costs under the simpler two state survival model with a single transition time and sojourn. In this case several investigators have developed techniques for regression analysis of medical costs with the focus being on estimation in the presence of time censoring that might result in incomplete cost data on some patients. Our transition model can be therefore viewed as an extension of this methodology to multiple transition times and sojoums. If we specialize our multiple transition model methods from Chapter 1 to a two state model with patients starting in state ‘0’ (alive) and followed until they reach a terminal state ‘1’ (death) at time T , the total cost for a patient can be interpreted as a sojourn cost that ends at time T or 2' whichever occurs first or as a transition cost at time T if the patient dies in the interval of time [0,2']. Current methods for estimation of the population mean cost are both nonparametric and semi-parametric. The key references are Lin et al (1997, 2000, 2003)“'5’3°, Bang and Tsiatis (2000)“, Strawderman (2000)”, Willan et al (2002, 2003 W, Wooldridge (2002, 2003) 33"“. Semi-parametric models would assume a special parametric form for the distribution of cost. For a single cost, Zhou el al (2000)35 primarily uses a log-normal regression model to assess covariate effects on mean costs. He also discusses approaches to deal with heteroscedasticity, skewness, censoring and zero costs, all in the context of a parametric model354'. 10 Our approach uses the same inverse-probability of censoring-weighted (IPCW) technique to derive consistent and asymptotically normal estimators of regression parameters and for the net present values. We discuss parametric methods for estimating the survival distribution for censoring time, and therefore the weights in the IPCW technique, as well as methods of estimation for the survival distribution for event time. Chapter 3 focuses on applications to real data: We use the inpatient utilization data from the Nationwide Inpatient Sample (NIS) of Health Care and Utilization Project (HCUP), a database of all hospital inpatient stays drawn from a stratified sample of approximately 1,000 community hospitals in the US. For 2000, the NIS contains over 7.4 million discharges from 28 states. Total charge and length of stay (LOS) are the main healthcare utilization variables for each hospital stay. There is a growing literature on use of the NIS in health services research that we use for guidance 4247. Following our experience with analyzing charges and LOS of acute myocardial infarction (AMI) 48’50, we will focus on patients admitted in the hospital with patients in the MICH study AMI that have undergone either no procedures or Coronary Artery Bypass Grafting (CABG), Cardiac Catheterization (CATH) or Percutaneous Transluminal Coronary Angioplasty (PT CA) as a primary procedure. Hospital characteristics, such as quality of service or managerial performance may impose distinct effects on the costs of treating patients. Rice et al (1997)“, Carey (2000, 2002) 52’ 53 and Goldstein (2002)54 insist on the usefulness of multilevel methods in studies where data on cost are collected over multiple sites (hospitals in our data). In such circumstances it can be expected that hospitals may have an impact on the cost regardless of treatment the patient receives. The inclusion of hospital as a level in a 11 multilevel analysis will ensure that the clustering effects within hospitals will be adequately controlled for. Traditional estimation procedures such as OLS, which is used for example in multiple regression, are inapplicable because of the existence of a non- zero intra-hospital correlation, resulting from the presence of more than one residual term in the model. We use a multilevel modeling technology to estimate costs for patients diagnosed with acute myocardial infarction as they relate to both patient and hospital level characteristics. Patients transferred to a short-term hospital, as well as other transfers, including skilled nursing facilities (SNF), intermediate care, home health care have incomplete total charges and length of stay so they will be treated as censored. In our working data set 32% of the discharges are censored. Selection probabilities from the censored sample of event times are estimated using parametric estimators of the censoring distribution. We then estimate total charges at the median LOS for specific covariate profiles. Our method accounts for the existence of a non-zero intra-hospital correlation, censoring and skewness of total charges. In the concluding Chapter 4 we outline some extensions of our work particularly to estimation of summary measures in CEA such as the CER, net health benefit (NHB) or cost (NHC), net present value (NPV), life-expectancy (LE) and quality-adjusted life years (QALY)- 12 CHAPTER 1 ESTIMATING MEDICAL COSTS FROM A TRANSITION MODEL Economic evaluations of health care interventions are increasingly important in an era of constrained health care budgets. As policymakers seek to prioritize health care expenditures, an accurate assessment of costs and health benefits of competing interventions and treatments is critical in informing resource allocation decisions in health care. A multi-state model is defined as a model for a stochastic-process, which at any time occupies one of a set of discrete states. The states can describe conditions like healthy, diseased, diseased with complications and dead. When an intervention is deployed costs are incurred in random amounts at random points in time. Typically these costs are associated with health states that a patient might visit in the course of the intervention, and the different lengths of time spent in each state. The probabilistic mechanism that governs transition between these states and the distribution sojourn times in health states vary at the individual level depending on patient specific demographic and clinical characteristics. 13 The main themes of this chapter are arranged as follows. In section 1.1 we provide a description of patient history as the unfolding in time of a finite state stochastic process. A non-homogeneous Markov process describes the probabilistic mechanisms that govern transitions between states. Following Andersen et al (1993)25 we account for heterogeneity across patients by semiparametric modeling of the transition intensities of the process through patient-specific covariates. We then describe how costs are incorporated into this framework. We consider two types of costs, costs while sojouming in a health state and costs incurred at transitions between health states. Net present values of all expenditures incurred over a finite time horizon are then defined and have mathematical forms that depend on the underlying transition probabilities and intensities. We will write each term of these mathematical forms as sum of independent and identically distributed variables. Conditional on the initial state i, given the vector Z0 of basic covariates, we will assess the asymptotic normality of the net present value of all expenditures associated with the h to j transitions in (0,t] , i.e. the asymptotic distribution of n“2 Nfiv‘.”(t i,Z )—NPV‘.”(: i,Z ) h} 0 h} 0 using the Functional Delta Method. 1.1 A Markov model for describing patient health histories Let (S2,f’,P) a probability space and let {X(t), re ’1'} with T =[0,T].. a non- homogeneous continuous time Markov process with finite state space E = {1,2,...,k} , 14 having transition probabilities Phj (s,t) and transition intensities ah}. (I). This Markov process describes the evolution of one patient’s health history, with X (t) the patient health state occupied at time t. Typically E consists of several transient states, such as “well”, “ill”, “recovery”, “relapse”, and one or more absorbing states such as “disabled” or “dead”. Let a = (ah, ),h, j e {l,2,...,k} be the matrix of these transition intensities, ark-(t): lim P[X(t+At)=j|X(t) =h]/Ar, j¢h ’ Atio and ahh = -Z ah,- . Thus, starting from the time of entry into state h, the sojourn times in jth the given state h are continuously distributed, with hazard rate function —ahh. Given that the process jumps out of state h at time t, it jumps into state j at h with probability ah] /‘ahh - Let Ah,- (I) = Jgahj(s)ds and AM, =—Z Ah,- . For h #5 j the function Ah]. is called jath the integrated intensity function for transitions from state h to state j, whereas A“, is called the negative integrated intensity function for transitions out of state h. The matrix A = (Ah, ,h, j E {1, 2,...,k }) is also called the intensity measure of the Markov process X. Hereafter integrated intensity functions Ah,- ,h, j 6 {1, 2,...,k } are supposed continuous, unless otherwise mentioned. Let P(s,t)= H(I+dA) for s_ 0} : t2 0}, the multivariate counting process 11¢ j,h,jeE}, N = { Nhj , h ¢ j} has random intensity process I lily . where 2th,. (I) = ah]. (t)Y,, (t). Moreover, Mk,- (I) = N,, (I)- Er, (um, (u)du, h it j,h,je E are zero mean local square integrable martingales. These results were first proved by Jacobsen (1982)55 and are summarized in Theorem 11.6.8, p94, Andersen et al. (1993)”. Using the continuity ofAhj ,h, je {1,2,...,k} , it can be shown that (th ,Mé,) = O for all pairs (h, j) and (q, r) with (h, j) ¢ (q, r). Here we denote by (M ,M ') the predictable covariation process of M and M '. In our absolutely continuous case with transition intensities ah,- and Ah,- (t) = Law- (s)ds , we say that the multivariate counting process N has intensity xi = (2h,- ,h :6 j), with 2,, (t) = Y, (0a,,- (t). 17 1.1.1 Marked point processes Over the follow up period the subject transits to other health states, X, , X 2 ,...at random times T, ,T2 and these transition or epoch times {Tn :n 2 0} and health states {X n :n 2 0} describe the event history of each patient. Formally, these are defined in terms Of theforward recurrence time W(t) = inf{s > 0 : X (t + s) at X (1)} , which is the waiting time from t until the next transition out of the state X (t). Having set X() = X(0) ,To = 0, W(oo) z 00, we define for all n 2 0: TM, =Tn +W(T,,) with X“, = X(T,,+,)if Tn+1 <00 and X“, = X" if Tn+1 =oo. The sojourn at the nth transition is W(T,, ) in state X n . The survival mode] is an example of a two-state process with a single transient state “alive” and a single absorbing state “dead” with survival time T = T, and T, = 00 for n 2 2. The multi-state analog of survival time is the time to absorption in state k given by 2', = inf{t > 0 : X (t) = k}. Suppose that R, is a general mark corresponding to T, and we define the marked point process N(t,A) =Z[r,, $2.12,, 6 A] n21 where A is a subset in the range of the R". For our case we could think of R n = (Xn_, ,Xn) with values in R={(h,j) : h,je [5,]: ¢ j}. Our previously described counting process Nhj (t) can be identified with N (t,A) by taking A = (h, j}. The natural filtration f; is generated by{N(s,A):OSsSt,ACR}. l8 We also have56 £=0{N(s,A):0_<.sSt,ACR}=0'{(T,,,X,,):0$Tn St}. We can endow R with an appropriate 0 -field and regard A -9 N (t, A) for each t as a random measure, and t —> N (t, A) as a counting process for each A. { f; ,t 2 0 } is a right-continuous filtration and £0 = 0'(U f; ) , where if, = 0'( X (0), z). :20 For a stopping time T with respect to.7-',' ,t 2 O we define f}={BE foo:Br\[TSt]E f; forall t} fT_={Bn[T>t]:120, Be .5} We have f;_= ft, for all Tn_l .0}the compensator Ah,- (I) of the process Nhj (I) is given by P[V,, e du,x,, =j|J-‘Tn P[Vn 2 u | fr“ ] A,,(dt) = -' ][X,,_, 2h] on where V” = T, —Tn_, and u = t —T,,_, is the duration of the current sojourn at time t. Our Markov assumption gives Ahj(dt) =[Xn_, =h]P[T,, e [t,t+dr),X,, = j|Tn_,,X,,_,]=[X,,_, =h]a,,j(t)dt, for IE (Tm,l ,Tn ]. l9 1.1.2 Incorporating censoring If observation of X is ceased after some random time U, independent of X, we will need to replace N h,- (t)by the censored process N§j(t)=#{sSt/\U:X(s—)=h,X(S)=j}ih¢jand Yh(t) by Y,f(t)=[X(t—)=h, U 2t]. Then, with respect to an expanded filtration the aforementioned martingale property still obtains. For the ith patient we observe a basic covariate z,- (t) , an initial state X,- (0) , the state indicator Y,,‘, (t) = [X,- (t—) = h,U, Z t] and Ni],- (t) , the number of n event transition times before I from state h to j. Let Y,f(t) = Zth, (t) the number of i=1 subjects who were not censored at time t and just before I were in state h n and N5, (t) = ENE}, (z). i=1 In terms of marked-point processes we define the marked point process N" (t,A) = EU" 5 t A U,R,2 e A] where A is a subset in the range of the R". Our n21 previously described counting process N ,f, (t) can be identified with N c (t, A) by takingA = {h,j} . The natural filtration f,“ is ff=a{N‘ (s,A) :03 s S t,A C R} =0{(T,,,X,,):OST,, SIAU}. With respect to the filtration { f :t Z 0} the compensator A2,. (I) of the multivariate counting process N ,f, (t) is given by 20 C P[Vn e du,X,, = j,T,,_, 5U lff_,] hj(dt)= c " [X,.-i =h] P[Vn zu,T,,_, sU|fTHI on IE (Tn_,,Tn ] . Equivalently, the processes ng defined by ME, = N5, - z,- are martingales. We assume that the censoring variable, U. is independent of everything else in the model. Then P[v,, e du,x,, =j,T,,_l SU If; 1 C _ n-l P[V,, zu,T,,_, sum-3,4] _ C“Tn—l )P[Vn E du’xn =j|Tn-1’Xn-1] G(T,,_, )P[Vn 2 u ITn_, , Xn_,] [x,,_l =h]=A,,j(dt) and N ,f, (t) has the same compensator as Nhj (t) . Therefore conform to the Definition 111.21 in Andersen et al. (1993)25, the right-censoring of the process N generated by U is independent. In the sequel we will assume that censoring has been accommodated in this way. Next we will incorporate costs in the model. the assumption of independence of costs of everything else in the model is often violated, for example the longer the length of stay in the hospital, the higher the costs. We will incorporate censoring without assuming independence. 21 1.2 Incorporating costs in the Markov model As previously mentioned, we consider two types of costs that might be incurred in the course of follow-up: costs at transition between health states and costs of sojoums in a particular health state. Suppose an amount Ch,- (t) is incurred just after time t if a transition h to j takes place at time t. The present value of expenditures in (0,t] associated with these transitions is t —rs c},}’(t)=j0e c,,(s)d1v,,,(s). where r is the discount rate. In economic studies expenditures to be incurred in the future are discounted to present value. A dollar spent now is worth more than a dollar that would be spent later. The discount rates used for the US have usually been between 3% and 5% per year, reflecting the rates on savings accounts or certificates of deposit. Let Z0 be a given fixed vector of basic covariates. The p-dimensional vectors Zhj0 of type-specific covariates are computed from the vector 20, reflecting that some of these basic covariates may affect the different transition intensities differently. Conditional on the initial state, given the vector Z0 of basic covariates with the corresponding type-specific covariates Zth' the mean of this present value is: . ’ -rs . vaé” (t | 1320) = 15(C,§,‘.’(t)|xO = 1,20) = E( Le Chj(s)th,-(s)| x0 =i,zo). We assume that: 22 A1 C h,- (.) are bounded, non-negative processes over 7 , adapted to { flzte T }, left continuous with right hand limits (so Ch, (.) are bounded, predictable processes). A2 For te (0,2'] , E(Chj (t)| X0 = i, X(t-) = h,iZ0) = E(Chj (t) | X(t—) = h,ZO ) , so that at any t > O the expected transition costs do not depend on the initial health state. It is known that if N is a counting process with intensity process A , M = N — J1 and H is locally bounded and predictable, then M and IHdM are local square integrable martingales, with E (M ) = E ( IHdM) = 0 (see Proposition 11.4.1, p70, Andersen et al (1993)”). Then, by assumption A1, (I) - I -rs - NPV,, (t | 1.20) = E( [De C,, (sth (s)ds| x0 = 1,20) = I —rs - = E( [0e Chj(s)Y,, (nah, (s)dsIXo =l,Z0). By Fubini’s Theorem: ’ -rs - NPV,f,”(t|i,ZO)=Le E(Ch,(s)r,,(s)|X(0)=t,zo)a,,,(s)ds. We can write E(C,,(s)Y,,(s)| XO =i,Z0) = E(Chj(s)|X0 =i,X(s—) = lz,Zo)P(X(s—) =h| XO =i,ZO). By assumption A2, NPV,,(jl) (t | i ,Z0 ) has the form NPV,f,”(I|i,ZO)= Ee"-‘ch,(s|zo)13,,(0,s|zo)a,,,(s)ds, (1.1) where ch, (s | Z0) = E(Chj (s)| X(s—) = h,Z0). We now turn to the cost of sojoums in a health state. Suppose that the cost in state h is incurred at the rate B, (u) at time u. The observed rate is zero at time it whenever, just before u, the patient is not in state it anymore, so [X (u—) = h] = 0. Then the observed present value of all expenditures in state h, started at time s and ended after the duration time d is given by (2) 5+1! _m c, (s,d)=L e B,(u)Y,(u)du, where r is the discount rate and Y, (u) = [X(u—) = h]. Conditional on the initial state, given the vector Z0 of basic covariates, the mean Of this present value is NPV,”)(s,d|i,zo) = I:(C,‘,2)(s,d)|X0 =i,ZO)= = fide*"’E(B,(u)Y,(u) | X0 =i,Z0)du. Conditions similar to A1 and A2 are assumed for B, (.) : A3 B, (.) are bounded, non-negative real stochastic processes over[0, 2'] , adapted to (f; ). A4 E(B, (u) | X0 = i,X(u—) = h,ZO) = E(B, (u) | X(u—) = h,Z0) for all u 6 [0,1]. Denote b, (u l Z0) = E(B, (u) | X(u—) = h,Z0). We can write E(B,(u)Y,(u)|XO = 1,20) =E(B,(u)|X(u-) = h,X0 =i,ZO)P(X(u-)=h | X0 =i,Z0). By assumption A4: 24 5+ S NPV,”) (s,d | i,ZO) =[ de’mb,(u|zo)13,(0,u IZO)du (1.2) The right hand side of (1.1) may be interpreted intuitively as follows. Starting at 3:0 in state i, a patient is in state h at time s with probability P,, (0, s | Z0 ). Conditional on being in state h just prior to s, suppose a transition to state j occurs at s with intensity a,,- (s | Z0 ) and this transition incurs a cost. Then (1.1) is the NPV for all h—>j transition costs in [0, t]. Similarly, for the right hand of (1.2) consider the cost of sojourn in state h in the interval (3, s+ds]. This is b, (s I ZO )ds , conditional on reaching state h at s. To incur this cost a patient must move from the initial state i to h by time s, with probability R, (0, s | 20 ). So (1.2) is the NPV of the total sojourn cost in state h in [0, t]. Suppose costs potentially accrue up to a fixed time horizon 2' for sojoums in, and transitions among the transient states. If k is the only absorbing state costs would cease at the absorption time 2', = inf {t > 0 : X (t) = k} or 1' whichever is observed first. The net present value of all expenditures is NPV(i,ZO) = ZEe"~‘c,,(s|zo)a,,(s|zo)13,(0.s|zo)ds+ h¢j + Z Emmizo>a. j and (1.1) is its corresponding actuarial value given X 0 = i. Likewise, B, (t) is the annuity payment rate at time tin policy state h, C,” (t) is the discounted value (at time 0) of all annuity payments received in [0, t] while the insured is in policy state h and (1.2) is its associated actuarial value. Usually C,,- (t) and B, (t) are known non- random functions and one is interested in the total payment function 2 hi]. C f,” (t) + 2 h CE) (I) . In this context Praestgaard (1991)26considers the estimation of (1.1) and (1.2) using a framework very similar to that we have described. However, because costs are incurred in random amounts at random points in time during the course of a health care intervention the average expense functions b, (t | z) and c,, (t | z) are no longer known and need to be estimated from appropriate data along with the transition probabilities P,j (0,t | Z0) and integrated transition functions A,”- (t | Z0 ) = Law- (s | Z0 )ds . For easiness of notations we will assume the discount rate is null, i.e. r = 0 , unless otherwise specified. 26 Insights into assumption A1 The assumption A1: C ,j (.) are bounded, non-negative processes over?" , adapted to { f: :tE T }, left continuous with right hand limits (so C ,j (.) are bounded, predictable processes) is not as naive as it seems. Without it we would have to extend the observed history at time t to}: , the minimal o-field generated by f; and 0'{C,j (s) : s S t, h ¢ j, h, je E}. With respect to this new a-field one should wonder whether the compensator A ,j (t) of the process N ,j (t) can be estimated in the same way BIC . Some insights into assumption A1 can be gained by viewing the cost C, as a mark associated with the transition time T, and describing the underlying process as a marked point process. Suppose that R, is a general mark corresponding to T, and we define the marked point process N(t,A)=Z[T, St,R,e A] n21 where A is a subset in the range of the R,. For our case we could think of R, = (X,_, ,X, ,C,) with values in R={(h,j,c) : h,je E, h at j,c >0}. Our previously described counting process N,,- (t) can be identified with N (t,A) by takingA = (h, j,(0,oo)} . The natural filtration f; is generated by{N(s, A) :0 S s S t,A c: R}. We can endow R with an appropriate 0’ -field and regard A —> N (t, A) for each t as a random measure, and t —> N (t, A) as a counting process for 27 each A. Moreover with respect to{f-: ,t 2 0} the compensator A(t,A) of N (I, A) is given by '-Trt-l an (u,A) ,ue (0,T, —T,_,] l-F, (u-,R) A(t,A)=A(T,_,,A)+ where F, (u, A) is the conditional distribution F, (u, A) = P[T, -T,_, s u, R, e A | f3.“ ] and f,“ =0{(T,,R,):1San-1}.For te (T,_,,T,] we have _ dF,(u,A) dF,(u,R) dF, (u,R)'1—F, (u-,R) dA(t,A) where u = I —T,_, is the duration of the current sojourn at time t. The first term dF, (u,A) can be interpreted as the conditional probability of R, E A given 7:} and an (u, R) n-l dF,(u,R) l—F,(u—,R is the conditional hazard rate for the T-T , ,_, = u . The second term sojourn T, —T,_, given ET“ . In the case of interest R, = (X ,_, , X, ,C, ). Then writing V, = T, —T,_, and taking A = {h,j,(y,y+dy]} we can express dF (u,A) . ~ . ~ —"——=X_=hPCed v,= ,X,= ,.F PX,= V,=u,.F dF,(u.R) I ,.i ll. yl H J TM] [ 1| TH] and dF,(u,R) —[x _, Plvnedulj-‘T,,1 1—F,(u-,R) "“ P[V, 2u|.7:}n_l] so that on IE (T,_, ,T,] and recalling that u =t—T,_,, 28 A(d A) P[C d w x 'f“ IPIV"Edu’X"=j|fT'][X l] (14) t, = ,Ey ,=u, ,=, ~ "' ,_=z. . 1 TH P[V,Zu|.7-}n_l] ‘ With respect to the filtration {.7; :t 2 0} the compensator A ,j (t) of the process N ,j (t) is given by P[V, e du,X, =j|.7»',, ] A,j(dt)= PIV.Zu|7"r,_,l - [X,_,=h] on tE(T,_,,T,]. (1.5) Now f3,“ =0“{(T,,X,,Cj):an—l}whereasJ-'Tn_l =U{(Tj,X,-):an—l}.lt is reasonable to assume that (T,,X,)is independent of {C}. :an—l} givenf'ml , or, at least that {C j : j S n—l} is fr", -measurable. If (T,,X,)is independent of {C}. :an-l} then P[V, e du,X, = jlf‘rfl] P[V, _>. u | f}, 1 [XII—l =h]: P[V, e du,X, =j|.7-"Tn_l ,{Cj,an-l}] P[V, 2M3", ,{Cj,an-l}] n-l = h]: P[V,e du,X, =j|fT I] = "_ [XII-I =h]=Ahj(dt)° P[V, ZquTH] Since a similar conclusion can be found if we suppose {C j : j S n —l} is .77“ - measurable (i.e. f,“ = ET“ ) in any of these two cases (1.5) coincides with the second term in(1.4). Moreover, our Markov assumption makes A,j (dt)=[X,_, =h]a,j (t)dt, te (T,_,,T,]. In fact taking A={h,j,(0,oo]} we get 29 A,j (dt) = E[A(dt, A) | £_]. Also .75; is the minimal a-field generated by f, and0{C,j (s)AN,j (s) : s S t, h at j, h,j e E}. The first term in (1.4) is the distribution of the cost C, conditional on the past in _, , the destination state X, = j and the transition time], =t. With A = {h,j,[y,y + dy)} the cost incurred in [t, t +dt) is [:ledt.A)=C.,(z)ANi,-(t) and therefore the present value of costs incurred in [O,t]due to transitions of the type h—>j is C£})(t) = J36” ny(ds,A). Ignoring discounting C j,” (t) = J; nyMs, A). Let ZO a fixed covariate profile. Then using the martingale property of N (t, A), the conditional expectation of this net present value given X 0 = i ,20 is E(ClPltHXo =i,Zo)=E( [36" [”ledaAMXo =i’20)= =E(J:e’” [:yA(ds,A)|XO =i,Z,,). Using (1.4), E(C,(,l.)(t)|X0 =i,Z0) becomes E(ge'” fymc, edyl V, =u,X, =j,f,_l ][X,_, = h]a,j(s)ds | X0 =i,ZO)= = Le-mmfyflq edyl V. =u,X,, =j,f}n, ][X,_, =h]|XO =i,Zo)a,j(s|Z0)ds, 30 With our previous notations, c, (s) = [x,_, = h] E” yP[C, e dy IV, = u, x, = hf,“ 1. Then E(c}})(t)| X0 =i,ZO)= [Se-“Tm, (s)| X(s—) .—. 12.x0 = r320”), (0,s|zo)a,,. (s | Zo)ds. By A2, E(J; EyN(ds,A)|X(0)=i) = ge'rscm-(slZO)P,,(0,s|Zo)a,j(s|Z0)ds which is the same formula as (1.1), where c,, (t) = E(C,j (t)| X(t—) = h). We will estimate A,,- (t | Z0 ) and P,, (s,t | 20) from a Cox proportional hazards model for multiple states and c,, (s | Z0) from a random-effects model. Putting A everything together we are able to compute NPV,?) (t | i, 20) an estimator of NPV,,” (I | i ,Z0) and show asymptotic properties for the derived estimator. 1.3 Estimation of transition probabilities We now turn to the estimation of (1.1) and (1.2), focusing on the former. Andersen et al (1993)25 pioneered an elegant asymptotic theory for estimators of ,6, A,, (t | Z0 ) and P,, (s,t | Z0) . For each of n patients in a study we observe processes of the type described. For the ith patient the basic covariate vector is z, (t) , the initial 31 state X, (0) , the state indicator Y,, (t) = [X, (t—) = 12, U, 2 t] and N,,-(t) =#{sStAU, :X, (s—) = h,X,-(s) = j,},h at j. Conditionally on {z,,X,-(0):i= l,...,n } assume the processes { X,(t):te T } are independent and that the Cox regression model 61,,- (t | z(t)) = ath (t)exp(,6'z,j (t)) described in Section 1.1 holds for each individual with the same baseline intensities, i.e. ah,- (I I z.- (t)) = ahjo (t)exp(fl’zrj,- (t)) for all i =1,...,n . From now on denote by N ,j (t) and Y, (t) respectively, the aggregated n n processes ZN,,(I) and ZY,,(I). i=1 i=1 The following standard notation will be used. For any h o Jh(t)=[Yh(I)¢0].l-Jr,(t)=[Yh(t)=0l=IZYh.~(t)=01=[Y;,,-(t)=0.Vil i=1 For h¢ j: o 2,, (z)®"’ = 2,, (02,, (I), if m = 2; o 2,, (t)®'" = 2,, (t) if m =1 and 2,, (t)®"'=1 if m =0; o s,§;"’(t,fl)=2Y,,(t)2,,(t)®'" exp(,8'z,,(t)), me{0,1,2}; i=1 0 5,0,5)=S,‘,}’(t.fl)/S,‘,f’(t.fl); o icy-0.3):Sif’lrfiwsif’(t,A)—Eh,-or)“; o 10,29): 2 £V,,(u,fl)dN,,(u) with N, =ZN,, ; i=1 haej 32 o sif’afl):ElY,,(t)z,,-,(t)®'"cxp(,6'2,,(r))], mE{0,l,2}; o e,,-(t,,5)=s,(,;)(t,,B)/s,(,f’(t,fl); o v,,(t,,6)=s,})(t,fl)ls,gf)(t,fl)-e,§2 o A,, (t | Z0) = A,,0 (t, B) exp(,3'Z,,-o) where the maximum likelihood estimator A ,6 is defined as the solution of the equation U (2', ,6) = 0 where n It Ult.fl>=Z Z [lzhjm-E,(u,fl)ldN,,-.-(u) i=lh.j=l hat} 0 20,,6) = Z Ev,- (u,,6)sf,(.)) (u,,6)a,,o(u)du , px p nonrandom matrix does hatj not epend on n. The following assumptions will be adopted throughout this chapter. Although not all conditions are needed for every result, we state them all to avoid too many technical distractions in the theorems. We denote by H . II the supremum norm of a vector or a matrix, e.g. the norm of a vector a = (a,) or a matrix A = (a,,) is H a ll: sup, |a, | and H A II: sup“- la,- |, respectively. Convergence in probability and weak convergence are always as it tends to infinity. Model Assumptions and Conditions: A5 Conditional on z,(-), U, is independent of X, (.); A6 (N, (.), Y, (.), z, (t)) ,1 S i S n are independent identically distributed; For h¢jz 33 T A7 A,,0 (2') = La,” (t)dt < oo; A8 2 b—y 2(2,=fl) Z]; ‘0’ r — v,, (u, ,6)s, (u, ,6)a,,o(u)du is positive definite. notation hat j 0 There exist acompact neighborhood 6’ of ,6 , with ,66 6’ (the interior of 6’ ), and scalar, p-vector and px p matrix functions 5,19), sipand 3(2), h ¢ j, defined on [0,2']x6’ such that for me {0,1,2} and h,jE{1,...,k}, h ¢ j: A9 sup (1 ..BHO rlxb’" —S"’"(t 5)— s,‘,’-'”(t fl)I A10 s,}"’(.,.) are continuous functions of Be 6’ uniformly in [6 [0,1] and bounded on [0, r]x6’; SIG) (., ,6) is bounded away from zero on [0, r] and 0)(I ,5) 82,2)(1 ,6)=—s, ,1)(t, ,3) (Asymptotic Regularity Conditions); 8hr)“ fl)=afls,, 31,3" All There exists 6 > 0 such that n’“2 sup |2,,(t)| Y,,(t)[,6’z,,,(t) >-5|2,,(t)|]—P—>0 (Lindeberg Condition) h¢jJJ where [a > b] =1 if a > b and zero otherwise; Conditions A7-A11 are implied in the independent identically distributed case (i.e. under assumptions A5-A6) by more general conditions as proven by Andersen et al (1982)57 Theorem 4.1. These conditions are: . 1. 2,,- (.) and Y,, (-) are left continuous processes with right hand limits processes 34 3. Z, is positive definite 4. P(Y,, (t) =1,VIE [0,2]) > 0 5. ' E[sup Y,, (t) | z,,, (t) I2 exp(3'z,,, (t))] 0], Y, =ZY,, . We use the convention %=0. i=1 i=1 Let A,,0 (t, 8) = -Z Am.O (LB) . Thus the matrix of integrated baseline intensities , jxh A0 (I) = (A,,O (t),h, je {1...k}) is estimated by 1300.3) = IA,,O(I.B),h, jE {l...,k}) .We define for a fixed covariate profile 20, (and corresponding type-specific covariate 2,0) A,, (t I Z0 ) = Am, (t)exp(,3’Z,,-o) , h¢ j , Ahh(IIZO)=-ZAhj(IIZO)- jaeh Central to all our proofs is the derivation of asymptotically equivalent representations of JR]? — fl) and w/h(A(t I Zo)— A(t I Z0 )) in terms of iid random variables. By asymptotically equivalence of two quantities is meant convergence in probability to zero of their difference (and where appropriate, uniformly for t E [0.2'] ). We use the next theorem from Andersen et al (1993)”. Theorem 1 (Theorem VII.2.l, p497, Andersen et al (1993)”) Under A5-A11, the probability that the equation U (2', ,6) = 0 has a unique solution ,3 tends to one and ,3 —’-)——> ,6 as n ——> oo . CI The next theorem gives the asymptotic normality of ,3 and an estimator of the asymptotic covariance: 36 Theorem 2 (Theorem VII.2.2, p498, Andersen et al (1993)”) Assume AS-All. Then n1’2(3— ,6) converges in distribution to a zero mean normal p-dimensional random vector with covariance matrix 2;] and A A b-v A n-lIUHB)->-7(1,,5)II—5—-->0.Inparticular22r = n'lI(2.fl)—L>Z,.D notation SUp tdOJ] Following the proof of Theorem VII.2.2 of Andersen et al (1993)” we can show that the (h,j)element. hat j, ofthe kxk matrix J72(A(r|zo)—A(t|zo)) is asymptotically equivalent to exp(,B Z,,0){ XIII!) (t)+ X”) (1)} where n 2 I I X}.,’ (t) = JZw—fl) I,(Z,,,-o —e,., (2mm... (u)du xggjo): JTIOS —:—d(")fl),,)" M,,(u) Here M,,(t)= N,,(t)- £S(O)(u,fl)dA,jo(u) and M,,(t)=2n:M,,(t),where ..r M...- (r) = N,- (t) - [Y,,- (u)exp(fl’z.,.- mm... 00. Let h,j. (I) = [32,, -e,, (u,,8))a,,0(u)du ( pxl vector). Then X1823) (t)=JZ (B—fl)’b., (t). We expand both X I31) and X $7,) as sums of iid random variables. J,(u) WeconsiderXéZ} first. Indeed XéZJU): II: 5,0,, ufl) Hfilw(u u)" 37 l =T J; W )[s,‘,°’(u fl) sf,°’(ufl) 1 dM.,-(u)+ dM, (u) M), (u) (1'1 (ll-:1——)) - +\/1-'[;S,(,O)(u, ,3) 71:]: h 380“ “,5) For easiness of notations we will denote the three terms of the above sum as T1 , T2 . T3 , whereT,=— jzujgm )[ 0 (0 dM,,(u), T2: —I;(f:)l——"——’(u) and swam s, ,’(u ,)]fl T s... (at?) ( ) Jig—J30 Jilu ))fi To reduce the representation of X £2} as sum of iid variables, under the assumptions A5 - All we prove that the first term, T, converges in probability to O and the third term, T3 is null. Indeed, if we consider the first term 1 IT‘HTI; h‘“ ”5‘90 3) s‘°’(u .dfifiMWHS hj s,‘,,°’0 (by Assumption A9) and sup I 5,0) (u, ,6) |< B < co uSr uSt (by Assumption A10) it follows that IT, |——P——>0. The third term, T3 , is null: (u) i(u u)_ 32T£(1-Jh(u—d——__))S(O)(thfl) =:/-—;,Z:,:'E[Yhi(u): O,Vi]s(ofifh__j—— fl): 1 " Nu“) :Tg-§£[Yhi(u): OJVIW— Shj( th (u)CXP(.B’zhji (“))dAhjo (u) s9’(u 1?) =0 " t {3 join..- 00 = 0m i=1 Using M ,, (u) = Zthi (u) . the second term, T2 can be expanded as sum of iid random i=1 .Therefore we have showed that dM, (u) —X ,_ dM,,(u) ' bl T- — vana es as 2 J—‘EF_ ,9,“ ,5): n =1S,Q)(ufl) XéI',’ (t) \l—IOS ,yo) J”((——u—-du)fl) M,,(u) is asymptotically equivalent to the sum of iid dM,,- (u) random variables: —2 £———— 0() <11 X?) To Obtain the derivation of TITLE — ,8) , we use Theorems 1 and 2 stated above. Consider now XI; (t) = «5(3- ,6)'b,, (t) . where b,,(t)= I;(Z,,o—e,,(u,,6))a,,0(u)duisa pxl vector. We have 39 0:..1 \/_(fl- ,5): TEXT, ,6.) U(2', ,6), where bya, a=b, we mean a, is asymptotically equivalent to b, . We can write U (2, ,8) as a sum of independent identically distributed random variables as in the proof of Theorem 2 (see proof of Theorem VII.2.2 p498, Andersen et al. (1993)”): 1 " ' T’IU(T’fl)=;,§ 11= £11,, (u)dM,, (u) where 11,, (u) 7.;(z,j,(u)—E,,(u,fl)) are predictable, locally bounded processes (p496 Andersen et al (1993) 25). Consequently, 7301,,- (t Ian—A..- «Izod: expw’zhjonxigy (t)+ X31} 0)}: n k =exptfl’zhjoll Zr 2 £11,...(ulth,(u))’2O. (u./3)E[0.r]x‘21 Therefore £31.;(ehj (u,,6)—Ehj (u,,6))thfi(u)—P_)O and J: Hm, (u)dM,U, (u) 2 j: J.(z,,j,.(u)—e,,j(u, fl))thj,. (u).Therefore for 12:: j we have proved the next lemma Lemma 1 Under the assumptions AS-All, £01,”. (I | ZO ) — Ah]. (I | Z0 ) is asymptotically equivalent to n k exp(fl'Zhj0){Z( Z £%(zhji(u)-ehj (u,,6))dM,y-i(u))'£(2’,fl)'lbhj (t)+ i=1 h,j=l hat} " thj, (u) +JZE£T_ £0)(u, 19)} . If we denote n k s;{;,(t,r)=2( Z 13171—th}.-(u)—e,,,-(u.fl))dM,,,-,-’2(r,fl)"bhjm and i=1 h,j=l ’1 hat} ,dM (u) 5(2) _ h}! n.hj(t)" :71-—;§LST_?)(14 fl) then MAM-(zIzo)—A,.,(z|zo)a§'exp(fl’zhjo){sf,‘2,,(t)+s,‘,33,. =0 for (h,j)¢ (m,r),(h,j),(m,r)e E I. by (lb-0(a) 111)(t)=0)1?j(1) = £S£9)j(u ,60) J 9 notation iv) C0v(U;,,j (s),U;,,j- (t)) = a); (SAI) 42 The ()1, j )-th element of the limit process Ur(t) of SE,” (t,T) can be expressed as éébhj (t), where :0 is a p-dimensional normal random variable (its distribution does not depend on t) with zero mean and covariance matrix 2(2’, ,8)"1 and bhj (t) = E(Z’VO —ehj (u,,6’))a',,j0 (u)du is a p-dimensional vector depending on t. The processes Ufhj and US,”- are asymptotically independent and the limiting distribution of fi(A,,j(z|zO)—A,j(:|zo), exp( fl’zhjo )(Ufhj (t ) + ugh}. (t)) (1.7) has mean zero and variance exp(2fl’z,,jo)(w,3j (0+th (r)’£(r,,6)" h,j (t)} (1.8) The covariance matrix of (52,)”. (1,7),534. (t,r)) can also be computed directly, using the definitions of 3,?) (u,,B), m = 0,1,2 and the independence of the processes {X,-(-),i =l,...,n} . Then I: k n var(s,§{,{j (2,7)) =Var(Z( 2 j: (2,1,. (u) —e,,j (u.fl))dM,,,-,- (u))’>3(r.fl)“ b,.,- (t)) = i=1 h,j=1 hatj n k = b,.,- (t)’2(r.fl)”‘ Var<2< Z [I (2,,- (u) —e,.,- (u.fl)>dM,.,, (u>>’)>:’ 2(r Mi) 2 E( [0’ (z,,,,(u>- e,.,(u fl>)®2Y,,,-(u)exp(,6’z,,,-,(u))a,,,o(u)du)’- i=lh,j=l hatj -2(r.fl>“b,,,-(r) = :nbhj-(t) 2(7, ,6) (Z E(S/(J)(“v ,B)- eh] (u may?) (u ,3)- sf,”(u, ,5)e;,j(u,fl)+ h¢j +ef3 (u fl)s,‘,°’(u fl))a,,,0(u)du)2(r fl) .‘b,(z)= .—. nbhj (t)’2(z', [3)" 2(1, fl)'2(r, fl)“ h,j (t) = nbhj (t)'2(z’, fl)“ bk]. (t). For the last equality we use (0) 5;?)(14 )6)— ehj(u, ,3)s(l_>’(u,,5)- 5“)(u, fl)ehj (u, ,B)+e® ‘(u, ,B)shj (u,fl)= hj Sm SU)’ (u ,6) 04 fl) __ 2) Say 5(1) Shj +Shj)(u )6)SU-)(u )6) 580) 5,30% fl) (u m-vh, (u fl)s,‘,°’(u fl) and the definition of 2(1, fl)- -— Z Evhj (u, ,6)shj)(u,,6')ahjo(u)du . hatj thi(ll)) zivar £thji(u) _ Also nvar(S(2) (I 7)): Var(z [0d f,°’( um) 5(0)“ 13) _ 11, hj 9 =2 E(L—d s£9)(u,3 2(u))= zzk: (0)( _—)'25(th (“)6Xp(fl' zhj, (u)))ahj (u)du- _ Shj( l " I = ESE?) (u,fl)2a'hj (U)E(;Yhi (U)CXp(fl zhji (u)))du = t l =nJ‘0 (9)(u,fl)2ahj(u)sl(:)(u, ,=,B)du nEWL—a,,j(u)du=nw,?j(t). u.,6’) Also 521,1]. (LT) and 522,3]. (t) are asymptotically uncorrelated because the martingales M M and M M are orthogonal for i i k (by independence over subjects) and for j :1 (by continuity of the functions I‘M-0(1)) . Indeed Cov(sf,f,’y. (t, magi; (t)) = = E{ [Z( Z EVE—(liar (u) —ehj (u’flDthji (“)YZUMBYI bk! (”1' i=1 h.j=l Int} thJ, (u) [_‘/1=2£____ s£°)(u fl)” _ ,(u u) =—ZIE{[hZ=l E(zhji(u)— ehj (u fl))thji (14)) 2(7 fl) lbhja )HL— :(0)'——h—'_j fl) ]}= ' h¢jj= _ .-(u u) =—ZE{[J:(Z,U, (u) eh] (u fl))th,-,(u))'£(r l3) lbhj( )Hg‘fiy’y—‘E—l}: ,(u u) —ZE{E[ J: (zh,,(u) eh, (u fl))dM,,,.(u))’.>2(r fl)"bhj(r)J;-7m—’”—7)I S,l}= _ ,(u 14) =_ZE{IJO(zh,,(u)— eh,(u 5))thj,(u))'Z(T fl) ‘bhjumJ; T_,‘”hju ml}. Using d < thi ,Mhfl > (u) =Yhi (u)exp(fl'zhji (u))a,,j (u)du and Fubini Theorem, the covariance of the two sums becomes 45 Cov(sf,',§j (m) 58,110»: :-ZE {—_—£ (0): ,6)(Zhji(u)_ehj (usflDYd (u))Z(Z',,B)_1th-(t)} = shj( u,,6 {( 150,33, (u>exp(flz,,,, ’1a,,,(u)du) shj( u,,B -2-Bhd+5221>fih+ g=ll¢g It +2 £31: (51“ I Zo)d(5ii§g (“+52% (u))Pg,, (M | 20)}.(1.10) g=l Since 55231; (u) + 5,238, (u) = —(Z 32;, (u) + 5,33, (11)) we have latg J;(i§h(s,t|ZO)—Rh(s,r[20» .=. k k = #22 Egg (s,u | zo)d(s,‘,{;, (u)+s,‘f;, (u))(P,h (11,: | zo)-Pg,, (11,: | 20)) (1.11) g=ll¢g If we replace 521;, (u) and 5213,01) by sums of independent identically distributed variables we will get an independent identically distributed representation forx/IIUA}, (s,t | ZO)—R,, (s,t | ZO )). Here we use I: k d532,, (11):?th J: 71—;(zhfl (u1—e,,,- (u,fl)>dM,,,,- (u))’2(r,fl)"‘db,, (z) and z: ,)= hatj 48 dM 11,. (t) (0) ).Using Theorem VII.2.3, p503, Andersen et al (1993)25 and (Lfl dezizj (I): J; _2— [:1 sh!- (1. 11), the limiting distribution of J—(P Ph(s, IIZO)— Ph (5, tIZO )) is given by the next theorem. Theorem 3 Under A5 - All, x/ZUA’M (s,t | Zo)- Pm (s,t I 20 )) converges weakly to Um, (s,t | Zo)+U2,-h (s,t I 20), where Um,(s,=t|ZO) IZZI'P (s,u|zo)(P,,,(11,:|zo)—Pgh(u,t|zo))de;}(u) and g=ll¢g U2,h(s,=z|zo) JZZZI'P (s,u|Zo)(P,,,(u,t|ZO)—P W(ut|20))dU§2,(u). g= -ll¢g The processes Ulih(s,t|ZO), U2i,,(s,t|ZO) are independent. :1 Next we calculate the asymptotic covariance function of n”2 (P(s,t | ZO)—P(s,r | 20)) . Because Sin)(’) and an)(t) are asymptotically uncorrelated the k2 xk2 covariance matrix of (1.9) is the sum of two terms. From (1.6) the (i, h)-th element of the first term in (1.9) is 121/2 (3‘19), 2 6XP(,B'ZgIo)£Hg (S,u IZo)dbg1 (10PM (“J I 20):"“2 (B—fl)'a,-h g,l.g¢l where bg, (t) = E(ng -eg, (u,,B))a'g,0 (u)du is ap-dimensional vector depending on t, 49 and a”, = Z exp(,15"Zg,0)‘I'I§g (s,u|ZO)dbg,(u)P,,, (u,I|ZO). Therefore the g,l.g¢l S asymptotic covariance of the (i, h)-th and (q, r)—th elements is egg/3,1)“ aqr. In order to estimate this covariance we replace 296,7) by n'11(,3,r) and a”, by gem/‘3 73,0)ng (s, u|zo)dfig,(u)P,,,(u,z|zo),where 13,, (t): £12,,0-E,,1d2,,,_ t I z, ]. If G is the survival distribution of the censoring time U ,- , then p(tij ’11): G (th - | 2,. ) on tl-j S 2' and therefore under the mild assumption that G(z'| zi ) > 0 with probability 1, we ensure p(t z,)>0 whenever sij =1. Hence E(wij Izi,yi,X,.)=E(wij Izi)=l if tij Sr 1] ’ and 0 otherwise. Let wi be the diagonal matrix with jth entry wij. Then the applicable modification of the previously mentioned objective function q(y,. , X,- ) is ciry. ,w..X.- 1 =v21w1’2 (14)“ (y.- — x,19)y{wy21.;1 (y.- -X,-b’)}. (1.14) Here 1.,-denotes the unique lower triangular matrix with positive diagonal elements such that Q = LiLf. This exists since 52,- is positive definite. Our assumptions ensure the diagonal matrix E(wi Izi,y,.,X,-)=I,. provided tin, Sr. Then 51cm. .w..X. )1 = EtEtci ( y, , Xi) preserves the time ordering of costs because 5'”- depends only on { 1’11. ,k S j }. This is also true of the rows of i,- if only covariates ascertained at prior times are involved. 3. Model (1.12) can be generalized to include additional random effects, through a qxl random vector a,-. The model is then yi= Xifl-r Ziai +01 , with Z,- containing a subset of the covariates in X. . For example Zr =[11 Iti Itf] might be used to model the time dependence at the individual level. Under an obvious modification of A3, the composite error vi = Ziai +ui satisfies E(vi IXi) =0 and E(viv; I X,- ) = Q,- = 0311+ ZIGZI where G = E(aiafl X,- ) . Even if we assume the qxq matrix G to be constant, there is conditional heteroscedasticity in 91'- However, the same arguments leading to (1.15) and (1.16) will obtain. 55 1.4.2 Consistency and asymptotic normality of 3,, Since E(Xgii ) = E[X,’-L}1E(wi | X,- ,zi)L71Xi] = E(XEQ;1X,- ) , assumptions REl, RE2 ensure that (11‘1 Z XIX- )"1 converges in probability to (E (XEQi—IXI ))'1 by i=1 the weak law of large numbers (WLLN). Also writing 9,- = wI/zlevi we get E(ngi ) = E[X;(L; )'1 E(wi I Y1 ,Xi ,zi )levi ] = E(Xgfli'lvi ) = 0 under REl. Hence, by the WLLN we have consistency of 3w. Also from (1.16) n“2 (8.. —fl) = (A;‘ +0, (1>>(n"”ZX;1.-) (1.17) i=1 where Aw = E (xgrzg‘xi ). Application of the central limit theorem gives the normality A offlw n"2 (3.. —5)—D—>N(0,A;‘BWA;‘) (1.18) where BW = E [igvivgfg ] . Another form of Bw is derived under additional assumptions as follows. Now Bw = E[x;ir,.i1;x,. 1 = E[x;(L; )‘1 wiL;‘v,.v; (L; )‘1 w,L;'x,.] = 15rx;(L;)‘l wiL}1E[viv,’. | f](L;)‘1 wiLj'Xi ], (1.19) where f is the set (U, ,ti 1X1 ,2.) and recalling that w,- is a function of (Ui ,ti ,2, ). Since U. is independent of (ti,yi,X,-)given 2,. we getE[viv: If]: E[viv;|X,-,z,.].0ur 1 matrix X includes ti}- ,1 S j S ni , but 2‘. would have variables not in X. . From RE] and RE3 we get only E[v,- IX, ] =0 and E[v,-vf | Xi ] =fli. Strengthening these to 56 51V,- IX),Z,- ] = 0. E [vivg I X,- ,z,- ] =91 preserves the previous assumptions and gives BW = E[X:(L,'-)'1wiw,-L:'Xi]. However, without these additional assumptions on v,- we have to be content with Bw given in (1.19). 1.4.3 Computation of Awand Bw To implement the result (1.18) we will need estimates of A w and Bw. From the discussion leading to (1.19) consistent estimators are obtained as Aw = 114212;)?“ fiw = n'lZi;$i$;Xi . (1.20) i=1 i=1 Here 9,. = yi — iiflw are the residuals obtained after estimation of ,6. A consistent estimator of (2‘. may be obtained by OLS estimation in the model transformed by wI/z . that is, the model 1.7 = Xifl + V? where y: = w” 2 3’1 and the other asterisked symbols are similarly defined. Let 30,5 1‘ denote the OLS estimator from this model and i". = W?”2 (y: — x350“) the corresponding transformed residuals. Consistency of 3015 follows from RE] and the mild condition that E (XIX; ) has full rank. A general consistent estimator of (2,. is A n 12,. = 12’1 2 91.9; . However, under RE3, (2,. has a special structure involving just two i=1 S7 . 7 . unknown variance components 03 and 0,? . We would then estrmate these components 11,-1 fromaf +002:(Z:1n1 -p)]ZZv,-j and 0',, 2=(}:‘/2n, (n, —1)— ”12225111115- k>j 1'=lj=l i=lj=l 1.4.4 Estimation of G(. | 2) To completely specify the weights we are left with estimation of p(t,z) which is given in terms of the censoring distribution G(- I z) . In our observational scheme follow- up of the ith patient starting in transient state X (0) at time 1‘ ,0 =0 may lead to one of three distinct scenarios. (1) Observation ends at the jth transition time 1,,- Sz'in the absorbing state k. Then U >1,, and X (1,, )= k. (2) Observation continues beyond 1,,- but ends at Twithout the occurrence of another transition or censoring. Then U ,- > 2', 1,, ,1 > 2' andX, (1,,- )¢ k. (3) Observation continues beyond 1,, but ends at U, before for the next transition time. Here 1,., < U , < 1,.,+1 A 2' and X ,- (1,., ) at k. The likelihood function involving G(- I z) and its density g(- | 2) can be expressed as 1111100.)- Izrn‘v‘l‘j 101112.11“ ‘5“ ”U “’°**>°"”1gr>tij] QGQU'IZI) Gg( t'jlzi) 60(lei) V . . +(1—61jlj)“ E(Zdfi (60 )) = 1(60 ) in probability. The q> Owith probability 1 and 6 ——> 60in probability, by application of the uniform WLLN the first term in (1.23) converges in 59 I! probability to AW. Its estimator is AW = 11'1 XX? wix; where wi is diagonal with i=1 elements WU. For the second term in (1.23) we have the expansion sijixjivj "4’222600' - (""22 i=1 j i=1 j {G(I;jr::):16)} 2 V3001,- Izhéiinmwlao) where é is between 19 and 60. Again, by standard arguments the bracketed expression # vt ijux above converges in probability to D(60)= E(z 'j v” I {G(Iijlz 21'160”2 2V4’900ij Izi ,60)) apx q [ti]. S z']x,.'jv 'jV matrix. Note that D(60)is the same as “260,- M 190) V’gG(tij |z ,60)).Combining these results together with (1.22) into (1.23) we obtain * vi S-ijx Wm... -fl)= A401 "WZZ{G(1|VU )D(60)J"(60)d,-j(190)})+0p(1) =1 1 I} = A;‘ (It-1,2:{ki —D(60)J" (190)d,. (60)1,. })+op (1) i=1 n t t 1 S..x..v.. where kl. =2 U U U , d,-(60)the qx(ni +1) matrix of the du- (60) and lia '= G(tij Izi'BO) (n1. + l)>o. re[0.r] Also Chj (. | 20) is assumed to be bounded. We have uniform consistency of f3 , i.e. sup | 13:12 (0,t | Z0 ) -— 8,, (0,t | Z0 ) | —P—>O. Based on these results and Theorems 1 and 2, re[0,r] the net present value estimator is uniformly consistent, that is sup |NI3V,fj”(t|i,ZO)—NPV,fj”(t|i,Z0)|——P—>O. re[0.z'] This result is proved in Polverejan (2001)“. Conditional on the initial state 1', given the vector ZO of basic covariates, we will assess the asymptotic normality of the net present value of all expenditures associated with the h to j transitions in (0,t], i.e. the asymptotic distribution of 721/2 (NPV,? (t | i ,20 ) - NPV,;1)(t|i,ZO)) using the Functional Delta Method. Consider the trivariate process 2"" (r) = (21”) (2)25") (r),z§"’ (t)), with components 2:")(1)=‘/;8_n(5hj(’lzo)’chj("20)), Zé"’(z)=~/Z<é.e‘”chjo(t|zo) where chj0(I|Z0)= ng0(t)§, and g, is a multivariate normal with mean zero and covariance matrix 1133“,)“: . Using (1.7) and Theorem 3 we obtain the weakly convergences A 0.8. 25"’(t)=JZ(a,(s.rIzo)—a,.(s,t|zo» = k k =ZZJPg (s u-IZo)d(S,(,lg,(u)+5,(,2;1(u))(3h(urt[Zo)—Pgh(“vtIzo» ——D—>Um, (s,t | zo)+U2,,, (s,t | 20). where U1i;r(5’lzo)= 221 P... g=ll¢g s Uzi/z(S’IZo)= 22]: a.(s.u-Izo>—P,.>dUé131- g= -ll¢g and U1”, (s,tlZo), Uzih(5v’|Zo) are independent; and also due 23"”(2): J—(Ahj(t|20)— Ahj(t|Zo)= exp(fl’z,,jo){sf,‘,1j(t,r)+s;2,;(t)} —D—>exp(,B’ZhJ-o )(Ufhj (I) "Ll/$12; (t)) . 11¢ j where n k 53,310 r)=-i-Z( Z E—l—(zsm-eh,(u,fl))dM,.,-.(u))’2(r.fl)"‘b,,-(t). n i=1 ,)=1 )5 64 thj, (u) ”h! (:t) -T:£—— (0)(u fl) and the (h, j)-th element of the limit process U: (t) of Sf,” (t,2') can be expressed as :3th (t), where :0 is a p-dimensional normal random variable (its distribution does not depend on r) with zero mean and covariance matrix 2(7, ,6)“1 , bhj (r) = £(Zhj0 —e,,j (u,,6))a(hj0 (u)du is a p-dimensional vector depending on I; also Ufa/‘0) and Ughj (t) are independent. Theorem 4 Under assumptions A1 - A11 and REl-RE3 for a fixed time t and h :15 j : Ahj(t|Zo)-Ah,-(tlZo) U;j(tlZ0) J; iih(0atlzo)"Pih(Oit|Zo) —D_’ UihUIZO) Ehj(t|ZO)_chj(t|ZO) chj0(t|ZO) where the elements of the covariance matrix 2 are 2(1,1) = exp(Zfl’ZhJ-0 ){wfy (I) +b;,j (t)2(2',,6)’1bhj (1)}, 2(2, 2): ZZEP- §(s,u|ZO)(P,,,(u,t|zo)-Pg,,(u,t|zo))2- g= —11¢g eXp(flZg,o)d(31(t)+b;,(u)2(7 fl)"bg,(u)) 2(3, 3)=e‘2"x,’,jo(t)A; ‘B ijhjoo), 2(1, 2) = exp(fl'Zhj-O )bjy- (t)2(2', ,6)‘1 F”, (r, B) + +exp(2/3’z,,jo)jo Ph(0, u|z0 )(Pjh(u IIZO)- P,,,,(u, t|zO ))dwh2j(u)) 65 where Eh(t):Z:=lZ;g £138(o,u|20)(P,,(u,t|zo)—Pg,,(u,t|20))exp(fl’zg,o)dbg,(u)), Z(l,3) =X(2,3) = 0.1:] Proof: The expressions above for the diagonal elements of 2 have been previously proved in (1.8), Section 1.3 and (1.18). We prove that 2(1,3) = £(2,3) = 0 , which verifies that 2f")(t) and (Z§"’(t),Z§")(t)) are asymptotically independent. Let us remember that 20,3) = Covrxgj0(t)A;,‘x;(L;)" w,L',i‘v,.F(z,r.M,-.z, )1 where F(t,2',M,-,z,.)= k l I I—1 thji(u) —— ~ --, M,- 2, bvt+———. {h§1£fi(zml(“) e,,,(ufl))d ,,(u)} (r fl) m) Esgfnw) hat} The expression XXL; )’1 wiLI'v, is a function of all transition times t,- = (In ,ti2 ,...tini )observed in [0, r] as well of (U,- ,z,- ,yi ,X, ). We will impose the assumption E(vi IX,- ,z,~ ) = O . The function F(t,r,M,- ,zi) depends on th = Z thi where l thi (I) = Nhji (I) - EYM (“)CXMfl’zhfi (“))dA/tjo (u) 66 depends on a subset of each of ti ,zi,X,« and U,- . Since w is a function of (U,- ,t,. ,z,) and U,- is independent of (ti,y,-,X,-)given z,- we get E[v,- If] = E[v,. |Xi,z,.] where f is the conditioning set (Ui ,ti ,X, ,z, ). Then E(vaiMhfl (t) | f) = x; (L; )’1 w,L}‘E(v,. |x,. ,2, )M,,fl (r) and so under the assumption E(vi | X, ,zi ) = O we get 20,3) = 0. Similar considerations lead to Z(2,3) = O. The covariance of exp(,B’Z,,J-o )(U:,,j (t) +Ughj (t)) and U1,,,(O,1|ZO)+U2,~,, (0,t | 20) can be calculated using results previously stated in Section 1.3. Z(1,2)= E(exp(fl'Zhjo)(Ufhj (t)+Ughj (t)) (Ulih (0,! | ZO)+U1,.,, (0,t | Z0 )))= k k =E(exp(fl’zhjo)Uf,,j(t)ZZ L138(o,u|zo)(P,,,(u,t|ZO)—Pg,,(u,t|zo))- g=lltg -exp(,6’Zg,0 )de2 (u)) + k k t +E(exp(fl’Zhj-0)U;hj(t)zz [012g (0,14|Zo)(P,h(u,tIZO)—Pgh(u,t|Z0)). g=ll¢g -exp(,6’Zg,0 )dugfg’, (u)) = I + 11. Using U ; hj (t) = gab hj (t) , the first term of the sum can be calculated as k k I =E(exp(fllzth)Ufhj(t)ZZ £Eg<01uIZOXBhU‘JIZO)-Pgh(u’tlzo))' g=ll¢g -exp(fl’Zg,O )de; (u))= 67 k k =E(CXp(fl’Zhjo)§6bhj “)2: Jgpig (01“ I 20 X8}; (L1,! I ZO)—Pgh (“9’ I 20 )) g=ll¢g -exp(,6”Zg,0 )déabg, (t). Therefore I: exp(,B'Z,U-O)E(§E,bhj (0&6th (t,,6)) where k k F”, (t,fl)=ZZ I331: (o,u|zo)(P,,,(u,t|ZO)-Pg,, (u,I|ZO))exp(fl'Zg,O)dbg, (1.)). g=ll¢g Hence, =CXP(.6'Zhj-o )E(b;.j (050%)th (I, ,6)) = eprg’Zhjo )sz (1)2(79 ,6)—1 Fr}, (Iofl) - The second term is: II=E(exp(fl'Zhj0 )U;hj (2)2;12; £128 (O,u | zo)(P,,, (u,t | zo)- P8,, (u,t | 20 ))- -exp(,6’zg,o)dU§2, (u))= I t '1' t k k =E(exp(,8 2m)[OdU2,j(u)jO(Zg=lZ#ng (O,u|Z0)(P,h(u,t|Z0)-Pgh(u,t|ZO)). ~exp(fl'Zg,O )dug‘; (u )) = Using (Ughj,U;m,> =0 for (h,j) ¢ (m,r),(h,j),(m,r)e E‘ (so the processes {Ugh}. (.),(h,j)e E‘} are independent) and .. by t ah'O (u) U - t = (02- ’ = j < 2h} >( ) hj( )notatiml J0 51(5)) (uyflo) u we calculate I * t 11= E(CXpw Zhjo)fidUZhj(u)I0P.-h(0.uIZO)(P,-,,(u.t|zo)-P,,,,(u,t|zo)). ~exp(fl'ZhJ-O )can/$3]. (u)) = 68 =E(CXP(2.5’Zhjo)J;Bh (0M I ZOXPjh (14,! I ZO)_Phh (“it I Z0 ))d (W) :exp(2,B'ZhJ-0)J:Hh (0,u|ZO)(Pj,, (u,t|zo)-g,,,(u,t|zo))dw§j(u)). Putting I and II together, so, 2) = exp( 13’2th )bju. (t)2(r, ,6)“ F”, (t, ,6) + +exp(2fl’zhj0)£Pih(01u_IZO)(Pjh(u1IIZO)_ Phh (“it I 20 ))dng ("))- Fix the time t and consider the functional (0, IE‘ —> JR. w, (21 .zz.zg)= £z1(8)zz(s)dz3(s). where E‘ is a subset of D[O,'z']3 such that (p, is well defined. Notice we can write NPV,;"(t | 2320) as (0, (21,22,33 ) = (0, («84% (- I 20 H71. (0.-I 20%/1..,- (-| 20)). If g), has an extension to D[O,r]3 that is Hadamard differentiable in (z, ,z2.z3) then, under some extra-conditions, we can apply the Functional Delta Method. Lemma 2 Let E'. ={(x,y,z)€ D[O,2']3 : EIdZIS C}, where O.i1.,- (420))-(0, (e'r-ch) (,Izo),P,.,, (O,.|Z0),Ahj (.|20))1 LCM): (e-rthj (-IZo >13. (0,. I Z0 )iAhj ('IZO))'(e-r.chj0 (Izo ).U,-,, (0.. I Zo)'U;j ('I 20)) = = Ee"schj0(s|Z0)Bh(0,s|Z0)dAhj(s|ZO)+ £e‘”c,,j(s|ZoX/ih(0,s|Zo)dAh,-(S|Zo)+ + Ee-rschj(slzo)flh(0,s|Z0)dU;j(s|ZO)= p(,) 70 where 0;]. (t) = exp(,6'ZhJ-o )(Ufhj (t)+U;,,, (t)). U,,, (I) = Um, (int/2,, (t) and ch10 (t I Z0) = X,',j0(t)§l .13 Proof: The mapping go, :E‘ —> P is defined by (a, (zl,zz,z3 ) = £z1(s)zz(s)dz3(s), where E‘ ={(z,,z2.z3)e Dior]3 : Eldz3 I5 C} and C=Ahj(2'IZO)+lA,lj (1| ZO)< C. Thus (21,22,23)€ E‘ with probability tending to one. We have 71 "“2 (21(‘)-Zi('))—D_"Zi (') =e-r.chj0('IZO)’ Ill/2 (22 0 ’ Z2 (-))—D—>22 (') = Uih (0" I 20 ). n“2 (23(-)-z3(-))—2—>z3(-) :11}, ((20). The processes Z1 ,Z2 ,Z3 are Gaussian and hence have versions that are almost surely continuous. Let C[O,r] the set of continuous functions on [0,1]. The subset C[0,1']3 C D[O,2']3 is separable, so (Z1 ,Z2 ,Z3) has separable support. Because P(o,.|zo)=n(1+dA(.|zo)) and P(o,.|zo)=['[(1+dA(.|zo)), (0..) (0.-1 the matrices P(O,. | Z0 ), P(O,. I Z0) are functionals of A(. | Z0 ), A(. I Z0 ) , respectively. Also by Theorem 4, Ahj(’IZo)‘Ahj(’IZo) U;j(tIZO) ‘5 BMOJIZOI‘BMOJIZO) ‘2") Uih(‘IZo) e-n(5hj(tIZO)-chj(tIZO)) e'"e,,jo(t|zo) Therefore by Functional Delta Method, "1,2(¢t(21v22 ’23)-¢t (11 ’32 ’33))‘—D"d¢rE. (21’22’23)'(zl’22 ’23 ) = P(’) defined in the theorem statement. Theorem 6 (Variance of P(t)) Under the assumptions in Theorem 5, the variance of P(r)= Ee-"chj0(sIZ0)P,-h(0,s|Z0)dA,,j-(SIZ0)+ Ile—rschjbIZOX/ih(O,SIZO)dAhj-(SIZO)+ I "ifs a . +Ioe Cry-(s!Zo)R.(0.sIZo)dU,,j(s|zo)is 72 Var(P(t)) = 7],,j(t)A;leA;.17]’,,j (t)+T2hj(t)’Z(T,fl)_lT2hj-(t)+ 2 “mom i st?) (flu) +exp(2,B’Zhj-O)(2EBI +21582 + LIV-"chi (t I Zo)P,i, (0,t | 20)} ' —rs I where Two): Ioe tho(s)8h(0,sIZ0)dAhj(s|ZO), TZhj (I) = exp(fl'Zhj0) Le-rschj (5 I 20 )IP,;. (0,5 I 20 )dbhj (S) +Fih (S)dAth(S I 20 )I i I T — EB] = “P(fl ZthLe "Chj (’ I 20 )dAI1j0(t)’ winds) ' -rs -(Ioe chj (SIZO)R§(0,SIZO)IPjh (S’IIZO)-Phh (SJIZON (0) sh,- (£3.14) ) and EB: = Ee-nchjUIZo )dAth(t)'(Le—rschj(SIZO)Hih (Sit)dAhj0(S))- Proof: The form of the variance can further be elaborated with the following notation. Let F”, (r) = ZZexp(fl'Zglo)£Pig (o,u|zo )(Hh(u,I|ZO)-Pgh (u,t | Zo))dbg, (u) g=ll¢g V”, (t) = ZZexp(fl’Zglo)I;Bg (0,u|Zo)(P,h (u,t | Z0)—Pgh(u,t | 20))dugg, (u). g=lI¢g While Fm (t) is a non-random p-dimensional vector, V”, (t) is an example of an Ito process. Here U; (.) = (Ughj (.),h, jE {l,...,k}) is a kxk matrix-valued process, where Ugh}, = —2 Ugh]- and {Ugh}- (.),(h,j)€ {1....,k}} is a continuous Gaussian vector jack 73 martingale, properties of this process are described in Section 1.3. It follows that V”, is Gaussian and by Fubini’s theorem E (Vih ) = 0. Let us consider the three terms of P(t): P= fie"‘c..oP.(0.sIzo>dA..-+ + Eta-"cm. (s I 20 )Uih (0,5 I 20 )dAhj (5 I Z0 ) + + Iée'P‘chj (5 | 203,, (0,5 | zo)dU,','j (5 | 20) = [+11 + 111. Since the first term Era-”ch10 (5 | Z0 )1}, (0,5 I Z0 )dAhj (5 I 20) is uncorrelated with the other two we calculate its variance separately. Indeed Var(!) =var(I(:e"-‘c,,j0(s|z0 )Ph (o,s|zo)d.c1,,j(s|Z0 )) = =Vnr(.g1 Iée"~‘x,;j0(s)Ph(o,s|zo)dA,,j(s|zo)) For easiness of notation let Tlhj (t) = I: (”XII-0 (5)13,l (0,5 I ZO )dAhj (s | ZO ). Then Var(I) = Var(élTW (t)) = T1,,j(t)Var(§l)Tl’hj-(t)= Tlhj (t)A;1BWA;lTl’,,j (t) The second and third term are correlated and we look at them together Var(Il+III) =Var( fire—"chmIzox/ih(o,s|zo)dA,,j(s|zo)+ + Ee—"‘chj (5 | 20 )P,,, (0,5 | 20 )dug} (s | 20 )). Some algebraic calculations lead to ' —rs Var(II + [11) = Var( Ioe Chj (s I ZO)(U”,I (0,5 I Z0)+U2,~h (0,5 I 20 ))dAhj (5 I Zo)+ ' -r5 , .. “' Le Chj (S I 20 )3), (0.5 I Zo)exp(fl Zhj0)dUhj (s | 20)) = 74 =Var(exp(fl'Zth ) Ila-”ch, (5 | z() gar... (S)dA,U-o (s I 20 ) + +exp(fl’Zth)Ee-rschj(sIZO)Vih(S)dAhj0(SIZO)+ +exp(,6'Zhj0 ) Ere-”City (5 I 20 )Pih (0,5 I 20 )dUghj (S I 20 ) + +CXP(16’Zhjo).I-(:e_rschj (SI 20),)»: (0,5 I 20):“th (5)) = = Var(exp(fl’ZhjO )gg I: emchj- (5 I 20 )I 3;, (0. S I 20 )dbhj (S) “I“ Fih (SIdAhjo (S I 20 )I + +CXP(,B’Zhjo) Le—rschj (5 I 20 ){ Pih (0,5 I Zo )dUghj (5 I 20 ) +Vih (3)5“th (3 I 20 II- The last two terms are independent, and therefore Var(II + III): Var(exp(fl'Z,U-0)§6 fie'rschjUIZO){P,-h(0,5IZo)dbhj(5)+Eh(5)dAhjo(5IZ0)}+ +Var(exp(,6"Z,,j-0 ) £6""chj (5 I 20 ){ 8,, (O, 5 I Z0 )dUghj (5 I Z0 ) +Vih (SM/1MO (5 I Z0 )}). The first variance does not pose a challenge: Var(exp(/5”z,,jo)§;, Igemchj (s|20){P,.,, (0,5IZo)db,,j(5)+F,-h(5)dAhjo(5 | 20)} = = Varese.) (t)) = T...) (t)’z(r,fl)" T2,.) (t). where T2,,j (t) = exp(,6"Z,U-o) Eeqschj (5 I 20 ){Pih (0,5 I Z0 )dbhj (5) +Fl-h (5)dA,y~o (5 I Z0)}. The second term J(t) =exp(fl'Zh]-0) Ige"~‘c,,j (5 | zone, (0,5 l zo)dU;,,j (5 | zo)+v,.,, (5)dAhjo(5 | zo)})(1.27) 75 is an Ito process (here the dependence on i, h, j and Z0 has been suppressed) and it follows that J (I) is Gaussian. The first term in (1.27) is the integral of a deterministic function with respect to a Brownian motion, and hence the mean is zero. The same result applies to the second term, as seen by Fubini’s theorem. Also since C h}- (.) are bounded, non-negative processes over?’ and 8,, (0,5 I 20) are probabilities (<1),P{exp(fl'Zhj0) I:( Ige'” Ic,,j(5IZ0)1'3h(0,5IZ0)ds)2 £ (O,u|Z0)(I=;h(u,t|Z0)-Pgh(u,tlZo))° g= -ll:tg Myo(u) '(Bh (“’SIZOI—Pgh (u,SIZo))m and therefore we get —rr ‘ —r.i‘ 1532 = Ee ch}- (I | zo)dA,,jO(t).(Ioe ch]. (5| 20m”, (s,t)dAhj-0(5)). Finally evaluate EBl by noting that the processes U '2' M , U ; g, are independent for all (g, l) at (h, j). We get a single term E3] = exp(fl’zhj0)Ee-nChj (I I zoIdAthUI' r -.. (3) (Ice chj(5IZ0)B,3(0,5IZ0){PJ-h(5,tIZO)-Phh(5,II20)} 534—; )I u Therefore Var(II + III) = TZhj (t)'£(2’,,6)'1T2hj (t)+ . -n ‘ (I) +exp(2,BZhjo) 2EBI+2EBZ+I:{e chj(tIZO)P,-,,(O,IIZO)}2 3374—; u) The proof of the theorem is now complete. 77 CHAPTER 2 ESTIMATING MEAN COST FOR THE TWO-STATE CASE With the rapid escalation of costs of medical treatment, interest in accurately determining the cost of medical care has increased. Estimates from cost studies are needed to determine the economic burden of disease, to predict the economic consequences of new medical interventions, and for comparative purposes such as cost- effectiveness analyses. Health care studies are typically designed with a period of recruitment in which patients enter the study, and an additional period of follow up in which health outcomes are recorded. At study termination however, some patients would have not reached their end-point of interest which leads to right censoring of their time-to-event response. The medical cost associated with the follow up period in these patients is also right censored in the sense that the total cost at the time of censoring is less than the cost that would have accrued if their follow up continued until their end—point was reached. Due to this analogy with censoring and event times, it is tempting to use standard techniques from survival analysis for the analysis of medical costs. The survival analysis approach to costs seems also appealing because of its simplicity, its nonparametric nature and its apparent 78 robustness in the presence of censoring. However, although the approach is apparently free from distributional assumptions, it is not entirely assumption-free. Some of the assumptions that underlie such analyses, for example, the independence of event and censoring times, are not tenable with censored costs4‘ 30‘ 59. Generally, cost at the time of censoring and cost at the event time are correlated. A simple sample average of the observed costs in the patient sample would underestimate the true expected medical cost for the treatment under study, and using the average in the sub-sample of patients with complete costs would be inefficient. The current methods for estimation of the population mean cost are both nonparametric and semi-parametric. Key references are Lin et al (1997, 2000, 2003)4'5'30, Bang & Tsiatis (2000)“, Strawderman (2000)”,Wi11an et al (2002, 2003); 3, Huang & Louis (1998)“), Arijas & Haara (1984)56 and Wooldridge (2003)”. 2.1 Introduction and Background Suppose costs can potentially be accrued over a fixed time period [0, 2'] with expenditure terminating at some event time T so that complete cost observation occurs if a patient is followed through time T' = min(T,2'). Suppose y(t) is a right-continuous process that represents the cumulative cost up to time 1 (including time t) for a typical patient in the population under study. If lifetime cost is of interest then T denotes survival time. Since costs do not accumulate after T, y(t) = y(T) for all t 2 T. The cumulative cost y(z') at time 2' is the principle random variable of interest, so inference will focus 79 on the mean cost, ,u = E ( y(T)) = E ( y(T' )). With lifetime medical cost, the cumulative cost is y(T) and estimating the average E ( y(T)) is of interest. Censoring at time U might preclude observation of T‘. Let X = min(T,U ) , 6: [T S U], X. = min(T*,U) and (5‘ = [T‘ S U], where [A] denotes the indicator function of the displayed event A. The observed cost y = y(T') is uncensored if, and only if 5' = 1, and the event occurring at Tis observed if, and only if 5 =1. We assume {(7}. ,U, , yi = yi (T: )); i =1,...,n} are independent identically distributed copies of (T’ ,U , y), with the observed data given by (X; = n1in(7}*,U,-),§: = {T} s U,],y, ,i =1,...,n}. We assume U is independent of (y(- ), T). Then PM" =1| y,T] = G(T" ) , where G(t) = P[U 2 t] . For i =1,...,n the variables corresponding to the ith subject are indexed by the subscript i. Hereafter, unless otherwise specified, the distributions of the event time T and censoring time U are considered continuous. In estimating E ( y(T‘ )) the available data could be minimal, in the sense that only {X f ,6: , y,- , i =1,...,n} is recorded. However when patients are followed overtime, we may have costs observed in multiple intervals, that is the cost history is available. Lin et al (1997)30 proposed two different estimators of y = E ( y(T)) , for these two cases. Both estimators are proved to be consistent and asymptotically normal under certain conditions. Suppose that the interval [0,2’) is divided into G intervals 80 [0,7) = U[a,,_1 ,a,) with 0 = a0 < a1 < < (16 = 1'. If we use intervals closed at the left, 3 so that the gth interval is [a,,_l ,ag ), to be consistent with Lin et al (1997, 2000) and Willan et a1 (2002, 2003)“ 3" the survivor function is defined as S(t) = P(T 2 t). Keeping with the more classical definition, S (t) = P(T > t) , we would need to partition (0,2'] as U(a,,__l ,ag ]. Become S continuous, we need not be concerned with this s distinction. In both methods, either with minimal cost data or using cost histories, the Kaplan-Meier estimator of S is estimated based on {(X, , 6,- ), i =1,..., n}. Lin’s first estimator refers to the case where we observe the cumulative cost at time X ,- and at each of the observation points a,,_1 S X ,. Let Ay,g = y,(ag )— y,- (a,_,) be the incremental cost for the ith patient in interval [a a, ). Since the ith patient may be 3-1’ censored during the gth interval we define ~ y.(ag)—y,(a,_,) if X,>a, .Vi " , g yi(XiI-yi(ag_1) If ag_,.<_X,- (2.4) for any tS X0”: max{ X,- :i=1,...,n} . In (2.3) w, is then 6,- /G(7}). They show that [13, is unbiased, consistent and asymptotically normal and the variance can be estimated by: 6.0.- waif, dNC V ,u V2 "-5-: 6m 3157—,“ (y u)- (yu)} 83 1 " 6,-y,[T-Zu] I n§(u) i=1 G(T-i I where V(y,u) = Lin (2000)4 considers a regression setup )2,- = z, ,6 + 8,- , where z, is a 1x p vector of covariates. He estimates ,8 using a weighted least-squares method and proposes two different estimators, for the cases of minimal cost data and interval cost data. When no I: II covariates are present, his estimator reduces to (Z w,- y, )/(2 w, ) , where the weights i=1 i=1 are w, = 6," / G(Y?) for all i =1,...,n . For minimal cost data these weights are estimated by vi», = 6: / G(T,m ), where G is an appropriate estimator of G and the mean estimated cost is , " 6Ty.(T.“) " 6’ fl =( J—A‘I‘TI—IK '7—'.—) (2.5) "3 2i G(T.) EGO.) For the interval cost data consider a model yig = ziflg ‘I' gig (2-6) for each of the G intervals, where y,,, denotes the cost incurred over the time interval [a,,_l ,ag ), ,6, , g =1,...,G are pxl vectors of unknown regression parameters, and the error terms 8,, are assumed to be independent among different subjects but allowed to be correlated within the same subject. The initial cost is supposed to be null. Two different censoring types might arise: A. Time censoring: 6; =[7T S U, ] = 0 if T," > U,, and B. Cost censoring: 6,; =IT}; SU,]=0 if T,; >U,. 84 t where 7”,, =7} Aug“. By summing both sides of (2.6) over g we obtain y, = 2,6 + u, , where l G . G G y, =Zyig ’ fl=zflg ’ "i :zuig‘ gzl 8:] = Lin’s estimator is then (2.7) A G n igyig) #L4 = 5211920 i=1 G(l; ;)I We compare (2.3) and (2.5). If T < r as. then X,- = X: and 6,- = 6: . Assuming no ties in the data set and G derived as in (2.4), we have :w, =li6,/G(X,)=—Z6, /G(T,)=1-S(T,,,)) (2.8) i=1 i=1 "i=1 alt-r where T,,,, is the largest uncensored observation. To justify (2.8) we define5(t)=]'[(1- y(())), whereN(u)= 2m, <14, 6,. =1], Y(t)= Z[X, 2:] from MS! I=I i=1 where we deduce/330) = _§(,-) Ag?) . Also if no ties among failure and censoring times, S(t-)G(t—)=n"Y(t). Ifnot ties among failure times . = A9,“) . nG(t-—) nG(t-) = S(t—) AN“) 2 -AS(I) , and the equality (2.8) follows. If, in addition, nG(t—) Y(t) therefore 5(T,,, ) = 0 then (2.3) and (2.5) coincide. 85 2.2 Applying general transition model methods to the 2-state model We will prove that our transition model described in Chapter 1 also captures costs under the simpler two state survival model with a single transition time and sojourn. In this case, as described in Section 2.1, several investigators have developed techniques for regression analysis of medical costs with the focus being on estimation in the presence of time censoring that might result in incomplete costs data on some patients. Our transition model can be therefore viewed as an extension of this methodology to multiple transition times and sojoums. We still use the same inverse-probability of censoring-weighted (IPCW) technique to derive consistent and asymptotically normal estimators of regression parameters and for the net present values. Let us specialize our multiple transition model methods from Chapter 1 to a two state model with patients starting in state ‘0’ (alive) and followed until they reach a terminal state ‘1’ (death) at time T. The total cost for a patient i can be interpreted as a sojourn cost that ends at time T or 2' whichever occurs first or as a transition cost at time T if the patient dies in the interval of time [0,2']. While in the first case we will be able to estimate E [ y(T‘ )] , in the second case we estimate E [ y(T)[T S 7]). Throughout this chapter we assume that both S and G, the survival distributions for time and censoring time are continuous, unless otherwise specified. Suppose z,- is a baseline covariate vector, containing time-constant factors such as age at entry, gender, baseline comorbidity. The covariate vector 2,- will be used for the estimation of weights w,- = 5,- / G(T, — I z,- ). We 86 also introduce the vector x,- which typically contains time-constant factors, including 2,, variables for modeling time such as - .2 and interactions between time and time- I’ I constant variables. Let Z0 be a fixed profile covariate. Since we only have two states, we will denote the net present value NPV,,(I) (t I Z0) as NPV“) (t I Z0) and NPV“) (Z0) = NPV,,(,1)(rI Z0 ). Throughout the rest of this chapter we assumeithe discount rate r = 0. 2.2.1 Mean cost in the minimal cost data case NPV for transition costs Since the only transition is O —> 1 , the NPV for transition costs at time t, is NPV“’(t|ZO)= Igcm(5lZ0)POO(O,5IZ0)a0,(5IZO)d5, (2.9) where col (5 I Z0) = E(Cm (5) I X(5—) = O,Z0) and P00(OJIZoI'-'P(7i >IIZoI=SUI20I~ The quantity C 0, (t) was described in Chapter 1, and represents the amount incurred just after time t if the patient dies at time t. Note that we use now 50 I = P (T > t) consistent with our notations in Chapter 1. Since —dS (t I Z0 ) = = S(t— I Z0 )(201 (t I ZO ) , (2.9) reduces to NPV“)(ZO)=—Ec0,(s|Z0)dS(t|Z0) (2.10) 87 In the two-state model a cost y,- = y(T,) is incurred in the transition from the initial state “0” at time 1 =0 to state “1” at time 7}, provided that 7} S r. The probability of still being in the initial state just prior to time t is P[T 2 t] = S (t-) , and the (conditional) probability of transition to state “1” in (1,1 +dt) is dA(t) = a(t)dt , where a(t)is the hazard function. This differs slightly from the set up described in section 2.1, where costs are incurred through time T A 2' (and not at time T). Naturally, while in Lin’s set—up (Section 2.1) the censoring indicator for cost is 6: = [7? S U ,- ] , in our set-up y,- is observed provided 5, =1 wheres, = [7} SU, A 2'], that is, if the transition time 7} occurs by time 7 and is not censored by time U,. Assuming U, is independent of ( y,, 7}) conditional on 2,- we get P(Si =1I yriTiaZ.)=P(Tr sUi ATI Yi’TI’zi)=[7i ST]G(T}—Iz,). In a regression set-up, let y, be the transition costs in the ith patient at the transition time 7}. Consider now a regression model y, =x,,B+£, (2.11) where x,- is a 1x p vector of covariates as described at the beginning of the section. Our model (2.11) differs from Lin’s model (2000)4 by the fact that we include in (2.11) time- varying covariates. This is critical as we shall see later. We assume: A1: E(e,|T,,z,)=0,l§(e,2|'r,,z,)=o,i2 (2.12) A2: rank E(xgx, ) = p (2.13) Since E(s, Iz,,y,-,x,-)=P[U, 27},T, SrIz,,y,,x,-]=P[U, 2T,- Iz,] providedT} S 2', we define weights w,- = 5,- / G(T, — I z,) where G is the survival 88 distribution of the censoring time U, . We assume that G(z' I 2,) > 0 with probability 1, to ensure w, > 0 whenever 5,=1. Hence E(w, |z,,y,,x,-)=[1} S 2']. Now ( -, w. , x.) = Via—214% . - X- )2 , and minimizing the objective function q yI I I 6‘ I yl I II qu(y,- , w, , x,) with respect to I3 yields the estimator n i=1 ,6, = (Zw,x,'x, )'1 (Zw,x,'-y,) (2.14) i=1 i=1 We estimate 001 (t | Z0) by x0 (t) 6W, where 110 (t) denotes the covariate profile at time t in this model containing t,t2 and Z0 , and (2.10) by NPV‘”(Z.>=6:. [atom—$012.». Remark 2.2.1.1 1. If covariates depending on time are not included in the regression model (2.11), (2.14) yields the same estimator as Lin (2000)4, except for a slight difference in weights as noted above. 2. If no covariates are present, the regression model (2.11) has only one intercept I! (and p=l ). If all costs are observed before r , this intercept is estimated by 57 = n"1 2 y, . 1:] From (2.14) we get ,6”, = (2w, )’I (Z w,y,) and so i=1 i=1 6.,—f’—>(1—S(r»"‘6(y.iT.- 5 til. 89 NPV“) ——’;—) E(y, [1} S 2]) . Hence if T S 2' as. the natural estimator of NPV”) is simply y. In practice the costs of some subjects would be censored. Then it is natural to estimate (2.10) by fiwr =-I:5m(t)d§(t) (2.15) where S (t) is the Kaplan-Meier estimator of S (I) based on the data { (X, ,6, ): i =1,...,n } and 601(2) is an appropriate estimator of c0, (1) which we will awn Y(t) dN (5)i define later. Note that dS(t)=-S(t-)dA(t)= -5(t— ) sthe and A(t)= Jim usual Nelson—Aalen estimator of the integrated hazard function (Andersen et al (1993)”). Since costs are realized only at times 7} we estimate 601(1) at time t: T, by 37(7} ), the average costs observed at 7,. Substitution in (2.15) gives =26.y(T.-) —Y(—S,,',))1T. [I(1——Y——,,) Y0) u fiwrz- Since our concern is with S (t) and G(t) for I E [0, 2') we will show (under some assumptions) that sup I n'lY(t) - S(t-)G(t-) I—> 0 in probability. I 0 in probability, provided H(2'-) > 0, where H(t) = S(t)G(t). I<1' Under this same condition the Kaplan—Meier estimators S and G are uniformly 91 consistent on [0,1']. Hence the assertion follows from the inequality |n'1Y(t)—§(t-)G(t-)|S ln'1Y(t)—H(t—)| +|§(t-—)—S(t—)|+|(§(t—)—G(t-)|.u Remark 2.2.1.2 We estimate col (3 | Z0) from a weighted regression model using weights w, = s, / G(Y} — | z, ). There are several avenues for estimating the survival distribution for censoring time, G, we summarize some ideas in the following table: Table 2.1 Estimating the survival distribution for censoring time Model Comments 1- Nonparametric Use Ci , the Kaplan-Meier estimator based on the data {(X, ,6,):1’: l,...,n} , where6, =1—6, . Then the weights can be estimated as w, = s, /é(T,) or w, = s,§(r, —)/nY(T, ). 2. Semi-parametric Estimate G(T, — | z,) from a Cox proportional hazards model 3. Parametric Assume G has a parametric form G(t,6) = P[U > t | 0] . We assume that the functional form of G is known except for an unknown q-dimensional parameter 0. In model 1 choosing to estimate G(t) = P[U > t] by G” (t) for all t S 2' , where (T is the Kaplan—Meier estimator from the data {( X f ,1 — 6,." ) : i = 1,..., n} will not produce changes 92 in the formula for the weights since G” (t) = (3(2) for all t S 2'. We will come back to the problem of estimating the weights later on in this chapter. NPV for sojourn costs The NPV for sojourn costs in state O=’Alive’ is NPV‘2)(ZO)= £b0(t|ZO)P00(O,t—|Z0)dt. The quantity bO (I | Z0) is the expected mean rate of expenditure at time I while sojourning in state O=’Alive’. In practice it will not be observable unless discrete information is available. Instead we will only know the total cost of the sojourn. Since a sojourn ends at a transition time or at 2' , the total cost is observed if censoring has not occurred before time T" = min(T,2) . The set-up now is the same as described in Chapter 2.1 in the introduction and background section. The observed cost y = y(T‘) is uncensored if, and only if 6' = [T‘ S U] = l , and the event occurring at Tis observed if, and only if 6 = 1. In this case the weights are w: = 6: / G(Y,‘ — | 2,) and assuming that the censoring time is conditionally independent of survival time and cost we get E(W: ”ivyz'vzi)=1- The NPV of interest is 51200 | Z0)S(t— | 20m: = ES(t-|ZO)dm(t | 20), where m(t | Z0) = L: b0 (u | Z0 )du . By an integration-by-parts we get NPV(Z0) = E (m(T‘ ) | 20) . From a standard Cox model we obtain an estimator .§'(t|Zo)ofS(t|ZO). 93 Consider a regression model of the observed costs y, on time 7,. , noting that only observations for which 6,. =1 will be used. In addition to the observed (fixed time) covariates 2, our model would include terms for modeling T,~~ , for example We use a model y.- =X.-.6+£,~ (2.18) analogous to (2.11) and the same scheme to obtain the estimator ,6,, = (ZwSxfx, )'1 (z wfx,y,) of )6 as in (2.14). All of this is exactly the same except i=1 i=1 for the new weights w: . Now let x0 (1) denote the covariate profile at time tin this model and derive the estimator of m(r | 20) as 62(2 | Z0) = ,6; £20 (u)du , where the dot denotes differentiation with respect to time. This gives our NPV estimator as N13V(zo) = 3;, L'sa— | Z0)xo(t)dt. Remark 2.2.1.3 Suppose now that we include neither time or covariates in the regression model (2.18). The net present value is 'Cb(t)S(t-)dt = ES(t—)dm(t) where Sis the survival distribution of T, b(t) is the expected mean rate of expenditure at time I while sojourning in state 0 and m(t) = £b(s)ds. By an integration-by-parts we get ES(t—)dm(t) = j;m(t)dS(t) + S(2—)m(2'). Here we have assumed that m(0) = 0, if 94 this is not the case an initial cost should be added to the final computation of NPV. The estimation of m(t) , t< 2 and m(2) should be done differently in the following cases: a) patient dies before 2, i.e. T < 2 and b) patient does not die and is not censored before 2, i.e. T 2 2', U 2 2 . Our goal is to estimate E(y(T* )) = E(y(T)[T < 2]) + E(y(2)[T 2 2]). When viewed as a transition model we are only able to estimate the first part. If we look at the problem as a sojourn model we are able to estimate E ( y(T’ )) . Indeed, E(y(2“ )) = E( j: E(y(2‘)|2‘ = r)(—dS'(r))) since 2‘ s 2, where s‘ is the survival distribution for T‘. On the set I S 2, note that {I S T‘ } = {t S T} and S(t—) = S‘(t—). Also S‘ (t) = P[T~~ > t] = O for any 22. rand S‘ has a discontinuity point at2'. Let us denote m‘(:) = E(y(2‘)|2* =t). Then E(y= fim‘ox—dSYr»: E’m‘(r)(-dS=E(y(r>|T*=r>=E+S(r—)E I T 2 T) (2.19) Naturally for the estimation of the first term of the expression we use the weighted mean of all observations with X ,7 < 2 and 6,." =1 . For the estimation of the second term we use all observations with {X f = 2,6: =1} = {X , 2 2'}. If there are no 95 observations such that X ,* = 2,6: =1 then the estimation of the second term is not possible. Assuming that this is not the case here, we estimate the expression (2.19) as E(y =(1—s‘(r—>>/3'.. +S‘(r—>E(y(r)|22 r) (2.20) Z W?” where S (t) is the Kaplan- Meier estimate of S, 6,, = “___—Xi” and w, {i:X, <2} Z W32.- {i:,X= 2w." “:X, =2} E ( y(2) | T 2 2): where w: = 6: / CART: -) and 6 denote the Kaplan-Meier estimator of the G based on the data {(X,,6,) :i =1,...,n} . However assuming no ties among event and censoring times that are strictly smaller than 2, and no ties among event times strictly smaller than 2 , * 6" A It A at A Z w.= Z . '. = Z n_ MAN?) ”gm, {i:X,.=2} T 1_ T G __ ( Y(2 ) —) (T ) and 2w: =n(l—S(2-))+nS(2—)=n. (2.21) Expression (2.20) reduces to A E(y(T‘» =(1—§(r—»fl.. + Sir—woe? I T 2 r) = 96 2 W13? 2 W;Yi : (1 —§(T-)) {i:X, <2} ‘ + $(T—) {i:X, =2} .. : Z “’2‘ Z “G {i:X,°<2} {i:X,'=2} Z Wiyi Z w,y,- Z w;y, Z wi‘yi‘ . {i2X-'<2} ~ {12,221} {iSn} {iSn} =(1-S(r-))-—=-.——-+S(r-) U = = . - n(1-S(2—)) nS(2—) n 2 w, {iSn} Therefore if times and covariates are not included in (2.18) we can estimate the net 2 W:yi present value by Jig—"L:— which is exactly (2.5), i.e. Lin’s estimator (2OOO)4. Wt {iSn} 2.2.2 Mean cost in the interval cost data case Suppose that the interval [0, 2] is divided into G intervals with 0 = a0 < a1 < < a0 = 2 . Let T, U denote the survival time and censoring time respectively. Let S (t) = P(T > t) be the survival function for T. Patients are followed through time 2 , where 2 indicates the end of the study, i.e. all observation is ceased at 2 . Consider costs incurred in interval I g = [a ag ). If death or censoring preceded g-l ’ ag_l then the cost in the gth interval is either 0 or unknown. We observe a non-zero cost in the gth interval for the ith patient in two possible situations: (1) If the patient is observed throughout I3 = [a 0,, ) , then 8-1 ’ X, = min(Y, ,U, ) 2 ag. Let y,,, be the associated cost at time ag. 97 (2) If the patient dies in I, =[a 08), then a < X, +[( y. ’ —>/( ,=, 25,; Z 800:) Else?) )](5(a,_ ,>— S(a W2. 24) Throughout we have assumed that both S and G are continuous and S(t) = P(T > t) , G(t) = P(U > 2). If this was not the case here, then keeping all previous notations we must use left-hand limits S (t—) and G(t—) throughout. Alternatively we could define S(t) = P(T 2 t) and G(t) = P(U 2 t) , i.e. left continuous versions of the previously defined (right-continuous) survival distributions. When S and G are continuous this distinction is unnecessary, however when it comes to estimation, and S and G denote Kaplan-Meier estimates we need to make the distinction. If there is an initial cost at time 0 then it needs to be added to the final mean cost expression, however for now, as previously specified, we assume initial cost to be null. To ensure all weights are defined properly we assume G( 2) > 0. Using general properties of the Kaplan-Meier estimator S (t) of S (t) = P(T > t), AN(r) AS (I) = -S (t—) Y , and under the assumptions of no ties among failure times we I . —6AT- have AS (T, ) = $57-64. If we assume there are no ties among failure and censoring time then S (t—)G(r—) = Y (t)/ n. Both assumptions hold under the case that both S and G are continuous. Then 99 Z . 6:; zitag.l s2, .-. m1) ""'_:I(S(T ") S(T,))lag_1—T 0 since 5091' IX,,y,.T,-)= P(U,- >T— Ix,,y,,T,)/G(T -)=1 and under 1, E(w,x,78, ) = E(E(w,x;8, Ix, , y, )) = E(x,78,E(w, Ix, , y, )) = E(x,’-8, ) = 0. We need A1 and A2 for the consistency of the (unweighted) estimator = (Z CZXEX, )'1 (Z 6,x,'-y,- ). We have i=1 i=1 sz —=/2’) (—Zw.- ,x,)"<,;Zw.x,e.>—(A"+o (1))(—‘/1;ZW;X,’E;) 1- 1 By the ordinary CLT, —l—ZW,X,£, —>N(0, B), where B: E(w,2 X,X,~€,2 ). Using A1, n . the form of B can be reduced further. Since E (w,2 Ix,, y,,7,)= P(U >7}|x,,y,,T-)/G2(T,- -)=l/G(T, —), itfollowsthatB= E(w,.2 x;,,xe )= E(E(w,2 Ix, , y, ,2, )xgx,e,?) = E((l/G(T, -))x;x,e,?) = E((l/G(T, —))x,7x,E(8,2 Ix, )) = xx! 0'2E(——-— ). Therefore G(T- -) J; ([31. —fl)—1—>N(0.A"BA" ). whereA= E(x;,=x)andB o§E(——— xx ——.) G(T, —) Then 5(z) = x0 (03,, and for a fixed I, JZ(5(:) —c(t)) = x0 (ms/R3,, - 13)) . It follows that: \/—(8(t)— c(t))= x0(z)A"(J—l_-zw,x;e,)+op (1) (2.27) Wewilldenote v0(t)= x0(z)E(x"‘x) so J—(é(t)— c—(I))=v0(t)(—‘/1;Zw,x,78,)+op (1) 103 We now turn to the estimation of the survival function S (I) = P(T > I). Let § (t) denote the Kaplan-Meier estimator of based on the data { (X , , 5, ) : i =1,..., n }. Note that d§(t) : —§(t_)d/i(t) : “SA(I—)d)1/V:t) ,where N(t)=Z:__l[7} 52,5, =1], J(s)dN(s) Y(s) is the usual Nelson-Aalen estimator of the Y(t)=Z;l[X, 2t]and [19):]; integrated hazard function (Andersen et al (1993)”), where J (s) = [Y (s) > 0]. We will assume both S(t) and G(t) = P(U > t) are continuous and 5(7) > 0. Notations: 7Z’(I) = P(X 21) = H(t) = S(t)G(t) 0'2 (I) = £610!) du 7r(u) Model Assumptions (Andersen et al (1993), pl9025): ABGKl: For each 56 [0,1], nEi—ES-i-OIUMs—LJZU) as n-—)oo s ABGKZ: For all 8>0, nfiiififlf“? |> €]a(s)ds———>O as n—)oo ABGKS: fiE(1-J(s))a(s)ds—P)0 as n—>°° Under conditions ABGK1-3, following steps in theorems IV.1.1, IV.1.2(Andersen et al (1993), p190”) and our proofs in Chapter 1, Section 1.3, dM(s)_ 1 ”_ingi“) JZ(A(z)-A(t)) is asymptotically eqmva'emm J1;£_ 75(3) z(s) ’ where M(t)=iM,(r), M,(t)=N,-(t)— £14(t)dA(s).Therefore i=1 104 Then JZ(A(:)— Am) is EdM,(s) = 6,[X, St]__J~(:Y,-(s)d:‘1(s).Let c720): I'd/1(3). ”(s) ”(X,) 72(s) 0m ) asymptotically equivalent to 6,.[X, t)= = £a4(u)(-d77(u))+0'4(t)7r(t)= £fl(u)202(u)d02(u)=2 £02(u)a(u)du The first diagonal component is: _ _ §i[XiSI]_ 2 _ 2 (S‘i[XiSt]_ 2 2 _ Z“(t)-—Var( S(t) ”(Xi) 0' (X,/\I)))—S (t)E(( ”(Xi) a (X,Ai)) )_ 6,[X, St]_26,[X,_ R. ¢dy(h.k)= fixdk+ [,htswyts) (2.31) where the integral with respect to k is defined by the integration by parts formula if k is not of finite variation. Identifying x(t)with C(t) , and y(t) with S(t) , since c(t),S(t)e D[O,z'], El dS I: l— S (2') < C =1 and under the extra assumption EAl: C(t) is of finite variation on [0,1], by (2.31) and functional delta method stated in the Appendix, 107 JZl (aux-£0» (roux—(15(0)): =,/;(¢(5,—§)(r)—¢(c,-S)(r))——->D d¢(ct_S)(ZI’ZZ)(T) ,{x where z, (t) = a0 (t)§, a3 (t) = v0 (magma—£79120 (0'. v0 (t)=x0 (r)E(X’X)“. 22 (t) = —S(I)U(t), U(t) is a gaussian martingale independent of i, which is a standard normal random variable. Also E(U (t)) = O and E(U2 (t)) = 0'2 (t). Using (2.31) we get fiifié(1)('d§(1))— £c(t)(-d5(t))} —D—> EC(S)d22(S)+ Ezmswsm. To compute Var( [O'c(s)dz,(s))=E(( jO’c(s)dz,(s))2 )= :E(( Ec(s)d(—S(5)U(s))2) we just simplify the integrand ficUSdA+ [gc2 0'2 (t), we get immediately Var( ch(s)d22 (5)) = E(c(s)S(s) - fe(u)S(u)dA(u))2d0'2 (s) .The second term j: Zl (s)dS (s) has variance (an (s)dS (5))2 and the cross product has mean zero, since U (I) and f are independent. Hence we have proved the next theorem: 108 Theorem 1 Under the assumptions Al-A2, EA] and ABGK1-3 we have s/Zt flew—£0»— Io'cttx—dsa») A N(0.{ E(cts)3(s)—K(s»2daz(s»)+< [O'aotsmsmf no (2.32) Remark 2.3.1 1. Suppose costs are not considered random, but fixed. Take C(t) = 1. Them/R Eétz>(-d§(t)>— Ec(t)(—dS(t))) = —~/Z(§(t)—S N(0.52(r)02(r». 2. Let cost be fixed and set C(t) = e’" . Then Ee'" (—dS(t)) = £e'”S(t)dA(t) is the actuarial value discussed in Andersen et al (1993), Page 28425. Then (2.32) captures the results stated on page 284, namely v/Z(£e"‘s‘(t)dri(t)— £6"’S(t)dA(t)) L) N(O,{ fi(e"‘S(s)- fe"“5(u)dA(u))ZdoZ(s)}). 109 3. If no fixed or time-varying covariates are included then the cost C(t) is " 5.- 2 G(T.) yi' constant on t and can be estimated as 6(t) = )60 = 1:1" ' for all IS 2'. Then i=1 G(Ti) ,1? = 30(1— S (7)) and from (2.32) the estimated asymptotic variance is 8025‘? (7)62 (r)+(l-S(T))2 var(,30 ). 2.3.1 Regression model for log-cost Suppose all costs in our data set are not null and consider the regression model with log cost as dependent variable: log( y, ) = x, fl+ E, (2.33) where we keep all our previous notations. In this section, we consider the case of an error term with unknown distribution under homoscedasticity. More precisely we assume: A3: 8, are i.i.d. with 8,- : mm (the function s(-) allows for heteroscedasticity conditional on x and ‘a' is a vector of unknown parameters) A4: v, is independent of x, and has zero mean and unit variance. A5: rank E(xfx, ) = p A6: E(egx) <00 110 Our focus is the estimation of C(t) = c010 120) as in the previous section. In view of (2.33) we identify C(t) with E(y, Ixo (t)) = e"°(')‘6E(e£ ) , where x0 (I) is a fixed covariate profile generated from 20. Following Ai and Norton (2000)2°estimation of C(t) is accomplished by estimating )6 in (2.33) and the smearing factor a = E (es ) . Define p(t,6,a)=e‘°"’fla and identify c(t) with (0(I,,B,a). Then E(t)=¢(t,B,6) and £1 = 12:1?" and ,3 is an estimator of ,6 which will be obtained later. Here n '— é, =log(y,)—x,fl. Then JZ(5(t)-c(t))=v/Z(e"o“’36-e*°<”fla). By Taylor expansion, retaining only the first order terms, we get: Mam-cm)=s/Z(¢(t,fl‘.a)—¢(t.fl.a)) ~ a(o(t fl 0) am: I3 a)“, =f<——-— as (B— fl)+——— a -a)) =~/i1-(C(I)Xo (t)(3-fl)+e"°"’fi(a-a»= =c(t)x0(t)JZ(B—,6)+e‘°"’/’JZ(6-a). (2.34) Consider first the estimation of ,6 in (2.33). We follow exactly the same steps as in the previous section 2.3 using 77, = log( y,) instead of y, . Then A n n (S ,5 = (Z w,x;x, )'1 (Z w,x,777, ), where w, = CUT—f Therefore s/Ztfl- fl>= (— ,1,wa x;x.-) (TZW,x;e,-) (2.35) From A5, 1: w,x;x, —P> E(wx’x) = E(x'x) = A. From A3/A4, E(w,x,£, ) = n . lll = E(E(w,x;e, Ix, , y, )) = E(xge,E(w, Ix, , y, )) = E(xfe, ) =—_ E(xgv, ,/s(x, ,6)) = = E(xf,/s(x, ,a)E(v, Ix, )) = O. Hence71_-Zw,x,8,— — —0p (1) . Then (2.35) yields n JZ(,8— 6): A (— jzzaz,x;e,)+op (1) - (2.36) @151 Next, consider the estimation of a byc’i . Then fi(&-a)=71—;2:,(e -e )+J—2:;, (8 E(e )) Use ef" -e£' = e51 (6.3-5, - 1) = e2 (exp((77, - x,,5’) — (7],- —x,,B)) —1) = e" exp(—x,- (8—6) —1) to obtain 712'2:=1(eéi —-e£l’ )=_(;1,'Z;,€E’Xi )J;(,3—,B)+op (1). By A6 we get fZLWE‘ “65" )=-E(e‘x)s/r_2(B -fl)+o,, (1) (2.37) n Using formulas (2.34), (2.36) and (2.37) we obtain Mam-cm)=cxo(tis/Ztfl—fl>+e"°"WZ(a-a) = =c(t)xo(t)(A"(——l—ZW,-x,-8,-)+0p (1))+ JZH +exoit)fl(_E(eex)(A—t(_jjzw,x,£,Hf: (e E(e ))+o,, (1)))+o,, (1)): =(c(t)xo(t)-e"o‘”flE(e‘x))A {-‘/l_—:w,x,7,:/l;—£}+ex°('m{ 2(e -E(e‘t)))+op(1). 112 If we denote 29,(t)=(c(t)x0(t)—e"°(')flE(e€x))A-l and 15*2(:)=e"°"“3 then JZ(5(t)—c(:))=29,(:){— wae +192(t){%2(e£‘-E(e£‘))}+op(l). n i=1 If s(x,a) = a (i.e. the error term does not depend on x , so the error term is homoscedastic) as in Duan (1983)19 then following the same steps as in the previous . 6i[Xl St] 2 I E. 5. section 2,2 (t)=E[(————0 (X,- At)){l91(t)W,-X,-€,- +192 (t)(€' —E(e ‘ ))}]= ”(X,) _ [ZSt]_ 2 I _E[( ”(m a (T, At))§,(t)X,-E(8, |x,)]+ +E[([ i—t] 0207 A1))19(l)(€£‘“E(€E‘))l=0 since by A3/A4, E(E, Ix,- ) = E(v, J; | x,- ) = E(v, s/E) = O and E(eg‘ —E(e£" )lx,)= E(ev’fé |x,)—E(e€" )= E(ev’fi)—E(e€‘ )=O. The second diagonal component of szz(t) is 222 (t) = 29, (t)Var(w,-x,7£,- )l9,’(t) + 1922 (t)Var(e£“ )+ 2E[§, (t)w,x;€,l92 (I)(e€‘ - E(eE‘ D] = x'x =a§z9,(t)E(G ‘T'_ i ))z9,’(z)+af, :9; (t)+ 219, (:)E[x;e, (e5: —E(e€" ))1192 (2) wherea — -Var(e' ). Obviously the first diagonal component of 22 2(t) remains unchanged and the proof is now identical to the proof in the previous section. 113 2.4 Parametric estimation of the survival distribution for censoring time 2.4.1 Minimal cost data Suppose costs can potentially be accrued over a fixed time period [0, T] with expenditure terminating at some event time T so that complete cost observation occurs if a patient is followed through time T' = rnin(T,!) . Further accumulation of medical cost is not possible since there is no cost after death. Censoring at time U might preclude observation of T’ . Let X = min(T,U ) ,5 = [T S U], where [A] denoted the indicator function of the displayed event A. The observed cost y is uncensored if, and only if 6* =1, where 6’ = [T‘ S U], and the event occurring at Tis observed if, and only if 5 =1 . Suppose that we have a random sample of size n. The variables corresponding to the ith subject are indexed by the subscript i. Let {(X: ,6: ) : l S i S n} denote the random sample with X,‘ = min(T,‘,U, ). Consider a linear model for the cost y,- observed at T: in the ith subject given by y,- = x,,B + 6‘, where x,- is a vector of pxl covariates, E,- is an unobserved error term. In estimation of ,6 we use the data {( y, ,x, ),i = l,...,n} . However since cost will be 114 incomplete ifé’,’ = 0, we use the weighted sample {(y, ,x,- ,w, ),i =1,...,n} * where w,- = i Formally ,8 is obtained by minimizing (with respect to ,6 ). ‘ . 1' min 2 w,- ( y,- 4,3)2 (2.38) i=1 The objective function (2.38) weights each observation by the inverse-probability n n of being uncensored. This gives ,6,, = (Z wixgx, )‘1 (Z w,xfy, ). We also assume i=l i=1 censoring to be non-informative in the sense that given fixed time covariates, U is independent of cost and event times. Note that under this assumption E ( w, Ix, ,T, ) = P(d,‘ =1|x,,T,)/G(T,"—) =1, where G(t) = P(U >2). The weights w, are unspecified since G is unknown, however in order to be able to use A, we need to have a suitable estimator for G. So far we have suggested its estimation from a Cox proportional hazards model or non-parametric model. Assume that the distribution, G, of the censoring time U, has a parametric form. We assume that the functional form of G is known except for an unknown q—dimensional parametera Then P[é“ =1| y,T]= G(T‘ —,6), where G(t,6) = P[U > t | 6] . Let g(.,6) andfbe the density functions of U and Trespectively, and let S (t) = P(T > t) be the survival function of the event time T. The estimation of 6 will be accomplished via maximum likelihood. In order to write down the appropriate likelihood, we must be precise about what is actually observed. 115 1. Observation scheme I. We only observe X = min(T,U) andé‘ = [T S U]. This is the usual random censorship model. The contribution of a single observation, (X , ,6,.) to the likelihood is then ((6) = {G(T. ,6)f(T,-)}“" {S(U.)g U . It is also possible that neither Tnor U are observed, in which case T >7 and U > r. The likelihood would now consist of three parts: (1) {G(T)f(T)}§[T-"“Z” The contribution of a single observation to the likelihood is then I.- (6) = {G(T, .6)f(T,- )}‘Z‘T‘q‘mw, )g(U,. ,6)}(1'6‘Wiql{G(r,6)S(z')}[TZ"U2” 116 The relevant part of the log-likelihood for estimation of 6is now 11/(6) = [T < T]5W+[U < r](l—§)W+[T 2 r,U 2 HM G(T,6) g(U,6) G(r,6) (2.40) Combining the first and third terms, [T < r]6———VBG(T’6) +[T 2 r U 2 2']——————VQG(T’6) = G(T,6) G(z',6) z—VfiGflT ’6){[T_T‘]=6‘———V90(,T ’6). G(T ,6) G(T ,6) Therefore (2.40) can be re-written using (X * ,6‘) as: t . V,,c;(r“,e)+(1 6 )V9g(U,6) 6 =6 W ) G(T ,6) g(U,6) (2.41) We will focus on estimation of 6under the observation scheme H. Estimation of 9 Suppose (9 is a compact set in R" and assume that .60 is the unique solution of the problem gag Q0 (6) = 123g E(l(6)) (2.42) Under general conditions, the sample analogue of the expression (2.42), Q" (6) = 121,. (6) which we denote by 6, is consistent and asymptotically normal , see i=1 Newey, McFadden (1994), Theorems 2.5 & 3.362 and Wooldridge (2002, 2003)” 3“. Hereafter 60 is the true underlined parameter and all convergences are under 60. 117 Based upon (2.41), an estimator 6 of 60 is obtained as a solution to Z w, (6) = 0. Now expand 2w,- (6) at the true parameter 60 to get 0 = 2w,- (6‘?) {an (60H ngw, (6")(é—90) (2.43) where6 lies between 6 and 60 and Val/,- denotes the qxq matrix of the derivative (with respect to m of 1/1, (6). Then under standard assumptions n“ 2%”,- (6)——P——>E[VHI/I, (6)]. From (2.43) we get Jae—90) = J" (60 )(—‘/!-_-Zy/,(6O ))+0p(1) (2.44) '1 i where J (60 ) = —E(V01//, (6O )). To verify that (2.44) would give the asymptotic distribution of 6 we will now check that E (1,11, (6)) = 0. Examine each term in (2.41): VQG(T‘,6)+ 1_6.)V9g(U,6) E 6 =EJ‘ [ti/()1 [ T,6) g(U,6) ]= ——V"G(T ’6) [T A 2' s U]] + E[——V”g(U’6) =5 [ G(T',6) g(U,6) [TAT>U]] VgG(T*,6) G(T ,6) The first term is E[ [TATSU]]= ———V"G(T’0) [T s 211T s U]]+ Ely—999%TS Tl” 5 U“: = E [ G(T,6) G(T,6) = E[VgG(T,6)[T s T]]+VgG(2',6)S(T) = = —V,, EG(u,6)dS(u)+VoG(2',6)S(z'). (2.45) The second term is 118 ““1 v g(U,6) v g(U,6) E———9 T U =E——9 = [ g(U,6) [ xr> ]] [ g(U.6) S(U)[T>U]] = 47,, ES(u)dG(u,6). (2.46) Combining (2.45) and (2.46) we will get [z(t)/(19)} = 47,, £G(u,6)dS(u)+V6G(r,6)S(r)-V6 ES(u)dG(u,6) = = —V9 G(u.6)$(u)|; +VgG(r,6)S(z') = 0. An application of the Central Limit Theorem to (2.44) gives 5(6—60)L>N(0.J“‘ (1510 )). where J" (6,,)E(1y(60)1//’(60))J‘l (60) = J" (60 )J(60 )J‘1 (190) = J“1 (190 ). We can replace 60 by the consistent estimator6, and use [9, = P[U, Z 7? 1T1] = G(T,‘,6) for the unknown p,- = P[U, 2 7? |7}] . For this adjusted estimator we have A n -l n _ A I A I flwp— Zwixixi Z 1'"th i=1 i=1 which is a two-step estimator, first step would be estimating6 , and then estimating ,6 in (2.38) after we have replaced w,- = (5': /P[U,- 2 7}!“ IT,] by 1?»,- = 6: /G(T,’,6). In other n words the two-step estimator solves the problem min 2 1'43,- (y, — x,- fl)2 . i=1 Therefore :,L—i tame] (2.47) n 119 Following closely Chapter 12.4 of Wooldridge (2001)“, we expand the second term on the right hand side of the equation above to get 6" X8, ” 5,- " 5 T;¢1f11_2_,___x§,£ _[nz____’_‘_€__v,,G(T (9’)]J—(é- 90) P.- n P1 1— 1(G(T 19))2 where 6 is between 6 and 60. Again use the standard arguments to claim * ~ " 5,7'x’e, pE[ 15,-3,78,. i=1 (G(T; 96)) (VGT,,6 , VGT,',6 ’= 2 19 ( ))—-> (G(T1190))2( 9( on] x'e, G(T i 1 60) Denote this limit by D(60 ), a KXq matrix. Combining these results we have the established the expansion " 5 5, _ _Lz ixigfl;_Z{——XE—D(6O)J1(60)1//,-(00)l+0p(1) 51:1 Pi n 1=1P1 Let k, = 6' x,€,= 5' X'E' .nThe G(T,- 190 ) P1 5* V G(T, 6 11,78,- G(T},6O) G(T},60) V,G(z' 60) G(r, 60) Taking the expectation yields, I V G(T,6) I VQG(7960) I E - 6 k- =E T-<2'-‘9——i—°—x-£.+T-Zt———x,,. (W;( O) 1) [[1 ] G(TI,60) (1 [1 ] G(T’Ho) V T’,6 , Observe that is the same as D’(60)=E ( 66“ 0))x,€, . G(T; ’60) 120 Then the asymptotic variance Vof Jr—z(,6wp — ,6) is v= (E(x1x1 )" )E(k.- _D(,90,,-1 (611w, (190))(11, -D(60)J" (60w,- (60))' (E(xIx. 1") =(E(x:x.)")1E(k.k;)+D(6o)J“(6015011,- (90)l//1'(90))J-l (60)D’(61,)(E(x:x.)“) -D(60)J”' (6011501.-(60)k:)-E(k1w,-’(6o))r‘ (6010160)}. Notice that E( 111,. (60 )(1/,.’(60 )) = 1(6)) which makes the second term above D(6,,)J“l (60)D’(60). So the variance V is v: (E(x;x.- 1“ >1E(k.-k:)—D<6o)r' (60)D’(6o)}(E(x:x,- )“ ). Under the assumption E(E, |x,-)=0, D’(90)=E (Vqu, ’60 )) x; G(T} .90) £,-]=0 and V=(E(x,7x,-)“1)E(k,-k;)(E(Xf!1I,-)_l ). In general the asymptotic variance of Jr;(,5’w,, -,B) is smaller when we use an estimate 6 for 60 in the expression of 18111,; as in (2.47). 2.4.2 Interval cost data Suppose that the interval [0,1) is divided into G intervals [0,T)=U[a,,_, ,6,) as g in Lin (2000), and Willan er al (2002, 2003)“ with 0 = a0 < a, < ...< ac = 2' . For the interval cost data first define 7“,; = 7,- A a and 5,; = [T,-; S U , ] , also denote 8 121 T,’ = 7,2; = T, A “a = 7, A 2'. For the ith patient we denote y,,, the cost in gth interval at time 7,; . We will have two different censoring types A. Time censoring: 5: =[7,* SU,]=O if T: > U,, and B. Cost censoring: 5,; =[7,; _<_U,]=O if 7",; >U,. Consider the model yig = xigfl+ “1' for all g =1,...,G or y, = X,,6+u, where Va “11 x,, y, = - ,u. = - and X, = - .Here y,, u, are le and X, is GXp. in “10 x10 The following are assumed to be true: AGl E(x,’u, ) =0 AGZ rank E(x,’x, ) = p The IPW-POLS estimator of ,6 (pxl vector) in the model above is given by n A n =(Z X,X,) ‘(XZ ,’.y,) (2.48) [1: i=1 where y, =S,P,’ly,, and R,., f1, are similarly defined. Here S, is the diagonal matrix with elements 5,; = [7,; S U, ] , g =1,...,G in the main diagonal (will allow costs to be zero in some intervals), and P- is the diagonal matrix with elements, , ’12,, in the main 1 diagonal, where p,,, = P[U, Z T,; IT, ]. Under the assumptions above we have P[(si; zllxig’yig’ji]:P[Ui271;I71]:pig' 122 Assume the distribution of the censoring time U, has a parametric form, P[U, > t] = p(t,6) , where the function p is known except for the unknown q- dimensional parameter 6. Then p,,, = pa}; -,6). Now replace 6 by a consistent estimator6, and use fi,,, = p(T,; —,6) in (2.48) for the unknown p,,, . For this adjusted estimator we have 3 G 57' . . JZ(fl.,,— ,6): 1—22— ‘8 11,, )_,,,,1J_ZZ_,,%,1_1_,,_) (2.49) is 1': lg: l pig n i=1 g=l Estimation of 6 The observable data on the censoring times is restricted to X ,i = rnin(T," ,U,) and 5," = [Ti S U, ]. Here 60 E 2' is the upper limit of observation. For a patient 1', {X, = 1’}={T, 2 T,U, 2 r} . This event has probability S(r)p(r,6) , assuming both S and p are continuous. Now consider our adopted panel framework in which the observed X ,1. falls into some interval [a,_, ,a, ), g =1,...,G. The part of the likelihood of ( X," ,5,‘) that is relevant for estimation of 6 has the form { p(X,‘ ,6)}‘5‘. {g (X ,1 ,6)}1’5‘. where g(t,6) is a density for U, . We will assume that p(aG ,6) > O and that 6 -—> g(-,6) fulfills all regularity conditions needed for maximum likelihood estimation of 6. Note that Xfe [a 6,,) and 5," =1 is equivalent to [U, >T T,,,][ag_ ,ST; aG, g-l— 123 define the indicator 1,, (t) = [t 2 a0 ] . Then the derivative with respect to 6of the aforementioned log-likelihood can be written {51101, T)——g—+ +(1- 6,,,)1(U)—9—-'—-+ g2: 8 8' p jg, 8(Ui’6) G +1/GI, T U, V‘5")(T’}'9) d,,(6). ( )G.(,A )— W6) 1:; A n G A The estimator 6 is a solution to 22d,, (6) =0. Consistency of 6 follows from 1:1 g=l the standard regularity conditions on the functions 6 —-) g(-,6) and 6 —) p(-,6) for maximum likelihood estimation of 6. In the sequel we will call 60 the true parameter. I: 0 Note that d,, (60) is a qxl vector. Using a Taylor expansion of 2 2d,, (6) at 60 i=1 g=1 yields A n G J;(6-60)=-(%22d., (6)) (71:22:41,160). i=1 g=1 '=l g=l where 6 is between 6 and the true 60. Also, 6,, (6) is the derivative of d,, (6) with respect to 6. Note that 6,, (6) is a qu matrix. Now use the standard arguments to claim G d,d,,(6)—)E(Z ' ,(60)) in probability, and 71.;22d,,(60)=0p(1). g=1 i=1 g=l 11 ENG alu— The last claim is simply a consequence of the central limit theorem. G . G G Now E(Zd,, (6O )) = —E((Zd,, (60 ))(Zd,, (60 ))) = 1(6O ). Hence we have =l g=l g=l 124 n G J'(6— 60): (1(6o))"(— $1.224,- ,(60))+0,, (1) n G 5,,x,,u,, We can expand the sum Z 2—7— to get i=1g=l pig L " Z5661 _ n i=1 g=l firg n G 6‘1 x! u! I " —ZZ-—— , , 8- 05221—267916 ,.6» 1616—60) ‘fginlgl pig rlgl(p(rg’6)) where 6 is between 6 and 60. Again use the standard arguments to claim * 1 n G aigxig u, I ";Z(6(T1;,67)) V9 TE Em? 1,6082» V” , ° in probability. Denote this limit by D(6O ), a qu matrix. Note that D(60) is also the ui I same as E(Z—(V,p( T,,,6O)) ). g= =T1P( ig’ 60 Combining these results we have the established the expansion 6‘ n G 6. . . __1'_“g g=11"”22{m—D(60)J’1(9o)dig (60)}+0,, (1): 1g i=1 8=1 pig 0P)l: ”2 iZl{X,S,P,'lP,_lu, - 19(90)J-l (60)d1 (60)j6}+ 0P (1) ’ where d, (60) = [d,, (60 ),...,d,G (6O )] is a qu matrix, and jG is a 6x1 vector of 1’s. Now apply the central limit theorem to show that it converges in distribution to a G- variate normal, mean vector zero and variance matrix V. 125 G The assumption AGl: E(Xi'ui ) = 0 means that E(Z x,-g uig ) = 0. If g=l E(xiguig ) = O or, more strongly, E(uig le-g ) = O for all g, assumption AGl holds. Without additional assumptions on uig we can not conclude that 0(60) = 0. Suppose we impose E(ul-g lxig,7})=0 for all g=l,...,G.Then * G aigxiguig * I G E(Z (Vgp(7;gs60)))=ZE(uig lxig’Ti') g=l Xig(V6P(Ti; 6w :0 gamut; 60W (P(T; ’90 ))2 Computing V G 5‘ xigu, Let k,- = X§SinlP,-"u,- = 24%. Then the aforementioned variance Vis g=1p(7;g ’60) I v: E(k, —-D(60)J" (60m, (60)j0)(k,. —D(60)J" (60)d,.(60)j6) = =E(k,.k; ) + 0(60 )J‘1 (60 )E(d,. (6O )jGj’Gdgwo ))J'1 (60 )D’(60 ) - -D(60)J" (60)E(d,. (6,,)J'c,.it;)—E(It,.j’Gd;(60))J‘l (60)D’(60) Notice that E(d, (60 )jGj’Gd; (6O )) = 1(60) which makes the second term above D(6’O)J" (6O )D’(6O ). G G 5.’ xf u. Now consider E(d, (60 )ij§ ) = E((Zdig (60))(Z-ij-fi». We will prove g=l g=l P ig ’ 0 . . G xiguig " I that this would reduce to D(60 ) = E(Z——T—(V9p(7}g ,60 )) ) so that Vcan be 8:] p jg $ 0 expressed as E(k,k;)— D(60 )J’1 (60 )D'(6o ), a similar formula to case 0:1. 126 Recall that T-, . V .,6 d,,,(60)= 6,81,,(T,) V2“ ' 6’0)+(1—§,,,)I,,(U,-) 68(U' 0)+ V ,6 +(1/G)IG+1 (T, AU,)—6—p—(:'—Q, p(TTHO) 25: 8x,gu,g and let k,g —— .To prove the equality p(T [8960) G G 6,?‘x;u,- 6v (T16) , E((Zd,,(6o))(Z—i—Tg-—£-))=E(Z 9” g x,,u,,) g=l g=1p ig ’60) g=l pT( ig ’ 60) we first prove that for any h =1,...,G we have ihxihuih __)_ E(V6P(T ih,90) X, E(( d, (6 >> E g 0 1203.26,) 1602.60) ihuih)' We prove the equality above for h=1. If h>1 the proof is the same. So, " (T, ,6) 6,.‘x’ u A: E(( d,- (6 ))____i1xiluil ___): _E({ [g (T ) Vflp 0 i1 il })+ gz=l g 0 P(7i1690){2=1§; ”(igo) P(T 11,60) V6g(Ui ’60) filxiluil g(Ui’60) P(T [19 60) +E(Z(1—6,; )1, (U,) g=l (2', 6 ) x u +E( (l/G)I , (7}A U,) V91” 0 1 "1 "1 . gZ=1 G 1p(2' 60) p(t-1.90) Dropping orthogonal terms, V T,6 x . V T6 x, u, A: E({6i111(T) 9P(T 0)6111u11 +2 jg 8(1) 0P(T 0)6 ll 1 l }+ p( i160) p(nlrgo) gZZ p(T ’60) p(T [1960) 127 :1- V 8(U' ,6)6 I“Ix “- +E 1’6 I U,- 0 i 0 i1 ll 11 «g .,),( ) g(U.-.6o) 41,3609” V6P(T 60)61'1xiluil }). p(T, 60) p(T,1,90) E({Ui AUi 27' T] The first term of the sum above is V p(T ,M) “X 11 P(T 9 )I E(6IT, ‘9 ° ‘1‘“): E(iTT,-.>.a 1V p(T,-,6 )——"—"—). 3222 g 9 0 p(al,60) l a 0 P(artgo) The third term is E({Z5 ,1(1- 5, g)[g(U,)V68(Uivao) xflu,l ¥ }): g>2 8(Ui’60) P(Eliao) v . f . =E({Z[ag _U, <73; ] 98(U"6°) x'lu'l g.>_2 8(Ui’60) P(ahgo) })= V68(Ui ’60) xiiuil =E([ sU,a,1(vgp(al,6o)- gp(T,- ,06» ”(1,609 The last term is (1.6) w , ,5“. -61‘79” ° ““ 1)=E<{1T,-2r1v (n6)—————"" 1) p p( .1. 61 ‘9” ° p(alflo) E({CZ'1[T/\iU Combining the above formulas we get V0p(7;'9 60) x! ‘7‘, ll 1'] i1 i1 xilu i1 ) + G E((Zd, ,(6011 g= p )=E(I (T) (Tn-1.60) ' pa}. 60) +E([T> T, .>_ a21Vap(T,,6o)—§i—‘fl‘—)+ P(ahgo) I +E([T,* >a,](v,,p(a,,6o)— v6p(T,* BOD—M—H Pa( 01900) I +E T,>TV 1,06 _x___,,u,, (H ] 6P( ) NW)» The second and fourth terms combined give xi E([T>T.- 2al1179p(T...60 )—""—‘—“——)+E<11T >r1vgp(r 601—— “5"“ 1): pa( a19 00) 176001.90) =E([T,-' 2 a. 1V6p(T,-*,6o)—""—‘i‘fi—>. p(alflo) G .‘ f . V T6 Therefore E((Zd,g(6o))fl‘;‘—"L)=E(1,(T,) 9“ " 0) x;,u u,,)+ g=l P(Thflo) P( 1" O) )1:{ u- V ,6 x' +E([7} 2“2]V6P(‘12v90)—'M—')=E( 9P(T1 0) x“ u“) P012960) (119 60) In conclusion V can be expressed as E(k,k,7)— D(QO)J'1(60)D’(60). 129 CHAPTER 3 ESTIMATING HOSPITAL COST FOR AMI PATIENTS IN THE NIS 2000 3.1 Description of the data set and background The Nationwide Inpatient Sample (N18) The Nationwide Inpatient Sample (NIS) is the largest all-payer inpatient care database that is publicly available in the United States, containing data from 5 to 8 million hospital stays from about 1000 hospitals sampled to approximate a 20-percent stratified sample of US. community hospitals. It is part of the Healthcare Cost and Utilization Project (HCUP), sponsored by the Agency for Healthcare Research and Quality (AHRQ, formerly known as the Agency for Health Care Policy and Research). Currently data is available for a 13-year time period, from 1988 to 2000, allowing analysis of trends over time. Researchers and policymakers use the NIS data to identify, track, and analyze national trends in health care utilization, access, charges, quality, and outcomes. NIS is the only national hospital database with charge information on all patients, regardless of payer, including persons covered by Medicare, Medicaid, private insurance, 130 and the uninsured. NIS's large sample size enables analyses of rare conditions, such as congenital anomalies, uncommon treatments, such as organ transplantation, and special patient populations, such as children. Inpatient stay records in the NIS include clinical and resource use information typically available from discharge abstracts. Hospital and discharge weights are provided for producing national estimates. The NIS can be linked to hospital-level data from the American Hospital Association's Annual Survey of Hospitals and county-level data from the Bureau of Health Professions' Area Resource File, except in those states that do not allow the release of hospital identifiers. Beginning in 1998, the NIS differs from previous NIS releases: some data elements were dropped, some were added, for some data elements the coding was changed, and the sampling and weighting strategy was revised to improve the representativeness of the data. There is a growing literature on use of the NIS in health services research that we 4247‘ 64. For 2000, the NIS contains over 7.4 million discharges from will use for guidance 28 states. Sixty strata are defined by a combination of region (Northeast, South, Midwest and West), location (urban, rural), ownership'(public, private), teaching status, and bed size (small, median, large). The first stage samples approximately 20% of hospitals within each stratum. Then all discharges from the sampled hospitals are included in the database. Patient demographics in the NIS include: age at admission, gender and race. Total charge and length of stay (LOS) are the main healthcare utilization variables in the N18 for each hospital stay. Discharge status identifies whether it was routine, or resulted in death, or the patient was disposed to other care facilities. The NIS database contains a number of variables describing the hospital stay. Up to 15 diagnoses and 15 131 procedures are coded based on the International Classification of Diseases, 9‘h Revision, Clinical Modification (ICD-9-CM). They can be used to select samples of hospital discharges for specific diagnoses and procedures. A broader categorization of ICD codes may be used for this selection based on HCUP’s Clinical Classification Software (CCS), or by diagnosis-related groups (DRG) that combine information on patient age, sex, diagnoses and procedures accounting for relationships among them and to inpatient resource use. The main limitation of the NIS is its inability to track individual patients. Each record in the NIS is a separate hospital discharge, and thus we cannot identify multiple admissions by the same individual within a year (or across years). Also, the NIS does not include all preoperative and postoperative treatments rendered in an outpatient setting, although services in ambulatory surgery centers may be included. We will use the NIS to examine length of stay and hospital charges associated with admissions for heart disease. In particular we concentrate on treatments undergone for acute myocardial infarction (AMI). Coronary heart disease and treatment procedures The heart is a muscle that works 24 hours a day. To perform well, it needs a constant supply of oxygen and nutrients, which is delivered to the myocardium (heart muscle tissue) by the blood through the coronary arteries. The blood flow to the heart can be reduced by a process called atherosclerosis, in which plaques of fatty substances build up inside the walls of blood vessels. The plaques attract blood components, which stick to the inside surface of the vessel walls. Atherosclerosis can affect any blood vessels and 132 causes them to narrow their lumina and harden their walls. This process develops over many years and can begin early, even in childhood. Coronary heart disease (CHD) is the most common form of heart disease, the leading cause of death for Americans. About 12.6 million Americans suffer from CHD, which often results in a heart attack. About 1.1 million Americans suffer a heart attack each year, and about 515,000 of these heart attacks are fatal65 that is about 2,600 every day; one person every 33 seconds. It takes more lives than cancer-in fact, more than cancer, accidents, and the next five leading causes of death in the United States combined“. In CHD, atherosclerosis affects the coronary arteries. The fatty buildup, or plaque, can break open and lead to the formation of a blood clot. The clot covers the site of the rupture, also reducing blood flow. Eventually, the clot becomes firm. The process of fatty buildup, plaque rupture, and clot formation recurs, progressively narrowing the arteries. Ever less blood reaches the heart muscle, and thus fewer quantities of oxygen and nutrients reach the myocardium, leading to ischemia (oxygen starvation of the heart), clinically translated into chest pain. Depending on the degree of pain, the level of physical activity that pain occurs at, and the degree of coronary obstruction, coronary artery diseases can be classified into asymptomatic or mild angina, angina class II-IV or unstable angina, acute myocardial infarction, ischemia after Coronary Artery Bypass Graft (CABG)67. Following previous analyses of charges and LOS of AMI patients in the MICH 48-50 study , we will focus on patients admitted in the hospital with AMI, a common high- mortality condition whose outcomes are affected by the process of care. We only focus 133 on patients that underwent as their primary procedure Coronary Artery Bypass Grafting (CABG), Cardiac Catheterization (CATH) or Percutaneous Transluminal Coronary Angioplasty (PT CA) or patients with AMI who underwent no procedures at all. Cardiac catheterization In cardiac catheterization (abbreviated “CATH"), a diagnostic procedure in which a very small catheter (hollow tube) is advanced from a blood vessel in the groin through the aorta into the heart. Once the catheter is in place, several diagnostic techniques may be used. The tip of the catheter can be placed into various parts of the heart to measure the pressure within the chambers. The catheter can be advanced into the coronary arteries and a dye injected into the arteries (coronary angiography or arteriography). With the use of fluoroscopy (a special type of X-ray), the physician can tell where any blockages in the coronary arteries are located as the dye moves through the arteries. Percutaneous transluminal coronary angioplasty Percutaneous transluminal coronary angioplasty, also known as PT CA, is an established, effective therapy for some patients with coronary artery disease. PTCA is used to dilate (widen) narrowed arteries. A doctor inserts and advances a catheter with a deflated balloon at its tip into the narrowed part of an artery. Then the balloon is inflated, compressing the plaque and enlarging the inner diameter of the blood vessel so blood can flow more easily. Then the balloon is deflated and the catheter removed. PT CA is a less traumatic and less expensive alternative to bypass surgery for some patients with coronary artery disease. In about 40 percent of patients who've had PTCA, the dilated segment of the artery narrows again within six months after the procedure. They may require either another PT CA or coronary artery bypass surgery“. 134 Improvements in technologies for PT CA include use of stents and more-recently drug-eluting stents (DESs). A stent is a surgical stainless steel coil that is inserted into the blocked artery via a catheter. The stent serves as a scaffold, supporting the artery walls, and reducing the risk of the artery re—closing (restenosis) over time. Bare-metal stenting after angioplasty has more durable effects than angioplasty alone, but researchers are still seeking ways to reduce restenosis, which occurs about 20% of the time in the'clinical setting. The most promising method to lower restenosis rates appears to be using drug- eluting stents (DESs). Some researchers are hailing DESs as one of the greatest interventional advances in the past decade. But others warn that while DESs appear to offer benefit over bare-metal stents, the hype may be outrunning the science. In his article ‘Drug-Elutin g Stents Show Promise’, Mitka (2004)69 presents the beneficial effects of DESs, but also talks about hospital concerns that reimbursements for such devices may be too low, leading to financial losses for these institutions. NIS database precedes the widespread use of stents and therefore the analyses that we describe would refer to PT CA. Coronary artery bypass graft Also known as “bypass surgery” or CABG, coronary artery bypass graft operation uses a piece of vein taken from the leg, or of an artery taken from the chest or wrist. This piece is attached to the heart artery above and below the narrowed area, thus making a bypass around the blockage. Sometimes, more than one bypass is needed. Bypass surgery may be needed due to various reasons, such as an angioplasty that did not sufficiently widen the blood vessel, or blockages that cannot be reached by, or are too long or hard for, angioplasty. In certain cases, bypass surgery may be preferred to angioplasty. For 135 instance, it may be used for persons who have both CHD and diabetes. A bypass also can close after a period of time. This happens in about 10 percent of bypass surgeries, usually after 10 or more years. Variations in hospital costs/charges As suppliers of the most expensive type of health service, hospitals have been particularly vulnerable to the pressures of competition. The health economics literature contains a number of recent studies on hospital costs. A common technique that has been proven to be well-suited to examining the effects of provider institutions on patient costs in a managed care environment is multilevel modeling. This framework is used to analyze data that fall naturally into hierarchical structures consisting of multiple ‘micro’ units nested within ‘macro’ units. Multilevel models analyze variability arising at distinct levels within data by extending the more traditional statistical techniques and introducing a degree of realism often absent from single-level models such as multiple regressions. A good description of why charges may differ between facilities is presented in Health Core Data Report, 2000 (PHC 5320) from Department of Health and Family Services, Division of Health Care Financing, Bureau of Health Information”. New technology - The equipment facilities use to provide services differs in age, sophistication, and utilization. Facilities with the latest technology may have higher charges than those with older, less sophisticated equipment. Stafi‘ing costs - Salary scales may differ regionally and are typically higher in urban than rural areas. Furthermore, competition for nurses and other skilled personnel may result in higher staffing costs and, therefore, higher charges. 136 Intensity of care - Facilities differ in the severity of illness of patients (i.e., some facilities care for more severely ill patients than others). Patients within the same diagnosis or procedure classification may need very different levels of service and staff. Efi‘iciency of operation - Facilities vary in the utilization and efficiency of services they provide. Infrequently used services may cost more per patient than services that are used more frequently. Difi‘erences in coding - Facilities vary in their coding systems and personnel, and in the number of billing codes they put on a billing form. The use of additional appropriate codes may result in a patient being assigned to a diagnosis or procedure classification with greater reimbursement or may otherwise justify higher charges. Facilities with better-trained personnel or more sophisticated coding software are more likely to place these additional codes on their billing forms and, therefore, may have higher charges than facilities with less expertise. Discounts - Facilities negotiate and offer volume discounts to Health Maintenance Organizations (HMOs), Preferred Provider Organizations (PPOs), and other large-volume purchasers of health care services. The number of these organizations has grown considerably in recent years. Full charges are paid for only a very small proportion of patients. Percentage of government pay — Government payers generally reimburse facilities at rates below their full charges, similar to the discounts offered to commercial payers. Therefore, facilities with a large percentage of patients whose charges are paid either by government programs or discounted commercial payers may report large gaps 137 between what they bill and what they actually receive. This may result in higher charges, including those for non-discounted patients. Facility price structures - Some facilities spread the cost of services and equipment over all patients. Others bill the full cost of a service to those patients actually using the service. Furthermore, facilities may provide some services at a loss while allowing other facility operations to subsidize the losses. Any of these practices can result in significantly different charges for a given diagnosis or procedure classification. Range of services provided - Facilities differ in the range of services they provide to patients. Some may provide the full range of services required for diagnosis and treatment during the stay. Others may stabilize patients and then transfer them to another facility for more specialized or rehabilitative care. Data-related issues - Facilities differ in the number of cases served, the case-mix and illness severity of patients, and the comparability of patients within a given diagnosis or procedure classification. For example, a single case can greatly affect a facility’s average charge if the facility reported only a few cases. Capital expenses - Facilities differ in the amount of debt and depreciation they must cover in their rate structure. A facility with a heavy debt load, a new building, or a major renovation to amortize may have higher charges than a facility not facing such expenses. Furthermore, facilities may choose to lease or purchase equipment or facilities. The choices made about financing of capital projects may affect charges in different ways. Rice et al (1997)“, Carey (2000, 2002) 52' 53 and Goldstein (2002)54 insist on the usefulness of multilevel methods in studies where data on cost are collected over multiple 138 sites (hospitals in our data). This chapter takes a multilevel modeling approach to estimate costs in patients hospitalized for AMI. Patients nested within hospitals form a natural hierarchical structure suited to analysis that models each level simultaneously. The existence of a non-zero intra-hospital correlation, resulting from the presence of more than one residual term in the model, means that traditional estimation procedures such as OLS, which is used for example in multiple regression, are inapplicable. Application of OLS techniques leads in this case to incorrect inferences, although when the intra-class correlations are small we can expect reasonably good agreement between estimates from the multilevel and the simpler OLS approaches. One can also go one step further and extend our model to include higher levels such as hospitals nested in states. Correlates of total charges available in the data set Possible correlates of total charges available in the data set are: procedure, gender, age, number of procedures, hospital characteristics (location, teaching status, bed size, region), length of stay (LOS), Charlson Comorbidity Index (CCI). Procedure: Clinical Classifications Software (CCS), developed by the Agency for Healthcare Research and Quality (AHRQ), is a tool for clustering patient diagnoses and procedures into a manageable number of clinically meaningful categories. CCS is used for grouping conditions and procedures without having to wade through thousands of codes. This "clinical grouper" makes it easier to quickly understand patterns of diagnoses and procedures so that health plans, policymakers, and researchers can analyze costs, utilization, and outcomes associated with particular illnesses and procedures. 139 CCS collapses diagnosis and procedure codes from the International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM), which contains over 12,000 diagnosis codes and 3,500 procedure codes. Without CCS, the large number of ICD-9- CM codes poses difficulties in statistical analysis and reporting. CCS consists of two related classification systems, single level and multi-level, which are designed to meet different needs. The multi-level CCS groups single—level CCS categories into broader body systems or condition categories (e. g., "Diseases of the Circulatory System," "Mental Disorders," and "Injury"). Multi-level CCS is most useful when evaluating larger aggregations of conditions and procedures or exploring them in greater detail. Single-level CCS is most useful for ranking of diagnoses and procedures. The single-level diagnosis CCS aggregates illnesses and conditions into 259 mutually exclusive categories. We consider discharges that have either CABG (CCS=44), CATH (CCS=47), PTCA (CCS=45) or no procedure (CCS=.) as a primary procedure. Demographics Variables: We use gender and age of the patient. Patient age is recorded as age in years at admission. Other demographic variables in the NIS are race and household income by zip code, but they are not uniformly recorded for all states. Number of procedures (NPR): NPR counts the number of ICD-9-CM procedures coded on the discharge record. The principal procedure is included in this count. A value of 0 means that the patient underwent no procedures on record, a value of 1 means that only the primary procedure is recorded, secondary procedures are left blank, etc. A maximum of 15 procedures have been retained on a NIS inpatient record. States 140 that provide fewer than 15 procedures have had the procedure vector padded with blank values. For example, if a state supplied 5 procedures, PR6 through PR15 are blank (" ") on all records from that state. States that provide more than 15 procedures may have information truncated. All states have provided at least 6 procedures. CCI (Charlson Comorbidity Index“) is used to assess comorbidity. CO is a weighted sum of the presence of 15 specified medical conditions at admission. There are two ICD-9-CM adaptations, Deyo (1992)72 and Dartmouth-Manitoba, Romano et al (1993)73 of the Charlson comorbidity index, as well as other various searches for improved clinical comorbidity indices74'76. We use the Dartmouth-Manitoba version of the index. The conditions and associated weights (shown in parentheses) are: CHF = 'Congestive Heart Failure' (1) PVD = 'Peripheral Vascular Disease' (1) CVD = 'Cerebrovascular Disease' (1) DEM = 'Dementia' (1) COPD = 'Chronic Obstructive Pulmonary Disease' (1) ULCD = rUlcer Disease' (1) MLIVD = 'Mild Liver Disease' (1) DIAB = 'Diabetes without Complication' (l) HEPL = 'Hemiplegia' (2) REND = 'Renal Disease' (2) DIABCC = 'Diabetes with Complications' (2) MALIG = 'Any Malignancy' (2) 141 SLIVD = 'Moderate or Severe Liver Disease' (3) CANCER = 'Metastatic Solid Tumor' (6) AIDS = 'AIDS’ (6) NIS_STRATUM is a four-di git stratum identifier used to post-stratify hospitals for the calculation of universe and frame weights. The hospital's census region, ownership/control, location/teaching, and bed size were obtained from the American Hospital Association (AHA) Annual Survey of Hospitals. Region: States were grouped in 4 regions Northeast (CT, MA, ME, NY, NJ, MA) Midwest (IA, IL, KS, MO, WI) South (FL, GA, KY, MD, NC, SC, TN, TX, VA, WV) West (AZ, CA, CO, HI, OR, UT, WA) Location] Teaching: A metropolitan statistical area is considered urban, and a non-metro statistical area is rural. Teaching hospitals have an AMA-approved residency program, are a member of the Council of Teaching Hospitals (COTH) or have a ratio of full-time equivalent interns and residents to beds of .25 or higher. All hospital in the rural area were classified as non-teaching. Bedsize categorizes the number of short-term acute beds in a hospital into ‘small’, ‘medium’, ‘large’. A hospital's bed size category depends upon region, location and teaching status. For example, urban teaching hospitals in the Northeast were classified as having ‘large’ bed size if the number of hospital beds exceeds 425. In the western region for the same location/teaching status, the ‘large’ bed size category is defined if the count 142 exceeds 325. This categorization was created to have approximately 1/3 of hospitals in each bed size category in a given region, location, teaching status combination. Ownership/control includes categories for government nonfederal (public), private not-for-profit (voluntary) and private investor-owned (proprietary). However when the sample size was sufficiently large, hospitals were stratified as public (Ownershipzl), voluntary (Ownership=2) and proprietary (Ownership=3). This stratification was used for southern rural, southern urban non-teaching, and western urban non-teachin g hospitals. For smaller strata — the midwest and western rural hospitals —a collapsed stratification of public versus private was used, with the voluntary and proprietary hospitals combined to form a single ‘private’ category (Ownership=4). For all other combinations of region, location and teaching status, no stratification based on control was advisable given the number of hospitals in these cells (Ownership=0). Although the CONTROL variable was used to define the strata, AI-IRQ collapsed some categories. In consequence, because of the overlapping categories, this variable will not be used further. Total Charges Total charges in NIS data are the amount the hospital charged or billed for the entire hospital stay. They do not necessarily reflect reimbursements or costs. Charge data were present for 98 percent of all discharges. Charges are generally higher than costs. Generally, total charges (TOTCHG) do not include professional fees and non-covered charges. If the source provides total charges with professional fees, then the professional fees are removed from the charge during HCUP processing. In a small number of cases, 143 professional fees were not be removed from total charges because the data source did not provide the necessary information. Two states, MA and W1, will be excluded from the working data set (see exclusion criterion #8 in the next section), because they may have included professional fees. Emergency department charges incurred prior to admission to the hospital may be included in total charges. 3.2 Creating a working subset data set From the NIS 2000 Core file of 7.4 million discharges we extracted all records with a primary diagnosis of AMI. This is based on DXCCS 1:100 or ICD9-CM codes 410.xx and it yields n=157,263 discharges. All discharges fall within Major Diagnostic Code MDC=5 (Circulatory System). There are 165 possible primary procedures for these discharges. Table 3.1 shows the distribution of primary procedures among discharges, excluding procedures that are present in <1% of discharges. We will consider only 121,264 discharges that have either no procedure, Coronary Artery Bypass Grafting (CABG), diagnostic cardiac catheterization or coronary artheriography (CATH) or Percutanerous Transluminal Coronary Angioplasty (PT CA) as their primary procedure. 144 Guided by several published analyses from the N IS, the following exclusion criteria will be applied (all n’s are out of the 121264 patients): 1. We exclude admissions for AMI that were not the first episode of care for a newly diagnosed AMI (ICD9-CM code: 410.x1), or if the location of the infarction was unspecified (410.x0)64 (n=983). We exclude discharges in hospitals that performed S 5 cases since these cases are most likely coding errors” 43' 47 (n: 256). We exclude patients <18 years or >85 years at admission. The pathophysiology of disease in patients <18 years is likely to be different than for adults (n=22). Also patients >85 years are less likely to be treated aggressively than younger patients” 47' 6“ (n=1 1,238). Because of concerns that patients with diagnosis code ‘AMI’ may have included ‘rule out’ AMIs, all discharges of < 2 days are excluded (n=l6,682). Patients with LOS >60 days are eliminated because they are probably long term care patients 52 (n=57). Newborn admission types are also excluded (n=7). For hospital costs within the US health-care system one common approach is to assign a cost to each hospitalization based on the basis of its associated Diagnostic-Related Group”. In order to make sure that we only have patients with AMI in the working data set, we exclude patients with the following DRG in effect on discharge date (DRG=104, 105, 113, 119,124, 125,144, 145, 468, 477,478,479,483) (n=l,6l 1). 145 10. 11. 12. Excluding states: States MA and W1 may have included physician fees in the total hospital charges. Some hospitals in TX did not report total charges until July 2000. Because identification of these hospitals was not available, we exclude all discharges from TX, MA, WI (n=20,560). One hospital in Arizona is also excluded because 44.3% of its total charges data is coded C=inconsistent (either excessively low or high) (n=149). One can not link information from the discharging and receiving hospitals. Therefore, if we want to capture LOS and total hospital charge for AMI admissions, then we must restrict to ED admissions and all routine referrals from physicians, clinics, and HMOs (n=23,244). We drop records missing an essential field such as patient age, gender, length of stay, total charges, DRG, admission source or discharge status (n=10,511). We also eliminate patients who left the hospital against medical advice since we do not know whether they received subsequent care (n=703). The exclusion criteria above are not of course mutual exclusive, a discharge may be excluded for more than one reason. After all exclusions we are left with n=58,469 discharges. Censoring The variable DISPUNIFORM indicates the discharge status assigned to each record. We have already excluded missing, invalid records and patients who left the hospital against medical advice. DISPUNIFORM=1 (routine) or DISPUNIFORM=20 146 (died in hospital) cover a complete hospital episode. In these two cases we will regard TOTCHG and LOS as completely observed. Patients transferred to a short-term hospital. transfers to skilled nursing facilities (SNF), intermediate care, home health care have incomplete TOTCHG and LOS. They will be treated as censored. Of the 58,469 discharges in our working data set 18,753 (32%) of the discharges are censored. Figure 3.1 summarizes the process of creating our data set from the initial 157,263 discharges for AMI. Characteristics of the patients Table 3.2 shows the characteristics of the patients. As already mentioned we only consider discharges that have either CABG (CCS code: 44; N=7,369, 12.6%), CATH (CCS code: 47; N=14,264, 24.4%), PTCA (CCS code: 45; N=17,901, 30.6%) or no procedure (CCS code: N=18,935, 32.4%) as a primary procedure. There are N: 35,977 (61.5%) males and mean age is 65.72 (STD=12.7). Since the higher the number of procedures, the higher the cost, we categorized patients in 3 possible groups: no procedure (N=18,935, 32.4%), 1-4 procedures (16,489, 28.2%), 5 or more procedures (23,045, 39.4%). We will form two comorbidity groups based on CCI score: 0 (no comorbidities, N=23294, 39.8%) and 1 or more (at least one comorbidity). The biggest percent of discharges in the data set comes from the South region (43.1%), followed by Midwest (19.9%), Northeast (19.6%) and West (17.4%). The hospitals are categorized as: rural (13.5%), urban/teaching (42.1%), urban/non-teaching. Hospitals are grouped as: Small (8.9%), Medium (25.5%) or Large. 147 3.3 Estimation methods We will follow methods similar to those described in Chapter 2. Total hospital charge (TOTCHG) and length of stay (LOS) are two primary outcome variables. We identify the average total charge over a specified duration 2' with the net present value. Given a covariate profile 20 and ignoring discounting the average cost restricted to 2' is ,u(T) = — Eco, (s | ZO )dS (t | Z0) . We first describe a multilevel model to estimate c0 1 (t | Z0 ) , including a method for estimating weights. Then we will describe methods to estimate the LOS distribution. Combining these two estimation steps we arrive at an estimate of ,u( 2') . ESTIMATING co, (1 | 20) A traditional approach for estimating patient costs involves ordinary least squares (OLS) models containing a response variable at the individual level and correlates at both individual and higher levels of analysis. This disregards correlations structures in the data emanating from common influences operating within groups. As argued in the previous section, hospital characteristics, such as quality of service, managerial performance or treatment patterns administered by physicians may impose distinct effects on the costs of treating patientssz‘ 53. The working data set consists of 58,469 discharges for AMI patients. Of these, 39,716 are uncensored for total charge and LOS. There are 617 hospitals with discharges between 6 to 1181 per hospital. Because charge data is skewed we consider a log- 148 transformation to mitigate the effects of skewness of TOTCHG. Figure 3.2 shows the histogram of log-transformed charges in all 58,469 discharges. 3.3.1 Unconditional means model A one-way random effects model is fitted first. We express the outcome, Y,,- : log(TOTCHG) associated with the i'h discharge in the 1"” hospital, as a linear combination of a grand error mean flo , a series of deviations from the grand mean no, and a random error 8,}: Model 1: Y,,. = ,6,) +110]. + 5,]. where no, ~ iid N (0,030), 8,,- ~ iid N (0.03) and uoj , 6,,. are assumed independent. We exclude censored observations from this model by using weights 5,- , where 5,- is the censoring indicator. Note that since “0," and 8,,- are assumed independent, we have Var(YU ) = Var(flo + “o; + 8,,- ) = 030 + of , and for two different discharges i at i’ within the same hospital Cov(Y,,. , Y,,, ) = of. Using estimates of the variance components, the intra-class correlation 2 0' . . . p = ———“—°—— is estimated to be 3954 = .533 .The most vanatron occurs at the 3,, +03 .3954+.3466 hospital level and the value of ,0 suggests that there is considerable clustering effect within hospitals, invalidating the traditional OLS results. 149 3.3.2 Including effects of hospital level (level 2) correlates The unconditional means model, Model 1, provides a baseline against which we can compare more complex models. We include hospital characteristics which are level 2 variables: Region, Location, Bedsize. These categorical variables have more than 2 levels. For example, region has 4 categories (Northeast, Midwest, South, West). To keep our notation simple we use one name for each categorical variable indicating the region, location and bed size of each hospital. The model is: Model 2: Y,,. = ,6,),- + 8,, and ,6,),- = :50 + flOIRegionj + ,602 Location ,- + [303 Bedsize,- + “0} where no, ~ iid N (0,030), 8,,- ~ iid N (0,0?) and uoj , 8,, are assumed independent. Substituting the level 2 equation into the level 1 equation yields: 1’},- = [ 160 + flmRegionj + flozLocationj + ,603 Bedsize, ]+ [uoj + 8,, ]. The terms in the first bracket represent the fixed part, while the terms in the second bracket represent the random part. The residual (error) variance within hospital 0'? , remains almost unchanged (going from .3466 to .3465). However the variance component representing variance between hospitals, 0'30 , is much lower (going from .3954 to .1682). Therefore the three variables, Region, Location, Bedsize explain a large portion of the hospital-to-hospital variation in log total charges. About .3954 -.1682 .3954 = .57 of the explainable variation in hospital log total charges is explained by the three variables. We can estimate once again the intra-class correlation as 150 .1682 .l682+.3465 = .33 which remains high. 3.3.3 Including effects of discharge level (level 1) correlates We illustrate the effect of including discharge level correlates by initially examining a model with only level 1 correlates, excluding level 2 correlates. After reviewing the necessary steps for including level 1 correlates, we fit a combined model. Suppose first that all correlates: Procedure, LOS, Age, Female, CCI, NPR are fixed effects. Since NPR, the number of procedures, is in strict dependence with Procedure (for example a patient who receives a procedure, CABG, PT CA or CATH will have NPRZ 1, a patient who has no'procedures will have NPR=O), we use the cross-term NPR*Procedure to assure a unique interpretability of the coefficients in the model. Based on residual analyses we also decide to include LOSSQ = LOS2 and Agesq=Age2 in the model. Then the model can be written as: K Model 3a: Kj:160j+zflkx(k)zj+€zj , flojzflo+u0j k=1 where uoj ~ iid N (0,030), 8,, ~ iid N (0,03) and uoj, 8,,- are assumed independent. The covariates x,“ include LOS, LOSSQ, Age, Agesq all as continuous variables; and indicator variables for procedure type (CABG, PTCA, CATH), Gender (female), CCI (0), NPR (14), and two dummies for CABG*NPR (CABG, 1-4 procedures) and PTCA*NPR 151 (FT CA, 1-4 procedures). As expected these fixed effects variables at the discharge level considerably reduce the within hospital error variance, of , going from .3466 in Model 1 to.0965. Since LOS is a very strong correlate of the outcome and the variance of Y,,. varies with LOS as seen in Figure 3.3, we consider having a random coefficient for LOS. This will allow for the relationship between LOS and patient total charges to vary across hospitals. The model can be written as: K M0d613b3 Y,,. =50}+511L0511+Zfltxtm+56 . It=1 1501' :30 +u0j'fl1jzfll +“11‘ . j]~ lidN(02922x2)9 gij ~iid N(O,U§) and (uoj’ulj)’ 8,] massumed “o where ul ,- 2 UuO 0.1401 2 ] is chosen by 0.1401 0.141 independent. Our unstructured covariance matrix 2 =[ considering the AIC, BIC criteria. Model 3b becomes K I . a o a 0 Y,,- : ,60 + Z ,6,, x, k ),.j + “01 + u1 jLOSU + 8,, . Wrthin a hosprtal, the variance of Y,,- is k=l Var(Y,-,- ) = Var(uo ,. + u, ,.Los, + 5,, ) = 6,3,, + 20,0,Los, + of, L053. + 6,2. The estimated covariance parameter estimates are 0'30 = .1958, of, = .00052, 030, = —.00449. Also 0',2 = .09352. We compare the two models above, Model 3a and Model 3b. When fitting this models we find that the change in -2LL when using REML method to estimate parameters is ~728.6. An approximate test of the null hypothesis that this change is 0 is 152 obtained when we compare the difference in -2LL’s to a 2’2 distribution with 2 degrees of freedom. The p-value is <.0001. Therefore we will keep random slopes for LOS. Should LOSSQ, along with LOS, be included as random effects at the hospital level? If we do so, there is an additional variance component that comes from the LOSSQ qu term. The error term is ul ,- . Figure 3.3 suggests that LOSSQ is not needed. Fitting a ugl- 2 0140 0.1401 01102 general covariance matrix X = d,,,“ 03, 03,12 results in an estimate 0'32 =0, i.e. 2 0.1402 0.1112 0.142 the estimate is on the boundary of the parameter space. Another reason for this is seen in Var(1’}j) = Var(uoj + uleOS,-j + uZJ-LOS,-,2 + 8,,- ) = _ 2 + 2 2 2 4 2 2 3 — ,0 d,,,LOS,-,. + ouzLOSU + o, + 20,0,LOS,,- + ZoumLOSU + Zo'mLOSU which has a 4th-order term in LOS. If 0'32 2 0 so are 0,02 and Gun. 3.3.4 Including both effects of discharge level (level 1) and hospital level (level 2) correlates Having considered models with either just 1 level correlates or level 2 correlates, we can now consider models which contain variables of both types. Consider the following model: 153 K MOde‘ 4: Yij = 501‘ + 511L056 +Zflkx(k)ij +81,- . k=1 where ’50]. = :60 + ,6,),Regionj + ,602 Location j + ,603 Bedsize, + uoj . .31; = .51 +flnRegionJ. +,B,2Locationj +fl13Bedsizej +u1j. u - 0'2 0' Here [ 0’} iidN(02,22,2) where z: “0 “20‘ , 5,, ~ iid N(o,o§)and "U 01101 0141 (“01’“11') , 8,, are assumed independent. Combining the two equations above yields the single two level equation model: 3 3 K Y1]: .60 +h2fl0hzhj+hzfl1hzhj*LOSTj+l,ZflI:x(k)1j+(u0j+u1jLOSij +513) =1 :1 =1 where 21 ,- = Region, , 22,- : Location,- , z3j = Bedsize}. In addition to the cross-terms required by the model: Region*LOS, Location*LOS, Bedsize*LOS, and NPR*Procedure, significant cross-terms procedure*LOS, Procedure*LOSSQ, NPR*LOS, CCI*LOS, CCI*LOSSQ are included in the model. Within a hospital, the conditional variance of Y,,- is Var(Y-- ) = 030 + 20,,01LOSU + 031L055? + of and the covariance between two different patients: Cov(Y,-j ’Ylj ) = CoVar(u0,- + uleOSU + 80,140, + uU-LOSU + 8,,- ) = = 3,, + of, Los,.,.Los,,. + 0,,01 (1.05,, + 1.05,, ) . These elements form higher level block matrices V, represented as V, = ZQQZ’+ Q1 , where Z contains the jth hospital values for the known design matrix of explanatory variables for the random portion of the model, etc. The covariance parameter estimates are 630 = .1189, of, = .000202, 6,,, = —.00137 and the estimated level one 154 variance is ,2 = .08866. This can be compared with the level one variance obtained from the null model, Model 1, or total within hospital variance, .3466. The ratio .3466 - .08866 .3466 = .74 represents the proportion of within hospital variance that is explained by correlates (comorbidity, procedure, demographic variables). The between hospital covariance is 0. 3.3.5 Inverse probability weighting Before estimating c01 (t | 20) for specific covariate profiles, we need to account for the uncensored observations in the analysis. As discussed, previously in Chapter 2, we will weight each observation by w,- = 5,17,- S 1']/ G(T} ) , where 5,- is the censoring indicator and G(-) is the survival function for the censoring distribution. To be consistent with our notations above we will use j for hospital and i for discharges, although we are well aware that these are not the standard notations when one would consider hospital as the primary units and discharges within hospitals as subunits. Consider the Random-Effects model yj=Xjfl+Zjaj+uj (3.1) where the dimensions of y, and u, are n,- XI, ,6 is pxl, X, is n, Xp, Z, is n, Xq and a, is qxl , where p is the number of fixed effects, q is the number of random effects 155 and n ,- is the number of discharges in the jth hospital. We assume the usual conditions of a Random-Effects model34 E(u, |X,,Z,-,a,.)=0, E(a,|X,.,Z,-)=0. Let v j = Z ,a ,- + u ,- be the composite error. Under these assumptions E (v j ) = 0 and Var(v,)= 2,02; +R,-, where Var(a,)= E(aja; |X,-,Z,)=G and Var(u, ) = E(uju; IX,- ,Z,- ,a', ) = R,.. The form of G is specified by the TYPE option in the RANDOM statement, while the form of R j is specified through the TYPE option in the REPEATED statement. To estimate ,8 we use generalized least squares (GLS) estimation, so an estimate 5 of ,6 that minimizes the expression Z(y, —X,.6)’V,."(y, —X,-,6) is given by 1' 13': (XX, v,.'"X )“(ZX’ .,v"y, Weighting observations Let W,- = diag{ w, ,i =1,...n, } denote a diagonal matrix of weights w,-,. for discharges within the jth hospital. Weighting observations is more than simply transforming model (3.1) by W)”. If we do so, and write the transformed model as y,=X,fl+Z,a,-+ii, (3.2) (where y, = W,” y ,- and the other quantities are similarly defined), then it is easily seen that GLS applied to (3.2) does not change our estimator ,5 , indeed if 156 17,. =Var(W,1/2v, )=w,‘/3(z,.oz, +R,)W,.‘/2 then 'I"-l" -1 'I"-1~ _ I —l -l I -l (ZXJVJ' X1) (ZXTVJ y,)-(ZX,V, X1) (ZXIVJ' y,). 1 1 1 1 Consider first the standard linear regression model y, =x,,6+u,- (where we now return to standard subscript ‘i’ for subject). By weighting the observations by w,- , we actually estimate ,6 by minimizingz w,- ( y,- — x,-,6)2 . This gives Ill 5 = (Z w,x,'x,-)_1(Z w,x,'y,-) and Var(,B) = (E(w,x,-'x, ))—l E(w,.zufxfx, )(E(w.x-'x- ))—1. In the general RE model (3.1), by weighting we mean transforming y, —+ y, , Z, ——> 2,, X, —> 21,, but preserving the original variance form V, = Z,GZ, +R,. Then 5 is obtained by minimizing 2(y, — X,,B')’W,”217,"'W,'’2 (y, - X,,B). Now J consider 17, = Z,GZ, +R,. 1. If G = 0, then W,“217,"W}/2 = Wj’zR,W}/2. In effect therefore, the original R, is replaced by W,7”2R,-W,-“2 to get 6 = (Z X;W}’2R;‘W}’2X. 1“ (Z X;W}/2R;‘W}”y. >- J' 1' This is the practice adopted in SAS PROC MDED (Chapter 18, Version 8.2 p633). If in - _ 2 "_ I —1 I part1cularR,—0'£I then fl—(ZX,W, X,) (:X,-W, y,). l J 2. If G i 0 , then the form of V371 = (2,02, + R, )'l is more complicated. Indeed 157 v,T‘ =R,T.’[1— 2, (GT1 +2 RT‘Z) z’. ,R,T.‘]. Then w,l’217,T‘W,l’2=-W,“2R,T‘[1—z',(GT‘+2,R,T'Z',)T‘Z,R,T‘]W,"2= This shows that the effect of weighting is to change R,- to W,'“2R,W,'“2. The final form of the GLS estimator of ,5 under weighting is then 6: (ZX’ v,T'X, )T (ZX, v,T‘y, (3.3) where V, =W,‘“2V,W,'“2 and the variance of )6 is ~ ~ Var(6)= (ZX,V,"'X, )T(ZX,,T13 13,17, ,"1X,)(ZX,17,71X,)"', where 13,- are the residuals y,- — X, ,5 . This can be obtained using the EMPIRICAL option in the PROC MIXED statement. This estimator has been described in White (1980)”, Liang and Zeger (1986)”, and Diggle, Liang, and Zeger (1994)80 and is commonly referred to as the "sandwich" estimator. We need to specify this in accordance with our methods described in Chapter 2 of the thesis. For this option to take effect we must include the RANDOM or REPEATED statements. If R,- :03]. then vi-I =w}”v‘.-lw}’2 =w.“ (W"’z 02 ’-W}” +631>“W}” 421-02} ”WW" and (3.3) gives 6: (Z X, (Z,Gz, +a§W,T‘ )T1 X, )T1 (Z X,- (z ,02, +a§W,T‘ )T1 y , ) . 1' 1' 158 3.4 Results of estimation In this section we will concentrate on obtaining estimates of median LOS and total charges at median LOS, at specified covariate profiles. For example, we vary the type of procedure, hospital region and the number of comorbidities, but we keep fixed age (= 65), gender (= male), hospital location (= rural), hospital bed size (= medium) and the number of procedures between 1 and 4, if there is a primary procedure. All analyses were carried out using SAS. Various estimation methods for G and S, the survival distribution functions for censoring time, U and LOS (= T), are discussed in Chapter 2, Remark 2.2.1.2. Here, we choose parametric methods for estimating both G and S. Estimation of G Given a smaller number of covariates z ( procedure, Charlson Comorbidity Index, region and number of procedures) an estimate of G(t | z) may be obtained from a parametric model log(U ) = zflu + 0',,€ , where ,6,, and 0,, are parameters and 8 is an unknown error with a specified parametric distribution. PROC LIFEREG is used for the estimation of ,6,, . Various choices for the distribution of 8 were considered. Based on the AIC we chose the log-normal and gamma distributions. Graphs of the Kaplan-Meier estimates of the cumulative hazard versus the regression model estimates of the cumulative hazard from both log-normal and gamma models are shown in Figure 3.4. The plotted pairs of points for the gamma regression model initially fall on the referent line and then fall mostly above the referent line. Based on this plots we decide that the log-normal distribution provides a better fit to the data. 159 log(t) - 26,, ) u The estimated weights are w, =5, /G(T,- Iz,) where G(t | z) =1—( where <1) is the cumulative distribution function for the normal diStribution. Estimation of CO, (I | Z0) To estimate c0l (t | 20), we use the weighted regression model described in the previous section. Parameter estimates derived using PROC MIXED are summarized in Table 3.4. Although coefficient estimates for continuous variables must be interpreted with care, all estimates seem to be in the right direction. For example, charges increase with the complexity of the cardiac procedure, with CABG having the highest charge followed by PT CA and diagnostic CATH. Patients who underwent none of these procedures had the lowest cost. For a given covariate profile Z0, which contains t and t2 , 50, (I | 20) = exp(Zo,5)exp('/2(5'30 + 26,0,t+63,z2 +63 )). Estimation of S (t | Z0) Similar to estimating G, we estimate S (t | Z0) using PROC LIFEREG in SAS. from a parametric model for uncensored observation (i.e. 5,- =1), log(T) = X ,6, + 0,8 , where 8 is a vector of errors. The log—logistic and gamma distributions exhibit the lowest AIC. Graphs of the Kaplan-Meier estimates of the cumulative hazard versus the regression model estimates of the cumulative hazard from both log-logistic and gamma models are shown in Figure 3.5. 160 Based on these graphs, the AIC criterion and previous experiences with the analysis of LOS49’ 50 we use the log-logistic distribution. Estimates of (1S (t | 20) are obtained as d§(t | 20 ) = . BMW) 2 dt, where w = (log(t) — 206,. )/6,. 0,. (1+ exp( w)) Finally, cost estimates at any time t for fixed time covariate profile ZO are obtained as £60, (u | Z0 )dS(u | 20) = ,1/6. exp(-206. I6.) 6, (1+ uvé’ exp(-206, I6, ))2 = LCXMZO (“)3)CXP(V2(6'30 + 251401“ ‘1’ (33,142 ‘1” 5'3 )) where we denote Z0 (u) = (Zo,u,u2) . The expression of £50, (u | Z0 )dS(u | Z0) can be viewed as a function ,0, (6,5,5, ,5’,) where 5' = (5'30 ,5',,o, 531,552) . For a fixed I, using the Delta Method we compute the variance of the mean cost. Estimates of median LOS, and total charges at median LOS, together with confidence intervals, for specified covariate profiles as well as graphic representations are presented in Table 3.4, and Figures 3.6, and 3.7. Discussion We have applied methods discussed in Chapter 2 of this thesis to estimate mean charges for hospital stays of specified duration for patients hospitalized for AMI. Our method accounts for the existence of a non-zero intra-hospital correlation, censoring and skewness of total charges. Estimates derived in our model are quantitatively similar to those reported in other studies of health care utilization in patients undergoing CABG surgery or PTCA. 161 Although 2000 costs might seem more relevant for comparison purposes, costs inflated to 2000 may not fully reflect current costs because the efficiency of intervention procedures and cardiac surgery services probably has increased during the last years; Weintraub (1995)81 argues that costs may best be used for comparison between procedures. Also, direct comparisons with our estimates is not possible because of different patient demographic and clinical characteristics, use of charges as a proxy for costs and different time periods. We now discuss results of previous studies. Three randomized clinical trials have evaluated the relative costs of angioplasty versus bypass surgery”. These studies included all costs to the patient, including hospitalization, procedures, medications and follow-up care. The Bypass Angioplasty Revascularization Investigation (BARI) Study of Economics and Quality of Life (SEQOL) (n =1829 patients) was collected over a three-year period from August 1988 to August 1991. Using BARI data, Hlatky et al (1997)83 report that the mean initial hospital cost of angioplasty was 65% of that of CABG, while same percent in our data is 67%. The Randomised Intervention Treatment of Angina (RITA) trial (UK) reported that an angioplasty had only 52 percent of the cost of a bypass operation following the initial procedure“. The Emory Angioplasty Versus Surgery (EAST) trial found the same percent to be 63%“. Using data from Michigan Inter-institutional Collaborative Heart Study (MICH), collected on 360 patients who underwent cardiac procedures in two large urban medical centers in eastern Michigan during January 1994 through April 1995, Polverejan et al (2003)49 report mean charges of 23,545$ and 26,213 (inflated 2000 costs: 26,080$, 28,435$) for patients who underwent CABG with 1 or 2 comorbidities respectively. From 162 a large national sample of Medicare patients who underwent CABG surgery in 1990, Cowper et al (1997)85 report a mean hospital cost of 26,9765 (inflated to 2000 dollars: 33,202$) for patients with AMI undergoing bypass surgery. We observe higher length of stay in NE then in any other region. Similar findings from Centers for Disease Control and Prevention, National Center for Health “'88 are reported: in most years, the average length of stay in the Northeast was Statistics significantly longer than the average stays in the other three regions. NIS data has both strengths and limitations. Because NIS data represent hospital discharges and not individual persons, the main limitation is that there exists the possibility that these discharges will show up as complete cases in the data set as another record. However, restricting our admissions to ER admissions and all routine referrals would decrease this chance considerably. This issue is impossible to resolve completely because of the inability in the NIS to track individual patients. Another limitation of the data set is that it lacks clinical detail (e. g., stage of disease, vital statistics) and laboratory and pharmacy data. The main strength of the NIS is that it is the largest collection of all- payer, uniform, state based inpatient data. 163 TABLE 3.1 Primary procedures Primary procedure“ CCS code: Description N PERCENT .: No PR code 44,003 28.0 44: Coronary artery bypass graft (CABG) 15,290 9.7 45: Percutaneous coronary angioplasty (PTCA) 3,7249 23.7 47: Diagnostic cardiac catheterization, coronary 24,722 15.7 artheriography (CATH) 48: Insertion, revision, replacement, removal of 2,026 1.3 cardiac pacemaker or cardioverter/defibrillator 49: Other O.R. heart procedures 2,908 1.8 63: Other non-O.R. therapeutic cardiovascular 1,714 1.1 procedures 193: Diagnostic ultrasound of heart (echocardiogram) 2,549 1.6 216: Respiratory intubation and mechanical 5,562 3.5 ventilation 222: Blood transfusion 1,826 1.2 231: Other therapeutic procedures 6,350 4.0 *All AMI admissions (N=157,263). Only primary procedures that are present in more than 1% of discharges are shown 164 Figure 3.1 Creating a working data set N=157,263 AMI diagnosis l N=35,999 Other Procedures J. i EXCLUSION CRITERIA“ (N) N=121,264 NO PROCEDURE, CABG, PT CA or CATH l (983) 2 (256) 7 (1,611) 3 (11,283) 8 (20,560) 4 (16,682) 9 (149) After exclusions Y 5 (57) 10 (23,244) N =58,469 6 (7) 11 (10,511) l l 12 (703) CENSORED N=18,753 UNCENSORED N=39,716 *Exclusion criteria above are not mutual exclusive, a discharge may be excluded for more than one reason 165 TABLE 3.2 Descriptive statistics (N=58469) N=58,469 N=39,7l6 (all discharges in the working (only discharges with data set) complete TOTCHG) Variable Subgroup Frequency Percent Frequency Percent Procedure CABG 7 .369 12.6 4.5 86 1 1.6 PTCA 17,901 30.6 16,540 41.7 CATH 14,264 24.4 9,420 23.7 No procedure 18,935 32.4 9,170 23.1 Gender Male 35,977 61.5 25,466 64.1 Female 22,492 38.5 14,250 35.9 NPR (number of None 18.935 32.4 9,170 23.1 procedures) 1 - 4 16,489 28.2 11,915 30.0 2 5 23.045 39.4 18.631 46.9 CCI 0 23,294 39.8 17,505 44.1 1+ 35,175 60.2 22,211 55.9 Region Northeast 1 1,458 19.6 6,432 16.2 Midwest 11,628 19.9 8,441 21.3 South 25,219 43.1 17,344 43.7 West 10,164 17.4 7,499 18.9 Location Rural 7,900 13.5 4,210 10.6 Urban Teaching 25,981 44.4 16,358 41.2 Urban Non-Teaching 24,588 42.1 19,148 48.2 Bedsize Small 5,174 8.9 2,743 6.9 Medium 14,904 25.5 8.992 22.6 Large 38,391 65.7 27.981 70.5 Mean = 5.44 Mean = 5.06 , 25‘h Percentile = 3 25lh Percentile=3 LOS 2 2 Contlnuous Median = 4 STD = 4.1 Median ___ 4 STD = 3.4 751h Percentile = 7 75‘h Percentile =6 Mean = 65.73 Mean = 64.03 . 25‘h Percentile = 56 25‘h Percentile =54 18 S AGE S 85 Contlnuous Median = 67 STD = 12.7 Median = 65 STD = 12.8 75"1 Percentile = 76 75Ih Percentile =75 166 Figure 3.2 . Histogram of Log (TOTCHG) 8 .. N $469 Mean 9.95 : 7 ‘ Std Deviatim 0.82 l4 N Minimum 3.. 5‘ Maximum 13.8 / K 6 ‘ Skewno. -0.13 “ Kurtosis 0.“ K Percent .5 l ‘1‘ fl 0 l l T l l l I l 1 l I I 3.675 4.425 5.175 5.925 6.675 7.425 8.175 6.925 9.675 13.425 11.175 11.925 12.675 Log-transformed of 6118.138 167 Figure 3.3 Graph of Log (TOTCHG) versus LOS 14.4 ., It 13 * air * if“ I! ** 'l' 13"»: t , *Ih . . n4 * * *§ ,.. 1.: :*§* * § * .. " .. ‘3)! up .. 1V2” . . * ‘ * X in .. ... i 1 9E n. “J * *3”! #10! a * § * .,= ~ 1'4. . I a... It 0 1*...4‘ * cg 9.. IF‘ 0 . if “a 8 it; 5 x b 7 * :* ... | * *i* 4* 8’ 6 E gain“! —J I!” 5 * * 44 * 34] l l I 1 T 0 I) 20 30 40 60 Length of stay (LOS) 168 Figure 3.4 Estimation of G: Graphs of the Kaplan-Meier estimates of the cumulative hazard versus the regression model estimates of the cumulative hazard from both log-normal and gamma models 4. Kaplan-Meier Cunulaive Hazard ‘ I r r 0 1 2 3 meal Rog Model leulativa Hazard Kepler-Meier Cumulative Hazard I T I O 1 2 3 Gamma Reg Modal Cumulative Hmrd 169 TABLE 3.3 Parameter estimates in multilevel regression Effect Groups Estimate StdErr tValue P-value Intercept _ 8.400 0.0790 106.36 <.0001 Procedure CABG 2.022 0.0553 36.56 <.0001 PTCA 1.620 0.0424 38.21 <.0001 CATH 1.031 0.0419 24.59 <.0001 LOS _ 0.206 0.0060 34.18 <.0001 LOSsq _ 0.004 0.0003 -1405 <.0001 CCI No comorbidities -0.059 0.0126 -4.68 <.0001 Male _ 0.022 0.0044 4.96 <.0001 AGE _ 0.005 0.0016 2.90 0.0037 Agesq _ 0.000 0.0000 -4.18 <.0001 Number of procedures 0 or Z 5 procedures 0.395 0.0317 12.47 <.0001 Procedure*NPR CABG , Z Sprocedures -0.316 0.0571 -5.53 <.0001 PTCA , Z Sprocedures -0.292 0.0432 -6.75 <.0001 Region Northeast 0.256 0.0405 -6.33 <.0001 Midwest -0.363 0.0346 -1050 <.0001 South 0.172 0.0424 -4.06 <.0001 Location Rural -0.107 0.0358 -2.97 0.0031 Bedsize Small 0126 0.0092 -1371 <.0001 Medium 0.097 0.0067 -1451 <.0001 LOS*Procedure CABG 0.074 0.0059 -12.61 <.0001 PT CA 0.003 0.0004 9.30 <.0001 CATH 0.003 0.0003 9.61 <.0001 LOSsq*Procedure CABG 0.002 0.0003 7.85 <.0001 PTCA 0.141 0.0347 -4.06 <.0001 CATH 0.203 0.0310 -6.53 <.0001 LOS*NPR 0 or Z 5 procedures -0.010 0.0022 -4.55 <.0001 LOS*CCI No comorbidities 0.013 0.0032 4.03 <.0001 LOSsq*CCI No comorbidities 0.000 0.0001 -3.53 0.0004 LOS*Region Northeast -0.008 0.0040 -1.98 0.0472 Midwest 0.007 0.0031 -2.37 0.0179 South 0.012 0.0030 .407 <.0001 170 Effect Groups Estimate StdErr tValue P-value LOS*Location Rural -0.001 0.0036 -0.28 0.7803 LOS*Bedsize Small 0.001 0.0040 0.32 0.7481 Medium 0.002 0.0029 0.70 0.4839 Covariance Parameters Subject Estimate StdErr ZValue P-value UN(1,l) HOSPID 0.1235 0.00837 14.75 <.0001 UN(2,1) HOSPID 0.0024 0.00049 4.93 <.0001 UN(2,2) HOSPID ' 0.0004 0.00004 9.57 <.0001 Residual 0.1384 0.00100 139.03 <.0001 171 Figure 3.5 Estimation of S: Graphs of the Kaplan-Meier estimates of the cumulative hazard versus the regression model estimates of the cumulative hazard from both log-logistic and gamma models 44 i :1: 5 3 § 0 5 I § 9. 5c 0“, I I I 0 1 2 3 Log—Logistics Rog Model Cumulative Hazard “l s .. :2 xx”, g 3 § 24 g ”,x,, I ’1”,, t .. 0‘ I l O 1 2 3 Gamma Rog Model Cumulative Hmrd 172 Table 3.4 Estimates of total charges at median LOS by procedure types, comorbidity levels (0, 1+) and regions“. CABG NE $33,705.24 (i $4,931.65) CATH $12,247.12 (i $1,554.89) (LOS=9.45) (LOS=5.22) MW $33,192.80 (i $3,997.98) $12,111.16 (i $1,214.83) (LOS=7.71) (LOS=4.26) s $34,098.87 (i $3,924.04) $12,507.77 (i $1,173.06) (LOS=7.87) (Los=4.35) ,3. w $44,882.55 (i $5,757.71) $16,278.16 (i $1,786.27) :23 (1.03:7.41) (1.08:4. 10) g PTCA NE $20,657” (,3 32.64669) NONE $7,214.51 (i $896.86) é (LOS=4. 13) (LOS=5-36) MW $20,703.55 (i $2,075.27) $6,973.11 (:1: $672.76) (LOS=3.37) (LOS=4.37) 5 $21,386.47 (i $2,020.57) $7,205.49 (i $641.46) (LOS=3.44) (LOS=4.46) w $27,796.50 (i $3,071.51) $9,339.37 (i $982.97) (LOS=3.24) (LOS=4.20) CABG NE $36,544.54 (1r $5,350.00) CATH $13,356.57 (i $1,687.48) (LOS=11.78) (LOS=6.51) MW $35,770.72 (i $4,332.80) $13,124.59 (i $1,316.81) (LOS=9.61) (LOS=5.31) 8 $36,682.38 (i $4,245.21) $13,532.62 (1 $1,267.17) g» (LOS=9.82) (LOS=5.42) LE w $48,570.85 (i $6,267.93) $17,668.42 (:t $1,943.16) § (LOS=9.24) (LOS=5.11) g PTCA NE $22,232.91 (i $2,858.71) NONE $8,071.18 (it $994.88) g (LOS=S. 15) (LOS=6.68) 2 MW $22,202.56 (i $2,260.66) $7,721.79 (5: $740.39) (LOS=4.20) (LOS=5.45) s $22,923.91 (i $2,199.08) $7,965.56 (i $702.93) (LOS=4.29) (Los=5.56) w $29,858.35 (1' $3,349.71) $10,360.54 (i $1,086.92) (LOS=4.04) (LOS=5.24) *for fixed covariate profile: age=65. gendemmale. hospital location = rural, hospital bed size = medium, and 1-4 procedures (if there is a primary procedure) 173 Figure 3.6 Estimates of total charges at median LOS - no comorbidities* E Total charges + LOS $35000 NEMWS ——> ———> ——><———> (— CABG (— PTCA (— CATH NOPROC 10.00 9.00 8.00 . 7.00 6.00 5.00 4.00 3.00 2.00 1 .00 0.00 LOS ‘Estimates corresponding at median LOS, for covariate profile: age=65, male, rural location, medium bed size, between 1-4 procedures 174 Figure 3.7 Estimates of total charges at median LOS - at least one comorbidity“ C: Total charges -0— LOS LOS $60,000 1400 $50000 w 1200 \’\ 10.00 g; ”T '7 V'“ w 8.00 5 $30,000 _- g MW 8 W 6.00 " $20,000 at l— J NE ...,... UUU UWUU 2... .. UU .... (— CABG 5 (— PTCA a (— CA TH ’ ‘ NOPROC ’ *Estimates corresponding at median LOS, for covariate profile: age=65, male, rural location, medium bed size, between 1-4 procedures 175 CHAPTER 4 CONCLUSIONS AND FUTURE WORK Evidence of rising health care expenditures is widespread. The resurgence in health care spending is fueled by fundamental forces that will prove difficult to resist”. One analyst has predicted that if current trends continue, health care will consume 25 percent of the GDP by the year 203090. The enormous investment in biomedical research will probably accelerate the rate of technological development in medicine, with effects on overall expenses”. Also the proportion of the population 65 years of age or older and the proportion over 80 will increase by 33 percent and 14 percent, respectively, between 2000 and 20209'. N o matter how healthy elderly persons are in the future, as compared with earlier generations, their sheer numbers are almost certain to result in increased expenditures for health care”. Yet another important influence will be increasing national prosperity. The more income people have, the more of it they tend to spend on health care. One of the strongest predictors of the proportion of the GDP that Western nations spend on health care is the GDP itself”. Therefore, substantial increases in health care spending over the next 5 to 10 years are virtually inevitable. The United States spends 3.4 percent more on health care as a percentage of the GDP than any other Western country”, but there is no conclusive evidence that health outcomes are better in the United States than in other industrialized nations”' 93' 94. 176 Because of concerns on the availability of resources to pay for different health care interventions, medical technologies and treatments, CEA has evolved to become an important discipline in the pursuit of maximizing health benefits from a specified expenditure, or in finding the lowest-cost strategy for a specified health effect. Although several advances have been made over the past decade, complexities in analyses of health care cost and outcomes still pose many methodological challenges. Guided by other research in this field 2'9’ I" '5' 30' 33' 34'48'49’ 52'53’ 59'60’95'103, in this thesis we have addressed several statistical issues in the analysis of medical cost data. The objective of Chapter 1 was to demonstrate how costs could be incorporated into a longitudinal model in which a patient’s event history unfolds as sojoums in health states and as transitions between health states. Using a Markov model the counting process methodology was used to estimate the transition probabilities and integrated transition intensities with patient heterogeneity modeled through Cox regression on the underlying transition intensities. There are extensive uses of this model in biomedical applications. Incorporating costs into this framework is new, although the basic idea of calculating present value is ubiquitous in the insurance and finance literature where costs are manifest as fixed amounts paid at random times, and as fixed payment streams over random durations. However, the fundamental difference in our context is the need to estimate both the transition costs and the sojourn costs. Our approach uses a mixed random effects model to estimate costs conditional on time, and then the dynamics of time through transition probabilities and integrated intensities. In this way we obtain estimates of net present for expenditures incurred over a finite time horizon. The asymptotic distribution of these estimates is shown to depend on three stochastic 177 components: the regression parameter in the random effects model for costs, the regression parameter in the Cox regression model for transition intensities and the estimator of the baseline integrated intensity. In Chapter 2, we showed that our transition model described in Chapter 1 captures costs under the simpler two state survival model with a single transition time and sojourn. We discuss parametric methods for estimating the survival distribution for censoring time, and therefore the weights in the IPCW technique, as well as methods of estimation for the survival distribution for event time. In Chapter 3 we provided an application of our method using length of stay (LOS) and inpatient costs for patients hospitalized for AMI from the Nationwide Inpatient Sample 2000 (NIS 2000) of Healthcare Cost and Utilization Project. Our analysis addresses a number of characteristics of these utilization data including censoring of costs, skewness in cost distributions, heterogeneity in LOS and costs across patients. Our method is based on a hierarchical model in which patients are nested within hospitals. Future work To obtain cost-effectiveness measures such the cost-effectiveness ratio (CER) and net health cost (NHC) for competing health care interventions or treatments, we need both accurate estimates of costs and health benefits. In this thesis we have addressed several issues on estimating costs. The three basic measures of health outcomes are survival probability, life-expectancy (LE) and quality adjusted survival years (QALY). 178 To assess the health benefit of an intervention relative to another, we may also use any clinically meaningful measure such as improvement in life expectancy, deaths averted, or number of toxic side effects prevented. For example, in our multi-state description, life expectancy translates to the average time from the time origin until death or some other absorbing state is reached. Since the goal of any health care intervention is much broader than simply treating the disease condition or preventing death, the use of quality-adjusted life years (QALYs) to quantify health outcomes has been advocated15 . QALYs also provide a common-metric to gauge health benefit across disease types. For example, a decision maker facing resource allocation can compare the cost-effectiveness of coronary artery bypass surgery versus a percutaneous coronary intervention, and the cost- effectiveness of different lipid lowering therapies for the prevention of cardiovascular disease, and the cost-effectiveness of different regimens of screening women for their susceptibility to breast cancer. For each unit of time spent in a health state, a quality weight is the relative value placed on that health state against the state of perfect health. Perfect health has a quality weight of one, while death, or states judged equivalent to death, get a quality weight of zero. All other health states receive a quality weight between zero and one. Having determined quality-weights for various health states, QALYs adjusts the length of life for the quality of life during those years. With multiple health states, each state is associated with its sojourn. For example, suppose we have estimated a total life-expectancy of 30 years, which consists of 10 years in a state with quality-weight 0.5, 10 years in a health state with quality weight 0.65, and 10 years in perfect health. This results in an expected 21.5 QALYs (=10x.5+10x.65+10x1). It is worth noting that the two main elements in 179 calculating QALYs, time spent in each health state and its associated utility, can vary across patients. Accounting for the random variation is an important task in any CEA. Suppose the state space E is labeled with ‘k’ being an absorbing state and the remaining states being transient. For example, in cancer treatment studies104 transient states are relapse and remission and death is an absorbing state. Survival time is then the time to absorption defined by 1,, = inf {I > 0: X (t) = k} , with survival distribution conditional on the initial state, S,k(t) = P[rk >t| X0 :1] =1—Bk(0,t). If r1 which has hazard 0’00 = —a0l , and the initial distribution is degenerate, 7:0(0) = 1. Also, the survival distribution is S (t) = P[rl > t], 2'l being the survival time. Without discounting (r =0), LE reduces to restricted mean survival, E (2'1 A T) = ES(t)dt . This entity is often used in economic evaluations when survival time is the principal endpoint. Finally, if we write q(t) = q(0,t) we get QALY= £e_"q(t)S(t)dt. In comparing two competing health care interventions, a test intervention versus a standard, the cost-effectiveness ratio (CER) is the ratio of the incremental cost relative to the incremental benefit. In the patient population for which these interventions are intended, let am and a“, denote respectively, the cost and effectiveness of the test intervention, and, .11” and psebe the corresponding measures for the standard intervention. Then the CER is the parameter 6 = (,um - a“ )/( a“, — ,use ), and having established a maximum value of the CER (90 , a value society is willing to pay to gain one unit of effectiveness by adopting the test intervention, the net health cost is given by NHC( 60 )= (rate _ Ilse) — 60 (lute -Iuse) ' we equate lite and ruse by ‘E601(5lzo)d5(tlzo) ’ where ZO =1 for the new intervention and Z0 = 0 for placebo. We may define effectiveness, a“, and ,use, in terms of the restricted mean survival as E S (t | Z0)dt . Then 181 NHC(00)=£c0|(s|ZO =0)r1$(t|zO =O)—Ec01(s|Zo =1)dS(t|ZO =1)- -60E(S(IIZO =1)-S(t|z0 =O))dt. The methods developed in Chapters 1 and 2 can be applied to the estimation of QALY and NHC. Because our models incorporate covariates, their practical importance lies in the fact that both QALY and NHC can be estimated along specified covariate profiles. However, this requires data on patient health histories and costs over time. With the increased focus being given to controlling health care expenditures, such longitudinal data sets are likely to be compiled from administrative databases, large scale chronological and epidemiological studies. Many recent clinical trials have included economic sub-studies. For example the Antiarrhythmics Versus Implantable Defibrillators (AVID)‘°6 and Canadian Implantable Defibrillator Study (CIDS)'°7 trials yielded useful information on both the clinical effectiveness of the implantable defibrillator as well as its cost—effectiveness. Many researchers have examined issues of statistical power and sample size assessments for cost-effectiveness studiesw'lm’103'log"l3. They show that the requirements on sample size demonstrating cost-effectiveness can be many times larger than that needed to establish effectiveness alone. Therefore the investigators face the ethical dilemma of continuing trial and enlarging them for economic evaluation studies. It is here where our regression techniques could be most informative. Indeed, Simon Thompson recently editorialized in Statistical Methods for Medical Research1 '4, ‘the challenge for the future will be development of regression models for costs and cost-effectiveness that can assess the impact of patient characteristics and cost elements that drive total costs’. Determining in which subgroups 182 of patients an intervention is more cost effective from health care utilization studies and clinical trials would be an active field of research in the coming years. 183 APPENDIX MATHEMATICAL BACKGROUND Concepts of stochastic processes, stochastic integration, functional delta method and results on Ito integration, all used in this thesis, will be briefly reviewed in this appendix. This appendix uses results from Andersen et al (1993)”, Gill (1989)”, Harrison (1985)‘ '5, Oskendal (1995)l '6 and Polverejan (2001)“. Stochastic processes Let (Q,f,P) be a probability space. A filtration ( f; ,t E T ) is an increasing right-continuous family of sub- 0' algebras of f . A stochastic process X is just a time- index collection of random variables {X (t) : t E T}. The process X is called adapted to the filtration f; if X (t) is f; -measurable for each I. We write X 0,01) for the realized value of X(t) at the point ate 9. The process X is called cadlag if its sample paths { X (t,a)) : t E ’7' }, for almost all a) , are right-continuous with left-hand limits. The set of cadlag functions is denoted by D( ’I' ), the Skorohod space of weak convergence theory. For two stochastic processes X and Y, JXdY denotes the stochastic process t—) £X(s)dY(s) defined for each a) andtsuch that £|X(s)||dY(s)| [0| dY(s)| is called its total variation process. To a cadlag process X, we associate its left-continuous modification X 7 , defined by X' (t) = X(t—) and its jump process defined by AX(t) = X (t) — X (t—). Martingale, predictable processes, compensators A martingale is a cadlag adapted process M which is integrable, i.e. E(| M (t) |) < 0° for all 16 T and satisfies the martingale property E(M (t) | f; ) = M (s) for all 5 St . The process is called sub(super)-martingale if the equality is replaced by E(M (t) | .73 ) 2 M (s) (E(M (t) | f; ) S M (s) respectively). A martingale is called square integrable if sup E (M (t)2 ) < 00. A local martingale is a process M such that an 167 increasing sequence of stopping times Tn exists, P(T" 2 t) —>l as n —-> 0° for all 16 T, such that the stopped process {T,, > 0]M T" are martingales for each n (here X T (t) = X (T A t) ). A local square integrable martingale is a process M as above such that the localizing sequence can be chosen making {T,, > 0]M T" a square integrable martingale. A class of processes complementary to martingales is the class of predictable processes. A stochastic process H is called predictable if, as a function of (t,a))e T xQ , 185 it is measurable with respect to the 0' — algebra on T x (2 generated by the left- continuous adapted processes. Any left-continuous adapted process is predictable. There is an important orthogonality between martingales and finite variation predictable processes. This is due to the fact that if a process is at the same time both a local martingale and a predictable finite variation process, then it is trivial or constant process. Suppose that X is a cadlag adapted process. We say that X is the compensator of X if X is a predictable, cadlag, and finite variation process such that X — X is a local martingale, zero at time 0. If a compensator exists it is unique, by Doob-Meyer decomposition theorem. A semi-martingale is the sum of a local martingale and a cadlag adapted finite variation process. All (local) sub-martingales and super-martingales have compensators. In particular, non-decreasing, non—negative locally integrable cadlag processes have compensators, because such processes are sub-martingales. Suppose M, N are local square integrable martingales. The square of a local square integrable martingale is a local sub-martingale and also has a non-decreasing compensator denoted by < M >. Also the product, MN = i {(M + N )2 — (M — N )2 } , is the difference of two local sub-martingales, therefore it has a compensator denoted by (M , N). By definition, < M > and (M , N) are the unique finite variation cadlag predictable process such that, M 2 — < M > and MN - (M , N) are local martingales, zero at time 0. The process < M > is called the predictable variation process of M, and (M , N) the predictable covariation process of M and N. 186 Theorem (see Theorem H.3.1, Andersen et a1 (1993)”) Suppose M is a finite variation local square integrable martingale, H is a predictable process, and [H 2d < M > is just locally finite (automatically true of H is locally bounded). Then IHdM is a local square integrable martingale, and < J'HdM >= IH2d. The predictable process H is locally bounded if it is left continuous and also if it is right-continuous. Formulas for predictable and optional covariation processes of stochastic integrals follow the same form. We have < deM. jKdN >= jHKd < M , N >. Fubini-type Theorem Let (s,u) —> H(s,u) , (s,u)e (TXT ) be a bounded, Bxfi measurable function, where 2? is the set of Borelians on T . Let a be a finite measure on the space (T,B). Then for every te T , A6 3: L1 £H(s,u)dM (u)],u(ds) = £[ LH(s,u)d,u(ds)]dM (u) for almost all we 52. For more general versions of this theorem and their proofs see Protter (1990)l 17. Functional Delta-Method A concept of differentiability that allows generalization of the usual Delta Method is the one of Hadamard or compact differentiability. Let B1 and 32 two normed vector spaces. 187 The functional 0:81 ——> 82 is compactly or Hadamard differentiable at a point 66 B1 if and only if a continuous linear map dgoz Bl ——> 82 exists, such that for all real sequences an -—> co and all convergent sequences h" —> he Bl , a" (¢(e+a;'h,, )-¢)(t9)) 4 d(o(6).h as n —> oo. Here d¢(6) is called the derivative of (a at the point 19. Next we define the concept of weak convergence in normed vector spaces. Let (B,|| . ll) be a nonned vector space endowed with a 0' - a1 gebraQ? , such that 23' g 23 g 23", where 23' and 6' are the 0' — algebras generated by the open balls and the open sets of B, respectively. Thus B" is the Borel 0'— algebra; when B is separable E" = 9". Let X n be a sequence of random elements of (B, E) and let X be another random element of that space. We say X n converges weakly (or in distribution) to X and we write X" —D—>X if and only if E(f(X,, ))-—D—>Ef(X) for all bounded, norm- continuous, E -measurable f : B —> R. Theorem (see Theorem 3, Gill (1989)”) Suppose (0: Bl —> B2 is compactly or Hadamard differentiable at a point ,ue B] and both it and its derivative are measurable with respect to the 0' — algebras 231 and $2 (each nested between the open ball and Borel 0' — algebras). Suppose X n is a sequence of elements of B1 such that Z, = J; (X n - ,u) —P—> Z in B1 , where the distribution of Z 188 is concentrated on a separable subset of B1 . Suppose addition: 82 x 82 —> B2 is measurable. Then (1) lJZrX. -#),JZ(¢(X,, )—¢ (2.0) in B1 >o and (3) JZMX. >000) —3—+ 4000.2. Measurability of drp: Bl —> 32 can often be shown to follow from measurability of (a (see Lemmas 4.4.3 and 4.4.4, Van der Vaart (1988)“). The following is a useful lemma. In many applications the mapping (a is only a priori defined on certain members of B1 and one could set about choosing a particular extension to all of Bl such that the hypotheses of the Functional Delta Method are satisfied in each particular application. Lemma (see Lemma 1, Gill (1989)”) Consider xe E C B1 and (0: Bl —> Bz. Suppose there exists a continuous linear map drp(1r):B1 —> B2 such that for all tn —9 0 (tn 6 R) and h" -—> he Bl such that x" = x+tnhn E E for all n, we have: r;‘ (¢(x+ tnhn )—¢(x)) 4 d(0(x).h as n —> oo. Then (a can be extended to B1 in such a way that it is differentiable at x , with derivative d¢(x). The derivative is unique if the closed linear span of possible limit points h equals 3,. 189 Suppose we endow D[0,z'] with I] . IL. , the supremum norm , and (D[0,2'])P with the maximum-supremum norm (i.e. for x6 (D[0,r])p, || x||= max(|| xl IL, ,...|] xp “w)). Under this norm (D[0,T])” is Banach, but not separable. However under the Skorohod topology, these spaces are Banach and separable. In these spaces if the limiting process has continuous sample paths then weak convergence in the sense of the Skorohod metric and in the sense of the supremum norm are exactly equivalent. Otherwise, supremum norm convergence is stronger. Results on Ito integration Let (Q,f,P) be a probability space with a filtration ( f: ,tE T), where .76 contains all null sets of f . A continuous k-dimensional vector martingale M = (M (t),te T), T =[0,2'),1'E R is called Gaussian if: i) < M > is a continuous deterministic k xk positive semidefinite matrix valued function on T , with positive definite increments, zero at time 0. ii) M (t) — M (s) has a multivariate normal distribution with zero mean and covariance matrix V(t)—V(s) and is independent of (M (u),u S s) for all 0 S s St in T . Let 77/2 be the set of all adapted processes X on ((52.17 ,P), (f; ,te T )) satisfying E( I; X2 (s) < M (ds) >) < oo for all t6 T and let M be a continuous Gaussian martingale on this space. For any fixed t, one can define the Ito integral I , (X ) = EXdM for X6 712. 190 Properties of the Ito integral: Let X,Y€ 7f: andlet OSs. 191 10. BIBLIOGRAPHY Heffler S, Smith S, Keehan S, Clemens M, Zezza M, Truffer C. Health Spending Projections Through 2013. Health Affairs. 2004. Willan AR, Chen EB, Cook RJ, Lin DY. Incremental net benefit in randomized clinical trials with quality-adjusted survival. Statistics in Medicine. Feb 15 2003;22(3):353-362. Willan AR, Lin DY, Cook RJ, Chen EB. Using inverse-weighting in cost- effectiveness analysis with censored data. Statistical Methods in Medical Research. Dec 2002;11(6):539-551. Lin DY. Linear regression of censored medical costs. Biostatistics. 2000;1:35-47. Lin DY. Regression analysis of incomplete medical cost data. Statistics in Medicine. Apr 15 2003;22(7):1181-1200. Gardiner J C, Bradley CJ, Huebner M. The cost-effectiveness ratio in the analysis of health care programs. In: Rao CR, Sen PK, eds. Handbook of Statistics, Bioenvironmental and Public Health Statistics. Vol 18. New York: North- Holland; 2000:841-869. Gardiner J C, Indurkhya A, Luo Z. The performance of estimation procedures for cost-effectiveness ratios. In: Balakrishnan N, ed. Advances on Methodological and Applied Aspects of Probability and Statistics. New York: Gordan Breach; 2001:547-559. Gardiner J, Holmes-Rovner M, Goddeeris J, Rovner D, Kupersrnith J. Covariate- adjusted cost-effectiveness ratios. Joumal of Statistical Planning and Inference. 1999;75:291-304. Baser 0, Bradley CJ, Gardiner J C. Estimation from censored medical costs. In Review. 2002. Baser 0, Bradley CJ, Gardiner JC, Given CC. Testing and correcting for non- random selection bias due to censoring: An application to medical costs. Health Services and Outcomes Research Methodology. 2003;4(2):93-107. 192 11. 12. l3. 14. 15. 16. 17. 18. 19. 20. 21. 22. Baser O, Gardiner JC, Bradley CJ, Yuce H, Given CC. Longitudinal analysis of censored medical costs data. In Review. 2002. Commonwealth of Australia: Guidelines for the pharmaceutical industry on preparation of submissions to the Pharmaceutical Benefits Advisory Committee: including major submissions involving economic analyses. 1995. Located at: Canberra: Australian Government Publishing Service. National Institute for Clinical Excellence. Technology appraisals work programme, new guidance documents. 2001. ‘ Canadian Coordinating Office for Health Technology Assessment. Guidelines for economic evaluation of pharmaceuticals:. Canada. 2nd ed. Ottawa: Canadian Coordinating Office for Health Technology Assessment ( C COHTA ). 1997. Gold MR, Siege] JE, Russell LB, Weinstein MC, eds. Cost-Effectiveness in Health and Medicine. New York: Oxford University Press; 1996. Stinnett AA, Mullahy J. Net Health Benefits: A new framework for the analysis of uncertainty in cost-effectiveness analysis. Medical Decision Making. 1998;18(Supplemental):S68-S80. Manning WG, Mullahy J. Estimating log models: to transfrom or not to trasfonn. Journal of Health Economics. 2001;20:461-494. Manning WG. The logged dependent variable, heteroscedasticity, and the retransforrnation problem. Journal of Health Economics. 1998;17:283-295. Duan N. Smealin g estimate: A nonparametric retransforrnation method. Joumal of the American Statistical Association. 1983;78(383):605-610. Ai C, Norton EC. Standard errors for the retransforrnation problem with heteroscedasticity. Journal of Health Economics. 2000;19:697-718. Mullahy J. Much ado about two: Reconsidering retransformation and the two- part model in health econometrics. Journal of Health Economics. 1998;17:247- 28 1. Lucas JW SJ, Benson V. Summary health statistics for U.S. Adults: National Health Interview Survey 2001. Washington, DC: Centers for Disease Control and Prevention, National Center for Health Statistics; 2004. Library of Congress Catalog Number 362. 193 23. 24. 25. 26. 30. 31. 33. 34. 35. Hay JW, Olsen RJ. Let them eat cake: A note comparing alternative models of the demand for medical care. Journal of Business & Statistics. l984;2(3):279-282. Duan N, Willard G. Manning J, Morris CN, Newhouse JP. Choosing between the sample-selection model and the multi-part model. Journal of Business & Economic Statistics. l984;2(3):283-289. Andersen PK, Borgan O, Gill RD, Keiding N. Statistical Models Based on Counting Processes. New York: Springer-Verlag; 1993. Praestgaard J. Nonparametric estimation of actuarial values. Scandinavian Actuarial Journal. 1991;2:129-143. Norberg R. Ruin problems with assets and liabilities of diffusion type. Stochastic Processes and Their Applications. Jun 1 1999;81(2):255-269. Norberg R. Differential-Equations for Moments of Present Values in Life- Insurance. Insurance Mathematics & Economics. Oct 1995;17(2):171-180. Norberg R. A Time-Continuous Markov-Chain Interest Model with Applications to Insurance. Applied Stochastic Models and Data Analysis. Sep 1995;11(3):245- 256. Lin DY, Feuer EJ, Etzioni R, Wax Y. Estimating medical costs from incomplete follow-up data. Biometrics. 1997;53:419—434. Bang H, Tsiatis AA. Estimating medical costs with censored data. Biometrika. 2000;87(2):329-343. Strawderrnan RL. Estimating the mean of an increasing stochastic process at a censored stopping time. Journal of the American Statistical Association. Dec 2000;95(452):1192-1208. Wooldridge JM. Inverse Probability Weighted Estimation for General Missing Data Problems. Working Paper Michigan State University. April 2003. Wooldridge J M. Econometric Analysis of C ross Section and Panel Data. Cambridge, MA: MIT Press; 2002. Zhou XH, Tu WZ. Interval estimation for the ratio in means of log-normally distributed medical costs with zero values. Computational Statistics & Data Analysis. Dec 28 2000;35(2):201-210. 194 36. 37. 38. 39. 40. 41. 42. 43. 45. Zhou XH, Tu WZ. Confidence intervals for the mean of diagnostic test charge data containing zeros. Biometrics. Dec 2000;56(4):1118-1125. Zhou XH, Stroupe KT, Tierney WM. Regression analysis of health care charges with heteroscedasticity. Joumal of the Royal Statistical Society Series C-Applied Statistics. 2001;50:303-312. Zhou XH, Melfi CA, Hui SL. Methods for comparison of cost data. Annals of Internal Medicine. 1997;127(8):752-756. Zhou XH, Li CM, Gao SJ, Tierney WM. Methods for testing equality of means of health care costs in a paired design study. Statistics in Medicine. Jun 15 2001;20(11):1703-1720. Zhou XH. Inferences about population means of health care costs. Statistical Methods in Medical Research. Aug 2002;11(4):327-339. Zhou XH, Gao SJ. One-sided confidence intervals for means of positively skewed distributions. American Statistician. May 2000;54(2): 100-104. Maynard C, Chapko MK, Every NR, Martin DC, Ritchie JL, Department of Medicine DoVASW. Coronary angioplasty outcomes in the Healthcare Cost and Utilization Project, 1993-1994. The American journal of cardiology. Apr 1 1998;81(7):848-852. Ritchie JL, Maynard C, Chapko MK, Every NR, Martin DC, Department of Medicine DoVASWUSAruwe. Association between percutaneous transluminal coronary angioplasty volumes and outcomes in the Healthcare Cost and Utilization Project 1993-1994. The American journal of cardiology. Feb 15 1999;83(4):493-497. Brooks JM, McClellan M, Wong HS. The marginal benefits of invasive treatments for acute myocardial infarction: does insurance coverage matter? Inquiry : a journal of medical care organization, provision and financing. Spring 2000;37(1):75-90. Brooks SE, Ahn J, Mullins CD, Baquet CR, D'Andrea A. Health care cost and utilization project analysis of comorbid illness and complications for patients undergoing hysterectomy for endometrial carcinoma. Cancer. Aug 15 2001;92(4):950-958. 195 46. 47. 48. 49. 50. 51. 52. 53. 54. 55. 56. 57. Howard TS, Hoffman LH, Stang PE, Simoes EA, Galt Associates SVUSA. Respiratory syncytial virus pneumonia in the hospital setting: length of stay, charges, and mortality. The Journal of pediatrics. Aug 2000;137(2):227-232. Watanabe CT, Maynard C, Ritchie J L, Department of Medicine UoWSWUSA. Short-term outcomes after percutaneous coronary intervention: effects of stenting and institutional volume shifts. American heart journal. Aug 2002;144(2):310- 314. Polverejan E. Regression Models for Analysis of Medical Costs [PhD]. East Lansing: Statistics & Probability, Michigan State University; 2001. Polverejan E, Gardiner JC, Bradley CJ, Holmes-Rovner M, Rovner D, Department of Epidemiology MSUELMIUSA. Estimating mean hospital cost as a function of length of stay and patient characteristics. Health economics. Nov 2003;12(11):935-947. Gardiner JC, Luo Z, Bradley CJ, Polverejan E, Holmes-Rovner M, Rovner D. Longitudinal assessment of Cost in Health Care Interventions. Health Services &0utcomes Research Methodology. 2002;3:149-168. Rice N, Jones A. Multilevel models and health economics. Health Economics. Nov-Dec 1997;6(6):56l-575. Carey K. Hospital Length of Stay and Cost: A Multilevel Modelling Analysis. Health Services &0utcomes Research Methodology. 2002;3:41-56. Carey K. A multilevel modelling approach to analysis of patient costs under managed care. Health Economics. Jul 2000;9(5):435-446. Goldstein H, Browne W, Rasbash J. Multilevel modelling of medical data. Statistics in medicine. Nov 15 2002;21(21):3291-3315. Jacobsen M. Statistical Analysis of Counting Processes. New York: Springer- Verlag; 1982. Arijas E, Haara P. A marked point process approach to censored failure time data with complicated covariates. Scandinavian Journal of Statistics. 1984;11:193- 209. Andersen PK, Gill RD. Cox's regression model for counting processes: A large sample study. Annals of Statistics. 1982; 10(4): 1 100-1 120. 196 58. 59. 60. 61. 62. 63. 64. 65. 66. 67. 68. 69. Gill RD. Non-and semi parametric maximum likelihood estimators and the von Mises method (Part I). Scandinavian Journal of Statistics. 1989;16:97-128. Etzioni RD, Feuer EJ, Sullivan SD, Lin D, Hu C, Ramsey SD. On the use of survival analysis techniques to estimate medical care costs. Journal of Health Economics. 1999;18:365-380. Huang YJ, Louis TA. Nonparametric estimation of the joint distribution of survival time and mark variables. Biometrika. Dec 1998;85(4):785-798. Vaart AWvd. Asymptotic Statistics. Cambridge University Press. December 1998 1998, 460 pages. Newey WK, McFadden D. Large sample estimation and hypothesis. Handbook of econometrics. 1994;1Vz211 1-2245. Wooldridge J M. Econometric Analysis of C ross Section and Panel Data. Cambridge, MA: MIT Press; 2001. Volpp KG, Williams SV, Waldfogel J, et al. Market reform in New Jersey and the effect on mortality from acute myocardial infarction. Health services research. Apr 2003;38(2):515-533. National Heart Lung and Blood Institute. Facts About Coromary Heart Disease. http://www.nhlbi.nih.gov/health/public/heart/other/chdfactthtm. American Heart Association. Heart Disease and Stroke Statistics -- 2004 Update. Dallas, Texas 2004. Smith SCJ, Dove JT, Jacobs AK, et al. ACC/AHA guidelines of percutaneous coronary interventions (revision of the 1993 PT CA guidelines)--executive summary. A report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines (committee to revise the 1993 guidelines for percutaneous transluminal coronary angioplasty). J Am Coll Cardiol. 2001;37(8):2215-2239. American Heart Association. Angioplasty, Percutaneous Translunrinal Coronary (PTCA). http://www.americanheart.org/presenter.jhtml?identifier=4454. Mitka M. Drug-Eluting Stents Show Promise. JAMA. Feb 11, 2004 2004;291(6):682. 197 70. 71. 72. 73. 74. 75. 76. 77. 78. 79. 80. 81. Wisconsin Department of Health and Family Services. Why Charges May Difi‘er Between Facilities: Division of Health Care Financing, Bureau of Health Information; 2001. Health Care Data Report, 2000 (PHC 5320). Charlson ME, Pompei P, Ales KL, Mackenzie CR. A new method of classifying prognostic comorbidity in longitudinal studies: development and validation. Journal of Chronic Diseases. 1987;5:373-383. Deyo RA, Cherkin DC, Ciol MA. Adapting a clinical comorbidity index for use with ICD-9-CM administrative data. Journal of Clinical Epidemiology. 1992;45:613-619. Romano PS, Ross LL, Jollis JG. Adapting a clinical comorbidity index for use with ICD-9-CM administrative data: differing perspectives. Journal of Clinical Epidemiology. 1993;46:1075-1079. Ghali WA, Hall RE, Rosen AK, Ash AS, Moskowitz MA. Searching for an improved clinical comorbidity index for use with ICD-9-CM administrative data. J Clin Epidemiol. Mar 1996;49(3):273-278. Southern DA, Quan H, Ghali WA. Comparison of the Elixhauser and Charlson/Deyo methods of comorbidity measurement in administrative data. Med Care. Apr 2004;42(4):355-360. Quan H, Parsons GA, Ghali WA. Validity of information on comorbidity derived rom ICD-9-CCM administrative data. Med Care. Aug 2002;40(8):675-685. Taira DA, Seto TB, Siegrist R, Cosgrove R, Berezin R, Cohen DJ. Comparison of analytic approaches for the economic evaluation of new technologies alongside multicenter clinical trials. American Heart Journal. 2003/3 2003;145(3):452-458. White H. A Heteroskedasticity-Consistent Covariance-Matrix Estimator and a Direct Test for Heteroskedasticity. Econometrica. 1980;48(4):817-838. Liang KY, Zeger SL. Longitudinal Data-Analysis Using Generalized Linear- Models. Biometrika. Apr 1986;73(1):13-22. Diggle PJ, Liang KY, Zeger SL. The Analysis of Longitudinal Data. Oxford, UK: Oxford University Press; 1994. Weintraub WS, Mauldin PD, Becker E, Kosinski AS, King SB, 111. A Comparison of the Costs of and Quality of Life After Coronary Angioplasty or Coronary Surgery for Multivessel Coronary Artery Disease : Results From the Emory 198 82. 83. 84. 85. 86. 87. 88. 89. 90. 91. 92. Angioplasty Versus Surgery Trial (EAST). Circulation. November 15, 1995 1995;92(10):2831-2840. Faxon D. Myocardial Revascularization in 1997: Angioplasty Versus Bypass Surgery. American Family Physician. 1997;56(5): 1409-1420. Hlatky MA, Rogers WJ, Johnstone I, et al. Medical Care Costs and Quality of Life after Randomization to Coronary Angioplasty or Coronary Bypass Surgery. N Engl J Med. January 9, 1997 1997;336(2):92-99. Sculpher MJ, Buxton MJ, Seed P, et al. Health service costs of coronary angioplasty and coronary artery bypass surgery: the Randomised Intervention Treatment of Angina (RITA) trial. The Lancet. 1994/10/1 1994;344(8927):927- 930. Cowper PA, DeLong ER, Peterson ED, et al. Geographic variation in resource use for coronary artery bypass surgery. Medical Care. 1997;35(4):320-333. Kozak L, Hall M, Owings M. National Hospital Discharge Survey: 2000 Annual Summary with detailed diagnosis and procedure data. National Center for Health Statistics. Vital Health Statistics. 2002;13(153). Pokras R, Kozak L, McCarthy E, Graves E. Trends in Hospital Utilization: United States 1965-1986. National Center for Health Statistics, Vital Health Statistics. 1989;13(101). Gillum BS GE, Kozak LJ. Trends in Hospital Utilization: United States, 1988—92. National Center for Health Statistics, Vital Health Statistics. 1996;13(124). Blumenthal D. Controlling Health Care Expenditures. N Engl J Med. March 8, 2001 2001;344(10):766-769. Morris CR. Too much of a good thing? Why health care spending won't make us sick. New York: Century Foundation Press; 2000. Anderson GF, Hussey PS. Population aging: a comparison among industrialized countries. Health Afi(Millwood). May-Jun 2000;19(3):191-203. ' Organisation for Economic Co-operation and Development. OECD Health Data 2004. 2004. 199 93. 94. 95. 96. 97. 98. 99. 100. 101. 102. 103. 104. Anderson GF, Hurst J, Hussey PS, Jee-Hughes M. Health spending and outcomes: Trends in OECD countries, 1960-1998. Health Affairs. May-Jun 2000;19(3):150- 157. World Health Organization. Health systems: improving performance. Geneva 2000. Gardiner J, Hogan A, Holmes-Rovner M, Rovner D, Griffith L, Kupersmith J. Confidence intervals for cost-effectiveness ratios. Medical Decision Making. 1995;15:254-263. ' Gardiner J, Holmes-Rovner M, Rovner D, Goddeeris J, Kupersmith J. Cost- effectiveness ratios in multi-state Markov models. Paper presented at: Biometrics Society, 1996; Richmond, VA. Gardiner J C. The asymptotic distribution of mortality rates in competing risks analyses. Scandinavian Jouranl of Statistics. 1982;9:31-36. Gardiner JC. Properties of occurrence-exposure rates in multiple decrement theory. Journal of Statistical Planning & Inference. 1983;81301-314. Gardiner J C, Huebner M, J etton J, Bradley CJ. Power assessments and confidence intervals for cost-effectiveness analyses. Medical Decision Making. 1998;18(4):465. Gardiner J C, Huebner M, J etton J, Bradley CJ. Power and sample size assessments for tests of hypotheses on cost-effectiveness ratios. Health Economics. 2000. Lin DY. Proportional means regression for censored medical costs. Biometrics. 2000;56:775-778. Willan AR, Lin DY. Incremental net benefit in randomized clinical trials. Statistics in Medicine. Jun 15 2001;20(1 1): 1563-1574. Willan AR. Analysis, sample size, and fewer for estimating incremental net health benefit from clinical trial data. Controlled Clinical Trials. Jun 2001;22(3):228- 237. Perez-Ocon R, Ruiz-Castro JE, Gamiz-Rerez ML. A multivariate model to measure the effect of treatments in survival to breast cancer. Biometrical Journal. 1998;40:703-715. 200 105. 106. 107. 108. 109. 110. 111. 112. 113. 114. 115. 116. 117. Glasziou PP, Simes RJ, Gelber RD. Quality adjusted survival analysis. Statistics in Medicine. 19909: 1259-1276. Antiarrhythmics Versus Implantable Defibrillators (AVID)--rationale, design, and methods. Am J Cardiol. Mar 1 1995;75(7):470-475. Connolly S, Gent M, Roberts R. Canadian Implantable Defibrillator Study (CIDS). A randomized trial of the implantable cardioverter defibrillator against amiodarone. Circulation. 2000;101:1297-1302. Gardiner CJ, Sirbu MC, Rahbar M. Update on statistical power and sample size assessments for cost-effectiveness studies (Review). Expert Rev. Pharmacoeconomics Outcomes Res. 2004;4(1):89-98. Briggs AH, Gray AM. Power and sample size calculations for stochastic cost- effectiveness analysis. Medical Decision Making. 1998;18(supplemental):881- S92. Sacristan J A, Day SJ, Navarro 0, Ramos J, Hernandez JM. Use of confidence intervals and sample size calculations in health economic studies. The Annals of Phannacotherapy. 1995;29:719-725. Laska EM, Meisner M, Siegel C. Power and sample size in cost-effectiveness analysis. Med Decis Making. 1999;19:339-343. Willan AR, Obrien BJ. Confidence intervals for cost-effectiveness ratios: An application of Fieller's theorem. Health Economics. J ul-aug 1996;5(4):297-305. Willan AR, O'Brien BJ. Sample size and power issues in estimating incremental cost-effectiveness ratios from clinical trials data. Health Economics. 1999;8z203- 211. Thompson S. Statistical issues in cost-effectiveness analyses. Stat Methods Med Res. Dec 2002;11(6):453-454. Harrison JM. Brownian motion and stochastic flow systems. New York: John Wiley; 1985. Oskendal B. Stochastic differential equations. New York: Springer-Verlag; 1995. Protter P. Stochastic integration and differential equations. New-York: Springer- Verlag; 1990. 201 G illillllllliililglliliill llllljilllll