{vi-iii

324%,...”

nu’

2. d)
.2 My: .
4 31?. . . .—

 

.
p. 4.. -

vu... .3”

3 ..

.l

m
.
M.
w.

I‘

 

mm

3;. i
C ". wt
7 '1\
(r«)__3t:’l7/L/(J(0
I

This is to certify that the
dissertation entitled

ASSESSING MEDICAL COSTS FROM A LONGITUDINAL
MODEL

presented by
CORINA MIHAELA SIRBU

has been accepted towards fulﬁllment
of the requirements for the

Doctoral degree in Statistics and Probability

 

 

Ma'or Proﬁsor’s Signature

Q ICMM
/

27 2001/

Date

 

/

MSU is an Afﬁrmative Action/Equal Opportunity Institution

 

_ LIBRARY
Michigan State
University

 

 

 

PLACE IN RETURN BOX to remove this checkout from your record.
y TO AVOID FINES return on or before date due.
MAY BE RECALLED with earlier due date if requested.

 

DATE DUE DATE DUE DATE DUE

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

6/01 cJClRC/Dateouepes-ols

 

ASSESSING MEDICAL COSTS FROM A LONGITUDINAL MODEL
By

Corina Mihacla Sirbu

A DISSERTATION

Submitted to
Michigan State University
in partial fulﬁllment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY
Department of Statistics and Probability

2004

ABSTRACT
ASSESSING MEDICAL COSTS FROM A LONGITUDINAL MODEL
By

Corina Mihacla Sirbu

The United States spends a larger share of its gross domestic product (GDP) on
health care than any other major industrialized country. Expenditures for health care
represent nearly one-seventh of the Nation's GDP, and they continue to be one of the
fastest growing components of the Federal budget. In 1960, for example, health care
expenditures accounted for about 5 percent of the GDP; by 2000, that ﬁgure had grown to
more than 13 percent. Although the rate of growth in health care costs slowed somewhat
in the mid-19908, it has once again started to rise at' a rate that exceeds other sectors of
the economy. Thus, identifying methods to accurate estimate health care costs continues
to be a priority for policymakers and public and private payers.

In medical follow up studies incomplete observation due to censoring would
preclude ascertainment of outcomes in some subjects. Standard assumptions used in
survival analysis do not apply to medical costs because the cumulative cost at the
endpoint of interest will generally be correlated with the cumulative cost at the time of
censoring.

We use a dynamic regression model in which costs are incurred in random
amounts at transition times between and during sojourn in health states. A Markov model

describes the unfolding over time of individual patient event histories, with transition

intensities depending on patient speciﬁc demographic and clinical characteristics through
a multiplicative intensity model. A random effects model is used for transition and
sojourn costs. We then estimate the net present value of expenditures incurred over a
ﬁnite time horizon. While incorporating explanatory variables, the joint model can
accommodate heteroscedasticity, skewness and censoring in cost and health outcome data
and provides a ﬂexible approach to analyses of health care costs and outcomes.

Our transition model can be viewed as an extension of the simpler two state

model, case in which we obtain and revise already developed techniques for regression

'7

analysis of medical costs with the focus being on estimation in the presence of time
censoring that might result in incomplete costs data on some patients. Using the 2000
Nationwide Inpatient Sample data set of Health Care Utilization Project we focus on
estimating costs for patients admitted in the hospital with acute myocardial infarction
(AMI), a common high-mortality condition whose outcomes are affected by the process
of care.

Our methods provide ﬂexible approaches to estimating medical costs. Estimates
from cost studies are not only needed to determine the economic burden of disease, to
predict the economic consequences of new medical interventions, but also for
comparative purposes such as cost-effectiveness analysis. Other possible extensions of

our methods are in this area.

Cepyright by
CORINA MIHAELA SIRBU

2004

To my husband, parents and brother for all his love and support!

ACKNOWLEDGEMENTS

I would like to thank my advisor, Dr. Joseph C. Gardiner, for having the patience
to allow me to pursue my own interests, with the occasional push when required to keep
me on track. I felt very privileged to have his guidance and support during my graduate
studies. I also want to express my thanks to the other committee members for their
friendly support. My research was supported by the Agency for healthcare Research &
Quality, under Grant 1R01HSO9514.

I thank my family for always believing in me. My parents, Rodica and Ion Radut,
my brother, Radu have continually offered me support and encouragement. I also want to
thank my ﬁ‘iends at Michigan State University for always helping me to turn worries into
laughs.

Last and most of all, I thank my dear husband, George. This last year was trying
for both of us, but his unwavering support and encouragement certainly made my

struggles more bearable and I am forever grateﬁrl to him.

vi

TABLE OF CONTENTS

LIST OF TABLES .....................................................................................................
LIST OF FIGURES ..... ' ..............................................................................................

ABBREVIATIONS ...................................................................................................

INTRODUCTION .....................................................................................................

CHAPTER 1
ESTIMATING MEDICAL COSTS FROM A TRANSITION MODEL ..................
1.1 A Markov model for describing patient health histories ............................

1.1.1 Marked point processes ...................................................................
1 .1 .2 Incorporating censoring ...................................................................
1.2 Incorporating costs in the Markov model ...................................................
1.3 Estimation of transition probabilities ..........................................................
1.4 Estimation of average transition costs ........................................................
1.4.1 Modiﬁcation for censoring ..............................................................
1.4.2 Consistency and asymptotic normality of ,3“, .................................
1.4.3 Computation of AW and Bw ............................................................
1.4.4 Estimation of G(- | z) ......................................................................
1.5 Asymptotic distribution of the mean transition cost ...................................
CHAPTER 2

ESTIMATING MEAN COST FOR THE TWO-STATE CASE ..............................
2.1 Introduction and Background .....................................................................

2.2 Applying general transition model methods to the 2-state model ...............
2.2.1 Mean cost in the minimal cost data case .........................................
2.2.2 Mean cost in the interval cost data case ..........................................
2.3 Asymptotic normality of the mean cost ......................................................
2.3.1 Regression model for log-cost ........................................................

2.4 Parametric estimation of the survival distribution for censoring time .........

vii

ix

xi

13
14

18
20
22
31
51
53
56
57
58
61

78
79
86
87

97
101
110
114

2.4.1 Minimal cost data ............................................................................ 114

2.4.2 Interval cost data ............................................................................. 121
CHAPTER 3 130
ESTIMATING HOSPITAL COST FOR AMI PATIENTS IN THE N18 2000 ........

3.1 Description of the data set and background ................................................. 130
3.2 Creating a working subset data set ............................................................... 144
3.3 Estimation methods ...................................................................................... 148
3.3.1 Unconditional means model .............................................................. 149
3.3.2 Including effects of hospital level (level 2) correlates ....................... 150
3.3.3 Including effects of discharge level (level 1) correlates .................... 151
3.3.4 Including both effects of discharge level (level 1) and hospital
1 53
level (level 2) correlates .....................................................................
3.3.5 Inverse probability weighting ............................................................ 155
3.4 Results of estimation ..................................................................................... 159
CHAPTER 4
CONCLUSIONS AND FUTURE WORK ................................................................ 176
APPENDIX
Mathematical background .......................................................................................... 1 84
BIBLIOGRAPHY ...................................................................................................... 192

viii

Table 2.1

Table 3.1

Table 3.2

Table 3.3

Table 3.4

LIST OF TABLES

Estimating the survival distribution for censoring time ............................... 92
Primary procedures .................................................................................... 164
Descriptive statistics (N =58469) ............................................................... 166
Parameter estimates in multilevel regression ............................................. 170

Estimates of total charges at median LOS by procedure types, comorbidity

levels (0, 1+) and regions. .......................................................................... 173

ix

Figure 1.1
Figure 3.1
Figure 3.2
Figure 3.3

Figure 3.4

Figure 3.5

Figure 3.6

Figure 3.7

LIST OF FIGURES

Transition diagram for a multi-state Markov model ..................................... 16
Creating a working subset data set .............................................................. 165
Histogram of Log (TOTCHG) ..................................................................... 167
Graph of Log (TOTCHG) versus LOS ........................................................ 168

Estimation of G: Graphs of the Kaplan—Meier estimates of the cumulative
hazard versus the regression model estimates of the cumulative hazard from
both log-normal and gamma models ........................................................... 169
Estimation of S: Graphs of the Kaplan-Meier estimates of the cumulative

hazard versus the regression model estimates of the cumulative hazard from

both log-logistic and gamma models .......................................................... 172
Estimates of total charges at median LOS — no comorbidities .................... 174
Estimates of total charges at median LOS — at least one comorbidity ........ 175

CEA
CER
NHC
NHB
NPV
QALY
LE
CABG
PTCA
CATH
LOS
TOTCHG
AMI
DRG
HMO
NIS
HCUP
AHRQ
CCS
CCI
1DC-9-CM

OECD
WHO
iid
CLT

El

ABBREVIATIONS

-Cost-Effectiveness Analysis

-Cost-Effectiveness Ratio

-Net Health Cost

-Net Health Beneﬁt

-Net Present Value

-Quality Adjusted Life Years

-Life Expectancy

-Coronary Artery Bypass Grafting

-Percutanerous Transluminal Coronary Angioplasty
-Cardiac Catheterization

-Length of Stay

-Total Charges

-Acute Myocardial Infarction

-Diagnosis Related Group

-Health Maintenance Organization

-Nationwide Inpatient Sample data

-Health Care Utilization Project

-Agency for Healthcare, Research and Quality
-Clinical Classiﬁcations Software

-Charlson Comorbidity Index.

-Intemational Classiﬁcation of Diseases 9th Revision, Clinical
Modiﬁcation

~Organization for Economic Co-operation and Development
-World Health Organization

-Independent Identically Distributed

-Central Limit Theorem

-End of Statement

xi

INTRODUCTION

Economic evaluations of health care interventions are increasingly important in an
era of constrained health care budgets. As policymakers seek to prioritize health care
expenditures, an accurate assessment of costs and health beneﬁts of competing
interventions and treatments is critical in informing resource allocation decisions in
health care. A recent report1 from the Office of the Actuary at the Center for Medicare
and Medicaid Services projects that the national health expenditures would reach $3.4
trillion in 2013, growing at an average annual rate of 7.3 percent during the forecast
period 2002-2013. As a share of gross domestic product (GDP), health care spending is
projected to reach 18.4 percent by 2013, up from its 2002 level of 14.9 percent. This
demands extraordinary restructuring of the organization and ﬁnancing of US health
services.

Over the past decade there has been an explosion of research on methodology for
economic evaluations in health care. With increasing availability of large databases on
patient outcomes and costs, statistical methods for comparing outcomes with costs need
to be developed. The field is young, however, and there are many important and
challenging problems that remain unresolved.

In this dissertation we will address several statistical issues with analysis of
medical cost data. We adopt a longitudinal framework in which costs incurred as the

individual level are random quantities associated with events that occur as an individual’s

health history unfolds over time. A Markov process is used to describe the dynamics of
movement of an individual through different health states. Costs are incurred at transition
times, and in sojourn in health states. Total expenditures over a finite period of time is
then defined as an ‘expected value’ called a net present value (NPV). Because individual
characteristics such as demographics (age, gender, race) and clinical factors (treatments,
comorbidities) can inﬂuence NPV, we incorporate covariate effects into our model for
estimation of NPV.

Our longitudinal framework provides a natural setting for estimating medical
costs. We will demonstrate how some recent approaches to analysis of costsz'll can be

subsumed into this framework.

Importance of Cost-Effectiveness Analysis

In addition to evidence of clinical effectiveness of treatments, evidence of their
cost-effectiveness has become an important consideration as policy-makers world-wide
face decisions in allocating resources for health care services. In Australia, the
Pharmaceutical Beneﬁts Advisory Committee makes recommendations, based on
effectiveness and cost-effectiveness evidence, on drug products that should be subsidized
and placed in the Pharmaceutical Beneﬁts Scheme”. The National Institute of Clinical
Excellence13 in the UK makes similar requirements for use of new healthcare
technologies in the National Health Service, and in Ontario, Canada, the Drug Benefits
Plan uses economic data when supporting new additions to its formulary”. The
Phenomenal penetration of HMOs into the US health care market has heightened

aWar-eness of cost-effectiveness among providers and consumers of healthcare services.

The US Preventive Services Task Force and the Panel of Cost-Effectiveness in Health
and Medicine have urged consideration of cost-effectiveness in addition to clinical

effectiveness to help inform investment of health care dollars15 .

Measures used in cost-effectiveness analysis

Cost-eﬂectiveness analysis (CEA) has been promoted as a useful tool in the effort
to prioritize expenditure on health care programs15 . By quantifying the trade-offs between
resources that need to be deployed and health beneﬁts that accrue from use of alternative
interventions, CEA offers guidance in decision-making by structuring comparisons
between these interventions. A cost-identiﬁcation analysis is often conducted for
treatments and procedures that are believed to be equivalent in their clinical efﬁcacy. For
example, if two competing programs do not differ on average in their health beneﬁts,
then the one with the lower average cost will be preferred. On the other hand, if the costs
of two programs are judged equivalent, the intervention with the greater health beneﬁt
will be preferred. An intervention that delivers higher beneﬁt at lower cost than its
competitor is said to be dominant. A decision has to be made when one program has both
higher cost and greater beneﬁt than does its competitor. Is there a critical value below
which society would consider the more costly intervention still “cost-effective”? In this
situation, the cost-effectiveness ratio (CER) becomes a useful summary statistic for
ranking competing interventions. It is the ratio of the incremental cost relative to the
incremental beneﬁt. With costs measured in dollars and health beneﬁts measured in their
natural units such as life expectancy, number of lives saved, or preferably quality-

adjusted life years (QALYs), the CER is stated in dollars per unit of effectiveness. In

 

 

CEAs conducted with a societal perspective that accounts for all costs of the
interventions, whether borne by the recipient of care, the provider or the insurer, the
critical value of a CER is the upper limit of what society is willing to pay for an
additional unit of health benefit.

Other summary statistics used in CEA are the net health beneﬁt (NHB) or net
health cost (NHC)”’. Suppose the incremental health beneﬁt is monetized using a value
for each additional unit of health beneﬁt. This could be the upper limit of the CER as
judged by what society is willing to pay for adopting the competing intervention. The
NHC, expressed in dollars, is the difference between the incremental cost and the
monetized incremental beneﬁt. The net health beneﬁt can be defined in an entirely
analogous manner, and would be expressed in units of effectiveness. Many researchers
have pointed out that the CER has undesirable properties that make its use in decision

making problematic.

Analysis of medical costs

Incomplete data are likely to arise in longitudinal studies, because patient follow
up will not be complete in all subjects. In survival analysis, censoring occurs when the
time to event variable T is not observed in some individuals because a censoring event
occurs ﬁrst at time U, that is U < T. Most survival analysis models assume T, U ‘
independent, or, when covariates z are present, that conditionally on 2, T and U are
independent. This is the usual random censorship model. With accumulating costs this

assumption is untenable. If y(t) is the accumulated cost up to time t, then y(T) and

y(U ) are generally correlated. Therefore analyzing costs by traditional survival analysis
techniques is not possible.

In this dissertation, we first consider the situation in which a single cost variable y
is observed together with covariate information 2 in a sample of subjects. In our general
framework costs can potentially be accrued over a ﬁxed time period [0, 2'] with

expenditure terminating at some event time T so that complete cost observatiOn occurs if
a patient is followed through time T’ = rnin(T, 7). Suppose y(t) is a right-continuous
process that represents the cumulative cost up to time t (including time t) for a typical
patient in the population under study. If lifetime cost is of interest then Tdenotes survival

time. Since costs do not accumulate after T, y(t) = y(T) for all t 2 T. The cumulative

cost y(r) at time 2' is the principal random variable of interest, so inference focuses on

the mean cost, ,u = E ( y(T)) = E ( y(T' )). With lifetime medical cost, the cumulative cost
is y(T) and estimating the average E ( y(T)) is of interest.

Because of possible censoring at time U, y(T) is not observed if T > U . If this is
the case we observe y(U). A simple sample average of the observed costs in the patient

sample would underestimate the true expected medical cost for the treatment under study.
Also using the average in the sub—sample of patients with complete costs would be
inefﬁcient.

Even if complete costs were available, standard regression techniques for
assessing the inﬂuence of covariates on costs can not be directly applied. Cost data are
often very skewed, usually to the right. They also exhibit considerable heterogeneity
across patients. Standard assumptions used in ordinary least squares (OLS) for example

can not be applied. To mitigate the effects of skewness, the log—transformation of costs

might be considered. However, this too has adherents and non-adherents as explained in
Manning (2001)”. Even if a transformation were feasible, a retransforrnation would be
needed to obtain estimates of mean costs (and other statistics) across specified covariate
proﬁles: retransforrnation itself presents some methodological challenges18'2'.

Extreme form of skewness occurs in cost data when proportion of subjects in the
sample have zero costs. For example if we examine costs of office visits to a doctor or
other health professional in the year 2000, 20% of adults 18 years of age and over did not
make any ofﬁce visit22 and therefore incurred zero expense. This creates a 2-part
distribution for costs, one part for the sub-population with an expense, and the second
part for those without. These groups differ considerably in their demographic

characteristics and medical history. In a two-part model, one has a model for the

likelihood of expense, for example, a probit or logit model for P( y > 0 | z) , where z is a
vector of covariates, and then a second model for E ( g( y)| y > 0,z), where g is a

transformation, such as the logarithmic. Debate continues on the proper analysis of the
two-part models and comparisons to other models such as the Heckman model and
sample selection modelsB’ 24.
Cox regression has been the mainstay for analysis of censored time to event data.
However using this method directly with costs is not possible. As noted earlier, cost at
censoring time and cost at event time are correlated. In this dissertation we maintain the
traditional use of Cox regression for time to event analysis. It is used to model covariate

effects on the transition intensities as patients move from one health state to another. We

then combine this with a linear mixed effects model for costs (incurred at transition times

or sojourn in states) conditional on event times. Finally we derive estimators of NPV

given a covariate proﬁle and develop the asymptotic theory of these estimators.

A transition model for analysis of medical costs: Outline of Dissertation

When an intervention is deployed costs are incurred in random amounts at
random points in time. Typically these costs are associated with health states that a
patient might visit in the course of the intervention, and the different lengths of time spent
in each state. The probabilistic mechanism that governs transition between these states
and the distribution sojourn times in health states vary at the individual level depending
on patient speciﬁc demographic and clinical characteristics. This thesis concerns the
development of new statistical methodologies for estimating medical costs with censored
data in both this multiple states setting and the two-state case.

In Chapter 1 we describe the evolution of a patient’s health as the unfolding in
time of a ﬁnite state stochastic process. A non-homogeneous Markov

process X = { X(t) :IE 7'} with ﬁnite state space E = {l,2,...,k} , provides a natural
setting to describe the probabilistic mechanisms that govern transitions between states,
where X (t) is the patient status or health state occupied at time t e T = [0,2'], and 1's 00.

Transition probabilities are denoted by PM (s,t) and transition intensities by ah]- (I). The

state space of X typically consists of several transient states, such as “well”, “recovery”,

“relapse” and one or more absorbing states such as “dead” or “disabled”. Over the follow

up period the typical patient would transit to other health states, X I = X (T1),

X 2 = X (T2 ),... at random times T1 ,T2 and these transition times and health states

describe the event history of each patient. If observation of X is ceased after some random
time U, independent of X, then we will need to account for censoring accordingly.

The survival model is an example of a two-state process with a single transient
state “alive” and a single absorbing state “dead” with survival time T = T, and T, = 00 for
n 2 2. The multi-state analog of survival time is the time to absorption in state k given
byz'k =inf{t >0:X(t)=k}.

Having described the evolution of a patient history by the ﬁnite state space non-
homogeneous Markov process X, we now consider two types of costs that might be

incurred in the course of follow up, costs at transition between health states, and costs of

sojoums in a health state. Incorporating costs in the model enlarges the usual 0' -ﬁeld used
in the multiple states survival theory, namely ff =a{ (Tn , X n ) :0 _<. T, S t A U } , by adding

cost information. Under specified assumptions the martingale theory still obtains and the
compensators remain unchanged.

The estimation of PM (s,t | Z0) from a Cox regression model (multiplicative

intensity model) for the a,,,- (t | 20 ), has been very well developed by Andersen et al

(1993)25. Numerous applications of this method are published regularly in the medical
and epidemiologic literature.

To analyze costs incurred at transition times, we adopt a mixed model approach.

If 7}, ,7}2 ,...,7},,_ denotes the observed sequence of n, transition times in the ith individual
and Y, = ( y” , y,2 ym )' the associated vector of costs (or transformed costs), then the
random-effects model Y,- = X,,B + Z,v, +u, is the basis for estimation of ,6. Here the

covariate matrix X, will include terms for the times 7}, ,T,2,...,7},,, , individual patient

characteristics, and the matrix Z,- will include a subset of these factors, most likely
variables for modeling the effect of transition times such as 7},- and Ti]? . The unobserved
heterogeneity is the vector v,, inducing dependence among the ya. '5 and u,- is the

residual error. We will derive the NPV for all transition costs in the interval [0, 21, in the

form NPV(Z0)= Z Ee-"chj(t|ZO)a'hj-(tIZO)P,h(0,t|Zo)dt conditional on the initial
h¢j

state X 0 =i and a specified covariate profile Z0, where r denotes the discount rate. Here

ch, (t I Z0 ) is the expected cost incurred at time t if the transition h—-> j occurred and it can

be obtained from the components of E (Y, |X, ) . Similarly we can deﬁne NPV for all

sojourn costs. The mathematical form for NPV depends on the underlying transition
probabilities and intensities.

Following Andersen et al (1993)25 we account for heterogeneity across patients
by semiparametric modeling of the transition intensities of the process through patient-
speciﬁc covariates. We estimate transition probabilities using a Cox regression model.
We combine the two parts to form an estimate of the mean present value of all
expenditures and use the inverse-probability of censoring-weighted (IPCW) technique to
account for censored observations. The estimators are obtained conditional on an initial
state and given a covariate profile. By applying the delta-method to functionals that arise
in the estimation of the NPV, we obtain large sample properties of these estimators.

This approach is new and builds upon a similar idea used by Praestgaard (1991)26
to estimate actuarial values in life insurance. In that context the beneﬁt (cost) is ﬁxed and
the stochastic elements are the sojoums in policy states or transitions between policy

states. The ‘cost’ at transitions (called assurances in the life-insurance literature) are ﬁxed

by the terms of the policy. Also the ‘unit cost Of sojourn’ (called annuity payments) are
also ﬁxedzng. In our context these quantities are no longer fixed. With longitudinal data
these costs are observed and vary across patients.

In Chapter 2, we Show that our transition model described in Chapter 1 also
captures costs under the simpler two state survival model with a single transition time and
sojourn. In this case several investigators have developed techniques for regression
analysis of medical costs with the focus being on estimation in the presence of time
censoring that might result in incomplete cost data on some patients. Our transition model
can be therefore viewed as an extension of this methodology to multiple transition times
and sojoums. If we specialize our multiple transition model methods from Chapter 1 to a
two state model with patients starting in state ‘0’ (alive) and followed until they reach a
terminal state ‘1’ (death) at time T , the total cost for a patient can be interpreted as a
sojourn cost that ends at time T or 2' whichever occurs ﬁrst or as a transition cost at time

T if the patient dies in the interval of time [0,2'].

Current methods for estimation of the population mean cost are both
nonparametric and semi-parametric. The key references are Lin et al (1997, 2000,
2003)“'5’3°, Bang and Tsiatis (2000)“, Strawderman (2000)”, Willan et al (2002, 2003 W,
Wooldridge (2002, 2003) 33"“. Semi-parametric models would assume a special
parametric form for the distribution of cost. For a single cost, Zhou el al (2000)35
primarily uses a log-normal regression model to assess covariate effects on mean costs.
He also discusses approaches to deal with heteroscedasticity, skewness, censoring and

zero costs, all in the context of a parametric model354'.

10

Our approach uses the same inverse-probability of censoring-weighted (IPCW)
technique to derive consistent and asymptotically normal estimators of regression
parameters and for the net present values. We discuss parametric methods for estimating
the survival distribution for censoring time, and therefore the weights in the IPCW
technique, as well as methods of estimation for the survival distribution for event time.

Chapter 3 focuses on applications to real data: We use the inpatient utilization
data from the Nationwide Inpatient Sample (NIS) of Health Care and Utilization Project
(HCUP), a database of all hospital inpatient stays drawn from a stratified sample of
approximately 1,000 community hospitals in the US. For 2000, the NIS contains over 7.4
million discharges from 28 states. Total charge and length of stay (LOS) are the main
healthcare utilization variables for each hospital stay. There is a growing literature on use
of the NIS in health services research that we use for guidance 4247. Following our
experience with analyzing charges and LOS of acute myocardial infarction (AMI)

48’50, we will focus on patients admitted in the hospital with

patients in the MICH study
AMI that have undergone either no procedures or Coronary Artery Bypass Grafting
(CABG), Cardiac Catheterization (CATH) or Percutaneous Transluminal Coronary
Angioplasty (PT CA) as a primary procedure.

Hospital characteristics, such as quality of service or managerial performance
may impose distinct effects on the costs of treating patients. Rice et al (1997)“, Carey
(2000, 2002) 52’ 53 and Goldstein (2002)54 insist on the usefulness of multilevel methods
in studies where data on cost are collected over multiple sites (hospitals in our data). In

such circumstances it can be expected that hospitals may have an impact on the cost

regardless of treatment the patient receives. The inclusion of hospital as a level in a

11

multilevel analysis will ensure that the clustering effects within hospitals will be
adequately controlled for. Traditional estimation procedures such as OLS, which is used
for example in multiple regression, are inapplicable because of the existence of a non-
zero intra-hospital correlation, resulting from the presence of more than one residual term
in the model.

We use a multilevel modeling technology to estimate costs for patients diagnosed
with acute myocardial infarction as they relate to both patient and hospital level
characteristics. Patients transferred to a short-term hospital, as well as other transfers,
including skilled nursing facilities (SNF), intermediate care, home health care have
incomplete total charges and length of stay so they will be treated as censored. In our
working data set 32% of the discharges are censored. Selection probabilities from the
censored sample of event times are estimated using parametric estimators of the
censoring distribution. We then estimate total charges at the median LOS for specific
covariate proﬁles. Our method accounts for the existence of a non-zero intra-hospital
correlation, censoring and skewness of total charges.

In the concluding Chapter 4 we outline some extensions of our work particularly
to estimation of summary measures in CEA such as the CER, net health beneﬁt (NHB) or
cost (NHC), net present value (NPV), life-expectancy (LE) and quality-adjusted life years

(QALY)-

12

CHAPTER 1

ESTIMATING MEDICAL COSTS FROM A

TRANSITION MODEL

Economic evaluations of health care interventions are increasingly important in an
era of constrained health care budgets. As policymakers seek to prioritize health care
expenditures, an accurate assessment of costs and health beneﬁts of competing
interventions and treatments is critical in informing resource allocation decisions in
health care.

A multi-state model is defined as a model for a stochastic-process, which at any
time occupies one of a set of discrete states. The states can describe conditions like
healthy, diseased, diseased with complications and dead. When an intervention is
deployed costs are incurred in random amounts at random points in time. Typically these
costs are associated with health states that a patient might visit in the course of the
intervention, and the different lengths of time spent in each state. The probabilistic
mechanism that governs transition between these states and the distribution sojourn times
in health states vary at the individual level depending on patient specific demographic

and clinical characteristics.

13

The main themes of this chapter are arranged as follows. In section 1.1 we
provide a description of patient history as the unfolding in time of a ﬁnite state stochastic
process. A non-homogeneous Markov process describes the probabilistic mechanisms
that govern transitions between states. Following Andersen et al (1993)25 we account for
heterogeneity across patients by semiparametric modeling of the transition intensities of
the process through patient-speciﬁc covariates. We then describe how costs are
incorporated into this framework. We consider two types of costs, costs while sojouming
in a health state and costs incurred at transitions between health states. Net present values
of all expenditures incurred over a ﬁnite time horizon are then deﬁned and have
mathematical forms that depend on the underlying transition probabilities and intensities.

We will write each term of these mathematical forms as sum of independent and

identically distributed variables. Conditional on the initial state i, given the vector Z0 of

basic covariates, we will assess the asymptotic normality of the net present value of all

expenditures associated with the h to j transitions in (0,t] , i.e. the asymptotic distribution
of

n“2 Nﬁv‘.”(t i,Z )—NPV‘.”(: i,Z )
h} 0 h} 0

using the Functional Delta Method.

1.1 A Markov model for describing patient health histories

Let (S2,f’,P) a probability space and let {X(t), re ’1'} with T =[0,T].. a non-

homogeneous continuous time Markov process with ﬁnite state space E = {1,2,...,k} ,

14

having transition probabilities Phj (s,t) and transition intensities ah}. (I). This Markov

process describes the evolution of one patient’s health history, with X (t) the patient

health state occupied at time t. Typically E consists of several transient states, such as

“well”, “ill”, “recovery”, “relapse”, and one or more absorbing states such as “disabled”

or “dead”. Let a = (ah, ),h, j e {l,2,...,k} be the matrix of these transition intensities,

ark-(t): lim P[X(t+At)=j|X(t) =h]/Ar, j¢h
’ Atio

and ahh = -Z ah,- . Thus, starting from the time of entry into state h, the sojourn times in
jth

the given state h are continuously distributed, with hazard rate function —ahh. Given that

the process jumps out of state h at time t, it jumps into state j at h with probability
ah] /‘ahh -

Let Ah,- (I) = Jgahj(s)ds and AM, =—Z Ah,- . For h #5 j the function Ah]. is called
jath

the integrated intensity function for transitions from state h to state j, whereas A“, is
called the negative integrated intensity function for transitions out of state h. The matrix
A = (Ah, ,h, j E {1, 2,...,k }) is also called the intensity measure of the Markov process X.
Hereafter integrated intensity functions Ah,- ,h, j 6 {1, 2,...,k } are supposed continuous,

unless otherwise mentioned. Let

P(s,t)= H(I+dA) for s<I,s,tE T.
(3.1]

The matrix P(s,t) = (BU-(s,t),hje {l,2,...,k})is the kxk transition matrix of the

Markov process.

15

FIGURE 1.1: Transition diagram for a multi-state Markov model

 

To incorporate heterogeneity between patients we let the transition intensities

depend on a covariate vector z(t) through a Cox regression model

ah,- (I l z(t)) = am (t)exp(ﬂ’z,.,- (2))
with one vector of regression coefficients ,6 = (ﬂ, , ,62 ﬂp) and type-specific covariate
2,, (t) = (zhj, (I), zhj2 (t),. . . , zhjp (t)) computed from the vector z(t) of basic covariates,
consistent with the results of Chapter VII in Andersen et a1”. Technically, 2,1. (t) are

everywhere assumed to be predictable and locally bounded, but precise assumptions will

16

be presented later in Section 1.3 of this chapter. Let Arno (t) = Eahj0(s)ds be the

integrated baseline intensity for transitions from state It to state j and Allho = —Z Arno-
jth

Associated with X is a counting process N ,y- (t) which denotes the number of direct

transitions from state h to j in the time interval [0, t],

NhJ-(t)=#{sSt:X(s—)=h,X(s)= j} ,hct j.
The cumulative information revealed up to time t is the sigma-algebra .7; generated by
X(0) and {Nhj (s),s S t,h #5 j,h,je E}. Introduce the indicator function
Yh (t) = [X (t-) = h] to denote whether the process is in state h just prior to time t. Then
with respect to the ﬁltration {f7 :1 _>_ 0} : t2 0}, the multivariate counting process

11¢ j,h,jeE},

N = { Nhj , h ¢ j} has random intensity process I lily .

where 2th,. (I) = ah]. (t)Y,, (t). Moreover,

Mk,- (I) = N,, (I)- Er, (um, (u)du, h it j,h,je E

are zero mean local square integrable martingales. These results were first proved by

Jacobsen (1982)55 and are summarized in Theorem 11.6.8, p94, Andersen et al. (1993)”.

Using the continuity ofAhj ,h, je {1,2,...,k} , it can be shown that (th ,Mé,) = O for all

pairs (h, j) and (q, r) with (h, j) ¢ (q, r). Here we denote by (M ,M ') the predictable

covariation process of M and M '. In our absolutely continuous case with transition

intensities ah,- and Ah,- (t) = Law- (s)ds , we say that the multivariate counting process N

has intensity xi = (2h,- ,h :6 j), with 2,, (t) = Y, (0a,,- (t).

17

1.1.1 Marked point processes

Over the follow up period the subject transits to other health states, X, , X 2 ,...at
random times T, ,T2 and these transition or epoch times {Tn :n 2 0} and health states
{X n :n 2 0} describe the event history of each patient. Formally, these are deﬁned in
terms Of theforward recurrence time W(t) = inf{s > 0 : X (t + s) at X (1)} , which is the
waiting time from t until the next transition out of the state X (t). Having
set X() = X(0) ,To = 0, W(oo) z 00, we deﬁne for all n 2 0:

TM, =Tn +W(T,,) with X“, = X(T,,+,)if Tn+1 <00 and X“, = X" if Tn+1 =oo.
The sojourn at the nth transition is W(T,, ) in state X n . The survival mode] is an example
of a two-state process with a single transient state “alive” and a single absorbing state
“dead” with survival time T = T, and T, = 00 for n 2 2. The multi-state analog of survival
time is the time to absorption in state k given by 2', = inf{t > 0 : X (t) = k}.

Suppose that R, is a general mark corresponding to T, and we deﬁne the marked

point process

N(t,A) =Z[r,, $2.12,, 6 A]

n21

where A is a subset in the range of the R". For our case we could think of

R

n

= (Xn_, ,Xn) with values in R={(h,j) : h,je [5,]: ¢ j}. Our previously described
counting process Nhj (t) can be identified with N (t,A) by taking A = (h, j}. The natural

ﬁltration f; is generated by{N(s,A):OSsSt,ACR}.

l8

We also have56
£=0{N(s,A):0_<.sSt,ACR}=0'{(T,,,X,,):0$Tn St}.
We can endow R with an appropriate 0 -ﬁeld and regard A -9 N (t, A) for each t as a

random measure, and t —> N (t, A) as a counting process for each A. { f; ,t 2 0 } is a

right-continuous ﬁltration and £0 = 0'(U f; ) , where if, = 0'( X (0), z).

:20

For a stopping time T with respect to.7-',' ,t 2 O we deﬁne
f}={BE foo:Br\[TSt]E f; forall t}
fT_={Bn[T>t]:120, Be .5}
We have f;_= ft, for all Tn_l <t S T, and
.FT"=0'{(T,,,X,():ISkSn},
f}r=0{(Tk,X,):l_<.kSn—1;Tn}.
We have also 1’, (t) = [X n_, = h] for IE (Tn-l an ]. Then with respect to the ﬁltration
{szt _>.0}the compensator Ah,- (I) of the process Nhj (I) is given by

P[V,, e du,x,, =j|J-‘Tn
P[Vn 2 u | fr“ ]

 

A,,(dt) = -' ][X,,_, 2h] on

where V” = T, —Tn_, and u = t —T,,_, is the duration of the current sojourn at time t.

Our Markov assumption gives

Ahj(dt) =[Xn_, =h]P[T,, e [t,t+dr),X,, = j|Tn_,,X,,_,]=[X,,_, =h]a,,j(t)dt,

for IE (Tm,l ,Tn ].

l9

1.1.2 Incorporating censoring

If observation of X is ceased after some random time U, independent of X, we will

need to replace N h,- (t)by the censored process
N§j(t)=#{sSt/\U:X(s—)=h,X(S)=j}ih¢jand

Yh(t) by Y,f(t)=[X(t—)=h, U 2t].
Then, with respect to an expanded ﬁltration the aforementioned martingale

property still obtains. For the ith patient we observe a basic covariate z,- (t) , an initial

state X,- (0) , the state indicator Y,,‘, (t) = [X,- (t—) = h,U, Z t] and Ni],- (t) , the number of

n
event transition times before I from state h to j. Let Y,f(t) = Zth, (t) the number of
i=1

subjects who were not censored at time t and just before I were in state h
n

and N5, (t) = ENE}, (z).
i=1

In terms of marked-point processes we deﬁne the marked point process

N" (t,A) = EU" 5 t A U,R,2 e A] where A is a subset in the range of the R". Our

n21

previously described counting process N ,f, (t) can be identiﬁed with N c (t, A) by
takingA = {h,j} . The natural ﬁltration f,“ is ff=a{N‘ (s,A) :03 s S t,A C R}
=0{(T,,,X,,):OST,, SIAU}.

With respect to the ﬁltration { f :t Z 0} the compensator A2,. (I) of the

multivariate counting process N ,f, (t) is given by

20

C P[Vn e du,X,, = j,T,,_, 5U lff_,]
hj(dt)= c " [X,.-i =h]
P[Vn zu,T,,_, sU|fTHI

 

on IE (Tn_,,Tn ] . Equivalently, the processes ng deﬁned by ME, = N5, - z,- are

martingales.
We assume that the censoring variable, U. is independent of everything else in the

model. Then

P[v,, e du,x,, =j,T,,_l SU If; 1
C _ n-l
P[V,, zu,T,,_, sum-3,4]

 

_ C“Tn—l )P[Vn E du’xn =j|Tn-1’Xn-1]
G(T,,_, )P[Vn 2 u ITn_, , Xn_,]

 

[x,,_l =h]=A,,j(dt)

and N ,f, (t) has the same compensator as Nhj (t) . Therefore conform to the Deﬁnition

111.21 in Andersen et al. (1993)25, the right-censoring of the process N generated by U is
independent.

In the sequel we will assume that censoring has been accommodated in this way.
Next we will incorporate costs in the model. the assumption of independence of costs of
everything else in the model is often violated, for example the longer the length of stay in
the hospital, the higher the costs. We will incorporate censoring without assuming

independence.

21

1.2 Incorporating costs in the Markov model

As previously mentioned, we consider two types of costs that might be incurred in
the course of follow-up: costs at transition between health states and costs of sojoums in
a particular health state.

Suppose an amount Ch,- (t) is incurred just after time t if a transition h to j takes

place at time t. The present value of expenditures in (0,t] associated with these

transitions is
t —rs
c},}’(t)=j0e c,,(s)d1v,,,(s).

where r is the discount rate. In economic studies expenditures to be incurred in the future
are discounted to present value. A dollar spent now is worth more than a dollar that
would be spent later. The discount rates used for the US have usually been between 3%
and 5% per year, reﬂecting the rates on savings accounts or certiﬁcates of deposit.

Let Z0 be a given ﬁxed vector of basic covariates. The p-dimensional vectors
Zhj0 of type-speciﬁc covariates are computed from the vector 20, reﬂecting that some of

these basic covariates may affect the different transition intensities differently.

Conditional on the initial state, given the vector Z0 of basic covariates with the

corresponding type-speciﬁc covariates Zth' the mean of this present value is:
. ’ -rs .
vaé” (t | 1320) = 15(C,§,‘.’(t)|xO = 1,20) = E( Le Chj(s)th,-(s)| x0 =i,zo).

We assume that:

22

A1 C h,- (.) are bounded, non-negative processes over 7 , adapted to { ﬂzte T },
left continuous with right hand limits (so Ch, (.) are bounded, predictable processes).

A2 For te (0,2'] ,

E(Chj (t)| X0 = i, X(t-) = h,iZ0) = E(Chj (t) | X(t—) = h,ZO ) ,
so that at any t > O the expected transition costs do not depend on the initial health state.

It is known that if N is a counting process with intensity process A , M = N — J1
and H is locally bounded and predictable, then M and IHdM are local square integrable

martingales, with E (M ) = E ( IHdM) = 0 (see Proposition 11.4.1, p70, Andersen et al
(1993)”). Then, by assumption A1,
(I) - I -rs -
NPV,, (t | 1.20) = E( [De C,, (sth (s)ds| x0 = 1,20) =
I —rs -
= E( [0e Chj(s)Y,, (nah, (s)dsIXo =l,Z0).
By Fubini’s Theorem:
’ -rs -
NPV,f,”(t|i,ZO)=Le E(Ch,(s)r,,(s)|X(0)=t,zo)a,,,(s)ds.
We can write

E(C,,(s)Y,,(s)| XO =i,Z0) = E(Chj(s)|X0 =i,X(s—) = lz,Zo)P(X(s—) =h| XO =i,ZO).

By assumption A2, NPV,,(jl) (t | i ,Z0 ) has the form

NPV,f,”(I|i,ZO)= Ee"-‘ch,(s|zo)13,,(0,s|zo)a,,,(s)ds, (1.1)

where ch, (s | Z0) = E(Chj (s)| X(s—) = h,Z0).

We now turn to the cost of sojoums in a health state. Suppose that the cost in state

h is incurred at the rate B, (u) at time u. The observed rate is zero at time it whenever,

just before u, the patient is not in state it anymore, so [X (u—) = h] = 0. Then the observed

present value of all expenditures in state h, started at time s and ended after the duration

time d is given by
(2) 5+1! _m
c, (s,d)=L e B,(u)Y,(u)du,

where r is the discount rate and Y, (u) = [X(u—) = h].
Conditional on the initial state, given the vector Z0 of basic covariates, the mean
Of this present value is
NPV,”)(s,d|i,zo) = I:(C,‘,2)(s,d)|X0 =i,ZO)=
= fide*"’E(B,(u)Y,(u) | X0 =i,Z0)du.

Conditions similar to A1 and A2 are assumed for B, (.) :

A3 B, (.) are bounded, non-negative real stochastic processes over[0, 2'] ,
adapted to (f; ).

A4 E(B, (u) | X0 = i,X(u—) = h,ZO) = E(B, (u) | X(u—) = h,Z0) for all
u 6 [0,1].

Denote b, (u l Z0) = E(B, (u) | X(u—) = h,Z0). We can write

E(B,(u)Y,(u)|XO = 1,20) =E(B,(u)|X(u-) = h,X0 =i,ZO)P(X(u-)=h | X0 =i,Z0).

By assumption A4:

24

5+
S

NPV,”) (s,d | i,ZO) =[ de’mb,(u|zo)13,(0,u IZO)du (1.2)
The right hand side of (1.1) may be interpreted intuitively as follows. Starting at
3:0 in state i, a patient is in state h at time s with probability P,, (0, s | Z0 ). Conditional on
being in state h just prior to s, suppose a transition to state j occurs at s with intensity
a,,- (s | Z0 ) and this transition incurs a cost. Then (1.1) is the NPV for all h—>j transition
costs in [0, t]. Similarly, for the right hand of (1.2) consider the cost of sojourn in state h
in the interval (3, s+ds]. This is b, (s I ZO )ds , conditional on reaching state h at s. To

incur this cost a patient must move from the initial state i to h by time s, with

probability R, (0, s | 20 ). So (1.2) is the NPV of the total sojourn cost in state h in [0, t].

Suppose costs potentially accrue up to a ﬁxed time horizon 2' for sojoums in, and
transitions among the transient states. If k is the only absorbing state costs would cease at

the absorption time 2', = inf {t > 0 : X (t) = k} or 1' whichever is observed ﬁrst. The net

present value of all expenditures is

NPV(i,ZO) = ZEe"~‘c,,(s|zo)a,,(s|zo)13,(0.s|zo)ds+
h¢j

+ Z Emmizo>a.<o.slz.)ds (1.3)
h

where the dependence on the initial state X 0 =i and the covariate profile is shown. The
unconditional version is obtained by averaging over the initial distribution
It,- (0|Zo)= P(X0 = jIZO) which yields

NPV(Z0) = 272,. (0 | 20 )NPV(i,Z0 ).
ieE

25

Comments

In the absence of covariates, the quantities (1.1) and (1.2) are ubiquitous in the

insurance and actuarial literature29 where C,,- (t) is an assurance amount paid to the
insured upon transition at time I from the insurance policy state h to state j. Then
C j,” (t) is the discounted value (at time 0) of all assurance beneﬁts received in [0, t] for

transitions h ——> j and (1.1) is its corresponding actuarial value given X 0 = i. Likewise,

B, (t) is the annuity payment rate at time tin policy state h, C,” (t) is the discounted

value (at time 0) of all annuity payments received in [0, t] while the insured is in policy

state h and (1.2) is its associated actuarial value. Usually C,,- (t) and B, (t) are known non-
random functions and one is interested in the total payment function

2 hi]. C f,” (t) + 2 h CE) (I) . In this context Praestgaard (1991)26considers the estimation

of (1.1) and (1.2) using a framework very similar to that we have described.

However, because costs are incurred in random amounts at random points in time

during the course of a health care intervention the average expense functions b, (t | z)
and c,, (t | z) are no longer known and need to be estimated from appropriate data along

with the transition probabilities P,j (0,t | Z0) and integrated transition functions

A,”- (t | Z0 ) = Law- (s | Z0 )ds . For easiness of notations we will assume the discount rate

is null, i.e. r = 0 , unless otherwise specified.

26

Insights into assumption A1

The assumption A1: C ,j (.) are bounded, non-negative processes over?" ,
adapted to { f: :tE T }, left continuous with right hand limits (so C ,j (.) are bounded,
predictable processes) is not as naive as it seems. Without it we would have to extend the
observed history at time t to}: , the minimal o-ﬁeld generated by f; and

0'{C,j (s) : s S t, h ¢ j, h, je E}. With respect to this new a-ﬁeld one should wonder

whether the compensator A ,j (t) of the process N ,j (t) can be estimated in the same way

BIC .

Some insights into assumption A1 can be gained by viewing the cost C, as a
mark associated with the transition time T, and describing the underlying process as a
marked point process. Suppose that R, is a general mark corresponding to T, and we

deﬁne the marked point process

N(t,A)=Z[T, St,R,e A]

n21

where A is a subset in the range of the R,. For our case we could think of

R, = (X,_, ,X, ,C,) with values in R={(h,j,c) : h,je E, h at j,c >0}. Our previously
described counting process N,,- (t) can be identiﬁed with N (t,A) by

takingA = (h, j,(0,oo)} . The natural ﬁltration f; is generated

by{N(s, A) :0 S s S t,A c: R}. We can endow R with an appropriate 0’ -ﬁeld and regard

A —> N (t, A) for each t as a random measure, and t —> N (t, A) as a counting process for

27

each A. Moreover with respect to{f-: ,t 2 0} the compensator A(t,A) of N (I, A) is given

by

'-Trt-l an (u,A)

,ue (0,T, —T,_,]
l-F, (u-,R)

 

A(t,A)=A(T,_,,A)+

where F, (u, A) is the conditional distribution F, (u, A) = P[T, -T,_, s u, R, e A | f3.“ ]

and f,“ =0{(T,,R,):1San-1}.For te (T,_,,T,] we have

_ dF,(u,A) dF,(u,R)
dF, (u,R)'1—F, (u-,R)

 

dA(t,A)

where u = I —T,_, is the duration of the current sojourn at time t. The ﬁrst term

dF, (u,A)

can be interpreted as the conditional probability of R, E A given 7:} and
an (u, R) n-l

dF,(u,R)

l—F,(u—,R

is the conditional hazard rate for the

 

T-T

, ,_, = u . The second term

sojourn T, —T,_, given ET“ .
In the case of interest R, = (X ,_, , X, ,C, ). Then writing V, = T, —T,_, and

taking A = {h,j,(y,y+dy]} we can express

 

 

dF (u,A) . ~ . ~
—"——=X_=hPCed v,= ,X,= ,.F PX,= V,=u,.F
dF,(u.R) I ,.i ll. yl H J TM] [ 1| TH]
and
dF,(u,R) —[x _, Plvnedulj-‘T,,1
1—F,(u-,R) "“ P[V, 2u|.7:}n_l]

so that on IE (T,_, ,T,] and recalling that u =t—T,_,,

28

A(d A) P[C d w x 'f“ IPIV"Edu’X"=j|fT'][X l] (14)
t, = ,Ey ,=u, ,=, ~ "' ,_=z. .
1 TH P[V,Zu|.7-}n_l] ‘

 

With respect to the filtration {.7; :t 2 0} the compensator A ,j (t) of the process N ,j (t) is

given by

P[V, e du,X, =j|.7»',, ]

A,j(dt)= PIV.Zu|7"r,_,l - [X,_,=h] on tE(T,_,,T,]. (1.5)

 

Now f3,“ =0“{(T,,X,,Cj):an—l}whereasJ-'Tn_l =U{(Tj,X,-):an—l}.lt is

reasonable to assume that

(T,,X,)is independent of {C}. :an—l} givenf'ml ,

or, at least that

{C j : j S n—l} is fr", -measurable.
If (T,,X,)is independent of {C}. :an-l} then

P[V, e du,X, = jlf‘rﬂ]
P[V, _>. u | f}, 1

 

[XII—l =h]:

P[V, e du,X, =j|.7-"Tn_l ,{Cj,an-l}]
P[V, 2M3", ,{Cj,an-l}]

 

n-l = h]:

P[V,e du,X, =j|fT I]
= "_ [XII-I =h]=Ahj(dt)°
P[V, ZquTH]

 

Since a similar conclusion can be found if we suppose {C j : j S n —l} is .77“ -

measurable (i.e. f,“ = ET“ ) in any of these two cases (1.5) coincides with the second

term in(1.4). Moreover, our Markov assumption makes

A,j (dt)=[X,_, =h]a,j (t)dt, te (T,_,,T,]. In fact taking A={h,j,(0,oo]} we get

29

A,j (dt) = E[A(dt, A) | £_]. Also .75; is the minimal a-ﬁeld generated by f,
and0{C,j (s)AN,j (s) : s S t, h at j, h,j e E}. The ﬁrst term in (1.4) is the distribution of
the cost C, conditional on the past in _, , the destination state X, = j and the transition

time], =t.

With A = {h,j,[y,y + dy)} the cost incurred in [t, t +dt) is
[:ledt.A)=C.,(z)ANi,-(t)
and therefore the present value of costs incurred in [O,t]due to transitions of the type
h—>j is
C£})(t) = J36” ny(ds,A).
Ignoring discounting C j,” (t) = J; nyMs, A).
Let ZO a ﬁxed covariate proﬁle. Then using the martingale property of N (t, A),

the conditional expectation of this net present value given X 0 = i ,20 is
E(ClPltHXo =i,Zo)=E( [36" [”ledaAMXo =i’20)=
=E(J:e’” [:yA(ds,A)|XO =i,Z,,).
Using (1.4), E(C,(,l.)(t)|X0 =i,Z0) becomes
E(ge'” fymc, edyl V, =u,X, =j,f,_l ][X,_, = h]a,j(s)ds | X0 =i,ZO)=

= Le-mmfyﬂq edyl V. =u,X,, =j,f}n, ][X,_, =h]|XO =i,Zo)a,j(s|Z0)ds,

30

With our previous notations, c, (s) = [x,_, = h] E” yP[C, e dy IV, = u, x, = hf,“ 1.
Then
E(c}})(t)| X0 =i,ZO)=

[Se-“Tm, (s)| X(s—) .—. 12.x0 = r320”), (0,s|zo)a,,. (s | Zo)ds.

By A2, E(J; EyN(ds,A)|X(0)=i) = ge'rscm-(slZO)P,,(0,s|Zo)a,j(s|Z0)ds which

is the same formula as (1.1), where c,, (t) = E(C,j (t)| X(t—) = h).
We will estimate A,,- (t | Z0 ) and P,, (s,t | 20) from a Cox proportional hazards

model for multiple states and c,, (s | Z0) from a random-effects model. Putting

A
everything together we are able to compute NPV,?) (t | i, 20) an estimator of

NPV,,” (I | i ,Z0) and show asymptotic properties for the derived estimator.

1.3 Estimation of transition probabilities

We now turn to the estimation of (1.1) and (1.2), focusing on the former.
Andersen et al (1993)25 pioneered an elegant asymptotic theory for estimators of ,6,

A,, (t | Z0 ) and P,, (s,t | Z0) . For each of n patients in a study we observe processes of the

type described. For the ith patient the basic covariate vector is z, (t) , the initial

31

state X, (0) , the state indicator Y,, (t) = [X, (t—) = 12, U, 2 t] and
N,,-(t) =#{sStAU, :X, (s—) = h,X,-(s) = j,},h at j.
Conditionally on {z,,X,-(0):i= l,...,n } assume the processes { X,(t):te T }
are independent and that the Cox regression model 61,,- (t | z(t)) = ath (t)exp(,6'z,j (t))
described in Section 1.1 holds for each individual with the same baseline intensities, i.e.
ah,- (I I z.- (t)) = ahjo (t)exp(ﬂ’zrj,- (t))

for all i =1,...,n . From now on denote by N ,j (t) and Y, (t) respectively, the aggregated

n n
processes ZN,,(I) and ZY,,(I).

i=1 i=1
The following standard notation will be used.

For any h

o Jh(t)=[Yh(I)¢0].l-Jr,(t)=[Yh(t)=0l=IZYh.~(t)=01=[Y;,,-(t)=0.Vil

i=1

For h¢ j:
o 2,, (z)®"’ = 2,, (02,, (I), if m = 2;

o 2,, (t)®'" = 2,, (t) if m =1 and 2,, (t)®"'=1 if m =0;

o s,§;"’(t,ﬂ)=2Y,,(t)2,,(t)®'" exp(,8'z,,(t)), me{0,1,2};

i=1
0 5,0,5)=S,‘,}’(t.ﬂ)/S,‘,f’(t.ﬂ);

o icy-0.3):Sif’lrﬁwsif’(t,A)—Eh,-or)“;

o 10,29): 2 £V,,(u,ﬂ)dN,,(u) with N, =ZN,, ;
i=1

haej

32

o sif’aﬂ):ElY,,(t)z,,-,(t)®'"cxp(,6'2,,(r))], mE{0,l,2};
o e,,-(t,,5)=s,(,;)(t,,B)/s,(,f’(t,ﬂ);
o v,,(t,,6)=s,})(t,ﬂ)ls,gf)(t,ﬂ)-e,§2

o A,, (t | Z0) = A,,0 (t, B) exp(,3'Z,,-o) where the maximum likelihood estimator

A

,6 is deﬁned as the solution of the equation U (2', ,6) = 0 where

n It
Ult.ﬂ>=Z Z [lzhjm-E,(u,ﬂ)ldN,,-.-(u)

i=lh.j=l

hat}

0 20,,6) = Z Ev,- (u,,6)sf,(.)) (u,,6)a,,o(u)du , px p nonrandom matrix does
hatj

not epend on n.
The following assumptions will be adopted throughout this chapter. Although not

all conditions are needed for every result, we state them all to avoid too many technical

distractions in the theorems. We denote by H . II the supremum norm of a vector or a
matrix, e.g. the norm of a vector a = (a,) or a matrix A = (a,,) is H a ll: sup, |a, | and
H A II: sup“- la,- |, respectively. Convergence in probability and weak convergence are

always as it tends to inﬁnity.

Model Assumptions and Conditions:

A5 Conditional on z,(-), U, is independent of X, (.);
A6 (N, (.), Y, (.), z, (t)) ,1 S i S n are independent identically distributed;

For h¢jz

33

T
A7 A,,0 (2') = La,” (t)dt < oo;

A8 2 b—y 2(2,=ﬂ) Z]; ‘0’
r — v,, (u, ,6)s, (u, ,6)a,,o(u)du is positive definite.

notation
hat j

0
There exist acompact neighborhood 6’ of ,6 , with ,66 6’ (the interior of 6’ ),

and scalar, p-vector and px p matrix functions 5,19), sipand 3(2), h ¢ j, deﬁned on

[0,2']x6’ such that for me {0,1,2} and h,jE{1,...,k}, h ¢ j:

A9 sup
(1 ..BHO rlxb’"

                          

—S"’"(t 5)— s,‘,’-'”(t ﬂ)I

 

 

A10 s,}"’(.,.) are continuous functions of Be 6’ uniformly in [6 [0,1] and

bounded on [0, r]x6’; SIG) (., ,6) is bounded away from zero on [0, r] and

0)(I ,5) 82,2)(1 ,6)=—s, ,1)(t, ,3) (Asymptotic Regularity Conditions);

8hr)“ ﬂ)=aﬂs,, 31,3"

All There exists 6 > 0 such that

n’“2 sup |2,,(t)| Y,,(t)[,6’z,,,(t) >-5|2,,(t)|]—P—>0 (Lindeberg Condition)
h¢jJJ

where [a > b] =1 if a > b and zero otherwise;

Conditions A7-A11 are implied in the independent identically distributed case
(i.e. under assumptions A5-A6) by more general conditions as proven by Andersen et al
(1982)57 Theorem 4.1. These conditions are: .

1. 2,,- (.) and Y,, (-) are left continuous processes with right hand limits

processes

34

3. Z, is positive deﬁnite
4. P(Y,, (t) =1,VIE [0,2]) > 0

5. ' E[sup Y,, (t) | z,,, (t) I2 exp(3'z,,, (t))] <oo where the supremum is over
If!

t E [0, 2'] and B E B(ﬂ) with B(,6’) being some neighborhood of the true parameter ,6 .

Although not absolutely necessary, hereafter 2,, (.) are considered to be bounded

predictable processes.
The form of the partial likelihood is functionally the same as in the case of the
ordinary survival Cox proportional hazards model. Thus the log-partial likelihood

evaluated at time I (see p483, Andersen et al. (1983)”) is:

C(t, B): Z Z EIB'ZM,(t)—logS(o)(t,ﬂ)IdN,J-,(u).

11h}—
hatj

Since 5‘” (t, ,6) is the vector of first partial derivatives of S (0) (t, ,6) with respect to )6 ,
the vector U (I, ,6) of partial derivatives of C (t, B) with respect to ,6 is
n k
U(t,,6) =2 Z [[2, (tn—E, (u,p)]d1v,, (u).
i: 1:2=1 $1.
The maximum partial likelihood estimator ,3 of ,6 is deﬁned as the solution of
the likelihood equationU (2, ,6) = 0. For h at j we estimate A,,0 (t) by the Nelson-Aalen

estimator

Alisa ﬂ)= L—M—deu )

(0) u’B)

35

n n
where N,, =ZN,,- , J, (u) =IY, (u) >0], Y, =ZY,, . We use the convention %=0.

i=1 i=1

Let A,,0 (t, 8) = -Z Am.O (LB) . Thus the matrix of integrated baseline intensities
, jxh

A0 (I) = (A,,O (t),h, je {1...k}) is estimated by
1300.3) = IA,,O(I.B),h, jE {l...,k}) .We define for a fixed covariate proﬁle 20, (and
corresponding type-specific covariate 2,0) A,, (t I Z0 ) = Am, (t)exp(,3’Z,,-o) , h¢ j ,

Ahh(IIZO)=-ZAhj(IIZO)-

jaeh
Central to all our proofs is the derivation of asymptotically equivalent
representations of JR]? — ﬂ) and w/h(A(t I Zo)— A(t I Z0 )) in terms of iid random

variables. By asymptotically equivalence of two quantities is meant convergence in

probability to zero of their difference (and where appropriate, uniformly for t E [0.2'] ).

We use the next theorem from Andersen et al (1993)”.

Theorem 1 (Theorem VII.2.l, p497, Andersen et al (1993)”)

Under A5-A11, the probability that the equation U (2', ,6) = 0 has a unique

solution ,3 tends to one and ,3 —’-)——> ,6 as n ——> oo . CI

The next theorem gives the asymptotic normality of ,3 and an estimator of the

asymptotic covariance:

36

Theorem 2 (Theorem VII.2.2, p498, Andersen et al (1993)”)

Assume AS-All. Then n1’2(3— ,6) converges in distribution to a zero mean

normal p-dimensional random vector with covariance matrix 2;] and

A A b-v A
n-lIUHB)->-7(1,,5)II—5—-->0.Inparticular22r = n'lI(2.ﬂ)—L>Z,.D

notation

 

 

SUp
tdOJ]

Following the proof of Theorem VII.2.2 of Andersen et al (1993)” we can show

that the (h,j)element. hat j, ofthe kxk matrix J72(A(r|zo)—A(t|zo)) is

asymptotically equivalent to exp(,B Z,,0){ XIII!) (t)+ X”) (1)} where
n 2 I I
X}.,’ (t) = JZw—ﬂ) I,(Z,,,-o —e,., (2mm... (u)du

xggjo): JTIOS —:—d(")ﬂ),,)" M,,(u)

Here M,,(t)= N,,(t)- £S(O)(u,ﬂ)dA,jo(u) and M,,(t)=2n:M,,(t),where

..r
M...- (r) = N,- (t) - [Y,,- (u)exp(ﬂ’z.,.- mm... 00.

Let h,j. (I) = [32,, -e,, (u,,8))a,,0(u)du ( pxl vector).
Then X1823) (t)=JZ (B—ﬂ)’b., (t).

We expand both X I31) and X $7,) as sums of iid random variables.

J,(u)

WeconsiderXéZ} ﬁrst. Indeed XéZJU): II: 5,0,, uﬂ)

Hﬁlw(u u)"

37

l
=T J; W )[s,‘,°’(u ﬂ) sf,°’(uﬂ)

 

1 dM.,-(u)+

dM, (u) M), (u)

(1'1 (ll-:1——)) -
+\/1-'[;S,(,O)(u, ,3) 71:]: h 380“ “,5)

For easiness of notations we will denote the three terms of the above sum as T1 , T2 . T3 ,

 

whereT,=— jzujgm )[ 0 (0 dM,,(u), T2: —I;(f:)l——"——’(u) and
swam s, ,’(u ,)]ﬂ T s... (at?)
( )
Jig—J30 Jilu ))ﬁ

To reduce the representation of X £2} as sum of iid variables, under the

assumptions A5 - All we prove that the first term, T, converges in probability to O and

the third term, T3 is null. Indeed, if we consider the first term

 

 

 

1
IT‘HTI; h‘“ ”5‘90 3) s‘°’(u .dﬁﬁMWHS
hj
s,‘,,°’<u m— ﬂ)S(’O)(—"—-u——
" dM-(u)I<
77H; Ju.( u) 3,, ,——_) ,0) i, _
(u ﬂ)
n
5(0)
(u 5) 1
s—l—IL1,.(u)dMi,<u)IsupIs,.°’(uﬂ)—Isupl 5(0) I
J; u<r US! ShO)(u ﬂ)____ Sh} (u nﬂ)

38

I 1 n 1
Using Central Limit Theorem for— J (u)dM (u) =—( J (u)dM .,(u)),
J; E h h} \[IT 12:12]; h h}

5(0)
(u ﬂ)
sup | S,” (u, ,6)— I P>0 (by Assumption A9) and sup I 5,0) (u, ,6) |< B < co

uSr uSt

 

(by Assumption A10) it follows that IT, |——P——>0.

The third term, T3 , is null:

(u) i(u u)_
32T£(1-Jh(u—d——__))S(O)(thﬂ) =:/-—;,Z:,:'E[Yhi(u): O,Vi]s(oﬁfh__j—— ﬂ):

1 " Nu“)
:Tg-§£[Yhi(u): OJVIW—

Shj(

th (u)CXP(.B’zhji (“))dAhjo (u)
s9’(u 1?)

=0

 

" t
{3 join..- 00 = 0m
i=1
Using M ,, (u) = Zthi (u) . the second term, T2 can be expanded as sum of iid random
i=1

.Therefore we have showed that

dM, (u) —X ,_ dM,,(u)

' bl T- —
vana es as 2 J—‘EF_ ,9,“ ,5): n =1S,Q)(uﬂ)

XéI',’ (t) \l—IOS ,yo) J”((——u—-du)ﬂ) M,,(u) is asymptotically equivalent to the sum of iid

dM,,- (u)

random variables: —2 £————
0()
<11 X?)

To Obtain the derivation of TITLE — ,8) , we use Theorems 1 and 2 stated above.

Consider now XI; (t) = «5(3- ,6)'b,, (t) . where

b,,(t)= I;(Z,,o—e,,(u,,6))a,,0(u)duisa pxl vector. We have

39

0:..1

\/_(ﬂ- ,5): TEXT, ,6.) U(2', ,6), where bya, a=b, we mean a, is asymptotically

equivalent to b, . We can write U (2, ,8) as a sum of independent identically distributed

random variables as in the proof of Theorem 2 (see proof of Theorem VII.2.2 p498,

Andersen et al. (1993)”):

1 " '
T’IU(T’ﬂ)=;,§ 11= £11,, (u)dM,, (u) where 11,, (u) 7.;(z,j,(u)—E,,(u,ﬂ)) are

predictable, locally bounded processes (p496 Andersen et al (1993) 25).

Consequently,

7301,,- (t Ian—A..- «Izod: expw’zhjonxigy (t)+ X31} 0)}:

n k
=exptﬂ’zhjoll Zr 2 £11,...(ulth,(u))’2<t.,6)"‘bh,-(z)+
i=1 h.j=l
hat '

dM h,,-(u)})

,0) (uﬂ (1.6)

£213,7—

We use:

EH,- (u)dM,, (u) = ﬁ-ﬁjtz, (u) -— E...- (u.ﬂ))dM,,,-,- (u) =

= E-x/iz—(Zhﬁ (id-eh} (u.ﬂ))dM,,-, (u)+ LIT/17“” (u.ﬂ)—E,j (u,ﬂ))dM,j, (u).

Since sup -——— -s,, (u,,6)|——)0 and s,, (.,.) are uniformly
(u,,B)e[0,r]le ’1

continuous bounded functions on[0,2']x21 it follows

40

that sup lehj(u,ﬂ)—Ehj(u,,6)|—£—>O.
(u./3)E[0.r]x‘21

Therefore £31.;(ehj (u,,6)—Ehj (u,,6))thﬁ(u)—P_)O and

J: Hm, (u)dM,U, (u) 2 j: J.(z,,j,.(u)—e,,j(u, ﬂ))thj,. (u).Therefore for 12:: j

we have proved the next lemma

Lemma 1
Under the assumptions AS-All, £01,”. (I | ZO ) — Ah]. (I | Z0 ) is asymptotically

equivalent to

n k
exp(ﬂ'Zhj0){Z( Z £%(zhji(u)-ehj (u,,6))dM,y-i(u))'£(2’,ﬂ)'lbhj (t)+

i=1 h,j=l
hat}
" thj, (u)
+JZE£T_ £0)(u, 19)} .
If we denote
n k
s;{;,(t,r)=2( Z 13171—th}.-(u)—e,,,-(u.ﬂ))dM,,,-,-<u)>’2(r,ﬂ)"bhjm and
i=1 h,j=l ’1
hat}
,dM (u)
5(2) _ h}!
n.hj(t)" :71-—;§LST_?)(14 ﬂ)

then

MAM-(zIzo)—A,.,(z|zo)a§'exp(ﬂ’zhjo){sf,‘2,,(t)+s,‘,33,.<r)}.u

41

Comments

The k2 components of Sf,” (.) = (5:2,?! (.),h,je {l,...,k}) are processes in D[O,r],

the space of real-valued right continuous functions with left-end limits on [0,7]. Equip
D[O,z'] with the supremum norm and the 0' - ﬁled generated by the collection of open

balls, and for product spaces their usual extensions. Regarded as multivariateprocesses

on (D[O,‘r])"2 , we can establish their weak convergence in the Skorohod topology.

However, because the limiting processes have almost surely continuous paths, the

convergence is true in the supremum norm as well.

(2). (t),(h,j)e E} converge weakly

Under assumptions AS-All, the processes { Sm,”

to a process U 5 (I) (see proof of Theorem VII.2.3, p503, Andersen et al (1993)”). The

limiting distribution of the k xk matrix-valued process Sf?) (.) = (5:2,; (.),h, je {l,...k})

is that of a kxk matrix-valued process U; (.) = (05,]. (.),h,jE {l,...,k}), where

U;,,,, = —ZU;,U- and {Ugh}. (.),(h, j)e {l,...,k}} is a continuous Gaussian vector
jzeh

martingale, with

i) Ughj(0)=0,
ii) (Ughj,U;m,> =0 for (h,j)¢ (m,r),(h,j),(m,r)e E

I. by (lb-0(a)
111)<U2hj>(t)=0)1?j(1) = £S£9)j(u ,60)
J 9

notation

iv) C0v(U;,,j (s),U;,,j- (t)) = a); (SAI)

42

The ()1, j )-th element of the limit process Ur(t) of SE,” (t,T) can be expressed as

éébhj (t), where :0 is a p-dimensional normal random variable (its distribution does not

depend on t) with zero mean and covariance matrix 2(2’, ,8)"1 and

bhj (t) = E(Z’VO —ehj (u,,6’))a',,j0 (u)du is a p-dimensional vector depending on t. The
processes Ufhj and US,”- are asymptotically independent and the limiting distribution of
ﬁ(A,,j(z|zO)—A,j(:|zo),

exp( ﬂ’zhjo )(Ufhj (t ) + ugh}. (t)) (1.7)
has mean zero and variance

exp(2ﬂ’z,,jo)(w,3j (0+th (r)’£(r,,6)" h,j (t)} (1.8)

The covariance matrix of (52,)”. (1,7),534. (t,r)) can also be computed directly,

using the definitions of 3,?) (u,,B), m = 0,1,2 and the independence of the processes

{X,-(-),i =l,...,n} . Then

I: k
n var(s,§{,{j (2,7)) =Var(Z( 2 j: (2,1,. (u) —e,,j (u.ﬂ))dM,,,-,- (u))’>3(r.ﬂ)“ b,.,- (t)) =

i=1 h,j=1
hatj

n k
= b,.,- (t)’2(r.ﬂ)”‘ Var<2< Z [I (2,,- (u) —e,.,- (u.ﬂ)>dM,.,, (u>>’)>:<r.ﬂ)“ b,.,- (r) =
i=1 h.j=l
hatj

n k
=bhj (z)’2(r,,8)'IZVar( Z E(zhji(u)_ehj (usﬂDthji(u)))’2(7aﬂ)_lbhj (1):
i=1 h,j=1
haej

43

=b,,,-(z>’ 2(r Mi) 2 E( [0’ (z,,,,(u>- e,.,(u ﬂ>)®2Y,,,-(u)exp(,6’z,,,-,(u))a,,,o(u)du)’-

i=lh,j=l
hatj

-2(r.ﬂ>“b,,,-(r) =

:nbhj-(t) 2(7, ,6) (Z E(S/(J)(“v ,B)- eh] (u may?) (u ,3)- sf,”(u, ,5)e;,j(u,ﬂ)+

h¢j

+ef3 (u ﬂ)s,‘,°’(u ﬂ))a,,,0(u)du)2(r ﬂ) .‘b,(z)=

.—. nbhj (t)’2(z', [3)" 2(1, ﬂ)'2(r, ﬂ)“ h,j (t) = nbhj (t)'2(z’, ﬂ)“ bk]. (t).

For the last equality we use

(0)

5;?)(14 )6)— ehj(u, ,3)s(l_>’(u,,5)- 5“)(u, ﬂ)ehj (u, ,B)+e® ‘(u, ,B)shj (u,ﬂ)=

hj

Sm SU)’
(u ,6) 04 ﬂ)
__ 2) Say 5(1) Shj

+Shj)(u )6)SU-)(u )6) 580)

5,30% ﬂ)

(u m-vh, (u ﬂ)s,‘,°’(u ﬂ)

 

and the deﬁnition of 2(1, ﬂ)- -— Z Evhj (u, ,6)shj)(u,,6')ahjo(u)du .
hatj

 

thi(ll)) zivar £thji(u) _

Also nvar(S(2) (I 7)): Var(z [0d f,°’( um) 5(0)“ 13) _
11, hj 9

=2 E(L—d

s£9)(u,3 2<thi,thi >(u))=

zzk: (0)( _—)'25(th (“)6Xp(ﬂ' zhj, (u)))ahj (u)du- _

Shj(

 

l " I
= ESE?) (u,ﬂ)2a'hj (U)E(;Yhi (U)CXp(ﬂ zhji (u)))du =

 

t l
=nJ‘0 (9)(u,ﬂ)2ahj(u)sl(:)(u, ,=,B)du nEWL—a,,j(u)du=nw,?j(t).

u.,6’)

Also 521,1]. (LT) and 522,3]. (t) are asymptotically uncorrelated because the martingales
M M and M M are orthogonal for i i k (by independence over subjects) and for j :1 (by
continuity of the functions I‘M-0(1)) . Indeed

Cov(sf,f,’y. (t, magi; (t)) =

= E{ [Z( Z EVE—(liar (u) —ehj (u’ﬂDthji (“)YZUMBYI bk! (”1'

i=1 h.j=l
Int}
thJ, (u)
[_‘/1=2£____ s£°)(u ﬂ)”
_ ,(u u)
=—ZIE{[hZ=l E(zhji(u)— ehj (u ﬂ))thji (14)) 2(7 ﬂ) lbhja )HL— :(0)'——h—'_j ﬂ) ]}=
' h¢jj=
_ .-(u u)
=—ZE{[J:(Z,U, (u) eh] (u ﬂ))th,-,(u))'£(r l3) lbhj( )Hg‘ﬁy’y—‘E—l}:
,(u u)
—ZE{E[ J: (zh,,(u) eh, (u ﬂ))dM,,,.(u))’.>2(r ﬂ)"bhj(r)J;-7m—’”—7)I S,l}=
_ ,(u 14)
=_ZE{IJO(zh,,(u)— eh,(u 5))thj,(u))'Z(T ﬂ) ‘bhjumJ; T_,‘”hju ml}.

Using d < thi ,Mhﬂ > (u) =Yhi (u)exp(ﬂ'zhji (u))a,,j (u)du and Fubini Theorem, the

covariance of the two sums becomes

45

Cov(sf,',§j (m) 58,110»:

:-ZE {—_—£ (0): ,6)(Zhji(u)_ehj (usﬂDYd <thi’thi >(u))Z(Z',,B)_1th-(t)} =

shj( u,,6

{( 150,33, (u>exp(ﬂz,,,, <u»(z,,,, (u) e, (u ﬂ)>’1a,,,(u)du)

shj( u,,B
-2<r,ﬂ)"b,,(z)}=

:_£——s(19)(luﬂ)( SI(1j1)(u ﬂ)_ehj(u,ﬂ)shp)(u, ﬂ))2(T,ﬂ)-lbhj([):0

The last integral is null since, by deﬁnition, ehj (t,,6)— - 3,9.) (t, ,B)/sfuo) (t,,6).

The variance of AM = ‘2 Ahj can be calculated using formula above. Indeed,
jaeh

502;th l ZO)—Ahh (’lZo) z—ZJ;(Ahj(tlZO)-Ahj (’lzo)‘:

jach

'=' —zexp(ﬂ’z,,j0){s,‘,{},j (t,r)+ 52.2,),- (t)}.
j¢h

Using deﬁnitions of S m

n h] (t,T),i = 1,2 and same arguments, the variance of the last sum

is

Varl- Zexpw’ Zhou—1,12 Z E—l—(zhﬂ(u)—e,.,(14./3mm,(u))’z(r.ﬂ>“b,,(t)+
jath f1": 1 ij ~l J;
‘1

46

=V —Z(Zcxp(ﬂ Zth) { Z £71.;(zhﬁ(u)—ehj(u,ﬂ))thﬂ(u))’}2(r,,6)”‘b,,j(t)+

n i=1 j¢h h .j= -|
haej

thj, (u)}

Tnzﬁ— “”(u 13)}:

Sh}-

={Zexp(ﬂ'2hj0 )bhj (mm, ,6)" { Zexp(ﬂ'Zhj0)bhj (1)} + Zexpaﬂ’zhjomij (z).

I"! jath jaeh

Using these results one also obtains the convergence of
\[ri(P(s,t | Z0 ) — P(s,t | ZO )) . The Aalen -Johansen-estimator of the transition
probabilities is given by

133,420): 1'](I+dA(‘u|zo)),
(3J1

this estimate being meaningful as long as Aft“, (u | 20) 2 —l on (s,t]. Here AU | 20) is

the kxk matrix of elements Ah]. (I | 20) with A“, (t | 20) = — 2 Ah]. (I | 20) . If a transition
j¢h

occurs at time u, then I+ dfMu | z) is the matrix whose (h, j)-th element is

Mk]. (to/Y,, (u) if h¢ j and equal to 1- 2 Mb]. (u)/ Y,, (u) if h=j. The properties of
jath

f’(s,t) follow from those of A as will see below.

It can be shown that

J5<P(s,z|zo)-P(s,z|zo))=JZ( I] (1+dix(u|zo)— 1'](1+dA(u|zo))

s<u.<.t s<u$t

is asymptotically equivalent to s/Zﬁﬁsm | Z0 )d(;\ —A)(u)P(u,t | 20).

47

Therefore
J5me:|ZO)—P(s,:Izo))a:'JZ£P(s,u|zo)d(s£,”(u)+sf,2’(u))P(u,z|zo) (1.9)

where we define the matrices Sf,” (t) = {5(1)

n’hj(:,r)},,‘j and 39(1) ={S‘2’ (0th-

n ,hj

It follows that

J3<éh<s.tlzo>-Bh<s.rlzo» 5'

k k ,
43122Liz-g(wIzo>d<sfxb<u>+5221<u>>ﬁh<uet|Zo>+

g=ll¢g

It
+2 £31: (51“ I Zo)d(5ii§g (“+52% (u))Pg,, (M | 20)}.(1.10)
g=l

Since 55231; (u) + 5,238, (u) = —(Z 32;, (u) + 5,33, (11)) we have
latg

J;(i§h(s,t|ZO)—Rh(s,r[20» .=.

k k
= #22 Egg (s,u | zo)d(s,‘,{;, (u)+s,‘f;, (u))(P,h (11,: | zo)-Pg,, (11,: | 20)) (1.11)

g=ll¢g
If we replace 521;, (u) and 5213,01) by sums of independent identically distributed

variables we will get an independent identically distributed representation

forx/IIUA}, (s,t | ZO)—R,, (s,t | ZO )). Here we use

I: k
d532,, (11):?th J: 71—;(zhﬂ (u1—e,,,- (u,ﬂ)>dM,,,,- (u))’2(r,ﬂ)"‘db,, (z) and
z: ,)=
hatj

48

dM 11,. (t)

(0) ).Using Theorem VII.2.3, p503, Andersen et al (1993)25 and
(Lﬂ

dezizj (I): J; _2—

[:1 sh!-
(1. 11), the limiting distribution of J—(P Ph(s, IIZO)— Ph (5, tIZO )) is given by the next

theorem.

Theorem 3
Under A5 - All, x/ZUA’M (s,t | Zo)- Pm (s,t I 20 )) converges weakly to

Um, (s,t | Zo)+U2,-h (s,t I 20), where

Um,(s,=t|ZO) IZZI'P (s,u|zo)(P,,,(11,:|zo)—Pgh(u,t|zo))de;}(u) and

g=ll¢g

U2,h(s,=z|zo) JZZZI'P (s,u|Zo)(P,,,(u,t|ZO)—P W(ut|20))dU§2,(u).

g= -ll¢g

The processes Ulih(s,t|ZO), U2i,,(s,t|ZO) are independent. :1

Next we calculate the asymptotic covariance function of

n”2 (P(s,t | ZO)—P(s,r | 20)) . Because Sin)(’) and an)(t) are asymptotically

uncorrelated the k2 xk2 covariance matrix of (1.9) is the sum of two terms. From (1.6)

the (i, h)-th element of the ﬁrst term in (1.9) is

121/2 (3‘19), 2 6XP(,B'ZgIo)£Hg (S,u IZo)dbg1 (10PM (“J I 20):"“2 (B—ﬂ)'a,-h

g,l.g¢l

where bg, (t) = E(ng -eg, (u,,B))a'g,0 (u)du is ap-dimensional vector depending on t,

49

and a”, = Z exp(,15"Zg,0)‘I'I§g (s,u|ZO)dbg,(u)P,,, (u,I|ZO). Therefore the
g,l.g¢l S

asymptotic covariance of the (i, h)-th and (q, r)—th elements is egg/3,1)“ aqr. In order

to estimate this covariance we replace 296,7) by n'11(,3,r) and a”, by

gem/‘3 73,0)ng (s, u|zo)dﬁg,(u)P,,,(u,z|zo),where

13,, (t): £12,,0-E,,<u.19>1d2,,,<u1.

For the second term in (1.9) the asymptotic covariance has the form

IIP'(u,t I ZO ) ® P(s,u I 20 )C0v(vec(dS(2n) (u)))P(u,t I ZO)®P'(s,u I 20) . The inner

(u)
covariance matrix is expressible as ZZCXpQﬂ Z 810 ){vec C gl ”V“ Cgl I 1:50:33 )
l gatl u

where C81 is a mxm matrix with (g, l)-th element equal to l, (g, g)-th element equal to —1,

and all other elements zero to zero. Combined with the previous expression we get a

compact form for the covariance of the first term in (1.9), namely

ZZexpaﬂ' 28,0)I {vecP(s uIZO)Cg,P(u, t|20)}

lgatl

_g__dA10(u)

“{vecP(S.u I Zo)Cglp(u’t I Zo)}——— 3(0) (16 u)

Therefore the asymptotic covariance of n1’2(P(s,t | Zo)— P(s,t I Z0 )) is the sum

of this expression and the covariance matrix of terms ar;,,2(,8,2')’1 aq,. Chapter VII,

pages 514-515, Andersen et al (1993) gives formulas for estimated covariance of

ﬁh(s,z|zo) and 13,145,420).

50

1.4 Estimation of average transition costs

We are now left with the estimation of ch}- (1 I 20) in (1.1). Suppose costs can

potentially be incurred up to earliest of a fixed time 2' or Q , the time to absorption in

state k. If all observation ends at r then we are restricted to transitions and their
associated costs that are observed by 2'. However, allowing for censoring of follow up at

time U, the period of observation actually does not exceedU A TA 2“,, . We now introduce

some new notations.

Let ya. be the transition costs in the ith patient at the chronologically ordered

transition timestij, j = l,...,nl- , where n,- = max U: ’1} S U ,- A 2' } is the number of observed

transitions. The cost yij is observed provided sq. =1 where sil- = [t,-j S U ,- A 2'] , that is, if the

transition time ti]. occurs by time I and is not censored by time U i. We denote by yithe

ni x1 vector of costs and by X a "1 x p matrix of covariates. The matrix X. includes

variables that are constant in time, but vary at the patient level, variables that vary with
time but are constant across individuals (e.g., prices of resources), and variables that vary

both between and within patients. For example, the jth row of X will typically contain

time-constant factors such as age at entry, gender, baseline comorbidity, and variables for

[2

modeling time such as: i}- and interactions between time and time-constant variables.

ij ’
We would also include dummies for the transition types h—-) j.

134

Consider a random effects (RE) mode given by

yizxiﬂ+ai1i+ui (1-12)

51

where ,8 is an unknown pxl parameter, 1,- the ni x1 vector with all elements equal to 1,
a,- an unobserved patient-specific heterogeneity and ui is the n1. x1 vector of idiosyncratic
errors. The composite error in is vi = a,. 1,. +ui . where ,6 is an unknown pxl parameter,
1,- the ni x1 vector with all elements equal tol, ai an unobserved patient-specific
heterogeneity and ui is the ni x1 vector of idiosyncratic errors. The composite error in
(1.12) is v,- = a,- 1,- +u,- . Assume 91 = E(vivf)is positive deﬁnite. Note that the

parameter ,6 in (1.12) is unrelated to the regression parameter in the Cox model for the
transition intensities. In this section our notation conforms to standard usage in RE

models.

RE model Assumptions:

RE] (i) E(ui |x,.,a,.)=o and (ii)E(a,- |x,)=o.
RE2 rank E(Xgng‘xi)= p.
RE3 E(uiu; | x,,a,. ) = 031,. and 15(a,.2 Ix, ) = of,

where 0'3 and 0'3 are constants and I,- is the ni X": identity matrix.

Under RE] and RE3, E(v, ) = o and E(viv; ) = o, = 031,. + 03.1,. where J, is the
n,- Xni matrix with all elements equal to 1. Since E (yi IXI- ) = Xib’ we see that estimates

of ch]. (1 I Z0 ) for speciﬁed transitions (11, j) and Z0 are derivable from the estimates of ,6

and speciﬁcation of the X-matrix.

52

In the RE model an estimate of ,6 is obtained by minimizing (with respect to ,6)

the objective function n“12q(yi,X,-) where q(y,.,X,-)=‘/2(yi —X,-,6)'Q,’l(yi -—X,-,6).

i=l
Condition RE] suffices to ensure E ( XIﬂf'vi ) = 0. The feasible generalized least squares

(GLS) estimator of ﬂ is

n n
ﬁre = (ZXIQI‘X. )" (2x;n;‘y.) (1.13)
i=1

i=1
where (A2,. = (3'3 I ,- + 651,. is a consistent estimator of (2,- derived from suitable consistent
estimators ‘3 , ‘3 of 0'3 ,0'3 respectively. However, (1.13) is implicitly conditional
upon the availability of the sample {( yi , X,- ): IS i S n }. The cost y”- is observed
conditional on ti}. 5 2'. The number of transitions n,- is also random and depends on the

length of the observational period and censoring time. Our notation implicitly assumes

conditioning on the n,- .

1.4.1 Modification for censoring

As noted earlier, time censoring might lead to incomplete observation of
transition costs. Since yl-j and a portion of the jth row of X, are observed only if Si]- =1,
the estimator in (1.13) needs to be modiﬁed to account for this selection. Let sl- denote the

diagonal matrix with jth entry s and t,- = (’11 ,...,tini )’. A covariate vector ziis observed

1.].

initially which might include some covariates contained in X,- that do not depend on the

53

transition times. Consider the observable data {(51 ,z,. ): 1S 1' S n } and assume that
given Zr , the censoring time U,- is independent of ( ti , y, , Xi ), that is, censoring is

independent of the transition times and costs. This is a natural assumption with
administrative censoring, in which case the independence is related to the distribution of
entry times of patients in the study.

Since E(Sij Izi,y,-.X,-)=P[U,- Ztijst <TIzi9Yi9XiJ=Plui Ztij Izi]

ij —
provided ti]- S 1', we deﬁne weights wl-j = 5,7 /p(t,-j ,zi ) where p(t,z,. ) = P[U,- .>_ t I z, ]. If G
is the survival distribution of the censoring time U ,- , then p(tij ’11): G (th - | 2,. ) on

tl-j S 2' and therefore under the mild assumption that G(z'| zi ) > 0 with probability 1, we

ensure p(t z,)>0 whenever sij =1. Hence E(wij Izi,yi,X,.)=E(wij Izi)=l if tij Sr

1] ’
and 0 otherwise.

Let wi be the diagonal matrix with jth entry wij. Then the applicable modiﬁcation
of the previously mentioned objective function q(y,. , X,- ) is
ciry. ,w..X.- 1 =v21w1’2 (14)“ (y.- — x,19)y{wy21.;1 (y.- -X,-b’)}. (1.14)
Here 1.,-denotes the unique lower triangular matrix with positive diagonal elements such
that Q = LiLf. This exists since 52,- is positive deﬁnite. Our assumptions ensure the
diagonal matrix E(wi Izi,y,.,X,-)=I,. provided tin, Sr. Then
51cm. .w..X. )1 = EtEtci<y.-.w.-.X,- ) I y..z.-.X.11=

=E(‘/2{(le(y,. -X,-,6)}’E(W,- lzi.yr.X,-)L71(y.- -Xrﬂ))=E[q(y.-.X,-)] (1-15)

54

Therefore a modiﬁcation of (1.13) is obtained by minimizing with respect to ,6 the

n
objective function n'12§(yi,wi,X,-). Let X, =wI/2L71Xi andyi = wI’Zleyi . Our

i=l

estimator of ,8 is given by

3.. 4231.752.- )"(Xiii11 ' (1.16)

Remark 1.4.1.1

1. Because 9,. and p(t,zi) are generally unknown, to make (1.16) operational
we need to replace them by consistent estimators. Assume for now that these are known.

2. Use of the transformation (y i iXi ) —> ( y, , Xi) preserves the time ordering of
costs because 5'”- depends only on { 1’11. ,k S j }. This is also true of the rows of i,- if only

covariates ascertained at prior times are involved.

3. Model (1.12) can be generalized to include additional random effects, through a
qxl random vector a,-. The model is then yi= Xiﬂ-r Ziai +01 , with Z,- containing a
subset of the covariates in X. . For example Zr =[11 Iti Itf] might be used to model the
time dependence at the individual level. Under an obvious modiﬁcation of A3, the
composite error vi = Ziai +ui satisﬁes E(vi IXi) =0 and
E(viv; I X,- ) = Q,- = 0311+ ZIGZI where G = E(aiafl X,- ) . Even if we assume the qxq
matrix G to be constant, there is conditional heteroscedasticity in 91'- However, the same

arguments leading to (1.15) and (1.16) will obtain.

55

1.4.2 Consistency and asymptotic normality of 3,,

Since E(Xgii ) = E[X,’-L}1E(wi | X,- ,zi)L71Xi] = E(XEQ;1X,- ) , assumptions
REl, RE2 ensure that (11‘1 Z XIX- )"1 converges in probability to (E (XEQi—IXI ))'1 by
i=1

the weak law of large numbers (WLLN). Also writing 9,- = wI/zlevi we get
E(ngi ) = E[X;(L; )'1 E(wi I Y1 ,Xi ,zi )levi ] = E(Xgﬂi'lvi ) = 0 under REl. Hence, by

the WLLN we have consistency of 3w. Also from (1.16)
n“2 (8.. —ﬂ) = (A;‘ +0, (1>>(n"”ZX;1.-) (1.17)
i=1

where Aw = E (xgrzg‘xi ). Application of the central limit theorem gives the normality

A

ofﬂw
n"2 (3.. —5)—D—>N(0,A;‘BWA;‘) (1.18)
where BW = E [igvivgfg ] . Another form of Bw is derived under additional assumptions
as follows. Now
Bw = E[x;ir,.i1;x,. 1 = E[x;(L; )‘1 wiL;‘v,.v; (L; )‘1 w,L;'x,.]
= 15rx;(L;)‘l wiL}1E[viv,’. | f](L;)‘1 wiLj'Xi ], (1.19)
where f is the set (U, ,ti 1X1 ,2.) and recalling that w,- is a function of (Ui ,ti ,2, ). Since

U. is independent of (ti,yi,X,-)given 2,. we getE[viv: If]: E[viv;|X,-,z,.].0ur

1

matrix X includes ti}- ,1 S j S ni , but 2‘. would have variables not in X. . From RE] and

RE3 we get only E[v,- IX, ] =0 and E[v,-vf | Xi ] =ﬂi. Strengthening these to

56

 

51V,- IX),Z,- ] = 0. E [vivg I X,- ,z,- ] =91 preserves the previous assumptions and gives
BW = E[X:(L,'-)'1wiw,-L:'Xi]. However, without these additional assumptions on v,- we

have to be content with Bw given in (1.19).

1.4.3 Computation of Awand Bw

 

To implement the result (1.18) we will need estimates of A w and Bw. From the

discussion leading to (1.19) consistent estimators are obtained as

Aw = 114212;)?“ ﬁw = n'lZi;$i$;Xi . (1.20)
i=1

i=1
Here 9,. = yi — iiﬂw are the residuals obtained after estimation of ,6. A consistent

estimator of (2‘. may be obtained by OLS estimation in the model transformed by wI/z .

that is, the model
1.7 = Xiﬂ + V?

where y: = w” 2 3’1 and the other asterisked symbols are similarly deﬁned. Let 30,5

1‘
denote the OLS estimator from this model and i". = W?”2 (y: — x350“) the

corresponding transformed residuals. Consistency of 3015 follows from RE] and the

mild condition that E (XIX; ) has full rank. A general consistent estimator of (2,. is

A n
12,. = 12’1 2 91.9; . However, under RE3, (2,. has a special structure involving just two

i=1

S7

 

. 7 .
unknown variance components 03 and 0,? . We would then estrmate these components

11,-1

fromaf +002:(Z:1n1 -p)]ZZv,-j and 0',, 2=(}:‘/2n, (n, —1)— ”12225111115-
k>j

1'=lj=l i=lj=l

1.4.4 Estimation of G(. | 2)

To completely specify the weights we are left with estimation of p(t,z) which is

given in terms of the censoring distribution G(- I z) . In our observational scheme follow-
up of the ith patient starting in transient state X (0) at time 1‘ ,0 =0 may lead to one of
three distinct scenarios. (1) Observation ends at the jth transition time 1,,- Sz'in the
absorbing state k. Then U >1,, and X (1,, )= k. (2) Observation continues beyond 1,,- but
ends at Twithout the occurrence of another transition or censoring. Then U ,- > 2', 1,, ,1 > 2'
andX, (1,,- )¢ k. (3) Observation continues beyond 1,, but ends at U, before for the next
transition time. Here 1,., < U , < 1,.,+1 A 2' and X ,- (1,., ) at k. The likelihood function involving
G(- I z) and its density g(- | 2) can be expressed as

1111100.)- Izrn‘v‘l‘j 101112.11“ ‘5“ ”U “’°**>°"”1g<U lz )1" "1 )" WW] (1211

i=1 120

where 6,,- = [X , (1,., ) = k] . If G is known except for an unknown q dimensional parameter

6then maximizing (1.21) yields an estimator 19 of 6 giving an estimatorG(- I 2,3) of

58

 

G (- | z) . Assuming all regularity conditions for maximum likelihood estimation hold, 6

n
is a solution to Z Z 11,-]. (6) = 0 where

 

 

 

i=1j20
(t,- V .
d'1(6): S'J U V6 GB jlz Z) +(1‘61'j)[Ui/\t1j+l>r>tij] QGQU'IZI)
Gg( t'jlzi) 60(lei)
V . .
+(1—61jlj)“ <U1<tij+lAT1 636(U1|z1).
86(Uilzi)

Estimation of G( Iz) changes the weights wil- to wil- =51} /G(t,-j I11 ,6) and

would modify the arguments leading to the asymptotic normality result (1.18). In the

sequel call 60 the true parameter and let (1,]. (6) denote the derivative of (1,7 (6) with

respect to 6. Using standard arguments we get

n“ 2 2d,]. (60 ) —> E(Zdﬁ (60 )) = 1(60 ) in probability. The q><q matrix J(60) is
i=1j20 jzo

assumed to be positive definite. Also 11‘”2 2 2d,]- (60) = 0,, (1). Then using an

i=1j20
expansion of 2241(6) at 60 we obtain
i=1j20
nl’2(6—60)=—J’l(60 xii-“22211,, (60))+0p(1) (1.22)

i=1 jZO

The same steps leading to (1.18) gives

n1/2(ﬂw -ﬂ)— (,1 “122—— 50 X; XE; )" 1014/222— .) (1.23)

'=IJG(tiji’6|z llJG(t1jl1’

where x: = leXi and v: = lev,. Since G(lei ) > Owith probability 1 and 6 ——> 60in

probability, by application of the uniform WLLN the first term in (1.23) converges in

59

 

I!
probability to AW. Its estimator is AW = 11'1 XX? wix; where wi is diagonal with
i=1

elements WU. For the second term in (1.23) we have the expansion

sijixjivj

"4’222600' - (""22

i=1 j i=1 j {G(I;jr::):16)}

 

 

2 V3001,- Izhéiinmwlao)

where é is between 19 and 60. Again, by standard arguments the bracketed expression

# vt

ijux
above converges in probability to D(60)= E(z 'j v”
I {G(Iijlz 21'160”2

2V4’900ij Izi ,60)) apx q
[ti]. S z']x,.'jv 'jV

matrix. Note that D(60)is the same as “260,- M 190)

 

V’gG(tij |z ,60)).Combining

these results together with (1.22) into (1.23) we obtain

* vi

S-ijx
Wm... -ﬂ)= A401 "WZZ{G(1|VU )D(60)J"(60)d,-j(190)})+0p(1)
=1 1 I}

 

= A;‘ (It-1,2:{ki —D(60)J" (190)d,. (60)1,. })+op (1)

i=1

n t t

1 S..x..v..
where kl. =2 U U U , d,-(60)the qx(ni +1) matrix of the du- (60) and lia
'= G(tij Izi'BO)

 

(n1. + l)><l vector with elements all 1. A direct calculation shows that
n“2(,6 —,B)—)N(0, A;‘B A;‘)
where ﬁw =E(k-k ;—) D(60)J"(60 )D(60) Compared 10(1 18) we notlce that E(kk ;)

is the matrix BW in (1.19) when G was assumed known. Thus estimation of G here gives

60

 

an asymptotic covariance that is no larger than if G was assumed known. This is in the

sense that A: (13w —Bw )A;l is negative deﬁnite.

1.5 Asymptotic distribution of the mean transition cost

From our model (1.20) for all transition costs we obtain estimates of chj (t | 20)
for a covariate proﬁle Z0 by specifying the appropriate vector X 0 of covariates
corresponding to a column positions in X. Suppose for example in the model for ya- the

row vector xfj of X contains the ﬁxed covariates xi , dummies for transitions types, terms

of modeling the transition times such as ti]- ,t5. and perhaps interactions between these

times and xi . Our special X 0 will contain the desired 20, interactions between Z0 , t and
t2 , indicator variables with value 1 for transition type h—) j, and value 0 for all other
transition types. Denoting this covariate proﬁle by X11100) then ch]- (I | Z0 ) = X,’,jo (t),B
and from ( 1.16) we obtain the estimator
6,”. (1|ZO)=X,;jo(z)Bw. (1.24)
Although the consistency of 5,1111 '20) might seem immediate from (1.16) the

final form of the computable ,3“, involves the estimated ﬂiand weights “’1" the latter
through the censoring distribution G.

N ow recall from (1.1), the expected net present value NPV,? (t I i ,ZO) =

61

 

 

(”(r)lXo =i.Zo) has the form PM.(sIZomMOJIZoMAm-(Slzo) - From

E(Chj

(1.24) and Section 1.3 our estimator is

Nﬁv,fj‘)(z|i,zo) -_- Ee"‘5hj(s|zo)éh(o,s-|zo)d/1,,j(s|zo) (1.25)

which also leads to the estimate of the first term for the expression for NPV in (1.3).

Remark 1.5.1

1. Our model for all transition costs y ,- uses the entire sample of patients to

estimate simultaneously all ch]- (- | ZO ), h ¢ j. This has the advantage of drawing strength

from other parts of the data set when patients do not have observed transitions of every

type. Its limitation lies in discerning a simple but adequate speciﬁcation in the model for
the times ti]. . Based on our previous experience we suggested using terms ti. ,1:- and their
interactions with other covariates in the rows of X, . Although our estimation strategy

does not depend on this speciﬁcation, as a practical matter a small number of terms
involving t~ is desirable and should be adequate in capturing the dynamics of time. Also
note that the ﬁrst term in the NPV formula is actually a sum over the transition times

because 21,1]- (- I ZO )can jump only at these times. Therefore, unless some smoothing of

[am-(1Z0) is done we will need estimates of the cm (- | Zo ) only at the transition times.

2. Since inadequate follow up in some patients might result in incomplete cost
observation, we used inverse probability of censoring weights (IPCW) to obtain

consistent estimates of the regression parameter ,6. Many investigators4‘ 3‘ have applied

62

- ' J

this technique in a standard survival model with a single cost determination for the whole

follow up period.

3. The ﬁxed covariate profile X th (I) is assumed to be continuous in t. The

asymptotic properties of the estimator ,3 in (1.18) imply that

sup |e‘"(a,,j(z|20)—c,,j(t|zo))|—”—>o.
re[0.r]

Also Chj (. | 20) is assumed to be bounded. We have uniform consistency of f3 , i.e.

sup | 13:12 (0,t | Z0 ) -— 8,, (0,t | Z0 ) | —P—>O. Based on these results and Theorems 1 and 2,

re[0,r]

the net present value estimator is uniformly consistent, that is

sup |NI3V,fj”(t|i,ZO)—NPV,fj”(t|i,Z0)|——P—>O.

re[0.z']

This result is proved in Polverejan (2001)“.

Conditional on the initial state 1', given the vector ZO of basic covariates, we will

assess the asymptotic normality of the net present value of all expenditures associated

with the h to j transitions in (0,t], i.e. the asymptotic distribution of
721/2 (NPV,? (t | i ,20 ) - NPV,;1)(t|i,ZO)) using the Functional Delta Method.
Consider the trivariate process 2"" (r) = (21”) (2)25") (r),z§"’ (t)), with

components

2:")(1)=‘/;8_n(5hj(’lzo)’chj("20)),
Zé"’(z)=~/Z<é.<s.rIla-13.0.:I20»,
z§"’(z)=./Z(A,,j (tlZO)—A,,j (r120).

63

 

We recall our convergence results from the previous sections.

From (1.17), (1.18) and (1.24) we have

21102)“) _ tho(,)e‘"A"(J_ZX (L; )"1w Ll.l v ) and the weak convergence result

zf”’(t)—l’—>e‘”chjo(t|zo) where chj0(I|Z0)= ng0(t)§, and g, is a multivariate

normal with mean zero and covariance matrix 1133“,)“: .

Using (1.7) and Theorem 3 we obtain the weakly convergences
A 0.8.
25"’(t)=JZ(a,(s.rIzo)—a,.(s,t|zo» =

k k
=ZZJPg (s u-IZo)d(S,(,lg,(u)+5,(,2;1(u))(3h(urt[Zo)—Pgh(“vtIzo»

——D—>Um, (s,t | zo)+U2,,, (s,t | 20).

where

U1i;r(5’lzo)= 221 P...<su— IZo)(fih(ut|Zo)- P.,.<urlzo»de;3<u>
g=ll¢g s

Uzi/z(S’IZo)= 22]: a.(s.u-Izo><a.<u.rIzo>—P,.<u.tIzo>>dUé131<u>-
g= -ll¢g

and U1”, (s,tlZo), Uzih(5v’|Zo) are independent; and also

due

23"”(2): J—(Ahj(t|20)— Ahj(t|Zo)= exp(ﬂ’z,,jo){sf,‘,1j(t,r)+s;2,;(t)}

—D—>exp(,B’ZhJ-o )(Ufhj (I) "Ll/$12; (t)) . 11¢ j

where
n k
53,310 r)=-i-Z( Z E—l—(zsm-eh,(u,ﬂ))dM,.,-.(u))’2(r.ﬂ)"‘b,,-(t).
n i=1 ,)=1 )5

64

 

 

thj, (u)

”h! (:t) -T:£—— (0)(u ﬂ)

and the (h, j)-th element of the limit process U: (t) of Sf,” (t,2') can be expressed as

:3th (t), where :0 is a p-dimensional normal random variable (its distribution does not

depend on r) with zero mean and covariance matrix 2(7, ,6)“1 ,
bhj (r) = £(Zhj0 —e,,j (u,,6))a(hj0 (u)du is a p-dimensional vector depending on I; also

Ufa/‘0) and Ughj (t) are independent.

Theorem 4

Under assumptions A1 - A11 and REl-RE3 for a fixed time t and h :15 j :

Ahj(t|Zo)-Ah,-(tlZo) U;j(tlZ0)
J; iih(0atlzo)"Pih(Oit|Zo) —D_’ UihUIZO)
Ehj(t|ZO)_chj(t|ZO) chj0(t|ZO)

where the elements of the covariance matrix 2 are

2(1,1) = exp(Zﬂ’ZhJ-0 ){wfy (I) +b;,j (t)2(2',,6)’1bhj (1)},

2(2, 2): ZZEP- §(s,u|ZO)(P,,,(u,t|zo)-Pg,,(u,t|zo))2-

g= —11¢g
eXp(ﬂZg,o)d(31(t)+b;,(u)2(7 ﬂ)"bg,(u))

2(3, 3)=e‘2"x,’,jo(t)A; ‘B ijhjoo),
2(1, 2) = exp(ﬂ'Zhj-O )bjy- (t)2(2', ,6)‘1 F”, (r, B) +

+exp(2/3’z,,jo)jo Ph(0, u|z0 )(Pjh(u IIZO)- P,,,,(u, t|zO ))dwh2j(u))

65

 

where

Eh(t):Z:=lZ;g £138(o,u|20)(P,,(u,t|zo)—Pg,,(u,t|20))exp(ﬂ’zg,o)dbg,(u)),

Z(l,3) =X(2,3) = 0.1:]

Proof:

The expressions above for the diagonal elements of 2 have been previously
proved in (1.8), Section 1.3 and (1.18). We prove that 2(1,3) = £(2,3) = 0 , which

veriﬁes that

2f")(t) and (Z§"’(t),Z§")(t)) are asymptotically independent.

Let us remember that

20,3) = Covrxgj0(t)A;,‘x;(L;)" w,L',i‘v,.F(z,r.M,-.z, )1

where
F(t,2',M,-,z,.)=
k
l I I—1 thji(u)
—— ~ --, M,- 2, bvt+———.
{h§1£ﬁ(zml(“) e,,,(uﬂ))d ,,(u)} (r ﬂ) m) Esgfnw)
hat}

The expression XXL; )’1 wiLI'v, is a function of all transition times
t,- = (In ,ti2 ,...tini )observed in [0, r] as well of (U,- ,z,- ,yi ,X, ). We will impose the
assumption E(vi IX,- ,z,~ ) = O . The function F(t,r,M,- ,zi) depends on

th = Z thi where
l

thi (I) = Nhji (I) - EYM (“)CXMﬂ’zhﬁ (“))dA/tjo (u)

66

 

depends on a subset of each of ti ,zi,X,« and U,- . Since w is a function of (U,- ,t,. ,z,)
and U,- is independent of (ti,y,-,X,-)given z,- we get E[v,- If] = E[v,. |Xi,z,.] where
f is the conditioning set (Ui ,ti ,X, ,z, ). Then
E(vaiMhﬂ (t) | f) = x; (L; )’1 w,L}‘E(v,. |x,. ,2, )M,,ﬂ (r)

and so under the assumption E(vi | X, ,zi ) = O we get 20,3) = 0. Similar considerations
lead to Z(2,3) = O.

The covariance of exp(,B’Z,,J-o )(U:,,j (t) +Ughj (t)) and
U1,,,(O,1|ZO)+U2,~,, (0,t | 20) can be calculated using results previously stated in Section
1.3.
Z(1,2)= E(exp(ﬂ'Zhjo)(Ufhj (t)+Ughj (t)) (Ulih (0,! | ZO)+U1,.,, (0,t | Z0 )))=

k k
=E(exp(ﬂ’zhjo)Uf,,j(t)ZZ L138(o,u|zo)(P,,,(u,t|ZO)—Pg,,(u,t|zo))-
g=lltg

-exp(,6’Zg,0 )de2 (u)) +

k k t
+E(exp(ﬂ’Zhj-0)U;hj(t)zz [012g (0,14|Zo)(P,h(u,tIZO)—Pgh(u,t|Z0)).
g=ll¢g

-exp(,6’Zg,0 )dugfg’, (u)) = I + 11.

Using U ; hj (t) = gab hj (t) , the first term of the sum can be calculated as

k k
I =E(exp(ﬂlzth)Ufhj(t)ZZ £Eg<01uIZOXBhU‘JIZO)-Pgh(u’tlzo))'

g=ll¢g

-exp(ﬂ’Zg,O )de; (u))=

67

 

 

k k
=E(CXp(ﬂ’Zhjo)§6bhj “)2: Jgpig (01“ I 20 X8}; (L1,! I ZO)—Pgh (“9’ I 20 ))
g=ll¢g

-exp(,6”Zg,0 )déabg, (t).
Therefore I: exp(,B'Z,U-O)E(§E,bhj (0&6th (t,,6)) where

k k
F”, (t,ﬂ)=ZZ I331: (o,u|zo)(P,,,(u,t|ZO)-Pg,, (u,I|ZO))exp(ﬂ'Zg,O)dbg, (1.)).

g=ll¢g
Hence,
=CXP(.6'Zhj-o )E(b;.j (050%)th (I, ,6)) = eprg’Zhjo )sz (1)2(79 ,6)—1 Fr}, (Ioﬂ) -
The second term is:
II=E(exp(ﬂ'Zhj0 )U;hj (2)2;12; £128 (O,u | zo)(P,,, (u,t | zo)- P8,, (u,t | 20 ))-

-exp(,6’zg,o)dU§2, (u))=
I t '1' t k k
=E(exp(,8 2m)[OdU2,j(u)jO(Zg=lZ#ng (O,u|Z0)(P,h(u,t|Z0)-Pgh(u,t|ZO)).
~exp(ﬂ'Zg,O )dug‘; (u )) =
Using (Ughj,U;m,> =0 for (h,j) ¢ (m,r),(h,j),(m,r)e E‘ (so the processes

{Ugh}. (.),(h,j)e E‘} are independent) and

.. by t ah'O (u)
U - t = (02- ’ = j
< 2h} >( ) hj( )notatiml J0 51(5)) (uyﬂo) u

we calculate
I * t
11= E(CXpw Zhjo)ﬁdUZhj(u)I0P.-h(0.uIZO)(P,-,,(u.t|zo)-P,,,,(u,t|zo)).

~exp(ﬂ'ZhJ-O )can/$3]. (u)) =

68

 

 

=E(CXP(2.5’Zhjo)J;Bh (0M I ZOXPjh (14,! I ZO)_Phh (“it I Z0 ))d <U§Z~ > (W)

:exp(2,B'ZhJ-0)J:Hh (0,u|ZO)(Pj,, (u,t|zo)-g,,,(u,t|zo))dw§j(u)).
Putting I and II together,

so, 2) = exp( 13’2th )bju. (t)2(r, ,6)“ F”, (t, ,6) +

+exp(2ﬂ’zhj0)£Pih(01u_IZO)(Pjh(u1IIZO)_ Phh (“it I 20 ))dng ("))-

Fix the time t and consider the functional
(0, IE‘ —> JR. w, (21 .zz.zg)= £z1(8)zz(s)dz3(s).
where E‘ is a subset of D[O,'z']3 such that (p, is well deﬁned. Notice we can write
NPV,;"(t | 2320) as
(0, (21,22,33 ) = (0, («84% (- I 20 H71. (0.-I 20%/1..,- (-| 20)).

If g), has an extension to D[O,r]3 that is Hadamard differentiable in (z, ,z2.z3) then,

under some extra-conditions, we can apply the Functional Delta Method.

Lemma 2

Let E'. ={(x,y,z)€ D[O,2']3 : EIdZIS C}, where O<C<oo. Foraﬁxed time

t6 (O,r] we deﬁne (0, :E’ —)R by (0, (21,22,23)= Ezl(s)z2(s)dz3(s).

69

Let (z, , 22 .23)be a ﬁxed point of E‘ such that El d(zlz2 ) I < 00. Then (a, can be
extended to the space D[0, 213 so as to be Hadamard differentiable at (z, .22 . z3 ) , with
derivative

I I I
61(0) (2| 9 22 9 Z3 )(Z] ’22 ’23 ) : I0 2122d23 'I‘ 10212de3 + LZIZ2dZ3 9 (126)

where the integral with respect to Z3 is deﬁned by the integration by parts formula if Z3

is not of ﬁnite variation.

For technical reasons the following extra-assumption is considered: ch]. (. | Z0) is

of ﬁnite variation over [0.2"]. We write Eldchj (s | 20 )l < oo . Lemma 2 is proved in

Polverejan (2001)48and more details on regularity conditions can be found in Gill

(1989)58 and Praestgaard (199l)26.

Theorem 5

Under the assumptions Al - All and RE] - RE3 for a ﬁxed time t:

n“2 (NPV,;"(t|i,zo)-NPv,fj"(t|i,zo)) =
= "mm ((3-.ch “20ml! (o..|zo>.i1.,- (420))-(0, (e'r-ch) (,Izo),P,.,, (O,.|Z0),Ahj (.|20))1
LCM): (e-rthj (-IZo >13. (0,. I Z0 )iAhj ('IZO))'(e-r.chj0 (Izo ).U,-,, (0.. I Zo)'U;j ('I 20)) =
= Ee"schj0(s|Z0)Bh(0,s|Z0)dAhj(s|ZO)+ £e‘”c,,j(s|ZoX/ih(0,s|Zo)dAh,-(S|Zo)+

+ Ee-rschj(slzo)ﬂh(0,s|Z0)dU;j(s|ZO)= p(,)

70

where 0;]. (t) = exp(,6'ZhJ-o )(Ufhj (t)+U;,,, (t)). U,,, (I) = Um, (int/2,, (t) and

ch10 (t I Z0) = X,',j0(t)§l .13

Proof:

The mapping go, :E‘ —> P is defined by (a, (zl,zz,z3 ) = £z1(s)zz(s)dz3(s),
where E‘ ={(z,,z2.z3)e Dior]3 : Eldz3 I5 C} and C=Ahj(2'IZO)+l<oo. Let
z1(s) = e"‘c,,j (s I Z0 ), 22 (s) = PM (0,5 I 20 ), 2:3 (s) = Ah]. (5 I ZO ). All 21,22 .23 E D[0,z'].
Therefore (zl ,zz ,Z3 )6 E‘. We have that EId(zlzz )I < 0°.

The previous Lemma implies that (p, can be extended to D[O,‘z']3 so as to be
Hadamard differentiable at (zl , 2213), with derivative

61¢: (31 $2,233 )°(Zl r22 ’23): Igzizzdé +£2122dz3 + Ezlzzdzg, .
where the integral with respect to Z3 is deﬁned by the integration by parts formula if Z3
is not of ﬁnite variation. Denote by (of. the extension of ¢, to D[O,2']3.

Define 2, (s) = 6"”5121 (s | 20 ), 22 (s) = P”, (0,s | 20 ), 23 (s) = Ah}. (5 | 20 ), sE [0,1].
We have (21,22 ,23 )E D[0,z']3 for every n and

P(EdIz3I<C)=P(EdIAhjo(s,/§)|< C)=P(A,,j(r|zo)<C)—+1 as n—)00

because A,” (1|Z0)—P->A,lj (1| ZO)< C. Thus (21,22,23)€ E‘ with probability tending

to one.

We have

71

"“2 (21(‘)-Zi('))—D_"Zi (') =e-r.chj0('IZO)’

Ill/2 (22 0 ’ Z2 (-))—D—>22 (') = Uih (0" I 20 ).

n“2 (23(-)-z3(-))—2—>z3(-) :11}, ((20).

The processes Z1 ,Z2 ,Z3 are Gaussian and hence have versions that are almost

surely continuous. Let C[O,r] the set of continuous functions on [0,1]. The subset
C[0,1']3 C D[O,2']3 is separable, so (Z1 ,Z2 ,Z3) has separable support.

Because P(o,.|zo)=n(1+dA(.|zo)) and P(o,.|zo)=['[(1+dA(.|zo)),
(0..) (0.-1

the matrices P(O,. | Z0 ), P(O,. I Z0) are functionals of A(. | Z0 ), A(. I Z0 ) , respectively.

Also by Theorem 4,
Ahj(’IZo)‘Ahj(’IZo) U;j(tIZO)
‘5 BMOJIZOI‘BMOJIZO) ‘2") Uih(‘IZo)
e-n(5hj(tIZO)-chj(tIZO)) e'"e,,jo(t|zo)

Therefore by Functional Delta Method,
"1,2(¢t(21v22 ’23)-¢t (11 ’32 ’33))‘—D"d¢rE. (21’22’23)'(zl’22 ’23 ) = P(’) deﬁned in

the theorem statement.

Theorem 6 (Variance of P(t))

Under the assumptions in Theorem 5, the variance of

P(r)= Ee-"chj0(sIZ0)P,-h(0,s|Z0)dA,,j-(SIZ0)+ Ile—rschjbIZOX/ih(O,SIZO)dAhj-(SIZO)+

I "ifs a .
+Ioe Cry-(s!Zo)R.(0.sIZo)dU,,j(s|zo)is

72

Var(P(t)) = 7],,j(t)A;leA;.17]’,,j (t)+T2hj(t)’Z(T,ﬂ)_lT2hj-(t)+

2 “mom

i
st?) (ﬂu)

+exp(2,B’Zhj-O)(2EBI +21582 + LIV-"chi (t I Zo)P,i, (0,t | 20)}

' —rs I
where Two): Ioe tho(s)8h(0,sIZ0)dAhj(s|ZO),
TZhj (I) = exp(ﬂ'Zhj0) Le-rschj (5 I 20 )IP,;. (0,5 I 20 )dbhj (S) +Fih (S)dAth(S I 20 )I i

I T —
EB] = “P(ﬂ ZthLe "Chj (’ I 20 )dAI1j0(t)’

winds)

' -rs
-(Ioe chj (SIZO)R§(0,SIZO)IPjh (S’IIZO)-Phh (SJIZON (0)
sh,- (£3.14)

)

and EB: = Ee-nchjUIZo )dAth(t)'(Le—rschj(SIZO)Hih (Sit)dAhj0(S))-

Proof:

The form of the variance can further be elaborated with the following notation.

Let

F”, (r) = ZZexp(ﬂ'Zglo)£Pig (o,u|zo )(Hh(u,I|ZO)-Pgh (u,t | Zo))dbg, (u)

g=ll¢g

V”, (t) = ZZexp(ﬂ’Zglo)I;Bg (0,u|Zo)(P,h (u,t | Z0)—Pgh(u,t | 20))dugg, (u).
g=lI¢g

While Fm (t) is a non-random p-dimensional vector, V”, (t) is an example of an Ito

process. Here U; (.) = (Ughj (.),h, jE {l,...,k}) is a kxk matrix-valued process, where

Ugh}, = —2 Ugh]- and {Ugh}- (.),(h,j)€ {1....,k}} is a continuous Gaussian vector
jack

73

martingale, properties of this process are described in Section 1.3. It follows that V”, is
Gaussian and by Fubini’s theorem E (Vih ) = 0.

Let us consider the three terms of P(t):

P<t>= ﬁe"‘c..o<slZO>P.(0.sIzo>dA..-<slzo>+

+ Eta-"cm. (s I 20 )Uih (0,5 I 20 )dAhj (5 I Z0 ) +
+ Iée'P‘chj (5 | 203,, (0,5 | zo)dU,','j (5 | 20) = [+11 + 111.
Since the ﬁrst term Era-”ch10 (5 | Z0 )1}, (0,5 I Z0 )dAhj (5 I 20) is uncorrelated with the

other two we calculate its variance separately. Indeed

Var(!) =var(I(:e"-‘c,,j0(s|z0 )Ph (o,s|zo)d.c1,,j(s|Z0 )) =
=Vnr(.g1 Iée"~‘x,;j0(s)Ph(o,s|zo)dA,,j(s|zo))

For easiness of notation let Tlhj (t) = I: (”XII-0 (5)13,l (0,5 I ZO )dAhj (s | ZO ).

Then Var(I) = Var(élTW (t)) = T1,,j(t)Var(§l)Tl’hj-(t)= Tlhj (t)A;1BWA;lTl’,,j (t)
The second and third term are correlated and we look at them together
Var(Il+III) =Var( ﬁre—"chmIzox/ih(o,s|zo)dA,,j(s|zo)+

+ Ee—"‘chj (5 | 20 )P,,, (0,5 | 20 )dug} (s | 20 )).
Some algebraic calculations lead to

' —rs
Var(II + [11) = Var( Ioe Chj (s I ZO)(U”,I (0,5 I Z0)+U2,~h (0,5 I 20 ))dAhj (5 I Zo)+

' -r5 , ..
“' Le Chj (S I 20 )3), (0.5 I Zo)exp(ﬂ Zhj0)dUhj (s | 20)) =

74

=Var(exp(ﬂ'Zth ) Ila-”ch, (5 | z() gar... (S)dA,U-o (s I 20 ) +
+exp(ﬂ’Zth)Ee-rschj(sIZO)Vih(S)dAhj0(SIZO)+

+exp(,6'Zhj0 ) Ere-”City (5 I 20 )Pih (0,5 I 20 )dUghj (S I 20 ) +

+CXP(16’Zhjo).I-(:e_rschj (SI 20),)»: (0,5 I 20):“th (5)) =

= Var(exp(ﬂ’ZhjO )gg I: emchj- (5 I 20 )I 3;, (0. S I 20 )dbhj (S) “I“ Fih (SIdAhjo (S I 20 )I +

+CXP(,B’Zhjo) Le—rschj (5 I 20 ){ Pih (0,5 I Zo )dUghj (5 I 20 ) +Vih (3)5“th (3 I 20 II-

The last two terms are independent, and therefore

Var(II + III):

Var(exp(ﬂ'Z,U-0)§6 ﬁe'rschjUIZO){P,-h(0,5IZo)dbhj(5)+Eh(5)dAhjo(5IZ0)}+
+Var(exp(,6"Z,,j-0 ) £6""chj (5 I 20 ){ 8,, (O, 5 I Z0 )dUghj (5 I Z0 ) +Vih (SM/1MO (5 I Z0 )}).
The ﬁrst variance does not pose a challenge:

Var(exp(/5”z,,jo)§;, Igemchj (s|20){P,.,, (0,5IZo)db,,j(5)+F,-h(5)dAhjo(5 | 20)} =

= Varese.) (t)) = T...) (t)’z(r,ﬂ)" T2,.) (t).

where T2,,j (t) = exp(,6"Z,U-o) Eeqschj (5 I 20 ){Pih (0,5 I Z0 )dbhj (5) +Fl-h (5)dA,y~o (5 I Z0)}.
The second term

J(t) =exp(ﬂ'Zh]-0) Ige"~‘c,,j (5 | zone, (0,5 l zo)dU;,,j (5 | zo)+v,.,, (5)dAhjo(5 | zo)})(1.27)

75

is an Ito process (here the dependence on i, h, j and Z0 has been suppressed) and it

follows that J (I) is Gaussian. The ﬁrst term in (1.27) is the integral of a deterministic
function with respect to a Brownian motion, and hence the mean is zero. The same result
applies to the second term, as seen by Fubini’s theorem.

Also since C h}- (.) are bounded, non-negative processes over?’ and 8,, (0,5 I 20) are
probabilities (<1),P{exp(ﬂ'Zhj0) I:( Ige'” Ic,,j(5IZ0)1'3h(0,5IZ0)ds)2 <oo} = 1.
For the second term we need to check that

PICXPCB’Z/y'o I [II Jge-rschj (S I 20 IV”: (SIdAth (S I 20 I I< °°I =1-

This follows directly under the assumptions A7: Ath (r) = I: arm-0 (t)dt < co and

A10: 5;") (.,.) are uniformly continuous bounded functions of (t, ﬂ)e [0,1])(6’ and

boundedness of CM (.) . If we apply Ito’s formula for the function f (t, J (t)) = J 2 (t) then

J 2 (t) has also a stochastic differential form

r . dA- ( )
J2(t)=2L'J(s)dJ(5)+ I0(e"*c,,j(s|zo)P;,,(o,s|zo)}27#9i.
5h]. (,8,5)

Split I:J(5)dJ(5)=
= £J(5){e-”chj (5IZ0)B,, (0,5IZ0)}dU;,,j (5)+ £J(s){e"‘chj(sIZo)V,-,,(S)}£L‘1,,jo(5),

and note that the expectation of the first term is zero. Hence to evaluate E (J 2 (7)) we
need examine the second term in the expression above. Inserting J (5)into the second

term gives

76

EJ(I){e-"chj (r | 20 )v,,, (t)}dAhj0 (,)=
EI£e”‘chj(5|Zo)P,-h(0,s|Z0)dU;,,j(s))-e'"chj(tIZO)V,-h(t)dA,,jo(t)+

KI IgemchﬂleoWn (s)dA,,,-o(s)I-e‘"c,,j (:|zo)v,.,, (t)dA,,j0(t)=Bl +32.

We have for s <t,

Hrh(S.t)=E{V;, (S)Vh(t)}=

=ZZeXp<2ﬂ’P.-Zzgn>£ (O,u|Z0)(I=;h(u,t|Z0)-Pgh(u,tlZo))°

g= -ll:tg

Myo(u)

'(Bh (“’SIZOI—Pgh (u,SIZo))m

and therefore we get
—rr ‘ —r.i‘
1532 = Ee ch}- (I | zo)dA,,jO(t).(Ioe ch]. (5| 20m”, (s,t)dAhj-0(5)).
Finally evaluate EBl by noting that the processes U '2' M , U ; g, are independent for all (g, l)
at (h, j). We get a single term

E3] = exp(ﬂ’zhj0)Ee-nChj (I I zoIdAthUI'

r -.. (3)
(Ice chj(5IZ0)B,3(0,5IZ0){PJ-h(5,tIZO)-Phh(5,II20)} 534—; )I
u
Therefore
Var(II + III) = TZhj (t)'£(2’,,6)'1T2hj (t)+
. -n ‘ (I)
+exp(2,BZhjo) 2EBI+2EBZ+I:{e chj(tIZO)P,-,,(O,IIZO)}2 3374—; u)

The proof of the theorem is now complete.

77

CHAPTER 2

ESTIMATING MEAN COST FOR

THE TWO-STATE CASE

With the rapid escalation of costs of medical treatment, interest in accurately
determining the cost of medical care has increased. Estimates from cost studies are
needed to determine the economic burden of disease, to predict the economic
consequences of new medical interventions, and for comparative purposes such as cost-
effectiveness analyses.

Health care studies are typically designed with a period of recruitment in which
patients enter the study, and an additional period of follow up in which health outcomes
are recorded. At study termination however, some patients would have not reached their
end-point of interest which leads to right censoring of their time-to-event response. The
medical cost associated with the follow up period in these patients is also right censored
in the sense that the total cost at the time of censoring is less than the cost that would
have accrued if their follow up continued until their end—point was reached. Due to this
analogy with censoring and event times, it is tempting to use standard techniques from
survival analysis for the analysis of medical costs. The survival analysis approach to costs

seems also appealing because of its simplicity, its nonparametric nature and its apparent

78

robustness in the presence of censoring. However, although the approach is apparently
free from distributional assumptions, it is not entirely assumption-free. Some of the
assumptions that underlie such analyses, for example, the independence of event and
censoring times, are not tenable with censored costs4‘ 30‘ 59. Generally, cost at the time of
censoring and cost at the event time are correlated. A simple sample average of the
observed costs in the patient sample would underestimate the true expected medical cost
for the treatment under study, and using the average in the sub-sample of patients with
complete costs would be inefﬁcient.

The current methods for estimation of the population mean cost are both
nonparametric and semi-parametric. Key references are Lin et al (1997, 2000, 2003)4'5'30,

Bang & Tsiatis (2000)“, Strawderman (2000)”,Wi11an et al (2002, 2003); 3, Huang &

Louis (1998)“), Arijas & Haara (1984)56 and Wooldridge (2003)”.

2.1 Introduction and Background

Suppose costs can potentially be accrued over a fixed time period [0, 2'] with

expenditure terminating at some event time T so that complete cost observation occurs if
a patient is followed through time T' = min(T,2'). Suppose y(t) is a right-continuous

process that represents the cumulative cost up to time 1 (including time t) for a typical
patient in the population under study. If lifetime cost is of interest then T denotes survival

time. Since costs do not accumulate after T, y(t) = y(T) for all t 2 T. The cumulative

cost y(z') at time 2' is the principle random variable of interest, so inference will focus

79

on the mean cost, ,u = E ( y(T)) = E ( y(T' )). With lifetime medical cost, the cumulative

cost is y(T) and estimating the average E ( y(T)) is of interest.

Censoring at time U might preclude observation of T‘. Let X = min(T,U ) ,
6: [T S U], X. = min(T*,U) and (5‘ = [T‘ S U], where [A] denotes the indicator
function of the displayed event A. The observed cost y = y(T') is uncensored if, and
only if 5' = 1, and the event occurring at Tis observed if, and only if 5 =1. We assume
{(7}. ,U, , yi = yi (T: )); i =1,...,n} are independent identically distributed copies of
(T’ ,U , y), with the observed data given by

(X; = n1in(7}*,U,-),§: = {T} s U,],y, ,i =1,...,n}.
We assume U is independent of (y(- ), T). Then PM" =1| y,T] = G(T" ) , where

G(t) = P[U 2 t] . For i =1,...,n the variables corresponding to the ith subject are indexed

by the subscript i. Hereafter, unless otherwise speciﬁed, the distributions of the event

time T and censoring time U are considered continuous.
In estimating E ( y(T‘ )) the available data could be minimal, in the sense that only
{X f ,6: , y,- , i =1,...,n} is recorded. However when patients are followed overtime, we

may have costs observed in multiple intervals, that is the cost history is available.

Lin et al (1997)30 proposed two different estimators of y = E ( y(T)) , for these

two cases. Both estimators are proved to be consistent and asymptotically normal under

certain conditions. Suppose that the interval [0,2’) is divided into G intervals

80

[0,7) = U[a,,_1 ,a,) with 0 = a0 < a1 < < (16 = 1'. If we use intervals closed at the left,
3

so that the gth interval is [a,,_l ,ag ), to be consistent with Lin et al (1997, 2000) and

Willan et a1 (2002, 2003)“ 3" the survivor function is defined as S(t) = P(T 2 t).

Keeping with the more classical definition, S (t) = P(T > t) , we would need to partition

(0,2'] as U(a,,__l ,ag ]. Become S continuous, we need not be concerned with this
s

distinction. In both methods, either with minimal cost data or using cost histories, the

Kaplan-Meier estimator of S is estimated based on {(X, , 6,- ), i =1,..., n}.
Lin’s ﬁrst estimator refers to the case where we observe the cumulative cost at

time X ,- and at each of the observation points a,,_1 S X ,. Let Ay,g = y,(ag )— y,- (a,_,) be

the incremental cost for the ith patient in interval [a a, ). Since the ith patient may be

3-1’

censored during the gth interval we deﬁne

~ y.(ag)—y,(a,_,) if X,>a,
.Vi " ,
g yi(XiI-yi(ag_1) If ag_,.<_X,-<a,,

as the cost incurred over the interval [a,,_l ,ag ). Observe that Ay,g ¢ Ay,,, only if the ith

patient is censored during the interval [a,,_1 ,ag ).

Lin’s first estimator of ﬂ = E(y(z')) is

G
a“ = ZS(a,,_1)Eg_, (2.1)
g=l

n
ZIXi 2 ag—l IAyt'g
where Eg-l = ‘=1 , g =1,...,G , and S (t) is the Kaplan-Meier estimator

Z[X, 2ag_,]
i=1

 

81

r

of S (t). Equation (2.1) arises from the identity
G G
u = mm) = E(Z(y(a, )— y(ag_, )) = 2500:, )— y(a,-. ) I T 2 agn )S(a,-i)
g=1 g=1

and assumes no cost at time 0. If there is cost at time 0 then it should be added to the
expression of (2.1).

Lin’s second estimator is appropriate when we do not have cost histories, so only

the accumulated cost at time X ,- is observed. If we suppose T S 2' as, this estimator is

based on the identity

G
a = E(y(T" )) = E(y(T)[T s 1]) = ZEMT) | 0,4 S T < 0g )(5(a,_, I'5("g I),
g=l

consistent with the deﬁnition of S as S (t) = P(T 21) .

This suggests:

‘,-1(§(a,_.)—s‘(a, » (2.2)

M0

#L2 =

II
—

8

Z[a,_, _<_ X, <a, ,6, =1]y,(r,)
-l

where Ar, = i” averages the observed ﬁnal costs for all

 

Z[ag_, s X, < (18,5,- =1]
'=l

patients observed to die in the gth interval. The observed costs of patients who are
censored before 2' are not involved in any calculations so they need not be recorded
when using this estimator. For G=1, i.e. 0 = (10 < a, = z' , and T S 2' as, (2.2) becomes
n
Zﬁi yi (7} I

[1,2 = (1 _ §(z-))_f_l_____

ia-
i=1

82

Bang and Tsiatis (2000)31 proposed a basic unbiased. weighted estimator for the
mean cost that they call the weighted complete-case estimator. It is applicable only when

for each patient 1' we observe the cumulative cost y,- (T, ). They assume all times 7} to be

bounded by 2' , so X,- = X,‘ and 6,- = 6: for all i =1,...,n . The estimated mean cost is

ﬁar = Z w,y, (7}) ' (2.3)
i=1

with w, = 6, / G(T, ) , where G(-) is the survival distribution function for censoring time
G(t) = P(U 2 t) . The heuristic argument leading to this estimator is as follows. A patient
who is observed to die at time X , = 7} has a probability G(7}) of not being censored.
Hence we can think of this patient as representing, on average, G(T} )'1 individuals who
might have been censored. They propose to estimate G(T} ) using G(Y} ) , where G

denotes the Kaplan-Meier estimator of the G based on the data {(X, , 6: ) : i =1,...,n} ,

_ n
where6, =1-6, . Formally, if Nc(u)=Z[X, Su,6,- =0] then
i=1

 

c‘;(r)=1'<1(1-A;‘,I (‘3‘) > (2.4)

for any tS X0”: max{ X,- :i=1,...,n} . In (2.3) w, is then 6,- /G(7}).
They show that [13, is unbiased, consistent and asymptotically normal and the variance

can be estimated by:

 

6.0.- waif, dNC<u>
V ,u V2
"-5-: 6m 3157—,“ (y u)- (yu)}

83

1 " 6,-y,[T-Zu]

I

n§(u) i=1 G(T-i

I

where V(y,u) =

Lin (2000)4 considers a regression setup )2,- = z, ,6 + 8,- , where z, is a 1x p vector

of covariates. He estimates ,8 using a weighted least-squares method and proposes two

different estimators, for the cases of minimal cost data and interval cost data. When no

I: II
covariates are present, his estimator reduces to (Z w,- y, )/(2 w, ) , where the weights
i=1 i=1

are w, = 6," / G(Y?) for all i =1,...,n . For minimal cost data these weights are estimated
by vi», = 6: / G(T,m ), where G is an appropriate estimator of G and the mean estimated

cost is

, " 6Ty.(T.“) " 6’
ﬂ =( J—A‘I‘TI—IK '7—'.—) (2.5)
"3 2i G(T.) EGO.)

For the interval cost data consider a model
yig = ziﬂg ‘I' gig (2-6)
for each of the G intervals, where y,,, denotes the cost incurred over the time interval
[a,,_l ,ag ), ,6, , g =1,...,G are pxl vectors of unknown regression parameters, and the
error terms 8,, are assumed to be independent among different subjects but allowed to be

correlated within the same subject. The initial cost is supposed to be null. Two different
censoring types might arise:

A. Time censoring: 6; =[7T S U, ] = 0 if T," > U,, and

B. Cost censoring: 6,; =IT}; SU,]=0 if T,; >U,.

84

t
where 7”,, =7} Aug“.

By summing both sides of (2.6) over g we obtain y, = 2,6 + u, , where

l

G . G G
y, =Zyig ’ ﬂ=zﬂg ’ "i :zuig‘
gzl 8:] =

Lin’s estimator is then

(2.7)

 

       

A G n igyig)
#L4 = 5211920

i=1 G(l; ;)I
We compare (2.3) and (2.5). If T < r as. then X,- = X: and 6,- = 6: . Assuming no

ties in the data set and G derived as in (2.4), we have

:w, =li6,/G(X,)=—Z6, /G(T,)=1-S(T,,,)) (2.8)

i=1 i=1 "i=1

alt-r

where T,,,, is the largest uncensored observation. To justify (2.8) we

define5(t)=]'[(1- y(())), whereN(u)= 2m, <14, 6,. =1], Y(t)= Z[X, 2:] from
MS! I=I i=1

 

where we deduce/330) = _§(,-) Ag?)

 

. Also if no ties among failure and censoring

 

times, S(t-)G(t—)=n"Y(t). Ifnot ties among failure times . = A9,“) .
nG(t-—) nG(t-)

= S(t—) AN“) 2 -AS(I) , and the equality (2.8) follows. If, in addition,

nG(t—) Y(t)

 

 

therefore

5(T,,, ) = 0 then (2.3) and (2.5) coincide.

85

2.2 Applying general transition model methods

to the 2-state model

We will prove that our transition model described in Chapter 1 also captures costs
under the simpler two state survival model with a single transition time and sojourn. In
this case, as described in Section 2.1, several investigators have developed techniques for
regression analysis of medical costs with the focus being on estimation in the presence of
time censoring that might result in incomplete costs data on some patients. Our transition
model can be therefore viewed as an extension of this methodology to multiple transition
times and sojoums. We still use the same inverse-probability of censoring-weighted
(IPCW) technique to derive consistent and asymptotically normal estimators of
regression parameters and for the net present values.

Let us specialize our multiple transition model methods from Chapter 1 to a two
state model with patients starting in state ‘0’ (alive) and followed until they reach a
terminal state ‘1’ (death) at time T. The total cost for a patient i can be interpreted as a
sojourn cost that ends at time T or 2' whichever occurs ﬁrst or as a transition cost at time

T if the patient dies in the interval of time [0,2']. While in the ﬁrst case we will be able to

estimate E [ y(T‘ )] , in the second case we estimate E [ y(T)[T S 7]). Throughout this
chapter we assume that both S and G, the survival distributions for time and censoring

time are continuous, unless otherwise speciﬁed. Suppose z,- is a baseline covariate vector,

containing time-constant factors such as age at entry, gender, baseline comorbidity. The

covariate vector 2,- will be used for the estimation of weights w,- = 5,- / G(T, — I z,- ). We

86

also introduce the vector x,- which typically contains time-constant factors, including 2,,

variables for modeling time such as - .2 and interactions between time and time-

I’ I
constant variables. Let Z0 be a ﬁxed proﬁle covariate. Since we only have two states, we
will denote the net present value NPV,,(I) (t I Z0) as NPV“) (t I Z0) and

NPV“) (Z0) = NPV,,(,1)(rI Z0 ). Throughout the rest of this chapter we assumeithe

discount rate r = 0.

2.2.1 Mean cost in the minimal cost data case

NPV for transition costs

Since the only transition is O —> 1 , the NPV for transition costs at time t, is
NPV“’(t|ZO)= Igcm(5lZ0)POO(O,5IZ0)a0,(5IZO)d5, (2.9)
where col (5 I Z0) = E(Cm (5) I X(5—) = O,Z0) and
P00(OJIZoI'-'P(7i >IIZoI=SUI20I~
The quantity C 0, (t) was described in Chapter 1, and represents the amount

incurred just after time t if the patient dies at time t. Note that we use now

50 I = P (T > t) consistent with our notations in Chapter 1. Since —dS (t I Z0 ) =

= S(t— I Z0 )(201 (t I ZO ) , (2.9) reduces to

NPV“)(ZO)=—Ec0,(s|Z0)dS(t|Z0) (2.10)

87

In the two-state model a cost y,- = y(T,) is incurred in the transition from the
initial state “0” at time 1 =0 to state “1” at time 7}, provided that 7} S r. The probability
of still being in the initial state just prior to time t is P[T 2 t] = S (t-) , and the
(conditional) probability of transition to state “1” in (1,1 +dt) is dA(t) = a(t)dt , where
a(t)is the hazard function. This differs slightly from the set up described in section 2.1,
where costs are incurred through time T A 2' (and not at time T). Naturally, while in Lin’s
set—up (Section 2.1) the censoring indicator for cost is 6: = [7? S U ,- ] , in our set-up y,- is
observed provided 5, =1 wheres, = [7} SU, A 2'], that is, if the transition time 7} occurs
by time 7 and is not censored by time U,. Assuming U, is independent of ( y,, 7})
conditional on 2,- we get
P(Si =1I yriTiaZ.)=P(Tr sUi ATI Yi’TI’zi)=[7i ST]G(T}—Iz,).

In a regression set-up, let y, be the transition costs in the ith patient at the
transition time 7}. Consider now a regression model

y, =x,,B+£, (2.11)
where x,- is a 1x p vector of covariates as described at the beginning of the section. Our

model (2.11) differs from Lin’s model (2000)4 by the fact that we include in (2.11) time-

varying covariates. This is critical as we shall see later. We assume:
A1: E(e,|T,,z,)=0,l§(e,2|'r,,z,)=o,i2 (2.12)
A2: rank E(xgx, ) = p (2.13)
Since E(s, Iz,,y,-,x,-)=P[U, 27},T, SrIz,,y,,x,-]=P[U, 2T,- Iz,]

providedT} S 2', we deﬁne weights w,- = 5,- / G(T, — I z,) where G is the survival

88

distribution of the censoring time U, . We assume that G(z' I 2,) > 0 with probability 1, to
ensure w, > 0 whenever 5,=1. Hence E(w, |z,,y,,x,-)=[1} S 2']. Now

( -, w. , x.) = Via—214% . - X- )2 , and minimizing the objective function
q yI I I 6‘ I yl I

II
qu(y,- , w, , x,) with respect to I3 yields the estimator
n i=1

,6, = (Zw,x,'x, )'1 (Zw,x,'-y,) (2.14)
i=1 i=1

We estimate 001 (t | Z0) by x0 (t) 6W, where 110 (t) denotes the covariate proﬁle at

time t in this model containing t,t2 and Z0 , and (2.10) by

NPV‘”(Z.>=6:. [atom—$012.».

Remark 2.2.1.1

1. If covariates depending on time are not included in the regression model (2.11),
(2.14) yields the same estimator as Lin (2000)4, except for a slight difference in weights
as noted above.

2. If no covariates are present, the regression model (2.11) has only one intercept

I!
(and p=l ). If all costs are observed before r , this intercept is estimated by 57 = n"1 2 y, .

1:]

From (2.14) we get ,6”, = (2w, )’I (Z w,y,) and so

i=1 i=1

6.,—f’—>(1—S(r»"‘6(y.iT.- 5 til.

89

NPV“) ——’;—) E(y, [1} S 2]) . Hence if T S 2' as. the natural estimator of NPV”) is
simply y. In practice the costs of some subjects would be censored. Then it is natural to

estimate (2.10) by
ﬁwr =-I:5m(t)d§(t) (2.15)

where S (t) is the Kaplan-Meier estimator of S (I) based on the data
{ (X, ,6, ): i =1,...,n } and 601(2) is an appropriate estimator of c0, (1) which we will

awn
Y(t)

dN (5)i

define later. Note that dS(t)=-S(t-)dA(t)= -5(t— ) sthe

 

 

and A(t)= Jim

usual Nelson—Aalen estimator of the integrated hazard function (Andersen et al (1993)”).

Since costs are realized only at times 7} we estimate 601(1) at time t: T, by 37(7} ), the

average costs observed at 7,. Substitution in (2.15) gives

=26.y(T.-) —Y(—S,,',))1T. <rl (2.16)

Some simpliﬁcation is possible if we assume there are no ties in the data set
between event times and censoring times as well as among event times. Let G denote the

Kaplan-Meier estimator of the G based on the data{ (X, ,6, ) : i =1,...,n} , where 6: =1— 6,-

as defined in (2.4). Since the processes N and N 6 do not have jumps in common,
5(1-)G(t-) = n’1Y(t). Also if there are no ties among the event times 37(T, ) = y, (7})

the observed cost in the ith subject. Hence (2.16) reduces to

* ,='1n5%ﬂﬂlrg 217
Pm. n 2.007;. 71 < )

90

(T)

Observe that (2.17) can be also written as ,[lwn- - "n"1 26, g-T—T' ) [T S 2'] since does not
I: I 1

jumps at those times T, for which 6, =1, so . i = .
G(T-) G(-

I

.Relation (2.17) produces

.:i .Sh

-)

the same estimator as described by Bang and Tsiatis (2000)“. If there are ties among the

event times and If < <1; S 2' are the distinct event times observed in [0,2 I, then
70; ) = 5;, the mean of the observed costs at time I}. This simpliﬁes further the

. . ..1 .,J " :1:- . . . . :1-
aforement1oned estlmator to 12 Z ),d, /G(t, ) where d, is the multrplrcrty of 2,.
jr;Sr

If some event and censoring times are tied in the data set in (0, r),

S(t—)G(t-) will not equal n"lY(t), see Van der Vaart (1998)“. In fact

_AN_(_u) AN __(__u) AN__°__(u))= -1
50— )Ga— )= Ho y,——)H(1-—)——,,———,)) >[I(1——Y——,,) Y0)

u<I

where N 0 = N + N C which means that [1W7 1 > ﬁwrz- Since our concern is with S (t) and

G(t) for I E [0, 2') we will show (under some assumptions) that

sup I n'lY(t) - S(t-)G(t-) I—> 0 in probability.

I<T

Proof:

By the Glivenko-Cantelli theorem for iid random variables,

sup I n’lY(t) - H(t—) I——> 0 in probability, provided H(2'-) > 0, where H(t) = S(t)G(t).

I<1'

Under this same condition the Kaplan—Meier estimators S and G are uniformly

91

consistent on [0,1']. Hence the assertion follows from the inequality

|n'1Y(t)—§(t-)G(t-)|S ln'1Y(t)—H(t—)| +|§(t-—)—S(t—)|+|(§(t—)—G(t-)|.u

Remark 2.2.1.2

We estimate col (3 | Z0) from a weighted regression model using weights
w, = s, / G(Y} — | z, ). There are several avenues for estimating the survival distribution for

censoring time, G, we summarize some ideas in the following table:

Table 2.1 Estimating the survival distribution for censoring time

Model Comments

 

1- Nonparametric Use Ci , the Kaplan-Meier estimator based on the data
{(X, ,6,):1’: l,...,n} , where6, =1—6, . Then the weights can be
estimated as
w, = s, /é(T,) or w, = s,§(r, —)/nY(T, ).
2. Semi-parametric Estimate G(T, — | z,) from a Cox proportional hazards model
3. Parametric Assume G has a parametric form G(t,6) = P[U > t | 0] . We

assume that the functional form of G is known except for an

unknown q-dimensional parameter 0.

 

In model 1 choosing to estimate G(t) = P[U > t] by G” (t) for all t S 2' , where (T is the

Kaplan—Meier estimator from the data {( X f ,1 — 6,." ) : i = 1,..., n} will not produce changes

92

in the formula for the weights since G” (t) = (3(2) for all t S 2'. We will come back to the

problem of estimating the weights later on in this chapter.

NPV for sojourn costs

The NPV for sojourn costs in state O=’Alive’ is
NPV‘2)(ZO)= £b0(t|ZO)P00(O,t—|Z0)dt.

The quantity bO (I | Z0) is the expected mean rate of expenditure at time I while

sojourning in state O=’Alive’. In practice it will not be observable unless discrete
information is available. Instead we will only know the total cost of the sojourn. Since a

sojourn ends at a transition time or at 2' , the total cost is observed if censoring has not

occurred before time T" = min(T,2) . The set-up now is the same as described in Chapter
2.1 in the introduction and background section. The observed cost y = y(T‘) is
uncensored if, and only if 6' = [T‘ S U] = l , and the event occurring at Tis observed if,

and only if 6 = 1. In this case the weights are w: = 6: / G(Y,‘ — | 2,) and assuming that the
censoring time is conditionally independent of survival time and cost we get
E(W: ”ivyz'vzi)=1-

The NPV of interest is
51200 | Z0)S(t— | 20m: = ES(t-|ZO)dm(t | 20),
where m(t | Z0) = L: b0 (u | Z0 )du . By an integration-by-parts we get
NPV(Z0) = E (m(T‘ ) | 20) . From a standard Cox model we obtain an estimator

.§'(t|Zo)ofS(t|ZO).

93

Consider a regression model of the observed costs y, on time 7,. , noting that
only observations for which 6,. =1 will be used. In addition to the observed (fixed time)

covariates 2, our model would include terms for modeling T,~~ , for example

We use a model
y.- =X.-.6+£,~ (2.18)

analogous to (2.11) and the same scheme to obtain the estimator

,6,, = (ZwSxfx, )'1 (z wfx,y,) of )6 as in (2.14). All of this is exactly the same except
i=1 i=1

for the new weights w: . Now let x0 (1) denote the covariate proﬁle at time tin this

model and derive the estimator of m(r | 20) as 62(2 | Z0) = ,6; £20 (u)du , where the dot

denotes differentiation with respect to time. This gives our NPV estimator as

N13V(zo) = 3;, L'sa— | Z0)xo(t)dt.

Remark 2.2.1.3

Suppose now that we include neither time or covariates in the regression model
(2.18). The net present value is 'Cb(t)S(t-)dt = ES(t—)dm(t) where Sis the survival

distribution of T, b(t) is the expected mean rate of expenditure at time I while sojourning

in state 0 and m(t) = £b(s)ds. By an integration-by-parts we get

ES(t—)dm(t) = j;m(t)dS(t) + S(2—)m(2'). Here we have assumed that m(0) = 0, if

94

this is not the case an initial cost should be added to the final computation of NPV. The

estimation of m(t) , t< 2 and m(2) should be done differently in the following cases:

a) patient dies before 2, i.e. T < 2 and

b) patient does not die and is not censored before 2, i.e. T 2 2', U 2 2 .

Our goal is to estimate E(y(T* )) = E(y(T)[T < 2]) + E(y(2)[T 2 2]). When
viewed as a transition model we are only able to estimate the first part. If we look at the
problem as a sojourn model we are able to estimate E ( y(T’ )) . Indeed,

E(y(2“ )) = E( j: E(y(2‘)|2‘ = r)(—dS'(r))) since 2‘ s 2, where s‘ is the survival

distribution for T‘. On the set I S 2, note that {I S T‘ } = {t S T} and S(t—) = S‘(t—).
Also S‘ (t) = P[T~~ > t] = O for any 22. rand S‘ has a discontinuity point at2'.

Let us denote m‘(:) = E(y(2‘)|2* =t). Then

E(y<T*)>= ﬁm‘ox—dSYr»: E’m‘(r)(-dS<t))+m‘(r)<S’<r-)—S‘<r»

= ﬁ'm‘ux—dso»+S(r-)m"(r)
Also m"(z)= E(y(2“)|2‘ =r)=E(y(T)|T=r)=m(t) for all t<2 and

m"<r>=E(y(r>|T*=r>=E<ymIT2rr

Therefore
E<y(T‘» = ﬁ‘m<z)(—dsm>+S(r—)E<y(r> I T 2 T) (2.19)

Naturally for the estimation of the first term of the expression we use the

weighted mean of all observations with X ,7 < 2 and 6,." =1 . For the estimation of the

second term we use all observations with {X f = 2,6: =1} = {X , 2 2'}. If there are no

95

observations such that X ,* = 2,6: =1 then the estimation of the second term is not

possible. Assuming that this is not the case here, we estimate the expression (2.19) as

E(y<T")> =(1—s‘(r—>>/3'.. +S‘(r—>E(y(r)|22 r) (2.20)
Z W?”
where S (t) is the Kaplan- Meier estimate of S, 6,, = “___—Xi” and
w,
{i:X, <2}

Z W32.-

{i:,X=

2w."

“:X, =2}

E ( y(2) | T 2 2): where w: = 6: / CART: -) and 6 denote the Kaplan-Meier

estimator of the G based on the data {(X,,6,) :i =1,...,n} .

However assuming no ties among event and censoring times that are strictly smaller

than 2, and no ties among event times strictly smaller than 2 ,

* 6" A It A at A
Z w.= Z . '. = Z n<S(T.- —)—S(T,-»=n(1-S(r—)).
{i:X,'<2} {i:X,‘<2}G(7i ) {1:X,'<2}

 

 

 

Z wf= A;<r)>_ MAN?) ”gm,
{i:X,.=2} T 1_ T G __
( Y(2 ) —) (T )

and 2w: =n(l—S(2-))+nS(2—)=n. (2.21)

Expression (2.20) reduces to

A

E(y(T‘» =(1—§(r—»ﬂ.. + Sir—woe? I T 2 r) =

96

2 W13? 2 W;Yi

 

: (1 —§(T-)) {i:X, <2} ‘ + $(T—) {i:X, =2} .. :
Z “’2‘ Z “G
{i:X,°<2} {i:X,'=2}
Z Wiyi Z w,y,- Z w;y, Z wi‘yi‘
. {i2X-'<2} ~ {12,221} {iSn} {iSn}
=(1-S(r-))-—=-.——-+S(r-) U = = . -
n(1-S(2—)) nS(2—) n 2 w,

{iSn}

Therefore if times and covariates are not included in (2.18) we can estimate the net

2 W:yi

present value by Jig—"L:— which is exactly (2.5), i.e. Lin’s estimator (2OOO)4.

Wt

{iSn}

2.2.2 Mean cost in the interval cost data case

Suppose that the interval [0, 2] is divided into G intervals with
0 = a0 < a1 < < a0 = 2 . Let T, U denote the survival time and censoring time
respectively. Let S (t) = P(T > t) be the survival function for T. Patients are followed

through time 2 , where 2 indicates the end of the study, i.e. all observation is ceased at 2 .

Consider costs incurred in interval I g = [a ag ). If death or censoring preceded

g-l ’
ag_l then the cost in the gth interval is either 0 or unknown. We observe a non-zero cost

in the gth interval for the ith patient in two possible situations:

(1) If the patient is observed throughout I3 = [a 0,, ) , then

8-1 ’

X, = min(Y, ,U, ) 2 ag. Let y,,, be the associated cost at time ag.

97

(2) If the patient dies in I, =[a 08), then a < X, <ag and

g-l ' g—l '-

l I

6- = [7, S U- ] = 1. In this case the cost y,,, can be regard as a transition cost at
time X ,- .
We will regard (I) as a sojourn cost and (2) as a transition cost. Let
6,; =[X, 2a,], 6,: =[a,,_1 S X, <ag ,6, =1] be the censoring indicators fOr cases (1)
and (2) respectively. For the interval I g , the sojourn cost is observed if and only if
6,: =1. If 6,; = 0 , but ~6,2,! =1 then the transition cost y,,, is observed.

Using similar methods to Section 2.2.1, taking together all sojourn costs y,,, , when

II
2‘2; yig

i=1

___—n .
a
Z52
i=1

6,: =1, the average sojourn cost in the gth interval is estimated by 130 (a, ) =

Therefore the NPV for sojourn costs is

L _ ,a )50(I)S(t)dt =Eo(a, )s‘(a,) (2.22)

With transition cost in interval 1,, , we have P(6,‘; = 1H,) =[ag_1 S 7} < as ]G(T,- ).

Takin to ether all transition costs when 6‘1 =1, the avera e transition cost in the th
g g 1g 8 8

 

 

22 6d 2! 6d
interval is estimated 5013 =(Zyg, .. '8 )/(Z ,. '3 ).Therefore the NPV for
i=1 G(X,) i=1 G(X,)
transition costs is
L )EO,(s)dS(s)=Eo,g (S(ag_,)—S(ag )) (2.23)
8'1 ,a,

98

Therefore for each interval 18 = [ag_, ,ag ). g =1,...,G , using (2.22) and (2.23),

we have estimated the mean cost given by

gylg

26:18
——S(a >+[( y. ’ —>/(
,=, 25,; Z 800:) Else?)

 

)](5(a,_ ,>— S(a W2. 24)

Throughout we have assumed that both S and G are continuous and
S(t) = P(T > t) , G(t) = P(U > 2). If this was not the case here, then keeping all previous
notations we must use left-hand limits S (t—) and G(t—) throughout. Alternatively we
could deﬁne S(t) = P(T 2 t) and G(t) = P(U 2 t) , i.e. left continuous versions of the
previously deﬁned (right-continuous) survival distributions. When S and G are
continuous this distinction is unnecessary, however when it comes to estimation, and S

and G denote Kaplan-Meier estimates we need to make the distinction. If there is an
initial cost at time 0 then it needs to be added to the final mean cost expression, however

for now, as previously specified, we assume initial cost to be null. To ensure all weights
are deﬁned properly we assume G( 2) > 0.
Using general properties of the Kaplan-Meier estimator S (t) of S (t) = P(T > t),

AN(r)

AS (I) = -S (t—) Y , and under the assumptions of no ties among failure times we
I

 

. —6AT-
have AS (T, ) = $57-64. If we assume there are no ties among failure and censoring

time then S (t—)G(r—) = Y (t)/ n. Both assumptions hold under the case that both S and G

are continuous. Then

99

 

 

Z . 6:; zitag.l s2,<a,.6,. ll=n§":6,S(7;-)[ag ,_7; <a,)_
.=l G(Xi -) i=1 G(T.—> .-. m1)
""'_:I(S(T ") S(T,))lag_1—T <a ]=n(S(a,...)——§(a,—))(2.25)
Also
:55; =4:[X‘ Zag]=Y(ag)=nS(ag—)G(ag—) ' (2.26)

Using (2.25) and (2.26) in the two terms of (2.24),

 

i 2,53%, G 21:6 igyt's G n A
(-‘——= )S(ag -)= ( .. " . )5‘g(a -)= y.- /nG(a -).
g=l :6}; gz=l "S(ag ")0 (ﬁg ") 321;” g g g
i=1
G n (id
-——)/( S S - _
2:,K2‘,y .-g—G——,X _) NEE—(47% _ ———)1< (ag-1—-) (a ))—

allows simplification of the mean cost:

 

 

G n l 1.,-601 G n 6i I n 6i I
ﬂ=2{%Z(G(x (Styx i'gyg )}_ Z{_’1:2_A__gy')g }=Z{ {12 8ng

g=l i=1 ' X—)+ G(ag _) g=l i=1 G(Xig g= -l n i=1 G(T‘g _'})
where 6,, =65; +6; =[X, 2a,]+[a,_l SX, <a,,,6,. =1],
X}, =min(X,- ,a,)=mln(X,T‘ ,,,a ) and 2,; the same as in (2 7).

This is the same as Lin’s estimator (2000)4 (2.7) for the interval case data. Indeed note

that

100

6,8=[7,AagSU,]=[X,Zag]+[X,<ag,6,=1]=
=[X,Zag]+[ag_ lSX, <08 ,6, =1]+[X,<ag_ ,, 6,=1]=

=6,,, +[X,<ag_ ,, 6,=1].

Also under the no ties assumptions above and using the last equality in (2.21), with the

.. 6.“ ,. T: SU.
role of T, played byT ,g, Z-.—-§—=Z[—Ai—_—]=n forall g=1,...,G.
I=I G(Xig -) I=I G(ng —)

Therefore, the mean cost is

 

 

15 " 5,; 10 n [X,<dg_1,§, I] _
§§G(xlg)y'g n§§G(X;)y1g+ng; G(X.;) lg

 

 

1- yr A ‘71——
g=l (=1 G(Xig) g g=l i=1 G(Xig) i=1 G(X,,,)

:HIH

which is exactly (2.7).

2.3 Asymptotic normality of the mean cost

We are interested in calculating the variance of the estimate
5601(IIZ0)(-dS(t |Zo)) of the mean for transition costs, Eco, (t | Z0 )(—dS(t | Z0 )).

There are various possibilities for estimating S (t | Z0 ). In Chapter 1, we have already

developed estimation methods, proved asymptotic properties and calculated the

asymptotic variance of the estimators of E601 (t | Z0 )(-dS (t | Z0 )) when S (t | 20) is

101

estimated from a Cox proportional hazards model and c0, (2 | Z0) is estimated from a

regression model.
If the assumption of proportional hazards is not satisfied we may consider

estimating the survival function for time using a stratiﬁed Kaplan-Meier estimate. We
will ignore covariates in the estimation of S (t) . Let S (t) be the Kaplan Meier estimate

of S (I) and suppose E(s) = 801(5IZ0) is estimated from the regression model (2.11).

Then 12(20) = — Eé(s)dS (t) . For ease of notations we assume that 7, S 2 for all patients

in the data set. Then s, = [7, S U, A 2] = [7, S U, ] = 6,. This assumption is implicit in
Bang and Tstiatis (2000)“.
Consider the regression model (2.11):
V. = xi ,6 + 5i
where x, is a 1x p vector of covariates with ﬁrst component 1 so that the ﬁrst component
of )6 corresponds to an intercept. Under model assumptions A1 and A2, see (2.12),

(2.13), a consistent estimator for ,6 is given in (2.14)

ﬂw=(ZW,-X,’X,-) (Zia-X XIX)

 

where w, =

).Therefore ,6,.—,B=(Zw,x, x,) (:1wa 8,)

Condition A1 can be weakened to A’l: E(x,8, ) = O, E(8,2 I x, ) = 03. In fact

under A'l and A2, the IPW estimator is always consistent. Indeed

21 def
1: w,x;x, —P—-) E(xfx,), since E(w,-XIII, ) = E(E(w, Ix, )xfx, ) = E(Xixi) = A , and

102

f

lZWrxl€r—-—>0 since 5091' IX,,y,.T,-)= P(U,- >T— Ix,,y,,T,)/G(T -)=1 and under

1, E(w,x,78, ) = E(E(w,x;8, Ix, , y, )) = E(x,78,E(w, Ix, , y, )) = E(x,’-8, ) = 0.

We need A1 and A2 for the consistency of the (unweighted) estimator

= (Z CZXEX, )'1 (Z 6,x,'-y,- ). We have
i=1

i=1

sz —=/2’) (—Zw.- ,x,)"<,;Zw.x,e.>—(A"+o (1))(—‘/1;ZW;X,’E;)

1- 1

By the ordinary CLT, —l—ZW,X,£, —>N(0, B), where B: E(w,2 X,X,~€,2 ). Using A1,
n .

the form of B can be reduced further. Since E (w,2 Ix,, y,,7,)=
P(U >7}|x,,y,,T-)/G2(T,- -)=l/G(T, —), itfollowsthatB= E(w,.2 x;,,xe )=

E(E(w,2 Ix, , y, ,2, )xgx,e,?) = E((l/G(T, -))x;x,e,?) = E((l/G(T, —))x,7x,E(8,2 Ix, )) =

xx!

0'2E(——-— ). Therefore
G(T- -)

J; ([31. —ﬂ)—1—>N(0.A"BA" ).

whereA= E(x;,=x)andB o§E(——— xx ——.)
G(T, —)

Then 5(z) = x0 (03,, and for a fixed I, JZ(5(:) —c(t)) = x0 (ms/R3,, - 13)) . It

follows that:

\/—(8(t)— c(t))= x0(z)A"(J—l_-zw,x;e,)+op (1) (2.27)

Wewilldenote v0(t)= x0(z)E(x"‘x) so J—(é(t)— c—(I))=v0(t)(—‘/1;Zw,x,78,)+op (1)

103

We now turn to the estimation of the survival function S (I) = P(T > I). Let

§ (t) denote the Kaplan-Meier estimator of based on the data { (X , , 5, ) : i =1,..., n }. Note

that d§(t) : —§(t_)d/i(t) : “SA(I—)d)1/V:t)

 

,where N(t)=Z:__l[7} 52,5, =1],

J(s)dN(s)
Y(s)

is the usual Nelson-Aalen estimator of the

Y(t)=Z;l[X, 2t]and [19):];

integrated hazard function (Andersen et al (1993)”), where J (s) = [Y (s) > 0]. We will
assume both S(t) and G(t) = P(U > t) are continuous and 5(7) > 0.

Notations:

7Z’(I) = P(X 21) = H(t) = S(t)G(t)

 

0'2 (I) = £610!) du

7r(u)

Model Assumptions (Andersen et al (1993), pl9025):

ABGKl: For each 56 [0,1], nEi—ES-i-OIUMs—LJZU) as n-—)oo
s

 

 

ABGKZ: For all 8>0, nﬁiiﬁﬂf“? |> €]a(s)ds———>O as n—)oo

ABGKS: ﬁE(1-J(s))a(s)ds—P)0 as n—>°°

Under conditions ABGK1-3, following steps in theorems IV.1.1,

IV.1.2(Andersen et al (1993), p190”) and our proofs in Chapter 1, Section 1.3,

dM(s)_ 1 ”_ingi“)

JZ(A(z)-A(t)) is asymptotically eqmva'emm J1;£_ 75(3) z(s) ’

where M(t)=iM,(r), M,(t)=N,-(t)— £14(t)dA(s).Therefore
i=1

104

Then JZ(A(:)— Am) is

 

 

EdM,(s) = 6,[X, St]__J~(:Y,-(s)d:‘1(s).Let c720): I'd/1(3).
”(s) ”(X,) 72(s) 0m )

asymptotically equivalent to

6,.[X, <t] Y,(s)dA(s) 1 6,[X, _t]_
fZ—-— f,———= -Z<-'——-

02(XiAt».
11,1 H(X) z(s) n ﬁx.)

Using S (t) = exp(—A(t)) and the delta method we conclude that J;(-S (t) — S (t))

is asymptotically equivalent to —S (t)(\/;(;i(t) — A(t))). Therefore

ﬂsm— amp—SHE‘S" [X‘Jl-azwmm (2.28)

J; -, 7m)

(1..

where we use notation a,, = b” to imply asymptotic equivalence. Using (2.27) and (2.28)

we obtain

/ n l
...5(:)—— [Wm—2 6"” -————=-<'—’] *0’2(X,-At)}

[Jam-Sn» J; =1W“

. n (2.29)
Memo—c010» v0(,)_\/1;wa

 

 

\ 1

Under the assumptions Al-A2 and ABGKl-3 we have, by the ordinary CLT:

[Jam-5(0) ]

—D—+ N(O , (t)).
Memo—emu» 2 22”

where the elements of the asymptotic variance are

02(0520) o
22x2(t): O V0(I)UZE(C;X

 

)
git))vo(

105

The components of 2 2x2 (t) are derived as follows. The first diagonal component is

6‘ X _t
S2 (t)Var r{-‘—[—(X—)]— 02(X,At)}.Using the independence of U, and T,,

6. X5 6, X, _
E(’—[,,—‘—t—])=E(E (_[___t]|];))=
II‘(X,-) 7! (X)

62(2)): SID = E(—1—'—[T' 5’1):02(')’

S(Ti)7r(T.-) '

 

=E(
§i[Xi -1102 “I: —I]0_2(

0,(X /\'I))=E(E(

T _.
7r,(X) ”(Xi) At)| ))-

E(-—-——

[T—t] 2 __ 2
=E(——— S(T,) 0' (T, t))- £a(u)0' (u)du,and

E(o4(X, At» = E(o“(X, )[X, si])+o“(t)P(X, >t)=

= £a4(u)(-d77(u))+0'4(t)7r(t)= £ﬂ(u)202(u)d02(u)=2 £02(u)a(u)du

The ﬁrst diagonal component is:

_ _ §i[XiSI]_ 2 _ 2 (S‘i[XiSt]_ 2 2 _
Z“(t)-—Var( S(t) ”(Xi) 0' (X,/\I)))—S (t)E(( ”(Xi) a (X,Ai)) )_

6,[X, St]_26,[X,_ <t] 02

4
”z(Xi) ”,x,, UMX At)+0(X,-At)) (2.30)

=52(t)E(

Therefore Z110) = S 2 (1)0'2 (t) . This result can be also be derived using the martingale

 

representation of Jn(/i(t)—A(z)). Also 2,2(t)=E((%-:%S—)ﬂ- 0' (X, Ai))w,x;e,)=
6‘,[X,St] 2
= —- XiA —
E(E(( ”(Xi) 0'( t))G(;,)x x,’€,|x,, y,))_

106

[T,- St]

= E((l—T-‘i—S—t-lno'2 (7} AI))X;€,- ) = E(E((———0'2 (T- At))X;€, Ix, )) =
”(Tn MT.) '
[7; St] 2 I . .

= E((——(—77)——0' (7} AI))x,-E(€, Ix, )) = 0 Since by assumption A1, E(E, Ix,- ) = 0 . The
7" t

second diagonal component is 222 (t) = Var(vO (t)w,x,'-£, ) = 12,, (t)0’£2E(ﬂ‘i—)v0 (t),.

G(T. -)

Next we use Funtional Delta Method and a similar version of Lemma 2 in Chapter
1 to prove the asymptotic normality of mean cost.

Consider the functional

cozE" —>R. ¢<x.y)(:)= ﬁx<s>dy<s).

where E‘ is a subset of D[0,'r]2 ,E‘ ={(x,y)6 D[O,z']2 : Eldyls C}, where 0< C<oo.

Let (x0, yo) be a fixed point of E' such that £|d(x0)|<oo . Then w, can be extended
to the space D[0, 1']2 so as to be Hadamard differentiable at (x0 , yo ) , with derivative

d¢<x,y>(h.k)<r>= ﬁx<s>dk<s>+ [,htswyts) (2.31)

where the integral with respect to k is deﬁned by the integration by parts formula if k is

not of finite variation.

Identifying x(t)with C(t) , and y(t) with S(t) , since c(t),S(t)e D[O,z'],
El dS I: l— S (2') < C =1 and under the extra assumption

EAl: C(t) is of ﬁnite variation on [0,1],

by (2.31) and functional delta method stated in the Appendix,

107

JZl (aux-£0» (roux—(15(0)):
=,/;(¢(5,—§)(r)—¢(c,-S)(r))——->D d¢(ct_S)(ZI’ZZ)(T)
,{x

where z, (t) = a0 (t)§, a3 (t) = v0 (magma—£79120 (0'. v0 (t)=x0 (r)E(X’X)“.

22 (t) = —S(I)U(t), U(t) is a gaussian martingale independent of i, which is a standard

normal random variable. Also E(U (t)) = O and E(U2 (t)) = 0'2 (t).

Using (2.31) we get

ﬁiﬁé(1)('d§(1))— £c(t)(-d5(t))} —D—> EC(S)d22(S)+ Ezmswsm.

To compute Var( [O'c(s)dz,(s))=E(( jO’c(s)dz,(s))2 )=

:E(( Ec(s)d(—S(5)U(s))2) we just simplify the integrand

ﬁc<s>U<s>S<s>dA<s>+ [gc<s)(-5<s»dU(s)= [It/(smash Ectsx—Stsww).
where X(t) = 1, rc(s)5(s)dA(s ). By an integration by parts for the second term

Ec(s)d(-S(S)U(s)) = —j0'(c(s)5(s)-Z(s))dt/(s) .Since U(t) is a zero-mean
gaussian martingale, with < U (t) >2 0'2 (t), we get immediately

Var( ch(s)d22 (5)) = E(c(s)S(s) - fe(u)S(u)dA(u))2d0'2 (s) .The second

term j: Zl (s)dS (s) has variance (an (s)dS (5))2 and the cross product has mean zero,

since U (I) and f are independent. Hence we have proved the next theorem:

108

Theorem 1

Under the assumptions Al-A2, EA] and ABGK1-3 we have

s/Zt ﬂew—£0»— Io'cttx—dsa») A

N(0.{ E(cts)3(s)—K(s»2daz(s»)+< [O'aotsmsmf no (2.32)

Remark 2.3.1

1. Suppose costs are not considered random, but fixed. Take C(t) = 1.

Them/R Eétz>(-d§(t)>— Ec(t)(—dS(t))) = —~/Z(§(t)—S<t». The asymptotic

variance in the above theorem has only the ﬁrst term,

z(t) = fc(t)5(t)dA(t) = S(T)—S(t) and so,

E(c(s)S(s) — Ei(s))2alo2 (s))) = 52 (no2 (r) . This verifies the well known result,
s/Ztstri—Stt» —P—> N(0.52(r)02(r».

2. Let cost be fixed and set C(t) = e’" . Then Ee'" (—dS(t)) = £e'”S(t)dA(t)

is the actuarial value discussed in Andersen et al (1993), Page 28425. Then (2.32) captures

the results stated on page 284, namely

v/Z(£e"‘s‘(t)dri(t)— £6"’S(t)dA(t)) L)

N(O,{ ﬁ(e"‘S(s)- fe"“5(u)dA(u))ZdoZ(s)}).

109

3. If no fixed or time-varying covariates are included then the cost C(t) is

" 5.-
2 G(T.) yi'
constant on t and can be estimated as 6(t) = )60 = 1:1" ' for all IS 2'. Then
i=1 G(Ti)

,1? = 30(1— S (7)) and from (2.32) the estimated asymptotic variance

is 8025‘? (7)62 (r)+(l-S(T))2 var(,30 ).

2.3.1 Regression model for log-cost

Suppose all costs in our data set are not null and consider the regression model

with log cost as dependent variable:
log( y, ) = x, ﬂ+ E, (2.33)

where we keep all our previous notations. In this section, we consider the case of an error
term with unknown distribution under homoscedasticity. More precisely we assume:

A3: 8, are i.i.d. with 8,- : mm (the function s(-) allows for
heteroscedasticity conditional on x and ‘a' is a vector of unknown parameters)

A4: v, is independent of x, and has zero mean and unit variance.

A5: rank E(xfx, ) = p

A6: E(egx) <00

110

Our focus is the estimation of C(t) = c010 120) as in the previous section. In view
of (2.33) we identify C(t) with E(y, Ixo (t)) = e"°(')‘6E(e£ ) , where x0 (I) is a fixed
covariate profile generated from 20. Following Ai and Norton (2000)2°estimation of
C(t) is accomplished by estimating )6 in (2.33) and the smearing factor a = E (es ) .

Define p(t,6,a)=e‘°"’ﬂa and identify c(t) with (0(I,,B,a). Then E(t)=¢(t,B,6)

and £1 = 12:1?" and ,3 is an estimator of ,6 which will be obtained later. Here
n '—

é, =log(y,)—x,ﬂ. Then JZ(5(t)-c(t))=v/Z(e"o“’36-e*°<”ﬂa).
By Taylor expansion, retaining only the first order terms, we get:
Mam-cm)=s/Z(¢(t,ﬂ‘.a)—¢(t.ﬂ.a)) ~

a(o(t ﬂ 0) am: I3 a)“,
=f<——-— as (B— ﬂ)+——— a -a))

=~/i1-(C(I)Xo (t)(3-ﬂ)+e"°"’ﬁ(a-a»=

=c(t)x0(t)JZ(B—,6)+e‘°"’/’JZ(6-a). (2.34)
Consider first the estimation of ,6 in (2.33). We follow exactly the same steps as

in the previous section 2.3 using 77, = log( y,) instead of y, . Then

A n n (S
,5 = (Z w,x;x, )'1 (Z w,x,777, ), where w, = CUT—f Therefore

s/Ztﬂ- ﬂ>= (— ,1,wa x;x.-) (TZW,x;e,-) (2.35)

From A5, 1: w,x;x, —P> E(wx’x) = E(x'x) = A. From A3/A4, E(w,x,£, ) =
n .

lll

= E(E(w,x;e, Ix, , y, )) = E(xge,E(w, Ix, , y, )) = E(xfe, ) =—_ E(xgv, ,/s(x, ,6)) =

= E(xf,/s(x, ,a)E(v, Ix, )) = O.

Hence71_-Zw,x,8,— — —0p (1) . Then (2.35) yields
n

JZ(,8— 6): A (— jzzaz,x;e,)+op (1) - (2.36)

@151

Next, consider the estimation of a byc’i . Then

ﬁ(&-a)=71—;2:,(e -e )+J—2:;, (8 E(e ))
Use

ef" -e£' = e51 (6.3-5, - 1) = e2 (exp((77, - x,,5’) — (7],- —x,,B)) —1) = e" exp(—x,- (8—6) —1)

to obtain 712'2:=1(eéi —-e£l’ )=_(;1,'Z;,€E’Xi )J;(,3—,B)+op (1). By A6 we get
fZLWE‘ “65" )=-E(e‘x)s/r_2(B -ﬂ)+o,, (1) (2.37)
n

Using formulas (2.34), (2.36) and (2.37) we obtain

Mam-cm)=c<t>xo(tis/Ztﬂ—ﬂ>+e"°"WZ(a-a) =

=c(t)xo(t)(A"(——l—ZW,-x,-8,-)+0p (1))+
JZH
+exoit)ﬂ(_E(eex)(A—t(_jjzw,x,£,Hf: (e E(e ))+o,, (1)))+o,, (1)):

=(c(t)xo(t)-e"o‘”ﬂE(e‘x))A {-‘/l_—:w,x,7,:/l;—£}+ex°('m{ 2(e -E(e‘t)))+op(1).

112

If we denote 29,(t)=(c(t)x0(t)—e"°(')ﬂE(e€x))A-l and 15*2(:)=e"°"“3 then

JZ(5(t)—c(:))=29,(:){— wae +192(t){%2(e£‘-E(e£‘))}+op(l).
n i=1

If s(x,a) = a (i.e. the error term does not depend on x , so the error term is

homoscedastic) as in Duan (1983)19 then following the same steps as in the previous

 

. 6i[Xl St] 2 I E. 5.
section 2,2 (t)=E[(————0 (X,- At)){l91(t)W,-X,-€,- +192 (t)(€' —E(e ‘ ))}]=
”(X,)
_ [ZSt]_ 2 I
_E[( ”(m a (T, At))§,(t)X,-E(8, |x,)]+
+E[([ i—t] 0207 A1))19(l)(€£‘“E(€E‘))l=0

since by A3/A4, E(E, Ix,- ) = E(v, J; | x,- ) = E(v, s/E) = O and
E(eg‘ —E(e£" )lx,)= E(ev’fé |x,)—E(e€" )= E(ev’ﬁ)—E(e€‘ )=O.
The second diagonal component of szz(t) is

222 (t) = 29, (t)Var(w,-x,7£,- )l9,’(t) + 1922 (t)Var(e£“ )+ 2E[§, (t)w,x;€,l92 (I)(e€‘ - E(eE‘ D] =

x'x
=a§z9,(t)E(G ‘T'_

i

 

))z9,’(z)+af, :9; (t)+ 219, (:)E[x;e, (e5: —E(e€" ))1192 (2)

wherea — -Var(e' ). Obviously the ﬁrst diagonal component of 22 2(t) remains

unchanged and the proof is now identical to the proof in the previous section.

113

2.4 Parametric estimation of the survival distribution

for censoring time
2.4.1 Minimal cost data

Suppose costs can potentially be accrued over a ﬁxed time period [0, T] with

expenditure terminating at some event time T so that complete cost observation occurs if
a patient is followed through time T' = rnin(T,!) . Further accumulation of medical cost
is not possible since there is no cost after death. Censoring at time U might preclude
observation of T’ . Let X = min(T,U ) ,5 = [T S U], where [A] denoted the indicator
function of the displayed event A. The observed cost y is uncensored if, and only if

6* =1, where 6’ = [T‘ S U], and the event occurring at Tis observed if, and only if
5 =1 .

Suppose that we have a random sample of size n. The variables corresponding to
the ith subject are indexed by the subscript i. Let {(X: ,6: ) : l S i S n} denote the random
sample with X,‘ = min(T,‘,U, ).

Consider a linear model for the cost y,- observed at T: in the ith subject given by

y,- = x,,B + 6‘, where x,- is a vector of pxl covariates, E,- is an unobserved error term. In

estimation of ,6 we use the data {( y, ,x, ),i = l,...,n} . However since cost will be

114

incomplete ifé’,’ = 0, we use the weighted sample {(y, ,x,- ,w, ),i =1,...,n}

*

where w,- = i Formally ,8 is obtained by minimizing (with respect to ,6 ).

‘ .

 

1'

min 2 w,- ( y,- 4,3)2 (2.38)

i=1

The objective function (2.38) weights each observation by the inverse-probability

n n
of being uncensored. This gives ,6,, = (Z wixgx, )‘1 (Z w,xfy, ). We also assume
i=l

i=1
censoring to be non-informative in the sense that given fixed time covariates, U is

independent of cost and event times. Note that under this assumption E ( w, Ix, ,T, ) =
P(d,‘ =1|x,,T,)/G(T,"—) =1, where G(t) = P(U >2).
The weights w, are unspecified since G is unknown, however in order to be able

to use A, we need to have a suitable estimator for G. So far we have suggested its
estimation from a Cox proportional hazards model or non-parametric model. Assume that
the distribution, G, of the censoring time U, has a parametric form. We assume that the
functional form of G is known except for an unknown q—dimensional parametera Then
P[é“ =1| y,T]= G(T‘ —,6), where G(t,6) = P[U > t | 6] . Let g(.,6) andfbe the
density functions of U and Trespectively, and let S (t) = P(T > t) be the survival

function of the event time T.
The estimation of 6 will be accomplished via maximum likelihood. In order to
write down the appropriate likelihood, we must be precise about what is actually

observed.

115

1. Observation scheme I.

We only observe X = min(T,U) andé‘ = [T S U]. This is the usual random
censorship model. The contribution of a single observation, (X , ,6,.) to the likelihood is
then

((6) = {G(T. ,6)f(T,-)}“" {S(U.)g<U.- 3))”:-
Then keeping only terms involving 6 , the relevant part of the log-likelihood for

estimation of (9 is then

114-(6P6, (,_,,w

(2.39)
G(Ttﬂ) g(U.-.0)

n
Then (9 can be estimated as a solution to 2w, (6) = 0.

i=1

2. Observation scheme [1.

Suppose observation does not go past time 1' (a fixed time). If T is observed, then
necessarily T S T and, of course T S U . If on the other hand U is observed, we must
have just the opposite: U S 2' and T > U . It is also possible that neither Tnor U are
observed, in which case T >7 and U > r. The likelihood would now consist of three

parts:
(1) {G(T)f(T)}§[T<r]
(2) {S(U)g(U)}(l-§)[U<r]

(3) (G(r)5(r)}”->-"“Z”
The contribution of a single observation to the likelihood is then

I.- (6) = {G(T, .6)f(T,- )}‘Z‘T‘q‘mw, )g(U,. ,6)}(1'6‘Wiql{G(r,6)S(z')}[TZ"U2”

116

The relevant part of the log-likelihood for estimation of 6is now

11/(6) = [T < T]5W+[U < r](l—§)W+[T 2 r,U 2 HM
G(T,6) g(U,6) G(r,6)
(2.40)
Combining the first and third terms,
[T < r]6———VBG(T’6) +[T 2 r U 2 2']——————VQG(T’6) =
G(T,6) G(z',6)
z—VﬁGﬂT ’6){[T<z'][T* <U]+[T22'][U .>_T‘]=6‘———V90(,T ’6).
G(T ,6) G(T ,6)

Therefore (2.40) can be re-written using (X * ,6‘) as:

t

. V,,c;(r“,e)+(1 6 )V9g(U,6)

6 =6
W ) G(T ,6) g(U,6)

(2.41)

We will focus on estimation of 6under the observation scheme H.

Estimation of 9
Suppose (9 is a compact set in R" and assume that .60 is the unique solution of
the problem
gag Q0 (6) = 123g E(l(6)) (2.42)

Under general conditions, the sample analogue of the expression (2.42),

Q" (6) = 121,. (6) which we denote by 6, is consistent and asymptotically normal , see

i=1
Newey, McFadden (1994), Theorems 2.5 & 3.362 and Wooldridge (2002, 2003)” 3“.

Hereafter 60 is the true underlined parameter and all convergences are under 60.

117

Based upon (2.41), an estimator 6 of 60 is obtained as a solution to

Z w, (6) = 0. Now expand 2w,- (6) at the true parameter 60 to get
0 = 2w,- (6‘?) {an (60H ngw, (6")(é—90) (2.43)

where6 lies between 6 and 60 and Val/,- denotes the qxq matrix of the derivative (with

respect to m of 1/1, (6). Then under standard assumptions

n“ 2%”,- (6)——P——>E[VHI/I, (6)]. From (2.43) we get

Jae—90) = J" (60 )(—‘/!-_-Zy/,(6O ))+0p(1) (2.44)
'1 i

where J (60 ) = —E(V01//, (6O )). To verify that (2.44) would give the asymptotic
distribution of 6 we will now check that E (1,11, (6)) = 0. Examine each term in (2.41):

VQG(T‘,6)+ 1_6.)V9g(U,6)

E 6 =EJ‘
[ti/()1 [ T,6) g(U,6)

]=

——V"G(T ’6) [T A 2' s U]] + E[——V”g(U’6)

=5
[ G(T',6) g(U,6)

[TAT>U]]

VgG(T*,6)

G(T ,6)

The first term is E[ [TATSU]]=

———V"G(T’0) [T s 211T s U]]+ Ely—999%TS Tl” 5 U“:

= E
[ G(T,6) G(T,6)

= E[VgG(T,6)[T s T]]+VgG(2',6)S(T) =

= —V,, EG(u,6)dS(u)+VoG(2',6)S(z'). (2.45)

The second term is

118

““1

v g(U,6) v g(U,6)
E———9 T U =E——9 =
[ g(U,6) [ xr> ]] [ g(U.6) S(U)[T>U]]
= 47,, ES(u)dG(u,6). (2.46)

Combining (2.45) and (2.46) we will get
[z(t)/(19)} = 47,, £G(u,6)dS(u)+V6G(r,6)S(r)-V6 ES(u)dG(u,6) =
= —V9 G(u.6)$(u)|; +VgG(r,6)S(z') = 0.
An application of the Central Limit Theorem to (2.44) gives
5(6—60)L>N(0.J“‘ (1510 )).
where J" (6,,)E(1y(60)1//’(60))J‘l (60) = J" (60 )J(60 )J‘1 (190) = J“1 (190 ). We can
replace 60 by the consistent estimator6, and use [9, = P[U, Z 7? 1T1] = G(T,‘,6) for the

unknown p,- = P[U, 2 7? |7}] . For this adjusted estimator we have

A n -l n
_ A I A I
ﬂwp— Zwixixi Z 1'"th
i=1 i=1

which is a two-step estimator, first step would be estimating6 , and then estimating ,6 in

(2.38) after we have replaced w,- = (5': /P[U,- 2 7}!“ IT,] by 1?»,- = 6: /G(T,’,6). In other
n

words the two-step estimator solves the problem min 2 1'43,- (y, — x,- ﬂ)2 .
i=1

Therefore

:,L—i tame] (2.47)
n

119

Following closely Chapter 12.4 of Wooldridge (2001)“, we expand the second

term on the right hand side of the equation above to get

6" X8,

” 5,- " 5
T;¢1f11_2_,___x§,£ _[nz____’_‘_€__v,,G(T (9’)]J—(é- 90)

P.- n P1 1— 1(G(T 19))2

 

where 6 is between 6 and 60. Again use the standard arguments to claim

 

 

* ~

" 5,7'x’e, pE[ 15,-3,78,.
i=1 (G(T; 96))

(VGT,,6 , VGT,',6 ’=
2 19 ( ))—-> (G(T1190))2( 9( on]

x'e,
G(T i 1 60)
Denote this limit by D(60 ), a KXq matrix.

Combining these results we have the established the expansion

 

" 5 5, _
_Lz ixigﬂ;_Z{——XE—D(6O)J1(60)1//,-(00)l+0p(1)

 

 

 

 

51:1 Pi n 1=1P1
Let k, = 6' x,€,= 5' X'E' .nThe
G(T,- 190 ) P1
5* V G(T, 6 11,78,-
G(T},6O) G(T},60) V,G(z' 60) G(r, 60)

Taking the expectation yields,

 

I V G(T,6) I VQG(7960) I
E - 6 k- =E T-<2'-‘9——i—°—x-£.+T-Zt———x,,.
(W;( O) 1) [[1 ] G(TI,60) (1 [1 ] G(T’Ho)
V T’,6 ,
Observe that is the same as D’(60)=E ( 66“ 0))x,€, .
G(T; ’60)

120

Then the asymptotic variance Vof Jr—z(,6wp — ,6) is

v= (E(x1x1 )" )E(k.- _D(,90,,-1 (611w, (190))(11, -D(60)J" (60w,- (60))' (E(xIx. 1")

=(E(x:x.)")1E(k.k;)+D(6o)J“(6015011,- (90)l//1'(90))J-l (60)D’(61,)(E(x:x.)“)
-D(60)J”' (6011501.-(60)k:)-E(k1w,-’(6o))r‘ (6010160)}.

Notice that E( 111,. (60 )(1/,.’(60 )) = 1(6)) which makes the second term above

D(6,,)J“l (60)D’(60).
So the variance V is

v: (E(x;x.- 1“ >1E(k.-k:)—D<6o)r' (60)D’(6o)}(E(x:x,- )“ ).

Under the assumption E(E, |x,-)=0, D’(90)=E (Vqu, ’60 )) x;
G(T} .90)

 

£,-]=0 and

V=(E(x,7x,-)“1)E(k,-k;)(E(Xf!1I,-)_l ). In general the asymptotic variance of Jr;(,5’w,, -,B)

is smaller when we use an estimate 6 for 60 in the expression of 18111,; as in (2.47).

2.4.2 Interval cost data

Suppose that the interval [0,1) is divided into G intervals [0,T)=U[a,,_, ,6,) as
g

in Lin (2000), and Willan er al (2002, 2003)“ with 0 = a0 < a, < ...< ac = 2' . For the

interval cost data ﬁrst define 7“,; = 7,- A a and 5,; = [T,-; S U , ] , also denote

8

121

T,’ = 7,2; = T, A “a = 7, A 2'. For the ith patient we denote y,,, the cost in gth interval at
time 7,; . We will have two different censoring types
A. Time censoring: 5: =[7,* SU,]=O if T: > U,, and
B. Cost censoring: 5,; =[7,; _<_U,]=O if 7",; >U,.
Consider the model
yig = xigﬂ+ “1'
for all g =1,...,G or y, = X,,6+u, where
Va “11 x,,
y, = - ,u. = - and X, = - .Here y,, u, are le and X, is GXp.
in “10 x10
The following are assumed to be true:
AGl E(x,’u, ) =0
AGZ rank E(x,’x, ) = p
The IPW-POLS estimator of ,6 (pxl vector) in the model above is given by
n A n
=(Z X,X,) ‘(XZ ,’.y,) (2.48)
[1: i=1
where y, =S,P,’ly,, and R,., f1, are similarly deﬁned. Here S, is the diagonal matrix
with elements 5,; = [7,; S U, ] , g =1,...,G in the main diagonal (will allow costs to be
zero in some intervals), and P- is the diagonal matrix with elements, , ’12,, in the main

1

diagonal, where p,,, = P[U, Z T,; IT, ]. Under the assumptions above we have

P[(si; zllxig’yig’ji]:P[Ui271;I71]:pig'

122

Assume the distribution of the censoring time U, has a parametric form,

P[U, > t] = p(t,6) , where the function p is known except for the unknown q-

dimensional parameter 6. Then p,,, = pa}; -,6). Now replace 6 by a consistent

estimator6, and use ﬁ,,, = p(T,; —,6) in (2.48) for the unknown p,,, . For this adjusted

estimator we have

3

G 57' . .
JZ(ﬂ.,,— ,6): 1—22— ‘8 11,, )_,,,,1J_ZZ_,,%,1_1_,,_) (2.49)
is

1': lg: l pig n i=1 g=l

Estimation of 6

The observable data on the censoring times is restricted to X ,i = rnin(T," ,U,) and
5," = [Ti S U, ]. Here 60 E 2' is the upper limit of observation. For a patient 1',
{X, = 1’}={T, 2 T,U, 2 r} . This event has probability S(r)p(r,6) , assuming both S and
p are continuous. Now consider our adopted panel framework in which the observed X ,1.

falls into some interval [a,_, ,a, ), g =1,...,G. The part of the likelihood of ( X," ,5,‘)

that is relevant for estimation of 6 has the form { p(X,‘ ,6)}‘5‘. {g (X ,1 ,6)}1’5‘. where

g(t,6) is a density for U, . We will assume that p(aG ,6) > O and that 6 -—> g(-,6)
fulﬁlls all regularity conditions needed for maximum likelihood estimation of 6.

Note that Xfe [a 6,,) and 5," =1 is equivalent to [U, >T T,,,][ag_ ,ST; <ag ]

g-l’

(77;):1, whereas X:E [a a, )and 5," =0 is equivalent to[ag SU, STE]

i818 g-l ’

=(1-6,‘,)1,,(U,)=1,whe1e I,(t)=[a <t<a,].'ro1nc1ude the interval t>aG,

g-l—

123

define the indicator 1,, (t) = [t 2 a0 ] . Then the derivative with respect to 6of the

aforementioned log-likelihood can be written

{51101, T)——g—+ +(1- 6,,,)1(U)—9—-'—-+
g2: 8 8' p jg, 8(Ui’6)

G
+1/GI, T U, V‘5")(T’}'9) d,,(6).
( )G.(,A )— W6) 1:;

A n G A
The estimator 6 is a solution to 22d,, (6) =0. Consistency of 6 follows from
1:1 g=l

the standard regularity conditions on the functions 6 —-) g(-,6) and 6 —) p(-,6) for

maximum likelihood estimation of 6. In the sequel we will call 60 the true parameter.

I: 0
Note that d,, (60) is a qxl vector. Using a Taylor expansion of 2 2d,, (6) at 60
i=1 g=1

yields

A n G
J;(6-60)=-(%22d., (6)) (71:22:41,160).

i=1 g=1 '=l g=l

where 6 is between 6 and the true 60. Also, 6,, (6) is the derivative of d,, (6) with

respect to 6. Note that 6,, (6) is a qu matrix. Now use the standard arguments to claim

G
d,d,,(6)—)E(Z ' ,(60)) in probability, and 71.;22d,,(60)=0p(1).
g=1 i=1 g=l

11 ENG

alu—

The last claim is simply a consequence of the central limit theorem.

G . G G
Now E(Zd,, (6O )) = —E((Zd,, (60 ))(Zd,, (60 ))) = 1(6O ). Hence we have
=l g=l g=l

124

n G
J'(6— 60): (1(6o))"(— $1.224,- ,(60))+0,, (1)

n G 5,,x,,u,,
We can expand the sum Z 2—7— to get
i=1g=l pig
L " Z5661 _
n i=1 g=l ﬁrg

n G 6‘1 x! u! I "

—ZZ-—— , , 8- 05221—267916 ,.6» 1616—60)

‘fginlgl pig rlgl(p(rg’6))

where 6 is between 6 and 60. Again use the standard arguments to claim

*

1 n G aigxig u, I
";Z(6(T1;,67)) V9 TE Em? 1,6082» V” , °

 

in probability. Denote this limit by D(6O ), a qu matrix. Note that D(60) is also the

ui I
same as E(Z—(V,p( T,,,6O)) ).
g= =T1P( ig’ 60

Combining these results we have the established the expansion

6‘

n G 6. . .
__1'_“g g=11"”22{m—D(60)J’1(9o)dig (60)}+0,, (1):

1g i=1 8=1 pig

0P)l:

”2 iZl{X,S,P,'lP,_lu, - 19(90)J-l (60)d1 (60)j6}+ 0P (1) ’

where d, (60) = [d,, (60 ),...,d,G (6O )] is a qu matrix, and jG is a 6x1 vector of 1’s.

Now apply the central limit theorem to show that it converges in distribution to a G-

variate normal, mean vector zero and variance matrix V.

125

G
The assumption AGl: E(Xi'ui ) = 0 means that E(Z x,-g uig ) = 0. If
g=l

E(xiguig ) = O or, more strongly, E(uig le-g ) = O for all g, assumption AGl holds.
Without additional assumptions on uig we can not conclude that 0(60) = 0. Suppose we

impose E(ul-g lxig,7})=0 for all g=l,...,G.Then

*

G aigxiguig * I G
E(Z (Vgp(7;gs60)))=ZE(uig lxig’Ti')
g=l

Xig(V6P(Ti; 6w :0
gamut; 60W

(P(T; ’90 ))2

 

 

Computing V

G 5‘ xigu,

Let k,- = X§SinlP,-"u,- = 24%. Then the aforementioned variance Vis

g=1p(7;g ’60)

I

v: E(k, —-D(60)J" (60m, (60)j0)(k,. —D(60)J" (60)d,.(60)j6) =
=E(k,.k; ) + 0(60 )J‘1 (60 )E(d,. (6O )jGj’Gdgwo ))J'1 (60 )D’(60 ) -

-D(60)J" (60)E(d,. (6,,)J'c,.it;)—E(It,.j’Gd;(60))J‘l (60)D’(60)
Notice that E(d, (60 )jGj’Gd; (6O )) = 1(60) which makes the second term above

D(6’O)J" (6O )D’(6O ).

G G 5.’ xf u.
Now consider E(d, (60 )ij§ ) = E((Zdig (60))(Z-ij-ﬁ». We will prove
g=l g=l P ig ’ 0

. . G xiguig " I
that this would reduce to D(60 ) = E(Z——T—(V9p(7}g ,60 )) ) so that Vcan be
8:] p jg $ 0

expressed as E(k,k;)— D(60 )J’1 (60 )D'(6o ), a similar formula to case 0:1.

126

Recall that

 

 

T-, . V .,6
d,,,(60)= 6,81,,(T,) V2“ ' 6’0)+(1—§,,,)I,,(U,-) 68(U' 0)+
V ,6
+(1/G)IG+1 (T, AU,)—6—p—(:'—Q,
p(TTHO)
25: 8x,gu,g
and let k,g —— .To prove the equality
p(T [8960)

G G 6,?‘x;u,- 6v (T16) ,
E((Zd,,(6o))(Z—i—Tg-—£-))=E(Z 9” g x,,u,,)

g=l g=1p ig ’60) g=l pT( ig ’ 60)

 

we ﬁrst prove that for any h =1,...,G we have
ihxihuih __)_ E(V6P(T ih,90) X,

E(( d, (6 >>
E g 0 1203.26,) 1602.60)

 

ihuih)'

We prove the equality above for h=1. If h>1 the proof is the same. So,

 

" (T, ,6) 6,.‘x’ u
A: E(( d,- (6 ))____i1xiluil ___): _E({ [g (T ) Vﬂp 0 i1 il })+
gz=l g 0 P(7i1690){2=1§; ”(igo) P(T 11,60)

V6g(Ui ’60) ﬁlxiluil
g(Ui’60) P(T [19 60)

 

+E(Z(1—6,; )1, (U,)
g=l

(2', 6 ) x u
+E( (l/G)I , (7}A U,) V91” 0 1 "1 "1 .
gZ=1 G 1p(2' 60) p(t-1.90)

 

Dropping orthogonal terms,

V T,6 x . V T6 x, u,
A: E({6i111(T) 9P(T 0)6111u11 +2 jg 8(1) 0P(T 0)6 ll 1 l }+
p( i160) p(nlrgo) gZZ p(T ’60) p(T [1960)

127

:1- V 8(U' ,6)6 I“Ix “-
+E 1’6 I U,- 0 i 0 i1 ll 11
«g .,),( ) g(U.-.6o) 41,3609”

V6P(T 60)61'1xiluil }).
p(T, 60) p(T,1,90)

 

 

E({Ui AUi 27' T]

The ﬁrst term of the sum above is

 

V p(T ,M) “X 11 P(T 9 )I
E(6IT, ‘9 ° ‘1‘“): E(iT<UiI(T.)—"—-—'——9—xiu u,)=
11( ) pT-( ,60) p(TmBo) 11920 60) 1 1
=E(11(T .) V9p(l’ 60) x’

pT( ) xi] “11)
1" 6'0
The second term is:

T-,6 x; u,-
E(Zé‘i lé‘iglg(T ) V9p( t 0) 1‘ 1 _
322p(71"60) P(Tuﬂo)

 

=E(Z[T;' SU,]18(7})VQP(TI’60) xiluil _
g22 1701490) P(aiago)

 

xfu. xfu.
=E( I (T,)V adj-,6 )——"—i—)=E(ir>T,-.>.a 1V p(T,-,6 )——"—"—).
3222 g 9 0 p(al,60) l a 0 P(artgo)

The third term is

 

E({Z5 ,1(1- 5, g)[g(U,)V68(Uivao) xﬂu,l

¥ }):
g>2 8(Ui’60) P(Eliao)

v . f .
=E({Z[ag _U, <73; ] 98(U"6°) x'lu'l
g.>_2 8(Ui’60) P(ahgo)

 

})=

V68(Ui ’60) xiiuil

=E([ sU,<T,*]
{a1 g(U,-,60) p(azﬂo)

 

The third term becomes

V68(U1"60) xiruii
8(U1690) p(al’go)

 

E(italsu,<T.-*1 })=

128

: x11 “11 —V xiluil .
E(w,]: (u 60)) pa=(1.60)) E([T >a,1(vgp(al,6o)- gp(T,- ,06» ”(1,609

The last term is

(1.6) w , ,5“.
-61‘79” ° ““ 1)=E<{1T,-2r1v (n6)—————"" 1)
p<r6o> p( .1. 61 ‘9” ° p(alﬂo)

 

E({CZ'1[T/\iU

Combining the above formulas we get

V0p(7;'9 60) x!

‘7‘, ll
1'] i1 i1
xilu i1 ) +

G
E((Zd, ,(6011
g= p

 

)=E(I (T)
(Tn-1.60) ' pa}. 60)

+E([T> T, .>_ a21Vap(T,,6o)—§i—‘ﬂ‘—)+
P(ahgo)

I

+E([T,* >a,](v,,p(a,,6o)— v6p(T,* BOD—M—H
Pa( 01900)

I

+E T,>TV 1,06 _x___,,u,,
(H ] 6P( ) NW)»

The second and fourth terms combined give

xi

E([T>T.- 2al1179p(T...60 )—""—‘—“——)+E<11T >r1vgp(r 601—— “5"“ 1):

 

 

pa( a19 00) 176001.90)
=E([T,-' 2 a. 1V6p(T,-*,6o)—""—‘i‘ﬁ—>.
p(alﬂo)
G .‘ f . V T6
Therefore E((Zd,g(6o))ﬂ‘;‘—"L)=E(1,(T,) 9“ " 0) x;,u u,,)+
g=l P(Thﬂo) P( 1" O)
)1:{ u- V ,6 x'
+E([7} 2“2]V6P(‘12v90)—'M—')=E( 9P(T1 0) x“ u“)

P012960) (119 60)

In conclusion V can be expressed as E(k,k,7)— D(QO)J'1(60)D’(60).

129

CHAPTER 3

ESTIMATING HOSPITAL COST FOR AMI

PATIENTS IN THE NIS 2000

3.1 Description of the data set and background

The Nationwide Inpatient Sample (N18)

The Nationwide Inpatient Sample (NIS) is the largest all-payer inpatient care
database that is publicly available in the United States, containing data from 5 to 8
million hospital stays from about 1000 hospitals sampled to approximate a 20-percent
stratiﬁed sample of US. community hospitals. It is part of the Healthcare Cost and
Utilization Project (HCUP), sponsored by the Agency for Healthcare Research and
Quality (AHRQ, formerly known as the Agency for Health Care Policy and Research).
Currently data is available for a 13-year time period, from 1988 to 2000, allowing
analysis of trends over time. Researchers and policymakers use the NIS data to identify,
track, and analyze national trends in health care utilization, access, charges, quality, and
outcomes.

NIS is the only national hospital database with charge information on all patients,

regardless of payer, including persons covered by Medicare, Medicaid, private insurance,

130

and the uninsured. NIS's large sample size enables analyses of rare conditions, such as
congenital anomalies, uncommon treatments, such as organ transplantation, and special
patient populations, such as children.

Inpatient stay records in the NIS include clinical and resource use information
typically available from discharge abstracts. Hospital and discharge weights are provided
for producing national estimates. The NIS can be linked to hospital-level data from the
American Hospital Association's Annual Survey of Hospitals and county-level data from
the Bureau of Health Professions' Area Resource File, except in those states that do not
allow the release of hospital identiﬁers. Beginning in 1998, the NIS differs from previous
NIS releases: some data elements were dropped, some were added, for some data
elements the coding was changed, and the sampling and weighting strategy was revised
to improve the representativeness of the data.

There is a growing literature on use of the NIS in health services research that we

4247‘ 64. For 2000, the NIS contains over 7.4 million discharges from

will use for guidance
28 states. Sixty strata are defined by a combination of region (Northeast, South, Midwest
and West), location (urban, rural), ownership'(public, private), teaching status, and bed
size (small, median, large). The ﬁrst stage samples approximately 20% of hospitals
within each stratum. Then all discharges from the sampled hospitals are included in the
database. Patient demographics in the NIS include: age at admission, gender and race.

Total charge and length of stay (LOS) are the main healthcare utilization variables
in the N18 for each hospital stay. Discharge status identifies whether it was routine, or

resulted in death, or the patient was disposed to other care facilities. The NIS database

contains a number of variables describing the hospital stay. Up to 15 diagnoses and 15

131

procedures are coded based on the International Classification of Diseases, 9‘h Revision,
Clinical Modiﬁcation (ICD-9-CM). They can be used to select samples of hospital
discharges for speciﬁc diagnoses and procedures. A broader categorization of ICD codes
may be used for this selection based on HCUP’s Clinical Classification Software (CCS),
or by diagnosis-related groups (DRG) that combine information on patient age, sex,
diagnoses and procedures accounting for relationships among them and to inpatient
resource use.

The main limitation of the NIS is its inability to track individual patients. Each
record in the NIS is a separate hospital discharge, and thus we cannot identify multiple
admissions by the same individual within a year (or across years). Also, the NIS does not
include all preoperative and postoperative treatments rendered in an outpatient setting,
although services in ambulatory surgery centers may be included. We will use the NIS to
examine length of stay and hospital charges associated with admissions for heart disease.
In particular we concentrate on treatments undergone for acute myocardial infarction

(AMI).

Coronary heart disease and treatment procedures

The heart is a muscle that works 24 hours a day. To perform well, it needs a
constant supply of oxygen and nutrients, which is delivered to the myocardium (heart
muscle tissue) by the blood through the coronary arteries. The blood flow to the heart can
be reduced by a process called atherosclerosis, in which plaques of fatty substances build
up inside the walls of blood vessels. The plaques attract blood components, which stick to

the inside surface of the vessel walls. Atherosclerosis can affect any blood vessels and

132

causes them to narrow their lumina and harden their walls. This process develops over
many years and can begin early, even in childhood.

Coronary heart disease (CHD) is the most common form of heart disease, the
leading cause of death for Americans. About 12.6 million Americans suffer from CHD,
which often results in a heart attack. About 1.1 million Americans suffer a heart attack
each year, and about 515,000 of these heart attacks are fatal65 that is about 2,600 every
day; one person every 33 seconds. It takes more lives than cancer-in fact, more than
cancer, accidents, and the next five leading causes of death in the United States
combined“.

In CHD, atherosclerosis affects the coronary arteries. The fatty buildup, or plaque,
can break open and lead to the formation of a blood clot. The clot covers the site of the
rupture, also reducing blood flow. Eventually, the clot becomes ﬁrm. The process of fatty
buildup, plaque rupture, and clot formation recurs, progressively narrowing the arteries.
Ever less blood reaches the heart muscle, and thus fewer quantities of oxygen and
nutrients reach the myocardium, leading to ischemia (oxygen starvation of the heart),
clinically translated into chest pain. Depending on the degree of pain, the level of
physical activity that pain occurs at, and the degree of coronary obstruction, coronary
artery diseases can be classiﬁed into asymptomatic or mild angina, angina class II-IV or
unstable angina, acute myocardial infarction, ischemia after Coronary Artery Bypass
Graft (CABG)67.

Following previous analyses of charges and LOS of AMI patients in the MICH

48-50

study , we will focus on patients admitted in the hospital with AMI, a common high-

mortality condition whose outcomes are affected by the process of care. We only focus

133

on patients that underwent as their primary procedure Coronary Artery Bypass Grafting
(CABG), Cardiac Catheterization (CATH) or Percutaneous Transluminal Coronary
Angioplasty (PT CA) or patients with AMI who underwent no procedures at all.

Cardiac catheterization

In cardiac catheterization (abbreviated “CATH"), a diagnostic procedure in which
a very small catheter (hollow tube) is advanced from a blood vessel in the groin through
the aorta into the heart. Once the catheter is in place, several diagnostic techniques may
be used. The tip of the catheter can be placed into various parts of the heart to measure
the pressure within the chambers. The catheter can be advanced into the coronary arteries
and a dye injected into the arteries (coronary angiography or arteriography). With the use
of ﬂuoroscopy (a special type of X-ray), the physician can tell where any blockages in the
coronary arteries are located as the dye moves through the arteries.

Percutaneous transluminal coronary angioplasty

Percutaneous transluminal coronary angioplasty, also known as PT CA, is an
established, effective therapy for some patients with coronary artery disease. PTCA is
used to dilate (widen) narrowed arteries. A doctor inserts and advances a catheter with a
deﬂated balloon at its tip into the narrowed part of an artery. Then the balloon is inﬂated,
compressing the plaque and enlarging the inner diameter of the blood vessel so blood can
ﬂow more easily. Then the balloon is deﬂated and the catheter removed. PT CA is a less
traumatic and less expensive alternative to bypass surgery for some patients with
coronary artery disease. In about 40 percent of patients who've had PTCA, the dilated
segment of the artery narrows again within six months after the procedure. They may

require either another PT CA or coronary artery bypass surgery“.

134

Improvements in technologies for PT CA include use of stents and more-recently
drug-eluting stents (DESs). A stent is a surgical stainless steel coil that is inserted into the
blocked artery via a catheter. The stent serves as a scaffold, supporting the artery walls,
and reducing the risk of the artery re—closing (restenosis) over time. Bare-metal stenting
after angioplasty has more durable effects than angioplasty alone, but researchers are still
seeking ways to reduce restenosis, which occurs about 20% of the time in the'clinical
setting. The most promising method to lower restenosis rates appears to be using drug-
eluting stents (DESs). Some researchers are hailing DESs as one of the greatest
interventional advances in the past decade. But others warn that while DESs appear to
offer beneﬁt over bare-metal stents, the hype may be outrunning the science. In his article
‘Drug-Elutin g Stents Show Promise’, Mitka (2004)69 presents the beneﬁcial effects of
DESs, but also talks about hospital concerns that reimbursements for such devices may
be too low, leading to ﬁnancial losses for these institutions. NIS database precedes the
widespread use of stents and therefore the analyses that we describe would refer to
PT CA.

Coronary artery bypass graft

Also known as “bypass surgery” or CABG, coronary artery bypass graft operation
uses a piece of vein taken from the leg, or of an artery taken from the chest or wrist. This
piece is attached to the heart artery above and below the narrowed area, thus making a
bypass around the blockage. Sometimes, more than one bypass is needed. Bypass surgery
may be needed due to various reasons, such as an angioplasty that did not sufﬁciently
widen the blood vessel, or blockages that cannot be reached by, or are too long or hard

for, angioplasty. In certain cases, bypass surgery may be preferred to angioplasty. For

135

instance, it may be used for persons who have both CHD and diabetes. A bypass also can
close after a period of time. This happens in about 10 percent of bypass surgeries, usually

after 10 or more years.

Variations in hospital costs/charges

As suppliers of the most expensive type of health service, hospitals have been
particularly vulnerable to the pressures of competition. The health economics literature
contains a number of recent studies on hospital costs. A common technique that has been
proven to be well-suited to examining the effects of provider institutions on patient costs
in a managed care environment is multilevel modeling. This framework is used to
analyze data that fall naturally into hierarchical structures consisting of multiple ‘micro’
units nested within ‘macro’ units. Multilevel models analyze variability arising at distinct
levels within data by extending the more traditional statistical techniques and introducing
a degree of realism often absent from single-level models such as multiple regressions. A
good description of why charges may differ between facilities is presented in Health
Core Data Report, 2000 (PHC 5320) from Department of Health and Family Services,
Division of Health Care Financing, Bureau of Health Information”.

New technology - The equipment facilities use to provide services differs in age,
sophistication, and utilization. Facilities with the latest technology may have higher
charges than those with older, less sophisticated equipment.

Staﬁ‘ing costs - Salary scales may differ regionally and are typically higher in
urban than rural areas. Furthermore, competition for nurses and other skilled personnel

may result in higher staffing costs and, therefore, higher charges.

136

 

Intensity of care - Facilities differ in the severity of illness of patients (i.e., some
facilities care for more severely ill patients than others). Patients within the same
diagnosis or procedure classiﬁcation may need very different levels of service and staff.

Eﬁ‘iciency of operation - Facilities vary in the utilization and efficiency of services
they provide. Infrequently used services may cost more per patient than services that are
used more frequently.

Diﬁ‘erences in coding - Facilities vary in their coding systems and personnel, and
in the number of billing codes they put on a billing form. The use of additional
appropriate codes may result in a patient being assigned to a diagnosis or procedure
classiﬁcation with greater reimbursement or may otherwise justify higher charges.
Facilities with better-trained personnel or more sophisticated coding software are more
likely to place these additional codes on their billing forms and, therefore, may have
higher charges than facilities with less expertise.

Discounts - Facilities negotiate and offer volume discounts to Health Maintenance
Organizations (HMOs), Preferred Provider Organizations (PPOs), and other large-volume
purchasers of health care services. The number of these organizations has grown
considerably in recent years. Full charges are paid for only a very small proportion of
patients.

Percentage of government pay — Government payers generally reimburse
facilities at rates below their full charges, similar to the discounts offered to commercial
payers. Therefore, facilities with a large percentage of patients whose charges are paid

either by government programs or discounted commercial payers may report large gaps

137

between what they bill and what they actually receive. This may result in higher charges,
including those for non-discounted patients.

Facility price structures - Some facilities spread the cost of services and
equipment over all patients. Others bill the full cost of a service to those patients actually
using the service. Furthermore, facilities may provide some services at a loss while
allowing other facility operations to subsidize the losses. Any of these practices can result
in signiﬁcantly different charges for a given diagnosis or procedure classiﬁcation.

Range of services provided - Facilities differ in the range of services they provide
to patients. Some may provide the full range of services required for diagnosis and
treatment during the stay. Others may stabilize patients and then transfer them to another
facility for more specialized or rehabilitative care.

Data-related issues - Facilities differ in the number of cases served, the case-mix
and illness severity of patients, and the comparability of patients within a given diagnosis
or procedure classiﬁcation. For example, a single case can greatly affect a facility’s
average charge if the facility reported only a few cases.

Capital expenses - Facilities differ in the amount of debt and depreciation they
must cover in their rate structure. A facility with a heavy debt load, a new building, or a
major renovation to amortize may have higher charges than a facility not facing such
expenses. Furthermore, facilities may choose to lease or purchase equipment or facilities.
The choices made about ﬁnancing of capital projects may affect charges in different
ways.

Rice et al (1997)“, Carey (2000, 2002) 52' 53 and Goldstein (2002)54 insist on the

usefulness of multilevel methods in studies where data on cost are collected over multiple

138

sites (hospitals in our data). This chapter takes a multilevel modeling approach to
estimate costs in patients hospitalized for AMI. Patients nested within hospitals form a
natural hierarchical structure suited to analysis that models each level simultaneously.
The existence of a non-zero intra-hospital correlation, resulting from the presence of
more than one residual term in the model, means that traditional estimation procedures
such as OLS, which is used for example in multiple regression, are inapplicable.
Application of OLS techniques leads in this case to incorrect inferences, although when
the intra-class correlations are small we can expect reasonably good agreement between
estimates from the multilevel and the simpler OLS approaches. One can also go one step

further and extend our model to include higher levels such as hospitals nested in states.

Correlates of total charges available in the data set
Possible correlates of total charges available in the data set are: procedure, gender,
age, number of procedures, hospital characteristics (location, teaching status, bed size,

region), length of stay (LOS), Charlson Comorbidity Index (CCI).

Procedure: Clinical Classifications Software (CCS), developed by the Agency
for Healthcare Research and Quality (AHRQ), is a tool for clustering patient diagnoses
and procedures into a manageable number of clinically meaningful categories. CCS is
used for grouping conditions and procedures without having to wade through thousands
of codes. This "clinical grouper" makes it easier to quickly understand patterns of
diagnoses and procedures so that health plans, policymakers, and researchers can analyze

costs, utilization, and outcomes associated with particular illnesses and procedures.

139

CCS collapses diagnosis and procedure codes from the International Classiﬁcation of
Diseases, 9th Revision, Clinical Modiﬁcation (ICD-9-CM), which contains over 12,000
diagnosis codes and 3,500 procedure codes. Without CCS, the large number of ICD-9-
CM codes poses difficulties in statistical analysis and reporting.

CCS consists of two related classification systems, single level and multi-level,
which are designed to meet different needs. The multi-level CCS groups single—level
CCS categories into broader body systems or condition categories (e. g., "Diseases of the
Circulatory System," "Mental Disorders," and "Injury"). Multi-level CCS is most useful
when evaluating larger aggregations of conditions and procedures or exploring them in
greater detail. Single-level CCS is most useful for ranking of diagnoses and procedures.
The single-level diagnosis CCS aggregates illnesses and conditions into 259 mutually
exclusive categories. We consider discharges that have either CABG (CCS=44), CATH

(CCS=47), PTCA (CCS=45) or no procedure (CCS=.) as a primary procedure.

Demographics Variables: We use gender and age of the patient. Patient age is
recorded as age in years at admission. Other demographic variables in the NIS are race

and household income by zip code, but they are not uniformly recorded for all states.

Number of procedures (NPR): NPR counts the number of ICD-9-CM
procedures coded on the discharge record. The principal procedure is included in this
count. A value of 0 means that the patient underwent no procedures on record, a value of
1 means that only the primary procedure is recorded, secondary procedures are left blank,

etc. A maximum of 15 procedures have been retained on a NIS inpatient record. States

140

that provide fewer than 15 procedures have had the procedure vector padded with blank
values. For example, if a state supplied 5 procedures, PR6 through PR15 are blank (" ")
on all records from that state. States that provide more than 15 procedures may have

information truncated. All states have provided at least 6 procedures.

CCI (Charlson Comorbidity Index“) is used to assess comorbidity. CO is a
weighted sum of the presence of 15 speciﬁed medical conditions at admission. There are
two ICD-9-CM adaptations, Deyo (1992)72 and Dartmouth-Manitoba, Romano et al
(1993)73 of the Charlson comorbidity index, as well as other various searches for
improved clinical comorbidity indices74'76. We use the Dartmouth-Manitoba version of
the index. The conditions and associated weights (shown in parentheses) are:

CHF = 'Congestive Heart Failure' (1)

PVD = 'Peripheral Vascular Disease' (1)
CVD = 'Cerebrovascular Disease' (1)

DEM = 'Dementia' (1)

COPD = 'Chronic Obstructive Pulmonary Disease' (1)
ULCD = rUlcer Disease' (1)

MLIVD = 'Mild Liver Disease' (1)

DIAB = 'Diabetes without Complication' (l)
HEPL = 'Hemiplegia' (2)

REND = 'Renal Disease' (2)

DIABCC = 'Diabetes with Complications' (2)

MALIG = 'Any Malignancy' (2)

141

SLIVD = 'Moderate or Severe Liver Disease' (3)
CANCER = 'Metastatic Solid Tumor' (6)

AIDS = 'AIDS’ (6)

NIS_STRATUM is a four-di git stratum identifier used to post-stratify hospitals
for the calculation of universe and frame weights. The hospital's census region,
ownership/control, location/teaching, and bed size were obtained from the American
Hospital Association (AHA) Annual Survey of Hospitals.

Region: States were grouped in 4 regions

Northeast (CT, MA, ME, NY, NJ, MA)

Midwest (IA, IL, KS, MO, WI)

South (FL, GA, KY, MD, NC, SC, TN, TX, VA, WV)
West (AZ, CA, CO, HI, OR, UT, WA)

Location] Teaching: A metropolitan statistical area is considered urban, and a
non-metro statistical area is rural. Teaching hospitals have an AMA-approved residency
program, are a member of the Council of Teaching Hospitals (COTH) or have a ratio of
full-time equivalent interns and residents to beds of .25 or higher. All hospital in the rural
area were classiﬁed as non-teaching.

Bedsize categorizes the number of short-term acute beds in a hospital into ‘small’,
‘medium’, ‘large’. A hospital's bed size category depends upon region, location and
teaching status. For example, urban teaching hospitals in the Northeast were classiﬁed as
having ‘large’ bed size if the number of hospital beds exceeds 425. In the western region

for the same location/teaching status, the ‘large’ bed size category is deﬁned if the count

142

exceeds 325. This categorization was created to have approximately 1/3 of hospitals in
each bed size category in a given region, location, teaching status combination.
Ownership/control includes categories for government nonfederal (public),
private not-for-profit (voluntary) and private investor-owned (proprietary). However
when the sample size was sufﬁciently large, hospitals were stratified as public
(Ownershipzl), voluntary (Ownership=2) and proprietary (Ownership=3). This
stratiﬁcation was used for southern rural, southern urban non-teaching, and western urban
non-teachin g hospitals. For smaller strata — the midwest and western rural hospitals —a
collapsed stratiﬁcation of public versus private was used, with the voluntary and
proprietary hospitals combined to form a single ‘private’ category (Ownership=4). For all
other combinations of region, location and teaching status, no stratiﬁcation based on
control was advisable given the number of hospitals in these cells (Ownership=0).
Although the CONTROL variable was used to deﬁne the strata, AI-IRQ collapsed some
categories. In consequence, because of the overlapping categories, this variable will not

be used further.

Total Charges

Total charges in NIS data are the amount the hospital charged or billed for the
entire hospital stay. They do not necessarily reﬂect reimbursements or costs. Charge data
were present for 98 percent of all discharges. Charges are generally higher than costs.
Generally, total charges (TOTCHG) do not include professional fees and non-covered
charges. If the source provides total charges with professional fees, then the professional

fees are removed from the charge during HCUP processing. In a small number of cases,

143

professional fees were not be removed from total charges because the data source did not
provide the necessary information. Two states, MA and W1, will be excluded from the
working data set (see exclusion criterion #8 in the next section), because they may have
included professional fees. Emergency department charges incurred prior to admission to

the hospital may be included in total charges.

3.2 Creating a working subset data set

From the NIS 2000 Core file of 7.4 million discharges we extracted all records
with a primary diagnosis of AMI. This is based on DXCCS 1:100 or ICD9-CM codes
410.xx and it yields n=157,263 discharges. All discharges fall within Major Diagnostic
Code MDC=5 (Circulatory System). There are 165 possible primary procedures for these
discharges. Table 3.1 shows the distribution of primary procedures among discharges,
excluding procedures that are present in <1% of discharges. We will consider only
121,264 discharges that have either no procedure, Coronary Artery Bypass Grafting
(CABG), diagnostic cardiac catheterization or coronary artheriography (CATH) or

Percutanerous Transluminal Coronary Angioplasty (PT CA) as their primary procedure.

144

 

Guided by several published analyses from the N IS, the following exclusion

criteria will be applied (all n’s are out of the 121264 patients):

1.

We exclude admissions for AMI that were not the first episode of care for a newly
diagnosed AMI (ICD9-CM code: 410.x1), or if the location of the infarction was
unspeciﬁed (410.x0)64 (n=983).

We exclude discharges in hospitals that performed S 5 cases since these cases are
most likely coding errors” 43' 47 (n: 256).

We exclude patients <18 years or >85 years at admission. The pathophysiology of
disease in patients <18 years is likely to be different than for adults (n=22). Also
patients >85 years are less likely to be treated aggressively than younger
patients” 47' 6“ (n=1 1,238).

Because of concerns that patients with diagnosis code ‘AMI’ may have included
‘rule out’ AMIs, all discharges of < 2 days are excluded (n=l6,682).

Patients with LOS >60 days are eliminated because they are probably long term
care patients 52 (n=57).

Newborn admission types are also excluded (n=7).

For hospital costs within the US health-care system one common approach is to
assign a cost to each hospitalization based on the basis of its associated
Diagnostic-Related Group”. In order to make sure that we only have patients with
AMI in the working data set, we exclude patients with the following DRG in
effect on discharge date (DRG=104, 105, 113, 119,124, 125,144, 145, 468,

477,478,479,483) (n=l,6l 1).

145

 

10.

11.

12.

Excluding states: States MA and W1 may have included physician fees in the total
hospital charges. Some hospitals in TX did not report total charges until July
2000. Because identiﬁcation of these hospitals was not available, we exclude all
discharges from TX, MA, WI (n=20,560).

One hospital in Arizona is also excluded because 44.3% of its total charges data is
coded C=inconsistent (either excessively low or high) (n=149).

One can not link information from the discharging and receiving hospitals.
Therefore, if we want to capture LOS and total hospital charge for AMI
admissions, then we must restrict to ED admissions and all routine referrals from
physicians, clinics, and HMOs (n=23,244).

We drop records missing an essential ﬁeld such as patient age, gender, length of
stay, total charges, DRG, admission source or discharge status (n=10,511).

We also eliminate patients who left the hospital against medical advice since we

do not know whether they received subsequent care (n=703).

The exclusion criteria above are not of course mutual exclusive, a discharge may

be excluded for more than one reason. After all exclusions we are left with n=58,469

discharges.

Censoring

The variable DISPUNIFORM indicates the discharge status assigned to each

record. We have already excluded missing, invalid records and patients who left the

hospital against medical advice. DISPUNIFORM=1 (routine) or DISPUNIFORM=20

146

 

(died in hospital) cover a complete hospital episode. In these two cases we will regard
TOTCHG and LOS as completely observed. Patients transferred to a short-term hospital.
transfers to skilled nursing facilities (SNF), intermediate care, home health care have
incomplete TOTCHG and LOS. They will be treated as censored. Of the 58,469
discharges in our working data set 18,753 (32%) of the discharges are censored. Figure
3.1 summarizes the process of creating our data set from the initial 157,263 discharges

for AMI.

Characteristics of the patients

Table 3.2 shows the characteristics of the patients. As already mentioned we only
consider discharges that have either CABG (CCS code: 44; N=7,369, 12.6%), CATH
(CCS code: 47; N=14,264, 24.4%), PTCA (CCS code: 45; N=17,901, 30.6%) or no
procedure (CCS code: N=18,935, 32.4%) as a primary procedure. There are N:
35,977 (61.5%) males and mean age is 65.72 (STD=12.7). Since the higher the number of
procedures, the higher the cost, we categorized patients in 3 possible groups: no
procedure (N=18,935, 32.4%), 1-4 procedures (16,489, 28.2%), 5 or more procedures
(23,045, 39.4%). We will form two comorbidity groups based on CCI score: 0 (no
comorbidities, N=23294, 39.8%) and 1 or more (at least one comorbidity). The biggest
percent of discharges in the data set comes from the South region (43.1%), followed by
Midwest (19.9%), Northeast (19.6%) and West (17.4%). The hospitals are categorized as:
rural (13.5%), urban/teaching (42.1%), urban/non-teaching. Hospitals are grouped as:

Small (8.9%), Medium (25.5%) or Large.

147

3.3 Estimation methods

We will follow methods similar to those described in Chapter 2. Total hospital
charge (TOTCHG) and length of stay (LOS) are two primary outcome variables. We
identify the average total charge over a speciﬁed duration 2' with the net present value.

Given a covariate profile 20 and ignoring discounting the average cost restricted to 2' is

,u(T) = — Eco, (s | ZO )dS (t | Z0) . We ﬁrst describe a multilevel model to estimate

c0 1 (t | Z0 ) , including a method for estimating weights. Then we will describe methods to

estimate the LOS distribution. Combining these two estimation steps we arrive at an

estimate of ,u( 2') .

ESTIMATING co, (1 | 20)

A traditional approach for estimating patient costs involves ordinary least squares
(OLS) models containing a response variable at the individual level and correlates at both
individual and higher levels of analysis. This disregards correlations structures in the data
emanating from common inﬂuences operating within groups. As argued in the previous
section, hospital characteristics, such as quality of service, managerial performance or
treatment patterns administered by physicians may impose distinct effects on the costs of
treating patientssz‘ 53.

The working data set consists of 58,469 discharges for AMI patients. Of these,

39,716 are uncensored for total charge and LOS. There are 617 hospitals with discharges

between 6 to 1181 per hospital. Because charge data is skewed we consider a log-

148

transformation to mitigate the effects of skewness of TOTCHG. Figure 3.2 shows the

histogram of log-transformed charges in all 58,469 discharges.

3.3.1 Unconditional means model

A one-way random effects model is ﬁtted first. We express the outcome,

Y,,- : log(TOTCHG) associated with the i'h discharge in the 1"” hospital, as a linear
combination of a grand error mean ﬂo , a series of deviations from the grand mean no,
and a random error 8,}:
Model 1: Y,,. = ,6,) +110]. + 5,].

where no, ~ iid N (0,030), 8,,- ~ iid N (0.03) and uoj , 6,,. are assumed independent.

We exclude censored observations from this model by using weights 5,- , where 5,-
is the censoring indicator. Note that since “0," and 8,,- are assumed independent, we have
Var(YU ) = Var(ﬂo + “o; + 8,,- ) = 030 + of , and for two different discharges i at i’ within

the same hospital Cov(Y,,. , Y,,, ) = of.
Using estimates of the variance components, the intra-class correlation

2
0' . . .
p = ———“—°—— is estimated to be 3954 = .533 .The most vanatron occurs at the

3,, +03 .3954+.3466

 

hospital level and the value of ,0 suggests that there is considerable clustering effect

within hospitals, invalidating the traditional OLS results.

149

3.3.2 Including effects of hospital level (level 2) correlates

The unconditional means model, Model 1, provides a baseline against which we
can compare more complex models. We include hospital characteristics which are level 2
variables: Region, Location, Bedsize. These categorical variables have more than 2
levels. For example, region has 4 categories (Northeast, Midwest, South, West). To keep
our notation simple we use one name for each categorical variable indicating the region,
location and bed size of each hospital. The model is:

Model 2: Y,,. = ,6,),- + 8,, and
,6,),- = :50 + ﬂOIRegionj + ,602 Location ,- + [303 Bedsize,- + “0}

where no, ~ iid N (0,030), 8,,- ~ iid N (0,0?) and uoj , 8,, are assumed independent.

Substituting the level 2 equation into the level 1 equation yields:

1’},- = [ 160 + ﬂmRegionj + ﬂozLocationj + ,603 Bedsize, ]+ [uoj + 8,, ].

The terms in the ﬁrst bracket represent the ﬁxed part, while the terms in the second
bracket represent the random part. The residual (error) variance within hospital 0'? ,
remains almost unchanged (going from .3466 to .3465). However the variance
component representing variance between hospitals, 0'30 , is much lower (going from

.3954 to .1682). Therefore the three variables, Region, Location, Bedsize explain a large
portion of the hospital-to-hospital variation in log total charges. About

.3954 -.1682
.3954

 

= .57 of the explainable variation in hospital log total charges is explained

by the three variables. We can estimate once again the intra-class correlation as

150

 

.1682
.l682+.3465

 

= .33 which remains high.

3.3.3 Including effects of discharge level (level 1) correlates

We illustrate the effect of including discharge level correlates by initially
examining a model with only level 1 correlates, excluding level 2 correlates. After
reviewing the necessary steps for including level 1 correlates, we ﬁt a combined model.
Suppose ﬁrst that all correlates: Procedure, LOS, Age, Female, CCI, NPR are ﬁxed
effects. Since NPR, the number of procedures, is in strict dependence with Procedure (for
example a patient who receives a procedure, CABG, PT CA or CATH will have NPRZ 1,
a patient who has no'procedures will have NPR=O), we use the cross-term
NPR*Procedure to assure a unique interpretability of the coefﬁcients in the model. Based
on residual analyses we also decide to include LOSSQ = LOS2 and Agesq=Age2 in the
model.

Then the model can be written as:
K
Model 3a: Kj:160j+zﬂkx(k)zj+€zj , ﬂojzﬂo+u0j
k=1

where uoj ~ iid N (0,030), 8,, ~ iid N (0,03) and uoj, 8,,- are assumed independent.
The covariates x,“ include LOS, LOSSQ, Age, Agesq all as continuous variables; and

indicator variables for procedure type (CABG, PTCA, CATH), Gender (female), CCI (0),

NPR (14), and two dummies for CABG*NPR (CABG, 1-4 procedures) and PTCA*NPR

151

(FT CA, 1-4 procedures). As expected these fixed effects variables at the discharge level
considerably reduce the within hospital error variance, of , going from .3466 in Model 1

to.0965.

Since LOS is a very strong correlate of the outcome and the variance of Y,,. varies

with LOS as seen in Figure 3.3, we consider having a random coefficient for LOS. This
will allow for the relationship between LOS and patient total charges to vary across

hospitals. The model can be written as:
K
M0d613b3 Y,,. =50}+511L0511+Zﬂtxtm+56 .
It=1
1501' :30 +u0j'ﬂ1jzﬂl +“11‘ .

j]~ lidN(02922x2)9 gij ~iid N(O,U§) and (uoj’ulj)’ 8,] massumed

“o
where
ul ,-

2
UuO 0.1401

2

] is chosen by
0.1401 0.141

independent. Our unstructured covariance matrix 2 =[

considering the AIC, BIC criteria. Model 3b becomes

K
I . a o a 0
Y,,- : ,60 + Z ,6,, x, k ),.j + “01 + u1 jLOSU + 8,, . Wrthin a hosprtal, the variance of Y,,- is
k=l

Var(Y,-,- ) = Var(uo ,. + u, ,.Los, + 5,, ) = 6,3,, + 20,0,Los, + of, L053. + 6,2. The estimated
covariance parameter estimates are 0'30 = .1958, of, = .00052, 030, = —.00449. Also

0',2 = .09352. We compare the two models above, Model 3a and Model 3b. When fitting

this models we ﬁnd that the change in -2LL when using REML method to estimate

parameters is ~728.6. An approximate test of the null hypothesis that this change is 0 is

152

obtained when we compare the difference in -2LL’s to a 2’2 distribution with 2 degrees

of freedom. The p-value is <.0001. Therefore we will keep random slopes for LOS.
Should LOSSQ, along with LOS, be included as random effects at the hospital

level? If we do so, there is an additional variance component that comes from the LOSSQ

qu

term. The error term is ul ,- . Figure 3.3 suggests that LOSSQ is not needed. Fitting a

ugl-

2
0140 0.1401 01102

general covariance matrix X = d,,,“ 03, 03,12 results in an estimate 0'32 =0, i.e.

2
0.1402 0.1112 0.142

the estimate is on the boundary of the parameter space. Another reason for this is seen in

Var(1’}j) = Var(uoj + uleOS,-j + uZJ-LOS,-,2 + 8,,- ) =
_ 2 + 2 2 2 4 2 2 3
— ,0 d,,,LOS,-,. + ouzLOSU + o, + 20,0,LOS,,- + ZoumLOSU + Zo'mLOSU

which has a 4th-order term in LOS. If 0'32 2 0 so are 0,02 and Gun.

3.3.4 Including both effects of discharge level (level 1)

and hospital level (level 2) correlates

Having considered models with either just 1 level correlates or level 2 correlates,
we can now consider models which contain variables of both types. Consider the

following model:

153

K
MOde‘ 4: Yij = 501‘ + 511L056 +Zﬂkx(k)ij +81,- .
k=1

where ’50]. = :60 + ,6,),Regionj + ,602 Location j + ,603 Bedsize, + uoj .

.31; = .51 +ﬂnRegionJ. +,B,2Locationj +ﬂ13Bedsizej +u1j.

u - 0'2 0'
Here [ 0’} iidN(02,22,2) where z: “0 “20‘ , 5,, ~ iid N(o,o§)and
"U 01101 0141

(“01’“11') , 8,, are assumed independent. Combining the two equations above yields the

single two level equation model:

3 3 K
Y1]: .60 +h2ﬂ0hzhj+hzﬂ1hzhj*LOSTj+l,ZﬂI:x(k)1j+(u0j+u1jLOSij +513)
=1 :1 =1

where 21 ,- = Region, , 22,- : Location,- , z3j = Bedsize}.

In addition to the cross-terms required by the model: Region*LOS,
Location*LOS, Bedsize*LOS, and NPR*Procedure, signiﬁcant cross-terms

procedure*LOS, Procedure*LOSSQ, NPR*LOS, CCI*LOS, CCI*LOSSQ are included

in the model. Within a hospital, the conditional variance of Y,,- is
Var(Y-- ) = 030 + 20,,01LOSU + 031L055? + of and the covariance between two different
patients: Cov(Y,-j ’Ylj ) = CoVar(u0,- + uleOSU + 80,140, + uU-LOSU + 8,,- ) =
= 3,, + of, Los,.,.Los,,. + 0,,01 (1.05,, + 1.05,, ) .
These elements form higher level block matrices V, represented as
V, = ZQQZ’+ Q1 , where Z contains the jth hospital values for the known design matrix of

explanatory variables for the random portion of the model, etc. The covariance parameter

estimates are 630 = .1189, of, = .000202, 6,,, = —.00137 and the estimated level one

154

variance is ,2 = .08866. This can be compared with the level one variance obtained from

the null model, Model 1, or total within hospital variance, .3466. The ratio

.3466 - .08866
.3466

 

= .74 represents the proportion of within hospital variance that is

explained by correlates (comorbidity, procedure, demographic variables). The between

hospital covariance is 0.

3.3.5 Inverse probability weighting

Before estimating c01 (t | 20) for speciﬁc covariate proﬁles, we need to account
for the uncensored observations in the analysis. As discussed, previously in Chapter 2, we
will weight each observation by w,- = 5,17,- S 1']/ G(T} ) , where 5,- is the censoring
indicator and G(-) is the survival function for the censoring distribution.

To be consistent with our notations above we will use j for hospital and i for
discharges, although we are well aware that these are not the standard notations when one
would consider hospital as the primary units and discharges within hospitals as subunits.
Consider the Random-Effects model

yj=Xjﬂ+Zjaj+uj (3.1)
where the dimensions of y, and u, are n,- XI, ,6 is pxl, X, is n, Xp, Z, is n, Xq

and a, is qxl , where p is the number of fixed effects, q is the number of random effects

155

and n ,- is the number of discharges in the jth hospital. We assume the usual conditions of

a Random-Effects model34

E(u, |X,,Z,-,a,.)=0, E(a,|X,.,Z,-)=0.
Let v j = Z ,a ,- + u ,- be the composite error. Under these assumptions E (v j ) = 0
and Var(v,)= 2,02; +R,-, where Var(a,)= E(aja; |X,-,Z,)=G and
Var(u, ) = E(uju; IX,- ,Z,- ,a', ) = R,.. The form of G is speciﬁed by the TYPE option in
the RANDOM statement, while the form of R j is speciﬁed through the TYPE option in
the REPEATED statement. To estimate ,8 we use generalized least squares (GLS)

estimation, so an estimate 5 of ,6 that minimizes the expression

Z(y, —X,.6)’V,."(y, —X,-,6) is given by
1'

13': (XX, v,.'"X )“(ZX’ .,v"y,

Weighting observations

Let W,- = diag{ w, ,i =1,...n, } denote a diagonal matrix of weights w,-,. for
discharges within the jth hospital. Weighting observations is more than simply

transforming model (3.1) by W)”. If we do so, and write the transformed model as
y,=X,ﬂ+Z,a,-+ii, (3.2)
(where y, = W,” y ,- and the other quantities are similarly defined), then it is easily seen

that GLS applied to (3.2) does not change our estimator ,5 , indeed if

156

17,. =Var(W,1/2v, )=w,‘/3(z,.oz, +R,)W,.‘/2 then
'I"-l" -1 'I"-1~ _ I —l -l I -l
(ZXJVJ' X1) (ZXTVJ y,)-(ZX,V, X1) (ZXIVJ' y,).
1 1 1 1
Consider first the standard linear regression model y, =x,,6+u,- (where we now

return to standard subscript ‘i’ for subject). By weighting the observations by w,- , we

actually estimate ,6 by minimizingz w,- ( y,- — x,-,6)2 . This gives

Ill

5 = (Z w,x,'x,-)_1(Z w,x,'y,-) and Var(,B) = (E(w,x,-'x, ))—l E(w,.zufxfx, )(E(w.x-'x- ))—1.

In the general RE model (3.1), by weighting we mean transforming y, —+ y, ,
Z, ——> 2,, X, —> 21,, but preserving the original variance form V, = Z,GZ, +R,.
Then 5 is obtained by minimizing 2(y, — X,,B')’W,”217,"'W,'’2 (y, - X,,B). Now

J

consider 17, = Z,GZ, +R,.

1. If G = 0, then W,“217,"W}/2 = Wj’zR,W}/2. In effect therefore, the original
R, is replaced by W,7”2R,-W,-“2 to get

6 = (Z X;W}’2R;‘W}’2X. 1“ (Z X;W}/2R;‘W}”y. >-
J' 1'

This is the practice adopted in SAS PROC MDED (Chapter 18, Version 8.2 p633). If in

- _ 2 "_ I —1 I
part1cularR,—0'£I then ﬂ—(ZX,W, X,) (:X,-W, y,).
l J

2. If G i 0 , then the form of V371 = (2,02, + R, )'l is more complicated. Indeed

157

v,T‘ =R,T.’[1— 2, (GT1 +2 RT‘Z) z’. ,R,T.‘]. Then
w,l’217,T‘W,l’2=-W,“2R,T‘[1—z',(GT‘+2,R,T'Z',)T‘Z,R,T‘]W,"2=

This shows that the effect of weighting is to change R,- to W,'“2R,W,'“2.

The ﬁnal form of the GLS estimator of ,5 under weighting is then

6: (ZX’ v,T'X, )T (ZX, v,T‘y, (3.3)

where V, =W,‘“2V,W,'“2 and the variance of )6 is

~
~

Var(6)= (ZX,V,"'X, )T(ZX,,T13 13,17, ,"1X,)(ZX,17,71X,)"',

where 13,- are the residuals y,- — X, ,5 . This can be obtained using the EMPIRICAL

option in the PROC MIXED statement. This estimator has been described in White
(1980)”, Liang and Zeger (1986)”, and Diggle, Liang, and Zeger (1994)80 and is
commonly referred to as the "sandwich" estimator. We need to specify this in accordance
with our methods described in Chapter 2 of the thesis. For this option to take effect we

must include the RANDOM or REPEATED statements.

If R,- :03]. then
vi-I =w}”v‘.-lw}’2 =w.“ (W"’z 02 ’-W}” +631>“W}” 421-02} ”WW"

and (3.3) gives 6: (Z X, (Z,Gz, +a§W,T‘ )T1 X, )T1 (Z X,- (z ,02, +a§W,T‘ )T1 y , ) .
1' 1'

158

3.4 Results of estimation

In this section we will concentrate on obtaining estimates of median LOS and
total charges at median LOS, at speciﬁed covariate proﬁles. For example, we vary the
type of procedure, hospital region and the number of comorbidities, but we keep ﬁxed
age (= 65), gender (= male), hospital location (= rural), hospital bed size (= medium) and
the number of procedures between 1 and 4, if there is a primary procedure. All analyses
were carried out using SAS. Various estimation methods for G and S, the survival
distribution functions for censoring time, U and LOS (= T), are discussed in Chapter 2,

Remark 2.2.1.2. Here, we choose parametric methods for estimating both G and S.

Estimation of G
Given a smaller number of covariates z ( procedure, Charlson Comorbidity Index,

region and number of procedures) an estimate of G(t | z) may be obtained from a
parametric model log(U ) = zﬂu + 0',,€ , where ,6,, and 0,, are parameters and 8 is an

unknown error with a speciﬁed parametric distribution. PROC LIFEREG is used for the

estimation of ,6,, . Various choices for the distribution of 8 were considered. Based on

the AIC we chose the log-normal and gamma distributions.

Graphs of the Kaplan-Meier estimates of the cumulative hazard versus the
regression model estimates of the cumulative hazard from both log-normal and gamma
models are shown in Figure 3.4. The plotted pairs of points for the gamma regression
model initially fall on the referent line and then fall mostly above the referent line. Based

on this plots we decide that the log-normal distribution provides a better ﬁt to the data.

159

log(t) - 26,, )

u

The estimated weights are w, =5, /G(T,- Iz,) where G(t | z) =1—<I>(

 

where <1) is the cumulative distribution function for the normal diStribution.

Estimation of CO, (I | Z0)
To estimate c0l (t | 20), we use the weighted regression model described in the

previous section. Parameter estimates derived using PROC MIXED are summarized in
Table 3.4. Although coefﬁcient estimates for continuous variables must be interpreted
with care, all estimates seem to be in the right direction. For example, charges increase
with the complexity of the cardiac procedure, with CABG having the highest charge

followed by PT CA and diagnostic CATH. Patients who underwent none of these

procedures had the lowest cost. For a given covariate proﬁle Z0, which contains t and t2 ,

50, (I | 20) = exp(Zo,5)exp('/2(5'30 + 26,0,t+63,z2 +63 )).

Estimation of S (t | Z0)
Similar to estimating G, we estimate S (t | Z0) using PROC LIFEREG in SAS.
from a parametric model for uncensored observation (i.e. 5,- =1), log(T) = X ,6, + 0,8 ,

where 8 is a vector of errors. The log—logistic and gamma distributions exhibit the
lowest AIC. Graphs of the Kaplan-Meier estimates of the cumulative hazard versus the
regression model estimates of the cumulative hazard from both log-logistic and gamma

models are shown in Figure 3.5.

160

Based on these graphs, the AIC criterion and previous experiences with the

 

analysis of LOS49’ 50 we use the log-logistic distribution. Estimates of (1S (t | 20) are
obtained as d§(t | 20 ) = . BMW) 2 dt, where w = (log(t) — 206,. )/6,.
0,. (1+ exp( w))

Finally, cost estimates at any time t for fixed time covariate profile ZO are
obtained as £60, (u | Z0 )dS(u | 20) =

,1/6. exp(-206. I6.)
6, (1+ uvé’ exp(-206, I6, ))2

 

= LCXMZO (“)3)CXP(V2(6'30 + 251401“ ‘1’ (33,142 ‘1” 5'3 ))

where we denote Z0 (u) = (Zo,u,u2) . The expression of £50, (u | Z0 )dS(u | Z0) can be

viewed as a function ,0, (6,5,5, ,5’,) where 5' = (5'30 ,5',,o, 531,552) . For a ﬁxed I,

using the Delta Method we compute the variance of the mean cost. Estimates of median
LOS, and total charges at median LOS, together with conﬁdence intervals, for speciﬁed
covariate proﬁles as well as graphic representations are presented in Table 3.4, and

Figures 3.6, and 3.7.

Discussion

We have applied methods discussed in Chapter 2 of this thesis to estimate mean
charges for hospital stays of speciﬁed duration for patients hospitalized for AMI. Our
method accounts for the existence of a non-zero intra-hospital correlation, censoring and
skewness of total charges.

Estimates derived in our model are quantitatively similar to those reported in

other studies of health care utilization in patients undergoing CABG surgery or PTCA.

161

Although 2000 costs might seem more relevant for comparison purposes, costs inflated to
2000 may not fully reflect current costs because the efﬁciency of intervention procedures
and cardiac surgery services probably has increased during the last years; Weintraub
(1995)81 argues that costs may best be used for comparison between procedures. Also,
direct comparisons with our estimates is not possible because of different patient
demographic and clinical characteristics, use of charges as a proxy for costs and different
time periods.

We now discuss results of previous studies. Three randomized clinical trials have
evaluated the relative costs of angioplasty versus bypass surgery”. These studies
included all costs to the patient, including hospitalization, procedures, medications and
follow-up care. The Bypass Angioplasty Revascularization Investigation (BARI) Study
of Economics and Quality of Life (SEQOL) (n =1829 patients) was collected over a
three-year period from August 1988 to August 1991. Using BARI data, Hlatky et al
(1997)83 report that the mean initial hospital cost of angioplasty was 65% of that of
CABG, while same percent in our data is 67%. The Randomised Intervention Treatment
of Angina (RITA) trial (UK) reported that an angioplasty had only 52 percent of the cost
of a bypass operation following the initial procedure“. The Emory Angioplasty Versus
Surgery (EAST) trial found the same percent to be 63%“.

Using data from Michigan Inter-institutional Collaborative Heart Study (MICH),
collected on 360 patients who underwent cardiac procedures in two large urban medical
centers in eastern Michigan during January 1994 through April 1995, Polverejan et al
(2003)49 report mean charges of 23,545$ and 26,213 (inflated 2000 costs: 26,080$,

28,435$) for patients who underwent CABG with 1 or 2 comorbidities respectively. From

162

a large national sample of Medicare patients who underwent CABG surgery in 1990,
Cowper et al (1997)85 report a mean hospital cost of 26,9765 (inflated to 2000 dollars:
33,202$) for patients with AMI undergoing bypass surgery.

We observe higher length of stay in NE then in any other region. Similar ﬁndings
from Centers for Disease Control and Prevention, National Center for Health

“'88 are reported: in most years, the average length of stay in the Northeast was

Statistics
signiﬁcantly longer than the average stays in the other three regions.

NIS data has both strengths and limitations. Because NIS data represent hospital
discharges and not individual persons, the main limitation is that there exists the
possibility that these discharges will show up as complete cases in the data set as another
record. However, restricting our admissions to ER admissions and all routine referrals
would decrease this chance considerably. This issue is impossible to resolve completely
because of the inability in the NIS to track individual patients. Another limitation of the
data set is that it lacks clinical detail (e. g., stage of disease, vital statistics) and laboratory

and pharmacy data. The main strength of the NIS is that it is the largest collection of all-

payer, uniform, state based inpatient data.

163

TABLE 3.1 Primary procedures

Primary procedure“

 

CCS code: Description N PERCENT
.: No PR code 44,003 28.0
44: Coronary artery bypass graft (CABG) 15,290 9.7
45: Percutaneous coronary angioplasty (PTCA) 3,7249 23.7
47: Diagnostic cardiac catheterization, coronary 24,722 15.7
artheriography (CATH)

48: Insertion, revision, replacement, removal of 2,026 1.3
cardiac pacemaker or cardioverter/deﬁbrillator

49: Other O.R. heart procedures 2,908 1.8
63: Other non-O.R. therapeutic cardiovascular 1,714 1.1
procedures

193: Diagnostic ultrasound of heart (echocardiogram) 2,549 1.6
216: Respiratory intubation and mechanical 5,562 3.5
ventilation

222: Blood transfusion 1,826 1.2
231: Other therapeutic procedures 6,350 4.0

 

*All AMI admissions (N=157,263). Only primary procedures that are present in

more than 1% of discharges are shown

164

Figure 3.1 Creating a working data set

 

 

N=157,263
AMI diagnosis

 

 

 

l

 

 

N=35,999
Other
Procedures

 

 

J.

 

 

 

i

 

 

EXCLUSION
CRITERIA“

(N)

 

 

 

N=121,264

NO PROCEDURE, CABG, PT CA or CATH

 

 

l (983)

 

 

 

 

 

2 (256)

 

 

 

7 (1,611)

 

 

 

 

 

 

 

3 (11,283)

 

8 (20,560)

 

 

 

 

 

 

 

4 (16,682)

 

 

9 (149)

After
exclusions

 

Y

 

 

 

 

 

 

 

 

 

5 (57)

 

10 (23,244)

N =58,469

 

 

 

 

 

 

 

6 (7)

 

11 (10,511)

 

 

l

l

 

 

 

 

 

 

 

 

12 (703)

CENSORED
N=18,753

 

 

 

 

UNCENSORED
N=39,716

 

 

 

 

 

 

 

*Exclusion criteria above are not mutual exclusive, a discharge may be excluded for more than one

reason

165

TABLE 3.2 Descriptive statistics (N=58469)

 

N=58,469 N=39,7l6
(all discharges in the working (only discharges with
data set) complete TOTCHG)
Variable Subgroup Frequency Percent Frequency Percent
Procedure CABG 7 .369 12.6 4.5 86 1 1.6
PTCA 17,901 30.6 16,540 41.7
CATH 14,264 24.4 9,420 23.7
No procedure 18,935 32.4 9,170 23.1
Gender Male 35,977 61.5 25,466 64.1
Female 22,492 38.5 14,250 35.9
NPR (number of None 18.935 32.4 9,170 23.1
procedures) 1 - 4 16,489 28.2 11,915 30.0
2 5 23.045 39.4 18.631 46.9
CCI 0 23,294 39.8 17,505 44.1
1+ 35,175 60.2 22,211 55.9
Region Northeast 1 1,458 19.6 6,432 16.2
Midwest 11,628 19.9 8,441 21.3
South 25,219 43.1 17,344 43.7
West 10,164 17.4 7,499 18.9
Location Rural 7,900 13.5 4,210 10.6
Urban Teaching 25,981 44.4 16,358 41.2
Urban Non-Teaching 24,588 42.1 19,148 48.2
Bedsize Small 5,174 8.9 2,743 6.9
Medium 14,904 25.5 8.992 22.6
Large 38,391 65.7 27.981 70.5
Mean = 5.44 Mean = 5.06
, 25‘h Percentile = 3 25lh Percentile=3
LOS 2 2 Contlnuous Median = 4 STD = 4.1 Median ___ 4 STD = 3.4
751h Percentile = 7 75‘h Percentile =6
Mean = 65.73 Mean = 64.03
. 25‘h Percentile = 56 25‘h Percentile =54
18 S AGE S 85 Contlnuous Median = 67 STD = 12.7 Median = 65 STD = 12.8

75"1 Percentile = 76

75Ih Percentile =75

 

166

Figure 3.2 . Histogram of Log (TOTCHG)

 

 

 

8 ..
N $469
Mean 9.95 :

7 ‘ Std Deviatim 0.82 l4 N
Minimum 3.. 5‘
Maximum 13.8 / K

6 ‘ Skewno. -0.13 “

Kurtosis 0.“ K

 

Percent
.5
l
‘1‘
ﬂ

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

0 l l T l l l I l 1 l I I

3.675 4.425 5.175 5.925 6.675 7.425 8.175 6.925 9.675 13.425 11.175 11.925 12.675
Log-transformed of 6118.138

167

Figure 3.3 Graph of Log (TOTCHG) versus LOS

 

 

 

 

14.4
., It
13 * air * if“ I! ** 'l' 13"»: t
, *Ih . .
n4 * * *§ ,.. 1.: :*§* * § *
.. " .. ‘3)! up
.. 1V2” . . * ‘ * X
in .. ... i 1
9E n. “J * *3”! #10!
a * § *
.,= ~ 1'4. . I
a... It
0 1*...4‘ *
cg 9.. IF‘
0 . if
“a 8 it;
5 x
b 7 * :* ...
| * *i* 4*
8’ 6 E gain“!
—J
I!”
5 * *
44
*
34] l l I 1 T
0 I) 20 30 40 60

Length of stay (LOS)

168

Figure 3.4 Estimation of G: Graphs of the Kaplan-Meier estimates of the cumulative
hazard versus the regression model estimates of the cumulative hazard from both

log-normal and gamma models

 

 

4.

Kaplan-Meier Cunulaive Hazard

 

 

 

 

‘

I r r

0 1 2 3
meal Rog Model leulativa Hazard

 

 

Kepler-Meier Cumulative Hazard

 

 

 

 

I T I

O 1 2 3

 

 

Gamma Reg Modal Cumulative Hmrd

 

169

TABLE 3.3 Parameter estimates in multilevel regression

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Effect Groups Estimate StdErr tValue P-value
Intercept _ 8.400 0.0790 106.36 <.0001
Procedure CABG 2.022 0.0553 36.56 <.0001
PTCA 1.620 0.0424 38.21 <.0001
CATH 1.031 0.0419 24.59 <.0001
LOS _ 0.206 0.0060 34.18 <.0001
LOSsq _ 0.004 0.0003 -1405 <.0001
CCI No comorbidities -0.059 0.0126 -4.68 <.0001
Male _ 0.022 0.0044 4.96 <.0001
AGE _ 0.005 0.0016 2.90 0.0037
Agesq _ 0.000 0.0000 -4.18 <.0001
Number of procedures 0 or Z 5 procedures 0.395 0.0317 12.47 <.0001
Procedure*NPR CABG , Z Sprocedures -0.316 0.0571 -5.53 <.0001
PTCA , Z Sprocedures -0.292 0.0432 -6.75 <.0001
Region Northeast 0.256 0.0405 -6.33 <.0001
Midwest -0.363 0.0346 -1050 <.0001
South 0.172 0.0424 -4.06 <.0001
Location Rural -0.107 0.0358 -2.97 0.0031
Bedsize Small 0126 0.0092 -1371 <.0001
Medium 0.097 0.0067 -1451 <.0001
LOS*Procedure CABG 0.074 0.0059 -12.61 <.0001
PT CA 0.003 0.0004 9.30 <.0001
CATH 0.003 0.0003 9.61 <.0001
LOSsq*Procedure CABG 0.002 0.0003 7.85 <.0001
PTCA 0.141 0.0347 -4.06 <.0001
CATH 0.203 0.0310 -6.53 <.0001
LOS*NPR 0 or Z 5 procedures -0.010 0.0022 -4.55 <.0001
LOS*CCI No comorbidities 0.013 0.0032 4.03 <.0001
LOSsq*CCI No comorbidities 0.000 0.0001 -3.53 0.0004
LOS*Region Northeast -0.008 0.0040 -1.98 0.0472
Midwest 0.007 0.0031 -2.37 0.0179
South 0.012 0.0030 .407 <.0001

 

170

 

 

 

Effect Groups Estimate StdErr tValue P-value
LOS*Location Rural -0.001 0.0036 -0.28 0.7803
LOS*Bedsize Small 0.001 0.0040 0.32 0.7481
Medium 0.002 0.0029 0.70 0.4839
Covariance
Parameters Subject Estimate StdErr ZValue P-value
UN(1,l) HOSPID 0.1235 0.00837 14.75 <.0001
UN(2,1) HOSPID 0.0024 0.00049 4.93 <.0001
UN(2,2) HOSPID ' 0.0004 0.00004 9.57 <.0001
Residual 0.1384 0.00100 139.03 <.0001

 

171

Figure 3.5 Estimation of S: Graphs of the Kaplan-Meier estimates of the cumulative
hazard versus the regression model estimates of the cumulative hazard from both

log-logistic and gamma models

 

 

 

 

 

 

 

 

 

 

 

44
i
:1:
5
3
§
0
5
I
§
9.
5c

0“, I I I

0 1 2 3
Log—Logistics Rog Model Cumulative Hazard

“l
s ..
:2 xx”,
g
3
§ 24
g ”,x,,
I ’1”,,
t ..

0‘ I l

O 1 2 3

 

Gamma Rog Model Cumulative Hmrd

 

 

172

Table 3.4 Estimates of total charges at median LOS by procedure types,
comorbidity levels (0, 1+) and regions“.

 

 

 

 

CABG NE $33,705.24 (i $4,931.65) CATH $12,247.12 (i $1,554.89)
(LOS=9.45) (LOS=5.22)
MW $33,192.80 (i $3,997.98) $12,111.16 (i $1,214.83)
(LOS=7.71) (LOS=4.26)
s $34,098.87 (i $3,924.04) $12,507.77 (i $1,173.06)
(LOS=7.87) (Los=4.35)
,3. w $44,882.55 (i $5,757.71) $16,278.16 (i $1,786.27)
:23 (1.03:7.41) (1.08:4. 10)
g PTCA NE $20,657” (,3 32.64669) NONE $7,214.51 (i $896.86)
é (LOS=4. 13) (LOS=5-36)
MW $20,703.55 (i $2,075.27) $6,973.11 (:1: $672.76)
(LOS=3.37) (LOS=4.37)
5 $21,386.47 (i $2,020.57) $7,205.49 (i $641.46)
(LOS=3.44) (LOS=4.46)
w $27,796.50 (i $3,071.51) $9,339.37 (i $982.97)
(LOS=3.24) (LOS=4.20)
CABG NE $36,544.54 (1r $5,350.00) CATH $13,356.57 (i $1,687.48)
(LOS=11.78) (LOS=6.51)
MW $35,770.72 (i $4,332.80) $13,124.59 (i $1,316.81)
(LOS=9.61) (LOS=5.31)
8 $36,682.38 (i $4,245.21) $13,532.62 (1 $1,267.17)
g» (LOS=9.82) (LOS=5.42)
LE w $48,570.85 (i $6,267.93) $17,668.42 (:t $1,943.16)
§ (LOS=9.24) (LOS=5.11)
g PTCA NE $22,232.91 (i $2,858.71) NONE $8,071.18 (it $994.88)
g (LOS=S. 15) (LOS=6.68)
2 MW $22,202.56 (i $2,260.66) $7,721.79 (5: $740.39)
(LOS=4.20) (LOS=5.45)
s $22,923.91 (i $2,199.08) $7,965.56 (i $702.93)
(LOS=4.29) (Los=5.56)
w $29,858.35 (1' $3,349.71) $10,360.54 (i $1,086.92)
(LOS=4.04) (LOS=5.24)

 

*for ﬁxed covariate proﬁle: age=65. gendemmale. hospital location = rural, hospital bed size = medium, and 1-4
procedures (if there is a primary procedure)

173

Figure 3.6 Estimates of total charges at median LOS - no comorbidities*

 

E Total charges + LOS

 

$35000 NEMWS

 

 

 

 

 

 

——> ———> ——><———>
(— CABG (— PTCA (— CATH NOPROC

 

10.00
9.00
8.00

. 7.00

6.00
5.00
4.00
3.00
2.00
1 .00
0.00

LOS

 

‘Estimates corresponding at median LOS, for covariate profile:
age=65, male, rural location, medium bed size, between 1-4 procedures

174

 

Figure 3.7 Estimates of total charges at median LOS - at least one comorbidity“

 

C: Total charges -0— LOS

 

LOS

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

$60,000 1400

$50000 w 1200

\’\ 10.00

g; ”T '7 V'“ w 8.00
5 $30,000 _-

g MW 8 W 6.00
" $20,000 at l— J

NE
...,... UUU UWUU 2...
.. UU ....
(— CABG 5 (— PTCA a (— CA TH ’ ‘ NOPROC ’

 

 

 

*Estimates corresponding at median LOS, for covariate profile:
age=65, male, rural location, medium bed size, between 1-4 procedures

175

CHAPTER 4

CONCLUSIONS AND FUTURE WORK

Evidence of rising health care expenditures is widespread. The resurgence in
health care spending is fueled by fundamental forces that will prove difﬁcult to resist”.
One analyst has predicted that if current trends continue, health care will consume 25
percent of the GDP by the year 203090. The enormous investment in biomedical research
will probably accelerate the rate of technological development in medicine, with effects
on overall expenses”. Also the proportion of the population 65 years of age or older and
the proportion over 80 will increase by 33 percent and 14 percent, respectively, between
2000 and 20209'. N o matter how healthy elderly persons are in the future, as compared
with earlier generations, their sheer numbers are almost certain to result in increased
expenditures for health care”. Yet another important influence will be increasing national
prosperity. The more income people have, the more of it they tend to spend on health
care. One of the strongest predictors of the proportion of the GDP that Western nations
spend on health care is the GDP itself”. Therefore, substantial increases in health care
spending over the next 5 to 10 years are virtually inevitable.

The United States spends 3.4 percent more on health care as a percentage of the
GDP than any other Western country”, but there is no conclusive evidence that health

outcomes are better in the United States than in other industrialized nations”' 93' 94.

176

Because of concerns on the availability of resources to pay for different health
care interventions, medical technologies and treatments, CEA has evolved to become an
important discipline in the pursuit of maximizing health beneﬁts from a speciﬁed
expenditure, or in ﬁnding the lowest-cost strategy for a speciﬁed health effect.

Although several advances have been made over the past decade, complexities in
analyses of health care cost and outcomes still pose many methodological challenges.
Guided by other research in this ﬁeld 2'9’ I" '5' 30' 33' 34'48'49’ 52'53’ 59'60’95'103, in this thesis
we have addressed several statistical issues in the analysis of medical cost data.

The objective of Chapter 1 was to demonstrate how costs could be incorporated
into a longitudinal model in which a patient’s event history unfolds as sojoums in health
states and as transitions between health states. Using a Markov model the counting
process methodology was used to estimate the transition probabilities and integrated
transition intensities with patient heterogeneity modeled through Cox regression on the
underlying transition intensities. There are extensive uses of this model in biomedical
applications. Incorporating costs into this framework is new, although the basic idea of
calculating present value is ubiquitous in the insurance and ﬁnance literature where costs
are manifest as ﬁxed amounts paid at random times, and as ﬁxed payment streams over
random durations. However, the fundamental difference in our context is the need to
estimate both the transition costs and the sojourn costs. Our approach uses a mixed
random effects model to estimate costs conditional on time, and then the dynamics of
time through transition probabilities and integrated intensities. In this way we obtain
estimates of net present for expenditures incurred over a ﬁnite time horizon. The

asymptotic distribution of these estimates is shown to depend on three stochastic

177

components: the regression parameter in the random effects model for costs, the
regression parameter in the Cox regression model for transition intensities and the
estimator of the baseline integrated intensity.

In Chapter 2, we showed that our transition model described in Chapter 1 captures
costs under the simpler two state survival model with a single transition time and sojourn.
We discuss parametric methods for estimating the survival distribution for censoring
time, and therefore the weights in the IPCW technique, as well as methods of estimation
for the survival distribution for event time.

In Chapter 3 we provided an application of our method using length of stay
(LOS) and inpatient costs for patients hospitalized for AMI from the Nationwide
Inpatient Sample 2000 (NIS 2000) of Healthcare Cost and Utilization Project. Our
analysis addresses a number of characteristics of these utilization data including
censoring of costs, skewness in cost distributions, heterogeneity in LOS and costs across
patients. Our method is based on a hierarchical model in which patients are nested within

hospitals.

Future work

To obtain cost-effectiveness measures such the cost-effectiveness ratio (CER) and
net health cost (NHC) for competing health care interventions or treatments, we need
both accurate estimates of costs and health beneﬁts. In this thesis we have addressed
several issues on estimating costs. The three basic measures of health outcomes are

survival probability, life-expectancy (LE) and quality adjusted survival years (QALY).

178

To assess the health beneﬁt of an intervention relative to another, we may also use any
clinically meaningful measure such as improvement in life expectancy, deaths averted, or
number of toxic side effects prevented. For example, in our multi-state description, life
expectancy translates to the average time from the time origin until death or some other
absorbing state is reached. Since the goal of any health care intervention is much broader
than simply treating the disease condition or preventing death, the use of quality-adjusted
life years (QALYs) to quantify health outcomes has been advocated15 . QALYs also
provide a common-metric to gauge health beneﬁt across disease types. For example, a
decision maker facing resource allocation can compare the cost-effectiveness of coronary
artery bypass surgery versus a percutaneous coronary intervention, and the cost-
effectiveness of different lipid lowering therapies for the prevention of cardiovascular
disease, and the cost-effectiveness of different regimens of screening women for their
susceptibility to breast cancer.

For each unit of time spent in a health state, a quality weight is the relative value
placed on that health state against the state of perfect health. Perfect health has a quality
weight of one, while death, or states judged equivalent to death, get a quality weight of
zero. All other health states receive a quality weight between zero and one. Having
determined quality-weights for various health states, QALYs adjusts the length of life for
the quality of life during those years. With multiple health states, each state is associated
with its sojourn. For example, suppose we have estimated a total life-expectancy of 30
years, which consists of 10 years in a state with quality-weight 0.5, 10 years in a health
state with quality weight 0.65, and 10 years in perfect health. This results in an expected

21.5 QALYs (=10x.5+10x.65+10x1). It is worth noting that the two main elements in

179

calculating QALYs, time spent in each health state and its associated utility, can vary
across patients. Accounting for the random variation is an important task in any CEA.
Suppose the state space E is labeled with ‘k’ being an absorbing state and the
remaining states being transient. For example, in cancer treatment studies104 transient
states are relapse and remission and death is an absorbing state. Survival time is then the

time to absorption deﬁned by 1,, = inf {I > 0: X (t) = k} , with survival distribution
conditional on the initial state,

S,k(t) = P[rk >t| X0 :1] =1—Bk(0,t).
If r<oo is the pre-specifled time horizon of the analysis, we deﬁne lyre-expectancy (LE)

restricted to rand discounted at a constant rate r by

LE = gems”, (t)dt,
conditional of the initial state. The total sojourn in state h in [0,1] is £2th (t)dt with

expectation Ee'" ,h(0,t)dt , the sum of which over all states h at k is LE.

Let q(h,t) denote the quality weight for the health state h = X (t)occupied at time

t, with the state ‘dead’ having quality weight equal to zero. The total quality adjusted

time in {0,7} is 212E Ee’"q(h,t)Yh(t)dt , and its expectation conditional on X 0 =i
defines expected quality adjusted life years, QALY(i) = 2 Hi Ee-"q(h,t)P,-h (0,t)dt. If

q(h,t) = q(h) are constant in time then QALY (i) becomes ZIEEqM) Ee'" ih (0,t)dt,

which is the sum over states of the discounted average time spent in each state weighted

180

by its quality. This is the standard definition for QALY‘OS. Finally, an unconditional
version of QALY = Zieﬁrti(0)QALY(i), where 72,.(0) = P(X(0) = i).

In the simplest multi-state model with two states, 0=alive (transient) and 1=dead

(absorbing), the only transition is 0-—>1 which has hazard 0’00 = —a0l , and the initial
distribution is degenerate, 7:0(0) = 1. Also, the survival distribution is S (t) = P[rl > t],

2'l being the survival time. Without discounting (r =0), LE reduces to restricted mean

survival, E (2'1 A T) = ES(t)dt . This entity is often used in economic evaluations when

survival time is the principal endpoint. Finally, if we write q(t) = q(0,t) we get

QALY= £e_"q(t)S(t)dt.

In comparing two competing health care interventions, a test intervention versus a
standard, the cost-effectiveness ratio (CER) is the ratio of the incremental cost relative to
the incremental benefit. In the patient population for which these interventions are

intended, let am and a“, denote respectively, the cost and effectiveness of the test
intervention, and, .11” and psebe the corresponding measures for the standard
intervention. Then the CER is the parameter 6 = (,um - a“ )/( a“, — ,use ), and having

established a maximum value of the CER (90 , a value society is willing to pay to gain one

unit of effectiveness by adopting the test intervention, the net health cost is given by
NHC( 60 )= (rate _ Ilse) — 60 (lute -Iuse) ' we equate lite and ruse by ‘E601(5lzo)d5(tlzo) ’

where ZO =1 for the new intervention and Z0 = 0 for placebo. We may define

effectiveness, a“, and ,use, in terms of the restricted mean survival as E S (t | Z0)dt . Then

181

NHC(00)=£c0|(s|ZO =0)r1$(t|zO =O)—Ec01(s|Zo =1)dS(t|ZO =1)-

-60E(S(IIZO =1)-S(t|z0 =O))dt.

The methods developed in Chapters 1 and 2 can be applied to the estimation of
QALY and NHC. Because our models incorporate covariates, their practical importance
lies in the fact that both QALY and NHC can be estimated along speciﬁed covariate
proﬁles. However, this requires data on patient health histories and costs over time. With
the increased focus being given to controlling health care expenditures, such longitudinal
data sets are likely to be compiled from administrative databases, large scale
chronological and epidemiological studies. Many recent clinical trials have included
economic sub-studies. For example the Antiarrhythmics Versus Implantable
Deﬁbrillators (AVID)‘°6 and Canadian Implantable Deﬁbrillator Study (CIDS)'°7 trials
yielded useful information on both the clinical effectiveness of the implantable
deﬁbrillator as well as its cost—effectiveness. Many researchers have examined issues of
statistical power and sample size assessments for cost-effectiveness studiesw'lm’103'log"l3.
They show that the requirements on sample size demonstrating cost-effectiveness can be
many times larger than that needed to establish effectiveness alone. Therefore the
investigators face the ethical dilemma of continuing trial and enlarging them for
economic evaluation studies. It is here where our regression techniques could be most
informative. Indeed, Simon Thompson recently editorialized in Statistical Methods for
Medical Research1 '4, ‘the challenge for the future will be development of regression

models for costs and cost-effectiveness that can assess the impact of patient

characteristics and cost elements that drive total costs’. Determining in which subgroups

182

of patients an intervention is more cost effective from health care utilization studies and

clinical trials would be an active ﬁeld of research in the coming years.

183

APPENDIX

MATHEMATICAL BACKGROUND

Concepts of stochastic processes, stochastic integration, functional delta method
and results on Ito integration, all used in this thesis, will be brieﬂy reviewed in this
appendix. This appendix uses results from Andersen et al (1993)”, Gill (1989)”,

Harrison (1985)‘ '5, Oskendal (1995)l '6 and Polverejan (2001)“.

Stochastic processes

Let (Q,f,P) be a probability space. A filtration ( f; ,t E T ) is an increasing
right-continuous family of sub- 0' algebras of f . A stochastic process X is just a time-
index collection of random variables {X (t) : t E T}. The process X is called adapted to
the filtration f; if X (t) is f; -measurable for each I. We write X 0,01) for the realized
value of X(t) at the point ate 9.

The process X is called cadlag if its sample paths { X (t,a)) : t E ’7' }, for almost all

a) , are right-continuous with left-hand limits. The set of cadlag functions is denoted by

D( ’I' ), the Skorohod space of weak convergence theory.

For two stochastic processes X and Y, JXdY denotes the stochastic process

t—) £X(s)dY(s) deﬁned for each a) andtsuch that £|X(s)||dY(s)|<oo. Here Yis

184

assumed a cadlag process with paths of locally bounded variation, i.e. LI dY(s) | < 0° for
all t6 T , for almost all we S2 . We call such a process Y a finite variation process, and

I
the process I —> [0| dY(s)| is called its total variation process.

To a cadlag process X, we associate its left-continuous modification X 7 , defined

by X' (t) = X(t—) and its jump process deﬁned by AX(t) = X (t) — X (t—).

Martingale, predictable processes, compensators

A martingale is a cadlag adapted process M which is integrable, i.e.
E(| M (t) |) < 0° for all 16 T and satisﬁes the martingale property E(M (t) | f; ) = M (s)
for all 5 St . The process is called sub(super)-martingale if the equality is replaced by

E(M (t) | .73 ) 2 M (s) (E(M (t) | f; ) S M (s) respectively). A martingale is called

square integrable if sup E (M (t)2 ) < 00. A local martingale is a process M such that an
167

increasing sequence of stopping times Tn exists, P(T" 2 t) —>l as n —-> 0° for all 16 T,
such that the stopped process {T,, > 0]M T" are martingales for each n (here

X T (t) = X (T A t) ). A local square integrable martingale is a process M as above such

that the localizing sequence can be chosen making {T,, > 0]M T" a square integrable

martingale.
A class of processes complementary to martingales is the class of predictable

processes. A stochastic process H is called predictable if, as a function of (t,a))e T xQ ,

185

it is measurable with respect to the 0' — algebra on T x (2 generated by the left-
continuous adapted processes. Any left-continuous adapted process is predictable.

There is an important orthogonality between martingales and ﬁnite variation
predictable processes. This is due to the fact that if a process is at the same time both a
local martingale and a predictable ﬁnite variation process, then it is trivial or constant
process. Suppose that X is a cadlag adapted process. We say that X is the compensator
of X if X is a predictable, cadlag, and ﬁnite variation process such that X — X is a local
martingale, zero at time 0. If a compensator exists it is unique, by Doob-Meyer
decomposition theorem. A semi-martingale is the sum of a local martingale and a cadlag
adapted ﬁnite variation process. All (local) sub-martingales and super-martingales have
compensators. In particular, non-decreasing, non—negative locally integrable cadlag
processes have compensators, because such processes are sub-martingales.

Suppose M, N are local square integrable martingales. The square of a local

square integrable martingale is a local sub-martingale and also has a non-decreasing
compensator denoted by < M >. Also the product, MN = i {(M + N )2 — (M — N )2 } , is

the difference of two local sub-martingales, therefore it has a compensator denoted by

(M , N). By deﬁnition, < M > and (M , N) are the unique ﬁnite variation cadlag

predictable process such that, M 2 — < M > and MN - (M , N) are local martingales, zero

at time 0. The process < M > is called the predictable variation process of M, and

(M , N) the predictable covariation process of M and N.

186

Theorem (see Theorem H.3.1, Andersen et a1 (1993)”)

Suppose M is a finite variation local square integrable martingale, H is a

predictable process, and [H 2d < M > is just locally ﬁnite (automatically true of H is
locally bounded). Then IHdM is a local square integrable martingale, and

< J'HdM >= IH2d<M >.

The predictable process H is locally bounded if it is left continuous and also if it is
right-continuous.

Formulas for predictable and optional covariation processes of stochastic integrals

follow the same form. We have < deM. jKdN >= jHKd < M , N >.

Fubini-type Theorem

Let (s,u) —> H(s,u) , (s,u)e (TXT ) be a bounded, Bxﬁ measurable
function, where 2? is the set of Borelians on T . Let a be a ﬁnite measure on the space

(T,B). Then for every te T , A6 3:

L1 £H(s,u)dM (u)],u(ds) = £[ LH(s,u)d,u(ds)]dM (u) for almost all we 52.

For more general versions of this theorem and their proofs see Protter (1990)l 17.

Functional Delta-Method

A concept of differentiability that allows generalization of the usual Delta Method

is the one of Hadamard or compact differentiability. Let B1 and 32 two normed vector

spaces.

187

The functional 0:81 ——> 82 is compactly or Hadamard differentiable at a point
66 B1 if and only if a continuous linear map dgoz Bl ——> 82 exists, such that for all real
sequences an -—> co and all convergent sequences h" —> he Bl ,

a" (¢(e+a;'h,, )-¢)(t9)) 4 d(o(6).h as n —> oo.

Here d¢(6) is called the derivative of (a at the point 19.

Next we define the concept of weak convergence in normed vector spaces. Let

(B,|| . ll) be a nonned vector space endowed with a 0' - a1 gebraQ? , such that

23' g 23 g 23", where 23' and 6' are the 0' — algebras generated by the open balls and
the open sets of B, respectively. Thus B" is the Borel 0'— algebra; when B is separable
E" = 9".

Let X n be a sequence of random elements of (B, E) and let X be another random

element of that space. We say X n converges weakly (or in distribution) to X and we

write X" —D—>X if and only if E(f(X,, ))-—D—>Ef(X) for all bounded, norm-

continuous, E -measurable f : B —> R.

Theorem (see Theorem 3, Gill (1989)”)

Suppose (0: Bl —> B2 is compactly or Hadamard differentiable at a point ,ue B]
and both it and its derivative are measurable with respect to the 0' — algebras 231 and $2

(each nested between the open ball and Borel 0' — algebras). Suppose X n is a sequence

of elements of B1 such that Z, = J; (X n - ,u) —P—> Z in B1 , where the distribution of Z

188

is concentrated on a separable subset of B1 . Suppose addition: 82 x 82 —> B2 is
measurable. Then
(1) lJZrX. -#),JZ(¢(X,, )—¢<a))-d¢<u).JZ<X. -,U)] —”—-> (2.0) in

B1 ><B2 and consequently
(2) ﬁ(¢(X.)—¢(a))—d¢(ma/E(X. —m—”—>o and

(3) JZMX. >000) —3—+ 4000.2.

Measurability of drp: Bl —> 32 can often be shown to follow from measurability
of (a (see Lemmas 4.4.3 and 4.4.4, Van der Vaart (1988)“).

The following is a useful lemma. In many applications the mapping (a is only a
priori deﬁned on certain members of B1 and one could set about choosing a particular
extension to all of Bl such that the hypotheses of the Functional Delta Method are

satisﬁed in each particular application.

Lemma (see Lemma 1, Gill (1989)”)

Consider xe E C B1 and (0: Bl —> Bz. Suppose there exists a continuous linear
map drp(1r):B1 —> B2 such that for all tn —9 0 (tn 6 R) and h" -—> he Bl such that
x" = x+tnhn E E for all n, we have:
r;‘ (¢(x+ tnhn )—¢(x)) 4 d(0(x).h as n —> oo.
Then (a can be extended to B1 in such a way that it is differentiable at x , with
derivative d¢(x). The derivative is unique if the closed linear span of possible limit

points h equals 3,.

189

Suppose we endow D[0,z'] with I] . IL. , the supremum norm , and (D[0,2'])P with
the maximum-supremum norm (i.e. for x6 (D[0,r])p, || x||= max(|| xl IL, ,...|] xp “w)).

Under this norm (D[0,T])” is Banach, but not separable. However under the Skorohod

topology, these spaces are Banach and separable. In these spaces if the limiting process
has continuous sample paths then weak convergence in the sense of the Skorohod metric
and in the sense of the supremum norm are exactly equivalent. Otherwise, supremum

norm convergence is stronger.

Results on Ito integration

Let (Q,f,P) be a probability space with a filtration ( f: ,tE T), where .76
contains all null sets of f . A continuous k-dimensional vector martingale
M = (M (t),te T), T =[0,2'),1'E R is called Gaussian if:

i) < M > is a continuous deterministic k xk positive semideﬁnite matrix valued

function on T , with positive definite increments, zero at time 0.

ii) M (t) — M (s) has a multivariate normal distribution with zero mean and
covariance matrix V(t)—V(s) and is independent of (M (u),u S s) for all 0 S s St in
T .

Let 77/2 be the set of all adapted processes X on ((52.17 ,P), (f; ,te T ))

satisfying E( I; X2 (s) < M (ds) >) < oo for all t6 T and let M be a continuous Gaussian

martingale on this space. For any fixed t, one can deﬁne the Ito integral I , (X ) = EXdM

for X6 712.

190

Properties of the Ito integral:

Let X,Y€ 7f: andlet OSs<u<t, s,u,tET.Then

i) j'XdM = j“XdM+ 'XdM for a.a. a);
1' .S' ll
.. t I 1
n) [,(r‘XH/MM =cJXdM+ IIYdM fora.a. a):

iii) J” XdM is f; -measurable:
iv) If X(t,a)) = X(t) only depends on I (so X(t) is deterministic) then

I, (X) = J; XdM is normally distributed, with mean 0 and variance EX: (5) < M (ds) >.

191

10.

BIBLIOGRAPHY

Hefﬂer S, Smith S, Keehan S, Clemens M, Zezza M, Truffer C. Health Spending
Projections Through 2013. Health Affairs. 2004.

Willan AR, Chen EB, Cook RJ, Lin DY. Incremental net beneﬁt in randomized
clinical trials with quality-adjusted survival. Statistics in Medicine. Feb 15
2003;22(3):353-362.

Willan AR, Lin DY, Cook RJ, Chen EB. Using inverse-weighting in cost-
effectiveness analysis with censored data. Statistical Methods in Medical
Research. Dec 2002;11(6):539-551.

Lin DY. Linear regression of censored medical costs. Biostatistics. 2000;1:35-47.

Lin DY. Regression analysis of incomplete medical cost data. Statistics in
Medicine. Apr 15 2003;22(7):1181-1200.

Gardiner J C, Bradley CJ, Huebner M. The cost-effectiveness ratio in the analysis
of health care programs. In: Rao CR, Sen PK, eds. Handbook of Statistics,
Bioenvironmental and Public Health Statistics. Vol 18. New York: North-
Holland; 2000:841-869.

Gardiner J C, Indurkhya A, Luo Z. The performance of estimation procedures for
cost-effectiveness ratios. In: Balakrishnan N, ed. Advances on Methodological
and Applied Aspects of Probability and Statistics. New York: Gordan Breach;
2001:547-559.

Gardiner J, Holmes-Rovner M, Goddeeris J, Rovner D, Kupersrnith J. Covariate-
adjusted cost-effectiveness ratios. Joumal of Statistical Planning and Inference.
1999;75:291-304.

Baser 0, Bradley CJ, Gardiner J C. Estimation from censored medical costs. In
Review. 2002.

Baser 0, Bradley CJ, Gardiner JC, Given CC. Testing and correcting for non-
random selection bias due to censoring: An application to medical costs. Health
Services and Outcomes Research Methodology. 2003;4(2):93-107.

192

11.

12.

l3.

14.

15.

16.

17.

18.

19.

20.

21.

22.

Baser O, Gardiner JC, Bradley CJ, Yuce H, Given CC. Longitudinal analysis of
censored medical costs data. In Review. 2002.

Commonwealth of Australia: Guidelines for the pharmaceutical industry on
preparation of submissions to the Pharmaceutical Beneﬁts Advisory Committee:
including major submissions involving economic analyses. 1995. Located at:
Canberra: Australian Government Publishing Service.

National Institute for Clinical Excellence. Technology appraisals work
programme, new guidance documents. 2001. ‘

Canadian Coordinating Ofﬁce for Health Technology Assessment. Guidelines for
economic evaluation of pharmaceuticals:. Canada. 2nd ed. Ottawa: Canadian
Coordinating Ofﬁce for Health Technology Assessment ( C COHTA ). 1997.

Gold MR, Siege] JE, Russell LB, Weinstein MC, eds. Cost-Effectiveness in
Health and Medicine. New York: Oxford University Press; 1996.

Stinnett AA, Mullahy J. Net Health Beneﬁts: A new framework for the analysis
of uncertainty in cost-effectiveness analysis. Medical Decision Making.
1998;18(Supplemental):S68-S80.

Manning WG, Mullahy J. Estimating log models: to transfrom or not to trasfonn.
Journal of Health Economics. 2001;20:461-494.

Manning WG. The logged dependent variable, heteroscedasticity, and the
retransforrnation problem. Journal of Health Economics. 1998;17:283-295.

Duan N. Smealin g estimate: A nonparametric retransforrnation method. Joumal
of the American Statistical Association. 1983;78(383):605-610.

Ai C, Norton EC. Standard errors for the retransforrnation problem with
heteroscedasticity. Journal of Health Economics. 2000;19:697-718.

Mullahy J. Much ado about two: Reconsidering retransformation and the two-
part model in health econometrics. Journal of Health Economics. 1998;17:247-
28 1.

Lucas JW SJ, Benson V. Summary health statistics for U.S. Adults: National
Health Interview Survey 2001. Washington, DC: Centers for Disease Control and
Prevention, National Center for Health Statistics; 2004. Library of Congress
Catalog Number 362.

193

23.

24.

25.

26.

30.

31.

33.

34.

35.

Hay JW, Olsen RJ. Let them eat cake: A note comparing alternative models of
the demand for medical care. Journal of Business & Statistics. l984;2(3):279-282.

Duan N, Willard G. Manning J, Morris CN, Newhouse JP. Choosing between the
sample-selection model and the multi-part model. Journal of Business &
Economic Statistics. l984;2(3):283-289.

Andersen PK, Borgan O, Gill RD, Keiding N. Statistical Models Based on
Counting Processes. New York: Springer-Verlag; 1993.

Praestgaard J. Nonparametric estimation of actuarial values. Scandinavian
Actuarial Journal. 1991;2:129-143.

Norberg R. Ruin problems with assets and liabilities of diffusion type. Stochastic
Processes and Their Applications. Jun 1 1999;81(2):255-269.

Norberg R. Differential-Equations for Moments of Present Values in Life-
Insurance. Insurance Mathematics & Economics. Oct 1995;17(2):171-180.

Norberg R. A Time-Continuous Markov-Chain Interest Model with Applications
to Insurance. Applied Stochastic Models and Data Analysis. Sep 1995;11(3):245-
256.

Lin DY, Feuer EJ, Etzioni R, Wax Y. Estimating medical costs from incomplete
follow-up data. Biometrics. 1997;53:419—434.

Bang H, Tsiatis AA. Estimating medical costs with censored data. Biometrika.
2000;87(2):329-343.

Strawderrnan RL. Estimating the mean of an increasing stochastic process at a
censored stopping time. Journal of the American Statistical Association. Dec
2000;95(452):1192-1208.

Wooldridge JM. Inverse Probability Weighted Estimation for General Missing
Data Problems. Working Paper Michigan State University. April 2003.

Wooldridge J M. Econometric Analysis of C ross Section and Panel Data.
Cambridge, MA: MIT Press; 2002.

Zhou XH, Tu WZ. Interval estimation for the ratio in means of log-normally
distributed medical costs with zero values. Computational Statistics & Data
Analysis. Dec 28 2000;35(2):201-210.

194

36.

37.

38.

39.

40.

41.

42.

43.

45.

Zhou XH, Tu WZ. Conﬁdence intervals for the mean of diagnostic test charge
data containing zeros. Biometrics. Dec 2000;56(4):1118-1125.

Zhou XH, Stroupe KT, Tierney WM. Regression analysis of health care charges
with heteroscedasticity. Joumal of the Royal Statistical Society Series C-Applied
Statistics. 2001;50:303-312.

Zhou XH, Melﬁ CA, Hui SL. Methods for comparison of cost data. Annals of
Internal Medicine. 1997;127(8):752-756.

Zhou XH, Li CM, Gao SJ, Tierney WM. Methods for testing equality of means of
health care costs in a paired design study. Statistics in Medicine. Jun 15
2001;20(11):1703-1720.

Zhou XH. Inferences about population means of health care costs. Statistical
Methods in Medical Research. Aug 2002;11(4):327-339.

Zhou XH, Gao SJ. One-sided conﬁdence intervals for means of positively skewed
distributions. American Statistician. May 2000;54(2): 100-104.

Maynard C, Chapko MK, Every NR, Martin DC, Ritchie JL, Department of
Medicine DoVASW. Coronary angioplasty outcomes in the Healthcare Cost and
Utilization Project, 1993-1994. The American journal of cardiology. Apr 1
1998;81(7):848-852.

Ritchie JL, Maynard C, Chapko MK, Every NR, Martin DC, Department of
Medicine DoVASWUSAruwe. Association between percutaneous transluminal
coronary angioplasty volumes and outcomes in the Healthcare Cost and
Utilization Project 1993-1994. The American journal of cardiology. Feb 15
1999;83(4):493-497.

Brooks JM, McClellan M, Wong HS. The marginal beneﬁts of invasive
treatments for acute myocardial infarction: does insurance coverage matter?
Inquiry : a journal of medical care organization, provision and ﬁnancing. Spring
2000;37(1):75-90.

Brooks SE, Ahn J, Mullins CD, Baquet CR, D'Andrea A. Health care cost and
utilization project analysis of comorbid illness and complications for patients
undergoing hysterectomy for endometrial carcinoma. Cancer. Aug 15
2001;92(4):950-958.

195

46.

47.

48.

49.

50.

51.

52.

53.

54.

55.

56.

57.

Howard TS, Hoffman LH, Stang PE, Simoes EA, Galt Associates SVUSA.
Respiratory syncytial virus pneumonia in the hospital setting: length of stay,
charges, and mortality. The Journal of pediatrics. Aug 2000;137(2):227-232.

Watanabe CT, Maynard C, Ritchie J L, Department of Medicine UoWSWUSA.
Short-term outcomes after percutaneous coronary intervention: effects of stenting

and institutional volume shifts. American heart journal. Aug 2002;144(2):310-
314.

Polverejan E. Regression Models for Analysis of Medical Costs [PhD]. East
Lansing: Statistics & Probability, Michigan State University; 2001.

Polverejan E, Gardiner JC, Bradley CJ, Holmes-Rovner M, Rovner D,
Department of Epidemiology MSUELMIUSA. Estimating mean hospital cost as a
function of length of stay and patient characteristics. Health economics. Nov
2003;12(11):935-947.

Gardiner JC, Luo Z, Bradley CJ, Polverejan E, Holmes-Rovner M, Rovner D.
Longitudinal assessment of Cost in Health Care Interventions. Health Services
&0utcomes Research Methodology. 2002;3:149-168.

Rice N, Jones A. Multilevel models and health economics. Health Economics.
Nov-Dec 1997;6(6):56l-575.

Carey K. Hospital Length of Stay and Cost: A Multilevel Modelling Analysis.
Health Services &0utcomes Research Methodology. 2002;3:41-56.

Carey K. A multilevel modelling approach to analysis of patient costs under
managed care. Health Economics. Jul 2000;9(5):435-446.

Goldstein H, Browne W, Rasbash J. Multilevel modelling of medical data.
Statistics in medicine. Nov 15 2002;21(21):3291-3315.

Jacobsen M. Statistical Analysis of Counting Processes. New York: Springer-
Verlag; 1982.

Arijas E, Haara P. A marked point process approach to censored failure time data
with complicated covariates. Scandinavian Journal of Statistics. 1984;11:193-
209.

Andersen PK, Gill RD. Cox's regression model for counting processes: A large
sample study. Annals of Statistics. 1982; 10(4): 1 100-1 120.

196

58.

59.

60.

61.

62.

63.

64.

65.

66.

67.

68.

69.

Gill RD. Non-and semi parametric maximum likelihood estimators and the von
Mises method (Part I). Scandinavian Journal of Statistics. 1989;16:97-128.

Etzioni RD, Feuer EJ, Sullivan SD, Lin D, Hu C, Ramsey SD. On the use of
survival analysis techniques to estimate medical care costs. Journal of Health
Economics. 1999;18:365-380.

Huang YJ, Louis TA. Nonparametric estimation of the joint distribution of
survival time and mark variables. Biometrika. Dec 1998;85(4):785-798.

Vaart AWvd. Asymptotic Statistics. Cambridge University Press. December 1998
1998, 460 pages.

Newey WK, McFadden D. Large sample estimation and hypothesis. Handbook of
econometrics. 1994;1Vz211 1-2245.

Wooldridge J M. Econometric Analysis of C ross Section and Panel Data.
Cambridge, MA: MIT Press; 2001.

Volpp KG, Williams SV, Waldfogel J, et al. Market reform in New Jersey and the
effect on mortality from acute myocardial infarction. Health services research.
Apr 2003;38(2):515-533.

National Heart Lung and Blood Institute. Facts About Coromary Heart Disease.
http://www.nhlbi.nih.gov/health/public/heart/other/chdfactthtm.

American Heart Association. Heart Disease and Stroke Statistics -- 2004 Update.
Dallas, Texas 2004.

Smith SCJ, Dove JT, Jacobs AK, et al. ACC/AHA guidelines of percutaneous
coronary interventions (revision of the 1993 PT CA guidelines)--executive
summary. A report of the American College of Cardiology/American Heart
Association Task Force on Practice Guidelines (committee to revise the 1993
guidelines for percutaneous transluminal coronary angioplasty). J Am Coll
Cardiol. 2001;37(8):2215-2239.

American Heart Association. Angioplasty, Percutaneous Translunrinal Coronary
(PTCA). http://www.americanheart.org/presenter.jhtml?identiﬁer=4454.

Mitka M. Drug-Eluting Stents Show Promise. JAMA. Feb 11, 2004
2004;291(6):682.

197

70.

71.

72.

73.

74.

75.

76.

77.

78.

79.

80.

81.

Wisconsin Department of Health and Family Services. Why Charges May Diﬁ‘er
Between Facilities: Division of Health Care Financing, Bureau of Health
Information; 2001. Health Care Data Report, 2000 (PHC 5320).

Charlson ME, Pompei P, Ales KL, Mackenzie CR. A new method of classifying
prognostic comorbidity in longitudinal studies: development and validation.
Journal of Chronic Diseases. 1987;5:373-383.

Deyo RA, Cherkin DC, Ciol MA. Adapting a clinical comorbidity index for use
with ICD-9-CM administrative data. Journal of Clinical Epidemiology.
1992;45:613-619.

Romano PS, Ross LL, Jollis JG. Adapting a clinical comorbidity index for use
with ICD-9-CM administrative data: differing perspectives. Journal of Clinical
Epidemiology. 1993;46:1075-1079.

Ghali WA, Hall RE, Rosen AK, Ash AS, Moskowitz MA. Searching for an
improved clinical comorbidity index for use with ICD-9-CM administrative data.
J Clin Epidemiol. Mar 1996;49(3):273-278.

Southern DA, Quan H, Ghali WA. Comparison of the Elixhauser and
Charlson/Deyo methods of comorbidity measurement in administrative data. Med
Care. Apr 2004;42(4):355-360.

Quan H, Parsons GA, Ghali WA. Validity of information on comorbidity derived
rom ICD-9-CCM administrative data. Med Care. Aug 2002;40(8):675-685.

Taira DA, Seto TB, Siegrist R, Cosgrove R, Berezin R, Cohen DJ. Comparison of
analytic approaches for the economic evaluation of new technologies alongside
multicenter clinical trials. American Heart Journal. 2003/3 2003;145(3):452-458.

White H. A Heteroskedasticity-Consistent Covariance-Matrix Estimator and a
Direct Test for Heteroskedasticity. Econometrica. 1980;48(4):817-838.

Liang KY, Zeger SL. Longitudinal Data-Analysis Using Generalized Linear-
Models. Biometrika. Apr 1986;73(1):13-22.

Diggle PJ, Liang KY, Zeger SL. The Analysis of Longitudinal Data. Oxford, UK:
Oxford University Press; 1994.

Weintraub WS, Mauldin PD, Becker E, Kosinski AS, King SB, 111. A Comparison
of the Costs of and Quality of Life After Coronary Angioplasty or Coronary
Surgery for Multivessel Coronary Artery Disease : Results From the Emory

198

82.

83.

84.

85.

86.

87.

88.

89.

90.

91.

92.

Angioplasty Versus Surgery Trial (EAST). Circulation. November 15, 1995
1995;92(10):2831-2840.

Faxon D. Myocardial Revascularization in 1997: Angioplasty Versus Bypass
Surgery. American Family Physician. 1997;56(5): 1409-1420.

Hlatky MA, Rogers WJ, Johnstone I, et al. Medical Care Costs and Quality of
Life after Randomization to Coronary Angioplasty or Coronary Bypass Surgery.
N Engl J Med. January 9, 1997 1997;336(2):92-99.

Sculpher MJ, Buxton MJ, Seed P, et al. Health service costs of coronary
angioplasty and coronary artery bypass surgery: the Randomised Intervention
Treatment of Angina (RITA) trial. The Lancet. 1994/10/1 1994;344(8927):927-
930.

Cowper PA, DeLong ER, Peterson ED, et al. Geographic variation in resource use
for coronary artery bypass surgery. Medical Care. 1997;35(4):320-333.

Kozak L, Hall M, Owings M. National Hospital Discharge Survey: 2000
Annual Summary with detailed diagnosis and procedure data. National Center for
Health Statistics. Vital Health Statistics. 2002;13(153).

Pokras R, Kozak L, McCarthy E, Graves E. Trends in Hospital Utilization: United
States 1965-1986. National Center for Health Statistics, Vital Health Statistics.
1989;13(101).

Gillum BS GE, Kozak LJ. Trends in Hospital Utilization: United States, 1988—92.
National Center for Health Statistics, Vital Health Statistics. 1996;13(124).

Blumenthal D. Controlling Health Care Expenditures. N Engl J Med. March 8,
2001 2001;344(10):766-769.

Morris CR. Too much of a good thing? Why health care spending won't make us
sick. New York: Century Foundation Press; 2000.

Anderson GF, Hussey PS. Population aging: a comparison among industrialized
countries. Health Aﬁ(Millwood). May-Jun 2000;19(3):191-203. '

Organisation for Economic Co-operation and Development. OECD Health Data
2004. 2004.

199

 

93.

94.

95.

96.

97.

98.

99.

100.

101.

102.

103.

104.

Anderson GF, Hurst J, Hussey PS, Jee-Hughes M. Health spending and outcomes:
Trends in OECD countries, 1960-1998. Health Affairs. May-Jun 2000;19(3):150-
157.

World Health Organization. Health systems: improving performance. Geneva
2000.

Gardiner J, Hogan A, Holmes-Rovner M, Rovner D, Grifﬁth L, Kupersmith J.
Confidence intervals for cost-effectiveness ratios. Medical Decision Making.
1995;15:254-263. '

Gardiner J, Holmes-Rovner M, Rovner D, Goddeeris J, Kupersmith J. Cost-
effectiveness ratios in multi-state Markov models. Paper presented at: Biometrics
Society, 1996; Richmond, VA.

Gardiner J C. The asymptotic distribution of mortality rates in competing risks
analyses. Scandinavian Jouranl of Statistics. 1982;9:31-36.

Gardiner JC. Properties of occurrence-exposure rates in multiple decrement
theory. Journal of Statistical Planning & Inference. 1983;81301-314.

Gardiner J C, Huebner M, J etton J, Bradley CJ. Power assessments and conﬁdence

intervals for cost-effectiveness analyses. Medical Decision Making.
1998;18(4):465.

Gardiner J C, Huebner M, J etton J, Bradley CJ. Power and sample size
assessments for tests of hypotheses on cost-effectiveness ratios. Health
Economics. 2000.

Lin DY. Proportional means regression for censored medical costs. Biometrics.
2000;56:775-778.

Willan AR, Lin DY. Incremental net benefit in randomized clinical trials.
Statistics in Medicine. Jun 15 2001;20(1 1): 1563-1574.

Willan AR. Analysis, sample size, and fewer for estimating incremental net health
beneﬁt from clinical trial data. Controlled Clinical Trials. Jun 2001;22(3):228-
237.

Perez-Ocon R, Ruiz-Castro JE, Gamiz-Rerez ML. A multivariate model to
measure the effect of treatments in survival to breast cancer. Biometrical Journal.
1998;40:703-715.

200

 

105.

106.

107.

108.

109.

110.

111.

112.

113.

114.

115.

116.

117.

Glasziou PP, Simes RJ, Gelber RD. Quality adjusted survival analysis. Statistics
in Medicine. 19909: 1259-1276.

Antiarrhythmics Versus Implantable Deﬁbrillators (AVID)--rationale, design, and
methods. Am J Cardiol. Mar 1 1995;75(7):470-475.

Connolly S, Gent M, Roberts R. Canadian Implantable Defibrillator Study
(CIDS). A randomized trial of the implantable cardioverter deﬁbrillator against
amiodarone. Circulation. 2000;101:1297-1302.

Gardiner CJ, Sirbu MC, Rahbar M. Update on statistical power and sample size
assessments for cost-effectiveness studies (Review). Expert Rev.
Pharmacoeconomics Outcomes Res. 2004;4(1):89-98.

Briggs AH, Gray AM. Power and sample size calculations for stochastic cost-
effectiveness analysis. Medical Decision Making. 1998;18(supplemental):881-
S92.

Sacristan J A, Day SJ, Navarro 0, Ramos J, Hernandez JM. Use of conﬁdence
intervals and sample size calculations in health economic studies. The Annals of
Phannacotherapy. 1995;29:719-725.

Laska EM, Meisner M, Siegel C. Power and sample size in cost-effectiveness
analysis. Med Decis Making. 1999;19:339-343.

Willan AR, Obrien BJ. Conﬁdence intervals for cost-effectiveness ratios: An
application of Fieller's theorem. Health Economics. J ul-aug 1996;5(4):297-305.

Willan AR, O'Brien BJ. Sample size and power issues in estimating incremental
cost-effectiveness ratios from clinical trials data. Health Economics. 1999;8z203-
211.

Thompson S. Statistical issues in cost-effectiveness analyses. Stat Methods Med
Res. Dec 2002;11(6):453-454.

Harrison JM. Brownian motion and stochastic ﬂow systems. New York: John
Wiley; 1985.

Oskendal B. Stochastic differential equations. New York: Springer-Verlag; 1995.

Protter P. Stochastic integration and differential equations. New-York: Springer-
Verlag; 1990.

201

     

G

illillllllliililglliliill

 

llllljilllll