
This is to certify that the
dissertation entitled

Regression Models for
Analysis of Medical Costs

presented by
Elena Polverejan

has been accepted towards fulfillment
of the requirements for

Ph.D. degree in Statistics

Joseph Gardiner
Major professor

Date: July 31, 2001

MSU is an Affirmative Action/Equal Opportunity Institution

 

REGRESSION MODELS FOR ANALYSIS
OF MEDICAL COSTS

By

Elena Polverejan

A DISSERTATION

Submitted to
Michigan State University
in partial fulﬁllment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY
Department of Statistics and Probability

2001

 


ABSTRACT
REGRESSION MODELS FOR ANALYSIS OF MEDICAL COSTS
By

Elena Polverejan

Rising cost of health care and the need for evaluating costs of new medical
interventions have led to interest in developing methods for medical cost analysis.
Hospital costs constitute a signiﬁcant proportion of overall expenditures in health care.
Knowing the correlates of in-hospital length of stay (LOS) and cost is important for
decisions on allocating resources.

Increasing availability of patient speciﬁc LOS and cost permits analysis of these
variables jointly, accounting for their likely correlation. A bivariate model is used to
assess the impact of covariates on these outcomes. Under marginal speciﬁcation through
parametric or Cox regression models for LOS and cost, standard errors of estimates of
regression coefﬁcients are obtained using a robust covariance matrix to account for
correlation between LOS and cost that is otherwise left unspeciﬁed.

In another model, we use a conditional approach to estimate mean costs as a
function of patient hospital stay, adjusting for the influence of patient characteristics on
LOS and cost. The mean cost over a speciﬁed duration is a weighted average of the
expected cumulative cost, with weighting determined by the distribution of LOS.

We extend this model to address costs and resource utilization in longitudinal
studies when patient histories evolve through several health states. In these studies costs

are incurred in random amounts at random times as patients transit through different

 


health states. We describe the evolution of a patient’s health history by a continuous time
Markov process with ﬁnite state space. Dependence of the transition intensities on
patient characteristics is modeled through semiparametric regression models. Two types
of expenditures are incurred, one at transitions between health states and the other for
sojourns in a health state. Over a ﬁxed follow up period, we consider net present values
of all costs incurred in this period for the two types of expenditures. Conditional on the
initial state and a speciﬁed covariate vector, we obtain consistent estimates of the
expected net present values and derive their asymptotic distributions.

Our methods provide ﬂexible approaches to estimating medical costs while
controlling for the effects of covariates. In addition, for economic evaluation studies of
competing medical interventions, our methods can be applied to estimate summary

statistics and cost-effectiveness ratios.

The beauty of Science is that one can
say something (useful) without having

to say everything.

To my husband and parents, for all of their love and support!

 

 

 

 


ACKNOWLEDGMENTS

I would like to thank my advisor, Dr. Joseph C. Gardiner, for all of his patient
guidance and support during my graduate studies. The freedom he has given me in my
research to investigate different areas is greatly appreciated, and I am also thankful for his
advice on issues unrelated to statistics. I also want to express my thanks to the other
members of my committee for their friendly support. My research was supported by the
Agency for Healthcare Research & Quality, under Grant 1R01HS09514.

To my family, I extend my love and thanks for being so supportive. My parents,
Adrian and Valentina Feleaga, and my sister Anamaria have continually offered me
encouragement.

My deepest gratitude goes to my husband Mihai, for all the love and

understanding he has given me over the years.

 

 

 

 


TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
ABBREVIATIONS

Introduction

Chapter 1
A BIVARIATE MODEL FOR HOSPITAL LENGTH OF STAY AND COST
1.1 Semiparametric Marginal Models
    1.1.1 Estimation of the Regression Parameters and Integrated Baseline Hazards
    1.1.2 Large Sample Properties of the Estimators of Regression Parameters and Integrated Baseline Hazards
    1.1.3 Large Sample Properties of the Estimators of Survival Functions
    1.1.4 Point Estimates and Confidence Intervals for Median LOS and Median Cost
1.2 Parametric Marginal Models
    1.2.1 Estimation of the Model Parameters
    1.2.2 Large Sample Properties of the Parameter Estimators
    1.2.3 Point Estimates and Confidence Intervals for Median LOS and Median Cost; Application for the Bivariate Normal Case
1.3 Application

Chapter 2
ESTIMATING HOSPITAL COST OVER A SPECIFIED DURATION
2.1 Model Description
2.2 Application

Chapter 3
ESTIMATING MEDICAL COSTS IN LONGITUDINAL STUDIES
3.1 Model Description
    3.1.1 A Markov Model for Describing Patient Health Histories
    3.1.2 Incorporating Costs in the Markov Model
3.2 Estimation of the Mean Transition Cost and Mean Sojourn Cost
    3.2.1 Estimation of the Regression Parameters and Integrated Baseline Intensities
    3.2.2 Estimation of the Transition Probabilities
    3.2.3 Estimation of the Mean Transition Cost
    3.2.4 Estimation of the Mean Sojourn Rate
3.3 Large Sample Properties of the Mean Cost Estimators
    3.3.1 Uniform Consistency of the Mean Cost Estimators
    3.3.2 Asymptotic Distribution of the Mean Transition Cost
    3.3.3 Asymptotic Distribution of the Mean Sojourn Cost

APPENDIX A
EXTENSION OF SLLN ON $D_E([0,1]^2)$

APPENDIX B
FUNCTIONAL DELTA METHOD

APPENDIX C
RESULTS ON ITO INTEGRATION

REFERENCES

 

 

 

LIST OF TABLES

Table 1.1  Characteristics of Patients

Table 1.2  Length of Stay and Costs by Comorbidity and Discharge Status
           (Semiparametric Model)

Table 1.3  Length of Stay and Costs by Comorbidity and Discharge Status
           (Parametric Model)

Table 2.1  Estimates of mean cost at duration times by comorbidity and
           discharge status

 

 

 

 

LIST OF FIGURES

Figure 1.1  Distribution of Costs and LOS

Figure 1.2  Estimated LOS survival function and approximate 95% adjusted
            (---) and naive (- - -) pointwise confidence intervals. Estimates
            were made for a patient discharged alive, who underwent CATH,
            with a CCI of 4+, ejection fraction 50+, age 65 at admission and no
            history of prior CABG

Figure 1.3  Estimated cost survival function and approximate 95% adjusted
            (---) and naive (- - -) pointwise confidence intervals. Estimates
            were made for a patient discharged alive, who underwent CATH,
            with a CCI of 4+, ejection fraction 50+, age 65 at admission and no
            history of prior CABG

Figure 2.1  Estimated mean cost at duration times by comorbidity (for
            survivors with age 65 at admission, ejection fraction 50+, no
            history of prior CABG, who underwent catheterization during their
            hospital stay)

 

 

 

 

ABBREVIATIONS

CER     - Cost-Effectiveness Ratio
CABG    - Coronary Artery Bypass Grafting
CATH    - Catheterization
LOS     - Length of Stay
DRG     - Diagnosis Related Group
AMI     - Acute Myocardial Infarction
ICD     - Implantable Cardioverter Defibrillator
i.i.d.  - Independent Identically Distributed
        - End of Statement
        - End of Proof
CLT     - Central Limit Theorem
SLLN    - Strong Law of Large Numbers

INTRODUCTION

Over the past decade the need to control health care expenditures in an
environment of limited budgets has led health care providers and government planners to
turn to cost analyses and cost-effectiveness analyses as an aid to decision making in
allocation of health care dollars. While the primary goals of clinical studies are centered
on patient outcomes, more attention is being paid to collecting economic data alongside
traditional clinical investigations of efﬁcacy of interventions such as randomized clinical
trials and prospective cohort studies. Discrete patient-level cost and resource use data
will become increasingly available. Therefore there is interest in developing rigorous
statistical techniques to analyze both cost and health outcomes.

In many situations two competing interventions need to be compared on their
health beneﬁts and costs. When an intervention is more effective and more costly than its
comparator, the cost-effectiveness ratio (CER) is deﬁned as the ratio of the incremental
cost relative to the incremental beneﬁt. With beneﬁts measured in their natural units
such as years of life saved or number of lives saved and costs measured in dollars, the
CER is stated in dollars per unit of effectiveness. When health beneﬁt is measured by
gain in life expectancy, the cost-effectiveness ratio is the additional cost of the new
intervention to deliver one unit of beneﬁt and is expressed in dollars per life year saved.
Assessment and estimation of the CER are an important part of conducting economic
evaluations of health care programs. A goal of our research is to address the

speciﬁcation, estimation and evaluation of statistical methods for costs and to

 

 


demonstrate their application to cost-effectiveness analysis. In this thesis we focus on
developing rigorous models for analysis of cost data. Integrating these cost models with
appropriate models for assessing health outcomes is the subject of future research.
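
As a small numerical illustration of the cost-effectiveness ratio defined above, the following Python sketch computes the incremental cost per life year saved. The function name and the cost and life-expectancy figures are hypothetical and are chosen only for illustration; they are not taken from any study discussed here.

```python
def cost_effectiveness_ratio(cost_new, cost_old, benefit_new, benefit_old):
    """Incremental cost divided by incremental benefit (hypothetical helper)."""
    delta_cost = cost_new - cost_old
    delta_benefit = benefit_new - benefit_old
    if delta_benefit <= 0:
        # The ratio is used here only when the new intervention is more effective.
        raise ValueError("new intervention must have the larger benefit")
    return delta_cost / delta_benefit


# Illustrative numbers only: costs in dollars, benefits in life years.
cer = cost_effectiveness_ratio(cost_new=60000.0, cost_old=40000.0,
                               benefit_new=6.5, benefit_old=5.5)
print(cer)  # dollars per life year saved
```

In practice both the incremental cost and the incremental benefit are estimates, so a point value of this kind would need to be accompanied by a measure of its uncertainty.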
Usually costs accrue due to resource use over time. Both the amount of the
resource and the time at which it is used vary across individuals. To fully integrate the
role of time into analyses of costs we would need the cumulative cost histories as they
manifest in each patient. For a treatment or intervention under study, let C(t) denote the
accumulating cost over time t in an individual patient. Expenditures terminate at a
random event time T that signals the occurrence of some health outcome. For example,
in the treatment of cancer patients following diagnosis, the study endpoint T is the time of
death and the cost at the endpoint C(T) is the lifetime cost from diagnosis. In a
pharmacological intervention in patients with serum cholesterol elevated above 240
mg/dL, the endpoint T might be the ﬁrst time the cholesterol level falls below 200 mg/dL,
so C(T) is the total treatment cost. For hospital cost studies the endpoint is the length of
stay (denoted LOS), and C(T) is the cumulative cost from admission through discharge.
Estimating the distribution of the total cost C(T) or assessing its correlates are
some of the objectives of cost studies. Correlates (also called covariates) might be
demographic factors such as age, gender, race, education or clinical factors such as
severity of the disease. Once the significant correlates for total cost are determined, the
next goal might be the estimation of a summary statistic such as the expected total cost
E(C(T) | Z) or the median cost m(Z) for specified covariate profiles. For example, in a
hospital cost study one might want to estimate the expected total hospital cost for a male

patient, age 65 at admission, hospitalized for acute myocardial infarction (AMI),

 

 

 

 


undergoing coronary artery bypass surgery (CABG). If there is interest in modeling
patient cost histories, one objective might be the estimation of the expected cost

E(C(t) | Z) over a given ﬁxed duration t, for speciﬁed covariate proﬁles. In the hospital

cost example, the objective of interest could be the estimation of the expected hospital
cost after 9 days of hospitalization for a male patient, age 65 at admission, hospitalized
for AMI, undergoing CABG.

Statistical analyses of cost data must address several technical problems [1]. These
include right-skewed cost data, a significant proportion of zero observations,
right-censored cost data, correlation between time and cost outcomes and dependent
observations when costs are ascertained at multiple time points (for instance in each of
several periods during the course of an intervention). Every statistical model for costs

has to cope with at least one of these issues.

Skewness and Transformation

The distribution of costs might exhibit a considerable degree of skewness. Then
simple methods of analysis based on an assumed normal distribution of the cost variable
C = C(T) will not be tenable. A transformation g such as logarithm or square-root can be
used to mitigate the effects of this skewness. Standard regression analyses may then be
applied to the transformed dependent variable g(C), which permit assessment of the
effects of patient and intervention characteristics Z that influence the cost distribution.
Using an observed sample $\{(C_i, Z_i) : 1 \le i \le n\}$, least-squares estimation of $\beta$ in the
model $g(C_i) = Z_i'\beta + \varepsilon_i$ needs only a simple moment structure for the errors $\varepsilon_i$ [2, 3]. A

 

 

 

 

 

 

 

 

retransformation then reproduces the results of these analyses back in their original units
of measurement, permitting easy interpretation. For example, we could use the model
$\log(C_i) = Z_i'\beta + \varepsilon_i$, where the errors $\varepsilon_i$ have zero mean and variance $\sigma^2$. Then the mean
cost at a specified covariate profile $Z_0$ is $E(C \mid Z_0) = \exp(\beta' Z_0)\,E(\exp \varepsilon)$. If
$\varepsilon_i \sim N(0, \sigma^2)$, so that the costs are lognormally distributed, the simple closed form for the
mean cost, $E(C \mid Z_0) = \exp(\beta' Z_0 + 0.5\sigma^2)$, makes the analyses quite straightforward. If
normality is untenable, one can use a smearing estimate for $E(\exp \varepsilon)$, based on the
residuals $\log C_i - \hat{\beta}' Z_i = \hat{\varepsilon}_i$ [4]. In the absence of covariates, classical maximum likelihood
estimation of parameters in log-normal distributions has been used to compare mean
and median costs in two independent samples [5]. When the parametric assumptions are
suspect, the estimates of the standard error of $\hat{\beta}$ could be imprecise, leading to invalid
inference. When transformations can substantially eliminate the skew in the distribution
of C and a parametric distribution can be assumed for $\varepsilon$, the advantage lies in the relative
simplicity of the analysis and greater efficiency of estimates compared to a nonparametric
approach that leaves the distribution of $\varepsilon$ unspecified [4].
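
The retransformation step can be illustrated with a short Python sketch that compares the closed-form lognormal mean with a smearing estimate. The data are simulated, and the function names, coefficient values and sample size are assumptions made only for this illustration; the smearing estimate is taken here to be exp of the fitted linear predictor multiplied by the average of the exponentiated residuals.

```python
import numpy as np


def lognormal_mean(beta, z0, sigma2):
    """Closed-form mean cost exp(beta'z0 + 0.5*sigma^2) under normal errors."""
    return float(np.exp(np.dot(beta, z0) + 0.5 * sigma2))


def smearing_mean(beta_hat, z0, residuals):
    """Smearing estimate: exp(beta_hat'z0) times the average of exp(residuals)."""
    return float(np.exp(np.dot(beta_hat, z0)) * np.mean(np.exp(residuals)))


# Simulated log-costs (illustrative assumption): intercept plus one covariate.
rng = np.random.default_rng(0)
n = 5000
Z = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([8.0, 0.3])
sigma = 0.5
log_cost = Z @ beta_true + rng.normal(scale=sigma, size=n)

# Least-squares fit on the log scale, then residuals and error variance.
beta_hat, *_ = np.linalg.lstsq(Z, log_cost, rcond=None)
residuals = log_cost - Z @ beta_hat
sigma2_hat = residuals @ residuals / (n - Z.shape[1])

# Mean cost at the covariate profile z0 under both retransformations.
z0 = np.array([1.0, 0.0])
mean_normal = lognormal_mean(beta_hat, z0, sigma2_hat)
mean_smear = smearing_mean(beta_hat, z0, residuals)
print(mean_normal, mean_smear)
```

Because the simulated errors are normal, the two estimates should be close; with markedly non-normal errors the closed form may be biased while the smearing estimate does not rely on the normality assumption.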


Two-Part Models

When sampling an eligible population for assessing the costs of medical services
during a given period, a large proportion would not have used any services so would not
have incurred any medical costs. In these circumstances, the cost measure C is positive

only for users and the zero costs cannot be ignored econometrically. The two-part model

assumes that P(C > 0 | Z) is governed by a parametric binary probability model like logit
or probit (part one) and that E(g(C) | C > 0, Z) is a linear function of Z (part two), where
g is a transformation applied to moderate the effects of skewness. The objective is to
obtain an estimate of the overall mean E(C | Z) [6]. The first part governs the probability
of some expenditure and the second part models the level of the expenditure, given that
there is a positive expense.
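
A minimal sketch of how the two parts combine into the overall mean is given below. It assumes a logit model in part one and, in part two, a log transformation with normal errors, so that E(C | C > 0, Z) = exp(gamma'Z + 0.5 sigma^2). The function name, the coefficient vectors `theta` and `gamma`, and their values are hypothetical.

```python
import numpy as np


def two_part_mean(z, theta, gamma, sigma2):
    """Overall mean E(C | Z) = P(C > 0 | Z) * E(C | C > 0, Z).

    Part one: logit model with coefficients theta.
    Part two: log-linear model with coefficients gamma and normal errors
    of variance sigma2 (an assumption made for this illustration).
    """
    p_positive = 1.0 / (1.0 + np.exp(-np.dot(theta, z)))        # P(C > 0 | Z)
    mean_if_positive = np.exp(np.dot(gamma, z) + 0.5 * sigma2)  # E(C | C > 0, Z)
    return float(p_positive * mean_if_positive)


# Hypothetical coefficients: intercept and one covariate.
z = np.array([1.0, 0.0])
theta = np.array([0.0, 1.2])   # gives P(C > 0 | z) = 0.5 at this profile
gamma = np.array([7.0, 0.4])
print(two_part_mean(z, theta, gamma, sigma2=0.25))
```

In an application the coefficients of both parts would be estimated from data, and the retransformation in part two need not rely on normality.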

Two-part models are used not only for medical costs, but also for many other
outcomes, such as measures of health care utilization (e.g. number of physician visits
over a specified period), health care outcomes (claims data) or measures of
substance use/abuse (tobacco, alcohol, illicit drugs).

Right-Censored Cost Observations

Due to incomplete patient follow-up, in many studies the endpoint T and the total
cost C( T) for some patients are right-censored. For example, in clinical trials with
staggered entry of patients, the signaling event of interest might not have occurred by the
close of the trial. If U denotes the follow-up time for a subject, with right-censorship the

observable data are restricted to $X = \min(T, U)$, the smaller of T and U, the indicator of
non-censoring $\delta = [T \le U]$ that denotes whether T (if $\delta = 1$) or U (if $\delta = 0$) was
observed, and the covariate vector Z. If T is not censored we observe the true cost C(T).
If T is censored we observe the cost up to the follow-up time U, but we know that

C(T) > C(U).
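
The observable quantities under right-censoring can be written out in a few lines of Python. This is a sketch only: the function name is hypothetical, and the linear cumulative cost function used in the example is an assumption made so that the observed cost can be computed.

```python
import numpy as np


def censored_observation(T, U, cost_fn):
    """Return (X, delta, observed_cost) under right-censoring.

    X = min(T, U); delta = 1 if T <= U (endpoint observed), else 0.
    cost_fn(t) is a hypothetical cumulative cost function C(t).
    """
    T = np.asarray(T, dtype=float)
    U = np.asarray(U, dtype=float)
    X = np.minimum(T, U)
    delta = (T <= U).astype(int)
    # C(T) when delta == 1; otherwise C(U), which only bounds C(T) from below.
    observed_cost = cost_fn(X)
    return X, delta, observed_cost


# Illustrative assumption: cost accrues linearly at 1000 dollars per day.
X, delta, cost = censored_observation([5, 10, 3], [7, 4, 3], lambda t: 1000.0 * t)
print(X, delta, cost)
```

The second patient in this example is censored at day 4, so only the cost accumulated up to day 4 is seen.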

Because of the analogy with censored survival times, censored medical costs have
been analyzed by Cox regression and other survival analysis techniques. An earlier work
by Dudley et al. [7] explored the idea by comparing different analytic models for the cost of
CABG surgery. In addition to the Cox model, these investigators studied the OLS
method, with and without a log transformation of the cost variable C, a parametric
Weibull regression model, and a binary logistic model using a dichotomization of C. In
their analysis, the Cox model provided the most accurate estimates of the mean and
median cost of CABG surgery, and the proportion of patients with high cost (>$20,000).
This method was successfully reapplied to assess the determinants of costs in CABG
surgery [8].

Recent articles [9-12] have questioned the appropriateness of survival analytic
methods for medical costs, particularly in the treatment of censoring. The total costs C(T)
would not, in general, be independent of C(U) even if T and U are independent. To apply
standard survival analysis methods, we would need the independence of C(T) and C(U)
given the covariate profile Z.

When cumulative cost histories are available over time, greater flexibility in
modeling is possible, which skirts the issue of censored costs. In a discussion of different
models for predicting the cost of illness, Lipscomb et al. [1] applied a proportional hazards
model for the cost intensity $\alpha(c \mid Z)$ on a large data set of Medicare patients hospitalized
for stroke. Costs were analyzed for a 36-month period following hospital discharge. The
unit of analysis was a patient-month, making the potential cost incurred in month j right
censored if the patient died during that month. If this occurred, only costs through the
first (j-1) months were considered, thereby skirting the issue of censored costs. Any
dependence of costs in month j on the stroke patient's cost history is captured in the
regression model by using as covariates the initial cost of hospitalization and costs in
follow-up months (j-1) and (j-2). Also included were patient characteristics such as age,
race, gender and economic status. The investigators found that the Cox proportional
hazards model and the two-part model were superior in their ability to predict accurately
the distribution of costs, based on a logarithmic scoring rule to compare models. For
predictions of mean and median costs these models and log-transformed linear models
performed equally well.

Because C(T) and C(U) cannot in general be independent, Lin et al. [10] proposed
two alternative ways to analyze cost in trials with incomplete patient follow-up. An
assumption is made that patients are not censored because they accrue unusually high or
low costs. Under one approach, if cost histories are available for each patient, an
estimate called the Kaplan-Meier Sampling Average (KMSA) estimate of the average
cost EC(T) is computed. It is essentially an average of costs incurred in each of several

time periods in [0,T] , weighted by the Kaplan-Meier estimate of survival at the start of

each period. If cost histories are not available, the second approach bases cost estimates
on the subset of patients who experience the “event” at issue (in their example death).
The properties of these estimators are dependent on the assumption of discrete censoring
times, which is not true in general. Bang and Tsiatis [13] introduced a class of weighted
estimators for mean medical costs, which account appropriately for censoring. Besides
the consistency and the asymptotic normality, the efﬁciency of their estimators was also

studied. None of these methods incorporate covariates in the cost modeling.
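
The idea behind the KMSA estimate can be sketched in Python as follows. This is a simplified illustration in the spirit of the description above, not a reproduction of the published estimator: it assumes that costs are recorded per interval, that censoring occurs only at interval boundaries, and that the interval average is taken over patients whose costs in that interval are observed from its start. All function names and data values are hypothetical.

```python
import numpy as np


def km_survival_before(times, events, t):
    """Kaplan-Meier estimate of P(T >= t), using event times strictly before t."""
    surv = 1.0
    for s in np.unique(times[events == 1]):
        if s < t:
            at_risk = np.sum(times >= s)
            deaths = np.sum((times == s) & (events == 1))
            surv *= 1.0 - deaths / at_risk
    return surv


def kmsa_mean_cost(times, events, interval_costs, starts):
    """Sum over intervals of (KM survival at the interval start) times
    (average interval cost among patients observed from that start)."""
    total = 0.0
    for k, a in enumerate(starts):
        # Observed in interval k: still followed after a, or dying exactly at a.
        observed = (times > a) | ((times == a) & (events == 1))
        if not observed.any():
            break
        total += km_survival_before(times, events, a) * interval_costs[observed, k].mean()
    return total


# Hypothetical data: follow-up times, event indicators (1 = death, 0 = censored)
# and costs in the intervals [0,1), [1,2), [2,3); the fourth patient is censored at 2.
times = np.array([1.5, 2.5, 0.5, 2.0, 3.0])
events = np.array([1, 1, 1, 0, 0])
interval_costs = np.array([[100.0, 50.0, 0.0],
                           [100.0, 100.0, 50.0],
                           [40.0, 0.0, 0.0],
                           [80.0, 90.0, 0.0],
                           [90.0, 70.0, 60.0]])
starts = [0.0, 1.0, 2.0]
print(kmsa_mean_cost(times, events, interval_costs, starts))
```

When no patient is censored, this weighted sum reduces to the ordinary sample mean of the total costs, which provides a simple check of the construction.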

The KMSA approach to defining average cost is similar to the approach taken by
Gardiner et al. [14, 15] in evaluating the cost-effectiveness of the Implantable Cardioverter
Defibrillator (ICD). Expected total cost over a fixed time interval $[0, t_0]$ was defined
by $\int_0^{t_0} e^{-rt} S(t)\,dC(t)$, where r is the discount rate, S the survival function and C(t) the value
of cumulative resource use up to time t. Apart from the discounting, the integrand
weights by S(t) the incremental expenditure dC(t) in the small interval [t, t + dt]. The
cost C(.) was assumed to be nonstochastic. It was derived from Medicare payments, drug
charges and physician fees for services that were associated with the interventions. In
their application to the cost-effectiveness of the ICD, cost comprised expenditures over
72 periods of 30 days each.
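
When costs are recorded by period, the discounted integral above can be approximated by a sum over periods. The Python sketch below shows one such approximation; the function name, the exponential survival curve, the constant cost per period and the discount rate are assumptions made for illustration, and discounting and survival are evaluated at the start of each period.

```python
import numpy as np


def discounted_expected_cost(period_costs, survival, rate, period_length):
    """Sum approximating the integral of exp(-r t) S(t) dC(t) over [0, t0].

    period_costs[k]: cost increment in period k.
    survival[k]: survival probability at the start of period k.
    rate: discount rate per unit time; period_length: length of a period.
    """
    period_costs = np.asarray(period_costs, dtype=float)
    survival = np.asarray(survival, dtype=float)
    t = np.arange(len(period_costs)) * period_length  # start time of each period
    return float(np.sum(np.exp(-rate * t) * survival * period_costs))


# Illustrative setup: 72 periods of 30 days, time measured in years.
n_periods = 72
period_length = 30.0 / 365.0
t_start = np.arange(n_periods) * period_length
survival = np.exp(-0.1 * t_start)        # hypothetical survival curve
period_costs = np.full(n_periods, 1000.0)  # hypothetical cost per period
print(discounted_expected_cost(period_costs, survival, rate=0.03,
                               period_length=period_length))
```

With a zero discount rate and certain survival the sum is simply the total of the period costs, so discounting and mortality can only reduce the value.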

Very recently methods have been proposed to explicitly account for cost
censoring and also adjust for patient characteristics. Lin [16] modified in several ways the
familiar normal equations for least-squares estimation. The total cost in subjects with
complete follow-up is used in the proposed methodology. More efficient estimators are
provided when cost data are recorded in multiple time intervals. Lin [11] also developed a
methodology that speciﬁes multiplicative rather than additive covariate effects on the
mean medical cost. He proposed a semiparametric proportional means regression model
for the cumulative medical cost. This model speciﬁes that the mean cost function over
time, conditional on a set of covariates, is equal to an arbitrary baseline mean function
multiplied by an exponential regression function. The corresponding inference

procedures are based on possibly censored observations of the lifetime cost.

Objectives and Structure of Thesis

The existing literature on cost analysis is still in its infancy. Several issues of
practical importance have yet to be addressed. We believe that our research will help fill
some methodological gaps on modeling and estimation of medical costs, particularly in
longitudinal studies.

With the growing availability of large databases on patient-level health care
utilization and outcomes, there is need to develop statistical techniques to analyze jointly
both costs and patient outcomes. Current methods generally focus on a single measure of
cost or health outcome and do not fully exploit the longitudinal dynamic mechanisms that
engender cost and health outcome data. How individual characteristics might impact
summary statistics such as mean cost, median cost and survival is key to predicting resource
utilization and informing policy on allocating health care dollars.

A simple, although important, situation in which our proposed methods would
apply is in assessing the correlates of hospital cost and LOS jointly. Several studies have
used LOS as a proxy for resource utilization [17, 18], while others have focused on the
correlates of hospital cost or charge [19-22]. Because of the likely correlation between cost
and LOS, it is unclear if a model that explicitly recognizes this correlation would lead to
different quantitative results. Other issues such as the skewness in the distribution of
cost, whether or not in-hospital deaths should be regarded as censoring events also add to
the complexity of the joint analyses of LOS and hospital cost.

In Chapter 1 we develop a bivariate model for the two outcomes, LOS and cost,

from a common constellation of covariates that might inﬂuence their joint distribution.

Cost per patient, measured in monetary units, is the total resource use from admission to
discharge, and LOS, measured in time units, is the duration of the hospital stay. Our
regression analyses account for three important aspects. First, because the distributions
of cost and LOS are skewed, for each outcome we use either a semiparametric Cox model
or a linear model applied to a transformation of the outcome. Second, the model
accounts for incomplete observations in both outcomes. Finally, although the correlation
between cost and LOS is not a primary concern, we use methods developed for multiple
failure times [23] to adjust for its impact on the standard errors of regression coefficients.
The proposed methods are applied to assess the inﬂuence of comorbidity and
demographic factors on LOS and hospital costs in a cohort of patients who underwent
CABG surgery.

The longitudinal framework that underlies survival analytic techniques provides a
natural setting for a complete speciﬁcation of alternative models for estimating costs. In
Chapters 2 and 3 we develop several models of increasing complexity for health care
costs and outcomes. In these models costs are considered to dynamically evolve over
time.

In Chapter 2 we focus on cost alone including LOS among other patient
characteristics as potential correlates of the accumulating hospital cost. The model
permits estimation of mean cost over a given duration of hospital stay. The mean cost
over a speciﬁed duration is a weighted average of the expected cumulative cost, with
weighting determined by the distribution of LOS. We demonstrate the application of this
technique using the same study as in Chapter 1. The described model can be
incorporated in a more general setup, in longitudinal studies with multiple health states
and transitions between them.

When a health care intervention is deployed, costs are engendered through the use
of resources. These occur in random amounts at random times that might differ among
patients. Incorporating these components into statistical models that accurately reflect
the patient health histories permits consideration of health outcomes and costs jointly. In
the actuarial literature it is common to model the stochastic mechanism governing the
events that trigger the payment of life insurance benefits as a Markov chain with finite
state space, where each sample path is interpreted as the life history of the insured [24-28].
We extend and adapt these models to our context.

In Chapter 3 we propose longitudinal stochastic models that reﬂect the
experience of patients in sustaining and changing states of health. We use a Markov
model [29] to describe the evolution of patient histories over time. Dependence of the
transition intensities on patient characteristics is modeled through the Cox regression
model. We consider two types of costs that might be incurred in the course of follow-up:
costs at transitions between health states (e.g., cost of diagnosis of a condition) and costs
of sojourns in a health state (e.g., cost of the treatment of that particular condition).
Present values are obtained by discounting all expenditures at a ﬁxed rate. Conditional
on the initial state and a speciﬁed covariate proﬁle, we provide estimators of the expected
present value of these two types of costs incurred over a ﬁxed follow-up period. Under

additional assumptions, these estimators can be shown to be consistent and

asymptotically normal, which sets the stage necessary for statistical inference.


Our proposed methods have the capability of incorporating concomitant covariate
information for the time and cost outcomes. Both fully parametric and semiparametric
models were studied, including regression models for the transformed response variables,
Cox regression and Markov models that specify covariate effects in transition intensities.
These models are based on a natural time—cost setting and have easy interpretation. As a
result, the methodologies developed in this thesis show promise for providing a
flexible, unified framework for statistical inference on summary statistics (such as CER)

used in cost-effectiveness analysis.


CHAPTER 1

A BIVARIATE MODEL FOR

HOSPITAL LENGTH OF STAY AND COST

Hospitalizations constitute a signiﬁcant proportion of overall expenditure in
health care. Length of stay (LOS) is often used as a surrogate for hospital cost or charge.
However, the increasing availability of databases with patient speciﬁc LOS and cost
permits analyses of these variables jointly, accounting for their likely correlation. In this
chapter we explore the differences and advantages of using a bivariate model, compared
to separate univariate models for assessing the impact of covariates on LOS and cost.
Under marginal speciﬁcation through semiparametric or fully parametric regression
models for LOS and cost, standard errors of estimates of regression coefﬁcients are
obtained using a robust covariance matrix” 30 to account for correlation between LOS
and cost. These models account for incomplete observations in both outcomes.

In Section 1.1 we use a Cox regression model for each of the LOS and cost
outcomes and in Section 1.2 a parametric model. In Section 1.3 we apply the proposed
methods to LOS and hospital cost in a cohort of patients who underwent coronary artery
bypass surgery (CABG).


1.1 Semiparametric Marginal Models

Consider n individuals in a study. Censoring occurs when assessment of LOS and
cost is made at some fixed calendar time. This is called administrative censoring. At
that time some patients may not have completed their LOS or incurred all their costs.

For the i-th patient we observe the true LOS $T_i$ or the censoring time $T_i'$,
whichever occurs first, and $Z_{1i}(\cdot)$, a vector of p explanatory variables that depends on the
current time t since admission. We restrict the time t to a finite interval $[0,\tau_1]$, $\tau_1<\infty$.
The nonnegative variables $T_i$, $T_i'$ and the process $Z_{1i}(\cdot)$ are defined on the probability
space $(\Omega_1,\mathcal{F}_1,P_1)$. Let $X_{1i}=\min(T_i,T_i')$, $\delta_{1i}=[T_i\le T_i']$, where $[\cdot]$ is the indicator
function of the displayed event.

For the i-th patient we also observe $X_{2i}=\min(C_i,C_i')$, $\delta_{2i}=[C_i\le C_i']$ and $Z_{2i}(\cdot)$,
where $C_i$ is the total hospital cost, $C_i'$ is the censoring cost and $Z_{2i}(\cdot)$ a vector of p
explanatory variables that depends on the current cost c since admission. The costs are
restricted to a finite interval $[0,\tau_2]$, $\tau_2<\infty$. The nonnegative variables $C_i$, $C_i'$ and the
process $Z_{2i}(\cdot)$ are defined on the probability space $(\Omega_2,\mathcal{F}_2,P_2)$.
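
As a minimal sketch of how the observed pairs are formed, the following Python snippet builds $(X,\delta)$ from a true value and a censoring value. The function name `observe` and all numbers are hypothetical, chosen only for illustration; the same function applies to LOS ($T_i$, $T_i'$) and to cost ($C_i$, $C_i'$).

```python
def observe(true_value, censoring_value):
    """Return (X, delta): X = min(true, censoring), delta = 1 if true <= censoring."""
    x = min(true_value, censoring_value)
    delta = 1 if true_value <= censoring_value else 0
    return x, delta

# Hypothetical LOS (days) and administrative censoring times for three patients.
los = [4.0, 9.0, 6.5]
los_censoring = [7.0, 5.0, 6.5]
observed_los = [observe(t, c) for t, c in zip(los, los_censoring)]
print(observed_los)
```

The second patient is censored: only the lower bound 5.0 on the LOS is observed, and the indicator is 0.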

The following hazard functions $\alpha_{1i}$, $\alpha_{2i}$ relate the covariates to the distributions
of $T_i$ and $C_i$.

For LOS: $\alpha_{1i}(t,\beta_{10})=\alpha_{10}(t)\exp(\beta_{10}'Z_{1i}(t))$. (1.1)
For cost: $\alpha_{2i}(c,\beta_{20})=\alpha_{20}(c)\exp(\beta_{20}'Z_{2i}(c))$. (1.2)


We assume the vectors of true values of the regression parameters $\beta_{10}$, $\beta_{20}$ to have
dimension p. The underlying intensities $\alpha_{10}(\cdot)$, $\alpha_{20}(\cdot)$ are the baseline intensities
corresponding to zero covariates and they are left completely unspecified. The subscripts
will allow us to distinguish between covariate effects on LOS and cost. Given a fixed
covariate profile $Z_0$ and $T_i\ge t$, the hazard $\alpha_{1i}(t,\beta_{10}\mid Z_0)$ is the instantaneous probability
that LOS would end just after time t. Similarly, given $Z_0$ and $C_i\ge c$, the hazard
$\alpha_{2i}(c,\beta_{20}\mid Z_0)$ is interpreted as the instantaneous probability that total cost would be
realized just above the level c.

With independent identically distributed (i.i.d.) data $(X_{1i},\delta_{1i},Z_{1i}(\cdot))$ on n patients
we obtain on the basis of (1.1) the estimate $\hat\beta_1$ of $\beta_{10}$ by maximum partial likelihood
estimation and the Nelson-Aalen estimate $\hat A_{10}(t,\hat\beta_1)$ of the integrated baseline hazard
$A_{10}(t)=\int_0^t\alpha_{10}(u)\,du$. Analogously, with i.i.d. data $(X_{2i},\delta_{2i},Z_{2i}(\cdot))$ we use (1.2) to
obtain the corresponding estimates $\hat\beta_2$ and $\hat A_{20}(c,\hat\beta_2)$. Following Wei et al. (1989)23,
$(\hat\beta_1',\hat\beta_2')'$ has an asymptotic 2p-variate normal distribution whose covariance matrix can
be consistently estimated.

The survival distributions of LOS and cost at a fixed (i.e. time-independent)
covariate profile $Z_0$ are respectively $S_1(t\mid Z_0)=P(T_i>t\mid Z_0)=\exp(-A_{10}(t)e^{\beta_{10}'Z_0})$ and
$S_2(c\mid Z_0)=P(C_i>c\mid Z_0)=\exp(-A_{20}(c)e^{\beta_{20}'Z_0})$. Their estimates, denoted by $\hat S_1(t\mid Z_0)$
and $\hat S_2(c\mid Z_0)$, are obtained by replacing the unknown quantities by the aforementioned
estimates. We will show that for fixed time t and cost c, given a fixed covariate profile
$Z_0$, $\{n^{1/2}(\hat S_1(t\mid Z_0)-S_1(t\mid Z_0)),\ n^{1/2}(\hat S_2(c\mid Z_0)-S_2(c\mid Z_0))\}$ is asymptotically bivariate
normal, with zero-mean vector and a covariance matrix CS that can be consistently
estimated from the data. We call CS the adjusted covariance matrix. Approximate
pointwise 95% confidence intervals for $S_1(t\mid Z_0)$ and $S_2(c\mid Z_0)$ are calculated and point
estimates and approximate 95% confidence intervals for the median LOS and median
cost are obtained from the estimated survival curves by the procedure described in p511-512,
Andersen et al. (1993).29

1.1.1 Estimation of the Regression Parameters and Integrated Baseline Hazards

Define for each patient the processes $N_{1i}(t)=[X_{1i}\le t,\ \delta_{1i}=1]$ and
$Y_{1i}(t)=[X_{1i}\ge t]$. Aggregated over all n patients, the processes $N_1(t)=\sum_{i=1}^n N_{1i}(t)$ and
$Y_1(t)=\sum_{i=1}^n Y_{1i}(t)$ denote respectively the number of patients with completed LOS by
time t and the number who have not completed their hospital stay at time just prior to
time t from admission. Similarly define $N_{2i}(c)$, $Y_{2i}(c)$. Here the aggregated process
$N_2(c)$ denotes the number of patients whose total hospital costs, completely observed, do
not exceed c, and $Y_2(c)$ the number of patients whose current hospital cost is at least c. In
the sequel for notational convenience we will use a single generic argument u for all
processes, remembering that the subscript 1 is associated with time and the subscript 2
with cost.
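
The aggregated processes can be evaluated directly from the observed pairs. The sketch below is illustrative only: the function names `n_process` and `y_process` and the data are hypothetical, and the same code serves for time (k = 1) or cost (k = 2).

```python
def n_process(x, delta, u):
    """N(u): number of subjects with a completely observed outcome not exceeding u."""
    return sum(1 for xi, di in zip(x, delta) if xi <= u and di == 1)

def y_process(x, u):
    """Y(u): number of subjects whose observed value is at least u (still at risk)."""
    return sum(1 for xi in x if xi >= u)

# Hypothetical observed LOS values and completion indicators.
x = [4.0, 5.0, 6.5]
delta = [1, 0, 1]
print(n_process(x, delta, 5.0), y_process(x, 5.0))
```

At u = 5.0 one stay has been completed and observed, while two patients (including the censored one) are still counted at risk.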


We need some standard notation. Let

$$S_k^{(0)}(u,\beta_k)=\sum_{i=1}^n Y_{ki}(u)\exp(\beta_k'Z_{ki}(u)),$$
$$S_k^{(1)}(u,\beta_k)=\sum_{i=1}^n Y_{ki}(u)Z_{ki}(u)\exp(\beta_k'Z_{ki}(u)),$$
$$S_k^{(2)}(u,\beta_k)=\sum_{i=1}^n Y_{ki}(u)Z_{ki}(u)Z_{ki}(u)'\exp(\beta_k'Z_{ki}(u)),\quad k=1,2.$$

Note that $S_k^{(0)}$ is a scalar, $S_k^{(1)}$ a p-dimensional vector and $S_k^{(2)}$ a p×p matrix.

Define

$$E_k(u,\beta_k)=S_k^{(1)}(u,\beta_k)/S_k^{(0)}(u,\beta_k),$$
$$V_k(u,\beta_k)=\{S_k^{(2)}(u,\beta_k)/S_k^{(0)}(u,\beta_k)\}-E_k(u,\beta_k)E_k(u,\beta_k)',$$
$$I_k(\beta_k)=\int_0^{\tau_k}V_k(u,\beta_k)\,dN_k(u),$$
$$s_k^{(0)}(u,\beta_k)=E\big(Y_{k1}(u)\exp(\beta_k'Z_{k1}(u))\big),$$
$$s_k^{(1)}(u,\beta_k)=E\big(Y_{k1}(u)Z_{k1}(u)\exp(\beta_k'Z_{k1}(u))\big),$$
$$s_k^{(2)}(u,\beta_k)=E\big(Y_{k1}(u)Z_{k1}(u)Z_{k1}(u)'\exp(\beta_k'Z_{k1}(u))\big),$$

where $E(\cdot)$ denotes the expectation of the displayed quantity with respect to the
corresponding probability.

Under some regularity conditions31, $n^{-1}S_k^{(m)}(u,\beta_k)$ converges in probability to
$s_k^{(m)}(u,\beta_k)$, uniformly in some neighborhood of $\beta_{k0}$ and in $u\in[0,\tau_k]$. We will provide
some details later. Define

$$e_k(u,\beta_k)=s_k^{(1)}(u,\beta_k)/s_k^{(0)}(u,\beta_k),$$
$$v_k(u,\beta_k)=\{s_k^{(2)}(u,\beta_k)/s_k^{(0)}(u,\beta_k)\}-e_k(u,\beta_k)e_k(u,\beta_k)',$$
$$\Sigma_k(\beta_k)=\int_0^{\tau_k}v_k(u,\beta_k)\,s_k^{(0)}(u,\beta_k)\,\alpha_{k0}(u)\,du.$$

We formulate our models and prove many of the results in the framework of
multivariate counting processes. A survey of this theory can be found in Andersen et al.
(1993).29 Our notation follows that of this reference.

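The following list of conditions will be assumed to hold throughout this section:

Model Assumptions:

A.1 Conditional on $Z_{1i}(\cdot)$, $T_i$ and $T_i'$ are independent and conditional on $Z_{2i}(\cdot)$, $C_i$
and $C_i'$ are independent;

A.2 $\left\{\begin{pmatrix}X_{1i}\\X_{2i}\end{pmatrix},\begin{pmatrix}\delta_{1i}\\\delta_{2i}\end{pmatrix},\begin{pmatrix}Z_{1i}(\cdot)\\Z_{2i}(\cdot)\end{pmatrix}\right\}$, $1\le i\le n$ are i.i.d.;

A.3 $A_{10}(\tau_1)=\int_0^{\tau_1}\alpha_{10}(t)\,dt<\infty$, $A_{20}(\tau_2)=\int_0^{\tau_2}\alpha_{20}(c)\,dc<\infty$;

A.4 $Z_{1i}(\cdot)$, $Z_{2i}(\cdot)$ are bounded;

A.5 $Z_{1i}(\cdot)$, $Z_{2i}(\cdot)$ are adapted, left-continuous with right-hand limits processes;

A.6 $P(Y_{ki}(u)=1,\ \forall u\in[0,\tau_k])=P(Y_{ki}(\tau_k)=1)>0$;

A.7 $\Sigma_k=\Sigma_k(\beta_{k0})$ (by notation) is positive definite.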

Note:

A.1 is also called the independent censoring assumption. A.2 implies that
$(N_{ki}(u),Y_{ki}(u),Z_{ki}(u),\ u\in[0,\tau_k])$, $1\le i\le n$ are i.i.d. A.4 and A.5 assure that $Z_{1i}(\cdot)$,
$Z_{2i}(\cdot)$ are bounded, predictable processes. For k = 1, the interpretation of A.6 is that
there is a positive probability that at any time t from admission a subject might not have
completed his/her hospital stay. For k = 2, the interpretation of A.6 is that there is a
positive probability that the total cost of any subject might be larger than any cost c from
admission. Assumption A.7 is crucial for the (asymptotic) existence of the regression
parameter estimator. □

Consider on the probability space $(\Omega_1,\mathcal{F}_1,P_1)$ the right-continuous nondecreasing
family $(\mathcal{F}_{1i}(t),\ t\in[0,\tau_1])$, where $\mathcal{F}_{1i}(t)$ represents everything that happens up to the time
t for the i-th patient. Formally $\mathcal{F}_{1i}(t)=\mathcal{F}_{1i}^{N}(t)\vee\mathcal{F}_{1i}^{Z}(t)$, where $\mathcal{F}_{1i}^{N}(t)=\sigma\{N_{1i}(s),\ s\le t\}$,
$\mathcal{F}_{1i}^{Z}(t)=\sigma\{Z_{1i}(s),\ s\le t\}$. Similarly we define the filtration $(\mathcal{F}_{2i}(c),\ c\in[0,\tau_2])$ on the
probability space $(\Omega_2,\mathcal{F}_2,P_2)$.

Under the independent censoring assumption, $N_{ki}(\cdot)$ is a counting process on
$(\Omega_k,\mathcal{F}_k,P_k)$ with the $\mathcal{F}_{ki}(\cdot)$-intensity process $\lambda_{ki}(v,\beta_k)=\alpha_{ki}(v,\beta_k)Y_{ki}(v)$, where the
hazard function $\alpha_{ki}(\cdot,\beta_k)$ was defined in (1.1), (1.2). The processes $M_{ki}$ defined by

$$M_{ki}(u)=N_{ki}(u)-\int_0^u\lambda_{ki}(v,\beta_k)\,dv,\quad u\in[0,\tau_k]$$ (1.3)

are $\mathcal{F}_{ki}(\cdot)$-local square integrable martingales on the interval $[0,\tau_k]$, with
$\langle M_{ki}\rangle(u)=\int_0^u\lambda_{ki}(v,\beta_k)\,dv$ and $\langle M_{ki},M_{kj}\rangle=0$ for $i\ne j$, i.e. $M_{ki}$ and $M_{kj}$ are
orthogonal for $i\ne j$. The process $\Lambda_{ki}(u,\beta_k)=\int_0^u\lambda_{ki}(v,\beta_k)\,dv$ is called the compensator
of the counting process $N_{ki}(\cdot)$. For details of this result see Andersen et al. (1993)29,
Sections II.4.1 and III.2.2.

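Let $\Omega_k^{(n)}=\Omega_k\otimes\dots\otimes\Omega_k$, $\mathcal{F}_k^{(n)}=\mathcal{F}_k\otimes\dots\otimes\mathcal{F}_k$, $P_k^{(n)}=P_k\otimes\dots\otimes P_k$ and
$\mathcal{F}_k^{(n)}(u)=\mathcal{F}_{k1}(u)\otimes\dots\otimes\mathcal{F}_{kn}(u)$, where the product $\otimes\dots\otimes$ is over n factors. The family
$(\mathcal{F}_k^{(n)}(u),\ u\in[0,\tau_k])$ is a filtration on the n-th sample space $(\Omega_k^{(n)},\mathcal{F}_k^{(n)},P_k^{(n)})$. Then (see
Andersen et al. (1993)29, Section II.4.3) $N_{ki}$ has the same compensator $\Lambda_{ki}$ with respect
to the product sample space $(\Omega_k^{(n)},\mathcal{F}_k^{(n)},P_k^{(n)})$ and the filtration $(\mathcal{F}_k^{(n)}(u),\ u\in[0,\tau_k])$.
When LOS and cost quantities are considered jointly, the stochastic properties are
relative to the filtration $(\mathcal{F}_1^{(n)}(t)\otimes\mathcal{F}_2^{(n)}(c),\ (t,c)\in[0,\tau_1]\times[0,\tau_2])$ on the product space
$(\Omega_1^{(n)}\otimes\Omega_2^{(n)},\ \mathcal{F}_1^{(n)}\otimes\mathcal{F}_2^{(n)},\ P_1^{(n)}\otimes P_2^{(n)})$.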

An estimator $\hat\beta_k$ of $\beta_{k0}$ is obtained by maximizing the Cox partial likelihood (see
Andersen et al. (1993)29, p483-484). The log-partial likelihood evaluated at time/cost u
has the form

$$C_k(u,\beta_k)=\sum_{i=1}^n\int_0^u\beta_k'Z_{ki}(v)\,dN_{ki}(v)-\int_0^u\log S_k^{(0)}(v,\beta_k)\,dN_k(v).$$

The vector $U_k(u,\beta_k)$ of derivatives of $C_k(u,\beta_k)$ with respect to $\beta_k$ is

$$U_k(u,\beta_k)=\sum_{i=1}^n\int_0^uZ_{ki}(v)\,dN_{ki}(v)-\int_0^u\frac{S_k^{(1)}(v,\beta_k)}{S_k^{(0)}(v,\beta_k)}\,dN_k(v).$$ (1.4)

The maximum partial likelihood estimator $\hat\beta_k$ of $\beta_{k0}$ is defined as the solution of the
likelihood equation $U_k(\tau_k,\beta_k)=0$. Then the Nelson-Aalen estimator of the integrated
baseline hazard $A_{k0}(u)=\int_0^u\alpha_{k0}(v)\,dv$ is given by

$$\hat A_{k0}(u,\hat\beta_k)=\int_0^u\frac{J_k(v)}{S_k^{(0)}(v,\hat\beta_k)}\,dN_k(v),$$

where $J_k(v)=[Y_k(v)>0]$ (see Andersen et al. (1993)29, Sections IV.1 and VII.2.1).
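
To make the estimating equation concrete, the following Python sketch solves $U_k(\tau_k,\beta_k)=0$ by Newton-Raphson and evaluates the baseline estimator. It is a simplified illustration under stated assumptions, not the implementation used in this thesis: it assumes a single time-fixed covariate (p = 1), and the function names and data are hypothetical.

```python
import math

def score_and_information(beta, x, delta, z):
    """Score U(tau, beta) of (1.4) and information I(beta) for a scalar covariate."""
    score, info = 0.0, 0.0
    for xi, di, zi in zip(x, delta, z):
        if di == 0:
            continue  # censored subjects contribute no jump of N_i
        # Risk set at the observed event value xi: subjects with X_j >= xi.
        weights = [(zj, math.exp(beta * zj)) for xj, zj in zip(x, z) if xj >= xi]
        s0 = sum(w for _, w in weights)
        s1 = sum(zj * w for zj, w in weights)
        s2 = sum(zj * zj * w for zj, w in weights)
        e = s1 / s0
        score += zi - e
        info += s2 / s0 - e * e
    return score, info

def fit_cox(x, delta, z, tol=1e-10, max_iter=50):
    """Newton-Raphson solution of U(tau, beta) = 0, started at beta = 0."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = score_and_information(beta, x, delta, z)
        if info <= 0.0:
            break  # flat likelihood, e.g. no variation in the covariate
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta

def baseline_cumulative_hazard(beta, x, delta, z, u):
    """A0-hat(u, beta): sum of 1 / S0(v, beta) over observed event values v <= u."""
    total = 0.0
    for xi, di in zip(x, delta):
        if di == 1 and xi <= u:
            total += 1.0 / sum(math.exp(beta * zj) for xj, zj in zip(x, z) if xj >= xi)
    return total

# Hypothetical data: observed LOS, completion indicator, one binary covariate.
x = [2.0, 3.0, 5.0, 7.0, 8.0, 11.0]
delta = [1, 1, 0, 1, 1, 1]
z = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
beta_hat = fit_cox(x, delta, z)
print(beta_hat, baseline_cumulative_hazard(beta_hat, x, delta, z, 7.0))
```

With beta fixed at 0 the baseline estimator reduces to the usual Nelson-Aalen sum of 1/Y over event values, which gives a quick sanity check of the code.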

1.1.2 Large Sample Properties of the Estimators of Regression Parameters and Integrated Baseline Hazards

The following conditions are necessary for the asymptotic properties of our
estimators. They were first introduced by Andersen and Gill (1982).31 We use their
formulation from Andersen et al. (1993).29 Throughout this chapter the norm of a vector
$a=(a_i)$ or a matrix $A=(a_{ij})$ is $\|a\|=\sup_i|a_i|$ and $\|A\|=\sup_{i,j}|a_{ij}|$, respectively.

Conditions C.a-C.f:

There exist a compact neighborhood $\mathcal{B}_k$ of $\beta_{k0}$, with $\beta_{k0}\in\mathring{\mathcal{B}}_k$ (the interior of
$\mathcal{B}_k$), and scalar, vector and matrix functions $s_k^{(0)}$, $s_k^{(1)}$, $s_k^{(2)}$ defined on $[0,\tau_k]\times\mathcal{B}_k$ such
that for $m\in\{0,1,2\}$:

C.a $\sup_{(u,\beta_k)\in[0,\tau_k]\times\mathcal{B}_k}\left\|\frac{1}{n}S_k^{(m)}(u,\beta_k)-s_k^{(m)}(u,\beta_k)\right\|\xrightarrow{P}0$;

C.b $s_k^{(m)}(\cdot,\cdot)$ is a uniformly continuous bounded function of $(u,\beta_k)\in[0,\tau_k]\times\mathcal{B}_k$;

C.c $s_k^{(0)}(\cdot,\cdot)$ is bounded away from zero;

C.d $s_k^{(1)}(u,\beta_k)=\frac{\partial}{\partial\beta_k}s_k^{(0)}(u,\beta_k)$, $s_k^{(2)}(u,\beta_k)=\frac{\partial^2}{\partial\beta_k^2}s_k^{(0)}(u,\beta_k)$;

C.e $\Sigma_k$ is positive definite;

C.f $\int_0^{\tau_k}\alpha_{k0}(u)\,du<\infty$.

Under our model assumptions A.1-A.7 the conditions C.a-C.f are verified for the
functions $s_k^{(0)}$, $s_k^{(1)}$, $s_k^{(2)}$ defined in the previous sub-section. For a proof and a discussion
about these regularity conditions see Section 4 of Andersen and Gill (1982).31 Under
these general conditions, with a probability tending to one, there exists a unique
consistent solution $\hat\beta_k$ of the likelihood equation:

Theorem 1 (Theorem VII.2.1, p497, Andersen et al. (1993)29)

Under the assumptions A.1, A.2 and conditions C.a-C.f, the probability that the
equation $U_k(\tau_k,\beta_k)=0$ has a unique solution $\hat\beta_k$ tends to 1 and $\hat\beta_k\xrightarrow{P}\beta_{k0}$ as $n\to\infty$. □

The assumption of bounded covariates is very important to prove the asymptotic
normality of $(\hat\beta_1',\hat\beta_2')'$. This assumption implies the Lindeberg-type condition used in
Andersen et al. (1993).29 Wei et al. (1989)23 proved the following theorem. Because
some intermediate steps of the proof of this theorem will be used later, we sketch them
below.

Theorem 2

Under the model assumptions A.1-A.7, $n^{1/2}(\hat\beta_1'-\beta_{10}',\ \hat\beta_2'-\beta_{20}')'$ converges in
distribution to a zero mean normal 2p-dimensional random vector with covariance matrix
$Q=(D_{kl},\ k,l\in\{1,2\})$, where the p×p matrix $D_{kl}$ is

$$D_{kl}=\Sigma_k^{-1}E\big(w_{k1}(\beta_{k0})w_{l1}(\beta_{l0})'\big)\Sigma_l^{-1},$$

with $w_{k1}(\beta_{k0})$ a p-dimensional vector,

$$w_{ki}(\beta_{k0})=\int_0^{\tau_k}\big(Z_{ki}(u)-e_k(u,\beta_{k0})\big)\,dM_{ki}(u).$$ □

Sketch of the proof:

By (1.4) the score function $U_k(u,\beta_k)$ has the form

$$U_k(u,\beta_k)=\sum_{i=1}^n\int_0^u\big(Z_{ki}(v)-E_k(v,\beta_k)\big)\,dN_{ki}(v).$$

Replacing $N_{ki}(v)$ by (1.3), it follows immediately that

$$U_k(u,\beta_k)=\sum_{i=1}^n\int_0^u\big(Z_{ki}(v)-E_k(v,\beta_k)\big)\,dM_{ki}(v).$$

By Taylor expansion of $U_k(\tau_k,\beta_k)$ around $\beta_{k0}$, we have

$$n^{-1/2}U_k(\tau_k,\beta_{k0})=\big(n^{-1}I_k(\beta_k^*)\big)\,n^{1/2}(\hat\beta_k-\beta_{k0}),$$

where $-I_k(\beta_k)$ is the matrix of derivatives of $U_k(\tau_k,\beta_k)$ with respect to $\beta_k$ and $\beta_k^*$ is
on the line segment between $\hat\beta_k$ and $\beta_{k0}$.

Step 1:
$n^{-1/2}\big(U_1(\tau_1,\beta_{10})',\ U_2(\tau_2,\beta_{20})'\big)'$ converges in distribution to a zero mean normal
2p-dimensional random vector. The asymptotic covariance is $B=(B_{kl},\ k,l\in\{1,2\})$,
where $B_{kl}=E\big(w_{k1}(\beta_{k0})w_{l1}(\beta_{l0})'\big)$.

Step 2:
$n^{-1}I_k(\beta_k^*)\xrightarrow{P}\Sigma_k$ for any random $\beta_k^*$ such that $\beta_k^*\xrightarrow{P}\beta_{k0}$.

Step 3:
$n^{1/2}(\hat\beta_1'-\beta_{10}',\ \hat\beta_2'-\beta_{20}')'$ converges in distribution to a zero mean normal
2p-dimensional random vector with asymptotic covariance matrix $Q=A^{-1}BA^{-1}$, where
$A=\mathrm{diag}(\Sigma_1,\Sigma_2)$ and B was defined in Step 1. ■

The next theorem gives a consistent estimator $\hat Q$ of the asymptotic covariance
matrix Q. We follow closely the notation of Wei et al. (1989).23

Theorem 3

Under the model assumptions A.1-A.7, the asymptotic covariance matrix
$Q=(D_{kl},\ k,l\in\{1,2\})$ of $n^{1/2}(\hat\beta_1'-\beta_{10}',\ \hat\beta_2'-\beta_{20}')'$ is consistently estimated by
$\hat Q=(\hat D_{kl},\ k,l\in\{1,2\})$, with

$$\hat D_{kl}=n^2\,I_k^{-1}(\hat\beta_k)\,\hat B_{kl}\,I_l^{-1}(\hat\beta_l),$$

where $\hat B_{kl}=n^{-1}\sum_{i=1}^n\hat W_{ki}(\hat\beta_k)\hat W_{li}(\hat\beta_l)'$ and $\hat W_{ki}(\hat\beta_k)$ is a p-dimensional vector,

$$\hat W_{ki}(\hat\beta_k)=\int_0^{\tau_k}\big(Z_{ki}(u)-E_k(u,\hat\beta_k)\big)\,d\hat M_{ki}(u,\hat\beta_k),$$
$$\hat M_{ki}(u,\hat\beta_k)=N_{ki}(u)-\int_0^uY_{ki}(v)\exp(\hat\beta_k'Z_{ki}(v))\frac{J_k(v)}{S_k^{(0)}(v,\hat\beta_k)}\,dN_k(v).$$ □

The paper of Wei et al. (1989)23 does not provide a proof of this result. Although they
mention that the proof essentially uses the same techniques as in the proofs of Theorem 1
of Wei and Lachin (1984)32 and of Theorem 3.2 and Corollary 3.3 of Andersen and Gill
(1982)31, these references are not in the context of our model. In the parametric section
of this chapter we need a similar theorem and we provide all the details of its proof.
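
The estimator of Theorem 3 can be sketched numerically as follows. This is an illustration under simplifying assumptions, not the code used for the analyses in this thesis: each margin has one shared, time-fixed scalar covariate (p = 1), the function names and data are hypothetical, and the function returns $\hat Q/n$, i.e. the estimated covariance of $(\hat\beta_1,\hat\beta_2)$ itself rather than of $n^{1/2}(\hat\beta-\beta_0)$.

```python
import math

def risk_sums(beta, x, z, u):
    """S0, S1, S2 at value u over the risk set {j : X_j >= u}."""
    w = [(zj, math.exp(beta * zj)) for xj, zj in zip(x, z) if xj >= u]
    return (sum(wj for _, wj in w),
            sum(zj * wj for zj, wj in w),
            sum(zj * zj * wj for zj, wj in w))

def score_info(beta, x, delta, z):
    """Partial likelihood score U(tau, beta) and information I(beta)."""
    score = info = 0.0
    for xi, di, zi in zip(x, delta, z):
        if di:
            s0, s1, s2 = risk_sums(beta, x, z, xi)
            score += zi - s1 / s0
            info += s2 / s0 - (s1 / s0) ** 2
    return score, info

def fit(x, delta, z, max_iter=50):
    """Newton-Raphson for beta-hat; returns (beta_hat, I(beta_hat))."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = score_info(beta, x, delta, z)
        step = score / info
        beta += step
        if abs(step) < 1e-12:
            break
    return beta, score_info(beta, x, delta, z)[1]

def w_hat(beta, x, delta, z):
    """W_i-hat: integral of (Z_i - E(u, beta)) with respect to M_i-hat, per subject."""
    events = [xj for xj, dj in zip(x, delta) if dj]
    out = []
    for xi, di, zi in zip(x, delta, z):
        wi = 0.0
        if di:  # jump of N_i at X_i
            s0, s1, _ = risk_sums(beta, x, z, xi)
            wi += zi - s1 / s0
        for t in events:  # compensator part, only while subject i is at risk
            if t <= xi:
                s0, s1, _ = risk_sums(beta, x, z, t)
                wi -= (zi - s1 / s0) * math.exp(beta * zi) / s0
        out.append(wi)
    return out

def robust_cov(los, d1, cost, d2, z):
    """Return (beta1_hat, beta2_hat) and the 2x2 matrix Q-hat / n."""
    b1, i1 = fit(los, d1, z)
    b2, i2 = fit(cost, d2, z)
    w = [w_hat(b1, los, d1, z), w_hat(b2, cost, d2, z)]
    inv = [1.0 / i1, 1.0 / i2]
    cov = [[inv[k] * sum(a * b for a, b in zip(w[k], w[l])) * inv[l]
            for l in range(2)] for k in range(2)]
    return (b1, b2), cov

# Hypothetical LOS and cost data sharing one binary covariate.
los = [2.0, 3.0, 5.0, 7.0, 8.0, 11.0]
cost = [25.0, 30.0, 45.0, 80.0, 70.0, 110.0]
d = [1, 1, 0, 1, 1, 1]
z = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
(b1, b2), cov = robust_cov(los, d, cost, d, z)
print(b1, b2, cov)
```

The off-diagonal entry is what separate univariate fits would miss: it carries the correlation between the LOS and cost coefficient estimates induced by observing both outcomes on the same patients.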

In the following we present several results related to the large sample properties
of the integrated baseline hazards estimators. As mentioned in Section 1.1.1, the
estimator of the integrated baseline hazard $A_{k0}(u)=\int_0^u\alpha_{k0}(v)\,dv$ is given by

$$\hat A_{k0}(u,\hat\beta_k)=\int_0^u\frac{J_k(v)}{S_k^{(0)}(v,\hat\beta_k)}\,dN_k(v),$$

where $J_k(v)=[Y_k(v)>0]$.

As shown in the proof of Theorem VII.2.3, p504, Andersen et al. (1993)29,
$n^{1/2}\big(\hat A_{k0}(u,\hat\beta_k)-A_{k0}(u)\big)$ can be expanded as

$$W_k(u)-n^{1/2}(\hat\beta_k-\beta_{k0})'\int_0^ue_k(v,\beta_{k0})\alpha_{k0}(v)\,dv+o_P(1),$$

where $W_k(u)=n^{1/2}\int_0^u\frac{J_k(v)}{S_k^{(0)}(v,\beta_{k0})}\,dM_k(v)$ and

$$M_k(u)=\sum_{i=1}^nM_{ki}(u)=N_k(u)-\int_0^uS_k^{(0)}(v,\beta_{k0})\alpha_{k0}(v)\,dv.$$

Therefore

$$n^{1/2}\big(\hat A_{k0}(u,\hat\beta_k)-A_{k0}(u)\big)+n^{1/2}(\hat\beta_k-\beta_{k0})'\int_0^ue_k(v,\beta_{k0})\alpha_{k0}(v)\,dv$$ (1.5)

has the same asymptotic distribution as $W_k(u)$.

The following theorem describes the asymptotic distribution of $(W_1(t),W_2(c))'$ for
fixed t, c.

Theorem 4

Under the model assumptions A.1-A.7, $(W_1(t),W_2(c))'$ converges in distribution to
a zero mean bivariate normal random vector with covariance matrix
$Q^*=(D^*_{kl}(u_k,u_l),\ k,l\in\{1,2\})$, where $D^*_{kl}(u_k,u_l)=E\big(w^*_{k1}(u_k,\beta_{k0})\,w^*_{l1}(u_l,\beta_{l0})\big)$, $u_k$ is
equal to t or c according as k = 1 or 2 and

$$w^*_{ki}(u,\beta_{k0})=\int_0^u\frac{dM_{ki}(v)}{s_k^{(0)}(v,\beta_{k0})}.$$ □

Proof of Theorem 4:

Let u denote t or c, depending on the index $k\in\{1,2\}$.

We can write

$$W_k(u)=n^{1/2}\int_0^u\frac{J_k(v)}{S_k^{(0)}(v,\beta_{k0})}\,dM_k(v)
=n^{-1/2}\int_0^uJ_k(v)\left[\frac{n}{S_k^{(0)}(v,\beta_{k0})}-\frac{1}{s_k^{(0)}(v,\beta_{k0})}\right]dM_k(v)$$
$$\qquad-\ n^{-1/2}\int_0^u\big(1-J_k(v)\big)\frac{dM_k(v)}{s_k^{(0)}(v,\beta_{k0})}+n^{-1/2}\int_0^u\frac{dM_k(v)}{s_k^{(0)}(v,\beta_{k0})}.$$

We will show

$$n^{-1/2}\int_0^uJ_k(v)\left[\frac{n}{S_k^{(0)}(v,\beta_{k0})}-\frac{1}{s_k^{(0)}(v,\beta_{k0})}\right]dM_k(v)\xrightarrow{P}0,$$ (1.6)

$$n^{-1/2}\int_0^u\big(1-J_k(v)\big)\frac{dM_k(v)}{s_k^{(0)}(v,\beta_{k0})}\xrightarrow{P}0.$$ (1.7)

For this we use Lenglart's Inequality for local square integrable martingales and the
following proposition:

Lenglart's Inequality (see p86, Andersen et al. (1993)29)

For every $\eta>0$, $\delta>0$,

$$P\Big(\sup_{u\in[0,\tau]}|M(u)|>\eta\Big)\le\frac{\delta}{\eta^2}+P\big(\langle M\rangle(\tau)>\delta\big),$$

where $\langle M\rangle$ is the predictable variation process of the local square integrable martingale M,
i.e. the compensator of $M^2$. □

Proposition (see Proposition II.4.1, p78, Andersen et al. (1993)29)

Let the counting process N have intensity process $\lambda$, let $M=N-\int\lambda$ and let H
be locally bounded and predictable. Then M and $\int H\,dM$ are local square integrable
martingales with

$$\langle M\rangle=\int\lambda,\qquad\Big\langle\int H\,dM\Big\rangle=\int H^2\lambda.$$ □

In our case $M_k(u)=N_k(u)-\int_0^uS_k^{(0)}(v,\beta_{k0})\alpha_{k0}(v)\,dv$ and the integrands
in (1.6) and (1.7) are bounded and predictable by our model assumptions.
Consequently both expressions in (1.6) and (1.7) are local square integrable martingales
of the type $\int H\,dM$, so we can calculate their compensators and apply Lenglart's
Inequality.

Let $\eta>0$, $\delta>0$.

For (1.6):

$$P\left(\sup_{u\in[0,\tau_k]}\left|n^{-1/2}\int_0^uJ_k(v)\left[\frac{n}{S_k^{(0)}(v,\beta_{k0})}-\frac{1}{s_k^{(0)}(v,\beta_{k0})}\right]dM_k(v)\right|>\eta\right)$$
$$\le\frac{\delta}{\eta^2}+P\left(\int_0^{\tau_k}J_k(v)\left[\frac{n}{S_k^{(0)}(v,\beta_{k0})}-\frac{1}{s_k^{(0)}(v,\beta_{k0})}\right]^2n^{-1}S_k^{(0)}(v,\beta_{k0})\alpha_{k0}(v)\,dv>\delta\right).$$

By conditions C.a, C.b, C.c, C.f and the Dominated Convergence Theorem,

$$\int_0^{\tau_k}J_k(v)\left[\frac{n}{S_k^{(0)}(v,\beta_{k0})}-\frac{1}{s_k^{(0)}(v,\beta_{k0})}\right]^2n^{-1}S_k^{(0)}(v,\beta_{k0})\alpha_{k0}(v)\,dv\xrightarrow{P}0,$$

so (1.6) is proved.

For (1.7):

$$P\left(\sup_{u\in[0,\tau_k]}\left|n^{-1/2}\int_0^u\big(1-J_k(v)\big)\frac{dM_k(v)}{s_k^{(0)}(v,\beta_{k0})}\right|>\eta\right)$$
$$\le\frac{\delta}{\eta^2}+P\left(\int_0^{\tau_k}\big(1-J_k(v)\big)\frac{1}{s_k^{(0)}(v,\beta_{k0})^2}\,n^{-1}S_k^{(0)}(v,\beta_{k0})\alpha_{k0}(v)\,dv>\delta\right).$$

But $1-J_k(v)=[Y_k(v)=0]=[Y_{ki}(v)=0\ \forall i]=[S_k^{(0)}(v,\beta_{k0})=0]$, so the integral in the
previous relation is zero. Consequently (1.7) follows.

Relations (1.6) and (1.7) imply that $W_k(u)$ is asymptotically equivalent to

$$n^{-1/2}\int_0^u\frac{dM_k(v)}{s_k^{(0)}(v,\beta_{k0})}=n^{-1/2}\sum_{i=1}^nw^*_{ki}(u,\beta_{k0}),$$ (1.8)

a sum of n i.i.d. random variables $w^*_{ki}(u,\beta_{k0})=\int_0^u\frac{dM_{ki}(v)}{s_k^{(0)}(v,\beta_{k0})}$.
The quantity $s_k^{(0)}(\cdot,\beta_{k0})$ is predictable and bounded away from 0, so, by the stated
Proposition, $w^*_{ki}(u,\beta_{k0})$ are zero-mean martingales. Because $w^*_{ki}(u,\beta_{k0})$ are also i.i.d.
random variables, by the Multivariate Central Limit Theorem, $(W_1(t),W_2(c))'$ converges
in distribution to a zero mean bivariate normal random vector with covariance matrix
$Q^*=(D^*_{kl}(u_k,u_l),\ k,l\in\{1,2\})$, where $D^*_{kl}(u_k,u_l)=E\big(w^*_{k1}(u_k,\beta_{k0})\,w^*_{l1}(u_l,\beta_{l0})\big)$ and $u_k$ is
equal to t or c according as k = 1 or 2. Therefore Theorem 4 is proved. ■

The following theorem gives a consistent estimator $\hat Q^*$ of the asymptotic
covariance matrix $Q^*$. As previously mentioned, the techniques of the proof for this type
of theorem will be provided in a similar theorem in the parametric section of this chapter.

Theorem 5

Under the model assumptions A.1-A.7, the asymptotic covariance matrix
$Q^*=(D^*_{kl}(u_k,u_l),\ k,l\in\{1,2\})$ of $(W_1(t),W_2(c))'$ is consistently estimated by
$\hat Q^*=(\hat D^*_{kl}(u_k,u_l),\ k,l\in\{1,2\})$, with

$$\hat D^*_{kl}(u_k,u_l)=n^{-1}\sum_{i=1}^n\hat w^*_{ki}(u_k,\hat\beta_k)\,\hat w^*_{li}(u_l,\hat\beta_l)$$

and

$$\hat w^*_{ki}(u_k,\hat\beta_k)=\int_0^{u_k}\frac{n\,J_k(v)}{S_k^{(0)}(v,\hat\beta_k)}\,d\hat M_{ki}(v,\hat\beta_k),$$
$$\hat M_{ki}(u,\hat\beta_k)=N_{ki}(u)-\int_0^uY_{ki}(v)\exp(\hat\beta_k'Z_{ki}(v))\frac{J_k(v)}{S_k^{(0)}(v,\hat\beta_k)}\,dN_k(v).$$ □
It ’ k

1.1.3 Large Sample Properties of the Estimators of Survival Functions

The survival distribution of LOS or cost at a fixed covariate profile $Z_0$,
$S_k(u\mid Z_0)=\exp(-A_{k0}(u)e^{\beta_{k0}'Z_0})$, is estimated by $\hat S_k(u\mid Z_0)=\exp(-\hat A_{k0}(u,\hat\beta_k)e^{\hat\beta_k'Z_0})$. We
want to determine the asymptotic joint distribution of $\hat S_1(t\mid Z_0)$ and $\hat S_2(c\mid Z_0)$ for fixed t
and c. Because by (1.5) $n^{1/2}\big(\hat A_{k0}(u,\hat\beta_k)-A_{k0}(u)\big)+n^{1/2}(\hat\beta_k-\beta_{k0})'\int_0^ue_k(v,\beta_{k0})\alpha_{k0}(v)\,dv$
is asymptotically equivalent to $W_k(u)$, a first step is to consider the joint
distribution of the vectors $(W_1(t),W_2(c))'$ and $n^{1/2}(\hat\beta_1'-\beta_{10}',\ \hat\beta_2'-\beta_{20}')'$.

Theorem 6

Under the model assumptions A.1-A.7,

$$\big(W_1(t),\ W_2(c),\ n^{1/2}(\hat\beta_1-\beta_{10})',\ n^{1/2}(\hat\beta_2-\beta_{20})'\big)'$$

converges in distribution to a zero-mean normal random vector with covariance matrix
$\tilde Q=\begin{pmatrix}Q^*&P'\\P&Q\end{pmatrix}$. The matrices Q and $Q^*$ and their consistent estimators are described in
Theorems 2-5. The matrix P has the form

$$P=\begin{pmatrix}0&P_{21}(c)\\P_{12}(t)&0\end{pmatrix},$$

where $P_{kl}(u_k)=\Sigma_l^{-1}E\big(w^*_{k1}(u_k,\beta_{k0})\,w_{l1}(\beta_{l0})\big)$ for $k\ne l$ is consistently estimated by

$$\hat P_{kl}(u_k)=I_l^{-1}(\hat\beta_l)\sum_{i=1}^n\hat w^*_{ki}(u_k,\hat\beta_k)\,\hat W_{li}(\hat\beta_l).$$ □

(See Theorems 2-5 for notation.)

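Proof of Theorem 6:

As shown in the steps of the proof of Theorem 2,

$$n^{1/2}(\hat\beta_k-\beta_{k0})=nI_k^{-1}(\beta_k^*)\Big(n^{-1/2}\sum_{i=1}^nw_{ki}(\beta_{k0})\Big)+o_P(1),$$

where $nI_k^{-1}(\beta_k^*)\xrightarrow{P}\Sigma_k^{-1}$ and $w_{ki}(\beta_{k0})=\int_0^{\tau_k}\big(Z_{ki}(u)-e_k(u,\beta_{k0})\big)\,dM_{ki}(u)$, $1\le i\le n$ are i.i.d.
zero mean random vectors. By (1.8), $W_k(u)$ is asymptotically equivalent to

$$n^{-1/2}\sum_{i=1}^nw^*_{ki}(u,\beta_{k0}),\quad\text{where}\quad w^*_{ki}(u,\beta_{k0})=\int_0^u\frac{dM_{ki}(v)}{s_k^{(0)}(v,\beta_{k0})}$$

are i.i.d. zero mean random variables. Consequently
$\big(W_1(t),\ W_2(c),\ n^{1/2}(\hat\beta_1-\beta_{10})',\ n^{1/2}(\hat\beta_2-\beta_{20})'\big)'$ is asymptotically equivalent to the
(2p+2)-dimensional vector

$$\mathrm{diag}\big(1,\ 1,\ nI_1^{-1}(\beta_1^*),\ nI_2^{-1}(\beta_2^*)\big)\times n^{-1/2}\sum_{i=1}^n\rho_i(t,c,\beta_{10},\beta_{20}),$$

where the i.i.d. vectors $\rho_i$ have the components $w^*_{1i}(t,\beta_{10})$, $w^*_{2i}(c,\beta_{20})$, $w_{1i}(\beta_{10})$, $w_{2i}(\beta_{20})$,
the first two being scalars and the last two p-dimensional vectors.

It follows from the Multivariate Central Limit Theorem and Slutsky's Lemma that
$\big(W_1(t),\ W_2(c),\ n^{1/2}(\hat\beta_1-\beta_{10})',\ n^{1/2}(\hat\beta_2-\beta_{20})'\big)'$ converges in distribution to a zero-mean
normal random vector with covariance matrix $\tilde Q=\begin{pmatrix}Q^*&P'\\P&Q\end{pmatrix}$, where the matrices
Q and $Q^*$ and their consistent estimators are described in Theorems 2-5. Write

$$P=\begin{pmatrix}P_{11}(t)&P_{21}(c)\\P_{12}(t)&P_{22}(c)\end{pmatrix},$$

where $P_{kl}(u_k)$ is the asymptotic covariance between $W_k(u_k)$ and $n^{1/2}(\hat\beta_l-\beta_{l0})$,
$k,l\in\{1,2\}$.

For $k\ne l$, $P_{kl}(u_k)$ has the form $P_{kl}(u_k)=\Sigma_l^{-1}E\big(w^*_{k1}(u_k,\beta_{k0})\,w_{l1}(\beta_{l0})\big)$ and is
consistently estimated by $\hat P_{kl}(u_k)=I_l^{-1}(\hat\beta_l)\sum_{i=1}^n\hat w^*_{ki}(u_k,\hat\beta_k)\,\hat W_{li}(\hat\beta_l)$,
where $\hat W_{li}(\hat\beta_l)$ and $\hat w^*_{ki}(u_k,\hat\beta_k)$ are described in Theorems 3 and 5, respectively.

We will show that $P_{kk}(u_k)$ has only zero components, so that

$$P=\begin{pmatrix}0&P_{21}(c)\\P_{12}(t)&0\end{pmatrix}.$$

For k = 1,

$$P_{11}(t)=\Sigma_1^{-1}E\big(w^*_{11}(t,\beta_{10})\,w_{11}(\beta_{10})\big)
=\Sigma_1^{-1}E\left[\int_0^t\frac{dM_{11}(u)}{s_1^{(0)}(u,\beta_{10})}\int_0^{\tau_1}\big(Z_{11}(u)-e_1(u,\beta_{10})\big)\,dM_{11}(u)\right].$$

Both factors of the product in the expectation are local square integrable martingales. By
the definition of the predictable covariation process $\langle M,M'\rangle$ of two local square
integrable martingales M and M' (see p68, Andersen et al. (1993)29),
$E\,MM'=E\langle M,M'\rangle$. If H, K are bounded predictable processes and M is a counting
process martingale with $\langle M\rangle=\int\lambda$ then $\langle\int H\,dM,\int K\,dM\rangle=\int HK\lambda$ (see p78, Andersen
et al. (1993)29). Therefore, using these results, for $j\in\{1,\dots,p\}$ we obtain that

$$E\left[\int_0^t\frac{dM_{11}(u)}{s_1^{(0)}(u,\beta_{10})}\int_0^{\tau_1}\big(Z_{11j}(u)-e_{1j}(u,\beta_{10})\big)\,dM_{11}(u)\right]$$
$$=E\left[\int_0^{\tau_1}\frac{1_{[0,t]}(u)}{s_1^{(0)}(u,\beta_{10})}\,dM_{11}(u)\int_0^{\tau_1}\big(Z_{11j}(u)-e_{1j}(u,\beta_{10})\big)\,dM_{11}(u)\right]$$
$$=E\left\langle\int_0^{\cdot}\frac{1_{[0,t]}(u)}{s_1^{(0)}(u,\beta_{10})}\,dM_{11}(u),\ \int_0^{\cdot}\big(Z_{11j}(u)-e_{1j}(u,\beta_{10})\big)\,dM_{11}(u)\right\rangle(\tau_1)$$
$$=E\left[\int_0^{\tau_1}\frac{1_{[0,t]}(u)}{s_1^{(0)}(u,\beta_{10})}\big(Z_{11j}(u)-e_{1j}(u,\beta_{10})\big)\alpha_{10}(u)\exp(\beta_{10}'Z_{11}(u))Y_{11}(u)\,du\right].$$

By Fubini's Theorem the last quantity is equal to

$$\int_0^{\tau_1}\frac{1_{[0,t]}(u)\,\alpha_{10}(u)}{s_1^{(0)}(u,\beta_{10})}\left\{s_{1j}^{(1)}(u,\beta_{10})-\frac{s_{1j}^{(1)}(u,\beta_{10})}{s_1^{(0)}(u,\beta_{10})}\,s_1^{(0)}(u,\beta_{10})\right\}du=0.$$

Hence we proved $P_{11}(t)=0$. Similarly we can show $P_{22}(c)=0$. ■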

Now we have the tools to prove the following theorem about the asymptotic joint
distribution of $\hat S_1(t\mid Z_0)$ and $\hat S_2(c\mid Z_0)$, for fixed time t and cost c.

Theorem 7

Under the model assumptions A.1-A.7,

$$n^{1/2}\big(\hat S_1(t\mid Z_0)-S_1(t\mid Z_0),\ \hat S_2(c\mid Z_0)-S_2(c\mid Z_0)\big)'$$

converges in distribution to a zero mean bivariate normal random vector with covariance
matrix $CS=(CS_{kl}(u_k,u_l),\ k,l\in\{1,2\})$, where

$$CS_{kk}(u_k)=S_k(u_k\mid Z_0)^2\exp(2\beta_{k0}'Z_0)\times\left\{D^*_{kk}(u_k)+\Big(\int_0^{u_k}(Z_0-e_k(v,\beta_{k0}))\alpha_{k0}(v)\,dv\Big)'D_{kk}\Big(\int_0^{u_k}(Z_0-e_k(v,\beta_{k0}))\alpha_{k0}(v)\,dv\Big)\right\},$$

$$CS_{12}(t,c)=S_1(t\mid Z_0)S_2(c\mid Z_0)\exp(\beta_{10}'Z_0+\beta_{20}'Z_0)\times\Big\{D^*_{12}(t,c)+\Big(\int_0^t(Z_0-e_1(v,\beta_{10}))\alpha_{10}(v)\,dv\Big)'P_{21}(c)$$
$$\qquad+\Big(\int_0^c(Z_0-e_2(v,\beta_{20}))\alpha_{20}(v)\,dv\Big)'P_{12}(t)+\Big(\int_0^t(Z_0-e_1(v,\beta_{10}))\alpha_{10}(v)\,dv\Big)'D_{12}\Big(\int_0^c(Z_0-e_2(v,\beta_{20}))\alpha_{20}(v)\,dv\Big)\Big\}.$$ □

(See Theorems 2-6 for notation.)

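Proof of Theorem 7:

We first state the Delta Method, a popular and elementary tool of asymptotic
statistics, which we will apply repeatedly in the proof of this theorem. We will use the
following version, stated at p109, Andersen et al. (1993)29:

Delta Method

Suppose for some random p-vectors $T_n$ and a sequence of numbers $a_n\to\infty$,

$$a_n(T_n-\theta)\xrightarrow{D}Z\quad\text{as }n\to\infty,$$

where $\theta\in R^p$ is fixed. Suppose $\phi:R^p\to R^q$ is differentiable at $\theta$ with q×p matrix
$\phi'(\theta)$ of partial derivatives. Then

$$a_n\big(\phi(T_n)-\phi(\theta)\big)\xrightarrow{D}\phi'(\theta)\times Z\quad\text{in }R^q$$

and, indeed, $a_n(\phi(T_n)-\phi(\theta))$ is asymptotically equivalent to $\phi'(\theta)\times a_n(T_n-\theta)$. □

Recall that for $k\in\{1,2\}$ the survival function at a fixed covariate profile $Z_0$ is
$S_k(u\mid Z_0)=\exp(-A_k(u,\beta_{k0}\mid Z_0))$, where $A_k(u,\beta_{k0}\mid Z_0)=A_{k0}(u)\exp(\beta_{k0}'Z_0)$ is the
integrated hazard. The survival function is estimated by
$\hat S_k(u\mid Z_0)=\exp(-\hat A_k(u,\hat\beta_k\mid Z_0))$, where $\hat A_k(u,\hat\beta_k\mid Z_0)=\hat A_{k0}(u,\hat\beta_k)\exp(\hat\beta_k'Z_0)$.

By the Delta Method $n^{1/2}\big(\hat S_1(t\mid Z_0)-S_1(t\mid Z_0),\ \hat S_2(c\mid Z_0)-S_2(c\mid Z_0)\big)'$ has the
same limiting distribution as

$$-\begin{pmatrix}S_1(t\mid Z_0)&0\\0&S_2(c\mid Z_0)\end{pmatrix}n^{1/2}\begin{pmatrix}\hat A_1(t,\hat\beta_1\mid Z_0)-A_1(t,\beta_{10}\mid Z_0)\\\hat A_2(c,\hat\beta_2\mid Z_0)-A_2(c,\beta_{20}\mid Z_0)\end{pmatrix}.$$ (1.9)

Therefore we will determine first the asymptotic distribution of

$$n^{1/2}\begin{pmatrix}\hat A_1(t,\hat\beta_1\mid Z_0)-A_1(t,\beta_{10}\mid Z_0)\\\hat A_2(c,\hat\beta_2\mid Z_0)-A_2(c,\beta_{20}\mid Z_0)\end{pmatrix}
=n^{1/2}\begin{pmatrix}\exp(\hat\beta_1'Z_0)\hat A_{10}(t,\hat\beta_1)-\exp(\beta_{10}'Z_0)A_{10}(t)\\\exp(\hat\beta_2'Z_0)\hat A_{20}(c,\hat\beta_2)-\exp(\beta_{20}'Z_0)A_{20}(c)\end{pmatrix}.$$

We apply again the Delta Method. Let $T_n=\big(\hat A_{10}(t,\hat\beta_1),\ \hat A_{20}(c,\hat\beta_2),\ \hat\beta_1',\ \hat\beta_2'\big)'$,
$\theta=\big(A_{10}(t),\ A_{20}(c),\ \beta_{10}',\ \beta_{20}'\big)'$ and $a_n=n^{1/2}$. A linear combination of the elements of the
vector $a_n(T_n-\theta)$ has the same limiting distribution as a linear combination of the
elements of $\big(W_1(t),\ W_2(c),\ n^{1/2}(\hat\beta_1-\beta_{10})',\ n^{1/2}(\hat\beta_2-\beta_{20})'\big)'$. Consequently, by
Theorem 6, $a_n(T_n-\theta)$ converges weakly to a multivariate normal vector.

We define the mapping $\phi:R\times R\times R^p\times R^p\to R^2$,

$$\phi(a,b,c,d)=\begin{pmatrix}a\exp(c'Z_0)\\b\exp(d'Z_0)\end{pmatrix}.$$

The function $\phi$ is everywhere differentiable, with the 2×(2p+2) matrix $\phi'(a,b,c,d)$ of
partial derivatives

$$\phi'(a,b,c,d)=\begin{pmatrix}\exp(c'Z_0)&0&a\exp(c'Z_0)Z_0'&0'\\0&\exp(d'Z_0)&0'&b\exp(d'Z_0)Z_0'\end{pmatrix}.$$

By the stated Delta Method,

$$n^{1/2}\begin{pmatrix}\hat A_1(t,\hat\beta_1\mid Z_0)-A_1(t,\beta_{10}\mid Z_0)\\\hat A_2(c,\hat\beta_2\mid Z_0)-A_2(c,\beta_{20}\mid Z_0)\end{pmatrix}=a_n\big(\phi(T_n)-\phi(\theta)\big)$$

is asymptotically equivalent to $\phi'(\theta)\times a_n(T_n-\theta)=$

$$=n^{1/2}\begin{pmatrix}\exp(\beta_{10}'Z_0)\big(\hat A_{10}(t,\hat\beta_1)-A_{10}(t)\big)+\exp(\beta_{10}'Z_0)A_{10}(t)Z_0'(\hat\beta_1-\beta_{10})\\\exp(\beta_{20}'Z_0)\big(\hat A_{20}(c,\hat\beta_2)-A_{20}(c)\big)+\exp(\beta_{20}'Z_0)A_{20}(c)Z_0'(\hat\beta_2-\beta_{20})\end{pmatrix}.$$

By (1.5) this quantity has the same limiting distribution as

$$\begin{pmatrix}\exp(\beta_{10}'Z_0)&0&\exp(\beta_{10}'Z_0)F_1(t,\beta_{10})'&0'\\0&\exp(\beta_{20}'Z_0)&0'&\exp(\beta_{20}'Z_0)F_2(c,\beta_{20})'\end{pmatrix}
\times\big(W_1(t),\ W_2(c),\ n^{1/2}(\hat\beta_1-\beta_{10})',\ n^{1/2}(\hat\beta_2-\beta_{20})'\big)',$$

where $F_k(u,\beta_{k0})=\int_0^u\big(Z_0-e_k(v,\beta_{k0})\big)\alpha_{k0}(v)\,dv$, $k\in\{1,2\}$. Therefore, by Theorem 6,

$$n^{1/2}\begin{pmatrix}\hat A_1(t,\hat\beta_1\mid Z_0)-A_1(t,\beta_{10}\mid Z_0)\\\hat A_2(c,\hat\beta_2\mid Z_0)-A_2(c,\beta_{20}\mid Z_0)\end{pmatrix}$$

converges in distribution to a zero mean normal random vector, with covariance matrix
denoted $C=(C_{kl}(u_k,u_l),\ k,l\in\{1,2\})$. By the asymptotic independence of $W_k(u_k)$ and
$n^{1/2}(\hat\beta_k-\beta_{k0})$,

$$C_{kk}(u_k)=\exp(2\beta_{k0}'Z_0)\times\left\{D^*_{kk}(u_k)+\Big(\int_0^{u_k}(Z_0-e_k(v,\beta_{k0}))\alpha_{k0}(v)\,dv\Big)'D_{kk}\Big(\int_0^{u_k}(Z_0-e_k(v,\beta_{k0}))\alpha_{k0}(v)\,dv\Big)\right\},$$

$$C_{12}(t,c)=\exp(\beta_{10}'Z_0+\beta_{20}'Z_0)\times\Big\{D^*_{12}(t,c)+\Big(\int_0^t(Z_0-e_1(v,\beta_{10}))\alpha_{10}(v)\,dv\Big)'P_{21}(c)$$
$$\qquad+\Big(\int_0^c(Z_0-e_2(v,\beta_{20}))\alpha_{20}(v)\,dv\Big)'P_{12}(t)+\Big(\int_0^t(Z_0-e_1(v,\beta_{10}))\alpha_{10}(v)\,dv\Big)'D_{12}\Big(\int_0^c(Z_0-e_2(v,\beta_{20}))\alpha_{20}(v)\,dv\Big)\Big\}.$$

By (1.9), $n^{1/2}\big(\hat S_1(t\mid Z_0)-S_1(t\mid Z_0),\ \hat S_2(c\mid Z_0)-S_2(c\mid Z_0)\big)'$ converges in
distribution to a zero mean bivariate normal random vector with covariance matrix
$CS=(CS_{kl}(u_k,u_l),\ k,l\in\{1,2\})$, where

$$CS_{kk}(u_k)=S_k(u_k\mid Z_0)^2\,C_{kk}(u_k),$$
$$CS_{12}(t,c)=S_1(t\mid Z_0)\,S_2(c\mid Z_0)\,C_{12}(t,c).$$ ■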
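A consistent estimator $\widehat{CS}=(\widehat{CS}_{kl}(u_k,u_l),\ k,l\in\{1,2\})$ of the asymptotic
covariance matrix $CS=(CS_{kl}(u_k,u_l),\ k,l\in\{1,2\})$ of
$n^{1/2}\big(\hat S_1(t\mid Z_0)-S_1(t\mid Z_0),\ \hat S_2(c\mid Z_0)-S_2(c\mid Z_0)\big)'$ can be obtained by replacing $\beta_{k0}$ by $\hat\beta_k$,
$S_k(u_k\mid Z_0)$ by the estimator $\hat S_k(u_k\mid Z_0)$, $D_{kl}$, $D^*_{kl}(u_k,u_l)$, $P_{21}(c)$, $P_{12}(t)$ by their
consistent estimates and $\int_0^{u_k}\big(Z_0-e_k(v,\beta_{k0})\big)\alpha_{k0}(v)\,dv$ by the quantity

$$\int_0^{u_k}\big(Z_0-E_k(v,\hat\beta_k)\big)\frac{J_k(v)}{S_k^{(0)}(v,\hat\beta_k)}\,dN_k(v).$$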

1.1.4 Point Estimates and Confidence Intervals for Median LOS and Median Cost

The median LOS or cost for a fixed covariate profile $Z_0$ is defined as

$$m_k(Z_0)=\inf\{u:S_k(u\mid Z_0)\le .5\}$$

and estimated by

$$\hat m_k(Z_0)=\inf\{u:\hat S_k(u\mid Z_0)\le .5\}.$$

For constructing a confidence interval for the median we will follow the approach
described in Andersen et al. (1993)29, p511-512. The advantage of this procedure is that
the estimation of the density of $S_k$ is not needed. The confidence interval for the median
$m_k(Z_0)$ can be read directly from the lower and upper pointwise confidence limits for the
survival distribution in exactly the same manner as $\hat m_k(Z_0)$ can be read from the curve
$\hat S_k(\cdot\mid Z_0)$ itself.

Let us first consider pointwise confidence intervals for $S_k(u\mid Z_0)$. By Theorem
7, $n^{1/2}\big(\hat S_k(u\mid Z_0)-S_k(u\mid Z_0)\big)$ converges weakly to a zero mean normal distribution with
variance $CS_{kk}(u)$. The "standard" asymptotic 100(1-α)% interval

$$\Big[\hat S_k(u\mid Z_0)-z_{\alpha/2}\big[\widehat{CS}_{kk}(u)/n\big]^{1/2},\ \hat S_k(u\mid Z_0)+z_{\alpha/2}\big[\widehat{CS}_{kk}(u)/n\big]^{1/2}\Big],$$

where $z_{\alpha/2}$ is the upper α/2 quantile of the standard normal distribution, might not be
completely satisfactory for small sample size.33 Kalbfleisch and Prentice (1980)34 and
Thomas and Grunkemeier (1975)33 suggested that using transformations such as
$g(x)=\log(-\log x)$, $x\in(0,1)$ might improve the small sample properties of the
confidence intervals. Other transformations for constructing confidence intervals are
described in Klein and Moeschberger (1997).

If $CS_{kk}(u)\ne0$ and g is a real function differentiable in the neighborhood of
$S_k(u\mid Z_0)$, with continuous derivative g' different from zero at $S_k(u\mid Z_0)$, then by the
Delta Method and Slutsky's Lemma,

$$\frac{g\big(\hat S_k(u\mid Z_0)\big)-g\big(S_k(u\mid Z_0)\big)}{\big|g'\big(\hat S_k(u\mid Z_0)\big)\big|\times\big[\widehat{CS}_{kk}(u)/n\big]^{1/2}}\xrightarrow{D}N(0,1).$$

Asymptotic 100(1-α)% confidence limits for $g(S_k(u\mid Z_0))$ are

$$g\big(\hat S_k(u\mid Z_0)\big)\pm z_{\alpha/2}\,\big|g'\big(\hat S_k(u\mid Z_0)\big)\big|\times\big[\widehat{CS}_{kk}(u)/n\big]^{1/2}.$$ (1.10)

For the transformation $g(x)=\log(-\log x)$, $x\in(0,1)$ the derivative g' exists and
is continuous everywhere on (0,1), $g'(x)=(x\log x)^{-1}$. Also the inverse of this function
has the form $g^{-1}(y)=\exp(-\exp y)$, $y\in R$. If $\hat S_k(u\mid Z_0)\in(0,1)$ then, by retransforming
(1.10), we obtain the following 100(1-α)% confidence interval for $S_k(u\mid Z_0)$:
$[\hat S_{k1}(u\mid Z_0),\ \hat S_{k2}(u\mid Z_0)]$, where

$$\hat S_{k1}(u\mid Z_0)=\hat S_k(u\mid Z_0)^{\exp(-PS_k(u))},$$
$$\hat S_{k2}(u\mid Z_0)=\hat S_k(u\mid Z_0)^{\exp(PS_k(u))},$$ (1.11)
$$PS_k(u)=z_{\alpha/2}\,\frac{\big[\widehat{CS}_{kk}(u)/n\big]^{1/2}}{\hat S_k(u\mid Z_0)\log\hat S_k(u\mid Z_0)}.$$
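
The retransformed limits (1.11) are simple to compute once $\hat S_k(u\mid Z_0)$ and $\widehat{CS}_{kk}(u)$ are available. The sketch below is illustrative: the function name `loglog_interval` and the input values are hypothetical, and the default quantile corresponds to α = 0.05.

```python
import math

def loglog_interval(s_hat, cs_hat, n, z=1.959963984540054):
    """Limits (1.11) for S(u | Z0), given S-hat in (0, 1), CS-hat_kk(u) and n."""
    # PS is negative because log(s_hat) < 0 on (0, 1).
    ps = z * math.sqrt(cs_hat / n) / (s_hat * math.log(s_hat))
    lower = s_hat ** math.exp(-ps)  # exponent > 1 pulls the value toward 0
    upper = s_hat ** math.exp(ps)   # exponent < 1 pushes the value toward 1
    return lower, upper

# Hypothetical inputs: S-hat = 0.6, CS-hat = 0.9, n = 100 patients.
lower, upper = loglog_interval(0.6, 0.9, 100)
print(lower, upper)
```

Unlike the "standard" interval, these limits always stay inside (0, 1), which is one practical reason for preferring the transformation.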

We now turn to the construction of a confidence interval for the median $m_k(Z_0)$.
By the procedure described in Andersen et al. (1993)29, p277, we can take as an
approximate 100(1-α)% confidence interval for $m_k(Z_0)$ all values u which satisfy

$$\left|\frac{g\big(\hat S_k(u\mid Z_0)\big)-g(.5)}{\big|g'\big(\hat S_k(u\mid Z_0)\big)\big|\times\big[\widehat{CS}_{kk}(u)/n\big]^{1/2}}\right|\le z_{\alpha/2},$$

where $g(x)=\log(-\log x)$, $x\in(0,1)$, i.e. all hypothesized values u of $m_k(Z_0)$ which are
not rejected when testing $H_0:m_k(Z_0)=u$ against $H_1:m_k(Z_0)\ne u$ at level α, based on
the asymptotic normality of $g(\hat S_k(u\mid Z_0))$. Therefore the approximate confidence
interval for $m_k(Z_0)$ is

$$\Big\{u:g(.5)\text{ is between }g\big(\hat S_k(u\mid Z_0)\big)\pm z_{\alpha/2}\,\big|g'\big(\hat S_k(u\mid Z_0)\big)\big|\times\big[\widehat{CS}_{kk}(u)/n\big]^{1/2}\Big\}$$
$$=\big\{u:0.5\text{ is between }\hat S_{k1}(u\mid Z_0)\text{ and }\hat S_{k2}(u\mid Z_0)\big\},$$

where $\hat S_{k1}(u\mid Z_0)$ and $\hat S_{k2}(u\mid Z_0)$ are given in (1.11).

Define $\hat m_{k1}(Z_0)=\inf\{u:\hat S_{k1}(u\mid Z_0)\le .5\}$, $\hat m_{k2}(Z_0)=\inf\{u:\hat S_{k2}(u\mid Z_0)\le .5\}$. Then
$[\hat m_{k1}(Z_0),\ \hat m_{k2}(Z_0)]$ is an approximate 100(1-α)% confidence interval for $m_k(Z_0)$.

This completes our construction of point estimates and confidence intervals for
the median LOS and median cost in the case of semiparametric marginal models. An
application of these results will be presented in Section 1.3.
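
Reading the median and its confidence limits off the estimated curves can be sketched as follows. The helper `first_crossing` and the curve values are hypothetical, and the curves are assumed to be evaluated on a common increasing grid of time or cost values.

```python
def first_crossing(grid, curve, level=0.5):
    """inf{u in grid : curve(u) <= level}; None if the curve never reaches level."""
    for u, s in zip(grid, curve):
        if s <= level:
            return u
    return None

# Hypothetical survival estimate with lower and upper pointwise limits on a grid.
grid = [1, 2, 3, 4, 5, 6]
s_hat = [0.95, 0.80, 0.60, 0.45, 0.30, 0.10]
s_lower = [0.85, 0.65, 0.42, 0.28, 0.15, 0.03]
s_upper = [0.99, 0.92, 0.77, 0.63, 0.48, 0.25]

median = first_crossing(grid, s_hat)
ci = (first_crossing(grid, s_lower), first_crossing(grid, s_upper))
print(median, ci)
```

A return value of `None` for the upper limit would indicate that the upper confidence curve never drops to 0.5 within the observed range, so the interval is unbounded on that side.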


1.2 Parametric Marginal Models

Using the same notation as in Section 1.1, suppose we observe for the i-th patient
of a study with n individuals $X_{1i}=\min(T_i,T_i')$, $\delta_{1i}=[T_i\le T_i']$, $Z_{1i}$, with $T_i=g(T_i^*)$,
$T_i'=g(T_i^{*\prime})$, where $T_i^*$ is the true LOS, $T_i^{*\prime}$ is the censoring time and $Z_{1i}$ is a vector of p
time-independent explanatory variables. The monotonic transformation g (with inverse
$g^{-1}$) is chosen to mitigate the effects of skewness that might be present in the data. For
example, the log or square-root transformations are used when the data are right-skewed
and have the advantage of permitting easy interpretation of the model. All variables
$T_i^*$, $T_i^{*\prime}$, $Z_{1i}$ are defined on the probability space $(\Omega_1,\mathcal{F}_1,P_1)$. The transformation g is
chosen such that both $T_i$ and $T_i'$ are strictly positive variables.

In a similar manner we consider $X_{2i}=\min(C_i,C_i')$, $\delta_{2i}=[C_i\le C_i']$ and the
p-vector $Z_{2i}$ of explanatory variables, where $C_i=g(C_i^*)$, $C_i'=g(C_i^{*\prime})$, $C_i^*$ is the true
hospital cost, $C_i^{*\prime}$ is the censoring cost and both $C_i,C_i'>0$. All these variables are
defined on the probability space $(\Omega_2,\mathcal{F}_2,P_2)$. The transformation applied to cost need
not be the same as that applied to LOS.

The relationship of explanatory variables to the time-cost observations is modeled
through the following linear regression model:

$$T_i=\beta_{10}'Z_{1i}+\sigma_{10}\varepsilon_{1i},$$
$$C_i=\beta_{20}'Z_{2i}+\sigma_{20}\varepsilon_{2i},$$ (1.12)
41

where (81,-,82i)’, 1S i Sn are i.i.d., with distribution function F(.,.|p0). Here p0 is the
true value of a nuisance parameter p. Let f (.,. | p0) and a(.,. | p0) be the density and

the hazard function associated to the distribution function F (.,. | p0) .
As in Section 1.1, the subscript k = l is associated with time and the subscript

k = 2 is associated with cost. We assume the 85,1 S i S n are i.i.d., with zero mean and
distribution function F0 that does not depend on p0. This parameter is associated with
the joint distribution function of (8“,82i) . Let f0 be the density and do the hazard of

the distribution of 8,“.

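Example 1.2.1:

Suppose $(\varepsilon_{1i}, \varepsilon_{2i})'$, $1 \le i \le n$, are i.i.d.
\[
N\!\left(\begin{pmatrix}0\\0\end{pmatrix}, \begin{pmatrix}1 & \rho_0\\ \rho_0 & 1\end{pmatrix}\right).
\]
Marginally $\varepsilon_{ki}$, $1 \le i \le n$, are i.i.d. $N(0,1)$, so $\alpha_0(u) = \varphi(u)/(1-\Phi(u))$, where $\varphi$ and $\Phi$ are the density and distribution function of the standard normal distribution. $\square$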
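The setting of Example 1.2.1 can be illustrated by simulating censored time-cost pairs from model (1.12). The sketch below is only an illustration under stated assumptions: the function name `simulate_time_cost`, the covariate design (an intercept plus one uniform covariate), the fixed censoring points and all parameter values are hypothetical and do not come from the study.

```python
import math
import random


def simulate_time_cost(n, beta1, beta2, sigma1, sigma2, rho, cens_t, cens_c, seed=0):
    """Simulate n censored (transformed) time-cost pairs from model (1.12).

    Assumptions made only for this sketch: both covariate vectors equal
    (1, z) with z ~ Uniform(0, 1), and censoring happens at the fixed
    points cens_t and cens_c on the transformed scale.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = (1.0, rng.random())
        # Standard bivariate normal errors with correlation rho.
        e1 = rng.gauss(0.0, 1.0)
        e2 = rho * e1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        t = sum(b * x for b, x in zip(beta1, z)) + sigma1 * e1
        c = sum(b * x for b, x in zip(beta2, z)) + sigma2 * e2
        rows.append({
            "z": z,
            "x1": min(t, cens_t), "delta1": int(t <= cens_t),
            "x2": min(c, cens_c), "delta2": int(c <= cens_c),
        })
    return rows


data = simulate_time_cost(500, (2.0, 0.5), (9.0, 0.8), 0.4, 0.6, 0.7, 3.0, 10.5)
# Number of uncensored times and uncensored costs in the simulated sample.
print(sum(r["delta1"] for r in data), sum(r["delta2"] for r in data))
```

The second error is built as $\rho e_1 + (1-\rho^2)^{1/2} e$ with an independent standard normal $e$, which gives unit variance and correlation $\rho$ with the first error.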

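Example 1.2.2:

Suppose $(\varepsilon_{1i}, \varepsilon_{2i})'$, $1 \le i \le n$, are i.i.d. with distribution function
\[
F(y_1, y_2) = \frac{1}{1 + e^{-y_1} + e^{-y_2}}, \qquad (y_1, y_2) \in \mathbb{R}^2,
\]
so $(\varepsilon_{1i}, \varepsilon_{2i})$ has the bivariate logistic distribution, with no parameter affecting the shape of the joint distribution of $\varepsilon_{1i}$ and $\varepsilon_{2i}$. Marginally $\varepsilon_{ki}$, $1 \le i \le n$, are i.i.d. with
\[
F_0(u) = \alpha_0(u) = \frac{1}{1 + e^{-u}}, \qquad u \in \mathbb{R},
\]
with mean $E(\varepsilon_{ki}) = 0$ and variance $\mathrm{var}(\varepsilon_{ki}) = \pi^2/3$. $\square$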
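The two facts used in Example 1.2.2, that each marginal of the bivariate logistic distribution is standard logistic and that the standard logistic hazard coincides with its distribution function, can be checked numerically. The following is a small sketch; the function names are illustrative only.

```python
import math


def joint_cdf(y1, y2):
    """Bivariate logistic distribution function of Example 1.2.2."""
    return 1.0 / (1.0 + math.exp(-y1) + math.exp(-y2))


def logistic_cdf(u):
    """Standard logistic distribution function F_0."""
    return 1.0 / (1.0 + math.exp(-u))


def logistic_hazard(u):
    """Hazard f_0 / (1 - F_0), using the logistic density f_0 = F_0 (1 - F_0)."""
    cdf = logistic_cdf(u)
    density = cdf * (1.0 - cdf)
    return density / (1.0 - cdf)


for u in (-2.0, 0.0, 1.5):
    # Letting the second argument grow recovers the marginal distribution of the first.
    print(u, joint_cdf(u, 60.0), logistic_cdf(u), logistic_hazard(u))
```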


For the model (1.12) we denote by $\beta_{10}, \beta_{20}, \sigma_{10}, \sigma_{20}$ the true values of the regression and scale parameters, reserving $\beta_1, \beta_2, \sigma_1, \sigma_2$ for the free parameters in the likelihood function. The vectors $\beta_{10}, \beta_{20}$ each have dimension p and $\sigma_{10}, \sigma_{20}$ are strictly positive scalars. Denote by $\theta_{k0}$ the (p+1)-dimensional vector $\theta_{k0} = (\beta_{k0}', \sigma_{k0})'$.

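By the specification of the linear model (1.12), given the covariates $Z_{1i}, Z_{2i}$ the time-cost vector $(T_i, C_i)'$ has the distribution function $F^{*}(\cdot,\cdot,\theta_{10},\theta_{20},\rho_0)$ and density $f^{*}(\cdot,\cdot,\theta_{10},\theta_{20},\rho_0)$, where
\[
\begin{aligned}
F^{*}(t,c,\theta_{10},\theta_{20},\rho_0) &= F\!\left[\frac{t-\beta_{10}'Z_{1i}}{\sigma_{10}}, \frac{c-\beta_{20}'Z_{2i}}{\sigma_{20}} \,\Big|\, \rho_0\right],\\
f^{*}(t,c,\theta_{10},\theta_{20},\rho_0) &= (\sigma_{10}\sigma_{20})^{-1} f\!\left[\frac{t-\beta_{10}'Z_{1i}}{\sigma_{10}}, \frac{c-\beta_{20}'Z_{2i}}{\sigma_{20}} \,\Big|\, \rho_0\right].
\end{aligned}
\tag{1.13}
\]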

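Marginally, given $Z_{1i}$, $T_i$ has distribution function $F_{1i}(\cdot,\theta_{10})$, density $f_{1i}(\cdot,\theta_{10})$ and hazard function $\alpha_{1i}(\cdot,\theta_{10})$, where
\[
\begin{aligned}
F_{1i}(t,\theta_{10}) &= F_0\!\left[\frac{t-\beta_{10}'Z_{1i}}{\sigma_{10}}\right],\\
f_{1i}(t,\theta_{10}) &= \sigma_{10}^{-1} f_0\!\left[\frac{t-\beta_{10}'Z_{1i}}{\sigma_{10}}\right],\\
\alpha_{1i}(t,\theta_{10}) &= \sigma_{10}^{-1} \alpha_0\!\left[\frac{t-\beta_{10}'Z_{1i}}{\sigma_{10}}\right].
\end{aligned}
\tag{1.14}
\]
Similarly, given $Z_{2i}$, we define for the costs $C_i$ the functions $F_{2i}(\cdot,\theta_{20})$, $f_{2i}(\cdot,\theta_{20})$, $\alpha_{2i}(\cdot,\theta_{20})$.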
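For the normal errors of Example 1.2.1 the three marginal functions in (1.14) have closed forms. The sketch below evaluates them for one patient; the helper names and the numerical values are illustrative assumptions, not part of the original analysis.

```python
import math


def normal_pdf(v):
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


def normal_sf(v):
    """Survival function 1 - Phi(v) of the standard normal distribution."""
    return 0.5 * math.erfc(v / math.sqrt(2.0))


def marginal_functions(t, beta, sigma, z):
    """Return (F_1i, f_1i, alpha_1i) of (1.14) at t for a standard normal F_0."""
    w = (t - sum(b * x for b, x in zip(beta, z))) / sigma  # standardized residual
    cdf = 1.0 - normal_sf(w)
    density = normal_pdf(w) / sigma
    hazard = density / normal_sf(w)  # equals alpha_0(w) / sigma
    return cdf, density, hazard


# Evaluate at the conditional median t = beta'z, where F_1i must equal 0.5.
print(marginal_functions(2.5, (2.0, 0.5), 0.4, (1.0, 1.0)))
```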


With i.i.d. data $(X_{ki}, \delta_{ki}, Z_{ki})$ on n patients we estimate $\theta_{k0}$ by the maximum partial likelihood estimator $\hat\theta_k$. We will show that $(\hat\theta_1', \hat\theta_2')'$ has an asymptotic 2(p+1)-variate normal distribution and we will provide a consistent estimator of the asymptotic covariance matrix. Then point estimates and confidence intervals for the median LOS and median cost will be obtained for a specified covariate profile. Detailed formulas will be provided for the bivariate normal case (see Example 1.2.1).

1.2.1 Estimation of the Model Parameters

Consider the study observation period restricted to the finite time interval $[0,\tau_1]$, $\tau_1 < \infty$. Similarly costs are assumed to be upper-bounded by the finite cost $\tau_2$.

The processes $N_{ki}(u), Y_{ki}(u), N_k(u)$ have the same definition and interpretation as in Section 1.1.1. The argument u is again used to denote either t or c, depending on the subscript $k \in \{1,2\}$. As mentioned, u takes values in the finite interval $[0,\tau_k]$.

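Define the $(p+1)\times(p+1)$ matrix
\[
\Sigma_k = \begin{pmatrix} \Sigma_k^{11} & \Sigma_k^{12} \\ (\Sigma_k^{12})' & \sigma_k^{22} \end{pmatrix},
\]
where the $p \times p$ matrix $\Sigma_k^{11}$ is
\[
\Sigma_k^{11} = \sigma_{k0}^{-3}\, E\left\{ Z_{k1}Z_{k1}' \int_0^{\tau_k} \frac{\alpha_0'^{\,2}}{\alpha_0}\!\left(\frac{u-\beta_{k0}'Z_{k1}}{\sigma_{k0}}\right) Y_{k1}(u)\,du \right\},
\]
the p-dimensional vector $\Sigma_k^{12}$ has the form
\[
\Sigma_k^{12} = \sigma_{k0}^{-3}\, E\left\{ Z_{k1} \int_0^{\tau_k} \alpha_0'\!\left(\frac{u-\beta_{k0}'Z_{k1}}{\sigma_{k0}}\right) \left[ 1 + \frac{u-\beta_{k0}'Z_{k1}}{\sigma_{k0}}\, \frac{\alpha_0'}{\alpha_0}\!\left(\frac{u-\beta_{k0}'Z_{k1}}{\sigma_{k0}}\right) \right] Y_{k1}(u)\,du \right\}
\]
and the scalar $\sigma_k^{22}$ is
\[
\sigma_k^{22} = \sigma_{k0}^{-3}\, E\left\{ \int_0^{\tau_k} \left[ 1 + \frac{u-\beta_{k0}'Z_{k1}}{\sigma_{k0}}\, \frac{\alpha_0'}{\alpha_0}\!\left(\frac{u-\beta_{k0}'Z_{k1}}{\sigma_{k0}}\right) \right]^2 \alpha_0\!\left(\frac{u-\beta_{k0}'Z_{k1}}{\sigma_{k0}}\right) Y_{k1}(u)\,du \right\}.
\]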

The following assumptions, similar to those used in Section 1.1, will be assumed to hold throughout this section:

Model Assumptions:

B.1 Conditional on $Z_{1i}(\cdot)$, $T_i$ and $T_i'$ are independent, and conditional on $Z_{2i}(\cdot)$, $C_i$ and $C_i'$ are independent (Independent Censoring Assumption);

B.2 $\{(X_{1i}, X_{2i})', (\delta_{1i}, \delta_{2i})', (Z_{1i}(\cdot)', Z_{2i}(\cdot)')'\}$, $1 \le i \le n$, are i.i.d.;

B.3 $Z_{1i}$, $Z_{2i}$ are bounded;

B.4 The hazard function $\alpha_0$ is strictly positive and its derivatives of first, second and third order exist and are continuous;

B.5 The matrices $\Sigma_1, \Sigma_2$ are nonsingular.

Under the independent censoring assumption, the counting process $N_{ki}(\cdot)$ has the compensator $\Lambda_{ki}(u,\theta_{k0}) = \int_0^u \lambda_{ki}(v,\theta_{k0})\,dv$, where the intensity process $\lambda_{ki}(v,\theta_{k0})$ has the form $\lambda_{ki}(v,\theta_{k0}) = \alpha_{ki}(v,\theta_{k0})\,Y_{ki}(v)$, with the hazard function $\alpha_{ki}(\cdot,\theta_{k0})$ defined in (1.14).

Let $M_{ki}$ denote the local square integrable martingales $M_{ki}(u) = N_{ki}(u) - \Lambda_{ki}(u,\theta_{k0})$ and $M_k = \sum_{i=1}^n M_{ki}$.
Properties of stochastic processes, such as being a local martingale, are relative to a filtration $(\mathcal{F}_k^{(n)}(u), u \in [0,\tau_k])$ of sub $\sigma$-algebras on the n-th sample space $(\Omega_k^{(n)}, \mathcal{F}_k^{(n)}, P_k^{(n)})$; $\mathcal{F}_k^{(n)}(u)$ represents everything that happens up to the point u in the n-th model. For the joint time-cost stochastic processes, we consider the filtration $(\mathcal{F}_1^{(n)}(t) \otimes \mathcal{F}_2^{(n)}(c), (t,c) \in [0,\tau_1]\times[0,\tau_2])$ on the product space $(\Omega_1^{(n)} \times \Omega_2^{(n)}, \mathcal{F}_1^{(n)} \otimes \mathcal{F}_2^{(n)}, P_1^{(n)} \otimes P_2^{(n)})$. Details of the definitions of the filtrations are provided in Section 1.1.1.

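An estimator $\hat\theta_k$ of $\theta_{k0}$ is obtained by maximizing the partial likelihood
\[
L_{\tau_k}(\theta_k) = \prod_{i=1}^n \Bigl\{ \prod_{u \in [0,\tau_k]} \lambda_{ki}(u,\theta_k)^{\Delta N_{ki}(u)} \Bigr\} \exp\Bigl( -\int_0^{\tau_k} \lambda_{ki}(v,\theta_k)\,dv \Bigr).
\]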

A summary of the main results on partial likelihood for counting processes can be found in Section II.7 of Andersen et al. (1993).29 Likelihood representations for general counting process models were first given by Jacod (1973, 1975).36, 37 Use of the product integral concept to make the otherwise rather involved formulas more interpretable goes back to Johansen (1983).38 Arjas and Haara (1984)39 were the first to describe the notions of independent censoring rigorously by formulating these in terms of likelihoods of intensity processes of general marked point processes.

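The log-partial likelihood is
\[
C_{\tau_k}(\theta_k) = \sum_{i=1}^n \Bigl\{ \int_0^{\tau_k} \log \lambda_{ki}(u,\theta_k)\,dN_{ki}(u) - \int_0^{\tau_k} \lambda_{ki}(u,\theta_k)\,du \Bigr\}.
\]
Because $\int_0^{\tau_k} \log Y_{ki}(u)\,dN_{ki}(u) = [0 \le X_{ki} \le \tau_k]\,\delta_{ki} \log[X_{ki} \ge X_{ki}] = 0$, we can also write
\[
C_{\tau_k}(\theta_k) = \sum_{i=1}^n \Bigl\{ \int_0^{\tau_k} \log \alpha_{ki}(u,\theta_k)\,dN_{ki}(u) - \int_0^{\tau_k} \lambda_{ki}(u,\theta_k)\,du \Bigr\}. \tag{1.15}
\]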
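For a concrete reading of (1.15), note that each patient contributes $\delta_{ki}\log\alpha_{ki}(X_{ki},\theta_k)$ minus the cumulative hazard $\int_0^{X_{ki}}\alpha_{ki}(u,\theta_k)\,du$ accumulated while at risk. The sketch below evaluates this for standard normal errors (Example 1.2.1), assuming every $X_{ki}$ lies in $[0,\tau_k]$; the function names are illustrative and this is not the SAS routine used later in the chapter.

```python
import math


def normal_pdf(v):
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


def normal_sf(v):
    """Survival function 1 - Phi(v) of the standard normal distribution."""
    return 0.5 * math.erfc(v / math.sqrt(2.0))


def log_partial_likelihood(beta, sigma, x, delta, z):
    """Evaluate (1.15) for a standard normal F_0, assuming 0 <= x_i <= tau_k."""
    total = 0.0
    for xi, di, zi in zip(x, delta, z):
        mean = sum(b * v for b, v in zip(beta, zi))
        s = (xi - mean) / sigma   # standardized observed value
        a = -mean / sigma         # standardized lower limit u = 0
        log_hazard = math.log(normal_pdf(s)) - math.log(normal_sf(s)) - math.log(sigma)
        # Integral of alpha_ki over [0, x_i] equals log S_0(a) - log S_0(s).
        cumulative_hazard = math.log(normal_sf(a)) - math.log(normal_sf(s))
        total += di * log_hazard - cumulative_hazard
    return total


# One uncensored and one censored observation with an intercept-only model.
print(log_partial_likelihood((1.0,), 1.0, [1.0, 2.0], [1, 0], [(1.0,), (1.0,)]))
```

A censored observation contributes only the negative cumulative hazard, which is why its term is never positive.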

Assuming we may interchange the order of differentiation and integration, the vector $U_{\tau_k}(\theta_k)$ of derivatives of $C_{\tau_k}(\theta_k)$ with respect to $\theta_k$ has the components
\[
U_{\tau_k}^{j}(\theta_k) = \sum_{i=1}^n \Bigl\{ \int_0^{\tau_k} \frac{\partial}{\partial\theta_{kj}} \log \alpha_{ki}(u,\theta_k)\,dN_{ki}(u) - \int_0^{\tau_k} \frac{\partial}{\partial\theta_{kj}} \alpha_{ki}(u,\theta_k)\,Y_{ki}(u)\,du \Bigr\},
\]
$j \in \{1,\dots,q\}$, $q = p+1$.

The log-partial likelihood may have a number of local maxima, so the equation $U_{\tau_k}(\theta_k) = 0$ may have multiple solutions. We will consider the maximum partial likelihood estimator $\hat\theta_k$ given as a solution to the equation $U_{\tau_k}(\theta_k) = 0$. If more than one solution is found in a concrete situation, one could then check which of these gives the largest value of the log-partial likelihood function.

1.2.2 Large Sample Properties of the Parameter Estimators

The asymptotic properties of the parameter estimators $(\hat\theta_1', \hat\theta_2')'$ hold under some general "regularity" conditions D.a-D.d. These conditions are stated on pp. 420-421 of Andersen et al. (1993).29 They were used by Borgan (1984), who studied maximum likelihood estimation for the multiplicative intensity model. We adopt the notation $\frac{\partial}{\partial\theta_{kj}} g(\theta_{k0})$ for $\frac{\partial}{\partial\theta_{kj}} g(\theta_k)\big|_{\theta_k=\theta_{k0}}$. The dimension of the vector $\theta_k$ of parameters is $q = p+1$.
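Conditions D.a-D.d:

D.a There exists a neighborhood $\Theta_{k0}$ of $\theta_{k0}$ such that for every n, $\theta_k \in \Theta_{k0}$ and for almost all $u \in [0,\tau_k]$, the partial derivatives of $\alpha_{ki}(u,\theta_k)$ and $\log\alpha_{ki}(u,\theta_k)$ of first, second and third order with respect to $\theta_k$ exist and are continuous in $\theta_k$ for $\theta_k \in \Theta_{k0}$.

Moreover, the log-partial likelihood may be differentiated three times with respect to $\theta_k \in \Theta_{k0}$ by interchanging the order of integration and differentiation.

D.b There exist a sequence $\{a_n, n \ge 1\}$ of non-negative constants increasing to infinity as $n \to \infty$ and finite functions $\sigma_k^{jl}(\theta_k)$ defined on $\Theta_{k0}$ such that for all $j,l \in \{1,\dots,q\}$:
\[
a_n^{-2} \int_0^{\tau_k} \sum_{i=1}^n \frac{\partial}{\partial\theta_{kj}} \log\alpha_{ki}(u,\theta_{k0})\, \frac{\partial}{\partial\theta_{kl}} \log\alpha_{ki}(u,\theta_{k0})\, \lambda_{ki}(u,\theta_{k0})\,du \xrightarrow{P} \sigma_k^{jl}(\theta_{k0}) \quad \text{as } n \to \infty.
\]

D.c The matrix $\Sigma_k = (\sigma_k^{jl}(\theta_{k0}), j,l \in \{1,\dots,q\})$, with $\sigma_k^{jl}(\theta_{k0})$ defined in condition D.b, is positive definite.

D.d For every n there exist predictable processes $G_{ki}$ and $H_{ki}$ not depending on $\theta_k$ such that for all $u \in [0,\tau_k]$:

D.d.1 $\displaystyle \sup_{\theta_k \in \Theta_{k0}} \left| \frac{\partial^3}{\partial\theta_{kj}\,\partial\theta_{kl}\,\partial\theta_{km}} \alpha_{ki}(u,\theta_k) \right| \le G_{ki}(u)$

D.d.2 $\displaystyle \sup_{\theta_k \in \Theta_{k0}} \left| \frac{\partial^3}{\partial\theta_{kj}\,\partial\theta_{kl}\,\partial\theta_{km}} \log\alpha_{ki}(u,\theta_k) \right| \le H_{ki}(u)$

for all $j,l,m \in \{1,\dots,q\}$.

Moreover:

D.d.3 $\displaystyle a_n^{-2} \int_0^{\tau_k} \sum_{i=1}^n G_{ki}(u)\,du$

D.d.4 $\displaystyle a_n^{-2} \int_0^{\tau_k} \sum_{i=1}^n H_{ki}(u)\,\lambda_{ki}(u,\theta_{k0})\,du$

D.d.5 $\displaystyle a_n^{-2} \int_0^{\tau_k} \sum_{i=1}^n \left\{ \frac{\partial^2}{\partial\theta_{kj}\,\partial\theta_{kl}} \log\alpha_{ki}(u,\theta_{k0}) \right\}^2 \lambda_{ki}(u,\theta_{k0})\,du$

converge in probability to finite quantities as $n \to \infty$, and for all $\varepsilon > 0$:

D.d.6 $\displaystyle a_n^{-2} \int_0^{\tau_k} \sum_{i=1}^n H_{ki}(u)\,\bigl[a_n^{-1} H_{ki}(u)^{1/2} > \varepsilon\bigr]\,\lambda_{ki}(u,\theta_{k0})\,du \xrightarrow{P} 0.$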

Note:

Deriving the statistical properties of the maximum likelihood estimators $(\hat\theta_1, \hat\theta_2)$ involves martingale results and Taylor expansions. Condition D.b ensures the convergence in probability of the predictable covariation processes of certain martingales. Conditions D.b and D.c are crucial in the proof of the existence and consistency of the parameter estimators. By condition D.a the Taylor expansions are valid, whereas D.d ensures that the remainder terms in these expansions will behave properly. $\square$

Proposition

Under our model assumptions B.1-B.5, the conditions D.a-D.d are verified. $\square$

(The matrix $\Sigma_k$ from conditions D.b, D.c is the one defined at the beginning of Section 1.2.1.)

Proof Proposition:

We consider $\Theta_{k0} = \Theta_{\beta k0} \times I_{k0}$, a neighborhood of $\theta_{k0}$ such that $\Theta_{\beta k0} \subset \mathbb{R}^p$ and $I_{k0} \subset (0,\infty)$ are compact sets and $\beta_{k0} \in \mathring{\Theta}_{\beta k0}$, $\sigma_{k0} \in \mathring{I}_{k0}$. We denote by $\mathring{A}$ the largest open set included in A, also called the interior of A.

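Condition D.a:

Recall that $\alpha_{ki}(u,\theta_k) = \sigma_k^{-1}\alpha_0\!\left[\frac{u-\beta_k'Z_{ki}}{\sigma_k}\right]$, where $\theta_k = (\beta_k', \sigma_k)' \in \mathbb{R}^p \times (0,\infty)$. Then
\[
\log\alpha_{ki}(u,\theta_k) = -\log\sigma_k + \log\alpha_0\!\left[\frac{u-\beta_k'Z_{ki}}{\sigma_k}\right].
\]
The log function is infinitely differentiable on $(0,\infty)$, with continuous derivatives. By assumption B.4, the first part of condition D.a is verified.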

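The log-partial likelihood was given in (1.15):
\[
C_{\tau_k}(\theta_k) = \sum_{i=1}^n \Bigl\{ \int_0^{\tau_k} \log\alpha_{ki}(u,\theta_k)\,dN_{ki}(u) - \int_0^{\tau_k} \alpha_{ki}(u,\theta_k)\,Y_{ki}(u)\,du \Bigr\},
\]
where $\int_0^{\tau_k} \log\alpha_{ki}(u,\theta_k)\,dN_{ki}(u) = [0 \le X_{ki} \le \tau_k]\,\delta_{ki}\log\alpha_{ki}(X_{ki},\theta_k)$. By the differentiability properties of $\log\alpha_{ki}$, what is left to be proved is that $\int_0^{\tau_k} \alpha_{ki}(u,\theta_k)\,Y_{ki}(u)\,du$ can be differentiated three times with respect to $\theta_k \in \Theta_{k0}$ by interchanging the order of integration and differentiation. We will show only that
\[
\frac{\partial}{\partial\beta_{kj}} \int_0^{\tau_k} \alpha_{ki}(u,\theta_k)\,Y_{ki}(u)\,du = \int_0^{\tau_k} \frac{\partial}{\partial\beta_{kj}} \alpha_{ki}(u,\theta_k)\,Y_{ki}(u)\,du, \tag{1.16}
\]
the proofs of the other relations having similar arguments.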

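The next theorem gives sufficient conditions for differentiation under the integral sign. For a proof for the 1-dimensional case see Theorem 4, p. 30, Fabian and Hannan (1985):

Theorem (Differentiation under the integral sign)

Let $\Theta$ be a compact set in $\mathbb{R}^q$, $\theta_0$ a point in $\Theta$ and $f(u,\theta)$ a function on $[0,\tau]\times\Theta$ such that

i) $f(\cdot,\theta)$ is Lebesgue measurable for every $\theta \in \Theta$;

ii) $f(\cdot,\theta_0)$ is integrable;

iii) for every $u \in [0,\tau]$ the partial derivative $\frac{\partial}{\partial\theta_j} f(u,\theta)$ exists on $\Theta$ for $j \in \{1,\dots,q\}$ and there exist integrable functions $g_j$, $j \in \{1,\dots,q\}$, such that
\[
\left| \frac{\partial}{\partial\theta_j} f(u,\theta) \right| \le g_j(u) \quad \forall\, u \in [0,\tau],\ \theta \in \Theta.
\]
Then $f(\cdot,\theta)$ is integrable for every $\theta \in \Theta$ and for every $j \in \{1,\dots,q\}$
\[
\frac{\partial}{\partial\theta_j} \int_0^{\tau} f(u,\theta)\,du = \int_0^{\tau} \frac{\partial}{\partial\theta_j} f(u,\theta)\,du. \qquad \square
\]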

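For every fixed $\theta_k$, $\alpha_{ki}(u,\theta_k)\,Y_{ki}(u)$ is integrable because
\[
\int_0^{\tau_k} \alpha_{ki}(u,\theta_k)\,Y_{ki}(u)\,du \le \int_0^{\tau_k} \alpha_{ki}(u,\theta_k)\,du
\]
and $\alpha_{ki}(\cdot,\theta_k)$ is continuous on $[0,\tau_k]$, so it is bounded. Also, for $\theta_k = (\beta_k', \sigma_k)' \in \Theta_{k0}$ and $j \in \{1,\dots,q\}$:
\[
\left| \frac{\partial}{\partial\theta_{kj}} \alpha_{ki}(u,\theta_k)\,Y_{ki}(u) \right| \le \sup_{\theta_k \in \Theta_{k0}} \left| \frac{\partial}{\partial\theta_{kj}} \alpha_{ki}(u,\theta_k) \right|.
\]
Using $\frac{\partial}{\partial\beta_k}\alpha_{ki}(u,\theta_k) = -\sigma_k^{-2}\,\alpha_0'\!\left[\frac{u-\beta_k'Z_{ki}}{\sigma_k}\right] Z_{ki}$ and the boundedness of the components of $Z_{ki}$, we obtain
\[
\left| \frac{\partial}{\partial\beta_{kj}} \alpha_{ki}(u,\theta_k)\,Y_{ki}(u) \right| \le \sup_{\theta_k \in \Theta_{k0}} \sigma_k^{-2} \left| \alpha_0'\!\left[\frac{u-\beta_k'Z_{ki}}{\sigma_k}\right] \right| |Z_{kij}| =: s(u).
\]
By the following Lemma (see Lemma 1, p. 635, Jennrich (1969)), $s(\cdot)$ is a continuous function on $[0,\tau_k]$. Consequently it is bounded and hence integrable over any finite interval.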

Lemma 1:

If g is a real valued function continuous on the Cartesian product $\mathcal{X} \times \mathcal{Y}$ of two Euclidean spaces and if Y is a bounded subset of $\mathcal{Y}$, then $\sup_{y \in Y} g(x,y)$ is a continuous function of x. $\square$

The conditions i-iii of the theorem for differentiation under the integral sign are therefore verified and (1.16) is proved.

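Condition D.b:

Let $a_n = n^{1/2}$. We will show

D.b.1 $\displaystyle n^{-1} \int_0^{\tau_k} \sum_{i=1}^n \left( \frac{\partial}{\partial\beta_k} \log\alpha_{ki}(u,\theta_{k0}) \right) \left( \frac{\partial}{\partial\beta_k} \log\alpha_{ki}(u,\theta_{k0}) \right)' \lambda_{ki}(u,\theta_{k0})\,du \xrightarrow{P} \Sigma_k^{11}$, where all the elements of the $p \times p$ matrix $\Sigma_k^{11}$ are finite;

D.b.2 $\displaystyle n^{-1} \int_0^{\tau_k} \sum_{i=1}^n \left( \frac{\partial}{\partial\sigma_k} \log\alpha_{ki}(u,\theta_{k0}) \right)^2 \lambda_{ki}(u,\theta_{k0})\,du \xrightarrow{P} \sigma_k^{22} < \infty$;

D.b.3 $\displaystyle n^{-1} \int_0^{\tau_k} \sum_{i=1}^n \frac{\partial}{\partial\beta_k} \log\alpha_{ki}(u,\theta_{k0})\, \frac{\partial}{\partial\sigma_k} \log\alpha_{ki}(u,\theta_{k0})\, \lambda_{ki}(u,\theta_{k0})\,du \xrightarrow{P} \Sigma_k^{12}$, where all the elements of the $p \times 1$ matrix $\Sigma_k^{12}$ are finite.

The derivative functions involved in the relations D.b.1-D.b.3 are
\[
\begin{aligned}
\frac{\partial}{\partial\beta_k} \log\alpha_{ki}(u,\theta_k) &= -\sigma_k^{-1}\, \frac{\alpha_0'}{\alpha_0}\!\left(\frac{u-\beta_k'Z_{ki}}{\sigma_k}\right) Z_{ki},\\
\frac{\partial}{\partial\sigma_k} \log\alpha_{ki}(u,\theta_k) &= -\sigma_k^{-1} \left\{ 1 + \frac{u-\beta_k'Z_{ki}}{\sigma_k}\, \frac{\alpha_0'}{\alpha_0}\!\left(\frac{u-\beta_k'Z_{ki}}{\sigma_k}\right) \right\}.
\end{aligned}
\]
Relations D.b.1-D.b.3 can be proved through a version of the Strong Law of Large Numbers (SLLN). We will show the details of the proof for D.b.1; the other two can be proved using similar arguments.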
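The two derivative formulas can be checked against finite differences. The sketch below does so for the standard normal baseline of Example 1.2.1, for which $\alpha_0'(w)/\alpha_0(w) = \alpha_0(w) - w$; that baseline, the function names and the test point are assumptions made only for this illustration.

```python
import math


def normal_hazard(v):
    """Hazard alpha_0 of the standard normal distribution."""
    pdf = math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)
    return pdf / (0.5 * math.erfc(v / math.sqrt(2.0)))


def log_alpha(u, beta, sigma, z):
    """log alpha_ki(u, theta_k) = -log(sigma) + log alpha_0((u - beta'z) / sigma)."""
    w = (u - sum(b * x for b, x in zip(beta, z))) / sigma
    return -math.log(sigma) + math.log(normal_hazard(w))


def grad_log_alpha(u, beta, sigma, z):
    """Analytic derivatives of log alpha_ki with respect to beta and sigma."""
    w = (u - sum(b * x for b, x in zip(beta, z))) / sigma
    ratio = normal_hazard(w) - w  # alpha_0'(w) / alpha_0(w) for the standard normal
    d_beta = [-ratio * zj / sigma for zj in z]
    d_sigma = -(1.0 + w * ratio) / sigma
    return d_beta, d_sigma


u, beta, sigma, z, h = 1.3, [0.4, 0.2], 0.7, (1.0, 2.0), 1e-6
d_beta, d_sigma = grad_log_alpha(u, beta, sigma, z)
# Central difference in sigma for comparison with the analytic value.
fd_sigma = (log_alpha(u, beta, sigma + h, z) - log_alpha(u, beta, sigma - h, z)) / (2 * h)
print(d_sigma, fd_sigma)
```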

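Consider the $p \times p$ matrix
\[
V_i(u,\theta_{k0}) = \left( \frac{\partial}{\partial\beta_k} \log\alpha_{ki}(u,\theta_{k0}) \right) \left( \frac{\partial}{\partial\beta_k} \log\alpha_{ki}(u,\theta_{k0}) \right)' \lambda_{ki}(u,\theta_{k0})
= \sigma_{k0}^{-3}\, \frac{\alpha_0'^{\,2}}{\alpha_0}\!\left(\frac{u-\beta_{k0}'Z_{ki}}{\sigma_{k0}}\right) Y_{ki}(u)\, Z_{ki}Z_{ki}', \quad u \in [0,\tau_k].
\]
The function $\alpha_0'^{\,2}/\alpha_0$ is continuous and $Y_{ki}(\cdot)$ is left continuous with right-hand limits. Let $V_i^{*}(u,\theta_{k0}) = V_i(\tau_k - u,\theta_{k0})$, $u \in [0,\tau_k]$. Then the $p^2$ components of the matrix $V_i^{*}(u,\theta_{k0})$ are random elements of $D[0,\tau_k]$, the set of right-continuous real valued functions with left-hand limits on $[0,\tau_k]$. The space $D[0,\tau_k]$ is endowed with the Skorohod topology. We will apply an extension of the SLLN for $D[0,\tau_k]$. For the proof for $D[0,1]$, see Rao R.R. (1963).43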
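SLLN for $D[0,\tau_k]$

Let $X_i$, $i \ge 1$, be i.i.d. random elements of $D[0,\tau_k]$. Suppose that $E \sup_{u \in [0,\tau_k]} |X_1(u)| < \infty$. Then
\[
\sup_{u \in [0,\tau_k]} \Bigl| n^{-1} \sum_{i=1}^n X_i(u) - E X_1(u) \Bigr| \to 0 \quad \text{a.e. as } n \to \infty. \qquad \square
\]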

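Let $V_{ijl}(\cdot,\theta_{k0})$ be the $(j,l)$ component of the matrix $V_i(\cdot,\theta_{k0})$. Because the $Z_{ki}$ are bounded,
\[
E \sup_{u \in [0,\tau_k]} |V^{*}_{1jl}(u,\theta_{k0})| = E \sup_{u \in [0,\tau_k]} |V_{1jl}(u,\theta_{k0})|
\le E \sup_{u \in [0,\tau_k]} \sigma_{k0}^{-3}\, \frac{\alpha_0'^{\,2}}{\alpha_0}\!\left(\frac{u-\beta_{k0}'Z_{k1}}{\sigma_{k0}}\right) |Z_{k1j}Z_{k1l}| =: E\,g(Z_{k1}).
\]
By Lemma 1, the function g is continuous, so the boundedness of $Z_{k1}$ implies $E\,g(Z_{k1}) < \infty$. By the extended SLLN we obtain
\[
\sup_{u \in [0,\tau_k]} \Bigl| n^{-1} \sum_{i=1}^n V^{*}_{ijl}(u,\theta_{k0}) - E V^{*}_{1jl}(u,\theta_{k0}) \Bigr| \to 0 \quad \text{a.e. as } n \to \infty,
\]
which implies
\[
\sup_{u \in [0,\tau_k]} \Bigl| n^{-1} \sum_{i=1}^n V_{ijl}(u,\theta_{k0}) - E V_{1jl}(u,\theta_{k0}) \Bigr| \to 0 \quad \text{a.e. as } n \to \infty.
\]
Because $\tau_k < \infty$, the previous relation implies
\[
\int_0^{\tau_k} n^{-1} \sum_{i=1}^n V_{ijl}(u,\theta_{k0})\,du \to \int_0^{\tau_k} E V_{1jl}(u,\theta_{k0})\,du = (\Sigma_k^{11})_{jl}.
\]
Every $(j,l)$ component of the matrix $\Sigma_k^{11}$ is finite:
\[
|(\Sigma_k^{11})_{jl}| \le \tau_k\, E \sup_{u \in [0,\tau_k]} |V_{1jl}(u,\theta_{k0})| < \infty.
\]
Therefore condition D.b.1 holds.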

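Condition D.c:

By assumption B.5, the matrix $\Sigma_k$ is nonsingular. Consequently we need to show that $\Sigma_k$ is positive semidefinite. Let $x \in \mathbb{R}^p$, $y \in \mathbb{R}$. It is easy to show that
\[
(x', y)\,\Sigma_k\,(x', y)' = E \int_0^{\tau_k} \left( x' \frac{\partial}{\partial\beta_k} \log\alpha_{k1}(u,\theta_{k0}) + y\,\frac{\partial}{\partial\sigma_k} \log\alpha_{k1}(u,\theta_{k0}) \right)^2 \lambda_{k1}(u,\theta_{k0})\,du \ge 0.
\]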

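Condition D.d:

Let $u \in [0,\tau_k]$ and $j,l,m \in \{1,\dots,p+1\}$. Then
\[
\sup_{\theta_k \in \Theta_{k0}} \left| \frac{\partial^3}{\partial\theta_{kj}\,\partial\theta_{kl}\,\partial\theta_{km}} \alpha_{ki}(u,\theta_k) \right|
\le \sup_{u \in [0,\tau_k]}\, \sup_{\theta_k \in \Theta_{k0}} \left| \frac{\partial^3}{\partial\theta_{kj}\,\partial\theta_{kl}\,\partial\theta_{km}} \alpha_{ki}(u,\theta_k) \right| =: g_1(Z_{ki}) = G_{ki}.
\]
Variables $G_{ki}$ are bounded because $g_1$ is continuous by Lemma 1 and $Z_{ki}$ are bounded. Therefore condition D.d.1 is verified. Because variables $G_{ki}$ do not depend on the argument u, condition D.d.3 follows by the regular SLLN. Similar arguments can be used for the rest of the D.d conditions.

This completes the proof of the Proposition. $\blacksquare$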

The next results state that, with a probability tending to one, there exists a solution
of the likelihood equation and this solution is consistent. However, this does not rule out
the possibility of the likelihood equation having other, possibly inconsistent, solutions.

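Under condition D.a the vector $U_{\tau_k}(\theta_k)$ of score statistics has the components
\[
U_{\tau_k}^{j}(\theta_k) = \sum_{i=1}^n \int_0^{\tau_k} \frac{\partial}{\partial\theta_{kj}} \log\alpha_{ki}(u,\theta_k)\,dN_{ki}(u) - \sum_{i=1}^n \int_0^{\tau_k} \frac{\partial}{\partial\theta_{kj}} \alpha_{ki}(u,\theta_k)\,Y_{ki}(u)\,du.
\]
Let $-\mathcal{I}_{\tau_k}^{jl}(\theta_k)$, $R_{\tau_k}^{jlm}(\theta_k)$ denote the second and third order partial derivatives of the log-partial likelihood $C_{\tau_k}(\theta_k)$.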

Theorem 1 (Theorem VI.1.1, p. 422, Andersen et al. (1993)29)

Under the assumptions B.1, B.2 and conditions D.a-D.d, with a probability tending to one, the equation $U_{\tau_k}(\theta_k) = 0$ has a solution $\hat\theta_k$ and $\hat\theta_k \xrightarrow{P} \theta_{k0}$ as $n \to \infty$. $\square$

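Sketch of the proof:

By condition D.a, a Taylor expansion gives for every $\theta_k \in \Theta_{k0}$, $j \in \{1,\dots,q\}$ that
\[
U_{\tau_k}^{j}(\theta_k) = U_{\tau_k}^{j}(\theta_{k0}) - \sum_{l=1}^q (\theta_{kl} - \theta_{k0l})\,\mathcal{I}_{\tau_k}^{jl}(\theta_{k0})
+ \frac{1}{2} \sum_{l=1}^q \sum_{m=1}^q (\theta_{kl} - \theta_{k0l})(\theta_{km} - \theta_{k0m})\,R_{\tau_k}^{jlm}(\theta_k^{*}),
\]
where $\theta_k^{*}$ is on the line segment joining $\theta_k$ and $\theta_{k0}$.

Step 1:
\[
a_n^{-2}\,U_{\tau_k}^{j}(\theta_{k0}) \xrightarrow{P} 0.
\]
Essential in this step is that
\[
U_{\tau_k}^{j}(\theta_{k0}) = \sum_{i=1}^n \int_0^{\tau_k} \frac{\partial}{\partial\theta_{kj}} \log\alpha_{ki}(u,\theta_{k0})\,dM_{ki}(u).
\]

Step 2:
\[
a_n^{-2}\,\mathcal{I}_{\tau_k}^{jl}(\theta_{k0}) \xrightarrow{P} \sigma_k^{jl}(\theta_{k0}).
\]

Step 3:

There exists $M_k < \infty$, not depending on $\theta_k$, such that
\[
\lim_{n \to \infty} P(A_n) := \lim_{n \to \infty} P\bigl\{ |a_n^{-2} R_{\tau_k}^{jlm}(\theta_k)| < M_k \ \ \forall\, j,l,m,\ \forall\, \theta_k \in \Theta_{k0} \bigr\} = 1.
\]

Step 4:

Combine the previous steps and finish the proof of the Theorem. $\blacksquare$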

Next we prove the following result about the joint distribution of the maximum partial-likelihood estimators:

Theorem 2

Assume that our model assumptions B.1-B.5 hold. Let $\hat\theta_k$ be a consistent solution of the equation $U_{\tau_k}(\theta_k) = 0$. Then $n^{1/2}(\hat\theta_1' - \theta_{10}', \hat\theta_2' - \theta_{20}')'$ converges in distribution to a $2(p+1)$-dimensional normal random vector with zero mean and covariance matrix $C = (C_{kl}, k,l \in \{1,2\})$, where the $(p+1)\times(p+1)$ matrix $C_{kl}$ is
\[
C_{kl} = \Sigma_k^{-1} B_{kl}\, \Sigma_l^{-1},
\]
with $B_{kl} = E\bigl(v_{k1}(\theta_{k0})\,v_{l1}(\theta_{l0})'\bigr)$ and the $(p+1)$-dimensional vectors $v_{ki}(\theta_{k0})$ given by
\[
v_{ki}(\theta_{k0}) = \int_0^{\tau_k} \frac{\partial}{\partial\theta_k} \log\alpha_{ki}(u,\theta_{k0})\,dM_{ki}(u). \qquad \square
\]

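Proof Theorem 2:

For n sufficiently large $\hat\theta_k \in \Theta_{k0}$. Expanding $U_{\tau_k}(\hat\theta_k)$ around $\theta_{k0}$ gives
\[
n^{-1/2}\,U_{\tau_k}(\theta_{k0}) = n^{-1}\,\mathcal{I}_{\tau_k}(\theta_k^{*})\; n^{1/2}(\hat\theta_k - \theta_{k0}).
\]

Step 1:

$n^{-1/2}\bigl(U_{\tau_1}(\theta_{10})', U_{\tau_2}(\theta_{20})'\bigr)'$ converges in distribution to a 2q-dimensional normal random vector (q = p + 1), with mean zero and covariance matrix $B = (B_{kl}, k,l \in \{1,2\})$.

Because $n^{-1/2}\,U_{\tau_k}(\theta_{k0}) = n^{-1/2} \sum_{i=1}^n v_{ki}(\theta_{k0})$ is a sum of i.i.d. zero mean random vectors, the result of this step follows immediately from the Multivariate Central Limit Theorem.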

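Step 2:

$n^{-1}\,\mathcal{I}_{\tau_k}(\theta_k^{*}) \xrightarrow{P} \Sigma_k$ for any random $\theta_k^{*}$ such that $\theta_k^{*} \xrightarrow{P} \theta_{k0}$.

With probability approaching 1, $\theta_k^{*}$ lies in $\Theta_{k0}$. When $\theta_k^{*} \in \Theta_{k0}$, by the Taylor expansion, for every $j,l \in \{1,\dots,q\}$,
\[
n^{-1}\,\mathcal{I}_{\tau_k}^{jl}(\theta_k^{*}) = n^{-1}\,\mathcal{I}_{\tau_k}^{jl}(\theta_{k0}) - n^{-1} \sum_{m=1}^q (\theta_{km}^{*} - \theta_{k0m})\,R_{\tau_k}^{jlm}(\tilde\theta_k),
\]
where $\tilde\theta_k$ is on the line segment joining $\theta_k^{*}$ and $\theta_{k0}$.

By Step 2 of the proof of Theorem 1, $n^{-1}\,\mathcal{I}_{\tau_k}^{jl}(\theta_{k0}) \xrightarrow{P} \sigma_k^{jl}(\theta_{k0})$. We will show that the second term of the right-hand side of the previous expression converges in probability to zero.

Let the constant $M_k$ and the sequence of events $A_n$ be defined as in Step 3 of Theorem 1. We have that $P(A_n) \to 1$. Consider an arbitrary $\varepsilon > 0$ and the sequence of events
\[
B_n = \Bigl\{ \Bigl| n^{-1} \sum_{m=1}^q (\theta_{km}^{*} - \theta_{k0m})\,R_{\tau_k}^{jlm}(\tilde\theta_k) \Bigr| > \varepsilon \Bigr\}.
\]
We need to show that $P(B_n) \to 0$. This follows from the inequality
\[
0 \le P(B_n) = P(B_n \cap A_n) + P(B_n \cap A_n^{c}),
\]
where $A_n^{c}$ is the complementary set, and the results $P(B_n \cap A_n^{c}) \le P(A_n^{c}) \to 0$ and
\[
P(B_n \cap A_n) \le P\Bigl( M_k \sum_{m=1}^q |\theta_{km}^{*} - \theta_{k0m}| > \varepsilon \Bigr) \le P\Bigl( \|\theta_k^{*} - \theta_{k0}\| > \frac{\varepsilon}{q M_k} \Bigr) \to 0
\]
because $\theta_k^{*} \xrightarrow{P} \theta_{k0}$. We denote by $\|\cdot\|$ the supremum norm.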
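Step 3:

$n^{1/2}(\hat\theta_1' - \theta_{10}', \hat\theta_2' - \theta_{20}')'$ converges in distribution to a $2(p+1)$-dimensional normal random vector with mean zero and covariance matrix $C = A^{-1}BA^{-1}$, where $A = \mathrm{diag}(\Sigma_1, \Sigma_2)$ and $B = (B_{kl}, k,l \in \{1,2\})$.

By a Taylor expansion we can write
\[
n^{-1/2} \begin{pmatrix} U_{\tau_1}(\theta_{10}) \\ U_{\tau_2}(\theta_{20}) \end{pmatrix}
= \mathrm{diag}\bigl( n^{-1}\mathcal{I}_{\tau_1}(\theta_1^{*}),\ n^{-1}\mathcal{I}_{\tau_2}(\theta_2^{*}) \bigr)\; n^{1/2} \begin{pmatrix} \hat\theta_1 - \theta_{10} \\ \hat\theta_2 - \theta_{20} \end{pmatrix}.
\]
We apply the following lemma: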

Lemma 2 (Theorem 10.1, p. 62, Billingsley (1961))

Let E be a Euclidean space. Suppose $u_n$ is a random vector in E satisfying $u_n \xrightarrow{D} \mu$, where $\mu$ is a probability measure on E. Suppose further that $v_n$ is a second random vector in E satisfying either
\[
|u_n - v_n| \le \varepsilon_n |u_n| \quad \text{or} \quad |u_n - v_n| \le \varepsilon_n' |v_n|,
\]
where $\varepsilon_n \xrightarrow{P} 0$, $\varepsilon_n' \xrightarrow{P} 0$. Then $v_n$ has the same limiting distribution $\mu$ as $u_n$. $\square$

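We have
\[
\left\| n^{-1/2} \begin{pmatrix} U_{\tau_1}(\theta_{10}) \\ U_{\tau_2}(\theta_{20}) \end{pmatrix} - A\,n^{1/2} \begin{pmatrix} \hat\theta_1 - \theta_{10} \\ \hat\theta_2 - \theta_{20} \end{pmatrix} \right\|
\le \bigl\| \mathrm{diag}\bigl( n^{-1}\mathcal{I}_{\tau_1}(\theta_1^{*}),\ n^{-1}\mathcal{I}_{\tau_2}(\theta_2^{*}) \bigr) - A \bigr\| \times \left\| n^{1/2} \begin{pmatrix} \hat\theta_1 - \theta_{10} \\ \hat\theta_2 - \theta_{20} \end{pmatrix} \right\|.
\]
By Step 2 the first factor of the right-hand side converges to zero in probability. Applying Lemma 2 we obtain that $A\,n^{1/2}(\hat\theta_1' - \theta_{10}', \hat\theta_2' - \theta_{20}')'$ has the same asymptotic distribution as $n^{-1/2}\bigl(U_{\tau_1}(\theta_{10})', U_{\tau_2}(\theta_{20})'\bigr)'$. By Step 1, the result of Step 3 is shown and therefore Theorem 2 is proved. $\blacksquare$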

The next theorem provides a consistent estimator for the asymptotic covariance

matrix in Theorem 2.

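Theorem 3

Under the model assumptions B.1-B.5, the asymptotic covariance matrix $C = (C_{kl}, k,l \in \{1,2\})$ of $n^{1/2}(\hat\theta_1' - \theta_{10}', \hat\theta_2' - \theta_{20}')'$ is consistently estimated by $\hat C = (\hat C_{kl}, k,l \in \{1,2\})$, with
\[
\hat C_{kl} = n^2\, \mathcal{I}_{\tau_k}^{-1}(\hat\theta_k)\, \hat B_{kl}\, \mathcal{I}_{\tau_l}^{-1}(\hat\theta_l),
\]
where $\hat B_{kl} = n^{-1} \sum_{i=1}^n \hat v_{ki}(\hat\theta_k)\,\hat v_{li}(\hat\theta_l)'$ and $\hat v_{ki}(\hat\theta_k)$ is a $(p+1)$-dimensional vector,
\[
\hat v_{ki}(\hat\theta_k) = \int_0^{\tau_k} \frac{\partial}{\partial\theta_k} \log\alpha_{ki}(u,\hat\theta_k)\,d\hat M_{ki}(u), \qquad
\hat M_{ki}(u) = N_{ki}(u) - \int_0^{u} Y_{ki}(s)\,\alpha_{ki}(s,\hat\theta_k)\,ds. \qquad \square
\]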

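Proof Theorem 3:

By Step 2 of the previous theorem and Slutsky's Lemma, what is left to be proved is the convergence $\hat B_{kl} \xrightarrow{P} B_{kl}$, $k,l \in \{1,2\}$. Recall that $B_{kl} = E\bigl(v_{k1}(\theta_{k0})\,v_{l1}(\theta_{l0})'\bigr)$, where the $(p+1)$-dimensional vectors $v_{ki}(\theta_{k0})$ have the form
\[
v_{ki}(\theta_{k0}) = \int_0^{\tau_k} \frac{\partial}{\partial\theta_k} \log\alpha_{ki}(u,\theta_{k0})\,dM_{ki}(u).
\]
Therefore we will show that for every $j,m \in \{1,\dots,p+1\}$
\[
n^{-1} \sum_{i=1}^n \hat v_{ki}^{j}(\hat\theta_k)\,\hat v_{li}^{m}(\hat\theta_l) \xrightarrow{P} E\bigl( v_{k1}^{j}(\theta_{k0})\,v_{l1}^{m}(\theta_{l0}) \bigr). \tag{1.17}
\]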

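The j-th components of the vectors $\hat v_{ki}(\hat\theta_k)$ and $v_{k1}(\theta_{k0})$ are
\[
\begin{aligned}
\hat v_{ki}^{j}(\hat\theta_k) &= \delta_{ki}\,[0 \le X_{ki} \le \tau_k]\, \frac{\partial}{\partial\theta_{kj}} \log\alpha_{ki}(X_{ki},\hat\theta_k) - \int_0^{\tau_k} \frac{\partial}{\partial\theta_{kj}} \alpha_{ki}(u,\hat\theta_k)\,Y_{ki}(u)\,du,\\
v_{k1}^{j}(\theta_{k0}) &= \delta_{k1}\,[0 \le X_{k1} \le \tau_k]\, \frac{\partial}{\partial\theta_{kj}} \log\alpha_{k1}(X_{k1},\theta_{k0}) - \int_0^{\tau_k} \frac{\partial}{\partial\theta_{kj}} \alpha_{k1}(u,\theta_{k0})\,Y_{k1}(u)\,du.
\end{aligned}
\]
Consequently $n^{-1} \sum_{i=1}^n \hat v_{ki}^{j}(\hat\theta_k)\,\hat v_{li}^{m}(\hat\theta_l)$ can be expanded as R1 - R2 - R3 + R4, where
\[
\begin{aligned}
R1 &= n^{-1} \sum_{i=1}^n \delta_{ki}\delta_{li}\,[0 \le X_{ki} \le \tau_k][0 \le X_{li} \le \tau_l]\, \frac{\partial}{\partial\theta_{kj}} \log\alpha_{ki}(X_{ki},\hat\theta_k)\, \frac{\partial}{\partial\theta_{lm}} \log\alpha_{li}(X_{li},\hat\theta_l),\\
R2 &= n^{-1} \sum_{i=1}^n \delta_{ki}\,[0 \le X_{ki} \le \tau_k]\, \frac{\partial}{\partial\theta_{kj}} \log\alpha_{ki}(X_{ki},\hat\theta_k) \int_0^{\tau_l} \frac{\partial}{\partial\theta_{lm}} \alpha_{li}(s,\hat\theta_l)\,Y_{li}(s)\,ds,\\
R3 &= n^{-1} \sum_{i=1}^n \delta_{li}\,[0 \le X_{li} \le \tau_l]\, \frac{\partial}{\partial\theta_{lm}} \log\alpha_{li}(X_{li},\hat\theta_l) \int_0^{\tau_k} \frac{\partial}{\partial\theta_{kj}} \alpha_{ki}(u,\hat\theta_k)\,Y_{ki}(u)\,du,\\
R4 &= n^{-1} \sum_{i=1}^n \int_0^{\tau_k} \frac{\partial}{\partial\theta_{kj}} \alpha_{ki}(u,\hat\theta_k)\,Y_{ki}(u)\,du \int_0^{\tau_l} \frac{\partial}{\partial\theta_{lm}} \alpha_{li}(s,\hat\theta_l)\,Y_{li}(s)\,ds.
\end{aligned}
\]
Similarly $E\bigl(v_{k1}^{j}(\theta_{k0})\,v_{l1}^{m}(\theta_{l0})\bigr)$ can be expanded as L1 - L2 - L3 + L4, where
\[
\begin{aligned}
L1 &= E\Bigl( \delta_{k1}\delta_{l1}\,[0 \le X_{k1} \le \tau_k][0 \le X_{l1} \le \tau_l]\, \frac{\partial}{\partial\theta_{kj}} \log\alpha_{k1}(X_{k1},\theta_{k0})\, \frac{\partial}{\partial\theta_{lm}} \log\alpha_{l1}(X_{l1},\theta_{l0}) \Bigr),\\
L2 &= E\Bigl( \delta_{k1}\,[0 \le X_{k1} \le \tau_k]\, \frac{\partial}{\partial\theta_{kj}} \log\alpha_{k1}(X_{k1},\theta_{k0}) \int_0^{\tau_l} \frac{\partial}{\partial\theta_{lm}} \alpha_{l1}(s,\theta_{l0})\,Y_{l1}(s)\,ds \Bigr),\\
L3 &= E\Bigl( \delta_{l1}\,[0 \le X_{l1} \le \tau_l]\, \frac{\partial}{\partial\theta_{lm}} \log\alpha_{l1}(X_{l1},\theta_{l0}) \int_0^{\tau_k} \frac{\partial}{\partial\theta_{kj}} \alpha_{k1}(u,\theta_{k0})\,Y_{k1}(u)\,du \Bigr),\\
L4 &= E\Bigl( \int_0^{\tau_k} \frac{\partial}{\partial\theta_{kj}} \alpha_{k1}(u,\theta_{k0})\,Y_{k1}(u)\,du \int_0^{\tau_l} \frac{\partial}{\partial\theta_{lm}} \alpha_{l1}(s,\theta_{l0})\,Y_{l1}(s)\,ds \Bigr).
\end{aligned}
\]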

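By Slutsky's Lemma, for showing (1.17) it is sufficient to prove that $Ra \xrightarrow{P} La$ for $a \in \{1,2,3,4\}$ and $k,l \in \{1,2\}$. We will provide the details for k = 1, l = 2, the other cases being similar.

We start by proving $R1 \xrightarrow{P} L1$.

Recall that $\alpha_{1i}(t,\theta_1) = \sigma_1^{-1}\alpha_0\!\left(\frac{t-\beta_1'Z_{1i}}{\sigma_1}\right)$. Let
\[
\xi(t,c,\theta_1,\theta_2,z_1,z_2) = \frac{\partial}{\partial\theta_{1j}} \log\left[ \sigma_1^{-1}\alpha_0\!\left(\frac{t-\beta_1'z_1}{\sigma_1}\right) \right] \frac{\partial}{\partial\theta_{2m}} \log\left[ \sigma_2^{-1}\alpha_0\!\left(\frac{c-\beta_2'z_2}{\sigma_2}\right) \right]
\]
and
\[
R_{1i}(\theta_1,\theta_2) = \delta_{1i}\delta_{2i}\,[0 \le T_i \le \tau_1][0 \le C_i \le \tau_2]\,\xi(T_i,C_i,\theta_1,\theta_2,Z_{1i},Z_{2i}).
\]
Then $R1 = n^{-1} \sum_{i=1}^n R_{1i}(\hat\theta_1,\hat\theta_2)$ and $L1 = E\bigl(R_{11}(\theta_{10},\theta_{20})\bigr)$.

First we show that
\[
\sup_{(\theta_1,\theta_2) \in \Theta_{10}\times\Theta_{20}} \Bigl| n^{-1} \sum_{i=1}^n R_{1i}(\theta_1,\theta_2) - E\bigl(R_{11}(\theta_1,\theta_2)\bigr) \Bigr| \to 0 \quad \text{a.e. as } n \to \infty. \tag{1.18}
\]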

We will apply a SLLN for separable Banach spaces, first proved by Mourier (1953):

SLLN 1:

If $(\mathcal{X}, \|\cdot\|)$ is a separable Banach space and $\{V_n\}$ a sequence of i.i.d. random elements in $\mathcal{X}$ such that $E\|V_1\| < \infty$, then $\bigl\| n^{-1} \sum_{i=1}^n V_i - E V_1 \bigr\| \to 0$ a.e. as $n \to \infty$. $\square$

Consider the separable Banach space of continuous functions on the compact set $\Theta_{10}\times\Theta_{20}$, endowed with the supremum norm. Then $R_{1i}(\cdot,\cdot)$ are i.i.d. random elements of this space. The convergence (1.18) follows from the direct application of the stated SLLN 1 if we show that
\[
E\Bigl[ \sup_{(\theta_1,\theta_2) \in \Theta_{10}\times\Theta_{20}} |R_{11}(\theta_1,\theta_2)| \Bigr] < \infty. \tag{1.19}
\]

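Let $\xi_1(t,c,z_1,z_2) = \sup_{(\theta_1,\theta_2) \in \Theta_{10}\times\Theta_{20}} |\xi(t,c,\theta_1,\theta_2,z_1,z_2)|$. Then
\[
E\Bigl[ \sup_{(\theta_1,\theta_2) \in \Theta_{10}\times\Theta_{20}} |R_{11}(\theta_1,\theta_2)| \Bigr] \le E\bigl( [0 \le T_1 \le \tau_1][0 \le C_1 \le \tau_2]\,\xi_1(T_1,C_1,Z_{11},Z_{21}) \bigr)
= E\bigl[ E\bigl( [0 \le T_1 \le \tau_1][0 \le C_1 \le \tau_2]\,\xi_1(T_1,C_1,Z_{11},Z_{21}) \mid Z_{11},Z_{21} \bigr) \bigr].
\]
Given the covariates $Z_{11}, Z_{21}$, the vector $(T_1,C_1)'$ has distribution function $F^{*}(\cdot,\cdot,\theta_{10},\theta_{20},\rho_0)$ and density $f^{*}(\cdot,\cdot,\theta_{10},\theta_{20},\rho_0)$. Therefore the right hand side of the previous inequality has the form
\[
\begin{aligned}
E\Bigl( \int_0^{\tau_1}\!\!\int_0^{\tau_2} \xi_1(t,c,Z_{11},Z_{21})\,f^{*}(t,c,\theta_{10},\theta_{20},\rho_0)\,dc\,dt \Bigr)
&\le E\Bigl[ \sup_{(t,c) \in [0,\tau_1]\times[0,\tau_2]} \xi_1(t,c,Z_{11},Z_{21})\,F^{*}(\tau_1,\tau_2,\theta_{10},\theta_{20},\rho_0) \Bigr]\\
&\le E\Bigl[ \sup_{(t,c) \in [0,\tau_1]\times[0,\tau_2]}\ \sup_{(\theta_1,\theta_2) \in \Theta_{10}\times\Theta_{20}} |\xi(t,c,\theta_1,\theta_2,Z_{11},Z_{21})| \Bigr] =: E\,g(Z_{11},Z_{21}).
\end{aligned}
\]
Because the function $\xi$ is continuous in all its six arguments, by Lemma 1 stated in the proof of the Proposition, $g(\cdot,\cdot)$ is also continuous. By the assumed boundedness of the covariates it follows that $E\,g(Z_{11},Z_{21}) < \infty$, so (1.19) and therefore (1.18) are proved.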

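We assumed $\theta_{k0}$ in the interior of the set $\Theta_{k0}$. Because $\hat\theta_k \xrightarrow{P} \theta_{k0}$ as $n \to \infty$, with a probability approaching one $\hat\theta_k \in \Theta_{k0}$. Consider an arbitrary $\varepsilon > 0$. By (1.18),
\[
\Bigl| n^{-1} \sum_{i=1}^n R_{1i}(\hat\theta_1,\hat\theta_2) - E\bigl(R_{11}(\theta_1,\theta_2)\bigr)\Big|_{(\theta_1,\theta_2)=(\hat\theta_1,\hat\theta_2)} \Bigr| < \varepsilon/2 \tag{1.20}
\]
with probability tending to one as $n \to \infty$.

By the Dominated Convergence Theorem, $E R_{11}(\cdot,\cdot)$ is a continuous function on the compact set $\Theta_{10}\times\Theta_{20}$. Then
\[
\Bigl| E\bigl(R_{11}(\theta_1,\theta_2)\bigr)\Big|_{(\theta_1,\theta_2)=(\hat\theta_1,\hat\theta_2)} - E R_{11}(\theta_{10},\theta_{20}) \Bigr| < \varepsilon/2 \tag{1.21}
\]
with probability tending to one as $n \to \infty$.

The inequalities (1.20) and (1.21) imply that
\[
\Bigl| n^{-1} \sum_{i=1}^n R_{1i}(\hat\theta_1,\hat\theta_2) - E R_{11}(\theta_{10},\theta_{20}) \Bigr| < \varepsilon
\]
with probability tending to one as $n \to \infty$. Hence we proved $R1 \xrightarrow{P} L1$.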

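Next we show that $R2 \xrightarrow{P} L2$.

Let $\eta(t,c,\theta_1,\theta_2,z_1,z_2)$ be the function obtained from
\[
\frac{\partial}{\partial\theta_{1j}} \log\alpha_{11}(t,\theta_1)\, \frac{\partial}{\partial\theta_{2m}} \alpha_{21}(c,\theta_2)
\]
by replacing the covariates $Z_{11}, Z_{21}$ with the arguments $z_1, z_2$. Define also $R_{2i}(c,\theta_1,\theta_2) = \delta_{1i}\,[0 \le T_i \le \tau_1]\,\eta(T_i,c,\theta_1,\theta_2,Z_{1i},Z_{2i})\,Y_{2i}(c)$. By Fubini's Theorem
\[
R2 = \int_0^{\tau_2} \Bigl( n^{-1} \sum_{i=1}^n R_{2i}(c,\hat\theta_1,\hat\theta_2) \Bigr) dc \quad \text{and} \quad L2 = \int_0^{\tau_2} E R_{21}(c,\theta_{10},\theta_{20})\,dc.
\]
First we show that
\[
\sup_{c \in [0,\tau_2]}\ \sup_{(\theta_1,\theta_2) \in \Theta_{10}\times\Theta_{20}} \Bigl| n^{-1} \sum_{i=1}^n R_{2i}(c,\theta_1,\theta_2) - E R_{21}(c,\theta_1,\theta_2) \Bigr| \to 0 \quad \text{a.e. as } n \to \infty. \tag{1.22}
\]
We will apply an extension of the SLLN to $D_E[0,\tau_2]$, the set of right-continuous functions with left-hand limits on $[0,\tau_2]$, taking values in a separable Banach space E. The space $D_E[0,\tau_2]$ is endowed with the Skorohod topology. For a proof of this result see Andersen and Gill (1982).31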

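SLLN 2:

Let $\{V_n\}$ be a sequence of i.i.d. random elements of $D_E[0,\tau_2]$. Suppose $E\|V_1\| = E \sup_{c \in [0,\tau_2]} \|V_1(c)\| < \infty$. Then $\bigl\| n^{-1} \sum_{i=1}^n V_i - E V_1 \bigr\| \to 0$ a.e. as $n \to \infty$. $\square$

In our case E is the space of continuous real functions on $\Theta_{10}\times\Theta_{20}$, endowed with the supremum norm, and $V_i(c) = R_{2i}(\tau_2 - c,\cdot,\cdot)$. For applying SLLN 2 we need to show
\[
E\Bigl[ \sup_{c \in [0,\tau_2]}\ \sup_{(\theta_1,\theta_2) \in \Theta_{10}\times\Theta_{20}} |R_{21}(c,\theta_1,\theta_2)| \Bigr] < \infty. \tag{1.23}
\]
Let $\eta_1(t,z_1,z_2) = \sup_{c \in [0,\tau_2]} \sup_{(\theta_1,\theta_2) \in \Theta_{10}\times\Theta_{20}} |\eta(t,c,\theta_1,\theta_2,z_1,z_2)|$. Then
\[
\begin{aligned}
E\Bigl[ \sup_{c \in [0,\tau_2]}\ \sup_{(\theta_1,\theta_2) \in \Theta_{10}\times\Theta_{20}} |R_{21}(c,\theta_1,\theta_2)| \Bigr]
&\le E\bigl( [0 \le T_1 \le \tau_1]\,\eta_1(T_1,Z_{11},Z_{21}) \bigr)\\
&= E\bigl[ E\bigl( [0 \le T_1 \le \tau_1]\,\eta_1(T_1,Z_{11},Z_{21}) \mid Z_{11},Z_{21} \bigr) \bigr]\\
&= E\Bigl( \int_0^{\tau_1}\!\!\int_0^{\tau_2} \eta_1(t,Z_{11},Z_{21})\,f^{*}(t,c,\theta_{10},\theta_{20},\rho_0)\,dc\,dt \Bigr)\\
&\le E\Bigl[ \sup_{t \in [0,\tau_1]} \eta_1(t,Z_{11},Z_{21}) \Bigr] =: E\,g_1(Z_{11},Z_{21}).
\end{aligned}
\]
By the same argument used to prove (1.19), $E\,g_1(Z_{11},Z_{21}) < \infty$. Therefore (1.23) is shown and, by the stated SLLN 2, (1.22) follows.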

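Consider an arbitrary $\varepsilon > 0$. With probability tending to one, $\hat\theta_k \in \Theta_{k0}$. Then, as a consequence of (1.22),
\[
\sup_{c \in [0,\tau_2]} \Bigl| n^{-1} \sum_{i=1}^n R_{2i}(c,\hat\theta_1,\hat\theta_2) - E R_{21}(c,\theta_1,\theta_2)\Big|_{(\theta_1,\theta_2)=(\hat\theta_1,\hat\theta_2)} \Bigr| < \frac{\varepsilon}{2\tau_2}
\]
with probability tending to one as $n \to \infty$. Then
\[
\begin{aligned}
&\Bigl| \int_0^{\tau_2} n^{-1} \sum_{i=1}^n R_{2i}(c,\hat\theta_1,\hat\theta_2)\,dc - \int_0^{\tau_2} E R_{21}(c,\theta_1,\theta_2)\Big|_{(\theta_1,\theta_2)=(\hat\theta_1,\hat\theta_2)}\,dc \Bigr|\\
&\qquad \le \tau_2 \sup_{c \in [0,\tau_2]} \Bigl| n^{-1} \sum_{i=1}^n R_{2i}(c,\hat\theta_1,\hat\theta_2) - E R_{21}(c,\theta_1,\theta_2)\Big|_{(\theta_1,\theta_2)=(\hat\theta_1,\hat\theta_2)} \Bigr| < \tau_2\,\frac{\varepsilon}{2\tau_2} = \varepsilon/2
\end{aligned}
\tag{1.24}
\]
with probability approaching one as $n \to \infty$.

By (1.23) and the Dominated Convergence Theorem, the function
\[
(\theta_1,\theta_2) \mapsto \int_0^{\tau_2} E R_{21}(c,\theta_1,\theta_2)\,dc = E \int_0^{\tau_2} R_{21}(c,\theta_1,\theta_2)\,dc
\]
is continuous on $\Theta_{10}\times\Theta_{20}$, so
\[
\Bigl| \int_0^{\tau_2} E R_{21}(c,\theta_1,\theta_2)\Big|_{(\theta_1,\theta_2)=(\hat\theta_1,\hat\theta_2)}\,dc - \int_0^{\tau_2} E R_{21}(c,\theta_{10},\theta_{20})\,dc \Bigr| < \varepsilon/2 \tag{1.25}
\]
with probability tending to one as $n \to \infty$.

The inequalities (1.24) and (1.25) imply that
\[
\Bigl| \int_0^{\tau_2} n^{-1} \sum_{i=1}^n R_{2i}(c,\hat\theta_1,\hat\theta_2)\,dc - \int_0^{\tau_2} E R_{21}(c,\theta_{10},\theta_{20})\,dc \Bigr| < \varepsilon
\]
with probability approaching one as $n \to \infty$. Therefore the convergence $R2 \xrightarrow{P} L2$ is proved. The proof for $R3 \xrightarrow{P} L3$ is identical.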

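Finally we will show that $R4 \xrightarrow{P} L4$.

Let $\zeta(t,c,\theta_1,\theta_2,z_1,z_2)$ be the function obtained from $\frac{\partial}{\partial\theta_{1j}} \alpha_{11}(t,\theta_1)\, \frac{\partial}{\partial\theta_{2m}} \alpha_{21}(c,\theta_2)$ by replacing the covariates $Z_{11}, Z_{21}$ with the arguments $z_1, z_2$. Define also $R_{4i}(t,c,\theta_1,\theta_2) = \zeta(t,c,\theta_1,\theta_2,Z_{1i},Z_{2i})\,Y_{1i}(t)\,Y_{2i}(c)$. By Fubini's Theorem
\[
R4 = \int_0^{\tau_1}\!\!\int_0^{\tau_2} \Bigl( n^{-1} \sum_{i=1}^n R_{4i}(t,c,\hat\theta_1,\hat\theta_2) \Bigr) dc\,dt \quad \text{and} \quad L4 = \int_0^{\tau_1}\!\!\int_0^{\tau_2} E R_{41}(t,c,\theta_{10},\theta_{20})\,dc\,dt.
\]
The proof follows the same steps as in the proof of $R2 \xrightarrow{P} L2$. We will show only that
\[
\sup_{(t,c) \in [0,\tau_1]\times[0,\tau_2]}\ \sup_{(\theta_1,\theta_2) \in \Theta_{10}\times\Theta_{20}} \Bigl| n^{-1} \sum_{i=1}^n R_{4i}(t,c,\theta_1,\theta_2) - E R_{41}(t,c,\theta_1,\theta_2) \Bigr| \to 0 \quad \text{a.e.} \tag{1.26}
\]
as $n \to \infty$. The SLLN that has to be applied in this case is an extension to the space $D_E([0,\tau_1]\times[0,\tau_2])$ and it is proved in Appendix A.

SLLN 3:

Let $\{V_n\}$ be a sequence of i.i.d. random elements of $D_E([0,\tau_1]\times[0,\tau_2])$. Suppose $E\|V_1\| = E \sup_{(t,c) \in [0,\tau_1]\times[0,\tau_2]} \|V_1(t,c)\| < \infty$. Then $\bigl\| n^{-1} \sum_{i=1}^n V_i - E V_1 \bigr\| \to 0$ a.e. as $n \to \infty$. $\square$

In our case E is the space of continuous real functions on $\Theta_{10}\times\Theta_{20}$, endowed with the supremum norm, and $V_i(t,c) = R_{4i}(\tau_1 - t, \tau_2 - c,\cdot,\cdot)$. The SLLN 3 assumption is satisfied:
\[
E\Bigl[ \sup_{(t,c) \in [0,\tau_1]\times[0,\tau_2]}\ \sup_{(\theta_1,\theta_2) \in \Theta_{10}\times\Theta_{20}} |R_{41}(t,c,\theta_1,\theta_2)| \Bigr]
\le E\Bigl[ \sup_{(t,c) \in [0,\tau_1]\times[0,\tau_2]}\ \sup_{(\theta_1,\theta_2) \in \Theta_{10}\times\Theta_{20}} |\zeta(t,c,\theta_1,\theta_2,Z_{11},Z_{21})| \Bigr] =: E\,g_2(Z_{11},Z_{21}) < \infty,
\]
because $g_2$ is a continuous function and $Z_{11}, Z_{21}$ are bounded. Thus (1.26) is proved.

This concludes the proof of Theorem 3. $\blacksquare$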

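1.2.3 Point Estimates and Confidence Intervals
for the Median LOS and Median Cost;

Application for the Bivariate Normal Case

In Section 1.2 joint analysis of hospital LOS and cost is based on the linear model (1.12). Let $\varepsilon_{.5}$ be the median of the distribution of the marginal errors $\varepsilon_{ki}$ and $Z_0$ a specified covariate profile. Denote by $Z_{0\varepsilon}$ the vector $(Z_0', \varepsilon_{.5})'$.

By the model (1.12), we consider the point estimates for the g(LOS) median and g(cost) median (denoted $\hat T_{.5}$ and $\hat C_{.5}$, respectively) to be
\[
\begin{aligned}
\hat T_{.5} &= \hat\beta_1' Z_0 + \hat\sigma_1 \varepsilon_{.5} = \hat\theta_1' Z_{0\varepsilon},\\
\hat C_{.5} &= \hat\beta_2' Z_0 + \hat\sigma_2 \varepsilon_{.5} = \hat\theta_2' Z_{0\varepsilon},
\end{aligned}
\tag{1.27}
\]
where $\hat\theta_k$ is a consistent estimator of $\theta_{k0} = (\beta_{k0}', \sigma_{k0})'$.

The standard errors of the estimators $\hat T_{.5}$ and $\hat C_{.5}$ are
\[
SE(\hat T_{.5}) = \bigl[ Z_{0\varepsilon}'\,(n^{-1}C_{11})\,Z_{0\varepsilon} \bigr]^{1/2}, \qquad
SE(\hat C_{.5}) = \bigl[ Z_{0\varepsilon}'\,(n^{-1}C_{22})\,Z_{0\varepsilon} \bigr]^{1/2},
\]
where $C_{kk}$ is the asymptotic covariance matrix of $n^{1/2}(\hat\theta_k - \theta_{k0})$, as defined in Theorem 2 of sub-section 1.2.2. Consistent estimators of the standard errors $SE(\hat T_{.5})$ and $SE(\hat C_{.5})$ are
\[
\widehat{SE}(\hat T_{.5}) = \bigl[ Z_{0\varepsilon}'\,(n^{-1}\hat C_{11})\,Z_{0\varepsilon} \bigr]^{1/2}, \qquad
\widehat{SE}(\hat C_{.5}) = \bigl[ Z_{0\varepsilon}'\,(n^{-1}\hat C_{22})\,Z_{0\varepsilon} \bigr]^{1/2}, \tag{1.28}
\]
where $\hat C_{kk}$ is the consistent estimator of the asymptotic covariance matrix $C_{kk}$ (see Theorem 3).

Then, retransforming the results to the original unit of measurement, given a covariate profile $Z_0$, we consider for the median LOS and median cost the following point estimates:
\[
\hat T_{.5}^{*} = g^{-1}(\hat T_{.5}), \qquad \hat C_{.5}^{*} = g^{-1}(\hat C_{.5}).
\]
Considering $g^{-1}$ nondecreasing, the confidence intervals are defined as
\[
\Bigl[ g^{-1}\bigl( \hat T_{.5} - z_\alpha \widehat{SE}(\hat T_{.5}) \bigr),\ g^{-1}\bigl( \hat T_{.5} + z_\alpha \widehat{SE}(\hat T_{.5}) \bigr) \Bigr], \qquad
\Bigl[ g^{-1}\bigl( \hat C_{.5} - z_\alpha \widehat{SE}(\hat C_{.5}) \bigr),\ g^{-1}\bigl( \hat C_{.5} + z_\alpha \widehat{SE}(\hat C_{.5}) \bigr) \Bigr],
\]
where $z_\alpha$ is the upper $\alpha$ quantile of the standard normal distribution and $\hat T_{.5}, \hat C_{.5}$ and $\widehat{SE}(\hat T_{.5}), \widehat{SE}(\hat C_{.5})$ are given in (1.27) and (1.28), respectively.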
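The retransformed interval can be computed directly from (1.27) and (1.28). The sketch below assumes a log transformation (so $g^{-1} = \exp$), takes the estimated covariance matrix of $\hat\theta_k$, that is $n^{-1}\hat C_{kk}$, as an input, and uses a default normal quantile of about 1.96; the function name, these defaults and the numbers in the usage lines are illustrative assumptions only.

```python
import math


def median_interval(theta_hat, cov_theta, z0, eps_median=0.0,
                    z_crit=1.959963984540054, g_inv=math.exp):
    """Point estimate and interval for the median on the original scale.

    theta_hat  : (beta_hat..., sigma_hat), length p + 1
    cov_theta  : estimated covariance matrix of theta_hat, i.e. C_hat_kk / n
    z0         : covariate profile of length p
    eps_median : median of the marginal error distribution
    """
    z0e = list(z0) + [eps_median]
    point = sum(t * z for t, z in zip(theta_hat, z0e))  # as in (1.27)
    variance = sum(z0e[i] * cov_theta[i][j] * z0e[j]
                   for i in range(len(z0e)) for j in range(len(z0e)))
    se = math.sqrt(variance)                            # as in (1.28)
    return g_inv(point), (g_inv(point - z_crit * se), g_inv(point + z_crit * se))


estimate, (lower, upper) = median_interval(
    theta_hat=(1.0, 0.5, 0.3),
    cov_theta=[[0.01, 0.0, 0.0], [0.0, 0.0025, 0.0], [0.0, 0.0, 0.04]],
    z0=(1.0, 2.0),
)
print(estimate, lower, upper)
```

Because the interval is built on the transformed scale and then mapped through a nondecreasing $g^{-1}$, it is generally not symmetric around the point estimate on the original scale.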
Application: Bivariate Normal Case

In the setting of Example 1.2.1, $\varepsilon_{ki}$ are i.i.d. $N(0,1)$, which is a symmetric distribution with median $\varepsilon_{.5} = 0$.

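Using the notation of Theorem 2, consider the partition
\[
C_{kk} = \begin{pmatrix} C_{kk}^{\beta} & C_{kk}^{\beta\sigma} \\ (C_{kk}^{\beta\sigma})' & C_{kk}^{\sigma} \end{pmatrix},
\]
where $C_{kk}^{\beta}$ is the $p \times p$ asymptotic covariance matrix of $n^{1/2}(\hat\beta_k - \beta_{k0})$ and $C_{kk}^{\sigma}$ is the asymptotic variance of $n^{1/2}(\hat\sigma_k - \sigma_{k0})$. Similarly define the estimators $\hat C_{kk}^{\beta}$, $\hat C_{kk}^{\beta\sigma}$, $\hat C_{kk}^{\sigma}$.

In this case the estimates and confidence intervals for the median LOS and median cost at a specified covariate profile $Z_0$ are
\[
\hat T_{.5}^{*} = g^{-1}(\hat\beta_1' Z_0), \qquad \hat C_{.5}^{*} = g^{-1}(\hat\beta_2' Z_0),
\]
and
\[
\begin{aligned}
&\Bigl[ g^{-1}\Bigl( \hat\beta_1' Z_0 - z_\alpha \bigl( Z_0' (n^{-1}\hat C_{11}^{\beta}) Z_0 \bigr)^{1/2} \Bigr),\ g^{-1}\Bigl( \hat\beta_1' Z_0 + z_\alpha \bigl( Z_0' (n^{-1}\hat C_{11}^{\beta}) Z_0 \bigr)^{1/2} \Bigr) \Bigr],\\
&\Bigl[ g^{-1}\Bigl( \hat\beta_2' Z_0 - z_\alpha \bigl( Z_0' (n^{-1}\hat C_{22}^{\beta}) Z_0 \bigr)^{1/2} \Bigr),\ g^{-1}\Bigl( \hat\beta_2' Z_0 + z_\alpha \bigl( Z_0' (n^{-1}\hat C_{22}^{\beta}) Z_0 \bigr)^{1/2} \Bigr) \Bigr],
\end{aligned}
\]
respectively. Again, we considered $g^{-1}$ nondecreasing.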

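In the following we will provide the details of the calculation of the exact form of the estimator $\hat C = (\hat C_{kl}, k,l \in \{1,2\})$ of the asymptotic covariance matrix of $n^{1/2}(\hat\theta_1' - \theta_{10}', \hat\theta_2' - \theta_{20}')'$. By Theorem 3, $\hat C_{kl} = n^2\, \mathcal{I}_{\tau_k}^{-1}(\hat\theta_k)\, \hat B_{kl}\, \mathcal{I}_{\tau_l}^{-1}(\hat\theta_l)$, where $-\mathcal{I}_{\tau_k}(\theta_k)$ is the matrix of the second order partial derivatives of the log-partial likelihood and
\[
\hat B_{kl} = n^{-1} \sum_{i=1}^n \hat v_{ki}(\hat\theta_k)\,\hat v_{li}(\hat\theta_l)',
\]
where $\hat v_{ki}(\hat\theta_k)$ is a $(p+1)$-dimensional vector,
\[
\hat v_{ki}(\hat\theta_k) = \int_0^{\tau_k} \frac{\partial}{\partial\theta_k} \log\alpha_{ki}(u,\hat\theta_k)\,d\hat M_{ki}(u), \qquad
\hat M_{ki}(u) = N_{ki}(u) - \int_0^{u} Y_{ki}(s)\,\alpha_{ki}(s,\hat\theta_k)\,ds.
\]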

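Because PROC LIFEREG in SAS provides the values of $\mathcal{I}_{\tau_k}^{-1}(\hat\theta_k)$, we compute only $\hat B_{kl}$. We again adopt the notation $\frac{\partial}{\partial\theta_{kj}} g(a)$ for $\frac{\partial}{\partial\theta_{kj}} g(\theta_k)\big|_{\theta_k = a}$. The vector $\hat v_{ki}(\hat\theta_k)$ has the components:

for $j \in \{1,\dots,p\}$:
\[
\hat v_{ki}^{j}(\hat\theta_k) = \delta_{ki}\,[0 \le X_{ki} \le \tau_k]\, \frac{\partial}{\partial\beta_{kj}} \log\left[ \hat\sigma_k^{-1}\alpha_0\!\left(\frac{X_{ki}-\hat\beta_k'Z_{ki}}{\hat\sigma_k}\right) \right]
- \int_0^{\tau_k} \frac{\partial}{\partial\beta_{kj}} \left[ \hat\sigma_k^{-1}\alpha_0\!\left(\frac{u-\hat\beta_k'Z_{ki}}{\hat\sigma_k}\right) \right] Y_{ki}(u)\,du;
\]
for $j = p+1$:
\[
\hat v_{ki}^{p+1}(\hat\theta_k) = \delta_{ki}\,[0 \le X_{ki} \le \tau_k]\, \frac{\partial}{\partial\sigma_k} \log\left[ \hat\sigma_k^{-1}\alpha_0\!\left(\frac{X_{ki}-\hat\beta_k'Z_{ki}}{\hat\sigma_k}\right) \right]
- \int_0^{\tau_k} \frac{\partial}{\partial\sigma_k} \left[ \hat\sigma_k^{-1}\alpha_0\!\left(\frac{u-\hat\beta_k'Z_{ki}}{\hat\sigma_k}\right) \right] Y_{ki}(u)\,du.
\]
In the next calculations we will assume that in the study $0 \le X_{ki} \le \tau_k$ and we will use the notation
\[
\hat s_{ki} = \frac{X_{ki} - \hat\beta_k'Z_{ki}}{\hat\sigma_k}.
\]
Then, for $j \in \{1,\dots,p\}$: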

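\[
\hat v_{ki}^{j}(\hat\theta_k) = -\delta_{ki}\, \frac{\alpha_0'}{\alpha_0}(\hat s_{ki})\, \hat\sigma_k^{-1} Z_{kij} + \int_0^{X_{ki}} \hat\sigma_k^{-2}\, \alpha_0'\!\left[\frac{u-\hat\beta_k'Z_{ki}}{\hat\sigma_k}\right] Z_{kij}\,du
= \hat\sigma_k^{-1} \left[ \int_{-\hat\beta_k'Z_{ki}/\hat\sigma_k}^{\hat s_{ki}} \alpha_0'(v)\,dv - \delta_{ki}\, \frac{\alpha_0'}{\alpha_0}(\hat s_{ki}) \right] Z_{kij}.
\]
For the standard normal distribution $\frac{\alpha_0'}{\alpha_0}(x) = \alpha_0(x) - x$. As a result
\[
\begin{aligned}
\hat v_{ki}^{j}(\hat\theta_k) &= \hat\sigma_k^{-1} \left[ \alpha_0(\hat s_{ki}) - \alpha_0\!\left(-\frac{\hat\beta_k'Z_{ki}}{\hat\sigma_k}\right) - \delta_{ki}\,\alpha_0(\hat s_{ki}) + \delta_{ki}\,\hat s_{ki} \right] Z_{kij}\\
&= \hat\sigma_k^{-1} \left[ (1-\delta_{ki})\,\alpha_0(\hat s_{ki}) - \alpha_0\!\left(-\frac{\hat\beta_k'Z_{ki}}{\hat\sigma_k}\right) + \delta_{ki}\,\hat s_{ki} \right] Z_{kij}.
\end{aligned}
\tag{1.29}
\]
For $j = p+1$:
\[
\begin{aligned}
\hat v_{ki}^{p+1}(\hat\theta_k) &= -\delta_{ki}\,\hat\sigma_k^{-1} - \delta_{ki}\,\hat\sigma_k^{-1}\, \frac{\alpha_0'}{\alpha_0}(\hat s_{ki})\,\hat s_{ki}
+ \int_0^{X_{ki}} \left[ \hat\sigma_k^{-2}\,\alpha_0\!\left(\frac{u-\hat\beta_k'Z_{ki}}{\hat\sigma_k}\right) + \hat\sigma_k^{-2}\,\alpha_0'\!\left(\frac{u-\hat\beta_k'Z_{ki}}{\hat\sigma_k}\right) \frac{u-\hat\beta_k'Z_{ki}}{\hat\sigma_k} \right] du\\
&= \hat\sigma_k^{-1} \left[ \int_{-\hat\beta_k'Z_{ki}/\hat\sigma_k}^{\hat s_{ki}} \alpha_0(v)\,dv + \int_{-\hat\beta_k'Z_{ki}/\hat\sigma_k}^{\hat s_{ki}} \alpha_0'(v)\,v\,dv - \delta_{ki} - \delta_{ki}\bigl(\alpha_0(\hat s_{ki}) - \hat s_{ki}\bigr)\hat s_{ki} \right].
\end{aligned}
\]
By assumption B.4 and integration by parts:
\[
\int_a^b \alpha_0(v)\,dv + \int_a^b \alpha_0'(v)\,v\,dv = \alpha_0(b)\,b - \alpha_0(a)\,a.
\]
Therefore
\[
\begin{aligned}
\hat v_{ki}^{p+1}(\hat\theta_k) &= \hat\sigma_k^{-1} \left[ \alpha_0(\hat s_{ki})\,\hat s_{ki} + \alpha_0\!\left(-\frac{\hat\beta_k'Z_{ki}}{\hat\sigma_k}\right) \frac{\hat\beta_k'Z_{ki}}{\hat\sigma_k} - \delta_{ki} - \delta_{ki}\,\alpha_0(\hat s_{ki})\,\hat s_{ki} + \delta_{ki}\,\hat s_{ki}^2 \right]\\
&= \hat\sigma_k^{-1} \left[ (1-\delta_{ki})\,\alpha_0(\hat s_{ki})\,\hat s_{ki} + \alpha_0\!\left(-\frac{\hat\beta_k'Z_{ki}}{\hat\sigma_k}\right) \frac{\hat\beta_k'Z_{ki}}{\hat\sigma_k} - \delta_{ki} + \delta_{ki}\,\hat s_{ki}^2 \right].
\end{aligned}
\tag{1.30}
\]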


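Let

$$\hat a_{ki} = (1 - \delta_{ki})\, \alpha_0(\xi_{ki}) - \alpha_0\!\left(-\frac{\hat\beta_k' Z_{ki}}{\hat\sigma_k}\right) + \delta_{ki}\, \xi_{ki},$$

$$\hat b_{ki} = (1 - \delta_{ki})\, \alpha_0(\xi_{ki})\, \xi_{ki} + \alpha_0\!\left(-\frac{\hat\beta_k' Z_{ki}}{\hat\sigma_k}\right) \frac{\hat\beta_k' Z_{ki}}{\hat\sigma_k} - \delta_{ki} + \delta_{ki}\, \xi_{ki}^2.$$

By (1.29) and (1.30), the $(p+1)$-dimensional vector $\hat V_{ki}(\hat\theta_k)$ has the form

$$\hat V_{ki}(\hat\theta_k) = \hat\sigma_k^{-1} \begin{pmatrix} \hat a_{ki}\, Z_{ki} \\ \hat b_{ki} \end{pmatrix}.$$

Consequently

$$\hat B_{kl} = n^{-1} \sum_{i=1}^{n} \hat V_{ki}(\hat\theta_k)\, \hat V_{li}(\hat\theta_l)' = \hat\sigma_k^{-1} \hat\sigma_l^{-1} \begin{pmatrix} n^{-1} \sum_{i=1}^{n} \hat a_{ki}\, \hat a_{li}\, Z_{ki} Z_{li}' & n^{-1} \sum_{i=1}^{n} \hat a_{ki}\, \hat b_{li}\, Z_{ki} \\ n^{-1} \sum_{i=1}^{n} \hat b_{ki}\, \hat a_{li}\, Z_{li}' & n^{-1} \sum_{i=1}^{n} \hat b_{ki}\, \hat b_{li} \end{pmatrix}.$$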
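The quantities in (1.29) and (1.30) are simple enough to compute directly. The following is a minimal sketch, assuming standard normal errors (so that the hazard is $\alpha_0 = \varphi / (1 - \Phi)$) and assuming that `x` is the observed outcome on the scale of $g$; the function names are illustrative and are not taken from the original analysis, which relied on SAS.

```python
import math


def normal_hazard(x):
    """Hazard alpha_0 of the standard normal: phi(x) / (1 - Phi(x))."""
    pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    sf = 0.5 * math.erfc(x / math.sqrt(2.0))  # underflows for very large x
    return pdf / sf


def score_vector(x, delta, z, beta, sigma):
    """(p+1)-vector sigma^{-1} * (a * z, b), following (1.29) and (1.30)."""
    eta = sum(b * zj for b, zj in zip(beta, z)) / sigma  # beta'z / sigma
    xi = x / sigma - eta                                 # (x - beta'z) / sigma
    a = (1 - delta) * normal_hazard(xi) - normal_hazard(-eta) + delta * xi
    b = ((1 - delta) * normal_hazard(xi) * xi + normal_hazard(-eta) * eta
         - delta + delta * xi * xi)
    return [a * zj / sigma for zj in z] + [b / sigma]


def b_hat(scores_k, scores_l):
    """n^{-1} * sum_i V_ki V_li' as a (p+1) x (p+1) nested list."""
    n = len(scores_k)
    dim = len(scores_k[0])
    return [[sum(vk[r] * vl[c] for vk, vl in zip(scores_k, scores_l)) / n
             for c in range(dim)] for r in range(dim)]
```

Here `scores_k` and `scores_l` would hold one `score_vector` result per patient for outcomes $k$ and $l$, in the same patient order, so that the cross products pair the LOS and cost contributions of the same subject.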

1.3 Application

We demonstrate the application of the methods proposed in Sections 1.1 and 1.2
to hospital LOS and costs in a cohort of patients admitted for coronary artery bypass
surgery (CABG). Data are from a tertiary care, academically affiliated medical center. In
this study there were 1268 consecutive admissions for CABG from August 23, 1993 to
December 29, 1994. Computerized files were maintained on demographic
characteristics, medical history, clinical outcomes, resource use and costs.

Length of stay was defined as the number of days from admission to discharge,
inclusive of the admission day. Costs were derived from services for operating room,
nursing, laboratory, and pharmacy as well as room and board and convenience items. All
professional fees were excluded.

The complete dataset was available, including cost histories for each patient. As
an example for our techniques that incorporate censoring and assume that only total costs
per patient were observed, we reconstructed the dataset as of six months after the study
started and computed the total costs. At that time some patients were still in hospital and
some costs had not yet been incurred. This resulted in a dataset of 465 subjects, with
7.53% of the LOS and 9.68% of the hospital costs being censored, respectively. Our
objective is to examine in our sample the health care utilization, as measured by LOS and
costs. Using both approaches (semiparametric and parametric), the medians of these two
outcomes will be estimated and confidence intervals will be provided.

CORRELATES OF LOS AND COST

Potential correlates of LOS and costs included demographic and clinical variables
that could be identified at admission, the use of cardiac catheterization during hospital
stay and discharge status, an indicator of whether the patient was alive at discharge. The
variables available at admission were age at admission, gender, race (White, African-American
or Other), marital status (Married, Alone or Unknown), insurance status,
comorbidity, ejection fraction and history of prior CABG. Insurance status was
categorized as Medicare, private, Medicaid or other. Ejection fraction, a useful measure
of cardiac function, is the volume of blood expelled at each systole as a fraction of the
volume of blood contained in the ventricle at the end of diastole. A value less than
50% is generally considered abnormal. Values of ejection fraction were grouped as
below 35%, 35% to 49% and 50% and above.

The Charlson Comorbidity Index (CCI)‘“s was used to assess comorbidity. It is a
weighted sum of the presence of 19 specified medical conditions at admission. These
conditions include diabetes, liver disease, congestive heart failure, peripheral vascular
disease, prior myocardial infarction, cerebrovascular disease, connective tissue disease,
dementia, chronic obstructive pulmonary disease, hemiplegia, tumor, and acquired
immunodeficiency syndrome (AIDS) or AIDS related complex. We formed three
comorbidity groups based on CCI scores 0-1, 2-3 and 4 or more.

The Diagnosis Related Group variable (DRG) is in this case a binary variable that
indicates whether a patient had cardiac catheterization during the hospital stay. Table 1.1
shows the characteristics of the 465 patients in our sample. The mean age was 63.4 years
and the median age 65 years.

RESULTS

There is a high correlation between LOS and cost (Spearman r = .79, n = 465),
even if all censored observations are omitted (Spearman r = .77, n = 420). The range of
the uncensored LOS observations was 3 to 35 days, and the uncensored costs ranged
from $11,669 to $90,088. Figure 1.1 shows a plot of LOS and cost (circles represent
censored cost observations).

The bivariate models we use for LOS and cost allow inferences about regression
parameters simultaneously for these two outcomes. For example, suppose we are
interested in the effects of a covariate with three categories, such as CCI, on both LOS
and cost. Two dummy variables are created for these three categories. Denote by

$$\eta_0 = \big((\beta_{10})_1, (\beta_{10})_2, (\beta_{20})_1, (\beta_{20})_2\big)'$$

the subvector of the vector of all true regression coefficients $(\beta_{10}', \beta_{20}')'$
corresponding to the dummy variables for both LOS and cost, and by $\hat\eta$ the vector of
estimators

$$\hat\eta = (\hat\beta_{11}, \hat\beta_{12}, \hat\beta_{21}, \hat\beta_{22})'.$$

Let $\hat V$ be the consistent estimator of the asymptotic covariance matrix of $\hat\eta$, as
defined in Theorems 3, Sections 1.1 and 1.2. By the asymptotic normality of the regression
parameter estimators, the quadratic form $W = \hat\eta' \hat V^{-1} \hat\eta$ is asymptotically chi-square
distributed with 4 degrees of freedom and can be used to test jointly the null hypotheses:

$$H_0 : (\beta_{k0})_j = 0, \quad k \in \{1, 2\},\ j \in \{1, 2\}.$$

In the covariate selection procedure, correlates of LOS and cost were tested
jointly so that the resulting model would have the same constellation of significant
variables. Each potential covariate was assessed individually and then in combination
with all others that were found to be significant by univariate analysis (p-value < 0.20).
Only age at admission was regarded as a continuous independent variable. In the final
regression model we retained only variables that were significant at p-value < 0.10. All
analyses were performed with SAS software version 8 (SAS Institute Inc., Cary, NC).


Semiparametric Model

To implement the method developed in Section 1.1, we create two records for
each patient, one for LOS and one for cost. For the categorical covariates sixteen dummy
variables $d_i$, $2 \le i \le 16$ were created and type-specific covariates were defined as
follows:

For LOS:
$Z_{11} = \text{age} * [\text{type} = 1]$ and $Z_{i1} = d_i * [\text{type} = 1]$, $2 \le i \le 16$;
For cost:
$Z_{12} = \text{age} * [\text{type} = 0]$ and $Z_{i2} = d_i * [\text{type} = 0]$, $2 \le i \le 16$.
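The two-record layout can be illustrated with a short sketch. The dictionary keys (`los`, `los_observed`, `cost`, `cost_observed`, `age`, `dummies`) and the function name `stack_records` are hypothetical names chosen for this example; the original analysis was carried out in SAS and its data step is not shown in the text.

```python
def stack_records(patients):
    """Build two records per patient with type-specific covariates.

    Each patient is a dict with hypothetical keys 'los', 'los_observed',
    'cost', 'cost_observed', 'age' and 'dummies' (the list d_2, d_3, ...).
    """
    rows = []
    for pid, pt in enumerate(patients):
        base = [pt["age"]] + list(pt["dummies"])
        outcomes = ((1, pt["los"], pt["los_observed"]),    # type = 1: LOS
                    (0, pt["cost"], pt["cost_observed"]))  # type = 0: cost
        for rec_type, value, observed in outcomes:
            is_los = 1 if rec_type == 1 else 0
            z_los = [v * is_los for v in base]         # covariate * [type = 1]
            z_cost = [v * (1 - is_los) for v in base]  # covariate * [type = 0]
            rows.append({"id": pid, "type": rec_type, "x": value,
                         "delta": observed, "z": z_los + z_cost})
    return rows
```

Keeping the patient identifier on both records is what would allow a robust covariance estimate to pair the LOS and cost contributions of the same subject.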
After the covariate selection procedure, the significant correlates in the final
model were age at admission (p=.0958), DRG (p<.0001), indicator of being discharged
alive (p<.0001), history of prior CABG (p=.0006), ejection fraction (three categories,
p=.0246) and Charlson Comorbidity Index (three categories, p<.0001). Age was regarded
as a continuous variable and seven dummy variables $d_i$, $2 \le i \le 8$ correspond to the
significant categorical covariates.

Consider the vectors $Z_k = (Z_{1k}, Z_{2k}, \ldots, Z_{8k})'$, $\beta_k = (\beta_{1k}, \beta_{2k}, \ldots, \beta_{8k})'$,
$1 \le k \le 2$, and $Z = (Z_1', Z_2')'$, $\beta = (\beta_1', \beta_2')'$. Note that for the LOS data
$\beta' Z = \beta_1' Z_1$ and for the cost data $\beta' Z = \beta_2' Z_2$. We are essentially fitting two
separate Cox models to LOS and cost, but this formulation is amenable to the SAS PHREG
procedure and permits simultaneous estimation of $\beta_1$ and $\beta_2$ as well as direct
estimation of the correlation between the two estimators.


We present a part of the final model estimates. For the three CCI categories, the
two (out of seven) dummy variables $d_2$, $d_3$ were defined as follows:

for CCI 0-1: $d_2 = 1$, $d_3 = 0$;

for CCI 2-3: $d_2 = 0$, $d_3 = 1$;

for CCI 4 or more: $d_2 = 0$, $d_3 = 0$.

The LOS observations have $Z_{21} = d_2$, $Z_{31} = d_3$, $Z_{22} = 0$, $Z_{32} = 0$ and the cost
observations have $Z_{21} = 0$, $Z_{31} = 0$, $Z_{22} = d_2$, $Z_{32} = d_3$. Estimates of the
corresponding regression parameters $\beta_{21}, \beta_{31}, \beta_{22}, \beta_{32}$ are

$\hat\beta_{21} = .683$, $\hat\beta_{31} = .230$, $\hat\beta_{22} = .766$, $\hat\beta_{32} = .140$.

The estimated adjusted covariance matrix of these four beta estimators is

$$\begin{pmatrix} 0.0155 & 0.0105 & 0.0140 & 0.0101 \\ & 0.0165 & 0.0102 & 0.0163 \\ & & 0.0219 & 0.0150 \\ & & & 0.0230 \end{pmatrix}.$$

 

 

When fitting independent proportional hazard models for LOS and cost, the estimate for
the so-called naive covariance matrix is

$$\begin{pmatrix} 0.0224 & 0.0148 & 0 & 0 \\ & 0.0200 & 0 & 0 \\ & & 0.0229 & 0.0151 \\ & & & 0.0206 \end{pmatrix}.$$
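The joint Wald test described earlier can be evaluated for these four CCI coefficients under either covariance estimate. The sketch below is illustrative only: it assumes the printed matrices are covariances of the estimators themselves (not scaled by n), it fills in the lower triangle by symmetry, and the helper names `solve`, `symmetric` and `wald_test_4df` are introduced here. For 4 degrees of freedom the chi-square survival function has the closed form $e^{-w/2}(1 + w/2)$.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x


def symmetric(upper):
    """Expand upper-triangular rows into a full symmetric matrix."""
    n = len(upper)
    full = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(upper):
        for offset, v in enumerate(row):
            full[i][i + offset] = full[i + offset][i] = v
    return full


def wald_test_4df(eta, cov):
    """Return (W, p-value) for H0: eta = 0, chi-square with 4 df."""
    w = sum(e * s for e, s in zip(eta, solve(cov, eta)))
    p_value = math.exp(-w / 2.0) * (1.0 + w / 2.0)  # chi-square(4) upper tail
    return w, p_value


eta_hat = [0.683, 0.230, 0.766, 0.140]
adjusted = symmetric([[0.0155, 0.0105, 0.0140, 0.0101],
                      [0.0165, 0.0102, 0.0163],
                      [0.0219, 0.0150],
                      [0.0230]])
naive = symmetric([[0.0224, 0.0148, 0.0, 0.0],
                   [0.0200, 0.0, 0.0],
                   [0.0229, 0.0151],
                   [0.0206]])
w_adj, p_adj = wald_test_4df(eta_hat, adjusted)
w_naive, p_naive = wald_test_4df(eta_hat, naive)
```

Comparing `w_adj` with `w_naive` shows how much the joint test depends on whether the correlation between the LOS and cost estimators is taken into account.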

 

 

Estimates and approximate 95% pointwise confidence intervals for the LOS and
cost survival distribution functions were calculated for the 6 covariate profiles defined by
CCI and discharge status (alive or dead), for a patient age 65 years at admission, with
ejection fraction above 50, who had catheterization and with no history of prior CABG.
As presented in Section 1.1.4, confidence intervals were obtained using the
log(-log) transformation and both the adjusted and naive variances were used. Figures
1.2 and 1.3 depict the LOS and cost survival function estimates and approximate
confidence intervals for one of these profiles (CCI = 4+, discharged alive). The naive
and adjusted confidence intervals are different, but close.
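For readers unfamiliar with the log(-log) transformation, the sketch below shows the usual delta-method form of such an interval for a survival probability. It is an assumption that this matches the construction of Section 1.1.4, which is not reproduced here; `se` stands for a standard error of the survival estimate computed from either the adjusted or the naive variance, and the function name is illustrative.

```python
import math


def loglog_ci(s_hat, se, z=1.959964):
    """Pointwise interval for a survival probability via log(-log).

    s_hat : estimated survival probability at a fixed time
    se    : standard error of s_hat (adjusted or naive)
    """
    if not 0.0 < s_hat < 1.0:
        # The transformation is undefined at 0 and 1; return a degenerate interval.
        return s_hat, s_hat
    # Delta method: se of log(-log S) is se / (S * |log S|).
    theta = math.exp(z * se / (s_hat * abs(math.log(s_hat))))
    return s_hat ** theta, s_hat ** (1.0 / theta)
```

A practical attraction of this form is that the limits always stay inside the interval (0, 1), which a symmetric interval on the original scale does not guarantee.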

Median LOS and cost were estimated from the corresponding survival
distributions (see Section 1.1.4). Table 1.2 shows these estimates for the 6 covariate
profiles previously presented. The adjusted and naive confidence intervals for the
median LOS are the same for patients discharged alive, but they differ substantially for
patients who died during their hospital stay. For the median cost the two types of
confidence intervals were different for all profiles, possibly due to more variation in the
cost data. For the survivor profiles the naive confidence intervals wholly contained the
corresponding adjusted confidence intervals, but this pattern was reversed for the
non-survivor profiles. In our sample of 465 subjects, 12 patients did not survive their
hospital stay. The small number of deaths and the large variability of the outcomes in
these 12 patients might be a reason for the instability of our estimates for the
non-survivor profiles.
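Reading a median off an estimated step survival function can be sketched as follows. This is a generic illustration, not necessarily the exact rule of Section 1.1.4: the median is taken as the first time at which the estimated survival drops to 0.5 or below, and confidence limits are obtained by applying the same rule to the lower and upper pointwise bands (an assumed, commonly used inversion).

```python
def median_from_survival(times, surv, level=0.5):
    """Smallest time at which a step survival estimate is <= level.

    times : increasing jump times of the step function
    surv  : survival estimate just after each jump (nonincreasing)
    Returns None if the curve never reaches the level.
    """
    for t, s in zip(times, surv):
        if s <= level:
            return t
    return None


def median_with_limits(times, surv, lower_band, upper_band):
    """Median and limits obtained by inverting the pointwise bands."""
    return (median_from_survival(times, surv),
            median_from_survival(times, lower_band),   # lower limit
            median_from_survival(times, upper_band))   # upper limit
```

When the upper band never falls to 0.5 within the observed range, the upper limit is undefined, which is one way small subgroups such as the non-survivors can produce unstable intervals.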

In Table 1.2 we notice how the LOS and cost median estimates increase with
larger comorbidity. Patients who survived their hospital stay had longer LOS and lower
costs than those who died.


Parametric Model

Both LOS and cost exhibit right skewness but this is not severe. No simple
transformation could eliminate it. In our analyses outcomes are in their original scale and
we assume bivariate normality.

The significant correlates in the final joint model were DRG (p<.0001), indicator
of being discharged alive (p<.0001), history of prior CABG (p=.0282), ejection fraction
(p=.0032) and Charlson Comorbidity Index (p<.0001). This parametric model has the
same set of covariates as the semiparametric one, except age, which is not significant in
this case. Again, these assessments were made jointly for both LOS and cost.

Following the calculations of Section 1.2.3, we computed the adjusted
estimator of the asymptotic covariance matrix of the regression parameters and the point
estimates and confidence intervals for the medians of both outcomes. Since no
transformation was used, $g$ was the identity function. Estimates and confidence intervals
of the median LOS and median cost by comorbidity and discharge status are shown in
Table 1.3 for patients with ejection fraction above 50, no history of prior CABG, who had
cardiac catheterization. The estimates are larger than in the semiparametric case, but
similar patterns are noticed. The confidence intervals are wider for non-survivors than
for survivors, and non-survivors have lower LOS and larger costs than survivors. As
expected, LOS and cost increase with comorbidity.


DISCUSSION

We applied two models to estimate the median LOS and the median cost for
patients hospitalized for CABG. Each model permits adjustments for correlates and
recognizes the correlation between the dependent variables. The semiparametric
approach specifies marginal models for LOS and cost and yields consistent estimates of
the regression parameters as long as the marginal models are correctly specified. The
adjusted covariance matrix then accounts for the correlation between the outcomes,
without explicitly specifying a joint distribution for them. The parametric approach
specifies a joint distribution for LOS and cost but the parameters related only to the joint
distribution and not to the marginal models are considered nuisance parameters and are
left unspecified. In our study, using the adjusted instead of the naive covariance
estimates, which do not address the correlation between LOS and cost, gave qualitatively
different results with respect to the confidence intervals for the median cost.

The final sets of covariates in both models were essentially the same:
comorbidity, discharge status (alive or dead), DRG, history of prior CABG, ejection
fraction and age at admission, which was significant only in the semiparametric model.
Previous studies of LOS and hospital cost have shown the importance of these covariates
and their statistical significance regardless of the model used.8, 22, 47

A problem with LOS and in-hospital cost data is the appropriate treatment of
in-hospital deaths. Previously, several authors7, 8 have regarded deaths as an early
curtailment of LOS and costs. In these studies the observed cost and LOS of non-survivors
are considered right censored because, if they had survived their hospital stay, their costs
would have been higher and LOS longer. In this approach the independent censoring
assumption is not verified. Besides this, for many applications estimates of costs for
those who died are just as important as for survivors. By censoring at death no model can
be used to derive predictions for a decision model or cost-effectiveness analysis in which
death is an explicit outcome. If only the observations of those who died are used in the
analysis, the investigator generally sacrifices considerable efficiency, since the majority
of the total sample typically survives. We regarded an in-hospital death along with other
demographic and clinical characteristics as potential correlates of LOS and total hospital
costs. Our finding was that the non-survivors had lower LOS and larger costs than
survivors. Treating non-survivor costs as censored would have increased the bias of the
cost estimators.

We have several limitations in the analyses of our application. One is the strong
distributional assumption needed in the parametric model. We made a normality
assumption and, even though the resulting model did not provide a very good fit for our
data, we used it as a comparison for the semiparametric model. When a distributional
assumption is plausible for a study, the parametric models have the advantage of the
simplicity of calculations and the efficiency of the estimators. Another limitation is the
problem of censoring for costs. The use of survival analysis techniques to analyze
medical care costs is relatively new and has sparked a lively debate1, 9, 11, 48 on its
applicability, given the assumptions that underlie traditional survival models for duration
times. Patients who accumulate costs over time at relatively higher rates tend to generate
larger cumulative costs at both the event time and the censoring time, leading to
dependent censoring. This contradicts the usual assumption of independent censoring
made in standard survival analyses. While no single approach can be expected to
perform well in all situations, we believe the traditional survival methods will still have a
useful role in cost analyses, especially if cost histories are not available.

TABLE 1.1: Characteristics of Patients

Variable                      Subgroup           N    Percent
Discharged Alive              Yes              453      97.41
Gender                        Male             314      67.53
Race                          White            398      85.59
                              African Amer.     36       7.74
                              Other             22       4.73
                              Unknown            9       1.94
Marital Status                Married          333      71.61
                              Alone            112      24.09
                              Unknown           20       4.30
DRG                           with CATH        235      50.54
Ejection Fraction             < 35              58      12.47
                              35 - 49          126      27.10
                              50 +             281      60.43
History of prior CABG         Yes               36       7.74
Charlson Comorbidity Index    0-1              198      42.58
                              2-3              189      40.65
                              4 +               78      16.77
Insurance                     Medicare         275      59.14
                              Private          133      28.60
                              Medicaid          30       6.45
                              Other             27       5.81


[Figure 1.1: scatter plot of cost (1000 $) against LOS (days); * = uncensored cost,
o = censored cost]

FIGURE 1.1: Distribution of Costs and LOS


 

[Figure 1.2: curves plotted against LOS (days), vertical axis from 0 to 1.0]

FIGURE 1.2: Estimated LOS survival function and approximate 95% adjusted
(solid line) and naive (- - - -) pointwise confidence intervals. Estimates were made
for a patient discharged alive, who underwent CATH, with a CCI of 4+, ejection
fraction 50+, age 65 at admission and no history of prior CABG.


 

[Figure 1.3: curves plotted against cost (1000 $), vertical axis from 0 to 1.0]

FIGURE 1.3: Estimated cost survival function and approximate 95% adjusted
(solid line) and naive (- - - -) pointwise confidence intervals. Estimates were made
for a patient discharged alive, who underwent CATH, with a CCI of 4+, ejection
fraction 50+, age 65 at admission and no history of prior CABG.


TABLE 1.2: Length of Stay and Costs by Comorbidity and Discharge
Status (Semiparametric Model)

                      Length of Stay, days                    Cost, $
                      Adjusted (naive) 95%              Adjusted (naive) 95%
CCI   Status   Median     LCL      UCL       Median       LCL         UCL
0-1   Alive      11        10       11       20,819      20,378      21,315
                          (10)     (11)                 (20,356)    (21,382)
      Dead       10         8       15       24,167      18,922      34,884
                           (9)     (12)                 (20,682)    (29,422)
2-3   Alive      13        12       14       24,666      24,140      25,149
                          (12)     (14)                 (24,116)    (25,192)
      Dead       12         9       19       30,155      21,415      79,103
                          (10)     (15)                 (24,201)    (44,637)
4 +   Alive      14        13       16       25,660      24,913      27,083
                          (13)     (16)                 (24,728)    (27,177)
      Dead       13         9       22       31,276      23,048      79,103
                          (10)     (16)                 (25,399)    (46,662)

Estimates for a patient who underwent CATH, with ejection fraction 50+, age
65 at admission and no history of prior CABG.


TABLE 1.3: Length of Stay and Costs by Comorbidity and Discharge
Status (Parametric Model)

                      Length of Stay, days                    Cost, $
                      Adjusted (naive) 95%              Adjusted (naive) 95%
CCI   Status   Median     LCL      UCL       Median       LCL         UCL
0-1   Alive    11.54     10.98    12.10      23,237      22,094      24,380
                        (10.73)  (13.35)                (21,493)    (24,981)
      Dead      8.59      4.27    12.91      29,705      17,778      41,632
                         (5.92)  (11.27)                (23,945)    (35,466)
2-3   Alive    13.47     12.47    14.47      27,841      25,790      29,892
                        (12.62)  (14.32)                (26,001)    (29,681)
      Dead     10.52      6.13    14.92      34,309      22,380      46,238
                         (7.89)  (13.16)                (28,636)    (39,982)
4 +   Alive    14.87     13.51    16.23      29,893      27,058      32,728
                        (13.66)  (16.07)                (27,287)    (32,499)
      Dead     11.92      7.35    16.49      36,361      23,278      49,445
                         (9.20)  (14.64)                (30,497)    (42,226)

Estimates for a patient who underwent CATH, with ejection fraction 50+ and
no history of prior CABG.

CHAPTER 2

ESTIMATING HOSPITAL COST

OVER A SPECIFIED DURATION

Considering accumulating cost as a process evolving over time, we construct in
this chapter a regression model that permits the estimation of the mean cost over a
specified duration of hospital stay and also adjusts for the influence of patient
characteristics on LOS and cost. The proposed methods, which model the relationship
between hospital cost and LOS, are applied to assess the mean cost in a cohort of
hospitalized patients who underwent CABG surgery.

2.1 Model Description

Let $T$ denote the LOS and $Z_1$ a vector of $p$ explanatory variables that might have
an impact on the distribution of $T$. The cumulative cost $C(t)$ through $t$ days of hospital
stay is only observed at $t = T$ and therefore we will focus on the influence of covariates
$Z_2$ on the distribution of the total cost $C = C(T)$.

To fully integrate the role of time into analyses of costs we would need the
cumulative cost histories as they manifest in each patient. Suppose that
$C(t) = \int_0^t B(u)\, du$, so that cost is incurred at the rate $B(u)$ at time $u$. The rate of
hospital cost accumulation is observed at time $u$ only if $u$ is less than or equal to the
hospital stay $T$. Therefore the observed cost has the form

$$\int_0^t [T \ge u]\, B(u)\, du.$$

We want to estimate the expected value of the observed cost over a duration $t$ for
a specified covariate profile $Z_0$. Given this covariate profile, we assume that $T$ is
independent of the rate process $\{B(u), u > 0\}$. Then, by Fubini's Theorem, the mean
observed cost is

$$MC(t \mid Z_0) = \int_0^t b(u \mid Z_0)\, S(u \mid Z_0)\, du, \tag{2.1}$$

where $b(u \mid Z_0) = E(B(u) \mid Z_0)$ is the average potential rate at time $u$ and
$S(u \mid Z_0) = P(T > u \mid Z_0)$ the survival function of $T$, both for the specified profile $Z_0$.
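The step from the observed cost to (2.1) can be written out as follows. This is a sketch of the argument under the stated independence assumption, taking the cost rate $B$ to be nonnegative so that the interchange of expectation and integral is justified; the last equality uses the fact that $P(T \ge u \mid Z_0)$ and $P(T > u \mid Z_0)$ differ for at most countably many $u$, which does not affect the integral.

```latex
\begin{align*}
MC(t \mid Z_0)
  &= E\left( \int_0^t [T \ge u]\, B(u)\, du \,\Big|\, Z_0 \right) \\
  &= \int_0^t E\big( [T \ge u]\, B(u) \mid Z_0 \big)\, du
     && \text{(Fubini)} \\
  &= \int_0^t P(T \ge u \mid Z_0)\, E\big( B(u) \mid Z_0 \big)\, du
     && \text{($T$ independent of $B$ given $Z_0$)} \\
  &= \int_0^t b(u \mid Z_0)\, S(u \mid Z_0)\, du .
\end{align*}
```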

Consider $n$ individuals in the study. Using the same notation as in Chapter 1, for
the $i$-th patient let $T_i$ denote the true LOS, $T_i'$ the censoring time, $X_{1i} = \min(T_i, T_i')$,
$\delta_{1i} = [T_i \le T_i']$ the indicator of non-censoring for time and $Z_{1i}$ a vector of $p$
explanatory variables that influence $T_i$. Consider also the true hospital cost $C_i$, the
censored cost $C_i'$, $X_{2i} = \min(C_i, C_i')$, $\delta_{2i} = [C_i \le C_i']$ the indicator of
non-censoring for cost and $Z_{2i}$ a vector of $p$ explanatory variables that influence $C_i$.

Let $\hat b(\cdot \mid Z_0)$, $\hat S(\cdot \mid Z_0)$ be estimators of $b(\cdot \mid Z_0)$, $S(\cdot \mid Z_0)$, respectively,
obtained from the data $\{(X_{1i}, \delta_{1i}, Z_{1i}), (X_{2i}, \delta_{2i}, Z_{2i}),\ 1 \le i \le n\}$. Then the mean
observed cost is estimated by

$$\widehat{MC}(t \mid Z_0) = \int_0^t \hat b(u \mid Z_0)\, \hat S(u \mid Z_0)\, du. \tag{2.2}$$

The construction of the estimators $\hat b(\cdot \mid Z_0)$, $\hat S(\cdot \mid Z_0)$ relies heavily on the study data.
In the application described below, a linear relationship between time and cost is considered
appropriate and a linear regression model is used for the estimation of the expected rate.
The survival function is estimated through a proportional hazard model.

In Chapter 3 we extend the model proposed here to a more general setup in which
patients pass through different health states. In the present chapter a patient makes a
single transition from the state of being hospitalized to the state of being discharged. A
rate model is used for describing the accumulating hospital costs. With transitions
between several states, costs might be incurred at transition between health states and
also during a sojourn in a health state. The latter costs can be described also through a
rate model, whereas the “jump” costs at transition times between states can be described
using marked point processes. For these cases we provide in Chapter 3 our methods of
estimation of mean cost.
2.2 Application

We apply the presented method to the hospital costs of the patients from the same
sample used in Section 1.3. In that section we described the study, the available potential
correlates and the covariate selection procedure.

The proportional hazard model provided a good fit to the data. The significant
correlates of LOS were age at admission (p=.0366), DRG (p<.0001), history of prior
CABG (p=.0685), ejection fraction (p=.0650) and Charlson Comorbidity Index
(p<.0001). The estimator $\hat S(\cdot \mid Z_0)$ of the LOS survival function was derived from the
Cox regression model, as described in Section 1.1.

The plot of costs versus LOS (see Figure 1.1) suggests a linear relationship. The
variance seems to change a little over time but for simplicity we do not consider this fact
in the analyses of this example.

As in the parametric model used in Section 1.3, we assume costs are approximately
normally distributed. For modeling total costs as a function of time, we allow LOS and
LOS$^2$ to compete for inclusion in the final model. We also regard the indicator of
non-censoring for cost along with the other demographic and clinical characteristics as
potential correlates of total hospital cost. The significant correlates are LOS (p<.0001),
DRG (p=.0636), history of prior CABG (p=.0213) and indicator of being discharged alive
(p<.0001). Even though the indicator of cost non-censoring is not significant (p=.6138),
we include it in the final list of covariates because we want to be able to distinguish
between the censored and the non-censored cost observations. We then consider the
model

$$C_i(t) = \beta_1 \delta_{2i} + \beta_2' Z_{2i} + \beta_3 t + \sigma \varepsilon_i,$$

in which the parameters $\beta_1, \beta_2, \beta_3, \sigma$ are estimated by maximum likelihood, assuming
the errors $\varepsilon_i$ independently normally distributed with zero mean and unit variance.

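Consequently, for a given profile $Z_0$, we estimate the expected (uncensored) cost

$$C_0(t \mid Z_0) = E(C(t) \mid Z_0) = E\left( \int_0^t B(u)\, du \,\Big|\, Z_0 \right) = \int_0^t b(u \mid Z_0)\, du$$

by

$$\hat C_0(t \mid Z_0) = \hat\beta_1 + \hat\beta_2' Z_0 + \hat\beta_3 t \ \text{ for } t > 0 \quad \text{and} \quad \hat C_0(0 \mid Z_0) = 0. \tag{2.3}$$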


The above model for $C(t)$ incorporates the dynamics of time into the accumulating
hospital cost. In other applications the dependence on time might be more complex than
the simple linear relation used here. However, in practice a polynomial in $t$ should
adequately capture the dependence on the time $t$. In a related study of hospital charges in
patients undergoing cardiac procedures, a quadratic in $t$ provided reasonable fit to the
data.49

Suppose we want to estimate the mean observed cost over the fixed duration $t$.
The estimated survival function $\hat S(\cdot \mid Z_0)$ is a step function that jumps at the observed
LOS times. Let $T_{(l)}$, $l \ge 1$ denote the ordered observed LOS times in our sample. Then
the fixed duration $t$ is located between two of these times: $T_{(l-1)} < t \le T_{(l)}$ and, by (2.2),

$$\widehat{MC}(t \mid Z_0) = \sum_{j=1}^{l-1} \hat S(T_{(j-1)} \mid Z_0) \big( \hat C_0(T_{(j)} \mid Z_0) - \hat C_0(T_{(j-1)} \mid Z_0) \big) + \hat S(T_{(l-1)} \mid Z_0) \big( \hat C_0(t \mid Z_0) - \hat C_0(T_{(l-1)} \mid Z_0) \big),$$

where $\hat C_0(\cdot \mid Z_0)$ is given by (2.3). Thus the mean observed cost is an average of cost
increments, weighted by the likelihood of surviving through each incremental period.
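The weighted sum of cost increments can be sketched in a few lines. This is an illustration under stated assumptions rather than the code used for the analysis: the convention $T_{(0)} = 0$ with $\hat S(0 \mid Z_0) = 1$ is assumed, the survival estimate is treated as constant between consecutive jump times, and the names `mean_observed_cost` and `linear_cost` are introduced here. The argument `intercept` stands for the fitted value of $\hat\beta_1 + \hat\beta_2' Z_0$ at the chosen profile.

```python
def mean_observed_cost(t, jump_times, surv_at_jumps, cost_fn):
    """Estimated mean observed cost over [0, t].

    jump_times    : ordered observed LOS times T_(1) < T_(2) < ...
    surv_at_jumps : estimated survival just after each jump time
    cost_fn       : estimated expected cumulative cost, with cost_fn(0) == 0
    """
    grid = [0.0] + list(jump_times)     # assumed convention T_(0) = 0
    surv = [1.0] + list(surv_at_jumps)  # survival is 1 at time 0
    total = 0.0
    for j, left in enumerate(grid):
        if left >= t:
            break
        # Right end of the current step, truncated at the duration t.
        right = grid[j + 1] if j + 1 < len(grid) else t
        right = min(right, t)
        total += surv[j] * (cost_fn(right) - cost_fn(left))
    return total


def linear_cost(intercept, slope):
    """Cost model of the form (2.3): intercept + slope * u for u > 0, 0 at u = 0."""
    return lambda u: intercept + slope * u if u > 0 else 0.0
```

For example, with jump times 2 and 4, survival values 0.5 and 0, and a cost of one unit per day with no intercept, the estimate over 3 days is 1 x 2 + 0.5 x 1 = 2.5 units.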

Table 2.1 shows the estimated mean costs at the LOS median and at the largest
observed LOS (35 days) by comorbidity and discharge status (alive or dead). The
estimates are for patients age 65 at admission, with ejection fraction 50+, with no prior
history of CABG, who had a cardiac catheterization during their hospital stay. Medians
of the LOS distribution for different covariate profiles were estimated from the Cox
model. The discharge status was not significant in the model for LOS, so the LOS
medians do not differ between the survivors and non-survivors with the same
characteristics. As expected, the mean costs increase with comorbidity. We also notice


that non-survivors have larger costs than survivors. Thus for patients with CCI of 2 or 3,
the estimated mean cost through their median LOS of 12 days was $21,651 for survivors
and $34,030 for non-survivors. Figure 2.1 is a plot of the estimated mean observed cost
by category of comorbidity for patients discharged alive, with the same profile described
for Table 2.1. The use of a model linear in time for the accumulating cost is noticeable
in the figure. After approximately 9 days, the survival function weights change
the linear trend and make the estimated mean cost different by comorbidity.

In conclusion, the presented model estimates the mean observed cost for hospital
stays of specified duration. The censored LOS observations were analyzed by standard
survival analysis methods. The indicator of non-censoring for cost was regarded as a
potential correlate of total cost, even though it was not statistically significant. In this
way we do not use the assumption of independent cost censoring, which might not be
valid in many situations. When assessing mean costs over a given duration, the potential
cost accumulation $C(t)$ through time $t$ was modeled by a linear function of time. In the
absence of the cost histories of the patients, the model was fitted using the observed total
cost and LOS in our sample. The overall mean cost was a weighted average of cost
increments. This approach is similar to that which Gardiner et al. (1995, 1999) previously
adopted in evaluating the cost-effectiveness of the implantable cardioverter defibrillator.
In that study survival time was the underlying stochastic variable whose distribution was
estimated by the Kaplan-Meier method or Cox regression. Censoring occurred due to
incomplete follow-up of some patients. The cumulative cost $C(t)$ was assumed known,
which gave the expected total cost over a fixed time interval $[0, t]$ as
$\int_0^t e^{-ru} S(u \mid Z_0)\, dC(u)$, where $r$ is the discount rate and $S$ the survival function. Our


proposed method extends this analysis in a very important practical way by incorporating
the stochastic element in costs. Ignoring discounting (which was irrelevant for the relatively
short hospital stays considered in this study) we constructed an estimator of the mean
costs over $[0, t]$ as $\int_0^t S(u \mid Z_0)\, dC_0(u \mid Z_0)$, where $C_0(t \mid Z_0) = E(C(t) \mid Z_0)$ and $C(t)$ is
now interpreted as the potential cumulative cost up to time $t$. This estimator exploits the
underlying relationship between total hospital cost and LOS. While consistency of the
estimator is immediate, other distributional properties will follow as special cases of the
properties developed in the more general set-up of Chapter 3.


TABLE 2.1: Estimates of mean cost at duration times by comorbidity and
discharge status

               LOS Median    Mean Cost ($)    Mean Cost ($)
CCI   Status     (days)      at LOS Median    at Overall Follow-up
0-1   Alive        10           19,190           22,759
      Dead         10           31,569           35,138
2-3   Alive        12           21,651           26,697
      Dead         12           34,030           39,076
4 +   Alive        13           23,020           29,272
      Dead         13           35,399           41,651

Estimates for patients with age 65 at admission, ejection fraction 50+, no
history of prior CABG, who underwent catheterization during their hospital
stay.


[Figure 2.1: mean cost (1000 $) against LOS (days), one curve per comorbidity group
(CCI 0-1, 2-3 and 4+)]

FIGURE 2.1: Estimated mean cost at duration times by comorbidity
(for survivors with age 65 at admission, ejection fraction 50+, no history
of prior CABG, who underwent catheterization during their hospital
stay)


CHAPTER 3
ESTIMATING MEDICAL COSTS IN

LONGITUDINAL STUDIES

In studies of the natural history of diseases and in medical interventions with long
periods of follow-up, clinical conditions define several health conditions or states. For
example, in the progression of HIV disease50-52 stages can be defined by the CD4 cell
count. A level below 200 (x10^6/L) triggers aggressive treatment, and levels between 200
and 500, and above 500, have recognizable clinical interpretations. In these situations
costs are incurred in each state and at transitions between states through resource use that
will depend on treatment and patient attributes. Assessing the influence of these
variables on costs is one of the objectives of this chapter.

Multistate Markov models, which have their theoretical origin in survival models,
have become the standard for modeling health related outcomes, especially in studying
disease progression in patients.”’58 They are also a framework for cost-effectiveness and
decision analyses (see p. 152-153, Gold et al. (1996)). By describing the event histories
of patients as sojourns through different health states, Markov models provide a sufficient
degree of flexibility to model the probabilistic mechanisms that underlie the transitions
between these states. The sojourn times in the states and transitions between them are
random. The Markov assumption restricts their dependency on the past information and
entails the conditional independence of sojourn times given the states.


In Section 3.1 of this chapter we use a Markov model to describe the experience
of patients in sustaining and changing states of health. Two types of costs are considered:
costs incurred at transition between health states and costs of sojourns in a health state.
Then present values are computed by discounting all costs at a ﬁxed rate. In Section 3.2
we provide estimators of the mean present value of these two types of costs incurred over
a ﬁxed duration. The estimators are obtained conditional on an initial state and a given

covariate proﬁle. Large sample properties of these estimators are presented in Section

3.3.

3.1 Model Description

3.1.1 A Markov Model for Describing Patient Health Histories

Let $(\Omega, \mathcal{F}, P)$ be a probability space and let $\{X(t), t \in \mathcal{T}\}$, with $\mathcal{T} = [0,\tau]$, $\tau < \infty$, be a non-homogeneous continuous-time Markov process with finite state space $E = \{1, 2, \ldots, k\}$, having transition probabilities $P_{hj}(s,t)$ and transition intensities $\alpha_{hj}(t)$. This Markov process describes the evolution of one patient's health history, with $X(t)$ the patient health state occupied at time $t$. Typically $E$ consists of several transient states, such as "well", "ill", "recovery", "relapse", and one or more absorbing states such as "disabled" or "dead".

Let $\alpha = (\alpha_{hj}),\ h, j \in \{1, 2, \ldots, k\}$ be the matrix of these transition intensities, where $\alpha_{hh} = -\sum_{j \neq h} \alpha_{hj}$. Thus, starting from the time of entry into state $h$, the sojourn times in the given state $h$ are continuously distributed, with hazard rate function $-\alpha_{hh}$. Given that the process jumps out of state $h$ at time $t$, it jumps into state $j \neq h$ with probability $\alpha_{hj} / (-\alpha_{hh})$.

Let $A_{hj}(t) = \int_0^t \alpha_{hj}(s)\,ds$ and $A_{hh} = -\sum_{j \neq h} A_{hj}$. For $h \neq j$ the function $A_{hj}$ is called the integrated intensity function for transitions from state $h$ to state $j$, whereas $A_{hh}$ is called the negative integrated intensity function for transitions out of state $h$. The matrix $A = (A_{hj},\ h, j \in \{1, 2, \ldots, k\})$ is also called the intensity measure of the Markov process $X$.

Let $P(s,t) = \prod_{(s,t]} (I + dA)$ for $s < t$, $s, t \in \mathcal{T}$. The matrix

$$P(s,t) = (P_{hj}(s,t),\ h, j \in \{1, 2, \ldots, k\})$$

is the $k \times k$ transition matrix of the Markov process.
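
The product integral can be approximated numerically by multiplying the matrices $I + \alpha(u)\,du$ over a fine grid of $(s,t]$. The following Python sketch illustrates this for a hypothetical three-state illness-death model; the state labels, the intensity functions and the function names `intensity_matrix` and `product_integral` are assumptions made only for the illustration.

```python
import math

def intensity_matrix(alpha, k, t):
    """Build the k x k intensity matrix at time t; each diagonal entry is minus the row sum."""
    q = [[0.0] * k for _ in range(k)]
    for (h, j), rate in alpha.items():
        q[h][j] = rate(t)
    for h in range(k):
        q[h][h] = -sum(q[h][j] for j in range(k) if j != h)
    return q

def product_integral(alpha, k, s, t, steps=5000):
    """Approximate P(s, t) by the finite product of (I + alpha(u) du) over a grid of (s, t]."""
    p = [[1.0 if h == j else 0.0 for j in range(k)] for h in range(k)]
    du = (t - s) / steps
    for m in range(steps):
        q = intensity_matrix(alpha, k, s + (m + 0.5) * du)  # evaluate at the cell midpoint
        step = [[(1.0 if h == j else 0.0) + q[h][j] * du for j in range(k)]
                for h in range(k)]
        p = [[sum(p[h][g] * step[g][j] for g in range(k)) for j in range(k)]
             for h in range(k)]
    return p

# Hypothetical states: 0 = "well", 1 = "ill", 2 = "dead" (absorbing).
alpha = {
    (0, 1): lambda t: 0.10 + 0.02 * t,
    (0, 2): lambda t: 0.05,
    (1, 0): lambda t: 0.03,
    (1, 2): lambda t: 0.20,
}
P = product_integral(alpha, 3, 0.0, 5.0)
```

Each row of `P` sums to one and the row of the absorbing state stays at $(0, 0, 1)$. In a two-state model the approximation can be checked against the closed form $P_{11}(s,t) = \exp(-\int_s^t \alpha_{12}(u)\,du)$.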
The intensity measure reappears in the compensator of the counting process registering each type of jump of the process. The next result was first proved by Jacobsen (1982):

Theorem (Theorem II.6.8, p94, Andersen et al. (1993)29)

Let $A$ correspond to the intensity measure of a Markov process $X$ and $\mathcal{N}_t = \sigma\{X(s) : s \le t\}$. Define

$$Y_h(t) = [X(t-) = h],$$

$$N_{hj}(t) = \#\{s \le t : X(s-) = h,\ X(s) = j\}, \quad h \neq j.$$

Then $N = (N_{hj},\ h \neq j)$ is a multivariate counting process and its compensator with respect to $(\mathcal{F}_t) = (\sigma(X(0)) \vee \mathcal{N}_t)$ has components $\Lambda_{hj}(t) = \int_0^t Y_h(s)\,A_{hj}(ds)$. Equivalently, the processes $M_{hj}$ defined by $M_{hj} = N_{hj} - \int Y_h\,dA_{hj}$ are martingales. $\Box$

Here $Y_h(t)$ is the indicator that the process was in state $h$ just prior to time $t$. In our absolutely continuous case with transition intensities $\alpha_{hj}$ and $A_{hj}(t) = \int_0^t \alpha_{hj}(s)\,ds$, we say that the multivariate counting process $N$ has intensity $\lambda = (\lambda_{hj},\ h \neq j)$, with $\lambda_{hj}(t) = \alpha_{hj}(t)\,Y_h(t)$.

Let $Z_0$ be a given fixed vector of basic covariates. The $p$-dimensional vectors $Z_{hj0}$ of type-specific covariates are computed from the vector $Z_0$, reflecting that some of these basic covariates may affect the different transition intensities differently. We assume that for an individual with given basic covariates $Z_0$ and corresponding type-specific covariates $Z_{hj0}$ for $h \neq j$, the transition intensity $\alpha_{hj}(t \mid Z_0)$ in the Markov process has the form

$$\alpha_{hj}(t \mid Z_0) = \alpha_{hj0}(t) \exp(\beta_0' Z_{hj0}),$$

where $\beta_0$ is the true parameter value and $\alpha_{hj0}$ the intensity corresponding to $Z_{hj0} = 0$.

Let $A_{hj0}(t) = \int_0^t \alpha_{hj0}(s)\,ds$ be the integrated baseline intensity for transitions from state $h$ to state $j$ and $A_{hh0} = -\sum_{j \neq h} A_{hj0}$.

Given a random sample of patient histories, we describe in Section 3.2.1 how suitable estimates $\hat\beta$ and $\hat A_{hj0}(t, \hat\beta)$ can be obtained for $\beta_0$ and $A_{hj0}(t)$.

3.1.2 Incorporating Costs in the Markov Model

As previously mentioned, we consider two types of costs that might be incurred in the course of follow-up: costs at transition between health states and costs of sojourns in a particular health state.

Suppose an amount $C_{hj}(t)$ is incurred just after time $t$ if a transition $h$ to $j$ takes place at time $t$. The present value of expenditures in $(0,t]$ associated with these transitions is $C^{(1)}_{hj}(t) = \int_0^t e^{-rs} C_{hj}(s)\,dN_{hj}(s)$, where $r$ is the discount rate. In economic studies expenditures to be incurred in the future are discounted to present value. A dollar spent now is worth more than a dollar that would be spent later. The discount rates used for the US have usually been between 3% and 5% per year, reflecting the rates on savings accounts or certificates of deposit.
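
Because $N_{hj}$ jumps by one at each observed $h$ to $j$ transition, the integral defining $C^{(1)}_{hj}(t)$ reduces to a discounted sum over the transition times. A minimal Python sketch, assuming time is measured in years and using hypothetical transition times and costs:

```python
import math

def present_value(transition_costs, r):
    """Sum of exp(-r * s) * cost over the observed (time s, cost) pairs."""
    return sum(math.exp(-r * s) * cost for s, cost in transition_costs)

# Hypothetical h -> j transitions of one patient: (time in years, cost in dollars).
costs = [(0.5, 12000.0), (2.0, 8000.0), (4.5, 15000.0)]
pv = present_value(costs, 0.03)  # 3% annual discount rate
```

With $r = 0$ the function returns the undiscounted total, and any positive rate gives a smaller present value.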

Conditional on the initial state, given the vector $Z_0$ of basic covariates with the corresponding type-specific covariates $Z_{hj0}$, the mean of this present value is:

$$MPV^{(1)}_{hj}(t \mid i, Z_0) = E\big(C^{(1)}_{hj}(t) \mid X_0 = i, Z_0\big) = E\Big(\int_0^t e^{-rs} C_{hj}(s)\,dN_{hj}(s) \,\Big|\, X_0 = i, Z_0\Big).$$

We assume that:

A.0.1 $C_{hj}(\cdot)$ are bounded, non-negative real stochastic processes over $\mathcal{T}$, adapted to $(\mathcal{F}_t)$, left continuous with right hand limits (so $C_{hj}(\cdot)$ are bounded, predictable processes).

A.0.2 $E(C_{hj}(t) \mid X_0 = i, X(t-) = h, Z_0) = E(C_{hj}(t) \mid X(t-) = h, Z_0)$ for $t > 0$, so the expected transition cost at any $t > 0$ does not depend on the initial health state.

It is known that if $N$ is a counting process with intensity process $\lambda$, $M = N - \int \lambda$ and $H$ is locally bounded and predictable, then $M$ and $\int H\,dM$ are local square integrable martingales, with $E(M) = E(\int H\,dM) = 0$ (see Proposition II.4.1, p70, Andersen et al. (1993)29). Then, by assumption (A.0.1),

$$MPV^{(1)}_{hj}(t \mid i, Z_0) = E\Big(\int_0^t e^{-rs} C_{hj}(s)\,\lambda_{hj}(s)\,ds \,\Big|\, X_0 = i, Z_0\Big) = E\Big(\int_0^t e^{-rs} C_{hj}(s)\,Y_h(s)\,\alpha_{hj0}(s) \exp(\beta_0' Z_{hj0})\,ds \,\Big|\, X_0 = i, Z_0\Big).$$

By Fubini's Theorem:

$$MPV^{(1)}_{hj}(t \mid i, Z_0) = \int_0^t e^{-rs} E\big(C_{hj}(s)\,Y_h(s) \mid X(0) = i, Z_0\big)\,\alpha_{hj0}(s) \exp(\beta_0' Z_{hj0})\,ds.$$

We can write

$$E\big(C_{hj}(s)\,Y_h(s) \mid X_0 = i, Z_0\big) = E\big(C_{hj}(s) \mid X_0 = i, X(s-) = h, Z_0\big)\,P\big(X(s-) = h \mid X_0 = i, Z_0\big).$$

By the assumption (A.0.2), $MPV^{(1)}_{hj}(t \mid i, Z_0)$ has the form

$$MPV^{(1)}_{hj}(t \mid i, Z_0) = \int_0^t e^{-rs}\,c_{hj}(s \mid Z_0)\,P_{ih}(0, s \mid Z_0)\,\alpha_{hj0}(s) \exp(\beta_0' Z_{hj0})\,ds, \qquad (3.1)$$

where $c_{hj}(s \mid Z_0) = E(C_{hj}(s) \mid X(s-) = h, Z_0)$.

Estimation of the transition probabilities is described in Section 3.2.2 and the method of estimation of $c_{hj}(s \mid Z_0)$ is presented in Section 3.2.3.

We now turn to the cost of sojourns in a health state. Suppose that the cost in state $h$ is incurred at the rate $B_h(u)$ at time $u$. The observed rate is zero at time $u$ whenever, just before $u$, the patient is not in state $h$ anymore, so $[X(u-) = h] = 0$. Then the observed present value of all expenditures in state $h$, started at time $s$ and ended after the duration time $d$, is given by

$$C^{(2)}_h(s, d) = \int_s^{s+d} e^{-ru} B_h(u)\,Y_h(u)\,du,$$

where $r$ is the discount rate and $Y_h(u) = [X(u-) = h]$.

Conditional on the initial state, given the vector $Z_0$ of basic covariates, the mean of this present value is

$$MPV^{(2)}_h(s, d \mid i, Z_0) = E\big(C^{(2)}_h(s, d) \mid X_0 = i, Z_0\big) = \int_s^{s+d} e^{-ru} E\big(B_h(u)\,Y_h(u) \mid X_0 = i, Z_0\big)\,du.$$

Conditions similar to (A.0.1) and (A.0.2) are assumed for $B_h(\cdot)$:

A.0.3 $B_h(\cdot)$ are bounded, non-negative real stochastic processes over $[0,\tau]$, adapted to $(\mathcal{F}_t)$.

A.0.4 $E(B_h(u) \mid X_0 = i, X(u-) = h, Z_0) = E(B_h(u) \mid X(u-) = h, Z_0)$ for all $u \in [0,\tau]$.

Denote $b_h(u \mid Z_0) = E(B_h(u) \mid X(u-) = h, Z_0)$. We can write

$$E\big(B_h(u)\,Y_h(u) \mid X_0 = i, Z_0\big) = E\big(B_h(u) \mid X(u-) = h, X_0 = i, Z_0\big)\,P\big(X(u-) = h \mid X_0 = i, Z_0\big).$$

By assumption (A.0.4):

$$MPV^{(2)}_h(s, d \mid i, Z_0) = \int_s^{s+d} e^{-ru}\,b_h(u \mid Z_0)\,P_{ih}(0, u \mid Z_0)\,du. \qquad (3.2)$$

The method of estimation of $b_h(\cdot \mid Z_0)$ is presented in Section 3.2.4.
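
As a numerical illustration of (3.2), consider the simplest hypothetical case: two states (alive, dead), a constant death intensity $a$ so that $P_{11}(0,u) = e^{-au}$, and a constant cost rate $b$. Then (3.2) has the closed form $b\,(e^{-(r+a)s} - e^{-(r+a)(s+d)})/(r+a)$. The Python sketch below evaluates the integral by the midpoint rule and compares it with this expression; the parameter values and the name `mpv_sojourn` are assumptions for the example only.

```python
import math

def mpv_sojourn(b, p_ih, r, s, d, steps=2000):
    """Midpoint-rule approximation of the integral over [s, s+d] of exp(-r u) b(u) P_ih(0, u) du."""
    du = d / steps
    total = 0.0
    for m in range(steps):
        u = s + (m + 0.5) * du
        total += math.exp(-r * u) * b(u) * p_ih(u) * du
    return total

a, cost_rate, r = 0.2, 1000.0, 0.03  # hypothetical intensity, cost rate and discount rate
approx = mpv_sojourn(lambda u: cost_rate, lambda u: math.exp(-a * u), r, 0.0, 5.0)
exact = cost_rate * (1.0 - math.exp(-(r + a) * 5.0)) / (r + a)
```

In practice $b_h$ and $P_{ih}$ would be replaced by their estimates, which are step or regression functions rather than the smooth functions assumed here.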

3.2 Estimation of the Mean Transition Cost and Mean Sojourn Cost

3.2.1 Estimation of the Regression Parameters and Integrated Baseline Intensities

Consider $n$ individuals in the study. We assume that given the random vector $X_0 = (X_{10}, \ldots, X_{n0})$ and the random processes $Z(\cdot) = (Z_1(\cdot), \ldots, Z_n(\cdot))$, independent Markov processes $X_1(\cdot), \ldots, X_n(\cdot)$ are constructed, with $X_i(0) = X_{i0}$, $1 \le i \le n$. Each process $X_i(\cdot)$ has the same description as that of $X(\cdot)$ previously presented in Section 3.1. For the $i$-th individual, $Z_i(t)$ is the $p$-dimensional vector of covariates measured at time $t$ and $X_{i0}$ the initial state.

A multivariate counting process $N_i = (N_{hji},\ h \neq j)$ is defined from $X_i(\cdot)$:

$$N_{hji}(t) = \#\{0 \le s \le t : X_i(s-) = h,\ X_i(s) = j\}, \quad h \neq j.$$

Let $\mathcal{F}^0 = \sigma(X_0)$, $\mathcal{F}^Z(t) = \sigma\{Z(s), s \le t\}$, $\mathcal{N}_i(t) = \sigma\{N_{hji}(s),\ s \le t,\ h \neq j\}$ and $\mathcal{F}_i(t) = \mathcal{F}^0 \vee \mathcal{F}^Z(t) \vee \mathcal{N}_i(t)$. By Theorem II.6.8, p94, Andersen et al. (1993)29, $N_i = (N_{hji},\ h \neq j)$ is a multivariate counting process with $\mathcal{F}_i(t)$-intensities $\lambda_{hji}(t) = \alpha_{hji}(t)\,Y_{hi}(t)$, where $Y_{hi}(t) = [X_i(t-) = h]$ and $\alpha_{hji}$ is the transition intensity from state $h$ to state $j$ for the Markov process $X_i(\cdot)$.

We assume the transition intensities $\alpha_{hji}$ have the form

$$\alpha_{hji}(t, \beta_0) = \alpha_{hj0}(t) \exp(\beta_0' Z_{hji}(t)), \quad t \in \mathcal{T},$$

where the type-specific covariate vector $Z_{hji}(t)$ is computed from the vector $Z_i(t)$ of basic covariates for the $i$-th individual. This is the standard proportional hazards model.

In the following we present the construction and the large sample properties of the estimators $\hat\beta$ and $\hat A_{hj0}(\cdot, \hat\beta)$ of the true value of the $p$-dimensional regression parameter $\beta_0$ and of the integrated baseline intensity $A_{hj0}(t) = \int_0^t \alpha_{hj0}(u)\,du$, respectively. Most of this follows the development of the Cox regression model in Andersen et al. (1993)29. The stated main results will be needed for the proofs presented in Section 3.3 on the mean present values.

Let $(\Omega^{(n)}, \mathcal{F}^{(n)}, P^{(n)})$ denote the product probability space and define the filtration $\mathcal{F}(t) = \mathcal{F}^0 \vee \mathcal{F}^Z(t) \vee \{\mathcal{N}_1(t), \ldots, \mathcal{N}_n(t)\}$. This filtration is the same as the one generated by the covariate vectors and all $n$ Markov processes.

By the conditional independence of $X_i(\cdot)$ and by the product construction (see Section II.4.3, Andersen et al. (1993)29), the multivariate counting process

$$N = (N_{hji};\ i \in \{1, \ldots, n\},\ h, j \in \{1, \ldots, k\},\ h \neq j)$$

has the intensity process $(\lambda_{hji};\ i \in \{1, \ldots, n\},\ h, j \in \{1, \ldots, k\},\ h \neq j)$ with respect to the combined filtration $(\mathcal{F}(t), t \in \mathcal{T})$.

Next, suppose the observation of $N_i = (N_{hji},\ h \neq j)$ is ceased after some random time $U_i > 0$. We say the process $N_i$ is right censored at $U_i$. Define the censoring indicator process $C_i(t) = [U_i \ge t]$, the filtration $\mathcal{G}_i(t) = \mathcal{F}_i(t) \vee \sigma\{C_i(s), s \le t\}$ and the right censored counting process $N_i^c = (N^c_{hji},\ h \neq j)$, where

$$N^c_{hji}(t) = \int_0^t C_i(s)\,dN_{hji}(s) = \int_0^{t \wedge U_i} dN_{hji}(s).$$

The censoring process $C_i(\cdot)$ is $\mathcal{G}_i$-predictable.

Assume that given $Z_i(\cdot)$, $U_i$ is independent of $X_i(\cdot)$. Then $N_i(\cdot)$ has the same compensator both with respect to $(\mathcal{F}_i(t), t \in \mathcal{T})$ and with respect to $(\mathcal{G}_i(t), t \in \mathcal{T})$. This is the assumption of independent right censoring, referred to in the sequel as "independent censoring".

Each $N_{hji}$ with $h \neq j$ has the decomposition

$$N_{hji}(t) = \Lambda_{hji}(t) + M_{hji}(t),$$

where $M_{hji}$ is a local square integrable martingale with respect to $(\mathcal{F}_i(t), t \in \mathcal{T})$. Then

$$N^c_{hji}(t) = \int_0^t C_i(s)\,dN_{hji}(s) = \int_0^t C_i(s)\,d\Lambda_{hji}(s) + \int_0^t C_i(s)\,dM_{hji}(s) = \Lambda^c_{hji}(t) + M^c_{hji}(t).$$

By the predictability and boundedness of $C_i(\cdot)$, $M^c_{hji}$ is a local square integrable martingale with respect to $(\mathcal{G}_i(t), t \in \mathcal{T})$. Thus, under independent censoring, $N^c_{hji}$ has the $(\mathcal{G}_i(t), t \in \mathcal{T})$-compensator $\Lambda^c_{hji}(t) = \int_0^t C_i(s)\,d\Lambda_{hji}(s)$, so $N^c_i$ has the intensity process $\lambda^c_i = \{\lambda^c_{hji},\ h \neq j\}$, where $\lambda^c_{hji}(t) = \alpha_{hji}(t)\,Y^c_{hi}(t)$ and $Y^c_{hi}(t) = C_i(t)\,Y_{hi}(t) = [X_i(t-) = h,\ U_i \ge t]$. Therefore $N^c_{hji}$ has the same "individual intensity" $\alpha_{hji}$ as the uncensored process and the proportional hazards assumption is preserved for $N^c_{hji}$. Also $Y^c_{hi}$ is interpreted as the predictable indicator process for the $i$-th individual being observed in state $h$ just before time $t$.

For $n$ independent processes $N^c_i$, $1 \le i \le n$, $N^c = (N^c_i,\ i \in \{1, \ldots, n\})$ has intensity process $\lambda^c = (\lambda^c_i,\ i \in \{1, \ldots, n\})$ (see Section II.4.3, Andersen et al. (1993)29).

From now on the superscript "c" will be dropped and, although not explicit in the notation, $N_{hji}$ and $Y_{hi}$ are derived from censored observations.
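The following standard notations, similar to those of Chapter 1, will be used. For $h \neq j$:

$Z_{hji}(t)^{\otimes m} = Z_{hji}(t)\,Z_{hji}(t)'$ if $m = 2$; $= Z_{hji}(t)$ if $m = 1$; $= 1$ if $m = 0$;

$S^{(m)}_{hj}(t, \beta) = \sum_{i=1}^n Y_{hi}(t)\,Z_{hji}(t)^{\otimes m} \exp(\beta' Z_{hji}(t))$, $m \in \{0, 1, 2\}$;

$E_{hj}(t, \beta) = S^{(1)}_{hj}(t, \beta) / S^{(0)}_{hj}(t, \beta)$;

$V_{hj}(t, \beta) = S^{(2)}_{hj}(t, \beta) / S^{(0)}_{hj}(t, \beta) - E_{hj}(t, \beta)^{\otimes 2}$;

$\mathcal{I}(t, \beta) = \sum_{h \neq j} \int_0^t V_{hj}(u, \beta)\,dN_{hj}(u)$, with $N_{hj} = \sum_{i=1}^n N_{hji}$;

$s^{(m)}_{hj}(t, \beta) = E[\,Y_{hi}(t)\,Z_{hji}(t)^{\otimes m} \exp(\beta' Z_{hji}(t))\,]$, $m \in \{0, 1, 2\}$;

$e_{hj}(t, \beta) = s^{(1)}_{hj}(t, \beta) / s^{(0)}_{hj}(t, \beta)$;

$v_{hj}(t, \beta) = s^{(2)}_{hj}(t, \beta) / s^{(0)}_{hj}(t, \beta) - e_{hj}(t, \beta)^{\otimes 2}$;

$\Sigma(t, \beta) = \sum_{h \neq j} \int_0^t v_{hj}(u, \beta)\,s^{(0)}_{hj}(u, \beta)\,\alpha_{hj0}(u)\,du$.

Consider the vector $Y_i = (Y_{hi},\ h \in \{1, \ldots, k\})$, where $1, \ldots, k$ label all the health states.

As in Chapter 1, similar assumptions will be adopted throughout this chapter:

Model Assumptions:

A.1 Conditional on $Z_i(\cdot)$, $U_i$ is independent of $X_i(\cdot)$;

A.2 $(N_i(\cdot), Y_i(\cdot), Z_i(\cdot))$, $1 \le i \le n$, are i.i.d.;

For $h \neq j$:

A.3 $A_{hj0}(\tau) = \int_0^\tau \alpha_{hj0}(t)\,dt < \infty$;

A.4 $Z_{hji}(\cdot)$ are bounded;

A.5 $Z_{hji}(\cdot)$ are adapted, left continuous with right hand limits processes (so $Z_{hji}(\cdot)$ are predictable processes);

A.6 $P(Y_{hi}(t) = 1,\ \forall t \in [0,\tau]) > 0$;

A.7 $\Sigma_\tau := \Sigma(\tau, \beta_0)$ is positive definite.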

The form of the partial likelihood is functionally the same as in the case of the ordinary survival Cox proportional hazards model. Thus the log-partial likelihood evaluated at time $t$ (see p483, Andersen et al. (1993)29) is:

$$C(t, \beta) = \sum_{i=1}^n \sum_{\substack{h,j=1 \\ h \neq j}}^k \int_0^t \big[\beta' Z_{hji}(u) - \log S^{(0)}_{hj}(u, \beta)\big]\,dN_{hji}(u).$$

Since $S^{(1)}_{hj}(t, \beta)$ is the vector of first partial derivatives of $S^{(0)}_{hj}(t, \beta)$ with respect to $\beta$, the vector $U(t, \beta)$ of partial derivatives of $C(t, \beta)$ with respect to $\beta$ is

$$U(t, \beta) = \sum_{i=1}^n \sum_{\substack{h,j=1 \\ h \neq j}}^k \int_0^t \big[Z_{hji}(u) - E_{hj}(u, \beta)\big]\,dN_{hji}(u).$$

The maximum partial likelihood estimator $\hat\beta$ of $\beta_0$ is defined as the solution of the likelihood equation $U(\tau, \hat\beta) = 0$. For $h \neq j$ we estimate $A_{hj0}(t)$ by the Nelson-Aalen estimator

$$\hat A_{hj0}(t, \hat\beta) = \int_0^t \frac{J_h(u)}{S^{(0)}_{hj}(u, \hat\beta)}\,dN_{hj}(u),$$

where $N_{hj} = \sum_{i=1}^n N_{hji}$, $J_h(u) = [Y_h(u) > 0]$ and $Y_h = \sum_{i=1}^n Y_{hi}$. We use the convention $0/0 = 0$. Let $\hat A_{hh0}(t, \hat\beta) = -\sum_{j \neq h} \hat A_{hj0}(t, \hat\beta)$. Thus the matrix of integrated baseline intensities $A_0(t) = (A_{hj0}(t),\ h, j \in \{1, \ldots, k\})$ is estimated by $\hat A_0(t, \hat\beta) = (\hat A_{hj0}(t, \hat\beta),\ h, j \in \{1, \ldots, k\})$.
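
For a single transition type $h \to j$ the estimator $\hat A_{hj0}(t, \beta)$ is a sum, over the observed $h \to j$ transition times $u \le t$, of $1 / S^{(0)}_{hj}(u, \beta)$. The Python sketch below illustrates this under simplifying assumptions that are not part of the model above: one scalar, time-fixed covariate per record, and a hypothetical data layout in which each record gives the interval during which a subject is observed in state $h$.

```python
import math

def breslow_baseline(records, beta, t):
    """Integrated baseline intensity estimate for one transition type h -> j.

    records: list of (start, stop, made_transition, z); the subject is observed
    in state h on (start, stop] and makes the h -> j transition at `stop` when
    made_transition is True. z is a scalar covariate (an assumption of this sketch).
    """
    total = 0.0
    for start, stop, made_transition, _ in records:
        if made_transition and stop <= t:
            # S0(u, beta): sum of exp(beta * z) over subjects at risk in state h just before u = stop
            s0 = sum(math.exp(beta * z) for a, b, _, z in records if a < stop <= b)
            total += 1.0 / s0  # s0 > 0 at every observed jump, so J_h(u) = 1 there
    return total

# Hypothetical sojourns in state h for four subjects.
records = [(0, 2, True, 0), (0, 3, False, 1), (0, 5, True, 1), (1, 4, True, 0)]
A_null = breslow_baseline(records, 0.0, 5)          # beta = 0: ordinary Nelson-Aalen
A_cox = breslow_baseline(records, math.log(2), 5)   # risk sets weighted by exp(beta * z)
```

With $\beta = 0$ every subject at risk has weight one, so the increments are the reciprocals of the risk-set sizes, which is the ordinary Nelson-Aalen estimator.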

As we have also seen in Chapter 1, under our model assumptions A.1-A.7, the following conditions necessary for the asymptotic properties of our estimators are verified. We denote by $\|\cdot\|$ the supremum norm of a vector or a matrix.

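Conditions C.a-C.f:

There exist a compact neighborhood $\mathcal{B}$ of $\beta_0$, with $\beta_0 \in \mathcal{B}^{\circ}$ (the interior of $\mathcal{B}$), and scalar, $p$-vector and $p \times p$ matrix functions $s^{(0)}_{hj}$, $s^{(1)}_{hj}$ and $s^{(2)}_{hj}$, $h \neq j$, defined on $[0,\tau] \times \mathcal{B}$ such that for $m \in \{0, 1, 2\}$ and $h, j \in \{1, \ldots, k\}$, $h \neq j$:

C.a $\sup_{(t,\beta) \in [0,\tau] \times \mathcal{B}} \big\| n^{-1} S^{(m)}_{hj}(t, \beta) - s^{(m)}_{hj}(t, \beta) \big\| \xrightarrow{P} 0$;

C.b $s^{(m)}_{hj}(\cdot, \cdot)$ are uniformly continuous bounded functions of $(t, \beta) \in [0,\tau] \times \mathcal{B}$;

C.c $s^{(0)}_{hj}(\cdot, \cdot)$ is bounded away from zero;

C.d $s^{(1)}_{hj}(t, \beta) = \frac{\partial}{\partial \beta} s^{(0)}_{hj}(t, \beta)$, $\ s^{(2)}_{hj}(t, \beta) = \frac{\partial^2}{\partial \beta^2} s^{(0)}_{hj}(t, \beta)$;

C.e $\Sigma_\tau$ is positive definite;

C.f $\int_0^\tau \alpha_{hj0}(t)\,dt < \infty$.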

Theorem 1 (see Theorem VII.2.1, p497, Andersen et al. (1993)29)

Under the assumptions A.1, A.2 and conditions C.a-C.f, the probability that the equation $U(\tau, \beta) = 0$ has a unique solution $\hat\beta$ tends to one and $\hat\beta \xrightarrow{P} \beta_0$ as $n \to \infty$. $\Box$

The next theorem gives the asymptotic normality of $\hat\beta$ and an estimator of the asymptotic covariance:

Theorem 2 (see Theorem VII.2.2, p498, Andersen et al. (1993)29)

Assume A.1, A.2, A.4 and C.a-C.f. Then $n^{1/2}(\hat\beta - \beta_0)$ converges in distribution to a zero mean normal $p$-dimensional random vector with covariance matrix $\Sigma_\tau^{-1}$ and

$$\sup_{t \in [0,\tau]} \big\| n^{-1} \mathcal{I}(t, \hat\beta) - \Sigma(t, \beta_0) \big\| \xrightarrow{P} 0.$$

In particular $\hat\Sigma_\tau := n^{-1} \mathcal{I}(\tau, \hat\beta) \xrightarrow{P} \Sigma_\tau$. $\Box$

The next theorem provides a description of the asymptotic joint distribution of the estimators $\hat A_{hj0}(t, \hat\beta)$, $h, j \in \{1, \ldots, k\}$, and $\hat\beta$.

First we need to state some definitions. We denote by $\langle M \rangle$ and $[M]$ the predictable and the optional variation process of a martingale $M$, respectively.

Definition (see p83, Andersen et al. (1993)29):

A continuous $k$-dimensional vector martingale $M = (M(t), t \in \mathcal{T})$, $\mathcal{T} = [0,\tau)$, $\tau \in \mathbb{R}$, is called Gaussian if:

i) $\langle M \rangle = V$, a continuous deterministic $k \times k$ positive semidefinite matrix valued function on $\mathcal{T}$, with positive definite increments, zero at time zero;

ii) $M(t) - M(s)$ has a multivariate normal distribution with zero mean and covariance matrix $V(t) - V(s)$ and is independent of $(M(u), u \le s)$, for all $0 \le s \le t$ in $\mathcal{T}$. $\Box$

Definition

Two sequences of processes $(X_{n1}(\cdot), n \ge 1)$ and $(X_{n2}(\cdot), n \ge 1)$ are called asymptotically independent if $(X_{n1}, X_{n2})(\cdot)$ converges weakly to a process $(X_1, X_2)(\cdot)$ with $X_1(\cdot)$ independent of $X_2(\cdot)$. $\Box$

Recall that $E = \{1, \ldots, k\}$ denotes the state space of all the Markov processes. Let

$$E^* = \{(h, j),\ h, j \in \{1, \ldots, k\},\ h \neq j\}.$$

Theorem 3 (see Theorem VII.2.3, p503, Andersen et al. (1993)29)

Assume A.1, A.2, A.4 and C.a-C.f. Then $n^{1/2}(\hat\beta - \beta_0)$ and the processes

$$W_{hj}(\cdot) = n^{1/2}\big(\hat A_{hj0}(\cdot, \hat\beta) - A_{hj0}(\cdot)\big) + n^{1/2}(\hat\beta - \beta_0)' \int_0^{\cdot} e_{hj}(u, \beta_0)\,\alpha_{hj0}(u)\,du, \quad (h, j) \in E^*,$$

are asymptotically independent. Let $W_{hh}(\cdot) = -\sum_{j \neq h} W_{hj}(\cdot)$.

The limiting distribution of the $k \times k$ matrix-valued process $W(\cdot) = (W_{hj}(\cdot),\ h, j \in \{1, \ldots, k\})$ is that of a $k \times k$ matrix-valued process $U_0(\cdot) = (U_{0hj}(\cdot),\ h, j \in \{1, \ldots, k\})$, where $U_{0hh} = -\sum_{j \neq h} U_{0hj}$ and $\{U_{0hj}(\cdot),\ (h, j) \in E^*\}$ is a continuous Gaussian vector martingale, with

i) $U_{0hj}(0) = 0$,

ii) $\langle U_{0hj}, U_{0mr} \rangle = 0$ for $(h, j) \neq (m, r)$, $(h, j), (m, r) \in E^*$,

iii) $\langle U_{0hj} \rangle(t) = \omega^2_{hj}(t) := \int_0^t \dfrac{\alpha_{hj0}(u)}{s^{(0)}_{hj}(u, \beta_0)}\,du$. $\Box$
Notes:

1) The sequence of vector processes $\{W_{hj}(\cdot),\ (h, j) \in E^*\}$ weakly converges to $\{U_{0hj}(\cdot),\ (h, j) \in E^*\}$ in $D[0,\tau]^{k(k-1)}$, the space of $\mathbb{R}^{k(k-1)}$-valued right-continuous with left-hand limits functions on $[0,\tau]$, endowed with the Skorohod topology.

2) Relation ii) implies that the processes $U_{0hj}(\cdot)$, $(h, j) \in E^*$, are independent.

3) For $s, t \in [0,\tau]$ and $(h, j) \in E^*$ we have that

$$\mathrm{Cov}\big(U_{0hj}(s), U_{0hj}(t)\big) = \omega^2_{hj}(s \wedge t),$$

where $\omega^2_{hj}$ is defined in iii). $\Box$

3.2.2 Estimation of the Transition Probabilities

The matrix of transition probabilities

$$P(s, t \mid Z_0) = (P_{hj}(s, t \mid Z_0),\ h, j \in \{1, \ldots, k\})$$

for individuals with given fixed basic covariates $Z_0$ and corresponding type-specific covariates $Z_{hj0}$ is defined as the product integral $P(s, t \mid Z_0) = \prod_{(s,t]} (I + dA(u \mid Z_0))$, for $s \le t$, $s, t \in [0,\tau]$, where the matrix of integrated intensity functions

$$A(\cdot \mid Z_0) = (A_{hj}(\cdot \mid Z_0),\ h, j \in \{1, \ldots, k\})$$

has elements $A_{hj}(t \mid Z_0) = A_{hj0}(t) \exp(\beta_0' Z_{hj0})$ for $h \neq j$ and $A_{hh}(\cdot \mid Z_0) = -\sum_{j \neq h} A_{hj}(\cdot \mid Z_0)$. For a review of the definition and properties of the product integration, see Section II.6 of Andersen et al. (1993).29

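We consider the estimators $\hat A_{hj}(t \mid Z_0) = \hat A_{hj0}(t, \hat\beta) \exp(\hat\beta' Z_{hj0})$ for $h \neq j$, $\hat A_{hh}(\cdot \mid Z_0) = -\sum_{j \neq h} \hat A_{hj}(\cdot \mid Z_0)$ and the matrix $\hat A(\cdot \mid Z_0) = (\hat A_{hj}(\cdot \mid Z_0),\ h, j \in \{1, \ldots, k\})$. Then the matrix of transition probabilities $P(s, t \mid Z_0)$ is estimated by the product integral

$$\hat P(s, t \mid Z_0) = \prod_{(s,t]} \big(I + d\hat A(u \mid Z_0)\big),$$

this estimate being meaningful as long as $\Delta \hat A_{hh}(u \mid Z_0) \ge -1$ on $(s, t]$. A jump process $\Delta X$ is defined by $\Delta X(t) = X(t) - X(t-)$.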
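
Since $\hat A(\cdot \mid Z_0)$ is a step function, the product integral is a finite matrix product over the observed transition times in $(s, t]$. A minimal Python sketch, assuming the increments $\Delta \hat A_{hj}(u \mid Z_0)$ have already been computed and are supplied in a hypothetical list `jumps`:

```python
def estimated_transition_matrix(jumps, k, s, t):
    """Finite product of (I + dA_hat(u)) over the jump times u in (s, t].

    jumps: list of (time, h, j, increment) with h != j, where increment is the
    jump of A_hat_hj(. | Z0) at that time (assumed precomputed).
    """
    p = [[1.0 if h == j else 0.0 for j in range(k)] for h in range(k)]
    for u in sorted({v for v, _, _, _ in jumps if s < v <= t}):
        step = [[1.0 if h == j else 0.0 for j in range(k)] for h in range(k)]
        for v, h, j, inc in jumps:
            if v == u:
                step[h][j] += inc   # off-diagonal increment of A_hat_hj
                step[h][h] -= inc   # diagonal entry keeps each row sum equal to one
        p = [[sum(p[a][g] * step[g][b] for g in range(k)) for b in range(k)]
             for a in range(k)]
    return p

# Hypothetical two-state example: state 0 -> state 1 jumps at times 1.0 and 2.0.
jumps = [(1.0, 0, 1, 0.25), (2.0, 0, 1, 0.5)]
P_hat = estimated_transition_matrix(jumps, 2, 0.0, 3.0)
```

In the two-state case the entry `P_hat[0][0]` is the product of $(1 - \Delta \hat A_{01})$ over the jump times, here $0.75 \times 0.5$.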

Next we state and prove the asymptotic properties of $\hat P(s, \cdot \mid Z_0)$ for a given $s \in [0,\tau)$. We provide the details of the proofs because some results and intermediate steps, such as representations or expansions of certain entities, will be used in Section 3.3. Even though we follow the development offered by Andersen et al. (1993)29 on p521-516, clear statements and details of needed results were not provided in any reference.

Let $(h, j) \in E^*$. First we want to show that

$$n^{1/2}\big(\hat A_{hj}(\cdot \mid Z_0) - A_{hj}(\cdot \mid Z_0)\big) \text{ is asymptotically equivalent to } X^n_{1hj}(\cdot) + X^n_{2hj}(\cdot), \qquad (3.3)$$

where $X^n_{1hj}(t) = \exp(\beta_0' Z_{hj0})\,n^{1/2}(\hat\beta - \beta_0)' \int_0^t (Z_{hj0} - e_{hj}(u, \beta_0))\,\alpha_{hj0}(u)\,du$ and $X^n_{2hj}(t) = \exp(\beta_0' Z_{hj0})\,\tilde W_{hj}(t)$.

The proof of (3.3) is very similar to the one of Theorem VII.2.3, Andersen et al. (1993).29 The process $\tilde W_{hj}(\cdot)$ defined as

$$\tilde W_{hj}(t) = n^{1/2} \int_0^t \frac{J_h(u)}{S^{(0)}_{hj}(u, \beta_0)}\,dM_{hj}(u)$$

is asymptotically equivalent to the process $W_{hj}(\cdot)$ defined in Theorem 3. "Asymptotic equivalence" means convergence in probability to zero of the supremum norm of the difference.

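We use the expansion:

$$n^{1/2}\big(\hat A_{hj}(t \mid Z_0) - A_{hj}(t \mid Z_0)\big) = n^{1/2}\big(\hat A_{hj0}(t, \hat\beta) \exp(\hat\beta' Z_{hj0}) - A_{hj0}(t) \exp(\beta_0' Z_{hj0})\big)$$

$$= n^{1/2} \int_0^t J_h(s)\big\{\exp(\hat\beta' Z_{hj0})\,S^{(0)}_{hj}(s, \hat\beta)^{-1} - \exp(\beta_0' Z_{hj0})\,S^{(0)}_{hj}(s, \beta_0)^{-1}\big\}\,dN_{hj}(s)$$

$$+ \exp(\beta_0' Z_{hj0})\,n^{1/2} \int_0^t J_h(s)\big\{S^{(0)}_{hj}(s, \beta_0)^{-1}\,dN_{hj}(s) - \alpha_{hj0}(s)\,ds\big\}$$

$$+ \exp(\beta_0' Z_{hj0})\,n^{1/2} \int_0^t (J_h(s) - 1)\,\alpha_{hj0}(s)\,ds.$$

The third term above converges in probability to zero and the second term is $X^n_{2hj}(t)$. By Taylor expansion around $\beta_0$, the first term equals

$$\exp(\beta^{*\prime} Z_{hj0})\,n^{1/2}(\hat\beta - \beta_0)' \int_0^t J_h(s)\big(Z_{hj0} - E_{hj}(s, \beta^*)\big)\,S^{(0)}_{hj}(s, \beta^*)^{-1}\,dN_{hj}(s),$$

with $\beta^*$ on the line segment between $\hat\beta$ and $\beta_0$. It can be shown that

$$\sup_{t \in [0,\tau]} \Big\| \int_0^t J_h(s)\big(Z_{hj0} - E_{hj}(s, \beta^*)\big)\,S^{(0)}_{hj}(s, \beta^*)^{-1}\,dN_{hj}(s) - \int_0^t \big(Z_{hj0} - e_{hj}(s, \beta_0)\big)\,\alpha_{hj0}(s)\,ds \Big\| \xrightarrow{P} 0$$

for any $\beta^* \xrightarrow{P} \beta_0$. Then the first term of the expanded $n^{1/2}(\hat A_{hj}(t \mid Z_0) - A_{hj}(t \mid Z_0))$ is asymptotically equivalent to $X^n_{1hj}(\cdot)$, so (3.3) follows.

Then $n^{1/2}(\hat A_{hh}(\cdot \mid Z_0) - A_{hh}(\cdot \mid Z_0)) = -n^{1/2}\big(\sum_{j \neq h} \hat A_{hj}(\cdot \mid Z_0) - \sum_{j \neq h} A_{hj}(\cdot \mid Z_0)\big)$ is asymptotically equivalent to

$$-\sum_{j \neq h} X^n_{1hj}(\cdot) - \sum_{j \neq h} X^n_{2hj}(\cdot) =: X^n_{1hh}(\cdot) + X^n_{2hh}(\cdot).$$

Let $X^n_m(\cdot) = (X^n_{mhj}(\cdot),\ h, j \in \{1, \ldots, k\})$, $m \in \{1, 2\}$. Thus

$$n^{1/2}\big(\hat A(\cdot \mid Z_0) - A(\cdot \mid Z_0)\big) \text{ is asymptotically equivalent to } X^n_1(\cdot) + X^n_2(\cdot). \qquad (3.4)$$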
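Now we will show the uniform consistency of the estimator $\hat P(s, \cdot \mid Z_0)$ to $P(s, \cdot \mid Z_0)$. From (3.3) we have that for $(h, j) \in E^*$, $\hat A_{hj}(\cdot \mid Z_0) - A_{hj}(\cdot \mid Z_0)$ is asymptotically equivalent to $B^n_{hj}(\cdot) = B^n_{1hj}(\cdot) + B^n_{2hj}(\cdot) = n^{-1/2}\big(X^n_{1hj}(\cdot) + X^n_{2hj}(\cdot)\big)$. We have that

$$B^n_{1hj}(t) = \exp(\beta_0' Z_{hj0})\,(\hat\beta - \beta_0)' \int_0^t \big(Z_{hj0} - e_{hj}(u, \beta_0)\big)\,\alpha_{hj0}(u)\,du,$$

$$B^n_{2hj}(t) = \exp(\beta_0' Z_{hj0}) \int_0^t \frac{J_h(u)}{S^{(0)}_{hj}(u, \beta_0)}\,dM_{hj}(u).$$

By the boundedness of $Z_{hj0}$ and $e_{hj}(\cdot, \beta_0)$, assumption A.3: $\int_0^\tau \alpha_{hj0}(t)\,dt < \infty$, and the consistency of $\hat\beta$ to $\beta_0$, we have that $\sup_{t \in [0,\tau]} |B^n_{1hj}(t)| \xrightarrow{P} 0$. By Lenglart's Inequality for local square integrable martingales (see p86, Andersen et al. (1993)29), for every $\eta, \delta > 0$:

$$P\Big(\sup_{t \in [0,\tau]} \Big| \int_0^t \frac{J_h(u)}{S^{(0)}_{hj}(u, \beta_0)}\,dM_{hj}(u) \Big| > \eta\Big) \le \frac{\delta}{\eta^2} + P\Big(\int_0^\tau \frac{J_h(u)}{S^{(0)}_{hj}(u, \beta_0)^2}\,S^{(0)}_{hj}(u, \beta_0)\,\alpha_{hj0}(u)\,du > \delta\Big)$$

and

$$\int_0^\tau \frac{J_h(u)}{S^{(0)}_{hj}(u, \beta_0)}\,\alpha_{hj0}(u)\,du = n^{-1} \int_0^\tau \frac{J_h(u)}{n^{-1} S^{(0)}_{hj}(u, \beta_0)}\,\alpha_{hj0}(u)\,du \xrightarrow{P} 0.$$

Thus $\sup_{t \in [0,\tau]} |B^n_{2hj}(t)| \xrightarrow{P} 0$.

Therefore we have proved $\sup_{t \in [0,\tau]} |B^n_{hj}(t)| \xrightarrow{P} 0$, from which

$$\sup_{t \in [0,\tau]} \big\| \hat A(t \mid Z_0) - A(t \mid Z_0) \big\| \xrightarrow{P} 0,$$

where $\|\cdot\|$ is the supremum norm.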

Consider a fixed $s \in [0,\tau)$. By the continuity of the product integral (see Appendix B), the previous uniform convergence implies the uniform consistency of the estimator $\hat P(s, \cdot \mid Z_0)$ to $P(s, \cdot \mid Z_0)$:

$$\sup_{t \in [0,\tau]} \big\| \hat P(s, t \mid Z_0) - P(s, t \mid Z_0) \big\| \xrightarrow{P} 0.$$

In the following we will describe the asymptotic distribution of $n^{1/2}(\hat P(s, \cdot \mid Z_0) - P(s, \cdot \mid Z_0))$. In (3.4) we have seen that $n^{1/2}(\hat A(\cdot \mid Z_0) - A(\cdot \mid Z_0))$ was asymptotically equivalent to $X^n_1(\cdot) + X^n_2(\cdot)$.

By Theorem 2, $n^{1/2}(\hat\beta - \beta_0)$ converges in distribution to a zero mean normal distributed $p$-dimensional random vector $\xi$, with covariance matrix $\Sigma_\tau^{-1}$. Then $X^n_1(\cdot)$ converges weakly to a $k \times k$ matrix-valued process

$$U^*_1(\cdot \mid Z_0) = (U^*_{1hj}(\cdot \mid Z_0),\ h, j \in \{1, \ldots, k\}), \qquad (3.5)$$

where $U^*_{1hj}(\cdot \mid Z_0) = \xi' w_{hj}(\cdot \mid Z_0)$, with

$$w_{hj}(t \mid Z_0) = \exp(\beta_0' Z_{hj0}) \int_0^t \big(Z_{hj0} - e_{hj}(u, \beta_0)\big)\,\alpha_{hj0}(u)\,du \text{ for } h \neq j \quad \text{and} \quad w_{hh}(\cdot \mid Z_0) = -\sum_{j \neq h} w_{hj}(\cdot \mid Z_0).$$

The process $\tilde W_{hj}(\cdot)$ is asymptotically equivalent to the process $W_{hj}(\cdot)$ described in Theorem 3. Then $X^n_2(\cdot)$ converges weakly to a $k \times k$ matrix-valued process

$$U^*_2(\cdot \mid Z_0) = (U^*_{2hj}(\cdot \mid Z_0),\ h, j \in \{1, \ldots, k\}), \qquad (3.6)$$

where $U^*_{2hj}(t \mid Z_0) = \exp(\beta_0' Z_{hj0})\,U_{0hj}(t)$ and the process $U_0(\cdot)$ is defined in the statement of Theorem 3.

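Theorem 3 also implies that $X^n_1(\cdot)$ and $X^n_2(\cdot)$ are asymptotically independent. Then, by (3.5) and (3.6), $n^{1/2}(\hat A(\cdot \mid Z_0) - A(\cdot \mid Z_0))$ converges weakly (in the Skorohod sense) to $U^*(\cdot \mid Z_0) = U^*_1(\cdot \mid Z_0) + U^*_2(\cdot \mid Z_0)$, where the processes $U^*_1(\cdot \mid Z_0)$ and $U^*_2(\cdot \mid Z_0)$ are independent, with continuous sample paths. Thus $n^{1/2}(\hat A(\cdot \mid Z_0) - A(\cdot \mid Z_0))$ converges weakly to $U^*(\cdot \mid Z_0)$ in the supremum norm sense (see Appendix B). By the compact differentiability of the product integral and the Functional Delta Method (see Appendix B), $n^{1/2}(\hat P(s, \cdot \mid Z_0) - P(s, \cdot \mid Z_0))$ converges weakly to

$$U(s, \cdot \mid Z_0) = \int_s^{\cdot} P(s, u \mid Z_0)\,dU^*(u \mid Z_0)\,P(u, \cdot \mid Z_0) = U_1(s, \cdot \mid Z_0) + U_2(s, \cdot \mid Z_0), \qquad (3.7)$$

where $U_1(s, \cdot \mid Z_0)$ and $U_2(s, \cdot \mid Z_0)$ are independent,

$$U_m(s, \cdot \mid Z_0) = \int_s^{\cdot} P(s, u \mid Z_0)\,dU^*_m(u \mid Z_0)\,P(u, \cdot \mid Z_0), \quad m \in \{1, 2\}.$$

We can write

$$U_m(s, t \mid Z_0)_{hj} = \sum_{g=1}^k \sum_{l \neq g} \int_s^t P_{hg}(s, u \mid Z_0)\,dU^*_{mgl}(u \mid Z_0)\,P_{lj}(u, t \mid Z_0) + \sum_{g=1}^k \int_s^t P_{hg}(s, u \mid Z_0)\Big(-\sum_{l \neq g} dU^*_{mgl}(u \mid Z_0)\Big) P_{gj}(u, t \mid Z_0)$$

$$= \sum_{g=1}^k \sum_{l \neq g} \int_s^t P_{hg}(s, u \mid Z_0)\big\{P_{lj}(u, t \mid Z_0) - P_{gj}(u, t \mid Z_0)\big\}\,dU^*_{mgl}(u \mid Z_0),$$

where the matrix-valued processes $U^*_1(\cdot \mid Z_0)$ and $U^*_2(\cdot \mid Z_0)$ are described in (3.5) and (3.6).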

3.2.3 Estimation of Mean Transition Cost

Consider all $h$ to $j$ transition expenses that occur in the follow-up period $[0,\tau]$. We assume that each patient has at least one of these expenses. In the comments at the end of this subsection we suggest two possible approaches for including in the analysis patients without such expenses. For a single individual $i$ of our sample we consider the vector $C_i = (C_{i1}, \ldots, C_{in_i})'$ of all $h$ to $j$ transition costs, where $C_{il}$ (with $l \in \{1, \ldots, n_i\}$) denotes the observed expense incurred by the $i$-th patient at the $h$ to $j$ transition time $t_{il}$.

We will use a two-stage analysis that will finally result in a general linear mixed model (see Section 3.3, Verbeke and Molenberghs (1997)61). The first-stage regression model is given by

$$C_i = Z_i \beta_i + \varepsilon_i, \qquad (3.8)$$

where $\beta_i$ is a $q \times 1$ vector of regression parameters, $Z_i$ is an $n_i \times q$ matrix of covariates and $\varepsilon_i$ a vector of error terms with mean zero and covariance matrix $\Sigma_i$, all for the $i$-th subject. Typically the first column of $Z_i$ is a vector of ones for the intercept while the rest of the columns are variables that vary within the subject.

We assume that every individual $h$ to $j$ transition cost profile can be well approximated by a polynomial function of time. For example, consider the cost quadratic in time. In this case $Z_i$ is an $n_i \times 3$ matrix with ones in the first column, the time points $t_{il}$, $l \in \{1, \ldots, n_i\}$, in the second column and the squared time points $t_{il}^2$, $l \in \{1, \ldots, n_i\}$, in the third column. Elementwise,

$$C_{il} = \beta_{1i} + \beta_{2i} t_{il} + \beta_{3i} t_{il}^2 + \varepsilon_{il}, \quad l \in \{1, \ldots, n_i\}, \qquad (3.9)$$

where $\beta_i = (\beta_{1i}, \beta_{2i}, \beta_{3i})'$.

In the second-stage regression, $\beta_i$ are regarded as $n$ independent $q$-dimensional random vectors, called random regression coefficients. One goal is to investigate what subject-level characteristics affect these regression coefficients. Thus we assume:

$$\beta_i = B_i \beta + b_i, \qquad (3.10)$$

where $B_i$ is a $q \times p$ matrix of subject-level covariates, $\beta$ is a $p$-dimensional vector of fixed-effects regression coefficients and $b_i$ are i.i.d., with mean zero and covariance matrix $D$.

Replacing in (3.8) the random regression coefficients $\beta_i$ by (3.10) yields the model

$$C_i = X_i \beta + Z_i b_i + \varepsilon_i, \qquad (3.11)$$

where $X_i = Z_i B_i$ is an $n_i \times p$ matrix of covariates and all the other components are as defined before. The model (3.11) is a linear mixed model with fixed effects $\beta$ and random effects $b_i$. We follow the notations and the theory development from Verbeke and Molenberghs (1997).61 Other references with reviews of the linear mixed model theory are Verbeke and Molenberghs (2000)62 and Diggle, Liang and Zeger (1994).63

Note:

The notation $\beta$ applied here conforms to usage in the mixed model theory and does not refer to the previous usage as regression parameter for transition intensities. $\Box$

For our example we assume the following second-stage (3.10)-type regression model:

$$\begin{aligned}
\beta_{1i} &= \beta_1 F_{1i} + \beta_2 F_{2i} + \ldots + \beta_r F_{ri} + b_{1i} \\
\beta_{2i} &= \beta_{r+1} F_{1i} + \beta_{r+2} F_{2i} + \ldots + \beta_{2r} F_{ri} + b_{2i} \\
\beta_{3i} &= \beta_{2r+1} F_{1i} + \beta_{2r+2} F_{2i} + \ldots + \beta_{3r} F_{ri} + b_{3i}
\end{aligned} \qquad (3.12)$$

where the subject-level covariates, considered independent of time, might be age at diagnosis, indicator variables for disease, severity group etc. Usually $F_{1i} = 1$ for the intercept. In this section we want to estimate the expected $h$ to $j$ transition cost, conditional on a given vector $Z_0$ of covariates. Without loss of generality we can assume that the subject-level covariates in the (3.12) model are among the ones available in $Z_0$.

Let $\beta_{1,r} = (\beta_1, \ldots, \beta_r)'$, $\beta_{r+1,2r} = (\beta_{r+1}, \ldots, \beta_{2r})'$, $\beta_{2r+1,3r} = (\beta_{2r+1}, \ldots, \beta_{3r})'$ and $F_i = (F_{1i}, \ldots, F_{ri})'$. Then we write (3.12) as

$$\begin{aligned}
\beta_{1i} &= \beta_{1,r}' F_i + b_{1i} \\
\beta_{2i} &= \beta_{r+1,2r}' F_i + b_{2i} \\
\beta_{3i} &= \beta_{2r+1,3r}' F_i + b_{3i}.
\end{aligned}$$

Replacing (3.12) in (3.9), the resulting model for $C_{il}$, $l \in \{1, \ldots, n_i\}$, is

$$C_{il} = (\beta_{1,r}' F_i) + (\beta_{r+1,2r}' F_i)\,t_{il} + (\beta_{2r+1,3r}' F_i)\,t_{il}^2 + b_{1i} + b_{2i} t_{il} + b_{3i} t_{il}^2 + \varepsilon_{il}. \qquad (3.13)$$
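
In matrix form, (3.13) corresponds to $X_i = Z_i B_i$ with $B_i$ the $3 \times 3r$ block matrix that carries $F_i'$ in each of its three diagonal blocks. The Python sketch below builds both design matrices for the quadratic example; the transition times and the subject-level covariates (an intercept and a hypothetical age of 65) are illustrative values only, and the function names are not from the text.

```python
def first_stage_design(times):
    """Z_i for the quadratic model: columns 1, t_il, t_il^2."""
    return [[1.0, t, t * t] for t in times]

def second_stage_design(f):
    """B_i (3 x 3r): row m holds F_i' in columns m*r, ..., (m+1)*r - 1 and zeros elsewhere."""
    r = len(f)
    return [[f[c - m * r] if m * r <= c < (m + 1) * r else 0.0 for c in range(3 * r)]
            for m in range(3)]

def mat_mul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a[i][m] * b[m][j] for m in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

times = [0.5, 1.0, 2.0]   # hypothetical h -> j transition times t_il of subject i
F_i = [1.0, 65.0]         # F_1i = 1 (intercept), F_2i = age at diagnosis (hypothetical)
Z_i = first_stage_design(times)
X_i = mat_mul(Z_i, second_stage_design(F_i))   # n_i x 3r fixed-effects design matrix
```

Each row of `X_i` is $(F_i', t_{il} F_i', t_{il}^2 F_i')$, so the columns of $Z_i$ are reproduced in $X_i$ through the intercept covariate $F_{1i} = 1$.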


Notes:

1) Cnaan et al. (1997)64 provides useful guidelines for modeling $X_i$ and $Z_i$. One of these is that, in general, the columns of $Z_i$ should be a subset of the columns of $X_i$. In our example we assume that a quadratic curve adequately models the time dependence of the mean response for each subject. We would not want to include a quadratic effect in $Z_i$ and omit it in the mean modeled by $X_i$, since this model would imply that each subject had a quadratic curve but that the population curve was linear. The two-stage approach has the advantage that the columns of $Z_i$ are necessarily a subset of those of $X_i$ and any within-subject variables modeled in $X_i$ are also contained in $Z_i$.

2) The linear mixed model literature is vast. Verbeke and Molenberghs (1997, 2000)61,62 and Diggle, Liang and Zeger (1994)63 provide reviews of the general theory of mixed models and also guidance on its application in practice.

3) Depending on the data, a simple transformation such as log or square root might be applied to $C_{il}$ to attenuate the effects of cost skewness. $\Box$

The underlying assumptions of the linear mixed model (3.11) are that $b_i \sim MVN(0, D)$, $\varepsilon_i \sim MVN(0, \Sigma_i)$ and $b_1, \ldots, b_n, \varepsilon_1, \ldots, \varepsilon_n$ are independent. We denote by $MVN(\mu, \Sigma)$ a multivariate normal distribution with mean vector $\mu$ and covariance matrix $\Sigma$. The matrix $D$ is a $q \times q$ covariance matrix and $\Sigma_i$ is an $n_i \times n_i$ covariance matrix that depends on the index $i$ only through its dimension $n_i$. Thus the unknown parameters in $\Sigma_i$ do not depend upon $i$.

The mean and the covariance matrix of the cost vector $C_i$ are:

$$E(C_i) = X_i \beta, \qquad \mathrm{cov}(C_i) = Z_i D Z_i' + \Sigma_i =: V_i.$$

Conditional on the random effect $b_i$, $C_i$ is normally distributed with mean vector $X_i \beta + Z_i b_i$ and with covariance matrix $\Sigma_i$. Further $b_i \sim MVN(0, D)$. Let $f(c_i \mid b_i)$ and $f(b_i)$ be the corresponding density functions. The marginal density function of $C_i$ is $f(c_i) = \int f(c_i \mid b_i)\,f(b_i)\,db_i$, which is the density function for $MVN(X_i \beta, V_i)$. Statistical inference is based on these marginal distributions of the response variables $C_i$. Let $\alpha$ denote the vector of all variance and covariance parameters (called variance components) found in $V_i$. Assuming the cost vectors $C_i$ independent, the marginal likelihood function is

$$L_{ML}(\beta, \alpha) = \prod_{i=1}^n \Big\{(2\pi)^{-n_i/2}\,|V_i(\alpha)|^{-1/2} \exp\big(-\tfrac{1}{2}(C_i - X_i \beta)' V_i^{-1}(\alpha)(C_i - X_i \beta)\big)\Big\},$$

where $|\cdot|$ denotes the determinant of the matrix.

First assume the parameter $\alpha$ known. Conditional on $\alpha$, the maximum likelihood estimator (MLE) of $\beta$ is given by

$$\hat\beta_{MLE}(\alpha) = \Big(\sum_{i=1}^n X_i' V_i^{-1}(\alpha) X_i\Big)^{-1} \sum_{i=1}^n X_i' V_i^{-1}(\alpha)\,C_i,$$

and follows a multivariate normal distribution with mean $\beta$ and covariance matrix $\mathrm{cov}(\hat\beta_{MLE}(\alpha)) = \big(\sum_{i=1}^n X_i' V_i^{-1}(\alpha) X_i\big)^{-1}$.65 Then the parameter $\alpha$ is estimated by its maximum likelihood (ML) or restricted maximum likelihood (REML) estimator.
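
To make the estimator concrete, consider the simplest special case, which is an assumption of this sketch and not the quadratic model used above: a random-intercept model with $X_i = Z_i = (1, \ldots, 1)'$, $D = d$ and $\Sigma_i = \sigma^2 I$, so that $V_i = \sigma^2 I + d\,\mathbf{1}\mathbf{1}'$. Then $V_i^{-1}\mathbf{1} = \mathbf{1}/(\sigma^2 + n_i d)$ and $\hat\beta_{MLE}(\alpha)$ is a weighted average of the subject means with weights $w_i = n_i / (\sigma^2 + n_i d)$. The data and variance components in the Python sketch are hypothetical.

```python
def gls_random_intercept(costs_by_subject, sigma2, d):
    """beta_hat and its variance for an intercept-only model with V_i = sigma2 * I + d * 11'."""
    num = 0.0
    den = 0.0
    for c in costs_by_subject:
        n_i = len(c)
        w = n_i / (sigma2 + n_i * d)   # X_i' V_i^{-1} X_i
        num += w * (sum(c) / n_i)      # X_i' V_i^{-1} C_i = w * (subject mean)
        den += w
    return num / den, 1.0 / den        # (beta_hat, cov(beta_hat))

# Hypothetical transition costs (in $1000) for two subjects, with known variance components.
beta_hat, var_beta = gls_random_intercept([[10.0, 14.0], [20.0]], sigma2=1.0, d=1.0)
```

When $d = 0$ the weights are proportional to $n_i$ and the estimator reduces to the overall mean of all costs; as $d$ grows, each subject mean receives nearly equal weight.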

Linear mixed models often contain many fixed effects, and in such cases it might be important for the variance component estimation to explicitly take into account the loss of degrees of freedom involved in estimating the fixed effects. This can be done via restricted maximum likelihood estimation. The REML estimator for the variance components $\alpha$ is obtained from maximizing the likelihood function of error contrasts $U = A'Y$, where $Y = (Y_1', \ldots, Y_n')'$ and $A$ is a $(n \times (n - p))$ full-rank matrix with columns orthogonal to the columns of $X$, the matrix obtained from stacking the matrices $X_i$ underneath each other. This likelihood can be written as:

$$L_{REML}(\alpha) = C \left| \sum_{i=1}^{n} X_i' V_i^{-1}(\alpha) X_i \right|^{-1/2} L_{ML}(\hat\beta_{MLE}(\alpha), \alpha),$$

where $C$ is a constant which does not depend on $\alpha$, so the resulting REML estimator does not depend on the error contrasts (i.e. on the choice of $A$). See p43-47, Verbeke and Molenberghs (2000)⁶² or Diggle, Liang and Zeger (1994)⁶³ for reviews of the REML results and comparisons between ML and REML estimators.
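The degrees-of-freedom correction is easiest to see in the simplest special case. The sketch below assumes an intercept-only model with independent errors of common variance (so $p = 1$ and $V = \sigma^2 I$); in that case the ML estimator of $\sigma^2$ divides the residual sum of squares by $n$ and the REML estimator divides it by $n - 1$. The cost values are hypothetical.

```python
from statistics import pvariance, variance

# Hypothetical costs for five patients (intercept-only model, V = sigma^2 * I).
costs = [120.0, 95.0, 143.0, 110.0, 132.0]
n = len(costs)

# ML estimate of sigma^2: residual sum of squares divided by n.
sigma2_ml = pvariance(costs)

# REML estimate of sigma^2: residual sum of squares divided by n - p, with p = 1.
sigma2_reml = variance(costs)

# The two estimates differ exactly by the factor n / (n - 1).
ratio = sigma2_reml / sigma2_ml
```

With many fixed effects relative to the sample size the analogous factor $n/(n-p)$ is no longer close to one, which is the situation the paragraph above has in mind.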

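Let $\hat\alpha$ denote the ML or REML estimator of $\alpha$ and $\hat V_i$ the estimator of $V_i$ obtained by replacing the variance components $\alpha$ from $D$ and $\Sigma_i$ by $\hat\alpha$. We will then estimate $\hat\beta_{MLE}(\alpha)$ by

$$\hat\beta = \left( \sum_{i=1}^{n} X_i' \hat V_i^{-1} X_i \right)^{-1} \sum_{i=1}^{n} X_i' \hat V_i^{-1} C_i \qquad (3.14)$$

and $\mathrm{cov}(\hat\beta_{MLE}(\alpha))$ by $\widehat{\mathrm{cov}}(\hat\beta) = \left( \sum_{i=1}^{n} X_i' \hat V_i^{-1} X_i \right)^{-1}$.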

i=1
It follows from classical likelihood theory (see for example Chapter 9, Cox and Hinkley (1990)) that under some regularity conditions the REML estimator $\hat\alpha$ is consistent and its distribution can be well approximated by a normal distribution with mean vector $\alpha$ and covariance matrix given by the inverse of the Fisher information matrix.

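Given $\alpha$, suppose there exists the asymptotic matrix $\Sigma(\alpha) = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} X_i' V_i^{-1}(\alpha) X_i$. Then $n^{1/2}(\hat\beta_{MLE}(\alpha) - \beta)$ converges weakly to a multivariate normal distribution $MVN(0, \Sigma(\alpha)^{-1})$.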
In many situations (i.e. different covariance structures) the consistency of $\hat\alpha$ implies the consistency of our estimator $\hat\beta = \hat\beta_{MLE}(\hat\alpha)$ (called the feasible generalized least squares estimator in the econometrics literature) and also implies that $n^{1/2}(\hat\beta - \beta)$ and $n^{1/2}(\hat\beta_{MLE}(\alpha) - \beta)$ have the same asymptotic distribution $MVN(0, \Sigma(\alpha)^{-1})$. Amemiya (1985)⁶⁷ (see p186-222) provides detailed proofs of these results for several models with economic interpretation: 1) serial correlation; 2) seemingly unrelated regression models; 3) heteroscedasticity; 4) error components model and 5) random coefficients model. We assume that the above stated results hold in our situation. Thus $\hat\beta \xrightarrow{P} \beta$ and

$$n^{1/2}(\hat\beta - \beta) \xrightarrow{D} \zeta_1, \quad \text{where } \zeta_1 \sim MVN(0, \Sigma(\alpha)^{-1}). \qquad (3.15)$$

Now we need to interpret $c_{hj}(s \mid Z_0) = E(C_{hj}(s) \mid X(s-) = h, Z_0)$ and get a suitable estimator. For ease of explanation we will use the quadratic time model described previously as an example throughout this section.

Recall that for transitions of the $h$ to $j$ type we recorded $(C_{il}, l \in \{1, \ldots, n_i\})$, the costs incurred at the $i$-th individual's transition times $(t_{il}, l \in \{1, \ldots, n_i\})$. Based on our example model (3.13), the expected cost for the $i$-th individual with subject level covariates $F_{1i}, \ldots, F_{ri}$ at his/her $h$ to $j$ transition time $t_{il}$ is

$$E(C_{il}) = (\beta_1 F_{1i} + \beta_2 F_{2i} + \cdots + \beta_r F_{ri}) + (\beta_{r+1} F_{1i} + \beta_{r+2} F_{2i} + \cdots + \beta_{2r} F_{ri})\, t_{il} + (\beta_{2r+1} F_{1i} + \beta_{2r+2} F_{2i} + \cdots + \beta_{3r} F_{ri})\, t_{il}^2$$
$$= \beta_{1,r}' F_i + \beta_{r+1,2r}' F_i\, t_{il} + \beta_{2r+1,3r}' F_i\, t_{il}^2. \qquad (3.16)$$

In practice, after the covariate selection procedure, the model (3.13) might be reduced so some of the $3r$ regression coefficients in (3.16) might be zero.

For an individual with given subject covariates $Z_0$, $c_{hj}(s \mid Z_0)$ is the expected cost of the $h$ to $j$ transition at time $s$. Thus we assume

$$c_{hj}(s \mid Z_0) = \beta_{1,r}' Z_0^r + \beta_{r+1,2r}' Z_0^r\, s + \beta_{2r+1,3r}' Z_0^r\, s^2,$$

where $Z_0^r$ is the $r$-dimensional covariate vector obtained by substituting $F_{1i}, \ldots, F_{ri}$ by their corresponding covariates in $Z_0$.

Replacing the fixed effect regression parameter vector $\beta = (\beta_{1,r}', \beta_{r+1,2r}', \beta_{2r+1,3r}')'$ by its estimator given in (3.14), we estimate $c_{hj}(s \mid Z_0)$ by

$$\hat c_{hj}(s \mid Z_0) = \hat\beta_{1,r}' Z_0^r + \hat\beta_{r+1,2r}' Z_0^r\, s + \hat\beta_{2r+1,3r}' Z_0^r\, s^2. \qquad (3.17)$$

Denote $Z_{01}(s) = \left( (Z_0^r)', s (Z_0^r)', s^2 (Z_0^r)' \right)'$, so we can write $\hat c_{hj}(s \mid Z_0) = \hat\beta' Z_{01}(s)$.
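The stacked-vector form of the estimated cost curve can be sketched in a few lines. This is only an illustration of the quadratic time model: the function names, the coefficient values and the covariate vector are hypothetical, and the coefficients are assumed to be ordered as (intercept block, linear block, quadratic block).

```python
def design_vector(z0, s):
    """Stack (z0, s*z0, s^2*z0): the vector multiplying beta in the quadratic time model."""
    return list(z0) + [s * z for z in z0] + [s * s * z for z in z0]

def estimated_transition_cost(beta_hat, z0, s):
    """Inner product of the estimated coefficients with the stacked design vector."""
    return sum(b * z for b, z in zip(beta_hat, design_vector(z0, s)))

# Hypothetical example with r = 2 subject-level covariates.
z0 = [1.0, 0.5]
beta_hat = [100.0, 20.0, 10.0, 4.0, -1.0, 0.5]  # blocks of length r: intercept, s, s^2

cost_at_entry = estimated_transition_cost(beta_hat, z0, 0.0)   # only the first block contributes
cost_at_two = estimated_transition_cost(beta_hat, z0, 2.0)
```

Writing the curve as a single inner product is what makes the later variance formulas simple: the covariance of the limiting process at two time points is a quadratic form in the two design vectors.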
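The asymptotic properties (3.15) of the estimator $\hat\beta$ imply that

$$\sup_{s \in [0,\tau]} |\hat c_{hj}(s \mid Z_0) - c_{hj}(s \mid Z_0)| \xrightarrow{P} 0 \quad \text{and}$$

$$n^{1/2}\left( \hat c_{hj}(\cdot \mid Z_0) - c_{hj}(\cdot \mid Z_0) \right) \xrightarrow{D} c_{hj0}(\cdot \mid Z_0), \quad \text{where } c_{hj0}(s \mid Z_0) \overset{D}{=} \zeta_1' Z_{01}(s). \qquad (3.18)$$

Therefore the process $c_{hj0}(\cdot \mid Z_0)$ is Gaussian, with mean zero and

$$\mathrm{cov}\left( c_{hj0}(s \mid Z_0), c_{hj0}(t \mid Z_0) \right) = Z_{01}(s)' \Sigma(\alpha)^{-1} Z_{01}(t).$$

The matrix $\Sigma(\alpha)^{-1}$ is consistently estimated by $\Sigma(\hat\alpha)^{-1}$.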

Comments
Suppose there are some patients that do not incur any $h$ to $j$ transition costs.

One possible approach for this situation is the use of a two-part model.⁶,⁶⁸,⁶⁹ For each individual $i$ in the sample we observe the binary variable $\delta_{Ci}$ that indicates if the subject incurred any $h$ to $j$ transition expenses. Then we assume that $P(\delta_{Ci} = 1) = \pi_i(a)$ is governed by a parametric binary probability model (part one) and we consider the mixed-effects model (part two)

$$C_i = X_i\beta + Z_i b_i + \varepsilon_i,$$

where all the assumed error and random effects distributions are conditional on the realization of the event $(\delta_{Ci} = 1)$. Then

$$E(C_i) = P(\delta_{Ci} = 1)\, E(C_i \mid \delta_{Ci} = 1) = \pi_i(a) X_i\beta.$$

The component $\pi_i(a)$ can be specified through either a logit model:

$$\pi_i(a) = \frac{\exp(a' Z_i)}{1 + \exp(a' Z_i)}$$

or a probit model $\pi_i(a) = \Phi(a' Z_i)$, where $Z_i$ is a set of covariates and $\Phi$ the standard normal distribution function. The parameter $a$ is estimated using all the data. Then one could use the available costs $C_i$ to estimate the parameters $\beta$ in the linear mixed models.
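The two-part mean can be sketched as the product of a participation probability and a conditional mean. The following is a minimal illustration, assuming the part-one coefficients and the fixed effects have already been estimated; all function names and numeric values are hypothetical.

```python
import math

def linear_predictor(coefs, covariates):
    """Inner product of a coefficient vector with a covariate vector."""
    return sum(c * z for c, z in zip(coefs, covariates))

def logit_probability(a, z):
    """Part one under a logit link: exp(a'z) / (1 + exp(a'z))."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(a, z)))

def probit_probability(a, z):
    """Part one under a probit link: standard normal CDF evaluated at a'z."""
    return 0.5 * (1.0 + math.erf(linear_predictor(a, z) / math.sqrt(2.0)))

def two_part_mean(prob_any_cost, x_row, beta):
    """Unconditional expected cost: P(any cost) times the conditional mean x'beta."""
    return prob_any_cost * linear_predictor(beta, x_row)

# Hypothetical values: a linear predictor of zero gives probability one half under both links.
p_logit = logit_probability([0.4, -0.2], [1.0, 2.0])
p_probit = probit_probability([0.4, -0.2], [1.0, 2.0])
expected_cost = two_part_mean(p_logit, [1.0, 2.0], [100.0, 10.0])
```

The logit and probit links agree at a linear predictor of zero and differ mainly in the tails, so the choice between them matters most when very few or almost all patients incur a cost.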

A second possible approach is to consider for the individual $i$ all transition expenses $C_i = (C_{i1}, \ldots, C_{in_i})'$ that occur in the follow-up period $[0,\tau]$ and not only the $h$ to $j$ transition costs. We denote by $C_{il}$ the observed expense incurred by the $i$-th patient at time $t_{il}$. Then, in the first-stage model (3.8), the matrix $Z_i$ of covariates includes also classification variables for the origination and destination states of the transitions. For example,
H H
C11: 131i '1‘ 2“,, ﬂ§i[X(’iI—) = a] '1' 25:, ﬂglxau) = 51+ ﬂutil ‘1' £5.43 '1’ 5i: . ’5 {l,...,n, } .
where 3,, denotes the originating health state and d,, the destination state. After the

second-stage regression, the resulting mean transition costs have the form

E(Cu)=ﬂi'F.- +Zﬁ;i(ﬂi)'F.-1X<ri->=a1+2f,;',(ﬂii) E[X(trt)=5l+ﬂ21‘} t, +1921”:- :5.

le {l,...,n, }. Hence we assume

c,,(s | 20) = 3,25 + (13;) z; +(ﬂ3j) z; + 1932,} s + ﬁgzg s2
and we estimate it by replacing the fixed-effect regression parameter vector $\beta$ with its estimator given in (3.14).

This strategy of utilizing the entire sample to estimate simultaneously all $(c_{hj}(\cdot \mid Z_0), h \neq j)$ has the advantage of drawing strength from other parts of the data set, when some patients do not have observed transitions of a specific type. Its limitation is that, when considering all types of transition costs, it might be difficult to distinguish any pattern in time that approximates the individual transition cost profiles. □


3.2.4 Estimation of Mean Sojourn Rate

For a given fixed covariate vector $Z_0$, we defined in Section 3.1.2

$$b_h(u \mid Z_0) = E(B_h(u) \mid X(u-) = h, Z_0)$$

as the expected expense rate at time $u$ for sojourns in state $h$. This quantity is never truly observed unless we have a very fine time scale on which the accumulating cost history is observed. For example, daily or weekly costs incurred while sojourning in a state might provide in some applications an adequate representation of the rate of expenditures. We assume we do not have a detailed cost history and observation is restricted to the total expenditures for each sojourn, together with the time of entry and duration of the sojourns. The cumulative expected expense of a sojourn in state $h$ with entry time $s$, after duration $d$ is $C_h(s,d) = \int_s^{s+d} b_h(u \mid Z_0)\,du$. We assume the rate of accumulating costs in a sojourn does not depend on the entry time in that sojourn.

For the $i$-th individual, all observed total costs incurred in sojourns in state $h$ are collected into a single vector $C_i = (C_{i1}, \ldots, C_{in_i})'$. For $l \in \{1, \ldots, n_i\}$, $C_{il}$ is the total cost (up to transition to another state or up to censoring) of the $l$-th sojourn in state $h$ that had entry time $s_{il}$ and duration $d_{il}$. We assume the vector $C_i$ contains at least one cost value, for all $i$. The comments from the end of the previous section apply for the situation when this assumption does not hold.

Our approach is similar to our use of linear mixed models to derive estimates of the expected transition costs. We use the same two-stage analysis and we also make the assumption that the individual sojourn cumulative profile can be well approximated by a polynomial function of the duration of the sojourn.

As an example we consider again a quadratic curve model. For the first stage model, let

$$C_{il} = \beta_{1i} + \beta_{2i} I_{il}^c + \beta_{3i} s_{il} + \beta_{4i} d_{il} + \beta_{5i} d_{il}^2 + \varepsilon_{il}, \qquad (3.19)$$

where $I_{il}^c$ is the indicator that the $l$-th sojourn in state $h$ of the $i$-th individual is completely observed. The second stage model is similar to (3.12):

$$\beta_{ki} = \beta_{(k-1)r+1,kr}' F_i + b_{ki}, \quad k \in \{1,2,3,4,5\}. \qquad (3.20)$$

We use the notations from the previous section. Let $\beta = (\beta_{1,r}', \beta_{r+1,2r}', \ldots, \beta_{4r+1,5r}')'$, $b_i = (b_{1i}, \ldots, b_{5i})'$.

Replacing (3.20) in (3.19) we obtain the following final model:

$$C_{il} = \beta_{1,r}' F_i + \beta_{r+1,2r}' F_i I_{il}^c + \beta_{2r+1,3r}' F_i s_{il} + \beta_{3r+1,4r}' F_i d_{il} + \beta_{4r+1,5r}' F_i d_{il}^2 + b_{1i} + b_{2i} I_{il}^c + b_{3i} s_{il} + b_{4i} d_{il} + b_{5i} d_{il}^2 + \varepsilon_{il}. \qquad (3.21)$$

Note:
In practice, after the model selection procedure, some of the regression coefficients might be zero. □

Based on the model (3.21), an estimator $\hat\beta$ analogous to (3.14) can be calculated.

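For an individual $i$ with an observed $l$-th sojourn in state $h$ with entry time $s_{il}$ and duration $d_{il}$, the expected cost is

$$E(C_{il}) = \beta_{1,r}' F_i + \beta_{r+1,2r}' F_i + \beta_{2r+1,3r}' F_i s_{il} + \beta_{3r+1,4r}' F_i d_{il} + \beta_{4r+1,5r}' F_i d_{il}^2 = \beta' \left( F_i', F_i', s_{il} F_i', d_{il} F_i', d_{il}^2 F_i' \right)'.$$

Accordingly, for an individual with given subject covariates $Z_0$, we assume that

$$C_h(s,d) = \beta' \left( (Z_0^r)', (Z_0^r)', s (Z_0^r)', d (Z_0^r)', d^2 (Z_0^r)' \right)'$$

and we estimate $C_h(s,d)$ by

$$\hat C_h(s,d) = \hat\beta' \left( (Z_0^r)', (Z_0^r)', s (Z_0^r)', d (Z_0^r)', d^2 (Z_0^r)' \right)',$$

where, as defined in the previous section, $Z_0^r$ is the $r$-dimensional covariate vector obtained on substituting the elements of $F_i$ by their corresponding covariates in $Z_0$.

The rate $b_h(\cdot \mid Z_0)$ was assumed not to depend on the entry time $s$. Therefore we estimate it by

$$\hat b_h(u \mid Z_0) = \frac{\partial}{\partial d} \hat C_h(0,d) \Big|_{d=u} = \hat\beta' Z_{02}(u), \qquad (3.22)$$

where $Z_{02}(u) = \left[ 0', 0', 0', (Z_0^r)', 2u (Z_0^r)' \right]'$, the factor $2u$ arising from differentiating the quadratic term $d^2$.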
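The differentiation behind the rate estimator can be written out term by term. The following is a sketch under the quadratic model (3.21) with the entry time set to zero; only the two blocks of coefficients multiplying $d$ and $d^2$ survive the derivative:

```latex
\begin{aligned}
\hat C_h(0,d)
  &= \hat\beta_{1,r}' Z_0^r + \hat\beta_{r+1,2r}' Z_0^r
     + d\,\hat\beta_{3r+1,4r}' Z_0^r + d^2\,\hat\beta_{4r+1,5r}' Z_0^r, \\
\frac{\partial}{\partial d}\hat C_h(0,d)
  &= \hat\beta_{3r+1,4r}' Z_0^r + 2d\,\hat\beta_{4r+1,5r}' Z_0^r, \\
\frac{\partial}{\partial d}\hat C_h(0,d)\Big|_{d=u}
  &= \hat\beta' \left[\, 0',\ 0',\ 0',\ (Z_0^r)',\ 2u\,(Z_0^r)' \,\right]'.
\end{aligned}
```

The first three blocks of the design vector are zero because the intercept, the completeness indicator and the entry time do not vary with the duration $d$.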
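As described in the previous section, under some regularity conditions

$$\sup_{u \in [0,\tau]} |\hat b_h(u \mid Z_0) - b_h(u \mid Z_0)| \xrightarrow{P} 0 \quad \text{and}$$

$$n^{1/2}\left( \hat b_h(\cdot \mid Z_0) - b_h(\cdot \mid Z_0) \right) \xrightarrow{D} b_{h0}(\cdot \mid Z_0), \quad \text{with } b_{h0}(u \mid Z_0) \overset{D}{=} \zeta_2' Z_{02}(u), \qquad (3.23)$$

where $n^{1/2}(\hat\beta - \beta) \xrightarrow{D} \zeta_2$, $\zeta_2 \sim MVN(0, \Sigma(\alpha)^{-1})$ and $\Sigma(\alpha) = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} X_i' V_i^{-1}(\alpha) X_i$. We use the same notations from the previous section. This will not create confusion because we derive asymptotic properties separately for the mean transition cost and the mean sojourn rate. The process $b_{h0}(\cdot \mid Z_0)$ is Gaussian, with zero mean and

$$\mathrm{cov}\left( b_{h0}(u \mid Z_0), b_{h0}(w \mid Z_0) \right) = Z_{02}(u)' \Sigma(\alpha)^{-1} Z_{02}(w).$$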

3.3 Large Sample Properties of the Mean Cost Estimators

3.3.1 Uniform Consistency of the Mean Cost Estimators

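By (3.1), conditional on the initial state $i$, given the vector $Z_0$ of basic covariates, the mean present value of all expenditures associated with the $h$ to $j$ transitions in $(0,t]$ is

$$MPV_{hj}^{(1)}(t \mid i, Z_0) = \int_0^t \gamma_{hji}(s)\, dA_{hj}(s \mid Z_0),$$

where $\gamma_{hji}(s) = e^{-rs} c_{hj}(s \mid Z_0) P_{ih}(0,s \mid Z_0)$ and $A_{hj}(s \mid Z_0) = \int_0^s \alpha_{hj0}(u) \exp(\beta_0' Z_{hj0})\,du$ is the integrated intensity function of an $h$ to $j$ transition. We estimate this quantity by

$$\widehat{MPV}_{hj}^{(1)}(t \mid i, Z_0) = \int_0^t \hat\gamma_{hji}(s)\, d\hat A_{hj}(s \mid Z_0),$$

where $\hat\gamma_{hji}(s) = e^{-rs} \hat c_{hj}(s \mid Z_0) \hat P_{ih}(0,s \mid Z_0)$ and $\hat A_{hj}(s \mid Z_0) = \hat A_{hj0}(s, \hat\beta) \exp(\hat\beta' Z_{hj0})$. See Sections 3.2.1, 3.2.2 and 3.2.3 for the definitions of the estimators $\hat A_{hj}(\cdot \mid Z_0)$, $\hat P_{ih}(0,\cdot \mid Z_0)$ and $\hat c_{hj}(\cdot \mid Z_0)$.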
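When the estimated integrated intensity is a step function that jumps only at observed transition times (an assumption made here for illustration), the Stieltjes integral defining the estimator reduces to a finite discounted sum. The sketch below shows that reduction; the function name, its arguments and the toy inputs are hypothetical.

```python
import math

def mean_present_value(jump_times, hazard_increments, cost_fn, occupancy_fn, rate, t):
    """Sum of exp(-rate*s) * cost(s) * P(0, s) * dA(s) over jump times s in (0, t].

    jump_times        -- times at which the estimated integrated intensity jumps
    hazard_increments -- size of each jump of the estimated integrated intensity
    cost_fn           -- estimated mean transition cost as a function of time
    occupancy_fn      -- estimated probability of being in state h at time s
    rate              -- discount rate r
    """
    total = 0.0
    for s, increment in zip(jump_times, hazard_increments):
        if 0.0 < s <= t:
            total += math.exp(-rate * s) * cost_fn(s) * occupancy_fn(s) * increment
    return total

# Hypothetical inputs: unit cost, certain occupancy and no discounting,
# so the result is just the integrated intensity accumulated up to t.
times = [1.0, 2.0, 3.0]
increments = [0.1, 0.2, 0.3]
undiscounted = mean_present_value(times, increments, lambda s: 1.0, lambda s: 1.0, 0.0, 2.0)
discounted = mean_present_value(times, increments, lambda s: 1.0, lambda s: 1.0, 0.05, 2.0)
```

A positive discount rate can only shrink each term, which gives a quick sanity check on any implementation.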

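We will prove the uniform consistency of the mean transition cost estimator, that is

$$\sup_{t \in [0,\tau]} \left| \widehat{MPV}_{hj}^{(1)}(t \mid i, Z_0) - MPV_{hj}^{(1)}(t \mid i, Z_0) \right| \xrightarrow{P} 0. \qquad (3.24)$$

We first prove

$$\sup_{t \in [0,\tau]} |\hat\gamma_{hji}(t) - \gamma_{hji}(t)| \xrightarrow{P} 0. \qquad (3.25)$$

By the definition of $\hat\gamma_{hji}(\cdot)$ and $\gamma_{hji}(\cdot)$,

$$\sup_{t \in [0,\tau]} |\hat\gamma_{hji}(t) - \gamma_{hji}(t)| \le \sup_{t \in [0,\tau]} \left| \hat c_{hj}(t \mid Z_0) \hat P_{ih}(0,t \mid Z_0) - c_{hj}(t \mid Z_0) P_{ih}(0,t \mid Z_0) \right|$$
$$\le \sup_{t \in [0,\tau]} |c_{hj}(t \mid Z_0)| \sup_{t \in [0,\tau]} \left| \hat P_{ih}(0,t \mid Z_0) - P_{ih}(0,t \mid Z_0) \right| + \sup_{t \in [0,\tau]} |\hat c_{hj}(t \mid Z_0) - c_{hj}(t \mid Z_0)|.$$

By (3.18), $\sup_{t \in [0,\tau]} |\hat c_{hj}(t \mid Z_0) - c_{hj}(t \mid Z_0)| \xrightarrow{P} 0$ and by the assumption A.0.1, $c_{hj}(\cdot \mid Z_0)$ is bounded. We have shown in Section 3.2.2 that $\sup_{t \in [0,\tau]} |\hat P_{ih}(0,t \mid Z_0) - P_{ih}(0,t \mid Z_0)| \xrightarrow{P} 0$. Consequently (3.25) follows.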

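For $t \in [0,\tau]$:

$$\left| \widehat{MPV}_{hj}^{(1)}(t \mid i, Z_0) - MPV_{hj}^{(1)}(t \mid i, Z_0) \right| \le \int_0^t |\hat\gamma_{hji}(s) - \gamma_{hji}(s)|\, d\hat A_{hj}(s \mid Z_0) + \left| \int_0^t \gamma_{hji}(s) \left( d\hat A_{hj}(s \mid Z_0) - dA_{hj}(s \mid Z_0) \right) \right|,$$

so

$$\sup_{t \in [0,\tau]} \left| \widehat{MPV}_{hj}^{(1)}(t \mid i, Z_0) - MPV_{hj}^{(1)}(t \mid i, Z_0) \right| \le \sup_{s \in [0,\tau]} |\hat\gamma_{hji}(s) - \gamma_{hji}(s)|\, \hat A_{hj}(\tau \mid Z_0) + \sup_{t \in [0,\tau]} \left| \int_0^t \gamma_{hji}(s) \left( d\hat A_{hj}(s \mid Z_0) - dA_{hj}(s \mid Z_0) \right) \right|.$$

By (3.25) and the fact $\hat A_{hj}(\tau \mid Z_0) \xrightarrow{P} A_{hj}(\tau \mid Z_0) < \infty$, the first term of the right hand side of the above inequality converges to zero in probability.

By our model assumptions $\gamma_{hji}(\cdot)$ is bounded on $[0,\tau]$. Then, as in the proof of (3.3), we can show that $\int_0^t \gamma_{hji}(s) \left( d\hat A_{hj}(s \mid Z_0) - dA_{hj}(s \mid Z_0) \right)$ is asymptotically equivalent to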

J (s)
B.,-.-<r> = cxptﬁizn.) granawijts) +

+ (3 — ﬂoi’ f, yin-mam. — s,,-(s. ﬂo))ah,o(8)d8-
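Using Lenglart's Inequality for square integrable martingales, consistency of $\hat\beta$ to $\beta_0$, $\int_0^\tau \alpha_{hj0}(s)\,ds < \infty$ and our model boundedness assumptions, one can prove that $\sup_{t \in [0,\tau]} |B_{hji}(t)| \xrightarrow{P} 0$. Thus

$$\sup_{t \in [0,\tau]} \left| \int_0^t \gamma_{hji}(s) \left( d\hat A_{hj}(s \mid Z_0) - dA_{hj}(s \mid Z_0) \right) \right| \xrightarrow{P} 0$$

and (3.24) follows.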

Next we will show the uniform consistency of the mean sojourn cost estimator. By (3.2), conditional on the initial state $i$, given the vector $Z_0$ of basic covariates, the mean present value of expenditures after duration time $d$ for the sojourns in state $h$ with entry time $s$ is

$$MPV_h^{(2)}(s,d \mid i, Z_0) = \int_s^{s+d} e^{-ru} b_h(u \mid Z_0) P_{ih}(0,u \mid Z_0)\,du.$$

We estimate this quantity by

$$\widehat{MPV}_h^{(2)}(s,d \mid i, Z_0) = \int_s^{s+d} e^{-ru} \hat b_h(u \mid Z_0) \hat P_{ih}(0,u \mid Z_0)\,du,$$

where the estimators $\hat P_{ih}(0,\cdot \mid Z_0)$ and $\hat b_h(\cdot \mid Z_0)$ are defined in Sections 3.2.2 and 3.2.4, respectively.

Using an argument similar to the one used to show (3.25), it can be shown that for every duration $d$ such that $0 < s + d \le \tau$:

$$\sup_{a \in [0,d]} \left| \widehat{MPV}_h^{(2)}(s,a \mid i, Z_0) - MPV_h^{(2)}(s,a \mid i, Z_0) \right| \le \text{constant} \times \sup_{u \in [s,s+d]} \left| \hat b_h(u \mid Z_0) \hat P_{ih}(0,u \mid Z_0) - b_h(u \mid Z_0) P_{ih}(0,u \mid Z_0) \right| = o_P(1).$$

Therefore the uniform consistency of the mean sojourn cost estimator holds:

$$\sup_{a \in [0,d]} \left| \widehat{MPV}_h^{(2)}(s,a \mid i, Z_0) - MPV_h^{(2)}(s,a \mid i, Z_0) \right| \xrightarrow{P} 0$$

for all $d$ such that $0 < s + d \le \tau$.
3.3.2 Asymptotic Distribution of the Mean Transition Cost

In this section we will assess the asymptotic normality of $n^{1/2}\left( \widehat{MPV}_{hj}^{(1)}(t \mid i, Z_0) - MPV_{hj}^{(1)}(t \mid i, Z_0) \right)$ using the Functional Delta Method. Appendix B provides a statement of this method, the concept of Hadamard differentiability and other related results that we will use in this section.

Fix the time $t$. Consider the functional

$$\varphi_t : E \to R, \quad \varphi_t(x,y,z) = \int_0^t x(s) y(s)\,dz(s),$$

where $E$ is a subset of $D[0,\tau]^3$ such that $\varphi_t$ is well defined. Notice we can write $MPV_{hj}^{(1)}(t \mid i, Z_0)$ as

$$\varphi_t(x_0, y_0, z_0) = \varphi_t\left( e^{-r\cdot} c_{hj}(\cdot \mid Z_0), P_{ih}(0,\cdot \mid Z_0), A_{hj}(\cdot \mid Z_0) \right).$$

If $\varphi_t$ has an extension to $D[0,\tau]^3$ that is Hadamard differentiable in $(x_0, y_0, z_0)$ then, under some extra-conditions, we can apply the Functional Delta Method to obtain our desired result.

First we recall our convergence results from the previous sections.

From (3.3), (3.5) and (3.6), we obtained in Section 3.2.2 that for (h, j)e E‘,
n”2(r§,,,(. | 20) — Ah,(. I 20)) converged weakly in D[0,z'] (in the Skorohod sense) to the
process U;,(. IZO) = Ufh,(.|Z0)+U;,,,-(.|Zo), where Ufh,(. IZO) and U;,,(. |Zo) were
independent. We showed that

o D , e
Uihj(tlzo)=§ WhjUIZo). (3.26)

where 6 ~ MVN (0,2;1) , the matrix 22, being deﬁned in assumption A.7, Section 3.2.1,

and Wh}(t | Z0) = exp(3,’,Z,,,o) £(Zhjo - eh,(u, 30))a,,,0(u)du . The other process is deﬁned

as

U§,,,(t | Zo):exp(3,',Z,,j-0)U5,,,(t) , (3.27)
where the matrix-valued process U5(.) is described in Theorem 3, Section 3.2.1.

By (3.7), n”2(13,,,(0,. lZo) — 13,,(0,. 120)) converges weakly to the process
U,,,(0,.|Zo) = U,,,,(0,. |Zo)+U2,,,(0,. |Zo), where U,,,,(0,. IZO) and U2,,,(0,.|Zo) are

independent and for m 6 {1,2}

It s .
U,,,,,(0,s | 20) = Z Z LP},(0,u IZO){P,,,(u,s | 20)- P,,,(u,s|zo)}du,,,,(u | 20).
g=ll¢g

Using (3.26) and (3.27), we write
D I
U,,,,(0,s|Zo)=§ F,,,(0,s|Zo), (3.28)
(12,},(0, S I 20) = exp(ﬂézhjo)x
(3.29)

k
x22 Eggmm[20){15,,(u,s|zo)-Pg,,(u,s|zo)}dugg,(u)’
g=ll¢g

139

k
where F,,,(0,s|Zo) = 22 £34044 |z,){P,,(u,s|z,)—P,,(u,s|zo)}dw,‘,(u |zo).

g=ll$g
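By (3.18), $n^{1/2}\left( \hat c_{hj}(\cdot \mid Z_0) - c_{hj}(\cdot \mid Z_0) \right)$ converges weakly to the Gaussian process $c_{hj0}(\cdot \mid Z_0)$ described in Section 3.2.3:

$$c_{hj0}(s \mid Z_0) \overset{D}{=} \zeta_1' Z_{01}(s), \qquad (3.30)$$

where $\zeta_1 \sim MVN(0, \Sigma(\alpha)^{-1})$, $\Sigma(\alpha) = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} X_i' V_i^{-1}(\alpha) X_i$.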

Next we prove the following lemma. We denote by $\int_0^\tau |dz|$ the total variation of the function $z$.

Lemma

Let $E = \{ (x,y,z) \in D[0,\tau]^3 : \int_0^\tau |dz| \le C \}$, where $0 < C < \infty$. For a fixed time $t \in (0,\tau]$ we define $\varphi_t : E \to R$ by $\varphi_t(x,y,z) = \int_0^t x y\,dz$.

Let $(x_0, y_0, z_0)$ be a fixed point of $E$ such that $\int_0^\tau |d(x_0 y_0)| < \infty$. Then $\varphi_t$ can be extended to the space $D[0,\tau]^3$ so as to be Hadamard differentiable at $(x_0, y_0, z_0)$, with derivative

$$d\varphi_t(x_0, y_0, z_0) \cdot (h,k,l) = \int_0^t h y_0\,dz_0 + \int_0^t x_0 k\,dz_0 + \int_0^t x_0 y_0\,dl, \qquad (3.31)$$

where the integral with respect to $l$ is defined by the integration by parts formula if $l$ is not of finite variation. □

140

Notes:

1) We interpret integration from $0$ to $\tau$ as being over the interval $(0,\tau]$.

2) The integration by parts formula gives

$$\int_0^t x_0 y_0\,dl = x_0(t) y_0(t) l(t) - x_0(0) y_0(0) l(0) - \int_{(0,t]} l_-\,d(x_0 y_0).$$

3) The extension assessed in this Lemma is not necessarily unique and the differentiability is shown only for the fixed point $(x_0, y_0, z_0)$. □

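Proof Lemma:

Obviously the hypothesized derivative $d\varphi_t(x_0, y_0, z_0)$ is a linear map. We will show that $d\varphi_t(x_0, y_0, z_0)$ is also continuous. Let $(h,k,l) \in D[0,\tau]^3$ be a fixed arbitrary point. Consider sequences $t_n \in R_+$, $h_n, k_n, l_n \in D[0,\tau]$ that satisfy $t_n \to 0$, $\|h_n - h\| \to 0$, $\|k_n - k\| \to 0$, $\|l_n - l\| \to 0$, where $\|\cdot\|$ is the supremum norm, and define

$$x_n = x_0 + t_n h_n, \qquad y_n = y_0 + t_n k_n, \qquad z_n = z_0 + t_n l_n,$$

and suppose $(x_n, y_n, z_n) \in E$ for each $n$. We need to prove that

$$d\varphi_t(x_0, y_0, z_0) \cdot (h_n, k_n, l_n) \to d\varphi_t(x_0, y_0, z_0) \cdot (h,k,l) \quad \text{as } n \to \infty. \qquad (3.32)$$

The sequence $d\varphi_t(x_0, y_0, z_0) \cdot (h_n, k_n, l_n)$ has the form

$$d\varphi_t(x_0, y_0, z_0) \cdot (h_n, k_n, l_n) = \int_0^t h_n y_0\,dz_0 + \int_0^t x_0 k_n\,dz_0 + \int_0^t x_0 y_0\,dl_n,$$

where the integral with respect to $l_n$ is defined by the integration by parts formula.

$$\left| \int_0^t h_n y_0\,dz_0 - \int_0^t h y_0\,dz_0 \right| = \left| \int_0^t (h_n - h) y_0\,dz_0 \right| \le \|h_n - h\| \times \|y_0\| \times \int_0^\tau |dz_0|.$$

By the hypothesis of the Lemma we have $\int_0^\tau |dz_0| \le C$ and we also have $\|y_0\| < \infty$ because $y_0 \in D[0,\tau]$. Then the convergence $\|h_n - h\| \to 0$ implies that

$$\left| \int_0^t h_n y_0\,dz_0 - \int_0^t h y_0\,dz_0 \right| \to 0 \quad \text{as } n \to \infty.$$

Similarly

$$\left| \int_0^t x_0 k_n\,dz_0 - \int_0^t x_0 k\,dz_0 \right| \to 0 \quad \text{as } n \to \infty.$$

By the integration by parts formula

$$\left| \int_0^t x_0 y_0\,dl_n - \int_0^t x_0 y_0\,dl \right| \le |x_0(t)| \times |y_0(t)| \times \|l_n - l\| + |x_0(0)| \times |y_0(0)| \times \|l_n - l\| + \|l_{n-} - l_-\| \times \int_0^t |d(x_0 y_0)|.$$

But $\|l_n - l\| = \|l_{n-} - l_-\|$, so

$$\left| \int_0^t x_0 y_0\,dl_n - \int_0^t x_0 y_0\,dl \right| \le \|l_n - l\| \left( 2 \|x_0\| \times \|y_0\| + \int_0^t |d(x_0 y_0)| \right).$$

Because $\int_0^t |d(x_0 y_0)| < \infty$, $\|l_n - l\| \to 0$ and $\|x_0\|, \|y_0\| < \infty$, the right hand side of the previous inequality converges to zero as $n \to \infty$, so

$$\left| \int_0^t x_0 y_0\,dl_n - \int_0^t x_0 y_0\,dl \right| \to 0 \quad \text{as } n \to \infty.$$

We proved (3.32), which implies the continuity of the mapping $d\varphi_t(x_0, y_0, z_0)$.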
Next we apply the Lemma stated in Appendix B. In its context $B_1$ and $B_2$ are normed vector spaces, endowed with $\sigma$-algebras $\mathcal{B}_1$, $\mathcal{B}_2$, respectively, where $\mathcal{B}_i^{o} \subseteq \mathcal{B}_i \subseteq \mathcal{B}_i^{*}$, $i = 1,2$. The $\sigma$-algebras $\mathcal{B}_i^{o}$, $\mathcal{B}_i^{*}$ are generated by the open balls and the open sets of $B_i$, respectively. In our case $B_2 = R$, $\mathcal{B}_2^{o} = \mathcal{B}_2^{*}$ and $B_1 = D[0,\tau]^3$ is endowed with the open ball $\sigma$-algebra.

According to the Appendix B Lemma, if

$$t_n^{-1} \left\{ \varphi_t(x_n, y_n, z_n) - \varphi_t(x_0, y_0, z_0) \right\} - d\varphi_t(x_0, y_0, z_0) \cdot (h,k,l) \to 0 \quad \text{as } n \to \infty \qquad (3.33)$$

then $\varphi_t$ can be extended to $D[0,\tau]^3$ in such a way that it is differentiable at $(x_0, y_0, z_0)$, with derivative $d\varphi_t(x_0, y_0, z_0) \cdot (h,k,l)$ as defined in (3.31), so our Lemma holds.

Therefore we will prove (3.33).
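By the continuity of $d\varphi_t(x_0, y_0, z_0)$, for (3.33) it is sufficient to prove that

$$S_n \overset{\text{by notation}}{=} t_n^{-1} \left\{ \varphi_t(x_n, y_n, z_n) - \varphi_t(x_0, y_0, z_0) \right\} - d\varphi_t(x_0, y_0, z_0) \cdot (h_n, k_n, l_n) \to 0$$

as $n \to \infty$. This sequence is equal to

$$t_n^{-1} \int_0^t x_n y_n\,dz_n - t_n^{-1} \int_0^t x_0 y_0\,dz_0 - \int_0^t h_n y_0\,dz_0 - \int_0^t x_0 k_n\,dz_0 - \int_0^t x_0 y_0\,dl_n.$$

We expand

$$t_n^{-1} \int_0^t x_n y_n\,dz_n = \int_0^t (t_n^{-1} x_0 + h_n)(y_0 + t_n k_n)\,d(z_0 + t_n l_n)$$
$$= t_n^{-1} \int_0^t x_0 y_0\,dz_0 + \int_0^t x_0 y_0\,dl_n + \int_0^t x_0 k_n\,dz_0 + t_n \int_0^t x_0 k_n\,dl_n + \int_0^t h_n y_0\,dz_0 + t_n \int_0^t h_n y_0\,dl_n + t_n \int_0^t h_n k_n\,dz_0 + t_n^2 \int_0^t h_n k_n\,dl_n.$$

Thus

$$S_n = t_n \int_0^t x_0 k_n\,dl_n + t_n \int_0^t h_n y_0\,dl_n + t_n \int_0^t h_n k_n\,dz_0 + t_n^2 \int_0^t h_n k_n\,dl_n$$
$$= \int_0^t x_0 k_n\,d(z_n - z_0) + \int_0^t h_n y_0\,d(z_n - z_0) + \int_0^t (x_n - x_0) k_n\,dz_0 + \int_0^t h_n (y_n - y_0)\,d(z_n - z_0)$$
$$= T_{n1} + T_{n2} + T_{n3} + T_{n4}.$$

We will prove that $T_{ni} \to 0$ as $n \to \infty$ for all $i \in \{1,2,3,4\}$.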

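We start with $T_{n1} = \int_0^t x_0 k_n\,d(z_n - z_0)$.

$$\left| \int_0^t x_0 (k_n - k)\,d(z_n - z_0) \right| \le \|x_0\| \times \|k_n - k\| \times \left( \int_0^\tau |dz_n| + \int_0^\tau |dz_0| \right) \le 2C \|x_0\| \times \|k_n - k\| \to 0,$$

because $\|k_n - k\| \to 0$ and $\|x_0\| < \infty$. The fact $\int_0^\tau |dz_n| \le C$ follows from our assumption that $(x_n, y_n, z_n) \in E$. Thus it is sufficient to show

$$\int_0^t x_0 k\,d(z_n - z_0) \to 0 \quad \text{as } n \to \infty. \qquad (3.34)$$

Let $f_0 = x_0 k$. Then $f_0$ is an element of $D[0,\tau]$, so for every $\varepsilon > 0$ there exists $f_0^\varepsilon \in D[0,\tau]$ such that $f_0^\varepsilon$ is a step function with a finite number (say $N$) of jumps and $\|f_0 - f_0^\varepsilon\| \le \varepsilon$. We have

$$\left| \int_0^t (f_0 - f_0^\varepsilon)\,d(z_n - z_0) \right| \le \|f_0 - f_0^\varepsilon\| \left( \int_0^\tau |dz_n| + \int_0^\tau |dz_0| \right) \le 2C\varepsilon. \qquad (3.35)$$

By partial integration

$$\left| \int_0^t f_0^\varepsilon\,d(z_n - z_0) \right| \le 2 \|f_0^\varepsilon\| \times \|z_n - z_0\| + \|z_n - z_0\| \times \int_0^t |df_0^\varepsilon|.$$

The mapping $f_0^\varepsilon$ is a step function with $N$ jump points, say $s_1, \ldots, s_N$, so

$$\int_0^t |df_0^\varepsilon| \le \sum_{i=1}^{N} |f_0^\varepsilon(s_i) - f_0^\varepsilon(s_i-)| \le 2N \|f_0^\varepsilon\|.$$

Thus

$$\left| \int_0^t f_0^\varepsilon\,d(z_n - z_0) \right| \le 2(N+1) \|f_0^\varepsilon\| \times \|z_n - z_0\| \qquad (3.36)$$

and this quantity converges to zero because $\|f_0^\varepsilon\| < \infty$ and $\|z_n - z_0\| = t_n \|l_n\| \le t_n (\|l_n - l\| + \|l\|) \to 0$ as $n \to \infty$.

Notice that similarly $\|x_n - x_0\| \to 0$ and $\|y_n - y_0\| \to 0$ as $n \to \infty$.

By (3.35) and (3.36) we obtain $\limsup_{n \to \infty} \left| \int_0^t f_0\,d(z_n - z_0) \right| \le 2C\varepsilon$. Since $\varepsilon$ was arbitrarily chosen, (3.34) is proved, so the convergence $T_{n1} \to 0$ follows.

The proof of $T_{n2} \to 0$ is identical.

For $T_{n3} = \int_0^t (x_n - x_0) k_n\,dz_0$ we have that

$$|T_{n3}| \le \|x_n - x_0\| \times \|k_n\| \times \int_0^\tau |dz_0|.$$

Because $\|x_n - x_0\| \to 0$, $\|k_n\| \le \|k_n - k\| + \|k\| < \infty$ and $\int_0^\tau |dz_0| \le C$, the convergence $T_{n3} \to 0$ is an immediate consequence.

For $T_{n4} = \int_0^t h_n (y_n - y_0)\,d(z_n - z_0)$,

$$|T_{n4}| \le \|h_n\| \times \|y_n - y_0\| \times \left( \int_0^\tau |dz_n| + \int_0^\tau |dz_0| \right) \le 2C \|h_n\| \times \|y_n - y_0\| \to 0 \quad \text{as } n \to \infty.$$

This completes the proof of Lemma. ■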
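The convergence established in the proof can be checked numerically on a discretized version of the functional. The sketch below approximates the Stieltjes integral by a sum over a fine grid and evaluates the remainder of the first-order expansion for shrinking step sizes; the grid, the test functions and all function names are hypothetical choices made only for illustration, and the check is not a substitute for the proof.

```python
import math

def stieltjes(x, y, z, grid):
    """Riemann-Stieltjes sum of x*y dz over (grid[0], grid[-1]], using right endpoints."""
    return sum(x(b) * y(b) * (z(b) - z(a)) for a, b in zip(grid, grid[1:]))

def derivative(x0, y0, z0, h, k, l, grid):
    """Candidate derivative: int h*y0 dz0 + int x0*k dz0 + int x0*y0 dl."""
    return (stieltjes(h, y0, z0, grid) + stieltjes(x0, k, z0, grid)
            + stieltjes(x0, y0, l, grid))

def remainder(t, x0, y0, z0, h, k, l, grid):
    """Difference quotient of the functional minus the candidate derivative."""
    xn = lambda s: x0(s) + t * h(s)
    yn = lambda s: y0(s) + t * k(s)
    zn = lambda s: z0(s) + t * l(s)
    diff = stieltjes(xn, yn, zn, grid) - stieltjes(x0, y0, z0, grid)
    return diff / t - derivative(x0, y0, z0, h, k, l, grid)

# Hypothetical smooth test functions on [0, 1].
grid = [i / 200 for i in range(201)]
x0 = lambda s: math.exp(-0.05 * s)   # plays the role of a discounted cost curve
y0 = lambda s: math.exp(-0.3 * s)    # plays the role of a state occupation probability
z0 = lambda s: 0.2 * s               # plays the role of an integrated intensity
h = lambda s: s
k = lambda s: 1.0 - s
l = lambda s: math.sin(s)

remainders = [remainder(t, x0, y0, z0, h, k, l, grid) for t in (0.1, 0.01, 0.001)]
```

For these fixed directions the remainder is a polynomial in the step size with no constant term, so it shrinks roughly tenfold each time the step size does; this mirrors the four terms $T_{n1}, \ldots, T_{n4}$, each of which carries at least one factor that vanishes with $t_n$.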


Recall that in Section 3.2.3 we considered $C_i' = (C_{i1}, \ldots, C_{in_i})$ to denote the vector of all $h$ to $j$ transition costs related to the $i$-th subject. The cost vectors $C_1, \ldots, C_n$ are assumed to be independent.

For technical reasons the following extra-assumptions are considered:

EA.1 $c_{hj}(\cdot \mid Z_0)$ is of finite variation over $[0,\tau]$. We write $\int_0^\tau |dc_{hj}(s \mid Z_0)| < \infty$.

EA.2 The cost vectors $C_1, \ldots, C_n$ are independent of $(N_i, Y_i, Z_i)$, $1 \le i \le n$.

Comments:

1) In the proof of the next theorem we will need

$$\int_0^\tau \left| d\left( e^{-rs} c_{hj}(s \mid Z_0) P_{ih}(0,s \mid Z_0) \right) \right| < \infty. \qquad (3.37)$$

The function $s \to e^{-rs}$ is monotone, so of bounded variation on $[0,\tau]$. The matrix-valued process $P(0,\cdot \mid Z_0)$ is (componentwise) right continuous with left hand limits and of bounded variation (see Theorem II.6.1, p90, Andersen et al. (1993)). By assumption EA.1, $c_{hj}(\cdot \mid Z_0)$ is of finite variation over $[0,\tau]$. A product of finite variation functions is also a function of finite variation, so (3.37) follows.

2) The estimator $\hat c_{hj}(\cdot \mid Z_0)$ of $c_{hj}(\cdot \mid Z_0)$ was obtained from the cost vectors $C_1, \ldots, C_n$. The estimators $\hat A_{hj}(\cdot \mid Z_0)$, $\hat P_{ih}(0,\cdot \mid Z_0)$ were calculated from $(N_i, Y_i, Z_i)$, $1 \le i \le n$. By assumption EA.2, we can consider that $\hat c_{hj}(\cdot \mid Z_0)$ is independent of $\left( \hat A_{hj}(\cdot \mid Z_0), \hat P_{ih}(0,\cdot \mid Z_0) \right)$. This implies that

$n^{1/2}\left( \hat c_{hj}(\cdot \mid Z_0) - c_{hj}(\cdot \mid Z_0) \right)$ is asymptotically independent of

$$n^{1/2} \begin{pmatrix} \hat A_{hj}(\cdot \mid Z_0) - A_{hj}(\cdot \mid Z_0) \\ \hat P_{ih}(0,\cdot \mid Z_0) - P_{ih}(0,\cdot \mid Z_0) \end{pmatrix}. \qquad (3.38)$$
Theorem 4

Under the assumptions A.0-A.7 and the extra-assumptions EA.1 and EA.2, for a

ﬁxed time t:
111/2 (Mﬁv,f,"(t | i, 20) - MPV,,‘,-"(t |i.Zo)) =
= "Wm, (e-r-c,,(. | 20113.10. 12.1411. 120)) -
_ a, (e-r-c,,(. 1201340.. 12014.3. I2.))1—Da
f, d,,, (ﬂew, , 20,, p,,(o,. lZo). A,,(, I 20)).(e-'°c,,o(. | Z0).U,-,.(0,. Izo).U,‘,(. Izo)) =
= fie-"motsIzoiPato.sIzo>dAn(slzo)+16"’chr<S|ZoX/w<°tsIZOWW'ZW

‘ by
+ £e""c,,,(s|Zo)P,,,(0,s|ZO)dU,,,(s|Zo) =, P(t).l:1
”010110"

Proof Theorem 4:

The mapping $\varphi_t : E \to R$ is defined by $\varphi_t(x,y,z) = \int_0^t x(s) y(s)\,dz(s)$, where $E = \{ (x,y,z) \in D[0,\tau]^3 : \int_0^\tau |dz| \le C \}$ and $C = A_{hj}(\tau \mid Z_0) + 1 < \infty$ by assumption A.3.

Let $x_0(s) = e^{-rs} c_{hj}(s \mid Z_0)$, $y_0(s) = P_{ih}(0,s \mid Z_0)$, $z_0(s) = A_{hj}(s \mid Z_0)$. All $x_0, y_0, z_0 \in D[0,\tau]$ and

$$\int_0^\tau |dA_{hj}(s \mid Z_0)| = \exp(\beta_0' Z_{hj0}) \int_0^\tau |dA_{hj0}(s)| = \exp(\beta_0' Z_{hj0}) A_{hj0}(\tau) = A_{hj}(\tau \mid Z_0) < C.$$

Therefore $(x_0, y_0, z_0) \in E$. By (3.37), we have that $\int_0^\tau |d(x_0 y_0)| < \infty$.

The previous Lemma implies that $\varphi_t$ can be extended to $D[0,\tau]^3$ so as to be Hadamard differentiable at $(x_0, y_0, z_0)$, with derivative

$$d\varphi_t(x_0, y_0, z_0) \cdot (h,k,l) = \int_0^t h y_0\,dz_0 + \int_0^t x_0 k\,dz_0 + \int_0^t x_0 y_0\,dl,$$

where the integral with respect to $l$ is defined by the integration by parts formula if $l$ is not of finite variation. Denote by $\varphi_t^E$ the extension of $\varphi_t$ to $D[0,\tau]^3$.
Deﬁne 55,,(s) = e"‘é,,,(s | 20). 51,.(s)= 13,7, (0,3 I Z0), 2,,(s) = 13,".(3 I Z0),se [0,1].

We have (in, 5”.»2n)6 D[O,1]3 for every n and
P(EdIan<C)=P(L’dIﬁh,o(s,3)|<c)=p(,§hj(,-|ZO)<C)_), as "a,”

. P
because A,,,(1IZ0)—>A,,,(1|Zo) < C. Thus (in, y,,2,)e E with probability tending to

one.

We have

D

n”2 (2,,(.) - x0(.))-> X00 = e-r'chjoc 120):
D

""2 ( y,,(.) - y0(.))—)Yo(.) = (1,,, (0.. I Z0).

D t
n"2 (2,,(.) — zo(.))-)Zo(.) = U,,,-(. I 20).


The processes $X_0, Y_0, Z_0$ are Gaussian and hence have versions that are almost surely continuous. Let $C[0,\tau]$ be the set of continuous functions on $[0,\tau]$. The subset $C[0,\tau]^3 \subset D[0,\tau]^3$ is separable, so $(X_0, Y_0, Z_0)$ has separable support.

Because ﬁ(0,. | 20) = [1(1 + dA(. | 20)) and P(0,. | 20) = [[(I + dA(. | 20)).
(0.,] (0,.]

the matrices 13(0,.IZ0),P(0,.IZO) are functionals of A(.|Zo),A(.|Zo), respectively. It

can be easily shown that jointly:

,2 13,,(0,.|zo)—P,,,(o,.|zo) SIU‘“(O"|Z°)I
holler-41,112.) 01,112.) '

Consequently, by (3.38),

"Hz ’ D ’
[(x. () y..() z .0) -(xo(.).yo(.).zo(-)) I—>(Xo(.).Yo(.).Zo(.)) .

By the Functional Delta Method stated in Appendix B,

D
"“2 (¢t(xnvynrzn )" (01(th YOrZO))_)d¢rE(x0r yOrZ0)°(X0’Y0rZO) = P(t) defined in the

theorem statement. This completes the proof of Theorem 4. I

By (3.26)-(3.30), we can write the limiting process P(t) as

P(t) = g,’ £e‘"Z,;,(s)P,,,(0,s | Zo)a,,,o(s)exp( 13;,sz
+§I£e-rschj(s I ZO)Fih (O, S I Zo)ahjo(3) exp(ﬂtozhjo)d3

+22 £e ”C,,,(SIZO)I:EP,-g (0, u IZo)( P,,,(u, sIZo)- P,,,(u,sIZo))dU,;g,(u)]

g= -ll¢g

ago“) CXP(2ﬂ'thjo)d3


1’: ’ ﬂinch)“ I Zo)Piti (0, S I Zo)exp(ﬁ’ozhjo) (Zhjo " em“. ﬁ0))ahj0(s)d3

+ £e’"c,,,(s | Z0)P,,,(0,s | Zo)exp( 1352,,0)dug,,(s).
By Theorem 3, {U5h,,(h, j)e E‘} are independent, continuous Gaussian

martingales. The integrals with respect to 115,, are Ito integrals and the theory from

Appendix C can be applied. Therefore, by the Fubini-type Theorem from Appendix C

applied to the bounded functions
H,,(s,u lZo) = 1,0.,,(u)e'”c,,,-(sIZO)R-g(0,uIZO)(P,,,(u,sIZO)— P,,,(u,s |zo))
and the ﬁnite measure deﬁned by ,u(0,s] = I: d,,,o(u)du , the third term of the previous

sum can be written as

It
exp(zﬂézhj0)2 Z Epig (0!“ I 20) X
g=ll¢g
XI [IPM (u,s I 20) —- Pgh (u,s I 20)) (”C,,-(s I Zo)a,,,0(s)ds:IdU38,(u).
Therefore the process P(t) has the form:
P(t) = Pi(t) + P20) + 1’30) + 110).

where P,(t) = {{T, (t),

P20) = 5720).

k
B(t)=Z Z [f.,,<u)dua,,(u).

g=l latg
(l.g)¢(h.j)

em = £f4hj(“)dut;hj(“)

and we denote by T,(.),T2(.), f3,,(.), fw(') the following expressions:


T, (t) = exp(ﬁgz,,,) I: e‘"zg,(s)1:,(o,s | 2,)a,,,(s)ds

T,(t) = exp(3gz,,o)x
x Ie’“c.,-(S I 2,)[F,,(o,s | z,)+ B,(o,s l 2,)(z,,, — e,,(s, 3,,))]a,,,o(s)ds

f3,,,(u) = exp(2352,,,0)R, (0,u | Zo)I [(3, (u,s I Z,,) - P,,, (u,s I 20)) e"’c,,,.(s I Zo)a,,,,,(s)ds:I

f,,,.(u) = exp(zﬁgz,,)P,,(o,u I2.) I: (P,,(u,s I 2,) — P,,,(u,s Iz,)) e"‘c,,,(s I 2,)a,,,(s)ds +

+ exp(3,;Z,,,o )e'”c,,,.(tt I ZO)P,,, (0,u I la).

The functions T,(.), T2(.), f3g,(.), f,,,,(.) are deterministic. Propertyiv) stated in

Appendix C and the fact that IU;,,,,(h, j)e E‘} are independent, Gaussian martingales

immymm

[30) + P4 (t) is normally distributed, with mean zero and variance

i Z £f311(u)d(ug,,,)(u)+ £f,2,,,(u)d(ug,,,)(u). (3.39)
g=1 latg
(I.s)¢(h.j)

By (3.30), 4', is multivariate normally distributed, with mean vector 0 and

covariance 2(a)". Thus

P,(t) is normally distributed, with mean zero and variance T,(t)'2(a)"T,(t). (3.40)
Similarly, by (3.26),
P2 (t) is normally distributed, with mean zero and variance T2 (t)'2;'T2 (t). (3.41)

By Theorem 3, 5 and {U5h,(.),(h, j)e E} are independent, so


P2(t) is independent of P30) + P,(t). (3.42)

In Section 3.2.3 the regression parameter estimator 3 was computed from the

cost vectors C,,..., C,,. Then, by assumption EA.2, we can consider {, , the asymptotic
limit of n”2(3 - 3), independent of if and {Ham-(.),(h, j)e E‘I. Consequently,
P,(t) is independent of P2(t) and P3(t)+P4(t). (3.43)
By (3.39)-(3.43), P(t) is normally distributed, with mean zero and variance

Var(P(t)) = T, (t)'2(a)"1,(z) + T,(t)'2;‘T2(r) +

+ i Z £f3231(“)d (U531>(“)+ £f42h,(u)d (U5m>(u)-

g=1 latg
(l.g)¢(h.j)

and“)

,0, u. A consistent estimator of the
Shj (“r 160)

By Theorem 3, for h it: j: (Ugh,>(t) = £

variance of P(t) is obtained replacing all unknown quantities by their corresponding

consistent estimators.

3.3.3 Asymptotic Distribution of the Mean Sojourn Cost

The asymptotic normality of $n^{1/2}\left( \widehat{MPV}_h^{(2)}(s,d \mid i, Z_0) - MPV_h^{(2)}(s,d \mid i, Z_0) \right)$ will also be assessed by the Functional Delta Method. The entry time $s$ and duration time $d$ are considered fixed throughout this sub-section.

Consider the functional $\psi_{s,d} : D[0,\tau]^2 \to R$, $\psi_{s,d}(x,y) = \int_s^{s+d} x(u) y(u)\,du$. The mean present value $MPV_h^{(2)}(s,d \mid i, Z_0)$ can be written as

$$MPV_h^{(2)}(s,d \mid i, Z_0) = \psi_{s,d}(x_0, y_0) = \psi_{s,d}\left( e^{-r\cdot} b_h(\cdot \mid Z_0), P_{ih}(0,\cdot \mid Z_0) \right).$$

We will show that $\psi_{s,d}$ is Hadamard differentiable in $(x_0, y_0)$.

Two convergence results from the previous sections will be needed. One is that $n^{1/2}\left( \hat P_{ih}(0,\cdot \mid Z_0) - P_{ih}(0,\cdot \mid Z_0) \right)$ converges weakly to the process

$$U_{ih}(0,\cdot \mid Z_0) = U_{1ih}(0,\cdot \mid Z_0) + U_{2ih}(0,\cdot \mid Z_0),$$

where $U_{1ih}(0,\cdot \mid Z_0)$, $U_{2ih}(0,\cdot \mid Z_0)$ are described in (3.28) and (3.29). The second is that, by (3.23), $n^{1/2}\left( \hat b_h(\cdot \mid Z_0) - b_h(\cdot \mid Z_0) \right)$ converges weakly to the Gaussian process $b_{h0}(\cdot \mid Z_0)$ described in Section 3.2.4:

$$b_{h0}(\cdot \mid Z_0) \overset{D}{=} \zeta_2' Z_{02}(\cdot), \qquad (3.44)$$

where $Z_{02}$ is a deterministic vector function, $\zeta_2 \sim MVN(0, \Sigma(\alpha)^{-1})$ and $\Sigma(\alpha) = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} X_i' V_i^{-1}(\alpha) X_i$.

First we prove the following lemma:

Lemma

For a fixed $s \in [0,\tau)$ and $d > 0$ such that $s + d \le \tau$ we define $\psi_{s,d} : D[0,\tau]^2 \to R$ by $\psi_{s,d}(x,y) = \int_s^{s+d} x(u) y(u)\,du$. Then $\psi_{s,d}$ is Hadamard differentiable at every point $(x_0, y_0) \in D[0,\tau]^2$, with derivative

$$d\psi_{s,d}(x_0, y_0) \cdot (h,k) = \int_s^{s+d} h(u) y_0(u)\,du + \int_s^{s+d} x_0(u) k(u)\,du. \quad □$$

Proof Lemma:

Let $(x_0, y_0)$ be an arbitrary element of the space $D[0,\tau]^2$. It is straightforward that $d\psi_{s,d}(x_0, y_0)$ is a continuous, linear mapping.

Consider an arbitrary $(h,k) \in D[0,\tau]^2$ and the sequences $t_n \in R_+$, $h_n, k_n \in D[0,\tau]$ that satisfy $t_n \to 0$, $\|h_n - h\| \to 0$, $\|k_n - k\| \to 0$, where $\|\cdot\|$ is the supremum norm. Define

$$x_n = x_0 + t_n h_n, \qquad y_n = y_0 + t_n k_n.$$

For each $n$, $(x_n, y_n) \in D[0,\tau]^2$.

We want to show that $\psi_{s,d}$ is Hadamard differentiable at $(x_0, y_0)$, so that

$$t_n^{-1} \left\{ \psi_{s,d}(x_n, y_n) - \psi_{s,d}(x_0, y_0) \right\} - d\psi_{s,d}(x_0, y_0) \cdot (h,k) \to 0 \quad \text{as } n \to \infty. \qquad (3.45)$$

By the continuity of $d\psi_{s,d}(x_0, y_0)$, it is sufficient to prove that

$$S_n \overset{\text{by notation}}{=} t_n^{-1} \left\{ \psi_{s,d}(x_n, y_n) - \psi_{s,d}(x_0, y_0) \right\} - d\psi_{s,d}(x_0, y_0) \cdot (h_n, k_n) \to 0$$

as $n \to \infty$. This sequence is equal to

$$t_n^{-1} \int_s^{s+d} x_n(u) y_n(u)\,du - t_n^{-1} \int_s^{s+d} x_0(u) y_0(u)\,du - \int_s^{s+d} h_n(u) y_0(u)\,du - \int_s^{s+d} x_0(u) k_n(u)\,du.$$

We expand

$$t_n^{-1} \int_s^{s+d} x_n(u) y_n(u)\,du = \int_s^{s+d} (t_n^{-1} x_0 + h_n)(u)\,(y_0 + t_n k_n)(u)\,du$$
$$= t_n^{-1} \int_s^{s+d} x_0(u) y_0(u)\,du + \int_s^{s+d} h_n(u) y_0(u)\,du + \int_s^{s+d} x_0(u) k_n(u)\,du + t_n \int_s^{s+d} h_n(u) k_n(u)\,du.$$

As a result

$$S_n = t_n \int_s^{s+d} h_n(u) k_n(u)\,du = \int_s^{s+d} (x_n - x_0)(u)\, k_n(u)\,du.$$

We have that

$$|S_n| \le d\, \|x_n - x_0\| \times \|k_n\| \to 0 \quad \text{as } n \to \infty$$

because $\|x_n - x_0\| \to 0$, $\|k_n - k\| \to 0$ and $\|k_n\| \le \|k_n - k\| + \|k\| < \infty$. Therefore (3.45) follows. This completes the proof of the stated Lemma. ■
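The identity for the remainder, namely that it equals the step size times the integral of the product of the two directions, can be verified numerically. The sketch below uses a midpoint rule and hypothetical test functions; the function names are illustrative only.

```python
import math

def integral(f, a, b, steps=1000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return width * sum(f(a + (i + 0.5) * width) for i in range(steps))

def psi(x, y, s, d):
    """The functional: integral of x(u)*y(u) over [s, s+d]."""
    return integral(lambda u: x(u) * y(u), s, s + d)

def d_psi(x0, y0, h, k, s, d):
    """Derivative at (x0, y0) in direction (h, k)."""
    return integral(lambda u: h(u) * y0(u) + x0(u) * k(u), s, s + d)

def remainder(t, x0, y0, h, k, s, d):
    """Difference quotient minus derivative; should equal t * integral of h*k."""
    xn = lambda u: x0(u) + t * h(u)
    yn = lambda u: y0(u) + t * k(u)
    return (psi(xn, yn, s, d) - psi(x0, y0, s, d)) / t - d_psi(x0, y0, h, k, s, d)

# Hypothetical test functions, entry time and duration.
x0 = lambda u: math.exp(-0.05 * u)
y0 = lambda u: math.exp(-0.3 * u)
h = lambda u: u
k = lambda u: 1.0
s, d = 0.5, 1.0

hk_integral = integral(lambda u: h(u) * k(u), s, s + d)
remainders = {t: remainder(t, x0, y0, h, k, s, d) for t in (0.1, 0.01, 0.001)}
```

Because the functional is bilinear, the remainder is exactly linear in the step size, which is why this case needs no bounded variation condition, unlike the Stieltjes functional of the previous section.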

In Section 3.2.4 we considered $C_i' = (C_{i1}, \ldots, C_{in_i})$ to denote the vector of all total observed costs incurred in sojourns in state $h$ by the $i$-th individual. The cost vectors $C_1, \ldots, C_n$ are assumed independent. The same notations are used in both Sections 3.2.3 and 3.2.4, but the context makes it easy to distinguish between them.

A similar extra-assumption as EA.2 is considered:

EA.3 The cost vectors $C_1, \ldots, C_n$ are independent of $(N_i, Y_i, Z_i)$, $1 \le i \le n$.

The estimator $\hat b_h(\cdot \mid Z_0)$ of $b_h(\cdot \mid Z_0)$ was obtained from the cost vectors $C_1, \ldots, C_n$ and $\hat P_{ih}(0,\cdot \mid Z_0)$ is calculated from $(N_i, Y_i, Z_i)$, $1 \le i \le n$. By assumption EA.3, we can consider that $\hat b_h(\cdot \mid Z_0)$ is independent of $\hat P_{ih}(0,\cdot \mid Z_0)$. This implies that

$n^{1/2}\left( \hat b_h(\cdot \mid Z_0) - b_h(\cdot \mid Z_0) \right)$ is asymptotically independent of

$$n^{1/2}\left( \hat P_{ih}(0,\cdot \mid Z_0) - P_{ih}(0,\cdot \mid Z_0) \right). \qquad (3.46)$$

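Theorem 5
Under the assumptions A.0-A.7 and the extra-assumption EA.3, for a fixed entry time $s$ and fixed duration $d$:

$$n^{1/2}\big(\widehat{MPV}_h^{(2)}(s,d \mid i, Z_0) - MPV_h^{(2)}(s,d \mid i, Z_0)\big)$$

$$= n^{1/2}\big[\psi_{s,d}\big(e^{-r\cdot}\hat b_h(\cdot \mid Z_0), \hat P_{ih}(0,\cdot \mid Z_0)\big) - \psi_{s,d}\big(e^{-r\cdot}b_h(\cdot \mid Z_0), P_{ih}(0,\cdot \mid Z_0)\big)\big]$$

$$\xrightarrow{D} d\psi_{s,d}\big(e^{-r\cdot}b_h(\cdot \mid Z_0), P_{ih}(0,\cdot \mid Z_0)\big).\big(e^{-r\cdot}b_{h0}(\cdot \mid Z_0), U_{ih}(0,\cdot \mid Z_0)\big)$$

$$= \int_s^{s+d} e^{-ru}\,b_{h0}(u \mid Z_0)\,P_{ih}(0,u \mid Z_0)\,du + \int_s^{s+d} e^{-ru}\,b_h(u \mid Z_0)\,U_{ih}(0,u \mid Z_0)\,du =: R(s,d). \quad \Box$$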

Proof Theorem 5:
We defined $\psi_{s,d} : D[0,1]^2 \to \mathbb{R}$ by $\psi_{s,d}(x,y) = \int_s^{s+d} x(u)y(u)\,du$. Let $x_0(u) = e^{-ru}b_h(u \mid Z_0)$ and $y_0(u) = P_{ih}(0,u \mid Z_0)$, where both $x_0, y_0$ are elements of $D[0,1]$.

By the previous Lemma, $\psi_{s,d}$ is Hadamard differentiable at $(x_0, y_0)$, with derivative

$$d\psi_{s,d}(x_0, y_0).(h,k) = \int_s^{s+d} h(u)\,y_0(u)\,du + \int_s^{s+d} x_0(u)\,k(u)\,du.$$

For every $n$ define $(\hat x_n, \hat y_n) \in D[0,1]^2$ by $\hat x_n(u) = e^{-ru}\hat b_h(u \mid Z_0)$, $\hat y_n(u) = \hat P_{ih}(0,u \mid Z_0)$, $u \in [0,1]$. By (3.46),

$$n^{1/2}\left[\begin{pmatrix}\hat x_n(\cdot)\\ \hat y_n(\cdot)\end{pmatrix} - \begin{pmatrix}x_0(\cdot)\\ y_0(\cdot)\end{pmatrix}\right] \xrightarrow{D} \begin{pmatrix}X_0(\cdot)\\ Y_0(\cdot)\end{pmatrix},$$

where $X_0(\cdot) = e^{-r\cdot}b_{h0}(\cdot \mid Z_0)$, $Y_0(\cdot) = U_{ih}(0,\cdot \mid Z_0)$.

The processes $X_0, Y_0$ are Gaussian, so they have versions that are almost surely continuous. Let $C[0,1]$ be the set of continuous functions on $[0,1]$. The subset $C[0,1]^2 \subset D[0,1]^2$ is separable, so $(X_0, Y_0)$ has separable support.

By the Functional Delta Method (see Appendix B),

$$n^{1/2}\big(\psi_{s,d}(\hat x_n, \hat y_n) - \psi_{s,d}(x_0, y_0)\big) \xrightarrow{D} d\psi_{s,d}(x_0, y_0).(X_0, Y_0) = R(s,d),$$

so Theorem 5 is proved. ■

By (3.28), (3.29) and (3.44), the limiting process $R(s,d)$ can be written as
$$R(s,d) = R_1(s,d) + R_2(s,d) + R_3(s,d),$$

where R,(s,d) = (2’s,(s,d),
R2(s,d) = 6'52(s,d) ,

R3(S, d) =

= EELS“! e'mb,(u IZO)I:EPi g(0 V Izo)( PM": u Izo) Pgh(v.uIZo))dU,;g,(v):Idu

g= -ll$g

and we denote by S,(s,d) , 52(s,d) the expressions:

s+d -ru .
S,(s,d)= I e 202(u)P,,,(0,u|Z,,)du,

s d
s,(s,d) = I + 6%,, (u |Z0)F,,,(0,u |Z,,)du. (See p139-l40 for the deﬁnition of
F,,,(0,u I201)
By the Fubini-type Theorem from Appendix C, applied to the bounded functions
H,,(u,v) = I,,.,+,,,(u)e'"‘b,,(u Izo)1,,,,,(v)13.,(o,vIzo)(P,,,(v,u IZo)- P,,,(v,u Izo)),

u,v 6 [0,1] and the Lebesgue measure on [0,1] , R3(s,d) can be written as

k
R3(s,d) = exp(3,’,Z,,,-o)z Z £13.,(0,v IZo) x

g=ll¢g

XIEIls,s+d](u)1[v.1](u)(1)1110)!“I20)“ P8,, (V,“ '20)) e-mbh(u IZo)du]dU;gl(V),


k 3+ It
so R3(s,d)= 22 L dS3g,(v)dUog,(v),

g=ll¢g
where

S38! (V) = exp(ﬁézhjo)ﬂg (O,V '20) X

3+

d -m
X s Ilv.s+d](U)(Pul(V,uIZo)—Pgh(v,u'20)) e bh(UIZO)du.

Using the same approach from Section 3.3.3, we obtain that

R(s,d) is normally distributed, with mean zero and variance

Var(R(s,d)) = Sl(s,d) '>:(a)“s,(s,d) + 32(s,d)'2;'S2(s,d) +

+ i Z Em S328,(v)d<Ugg,>(v),

gal latg
. a u . . .
where for g at l : U (I) = —LIQS——)—du . A consrstent estimator of the variance of
°" s‘?’(u 130)
g 9

R(s,d) is obtained replacing all unknown quantities by their corresponding consistent

estimators.

Comments

1) The technique proposed in this chapter separates the temporal dynamics of
movement between states from the actual expenses. Transition probabilities and
intensities that capture the former are estimated by Markov models, while the level of
expense is modeled through mixed models.

2) Suppose there are only two states: the initial state '0' and the state we label as '1'. Denote by $T$ the random time of transition from the initial state to the state '1'. For a given profile $Z_0$, the mean present value of all expenditures in $(0,t]$ incurred in state '0' is

$$MPV_0^{(2)}(t \mid Z_0) = \int_0^t e^{-ru}\,b_0(u \mid Z_0)\,S(u \mid Z_0)\,du,$$

where $S(u \mid Z_0) = P_{00}(0,u \mid Z_0) = P(T \ge u \mid Z_0)$ is the survival function and $b_0(u \mid Z_0) = E(B_0(u) \mid T > u, Z_0)$. Under the assumption that $T$ is independent of the rate process $\{B_0(u), u > 0\}$, $b_0(u \mid Z_0) = E(B_0(u) \mid Z_0)$ is the expected rate of the accumulating cost. Therefore, if '0' and '1' are labels for the states of a patient being 'in-hospital' and 'discharged', the model described in Chapter 3 for sojourn costs with no discounting reduces to the model proposed in Chapter 2.
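To make the two-state formula concrete, here is a minimal Python sketch that approximates $\int_0^t e^{-ru}\,b_0(u \mid Z_0)\,S(u \mid Z_0)\,du$ by a midpoint rule. The constant cost rate, the exponential survival function and all numerical values are hypothetical and chosen only because they admit a closed form for comparison; they are not estimates from the data analyzed in this dissertation.

```python
import math


def mpv(rate, survival, r, t, m=20000):
    """Midpoint-rule approximation of the discounted expected cost on (0, t]."""
    step = t / m
    total = 0.0
    for j in range(m):
        u = (j + 0.5) * step
        total += math.exp(-r * u) * rate(u) * survival(u)
    return total * step


def closed_form(c, lam, r, t):
    """Exact value when rate(u) = c and survival(u) = exp(-lam * u)."""
    return c * (1.0 - math.exp(-(r + lam) * t)) / (r + lam)


# Hypothetical inputs: cost rate 1000 per day, discharge hazard 0.2 per day,
# discount rate 0.03, horizon of 10 days.
approx = mpv(lambda u: 1000.0, lambda u: math.exp(-0.2 * u), r=0.03, t=10.0)
exact = closed_form(1000.0, 0.2, 0.03, 10.0)
```

Setting `r = 0` in this sketch gives the undiscounted case mentioned in the comment above.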


 

APPENDIX A

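EXTENSION OF SLLN ON $D_E([0,1]^2)$

Let $I_2 = [0,1]^2$ and let $(E, \|\cdot\|)$ be a separable Banach space. Following Neuhaus (1971)70 we introduce the space $D_E(I_2)$.

Let $|\cdot|$ be the maximum norm in $\mathbb{R}^2$. For $A \subset \mathbb{R}^2$, $\bar A$ denotes the closure and $A^\circ$ the interior of $A$ in the $|\cdot|$-topology. Let $P = \{\rho = (\rho_1,\rho_2);\ \rho_1,\rho_2 \in \{0,1\}\}$ be the set consisting of the four vertices of $I_2$.

Consider $t = (t_1,t_2) \in I_2$, $\rho = (\rho_1,\rho_2) \in P$. We define the quadrants $Q(\rho,t)$ and $\tilde Q(\rho,t)$ in $I_2$ with vertex $t$ by:

$$Q(\rho,t) = I(\rho_1,t_1) \times I(\rho_2,t_2),$$

where $I(0,t_k) = [0,t_k)$, $I(1,t_k) = (t_k,1]$, $k \in \{1,2\}$, and

$$\tilde Q(\rho,t) = \tilde I(\rho_1,t_1) \times \tilde I(\rho_2,t_2),$$

where

$$\tilde I(0,t_k) = \begin{cases}[0,t_k) & \text{if } t_k < 1\\ [0,1] & \text{if } t_k = 1\end{cases}, \qquad \tilde I(1,t_k) = \begin{cases}[t_k,1] & \text{if } t_k < 1\\ \emptyset & \text{if } t_k = 1\end{cases}, \qquad k \in \{1,2\},$$

where $\emptyset$ is the null set. Figures A.1-A.4 provide a visualization of the defined quadrants.

The following properties are immediate consequences of the above definitions:

$Q(\rho,t) \subset \tilde Q(\rho,t) \subset \bar Q(\rho,t)$;

$Q(\rho,t) = \emptyset$ if and only if $\tilde Q(\rho,t) = \emptyset$;

$\tilde Q(\rho,t) \cap \tilde Q(\rho',t) = \emptyset$ if $\rho \ne \rho'$;

$\bigcup_{\rho \in P} \tilde Q(\rho,t) = I_2$ for every $t \in I_2$.

Also, for every $t \in I_2$ there exists one and only one $\rho = \rho(t) \in P$ (denoted $\sigma$) with $t \in \tilde Q(\sigma,t)$. The quadrants $Q(\sigma,t)$ and $\tilde Q(\sigma,t)$ are called continuity quadrants in $t$.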

For these quadrants é(a,t) at (I) and Q-(O'J) = Q(a,t).
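The quadrant definitions can be checked mechanically. The Python sketch below encodes membership in the closed-type quadrants $\tilde Q(\rho,t)$ as read from the definitions above and from Figures A.1-A.4, and finds the unique continuity vertex $\sigma = \rho(t)$. The function names are ours and purely illustrative.

```python
from itertools import product

# The four vertices of the unit square.
VERTICES = list(product((0, 1), repeat=2))


def in_tilde_interval(rho_k, t_k, s):
    """True if s lies in the interval I~(rho_k, t_k) of [0, 1]."""
    if rho_k == 0:
        return 0 <= s < t_k if t_k < 1 else 0 <= s <= 1
    return t_k <= s <= 1 if t_k < 1 else False


def in_tilde_quadrant(rho, t, s):
    """True if the point s lies in the quadrant Q~(rho, t)."""
    return all(in_tilde_interval(rho[k], t[k], s[k]) for k in range(2))


def continuity_vertex(t):
    """The unique vertex sigma with t in Q~(sigma, t)."""
    hits = [rho for rho in VERTICES if in_tilde_quadrant(rho, t, t)]
    assert len(hits) == 1
    return hits[0]
```

For an interior point such as `t = (0.3, 0.6)` the continuity vertex is `(1, 1)`, while for `t = (1, 1)` it is `(0, 0)`, matching the four cases shown in the figures.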

Definition of the "quadrant limit"

Consider a function $f : I_2 \to E$. If, for a point $t \in I_2$ and a vertex $\rho \in P$ with $Q(\rho,t) \ne \emptyset$, the sequence $\{f(t_n)\}$ converges for every sequence $\{t_n\} \subset Q(\rho,t)$ with $t_n \to t$, then the limit (necessarily unique, since two such sequences can be interleaved) is denoted $f(t + 0\rho)$ and is called a $\rho$-limit of $f$ in $t$ or a "quadrant limit". □

Definition of the space $D_E(I_2)$
The space $D_E(I_2)$ is the set of all functions $f : I_2 \to E$ for which the $\rho$-limit of $f$ in $t$ exists for every $\rho \in P$, $t \in I_2$ with $Q(\rho,t) \ne \emptyset$ and which are "continuous from above", in the sense that $f(t) = f(t + 0\sigma)$ for every $t$. □

Definition of a partition generated by points of $I_2$

Let $t_1,\ldots,t_r \in I_2$. The collection of all rectangles $R$ of the form

$$R = [u_1, u_1'\rangle \times [u_2, u_2'\rangle,$$

where $u_j, u_j' \in K_j = \{t_{1j},\ldots,t_{rj}\} \cup \{0,1\}$, $u_j < u_j'$, $(u_j, u_j') \cap K_j = \emptyset$, $j \in \{1,2\}$, is called the partition generated by $t_1,\ldots,t_r$ and is denoted $P = P(t_1,\ldots,t_r)$. The symbol "$\rangle$" means ")" or "]" if the right endpoint of the interval is less than 1 or equal to one, respectively. □
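A short Python sketch of this construction may help: it builds the rectangles generated by a list of points and tests membership using the convention that a right endpoint is closed only when it equals 1. The representation of a rectangle as a pair of `(left, right)` tuples is our own choice for illustration.

```python
def partition(points):
    """Rectangles generated by points t_1, ..., t_r of the unit square.

    Each rectangle is returned as ((u1, u1p), (u2, u2p)) and stands for
    [u1, u1p> x [u2, u2p>, where a right end is closed only if it equals 1.
    """
    cells = []
    for j in range(2):
        # K_j: the j-th coordinates of the points, together with 0 and 1.
        knots = sorted({p[j] for p in points} | {0.0, 1.0})
        cells.append(list(zip(knots[:-1], knots[1:])))
    return [(a, b) for a in cells[0] for b in cells[1]]


def contains(rect, s):
    """True if the point s belongs to the rectangle rect."""
    return all(
        lo <= s[j] and (s[j] < hi or (hi == 1.0 and s[j] == 1.0))
        for j, (lo, hi) in enumerate(rect)
    )


rects = partition([(0.25, 0.5), (0.75, 0.5)])
```

With the two illustrative points above, the first coordinate is cut at 0.25 and 0.75 and the second at 0.5, giving six rectangles that cover $I_2$ without overlap.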

Neuhaus (1971)70 generalized the Skorohod metrics $d, d_0$ on the space $D_R[0,1]$ to the metrics $d, d_0$ on $D_R(I_2)$ (actually on $D_R(I_k)$, $I_k = [0,1] \times \ldots \times [0,1]$). The space $D_R(I_2)$ is separable and complete with respect to the metric $d_0$. Then, just like $D_R(I_2)$, $D_E(I_2)$ is also separable and complete with respect to $d_0$, replacing the absolute value on $\mathbb{R}$ with the norm on the space $E$. The metrics $d$ and $d_0$ are equivalent.

The characterizations of compact sets of $D_R[0,1]$ given by Theorems 14.3 and 14.4 in Billingsley (1968) and generalized by Neuhaus (1971)70 to $D_R(I_2)$ do not carry over directly to $D_E(I_2)$, since in $E$ a closed, bounded set is not necessarily compact. However, the given conditions are still necessary for compactness, even if they are no longer sufficient.31

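Necessary condition for compactness

If $K \subset D_E(I_2)$ is compact then

$$\lim_{\delta \to 0}\ \sup_{x \in K} w_x''(\delta) = 0,$$

where

$$w_x''(\delta) = \sup_{\substack{t \in [t_1,t_2]\\ |t_2 - t_1| < \delta}} \min\big(\|x(t) - x(t_1)\|, \|x(t) - x(t_2)\|\big),$$

with $[t_1,t_2]$ for $t_1,t_2 \in I_2$ denoting the Cartesian product $[t_{11},t_{21}] \times [t_{12},t_{22}]$. □

On $\mathbb{R}^2$ we say that $t \le u$ if and only if $t_1 \le u_1$ and $t_2 \le u_2$ (and similarly with "$\le$" replaced by the strict inequality sign "$<$"). This relation is only a partial order: not every pair of points of $\mathbb{R}^2$ is comparable.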

Definition of a $D_E(I_2)$-valued random variable

By a $D_E(I_2)$-valued random variable we understand a function $X = X(t,\omega)$ such that

1) for each fixed $t \in I_2$, $X(t,\omega)$ is a random variable;

2) $X(\cdot,\omega) \in D_E(I_2)$ for almost all $\omega$. □

For $x \in D_E(I_2)$ we define $\|x\|_d = \sup_{t \in I_2}\|x(t)\|$. Then $(D_E(I_2), \|\cdot\|_d)$ is a normed space.

The following lemma is a generalized version of Lemma 2, Rao RR. (1963).43

Lemma

Let $X$ be a $D_E(I_2)$-valued random variable such that $E\|X\|_d < \infty$. Then, for each $\varepsilon > 0$, there exists a partition of $I_2$ generated by some points $\tau_1,\ldots,\tau_r$ such that

$$\sup_{t,t' \in R} E\|X(t) - X(t')\| \le \varepsilon$$

for every rectangle $R$ of the partition $P(\tau_1,\ldots,\tau_r)$. □

Proof Lemma:

For $a < b$ we define $p(a,b) = \sup_{t,t' \in [0,b)\setminus[0,a)} E\|X(t) - X(t')\|$. Recall that for $a \in I_2$, $[0,a) = [0,a_1) \times [0,a_2)$.

Consider the set $Diag = \{t \in I_2 : t_1 = t_2\}$ endowed with the order relationships "$\le$", "$<$", where

$t \le u$ ($t < u$) if and only if $t_1 \le u_1$ ($t_1 < u_1$).

Therefore we can consider the infimum or supremum of subsets of $Diag$.

Consider an arbitrary $\varepsilon > 0$.

If $p((0,0),(1,1)) \le \varepsilon$ then define the point $\tau_1 = (1,1)$. Otherwise, as $t$ "travels" on $Diag$ from $(0,0)$ to $(1,1)$, the function $t \to p((0,0),t)$, $t \in Diag$, is increasing (with respect to the order relation "$\le$" on $Diag$) and also continuous, by the Lemma hypothesis. As a result we can define $\tau_1 = \inf\{t \in Diag : p((0,0),t) > \varepsilon\}$.

Generally, define $\tau_j = (1,1)$ if $p(\tau_{j-1},(1,1)) \le \varepsilon$ and otherwise let

$$\tau_j = \inf\{t \in Diag : \tau_{j-1} < t,\ p(\tau_{j-1},t) > \varepsilon\}.$$

Next we show that $\tau_j = (1,1)$ for some $j$. If this is not true, there would exist a sequence $\{t_n\} \subset Diag$, $\tau_n \le t_n < \tau_{n+1}$, such that for each $n$

$$E\|X(t_n) - X(\tau_{n+1})\| > \varepsilon/2. \quad (A.1)$$

The sequence $\{\tau_n\}$ is increasing (with respect to the order relation "$\le$" on $Diag$) and bounded, so there exists $\tau \in Diag$ such that $\tau_n \to \tau$. Then $X(t_n) - X(\tau_{n+1}) \to 0$ in $E$ as $n \to \infty$. We also have that $E\|X(t_n) - X(\tau_{n+1})\| \le 2E\|X\|_d < \infty$. By the Dominated Convergence Theorem, (A.1) is then not possible.

Consequently there exists $r$ such that $\tau_{r+1} = (1,1)$ and we consider the partition generated by $\tau_1,\ldots,\tau_r$. The stated Lemma is proved. ■
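The greedy choice of the points $\tau_j$ can be sketched in a simplified setting. The Python fragment below is a one-dimensional, deterministic analogue only: it replaces the expected oscillation $p(a,b)$ by the plain oscillation (max minus min) of a list of function values on a grid, and starts a new cell as soon as the oscillation of the current cell would exceed $\varepsilon$. The sample values are invented for illustration.

```python
def greedy_breaks(values, eps):
    """Indices at which a new cell starts, so that max - min <= eps per cell."""
    breaks = []
    lo = hi = values[0]
    for i, v in enumerate(values[1:], start=1):
        lo, hi = min(lo, v), max(hi, v)
        if hi - lo > eps:
            # The oscillation would exceed eps: close the cell before index i.
            breaks.append(i)
            lo = hi = v
    return breaks


# Hypothetical step-like values with two large jumps.
cells = greedy_breaks([0.0, 0.1, 0.2, 1.0, 1.05, 1.1, 2.0], eps=0.25)
```

Because the list is finite the procedure stops after finitely many breaks; in the Lemma, the corresponding finiteness is what the argument by contradiction establishes.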

Now we state and prove a Strong Law of Large Numbers (SLLN) on $D_E(I_2)$. We follow the ideas of the proof of the SLLN on $D_E[0,1]$ given by Andersen and Gill (1982).31

SLLN on $D_E(I_2)$
Let $X, X_1, X_2, \ldots$ be a sequence of i.i.d. $D_E(I_2)$-valued random variables such that $E\|X\|_d < \infty$. Then

$$\Big\|n^{-1}\sum_{i=1}^n X_i - EX\Big\|_d \to 0 \text{ a.e. as } n \to \infty. \quad \Box$$
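A small simulation can illustrate the statement in the simplest case $E = \mathbb{R}$. In the Python sketch below, which is our own example and not part of the proof, $X_i(t) = 1\{U_{i1} \le t_1, U_{i2} \le t_2\}$ with $U_{i1}, U_{i2}$ independent uniform variables, so that $EX(t) = t_1 t_2$; the supremum over $I_2$ is approximated by a maximum over a finite grid.

```python
import random


def sup_error(n, grid=20, seed=0):
    """Max over a grid of |n^{-1} sum_i X_i(t) - t1 * t2| for indicator paths."""
    rng = random.Random(seed)
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    worst = 0.0
    for a in range(grid + 1):
        for b in range(grid + 1):
            t1, t2 = a / grid, b / grid
            # Sample mean of X_i(t) = 1{U_i1 <= t1, U_i2 <= t2}.
            mean = sum(1 for u1, u2 in pts if u1 <= t1 and u2 <= t2) / n
            worst = max(worst, abs(mean - t1 * t2))
    return worst
```

Calling `sup_error(n)` for increasing `n` gives a rough picture of the uniform convergence; the grid maximum is only a lower bound for the true supremum norm.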

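Proof SLLN:

The space $D_E(I_2)$ is separable and complete with respect to the Skorohod $d_0$ metric. Then any random element of $D_E(I_2)$ is tight. This result is true by the following statement:

Proposition (Theorem 1.4, p10, Billingsley (1968))
If $(S, \mathcal{S})$ is a metric space with $\mathcal{S}$ the class of Borel sets in $S$ and $S$ is separable and complete then each probability measure on $(S, \mathcal{S})$ is tight. □

Therefore, because $E\|X\|_d < \infty$,

i) for every $\varepsilon > 0$ there exists a compact set $K \subset D_E(I_2)$ such that $E\|X\|_d\,[X \notin K] < \varepsilon$, where $[\,\cdot\,]$ denotes the indicator of the event in brackets.

Next we show that

ii) for every $\varepsilon > 0$ and every compact set $K \subset D_E(I_2)$ there exists $\delta > 0$ such that if $x \in K$ and $\alpha \le t < \beta \le \alpha + (\delta,\delta)$ then

$$\|x(t) - x(\alpha)\| \le \|x(\beta + 0\rho_0) - x(\alpha)\| + \varepsilon,$$

where the vertex $\rho_0 = (0,0)$.

Let $\varepsilon > 0$ and $K \subset D_E(I_2)$ a compact set. Using the previously stated necessary condition for compactness, there exists $\delta > 0$ such that $\sup_{x \in K} w_x''(\delta) \le \varepsilon$. By the definition of $w_x''(\delta)$, if $x \in K$ and $\alpha \le t < \beta \le \alpha + (\delta,\delta)$ then

$$\min\big(\|x(t) - x(\alpha)\|, \|x(\beta - \varepsilon') - x(t)\|\big) \le \varepsilon$$

for every $\varepsilon' = (\varepsilon_1', \varepsilon_2')$, $\varepsilon_1', \varepsilon_2' > 0$ such that $\alpha \le t \le \beta - \varepsilon' < \beta$. Because $x \in D_E(I_2)$, $\lim_{\varepsilon' \to (0,0)} x(\beta - \varepsilon') = x(\beta + 0\rho_0)$. Consequently

$$\min\big(\|x(t) - x(\alpha)\|, \|x(\beta + 0\rho_0) - x(t)\|\big) \le \varepsilon.$$

If $\|x(t) - x(\alpha)\| \le \varepsilon$ then ii) is obviously satisfied. If $\|x(\beta + 0\rho_0) - x(t)\| \le \varepsilon$ then

$$\|x(t) - x(\alpha)\| \le \|x(\beta + 0\rho_0) - x(\alpha)\| + \|x(\beta + 0\rho_0) - x(t)\| \le \|x(\beta + 0\rho_0) - x(\alpha)\| + \varepsilon,$$

so ii) is again verified.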
The last property we prove is:

iii) for every $\varepsilon > 0$ and every $\delta > 0$ there exists a partition $P$ of $I_2$ generated by some points $t_1,\ldots,t_{N-1}$ such that for each rectangle $R \in P$, $R = [\alpha,\beta)$, we have $|\beta - \alpha| < \delta$ and $E\|X(\beta + 0\rho_0) - X(\alpha)\| \le \varepsilon$.

By the stated Lemma, there exists a partition of $I_2$ generated by some points $\tau_1,\ldots,\tau_r \in Diag$ such that $\sup_{t,t' \in R} E\|X(t) - X(t')\| \le \varepsilon$ for every rectangle $R$ of the partition $P(\tau_1,\ldots,\tau_r)$. Taking on $Diag$ intermediate points between $\tau_j, \tau_{j+1}$, we define a finer partition $P = P(t_1,\ldots,t_{N-1})$ of $I_2$ such that for every $R \in P$, $R = [\alpha,\beta)$, we have $|\beta - \alpha| < \delta$ and $E\|X(t) - X(t')\| \le \varepsilon$ for all $t,t' \in R$. Then, taking $\rho_0$-limits of $X$ in $\beta$ and $\alpha$ in the previous relation (possible because $E\|X\|_d < \infty$), we obtain $E\|X(\beta + 0\rho_0) - X(\alpha)\| \le \varepsilon$, so iii) follows.

In the following we use the properties i)-iii) to prove the SLLN.

Consider an arbitrary $\varepsilon > 0$. We choose a compact set $K$ by i), a $\delta > 0$ by ii) and finally the partition $P = P(t_1,\ldots,t_{N-1})$ by iii).

First we show that

$$\sup_{t \in [0,1)\times[0,1)} \Big\|n^{-1}\sum_{i=1}^n X_i(t) - EX(t)\Big\| \to 0 \text{ a.e. as } n \to \infty. \quad (A.2)$$

Let $t$ be a point of $[0,1)\times[0,1)$. Then there exists a rectangle $R$ in the partition $P$ such that $t \in R$. Denote $R = [\alpha,\beta)$.

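We have that

$$\Big\|n^{-1}\sum_{i=1}^n X_i(t) - EX(t)\Big\| \le \varepsilon_n^K(t) + n^{-1}\sum_{i=1}^n \|X_i\|_d\,[X_i \notin K] + E\|X\|_d\,[X \notin K], \quad (A.3)$$

where

$$\varepsilon_n^K(t) = \Big\|n^{-1}\sum_{i=1}^n X_i(t)\,[X_i \in K] - EX(t)\,[X \in K]\Big\|.$$

The quantity

$$\varepsilon_n^K(t) \le \varepsilon_n^K(\alpha) + n^{-1}\sum_{i=1}^n \|X_i(t) - X_i(\alpha)\|\,[X_i \in K] + E\|X(t) - X(\alpha)\|\,[X \in K].$$

Then, by ii) and iii),

$$\varepsilon_n^K(t) \le \varepsilon_n^K(\alpha) + n^{-1}\sum_{i=1}^n \|X_i(\beta + 0\rho_0) - X_i(\alpha)\| + \varepsilon + E\|X(\beta + 0\rho_0) - X(\alpha)\| + \varepsilon$$

$$\le \varepsilon_n^K(\alpha) + n^{-1}\sum_{i=1}^n \|X_i(\beta + 0\rho_0) - X_i(\alpha)\| + 3\varepsilon.$$

We will apply a SLLN on separable Banach spaces, first proved by Mourier (1953)45:

SLLN on Banach spaces:

If $(\mathcal{X}, \|\cdot\|)$ is a separable Banach space and $\{V_n\}$ a sequence of i.i.d. random elements in $\mathcal{X}$ such that $E\|V_1\| < \infty$ then $\|n^{-1}\sum_{i=1}^n V_i - EV_1\| \to 0$ a.e. as $n \to \infty$. □

By this SLLN, $\varepsilon_n^K(\alpha) \to 0$ a.e. and by the regular SLLN (for real valued random variables)

$$n^{-1}\sum_{i=1}^n \|X_i(\beta + 0\rho_0) - X_i(\alpha)\| \to E\|X(\beta + 0\rho_0) - X(\alpha)\| \text{ a.e. as } n \to \infty.$$

Therefore, using iii) in the last step,

$$\limsup_{n \to \infty}\ \sup_{t \in [0,1)\times[0,1)} \varepsilon_n^K(t) \le 0 + E\|X(\beta + 0\rho_0) - X(\alpha)\| + 3\varepsilon \le 4\varepsilon. \quad (A.4)$$

We apply again the SLLN on Banach spaces and then, by (A.3),

$$\limsup_{n \to \infty}\ \sup_{t \in [0,1)\times[0,1)} \Big\|n^{-1}\sum_{i=1}^n X_i(t) - EX(t)\Big\| \le \limsup_{n \to \infty}\ \sup_{t \in [0,1)\times[0,1)} \varepsilon_n^K(t) + 2E\|X\|_d\,[X \notin K].$$

By (A.4) and property i),

$$\limsup_{n \to \infty}\ \sup_{t \in [0,1)\times[0,1)} \Big\|n^{-1}\sum_{i=1}^n X_i(t) - EX(t)\Big\| \le 6\varepsilon.$$

Letting $\varepsilon$ converge to zero, we obtain (A.2).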
The following is the extension of the SLLN to the space $D_E[0,1]$, proved by Andersen and Gill (1982)31:

SLLN on $D_E[0,1]$

Let $\{V_n\}$ be a sequence of i.i.d. random elements of $D_E[0,\tau]$. Suppose $E\|V_1\| = E\big(\sup_{c \in [0,\tau]}\|V_1(c)\|\big) < \infty$. Then $\|n^{-1}\sum_{i=1}^n V_i - EV_1\| \to 0$ a.e. as $n \to \infty$. □

By this extended SLLN:

$$\sup_{t \in \{(t_1,1),\ t_1 \in [0,1]\}} \Big\|n^{-1}\sum_{i=1}^n X_i(t) - EX(t)\Big\| = \sup_{t_1 \in [0,1]} \Big\|n^{-1}\sum_{i=1}^n X_i(t_1,1) - EX(t_1,1)\Big\| \to 0 \text{ a.e.} \quad (A.5)$$

because $X_i(\cdot,1)$ are $D_E[0,1]$-valued random variables and $E\|X_1(t_1,1)\| \le E\|X_1\|_d < \infty$. Similarly

$$\sup_{t \in \{(1,t_2),\ t_2 \in [0,1]\}} \Big\|n^{-1}\sum_{i=1}^n X_i(t) - EX(t)\Big\| \to 0 \text{ a.e.} \quad (A.6)$$

By (A.2), (A.5) and (A.6), the SLLN on $D_E(I_2)$ follows. ■

 

[Figure A.1: Definition of the quadrants $\tilde Q(\rho,t)$ when $t \in [0,1)\times[0,1)$. Panels: $\rho = (0,0)$, $\tilde Q(\rho,t) = [0,t_1)\times[0,t_2)$; $\rho = (0,1)$, $\tilde Q(\rho,t) = [0,t_1)\times[t_2,1]$; $\rho = (1,0)$, $\tilde Q(\rho,t) = [t_1,1]\times[0,t_2)$; $\rho = (1,1)$, $\tilde Q(\rho,t) = [t_1,1]\times[t_2,1]$.]

[Figure A.2: Definition of the quadrants $\tilde Q(\rho,t)$ when $t \in \{(t_1,1),\ t_1 \in [0,1)\}$. Panels: $\rho = (0,0)$, $\tilde Q(\rho,t) = [0,t_1)\times[0,1]$; $\rho = (1,0)$, $\tilde Q(\rho,t) = [t_1,1]\times[0,1]$; $\rho \in \{(0,1),(1,1)\}$, $\tilde Q(\rho,t) = \emptyset$.]

[Figure A.3: Definition of the quadrants $\tilde Q(\rho,t)$ when $t \in \{(1,t_2),\ t_2 \in [0,1)\}$. Panels: $\rho = (0,0)$, $\tilde Q(\rho,t) = [0,1]\times[0,t_2)$; $\rho = (0,1)$, $\tilde Q(\rho,t) = [0,1]\times[t_2,1]$; $\rho \in \{(1,0),(1,1)\}$, $\tilde Q(\rho,t) = \emptyset$.]

[Figure A.4: Definition of the quadrants $\tilde Q(\rho,t)$ when $t = (1,1)$. Panels: $\rho = (0,0)$, $\tilde Q(\rho,t) = [0,1]\times[0,1]$; $\rho \in \{(0,1),(1,0),(1,1)\}$, $\tilde Q(\rho,t) = \emptyset$.]

APPENDIX B

The Functional Delta Method

We will brieﬂy review some results from Gill (1989).72
A concept of differentiability that allows a generalization of the usual Delta
Method is the one of Hadamard or compact differentiability.

Let $B_1, B_2$ denote two normed vector spaces.

Definition

The functional $\varphi : B_1 \to B_2$ is compactly or Hadamard differentiable at a point $\theta \in B_1$ if and only if a continuous linear map $d\varphi(\theta) : B_1 \to B_2$ exists, such that for all real sequences $a_n \to \infty$ and all convergent sequences $h_n \to h \in B_1$,

$$a_n\big(\varphi(\theta + a_n^{-1}h_n) - \varphi(\theta)\big) \to d\varphi(\theta).h \text{ as } n \to \infty.$$

Here $d\varphi(\theta)$ is called the derivative of $\varphi$ at the point $\theta$.
(See Definitions 1-3, p100 and the characterizations of differentiability, p102, Gill (1989)72.) □

An important property of Hadamard differentiation is that it satisfies the chain rule: if $\varphi : B_1 \to B_2$ and $\psi : B_2 \to B_3$ are Hadamard differentiable at $x \in B_1$ and $\varphi(x) \in B_2$ respectively, then $\psi \circ \varphi : B_1 \to B_3$ is Hadamard differentiable at $x$, with derivative $d\psi(\varphi(x)).d\varphi(x)$.

Next we define the concept of weak convergence in normed vector spaces. Let $(B, \|\cdot\|)$ be a normed vector space endowed with a $\sigma$-algebra $\mathcal{B}$, such that $\mathcal{B}' \subset \mathcal{B} \subset \mathcal{B}''$, where $\mathcal{B}'$ and $\mathcal{B}''$ are the $\sigma$-algebras generated by the open balls and the open sets of $B$, respectively. Thus $\mathcal{B}''$ is the Borel $\sigma$-algebra; when $B$ is separable, $\mathcal{B}' = \mathcal{B}''$.

Definition (See Definition 4, Gill (1989)72)

Let $X_n$ be a sequence of random elements of $(B, \mathcal{B})$ and let $X$ be another random element of that space. We say $X_n$ converges weakly (or in distribution) to $X$ and we write $X_n \xrightarrow{D} X$ if and only if $Ef(X_n) \to Ef(X)$ for all bounded, norm-continuous, $\mathcal{B}$-measurable $f : B \to \mathbb{R}$. □

The full functional version of the Delta Method is given by Gill (1989)72, Theorem 3:

Theorem (Functional Delta Method)

Suppose $\varphi : B_1 \to B_2$ is compactly differentiable at a point $\mu \in B_1$ and both it and its derivative are measurable with respect to the $\sigma$-algebras $\mathcal{B}_1$ and $\mathcal{B}_2$ (each nested between the open ball and Borel $\sigma$-algebras). Suppose $X_n$ is a sequence of random elements of $B_1$ such that $Z_n = n^{1/2}(X_n - \mu) \xrightarrow{D} Z$ in $B_1$, where the distribution of $Z$ is concentrated on a separable subset of $B_1$. Suppose addition: $B_2 \times B_2 \to B_2$ is measurable (see Remark 2 below). Then

(1) $\big[n^{1/2}(X_n - \mu),\ n^{1/2}(\varphi(X_n) - \varphi(\mu)) - d\varphi(\mu).n^{1/2}(X_n - \mu)\big] \xrightarrow{D} (Z, 0)$ in $B_1 \times B_2$

and consequently (in particular)

(2) $n^{1/2}(\varphi(X_n) - \varphi(\mu)) - d\varphi(\mu).n^{1/2}(X_n - \mu) \xrightarrow{P} 0$,

(3) $n^{1/2}(\varphi(X_n) - \varphi(\mu)) \xrightarrow{D} d\varphi(\mu).Z$. □
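Statement (3) can be illustrated in a finite-dimensional special case. The Python sketch below is our own toy example, not taken from Gill (1989): it takes $B_1 = \mathbb{R}^2$, $\varphi(a,b) = ab$, and $X_n$ equal to the pair of sample means of independent Uniform(0,1) variables, so that $\mu = (1/2, 1/2)$ and $d\varphi(\mu).(h,k) = h/2 + k/2$. The limit $d\varphi(\mu).Z$ then has variance $2 \cdot (1/4) \cdot (1/12) = 1/24$, which is compared with a Monte Carlo estimate.

```python
import random


def delta_method_check(n=200, reps=2000, seed=1):
    """Monte Carlo variance of sqrt(n) * (xbar * ybar - 1/4) and its limit."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        xbar = sum(rng.random() for _ in range(n)) / n
        ybar = sum(rng.random() for _ in range(n)) / n
        draws.append(n ** 0.5 * (xbar * ybar - 0.25))
    mean = sum(draws) / reps
    simulated = sum((z - mean) ** 2 for z in draws) / (reps - 1)
    predicted = 2 * 0.25 / 12  # variance of (Z1 + Z2) / 2 with Var(Zj) = 1/12
    return simulated, predicted
```

The two returned numbers should be close for moderate `n`; the sample size, number of replications and seed are arbitrary.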

Remark 1:
Measurability of $d\varphi(\mu) : B_1 \to B_2$ can often be shown to follow from measurability of $\varphi$ (see Lemmas 4.4.3 and 4.4.4, van der Vaart (1988)73). □

Remark 2:

For $x = (x_1,x_2) \in B_1 \times B_2$ we define $\|x\| = \max(\|x_1\|, \|x_2\|)$ and we give the product spaces $B_1 \times B_2$ and $B_2 \times B_2$ their product $\sigma$-algebras. If $B_1$ and $B_2$ are $D[0,\tau]^p \times \mathbb{R}^q$ for some finite $p, q$ and $\mathcal{B}_1, \mathcal{B}_2$ are the open-ball $\sigma$-algebras then all product $\sigma$-algebras are also the open-ball $\sigma$-algebras with respect to the max norm. If one is only interested in getting (3), it suffices (qua measurability) to assume that the left and right hand sides here are random elements of $B_2$. □

The following is a useful lemma. In many applications the mapping $\varphi$ is only a priori defined on certain members of $B_1$ and one could set about choosing a particular extension to all of $B_1$ such that the hypotheses of the Functional Delta Method are satisfied in each particular application.

Lemma (see Lemma 1, Gill (1989)72)

Consider $x \in E \subset B_1$ and $\varphi : E \to B_2$. Suppose there exists a continuous linear map $d\varphi(x) : B_1 \to B_2$ such that for all $t_n \to 0$ ($t_n \in \mathbb{R}$) and $h_n \to h \in B_1$ such that $x_n = x + t_n h_n \in E$ for all $n$, we have:

$$t_n^{-1}\big(\varphi(x + t_n h_n) - \varphi(x)\big) \to d\varphi(x).h \text{ as } n \to \infty.$$

Then $\varphi$ can be extended to $B_1$ in such a way that it is differentiable at $x$, with derivative $d\varphi(x)$. The derivative is unique if the closed linear span of possible limit points $h$ equals $B_1$. □

Comments:

Let $D[0,1]$ be the space of real functions, right-continuous with left-hand limits, defined on $[0,1]$. We endow this space with $\|\cdot\|_\infty$, the supremum norm. In Chapter 3 we consider spaces like $B = (D[0,1])^p$. Under the Skorohod metric $d_0$ these spaces are separable and complete. Under the max-supremum norm they are Banach spaces, but separability no longer holds. In these spaces, if the limiting process has continuous sample paths then weak convergence in the sense of the Skorohod metric and in the sense of the supremum norm are exactly equivalent. Otherwise, supremum norm convergence is stronger. □

 

The next proposition characterizes the compact differentiability of the product integral. As a reference see Theorem 8, Gill and Johansen (1990).74 We state the result in the Andersen et al. (1993)29 form (see Proposition II.8.7, p114):

Proposition

Let $E_M^k \subset (D[0,1])^{k^2}$ be the set of $k \times k$ matrix cadlag functions with components of total variation bounded by the constant $M$. Let $\varphi : E_M^k \to D[0,1]^{k^2}$ be defined by

$$\varphi(X) = \prod_{[0,\cdot]}(I + dX).$$

Let $X$ be a fixed point of $E_M^k$. Then $\varphi$ can be extended to $D[0,1]^{k^2}$ so as to be compactly differentiable at $X$, with derivative

$$(d\varphi(X).H)(t) = \int_{s \in [0,t]} \prod_{[0,s)}(I + dX)\,H(ds)\prod_{(s,t]}(I + dX),$$

where, when $H$ is not of bounded variation, the last integral is defined by the application (twice) of the integration by parts formula and the forward and backward integral equations (see Theorem 5, Gill and Johansen (1990)74). □

The following continuity result is proved in Theorem 7, Gill and Johansen (1990)74:
If $X_n, X$ in $E_M^k$ are such that $X_n \to X$ in supremum norm then

$$\prod(I + dX_n) \to \prod(I + dX) \text{ in supremum norm.}$$
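When $X$ is a step function, the product integral reduces to a finite ordered matrix product of the factors $I + \Delta X$ at the jump times. The Python sketch below computes this product for a hypothetical two-state (alive, dead) example in which each increment has the form used for a Nelson-Aalen type estimator, so that the top-left entry of the result is the Kaplan-Meier product. The numbers of deaths and subjects at risk are invented for illustration.

```python
def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    k = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(k)) for j in range(k)]
            for i in range(k)]


def product_integral(increments):
    """Ordered product of (I + dX) over the jumps of a step function X."""
    k = len(increments[0])
    result = [[float(i == j) for j in range(k)] for i in range(k)]
    for dx in increments:  # increments must be listed in time order
        factor = [[float(i == j) + dx[i][j] for j in range(k)]
                  for i in range(k)]
        result = matmul(result, factor)
    return result


def survival_increment(deaths, at_risk):
    """Increment of the cumulative intensity matrix at one death time."""
    h = deaths / at_risk
    return [[-h, h], [0.0, 0.0]]


# Hypothetical data: (deaths, at risk) at three ordered death times.
jumps = [survival_increment(1, 5), survival_increment(1, 4),
         survival_increment(2, 3)]
P = product_integral(jumps)
```

Here `P[0][0]` equals $(1 - 1/5)(1 - 1/4)(1 - 2/3) = 0.2$ and each row of `P` sums to one, as expected of a transition probability matrix.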


 

APPENDIX C

Results on Ito Integration

Let $(\Omega, \mathcal{F}, P)$ be a filtered, complete probability space, with filtration $F = (\mathcal{F}_t)_{t \ge 0}$, $\mathcal{F}_0$ containing all null sets of $\mathcal{F}$. Let $M$ be a continuous Gaussian martingale on this space, $M = \{M(t), t \in \mathcal{T}\}$, with $\mathcal{T} = [0,\tau]$, $\tau < \infty$ (see the definition of Gaussian martingales in Section 3.2.1). Let $\langle M \rangle = V$ be a continuous, deterministic, positive, increasing function on $\mathcal{T}$, zero at time zero.

Note:
The results in stochastic calculus are usually stated for the standard Brownian motion (e.g. Harrison (1985)75, Oksendal (1995)76). They can be easily translated for continuous Gaussian martingales, which are called "time-transformed" Brownian motions. □

Let $H^2$ be the set of all adapted processes $X$ on $(\Omega, \mathcal{F}, P, F)$ satisfying

$$E\int_0^t X^2(s)\,V(ds) < \infty \text{ for all } t \in \mathcal{T}.$$

For any fixed $t$, one can define the Ito integral $I_t(X) = \int_0^t X\,dM$ for $X \in H^2$. This integral is unique up to a null set, with $E(I_t(X)) = 0$, $E(I_t(X)^2) = E\int_0^t X^2(s)\,V(ds)$.

As a process in $t$, $I_t(X)$ is adapted, continuous, with $I_0(X) = 0$. We will give a sketch of this definition, as presented in p56-61, Harrison (1985).75 The stochastic integral $I_t(X)$ can be defined for a more general class of processes $X$: the set of all adapted processes $X$ such that

$$P\Big(\int_0^t X^2(s)\,V(ds) < \infty\Big) = 1 \text{ for all } t \in \mathcal{T}.$$

In our applications $X \in H^2$, so we will not explore the definition further than $H^2$.

Let $S^2$ be the set of all simple $X \in H^2$. A process $X$ is called simple if there exist times $\{t_k\}$ such that $0 = t_0 < t_1 < \ldots < t_k \to \infty$ and $X(t,\omega) = X(t_k,\omega)$ for all $t \in [t_k, t_{k+1})$, $k \ge 0$. Note that the times $\{t_k\}$ do not depend on the argument $\omega$.

Let $L^2$ denote as usual the set of all random variables $\xi$ on $(\Omega, \mathcal{F}, P)$ such that $\|\xi\| = [E\xi^2]^{1/2} < \infty$. For $X \in H^2$ and fixed $t \in \mathcal{T}$, let $\|X\| = \big[E\int_0^t X^2(s)\,V(ds)\big]^{1/2}$. The same symbol $\|\cdot\|$ will be used to denote both a norm on $L^2$ and a norm on $H^2$.

We fix $t \in \mathcal{T}$ and, to simplify notation, set $I(X) = I_t(X)$ until $t$ is freed.

If $X$ is simple then one can define $I(X)$ in the Riemann-Stieltjes sense for almost every $\omega$. Thus if $X = \sum_{i=0}^{n-1} X(t_i)\,1_{[t_i,t_{i+1})}$, with $0 = t_0 < t_1 < \ldots < t_n = t$, then

$$I(X) = \sum_{i=0}^{n-1} X(t_i)\,[M(t_{i+1}) - M(t_i)].$$

By Proposition 10, p57, Harrison (1985)75, if $X \in S^2$ then $E[I(X)] = 0$ and $\|I(X)\| = \|X\|$.

The space $S^2$ is dense in $H^2$. That is, for each $X \in H^2$, there exist simple processes $\{X_n\}$ such that $X_n \to X$ in $H^2$. As a reference see p92-95, Liptser-Shiryayev (1977).77

For $X \in H^2$ there exists a random variable $I(X) \in L^2$, unique up to a null set, such that $I(X_n) \to I(X)$ in $L^2$ for each simple sequence $\{X_n\}$ satisfying $X_n \to X$ in $H^2$. Furthermore, $E[I(X)] = 0$ and $\|I(X)\| = \|X\|$ (see Proposition 11, p58-59, Harrison (1985)75).

This concludes our short sketch of the definition of $I_t(X)$ for $X \in H^2$ and a fixed time $t \ge 0$.

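Properties of the Ito integral:

Let $X, Y \in H^2$ and let $0 \le s < u < t$, $s,u,t \in \mathcal{T}$. Then

i) $\int_s^t X\,dM = \int_s^u X\,dM + \int_u^t X\,dM$ for a.a. $\omega$;

ii) $\int_s^t (cX + Y)\,dM = c\int_s^t X\,dM + \int_s^t Y\,dM$ for a.a. $\omega$, for any constant $c$;

iii) $\int_0^t X\,dM$ is $\mathcal{F}_t$-measurable;

iv) If $X(t,\omega) = X(t)$ only depends on $t$ (so $X(t)$ is deterministic) then $I_t(X)$ is normally distributed, with mean 0 and variance $\int_0^t X^2(s)\,V(ds)$. □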
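Property iv) can be illustrated by simulation in the Brownian case $V(t) = t$. The Python sketch below, a toy example under our own assumptions, uses the deterministic integrand $X(s) = s$ on $[0,1]$, approximates it by a simple process on $n$ equal subintervals, and compares the Monte Carlo variance of $\sum_i X(t_i)[M(t_{i+1}) - M(t_i)]$ with $\sum_i X(t_i)^2 (t_{i+1} - t_i)$, which itself approximates $\int_0^1 s^2\,ds = 1/3$.

```python
import random


def simple_ito_integral(x, n, rng):
    """Sum of x(t_i) * [M(t_{i+1}) - M(t_i)] on [0, 1] for Brownian M."""
    dt = 1.0 / n
    return sum(x(i * dt) * rng.gauss(0.0, dt ** 0.5) for i in range(n))


def exact_variance(x, n):
    """Variance of the simple-process integral: sum of x(t_i)^2 * dt."""
    dt = 1.0 / n
    return sum(x(i * dt) ** 2 * dt for i in range(n))


def sample_variance(x, n=50, reps=4000, seed=2):
    """Monte Carlo variance of the simple-process integral."""
    rng = random.Random(seed)
    draws = [simple_ito_integral(x, n, rng) for _ in range(reps)]
    mean = sum(draws) / reps
    return sum((z - mean) ** 2 for z in draws) / (reps - 1)


identity = lambda s: s  # hypothetical deterministic integrand X(s) = s
```

Since a linear combination of independent Gaussian increments is Gaussian, each simulated value is exactly normal with variance `exact_variance(identity, n)`; the number of subintervals, replications and the seed are arbitrary.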

Proof:

For i)-iii) see Theorem 3.9, p27, Oksendal (1995).76

For iv) consider first $X$ simple, so $I_t(X) = \sum_{i=0}^{n-1} X(t_i)\,[M(t_{i+1}) - M(t_i)]$. Because $X(t)$ is deterministic, by the properties of the Gaussian martingales, $I_t(X)$ is normally distributed. We have seen that $E\,I_t(X) = 0$ and the standard deviation is $\mathrm{std}(I_t(X)) = \|X\|$.

Let $X \in H^2$ and a simple sequence $\{X_n\}$ such that $X_n \to X$ in $H^2$ (so that, in particular, $\|X_n\| \to \|X\|$). Then $I_t(X_n) \to I_t(X)$ in $L^2$, so $I_t(X_n)$ converges in distribution to $I_t(X)$. The distribution function of $I_t(X_n)$ is $\Phi\big(x/\|X_n\|\big)$ (where $\Phi$ is the standard normal distribution function), which converges to $\Phi\big(x/\|X\|\big)$, the distribution function of $I_t(X)$. Thus iv) follows. ■

The next theorem is a type of Fubini's Theorem for stochastic integration.

Fubini-type Theorem
Let $(s,u) \to H(s,u)$, $(s,u) \in \mathcal{T} \times \mathcal{T}$, be a bounded, $\mathcal{E} \times \mathcal{E}$ measurable function, where $\mathcal{E}$ is the class of Borel sets on $\mathcal{T}$. Let $\mu$ be a finite measure on the space $(\mathcal{T}, \mathcal{E})$. Then, for every $t \in \mathcal{T}$, $A \in \mathcal{E}$:

$$\int_A \Big[\int_0^t H(s,u)\,dM(u)\Big]\mu(ds) = \int_0^t \Big[\int_A H(s,u)\,\mu(ds)\Big]dM(u) \text{ for a.a. } \omega. \quad \Box$$

For more general versions of this theorem and their proofs, see p159-161, Protter (1990).78

REFERENCES

1. Lipscomb J, Ancukiewicz M, Parmigiani G, Hasselblad V, Samsa G, Matchar DB. Predicting the Cost of Illness: A comparison of alternative models applied to stroke. Medical Decision Making. 1998;18 suppl:S39-S56.

2. Intriligator MD, Bodkin RG, Hsiao C. Econometric Models, Techniques, and Applications. Second ed. Upper Saddle River: Prentice Hall; 1996.

3. Wooldridge JM. Econometric Analysis of Cross Section and Panel Data. Cambridge, MA: MIT Press; 1999.

4. Duan N. Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association. 1983;78(383):605-610.

5. Zhou XH, Melfi CA, Hui SL. Methods for comparison of cost data. Annals of Internal Medicine. 1997;127(8):752-756.

6. Mullahy J. Much ado about two: Reconsidering retransformation and the two-part model in health econometrics. Journal of Health Economics. 1998;17:247-281.

7. Dudley RA, Harrell FE Jr, Smith LR, et al. Comparison of analytic models for estimating the effect of clinical factors on the cost of coronary artery bypass graft surgery. Journal of Clinical Epidemiology. 1993;46(3):261-271.

8. Smith LR, Milano CA, Molter BS, Elbeery JR, Sabiston DC, Smith PK. Preoperative determinants of postoperative costs associated with coronary artery bypass graft surgery. Circulation. 1994;90(5, Part 2):124-128.

9. Etzioni RD, Feuer EJ, Sullivan SD, Lin D, Hu C, Ramsey SD. On the use of survival analysis techniques to estimate medical care costs. Journal of Health Economics. 1999;18:365-380.

10. Lin DY, Feuer EJ, Etzioni R, Wax Y. Estimating medical costs from incomplete follow-up data. Biometrics. 1997;53:419-434.

11. Lin DY. Proportional means regression for censored medical costs. Biometrics. 2000;56:775-778.

12. Hallstrom AP, Sullivan SD. On estimating costs for economic evaluation in failure time studies. Medical Care. 1998;36(3):433-436.

13. Bang H, Tsiatis AA. Estimating medical costs with censored data. Biometrika. 2000;87(2):329-343.

14. Gardiner J, Hogan A, Holmes-Rovner M, Rovner D, Griffith L, Kupersmith J. Confidence intervals for cost-effectiveness ratios. Medical Decision Making. 1995;15:254-263.

15. Gardiner J, Holmes-Rovner M, Goddeeris J, Rovner D, Kupersmith J. Covariate-adjusted cost-effectiveness ratios. Journal of Statistical Planning and Inference. 1999;75:291-304.

16. Lin DY. Linear regression of censored medical costs. Biostatistics. 2000;1:35-47.

17. Rapoport J, Teres D, Lemeshow S. Explaining variability of cost using a severity-of-illness measure for ICU patients. Medical Care. 1990;28:338-348.

 

18. Jones KR. Predicting hospital charge and stay variation: the role of patient teaching status, controlling for diagnosis related groups, demographic characteristics, and severity of illness. Medical Care. 1995;23:220-235.

19. Silberbach M, Shumaker D, Menashe V, Cobanoglu A, Morris C. Predicting hospital charge and length of stay for congenital heart disease surgery. Am J Cardiol. 1993;72:958-963.

20. Calvin JE, Klein LW, VandenBerg BJ, Meyer P, Ramirez-Morgen LM, Parrillo JE. Clinical predictors easily obtained at presentation predict resource utilization in unstable angina. American Heart Journal. 1998;136:373-381.

21. Benzaquen BS, Eisenberg MJ, Challapalli R, Nguyen T, Brown KJ, Topol EJ. Correlates of in-hospital cost among patients undergoing abdominal aortic aneurysm repair. American Heart Journal. 1998;136:696-702.

22. Krumholz HM, Chen J, Murillo JE, Cohen DJ, Radford MJ. Clinical correlates of in-hospital costs for acute myocardial infarction in patients 65 years of age and older. American Heart Journal. 1998;135:523-531.

23. Wei LJ, Lin DY, Weissfeld L. Regression analysis of multivariate incomplete failure time data by modeling marginal distributions. Journal of the American Statistical Association. 1989;84(408):1065-1073.

24. Hoem JM, Aalen OO. Actuarial values and payment streams. Scandinavian Actuarial Journal. 1978:38-47.

25. Praestgaard J. Nonparametric estimation of actuarial values. Scandinavian Actuarial Journal. 1991;2:129-143.

26. Norberg R. Payment measures, interest, and discounting - an axiomatic approach with applications to insurance. Scandinavian Actuarial Journal. 1990:14-33.

27. Norberg R. Reserves in life and pension insurance. Scandinavian Actuarial Journal. 1991:3-24.

28. Norberg R. Hattendorff's Theorem and Thiele's Differential Equation Generalized. Scandinavian Actuarial Journal. 1992:2-14.

29. Andersen PK, Borgan O, Gill RD, Keiding N. Statistical Models Based on Counting Processes. New York: Springer-Verlag; 1993.

30. Lin DY, Wei LJ. The robust inference for the Cox proportional hazards model. Journal of the American Statistical Association. 1989;84(408):1074-1078.

31. Andersen PK, Gill RD. Cox's regression model for counting processes: A large sample study. Annals of Statistics. 1982;10(4):1100-1120.

32. Wei LJ, Lachin JM. Two-sample Asymptotically Distribution-Free Tests for Incomplete Multivariate Observations. Journal of the American Statistical Association. 1984;79:653-661.

33. Thomas DR, Grunkemeier GL. Confidence interval estimation of survival probabilities for censored data. Journal of the American Statistical Association. 1975;70:865-871.

34. Kalbfleisch JD, Prentice RL. The Statistical Analysis of Failure Time Data. New York: John Wiley & Sons; 1980.

35. Klein JP, Moeschberger ML. Survival Analysis: Techniques for Censored and Truncated Data. New York: Springer-Verlag; 1997.

36.

37.

38.

39.

40.

41.

42.

43.

45.

J acod J. Multivariate point processes: Predictable projection, Radon-Nikodym
derivatives, representation of martingales. Z. Wahrsch. verw. Geb. 1975;31:235-
253.

J acod J. On the stochastic intensity of a random point process over the half-line.
Technical Report 15, Department of Statistics, Princeton University. 1973.

J ohansen S. An extension of Cox's regression model. International Statistical
Review. 1983;51:258-262.

Arjas E, Haara P. A marked point process approach to censored failure time data
with complicated covariates. Scandinavian Journal of Statistics. 1984;11:193-
209.

Borgan O. Maximum likelihood estimation in parametric counting process
models, with applications to censored failure time data. Scandinavian Journal of
Statistics. 1984;11:1-16.

Fabian V, Hannan J. Introduction to probability and mathematical statistics. 1985.
J ennrich RI. Asymptotic properties of non-linear least squares estimators. Annals
of Mathematical Statistics. 1969;40(2):633-643.

Rao RR. The law of large numbers for D[O,1]-valued random variables. Theory of
Probabability with Applications. 1963;8:70-74.

Billingsley P. Statistical Inferences for Markov Processes. Chicago: University of
Chicago Press; 1961.

Mourier E. Elements aleatoires dans un espace de Banach. Ann. Inst. H. Poincare.

1953;13:161-244.

46. Charlson ME, Pompei P, Ales KL, Mackenzie CR. A new method of classifying
prognostic comorbidity in longitudinal studies: development and validation.
Journal of Chronic Diseases. 1987;5:373-383.

47. Matsui K, Goldman L, Johnson PA, Kuntz KM, Cook F, Lee TH. Comorbidity as
a correlate of length of stay for hospitalized patients with acute chest pain.
Journal of General Internal Medicine. 1996;11:262-268.

48. Fenn P, McGuire A, Backhouse M, Jones D. Modelling programme costs in
economic evaluation. Journal of Health Economics. 1996;15:115-125.

49. Polverejan E, Gardiner JC, Bradley CJ, Holmes-Rovner M, Rovner D. Estimating
mean hospital cost as a function of length of stay and patient characteristics. In
review. 2001.

50. Longini IR, Byers RH, Hessol NA, Tan WY. Estimating the stage-specific
numbers of HIV infection using a Markov model and backcalculation. Statistics in
Medicine. 1992;11:831-843.

51. Satten GA, Longini IR. Markov chains with measurement error: estimating the
'true' course of a marker of the progression of HIV disease. Applied Statistics.
1996;45:275-309.

52. Aalen OO, Farewell VT, de-Angelis D, Day NE, Gill ON. A Markov model for
HIV disease progression including the effect of HIV diagnosis and treatment:
Application to AIDS prediction in England and Wales. Statistics in Medicine.
1997;16:2191-2210.

53. Longini IM, Clark WS, Byers RH, et al. Statistical analysis of the stages of HIV
infection using a Markov model. Statistics in Medicine. 1989;8:831-843.

54. Gentleman RC, Lawless JF, Lindsey JC, Yan P. Multi-state Markov models for
analysing incomplete disease history data with illustrations for HIV disease.
Statistics in Medicine. 1994;13:805-821.

55. Hansen BE, Thorogood J, Hermans J, Ploeg RJ, Bockel JHV, Houwelingen JCV.
Multistate modelling of liver transplantation data. Statistics in Medicine.
1994;13:2517-2529.

56. Dabrowska DM, Guo-wen S, Horowitz MM. Cox regression in a Markov renewal
model: an application to the analysis of bone marrow transplant data. Journal of
the American Statistical Association. 1994;89:876-877.

57. Wanek LA, Elashoff RM, Goradia TM, Morton DL, Cochran AJ. Application of
multistage Markov modeling to malignant melanoma progression. Cancer.
1994;73:336-343.

58. Perez-Ocon R, Ruiz-Castro JE, Gamiz-Perez ML. A multivariate model to
measure the effect of treatments in survival to breast cancer. Biometrical Journal.
1998;40:703-715.

59. Gold MR, Siegel JE, Russell LB, Weinstein MC, eds. Cost-Effectiveness in
Health and Medicine. New York: Oxford University Press; 1996.

60. Jacobsen M. Statistical Analysis of Counting Processes. New York:
Springer-Verlag; 1982.

61. Verbeke G, Molenberghs G, eds. Linear Mixed Models in Practice: A
SAS-Oriented Approach. New York: Springer-Verlag; 1997.

62. Verbeke G, Molenberghs G. Linear Mixed Models for Longitudinal Data. New
York: Springer-Verlag; 2000.

63. Diggle PJ, Liang KY, Zeger SL. The Analysis of Longitudinal Data. Oxford, UK:
Oxford University Press; 1994.

64. Cnaan A, Laird NM, Slasor P. Using the general linear mixed model to analyse
unbalanced repeated measures and longitudinal data. Statistics in Medicine.
1997;16:2349-2380.

65. Laird NM, Ware JH. Random-Effects Models for Longitudinal Data. Biometrics.
1982;38:963-974.

66. Cox DR, Hinkley DV. Theoretical Statistics. London: Chapman & Hall; 1990.

67. Amemiya T. Advanced Econometrics. Cambridge, MA: Harvard University Press;
1985.

68. Manning WG, Duan N, Rogers WH. Monte Carlo evidence on the choice between
sample selection and two-part models. Journal of Econometrics. 1987;35:59-82.

69. Duan N, Manning WG Jr, Morris CN, Newhouse JP. Choosing between the
sample-selection model and the multi-part model. Journal of Business &
Economic Statistics. 1984;2(3):283-289.

70. Neuhaus G. On weak convergence of stochastic processes with multidimensional
time parameter. Annals of Mathematical Statistics. 1971;42(4):1285-1295.

71. Billingsley P. Convergence of probability measures. New York: Wiley; 1968.

72. Gill RD. Non- and Semi-parametric Maximum Likelihood Estimators and the von
Mises Method (Part 1). Scandinavian Journal of Statistics. 1989;16:97-128.

73. van der Vaart AW. Statistical estimation in large parameter spaces. Vol 44.
Amsterdam: Centrum voor Wiskunde en Informatica; 1988.

74. Gill RD, Johansen S. A survey of product-integration with a view towards
application in survival analysis. Annals of Statistics. 1990;18:1501-1555.

75. Harrison JM. Brownian Motion and Stochastic Flow Systems. New York: John
Wiley; 1985.

76. Oksendal B. Stochastic Differential Equations. Fourth ed. New York:
Springer-Verlag; 1995.

77. Liptser RS, Shiryayev AN. Statistics of Random Processes I. New York:
Springer-Verlag; 1977.

78. Protter P. Stochastic Integration and Differential Equations. New York:
Springer-Verlag; 1990.
