ASSESSING THE IMPACT OF MISSING DATA ON HOSPITAL PERFORMANCE PROFILING

By

Michael P. Thompson

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Epidemiology – Doctor of Philosophy

2015

ABSTRACT

ASSESSING THE IMPACT OF MISSING DATA ON HOSPITAL PERFORMANCE PROFILING

By Michael P. Thompson

Ischemic stroke is a leading cause of mortality, long-term disability, and high healthcare costs in the US. In light of this clinical and financial burden, the Centers for Medicare & Medicaid Services (CMS) has decided to incorporate ischemic stroke measures of 30-day mortality and hospital readmission into its current pay-for-performance program. This decision has come under intense scrutiny, as many clinicians and researchers believe that the current risk adjustment model is inadequate because it does not include a measure of stroke severity. Due to its well-documented importance in individual-level prediction, there is concern that excluding a measure of stroke severity from risk adjustment will lead to incorrect rankings of hospital performance, i.e. hospital profiling. However, administrative datasets currently used by CMS do not capture a measure of stroke severity, such as the National Institutes of Health Stroke Scale (NIHSS), and in clinical databases which capture NIHSS, it is frequently missing. Little work has been done to assess whether the documentation of NIHSS is biased, and, if so, what impact such bias would have on hospital-level estimates of mortality.

In this study, we analyzed data on ischemic stroke patients from an existing stroke registry to identify patterns and characteristics that predict NIHSS documentation at the patient- and hospital-level. Next, we tested for the presence of selection bias in patients with documented NIHSS using the Heckman Selection Model. Finally, using computer simulations, we estimated the impact of missing NIHSS data on hospital profiling of 30-day mortality, under different assumptions about the prevalence and mechanism of missing NIHSS data.

We found that patients with documented NIHSS were, in fact, a biased subsample of all ischemic stroke patients. Documentation of NIHSS was driven by a combination of patient-level and hospital-level factors. At both the patient- and hospital-level, analyses suggested that patients with more severe strokes (i.e. increased NIHSS score) were better documented than patients with less severe strokes. These findings were confirmed using the Heckman Selection Model. However, in both analyses, we found that the amount of bias was modest. In computer simulations, we quantified the impact that missing data would have on the accuracy of hospital ischemic stroke profiling, under different assumptions about how NIHSS data were missing. Any effect of the missing NIHSS mechanism was trumped by the impact of missingness on sample size. Because patients with missing NIHSS data were dropped from risk-adjustment models as documentation of NIHSS decreased, the accuracy of hospital risk-standardized mortality rates (RSMRs) estimated by the hierarchical logistic model deteriorated. All of our findings were substantially modified by hospital ischemic stroke volume, with low-volume hospitals suffering the worst accuracy. These results reflect the fact that the loss of sample size (through either the documentation rate or hospital volume) increases the amount of shrinkage in RSMR estimates, which makes any random noise more impactful on changes in RSMR.
Overall, our findings raise concerns about the addition of NIHSS data into risk adjustment models for hospital-level ischemic stroke outcomes, and illustrate shortcomings in current methodologies used to profile hospitals. It is crucial that data used in risk adjustment for hospital profiling be documented with very high levels of completeness.

Copyright by
MICHAEL P. THOMPSON
2015

This dissertation is dedicated to my family and friends for their support and belief in me throughout my education. Above all, I would like to thank my wife, Megan, for her love and encouragement to pursue my career ambitions.

ACKNOWLEDGEMENTS

I will be forever grateful to my mentor and dissertation committee chair, Dr. Mat Reeves, who pushed me to never settle for good enough and inspired me to pursue greater ambitions. I would also like to thank Dr. Zhehui Luo, Dr. Joseph Gardiner, and Dr. Jim Burke for their commitment and direction as part of my dissertation committee. In addition, I would like to acknowledge the faculty, staff, and fellow students of the Department of Epidemiology and Biostatistics for sharing their knowledge, support, and camaraderie throughout my program. I would also like to acknowledge Adrienne Nickles of the Michigan Department of Community Health for her assistance in working with the Michigan Stroke Registry, which plays an integral part in my dissertation. Finally, I am indebted to the Michigan Health & Hospital Association, notably Sam Watson and Steve Levy, for funding my graduate assistantship and giving me the opportunity to learn from and contribute to their continued work in improving the quality of health care across the State of Michigan.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
KEY TO ABBREVIATIONS
CHAPTER 1: BACKGROUND AND OBJECTIVES
  Burden of Stroke in the US
  CMS and Pay-for-Performance
  Hospital Profiling and Risk Adjustment
  Hospital-Specific Mortality as a Performance Measure
  Controversy with 30-Day Ischemic Stroke Measures
  Current Limitations to Including NIHSS in Risk Adjustment Models
  Bias as a Result of Missing Data
  Statement of Problem, Aims, and Outline
CHAPTER 2: PATTERNS AND PREDICTORS OF NIHSS DOCUMENTATION
  Aim 1 - Background
  Aim 1 - Methods
    Data and Participants
    Predictor Variables
    Outcome Variable
    Statistical Analysis
  Aim 1 - Results
  Aim 1 - Discussion
CHAPTER 3: ASSESSING SELECTION BIAS IN PATIENTS WITH DOCUMENTED NIHSS USING THE HECKMAN SELECTION MODEL
  Aim 2 - Background
  Aim 2 - Methods
    Data and Participants
    Predictor Variables
    Outcome Model Specification
    Selection Model Specification
    Estimating the Correlation Coefficient
  Aim 2 - Results
  Aim 2 - Discussion
CHAPTER 4: THE IMPACT OF MISSING NIHSS DATA ON THE ACCURACY OF HOSPITAL PROFILING
  Aim 3 - Background
  Aim 3 - Methods
    Section 1 – Parameter Generation for Simulations
    Section 2 – Generating Datasets for Simulations
    Section 3 – Missing NIHSS Data Model Specification
    Section 4 – Hospital Profiling Methodology
    Section 5 – Assessments of Profiling Accuracy Using the Simulated Data
  Aim 3 - Results
    Accuracy of Hospital RSMR Rank-Order
    Accuracy of Hospital High/Low Performer Classification
    Absolute Change in Hospital RSMR Rankings
  Aim 3 - Discussion
CHAPTER 5: DISCUSSION AND FUTURE DIRECTIONS
  Summary of Findings
  Limitations
  Including NIHSS in Risk Adjustment Models for Stroke Performance Measures
  Critique on Current Profiling Methodologies
  Future Directions
  Conclusion
CHAPTER 6: SUMMARY
APPENDICES
  Appendix A: Supplementary Tables
  Appendix B: Supplementary Figures
  Appendix C: IRB Determination
  Appendix D: Example Data Generation SAS Code
  Appendix E: Example Simulation Assessment SAS Code
BIBLIOGRAPHY

LIST OF TABLES

Table 1.1. Domains and scores/descriptions for the National Institutes of Health Stroke Scale; the final score ranges from 0-42.

Table 2.1. Patient demographics, EMS and admission information, medical history and discharge status in ischemic stroke patients 65 years of age or older, in the overall sample (n=10,717) and stratified by NIHSS documentation status (2009-2012).

Table 2.2. Michigan Stroke Registry hospital-level characteristics in the sample of 23 hospitals, stratified by tertile of hospital NIHSS documentation rate.

Table 3.1. Heckman Selection Model specifications for outcome and selection models in the full sample of n=10,717 stroke cases (2009-2012).

Table 3.2. Estimated correlation coefficient between error terms of outcome and selection models for the full sample (2009-2012), and by 2009-2010 and 2011-2012.

Table 4.1. Get With the Guidelines-Stroke in-hospital mortality risk score variables, categories, and respective points.
Table 4.2. Results of ordered probit model of NIHSS category predicted by sub-risk score (n=7,957).

Table 4.3. NIHSS category assignment cutoff intervals derived from the ordered probit model predicting NIHSS category given the patient sub-risk score.

Table 4.4. Specification for missing NIHSS models, including model parameters and estimated documentation rates in each category of NIHSS.

Table 4.5. Calculations for sensitivity (Se), specificity (Sp) and predictive value positive (PVP) and negative (PVN) for true vs. observed high/low performer classification.

Table 5.1. Diagnostic ability of the hierarchical logistic model to identify hospital high/low performers when documentation of NIHSS is complete (i.e. no missing NIHSS data), stratified by definition of high/low performer and hospital stroke volume.

Table A.1. Average proportion (%) of hospital high/low performer classification for the top/bottom 5th percentile of rank-order (true positive, false positive, true negative, false negative) for different mechanisms of missing NIHSS data, stratified by hospital stroke volume (n=100, 300, and 500).

Table A.2. Average proportion (%) of hospital high/low performer classification for the top/bottom 20th percentile of rank-order (true positive, false positive, true negative, false negative) for different mechanisms of missing NIHSS data, stratified by hospital stroke volume (n=100, 300, and 500).

Table A.3. Average absolute change in hospital RSMR rankings (# of positions) in different scenarios of missing NIHSS data, stratified by quintile of true hospital ranking and hospital stroke volume (n=100, 300, and 500).

LIST OF FIGURES

Figure 2.1. Hospital-level NIHSS documentation rates over time.

Figure 2.2. Scatter plot of aggregated mean hospital NIHSS score vs. hospital NIHSS documentation rate with fitted regression line (95% CI) in each year (2009-2012).

Figure 2.3. Kernel density curves for patient distribution of NIHSS score, stratified by tertile of hospital NIHSS documentation (<70%, 70-85%, ≥85%), with ANOVA and Kruskal-Wallis (KW) test results.

Figure 3.1. Conceptual framework of the Heckman Selection Model in this analysis.

Figure 4.1. Overview of data generation process for simulations.

Figure 4.2. Distribution of patient-level NIHSS score categories in the Michigan Stroke Registry (n=7,957).
Figure 4.3. Spearman rank correlation coefficients between true rankings and RSMR rankings as NIHSS documentation increases under different mechanisms of missing NIHSS data. Results are stratified by hospital stroke volume.

Figure 4.4. Sensitivity of HLM to classify hospitals as high/low performers based on top/bottom 5th (solid lines) and 20th (dashed lines) percentiles of mortality rank-order as documentation of NIHSS increases under different mechanisms of missing NIHSS data. Results are stratified by hospital stroke volume.

Figure 4.5. Specificity of HLM to classify hospitals as non-high/low performers based on top/bottom 5th (solid lines) and 20th (dashed lines) percentiles of mortality rank-order as documentation of NIHSS increases under different mechanisms of missing NIHSS data. Results are stratified by hospital stroke volume.

Figure 4.6. Predictive value positive of HLM to classify hospitals as high/low performers based on top/bottom 5th (solid lines) and 20th (dashed lines) percentiles of mortality rank-order as documentation of NIHSS increases under different mechanisms of missing NIHSS data. Results are stratified by hospital stroke volume.

Figure 4.7. Predictive value negative of HLM to classify hospitals as non-high/low performers based on top/bottom 5th (solid lines) and 20th (dashed lines) percentiles of mortality rank-order as documentation of NIHSS increases under different mechanisms of missing NIHSS data. Results are stratified by hospital stroke volume.

Figure 4.8. Average absolute change in hospital RSMR rankings (# of positions) as NIHSS documentation increases under different mechanisms of missing NIHSS data. Results are stratified by hospital size and quintile of true ranking.

Figure 4.9. Illustrating the effect of shrinkage on RSMR distribution as depicted by range (i.e. minimum/maximum, solid lines), 5th/95th percentiles (dotted lines), and 25th/75th percentiles (dashed lines) of RSMRs. Estimates are the averages of 500 simulations for each of 100 hospitals.

Figure B.1. Pearson correlation coefficients between true rankings and RSMR rankings as NIHSS documentation increases under different mechanisms of missing NIHSS data. Results are stratified by hospital stroke volume.
KEY TO ABBREVIATIONS

AHA/ASA       American Heart Association/American Stroke Association
CMS           Centers for Medicare & Medicaid Services
EMS           Emergency Medical Services
FY            Fiscal Year
GWTG-Stroke   Get With The Guidelines – Stroke
Hospital IQR  Hospital Inpatient Quality Reporting
HLM           Hierarchical Logistic Regression Model
HVBP          Hospital Value-Based Purchasing
ICC           Intraclass Correlation
IQR           Interquartile Range
MAR           Missing at Random
MCAR          Missing Completely at Random
MNAR          Missing Not at Random
MSR           Michigan Stroke Registry
NIHSS         National Institutes of Health Stroke Scale
NQF           National Quality Forum
OR            Odds Ratio
O/E           Observed/Expected
P/E           Predicted/Expected
P4P           Pay-for-Performance
PVP           Predictive Value Positive
PVN           Predictive Value Negative
RSMR          Risk-Standardized Mortality Rate
RSRR          Risk-Standardized Readmission Rate
SD            Standard Deviation
Se            Sensitivity
Sp            Specificity

CHAPTER 1: BACKGROUND AND OBJECTIVES

Burden of Stroke in the US

Stroke is the 4th leading cause of death and the leading cause of serious long-term disability in the United States.1 Recent estimates indicate that there are 795,000 new and recurrent strokes annually,1 with direct medical costs of $17.5 billion in 2011.2 There are over 1 million hospital admissions for stroke in the US every year. The average inpatient stay for stroke patients is about 6 days in the US,1,3 with the average hospitalization resulting in an estimated $46,518 in charges.3 Consequently, stroke is the 10th most expensive condition billed to Medicare, Medicaid, and private insurers, and the 5th most expensive condition for uninsured patients in the US.4

CMS and Pay-for-Performance

In light of this extraordinary clinical and financial burden, the Centers for Medicare & Medicaid Services (CMS) has decided to incorporate a 30-day ischemic stroke risk-standardized mortality rate (RSMR) and readmission rate (RSRR) into its Hospital Inpatient Quality Reporting (Hospital IQR)5 and Hospital Value-Based Purchasing (HVBP)6 programs. These programs illustrate the implementation of pay-for-performance (P4P) models in healthcare. P4P models tie provider reimbursement to reporting and predetermined performance measure standards, as opposed to the volume and complexity of services provided in the traditional fee-for-service model of reimbursement.7 With health expenditures reaching $2.7 trillion in 20118 and expected to grow to almost 20% of the US gross domestic product by 2023,9 both private and public healthcare payers are implementing P4P models in an attempt to improve the efficiency of healthcare delivery.10 The overall mission of the CMS P4P programs is to promote high-quality, patient-centered care and accountability through the reporting of predetermined performance measures.11

The CMS Hospital IQR program was mandated by the Medicare Prescription Drug, Improvement, and Modernization Act in 2003. The program is designed to incentivize hospitals to report on condition-specific quality measures,5 which are publicly available through the Hospital Compare website.
The Hospital Compare program allows health care consumers to find and compare hospitals based on their reported measures.12 Recently, the Affordable Care Act (ACA) utilized the Hospital IQR infrastructure to tie the reported performance on quality measures to proportional financial reimbursements through the HVBP program.6 Changes in reimbursement are dictated by adjustment factors, which are determined by a total performance score reflecting a combination of clinical processes, patient experience, outcomes, and efficiency of care measures.13,14 Hospitals put a percentage of their reimbursements (currently 1.5%) into a pool, and, based on their performance score rank order, either earn back or lose a proportion of that amount.

Additionally, in June 2007, CMS began publicly reporting hospital 30-day RSMRs for acute myocardial infarction (AMI) and heart failure (HF), and subsequently added a 30-day mortality rate for pneumonia in June 2008.15,16 Hospital 30-day readmission rates (RSRRs) were added for the same conditions in June 2009 as a part of the Hospital Readmissions Reduction Program (HRRP).17 In 2014, hospitals began to submit 30-day ischemic stroke and chronic obstructive pulmonary disease (COPD) RSMRs and RSRRs, in addition to the AMI, HF, and pneumonia measures.16,18 Measures related to clinical processes, patient experience, patient safety, and spending per beneficiary are also publicly reported.

Hospital Profiling and Risk Adjustment

The HVBP program, and more generally the P4P model, presupposes that hospitals can be accurately compared based on predetermined performance measures. The process of comparing hospitals through rank-ordering performance measures (e.g. process rates, outcome rates) is commonly referred to as hospital profiling.19,20 A critical aspect of hospital profiling is accounting for the variation in patient characteristics between hospitals – referred to as case-mix – using risk-adjustment methods.21-24 Because patients are not randomized into hospitals, we must use statistical adjustment to account for imbalances in hospital case-mix.25 Thus, the purpose of risk-adjustment, or case-mix adjustment, is to control for confounding that exists due to differences in the case-mix of patients between hospitals.19 An important aspect of building risk adjustment models that accurately rank hospitals is including predictors of the outcome that vary between hospitals. If predictors are evenly distributed between hospitals, their inclusion in risk adjustment models will have little effect on improving the accuracy of hospital rankings.26 The adequacy of risk adjustment is often a focal point of debate, and without satisfactory risk-adjustment, the use of hospital profiling becomes problematic. All risk adjustment models assume that, after accounting for case-mix differences, the resulting differences in hospital outcomes (e.g. RSMRs and RSRRs) are due to underlying differences in quality between hospitals that are under the control of the hospital.18
To account for case-mix differences, CMS currently uses hierarchical logistic regression modeling (HLM) to calculate a hospital RSMR or RSRR, adjusting for patient case-mix.19 HLM is a multilevel modeling approach that accounts for the clustering of observations by hospital, and can estimate hospital-specific deviation in an outcome from the population average based on the estimated hospital random intercept.20,27,28 This method is generally preferred to indirect standardization by way of standard logistic regression models, as it has been shown to be less sensitive to smaller hospitals that have fewer observed outcome events, avoids regression-to-the-mean bias, and calculates more accurate predicted probabilities based on hospital-level effects.28-31

The HLM approach estimates a hospital RSMR, which is calculated as the ratio of "predicted" deaths to "expected" deaths multiplied by the overall mortality rate. The "predicted" number of deaths is the sum of individual predicted probabilities from the multivariable HLM for all patients seen at a particular hospital (which accounts for case-mix), conditional on the hospital's performance on mortality, i.e. the hospital-specific random intercept.19,32 The "expected" number of deaths is the sum of individual predicted probabilities of death based on case-mix, conditional on the average hospital performance, i.e. setting the hospital-specific random intercept to zero.19,32 The "predicted" to "expected" ratio (P/E ratio) is therefore the ratio of deaths expected at a given hospital compared to the number of deaths expected at the average hospital with the same case-mix. The P/E ratio is then multiplied by the overall mortality rate to get the RSMR. If the "predicted" number of deaths in a hospital is higher than the "expected" number of deaths (i.e. P/E ratio > 1), the resulting RSMR for that hospital will be greater than the overall average mortality rate. Conversely, if there are fewer "predicted" deaths than "expected" deaths, the hospital RSMR will be lower than average.

In addition to calculating RSMRs, the HLM approach can be used to identify statistical outliers in hospital performance using the estimated hospital random intercept. The distribution of hospital random intercepts is assumed to be normal and centered on zero. Thus, hospitals can be identified as "outlier" hospitals, or hospitals with extreme performance (high or low), based on where the estimated hospital-specific random intercept lies on the normal distribution of random intercepts. Typically, if the 95% confidence interval for a hospital's random intercept does not include 0 (i.e. the hospital average), it is considered an outlier hospital.33,34 This method has been shown to identify outlier hospitals more accurately than partitioning hospitals into categories based on their performance measure, such as quintiles of performance, where many hospitals in the lowest or highest quintiles are not statistically identified as outliers.35
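To make the preceding description concrete, the calculation can be written out as follows. This is a sketch in our own notation, not the exact CMS specification: let $\hat{\alpha}_j$ denote hospital $j$'s estimated random intercept, $\hat{\mu}$ the overall intercept, $\mathbf{x}_{ij}$ the case-mix covariates for patient $i$ in hospital $j$, and $\bar{y}$ the overall observed mortality rate. Then

$$
\mathrm{RSMR}_j
= \frac{\text{predicted deaths}}{\text{expected deaths}} \times \bar{y}
= \frac{\sum_{i=1}^{n_j} \mathrm{logit}^{-1}\big(\hat{\mu} + \hat{\alpha}_j + \mathbf{x}_{ij}'\hat{\boldsymbol{\beta}}\big)}
       {\sum_{i=1}^{n_j} \mathrm{logit}^{-1}\big(\hat{\mu} + \mathbf{x}_{ij}'\hat{\boldsymbol{\beta}}\big)} \times \bar{y},
$$

so the numerator conditions on the hospital's own performance ($\hat{\alpha}_j$) while the denominator sets the random intercept to zero (the average hospital). A hospital is then flagged as a statistical outlier when the 95% interval for $\hat{\alpha}_j$ excludes 0.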
Hospital-Specific Mortality as a Performance Measure

Despite advances in the statistical methodology used to profile hospitals, a contentious debate surrounds the use of mortality to compare hospitals. Supporters of mortality as a performance measure often cite that mortality is a single, easily interpreted, and clinically meaningful measure to many different stakeholders, especially to patients.36 They also claim that mortality may reflect an aggregate measure of quality that may not otherwise be identified through other specific quality measures that reflect processes or structural measures.22 Furthermore, all-cause mortality is considered a highly reliable, universally available, and unambiguous measure across all settings, which makes it an ideal reporting measure.22 A recent study by McCrum, et al. showed that 30-day RSMRs for AMI, HF, and pneumonia were highly predictive of mortality rates for other medical and surgical conditions within a hospital, suggesting that they may be useful surrogates for overall hospital mortality performance.37 However, another study by Jha, et al. found that performances on AMI, HF, and pneumonia mortality rates are not well correlated within a hospital, signifying that overall mortality performance may not adequately identify "good" or "bad" performing hospitals.38

Other significant limitations of using mortality as a comparative measure of hospital performance include its inability to discriminate well between high and low performing hospitals, the significant impact of coding and risk adjustment methods on resulting measure estimates, the limited ability of interventions to impact hospital mortality, and the potential for it to be misleading about the true quality of a hospital. A study by Mackenzie, et al. suggests that RSMR estimates are not precise enough to sufficiently discriminate "good" from "bad" hospitals when used to profile hospitals.39 Differences in coding and admission practices across hospitals may also bias hospital standardized mortality ratios, which may incorrectly attribute differences in outcomes between hospitals to underlying differences in quality of care.40 The methods by which RSMRs are risk adjusted have also been shown to produce substantially different results, even when applied to the same population.41-43 A recent review of conceptual and methodological challenges of hospital-wide mortality measures concluded that while mortality rates may provide useful information, they may also obscure or distort important signals of quality that are of interest to various stakeholders.36 Importantly, Hogan, et al. found that while mortality is a clinically relevant measure, few hospital deaths are preventable, which would limit its value as an endpoint for quality improvement initiatives aimed at improving hospital performance.44 A study of mortality following coronary artery bypass graft surgery showed that only one third of in-hospital deaths were deemed preventable.45 Nonetheless, while debate rages about the appropriateness of using hospital-wide mortality as a performance measure to compare hospitals, public and private payers are forging ahead and incorporating it into their P4P programs.

Controversy with 30-Day Ischemic Stroke Measures

The recent addition of the 30-day ischemic stroke RSMR and RSRR to the Hospital IQR and HVBP programs has been especially contentious.
Currently, these measures lack support from the National Quality Forum – a non-partisan organization which evaluates proposed performance measures – and the American Heart Association/American Stroke Association (AHA/ASA).46-48 The primary reason cited for opposing the RSMR and RSRR measures is that they are inadequately risk-adjusted due to the exclusion of a measure of stroke severity, such as the National Institutes of Health Stroke Scale (NIHSS).47-49 The NIHSS50 is a commonly used measure of stroke severity collected in stroke trials and registries51, which includes functional domains of level of consciousness, horizontal eye movement, visual field testing, facial palsy, arm motor function, leg motor function, limb ataxia, sensory perception, language impairment, speech impairment, and extinction/inattention.50 (Table 1.1) In a Presidential Advisory statement from the AHA/ASA, Fonarow et al. state that the "outcome measures as currently constructed may be prone to mischaracterizing the quality of stroke care being delivered by hospitals and may ultimately harm ischemic stroke patients."48

Table 1.1. Domains and scores/descriptions for the National Institutes of Health Stroke Scale; the final score ranges from 0-42.

1a. Level of Consciousness (alert, drowsy, etc.): 0 = Alert; 1 = Drowsy; 2 = Stuporous; 3 = Coma
1b. LOC Questions (month, age): 0 = Answers both correctly; 1 = Answers one correctly; 2 = Incorrect
1c. LOC Commands (open/close eyes, make fist, let go): 0 = Obeys both correctly; 1 = Obeys one correctly; 2 = Incorrect
2. Best Gaze (eyes open – patient follows examiner's finger or face): 0 = Normal; 1 = Partial gaze palsy; 2 = Forced deviation
3. Visual Fields (introduce visual stimulus/threat to patient's visual field quadrants): 0 = No visual loss; 1 = Partial hemianopia; 2 = Complete hemianopia; 3 = Bilateral hemianopia (blind)
4. Facial Paresis (show teeth, raise eyebrows and squeeze eyes shut): 0 = Normal; 1 = Minor; 2 = Partial; 3 = Complete
5a./5b. Motor Arm – Left/Right (elevate arm to 90° with patient supine): 0 = No drift; 1 = Drift; 2 = Can't resist gravity; 3 = No effort against gravity; 4 = No movement; X = Untestable (joint fusion or limb amputation)
6a./6b. Motor Leg – Left/Right (elevate leg to 30° with patient supine): 0 = No drift; 1 = Drift; 2 = Can't resist gravity; 3 = No effort against gravity; 4 = No movement; X = Untestable (joint fusion or limb amputation)
7. Limb Ataxia (finger-nose, heel down shin): 0 = No ataxia; 1 = Present in one limb; 2 = Present in two limbs
8. Sensory (pin prick to face, arm, trunk and leg – compare side to side): 0 = Normal; 1 = Partial loss; 2 = Severe loss
9. Best Language (name items, describe a picture, and read sentences): 0 = No aphasia; 1 = Mild to moderate aphasia; 2 = Severe aphasia; 3 = Mute
10. Dysarthria (evaluate speech clarity by repeating listed words): 0 = Normal articulation; 1 = Mild to moderate slurring of words; 2 = Near to unintelligible or worse; X = Intubated or other physical barrier
11. Extinction and Inattention (use information from prior testing to identify neglect, or double simultaneous stimuli testing): 0 = No neglect; 1 = Partial neglect; 2 = Complete neglect
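Because the total NIHSS is simply the sum of the item scores in Table 1.1, assembling it from abstracted item-level data is straightforward. The following is a minimal SAS sketch with hypothetical variable names (item1a-item11, not those of any particular registry); items scored "X" (untestable) would need to be handled per the scale's administration rules before summing:

/* Total NIHSS (0-42) as the sum of the 15 item scores */
data nihss_scored;
  set nihss_items;
  nihss_total = sum(of item1a item1b item1c item2 item3 item4
                       item5a item5b item6a item6b item7 item8
                       item9 item10 item11);
run;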
As is done with the current 30-day RSMRs for AMI, HF, and pneumonia, CMS administrative data are used to generate the RSMR and RSRR measures that determine hospital-level performance.23,24,35,52 Absent from CMS administrative claims data is a measure of stroke severity, such as the NIHSS. Studies have shown that measures of stroke severity, such as the NIHSS, significantly improve prediction of patient-level stroke outcomes and are widely believed to be essential for risk adjustment at the hospital-level.53-56 A systematic review of case-mix adjustment models for post-stroke mortality and functionality found that stroke severity is a commonly used and important variable in individual-level risk-adjustment.57 However, it is unclear whether stroke severity varies substantially enough across hospitals to make it a significant confounder. A study of Veterans Affairs (VA) hospitals showed that the addition of NIHSS into risk adjustment yielded minimal improvement in model fit, most likely due to little variation in NIHSS between VA hospitals.58 There are few data on the true variation in stroke severity across all US hospitals.

Due to its importance in individual-level prediction models, there is concern that excluding stroke severity from risk adjustment will lead to incorrect rankings of hospital performance, particularly for stroke referral centers that typically see a more severe spectrum of patients.26,48,59 In a similar situation, a study conducted by Friese et al. compared outcomes in surgical cancer patients: the severity of cancer varied significantly between hospitals, and when cancer severity was not included in the risk adjustment model, the resulting risk-adjusted hospital mortality rates were lower among hospitals with less severe patients than among hospitals treating patients with more advanced disease.60 Furthermore, studies of ICU performance have shown that referral centers – which frequently accept more severe patients – typically have higher RSMRs compared to referring centers.61,62 Therefore, it is reasonable to believe that not accounting for stroke severity in risk adjustment may similarly bias hospital ischemic stroke RSMRs, assuming that there is significant variation in stroke severity between hospitals.

Current Limitations to Including NIHSS in Risk Adjustment Models

To date, there has been conflicting evidence supporting the use of NIHSS in risk adjustment for hospital profiling. One study conducted by Fonarow, et al. found that among hospitals profiled into the top or bottom 20% according to their RSMRs, 26% were ranked differently once NIHSS was included in risk adjustment.53 However, this was in a dataset with >50% missing NIHSS data. As previously mentioned, data from VA hospitals showed little variation in NIHSS between hospitals, and hospital RSMRs calculated with and without NIHSS were nearly identical.58 It is as yet unclear whether there is sufficient variation in stroke severity between hospitals – a necessary condition for risk-adjustment variables26 – especially among hospitals that are assumed to treat a more severe set of patients, such as tertiary referral centers and certified primary stroke centers.63

A more practical limitation to including stroke severity in risk adjustment cannot be ignored. Unlike CMS administrative claims data, clinical registries often do collect measures of stroke severity. But, despite recent improvements in documentation, registries still struggle to achieve complete reporting of stroke severity.53,54,56 Given that hospital-specific measures are calculated from risk-adjustment models using only cases with complete data on risk adjustment variables, i.e. a complete case analysis, resulting measures can be significantly biased when a biased subset of patients is used.64,65
One study suggests that assessments of mortality using a complete case analysis of subjects with observed NIHSS may be subject to bias in hospitals with very low documentation of NIHSS.66 Unless complete reporting of NIHSS can be achieved through CMS administrative data, hospital-wide measures calculated from incomplete data may be biased.

Bias as a Result of Missing Data

The extent of bias from a complete case analysis of incomplete data depends on the mechanism by which data are missing. Missing data are typically classified as missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR).67 If data are MCAR, the missing data are unassociated with any exposure or outcome information. In other words, missing data are the result of a purely random incident, and the observed data are a random sample of the entire data. In theory, a complete case analysis under MCAR should result only in a loss of statistical efficiency (because of the smaller sample size), but not produce biased estimates.68 If data are MAR, missing data are associated with fully observed variables. For example, if stroke severity documentation is better in males compared to females (all observed), the data would be considered MAR. In addition to a loss in statistical efficiency, complete case analysis under MAR may result in biased estimates if the reason for missing data (gender) is not accounted for.68,69 Methods such as maximum likelihood estimation and multiple imputation can be employed to combat biased estimates and a loss in statistical efficiency when data are MCAR or MAR.68,69 The most problematic missing data scenario is when data are MNAR, which is to say that missingness is related to either unobserved characteristics or the value of the missing variable itself.68 For example, if stroke severity documentation were better in patients with more severe strokes compared to less severe strokes, the data would be MNAR. Again, bias and loss of statistical efficiency are attributed to MNAR data. However, the methods employed when data are MCAR or MAR cannot correct for all the bias resulting from MNAR data, because the missingness pattern cannot be estimated directly from values that are unobserved.68,69
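The distinction between the three mechanisms is easy to see in a small simulation. The following SAS sketch uses illustrative parameter values and variable names (not those used in our Aim 3 models) to generate a documentation indicator under each mechanism:

/* Illustrative simulation of MCAR, MAR, and MNAR documentation of NIHSS */
data mechanisms;
  call streaminit(2015);
  do i = 1 to 10000;
    male  = rand('bernoulli', 0.5);                /* fully observed covariate */
    nihss = min(42, floor(rand('exponential')*7)); /* right-skewed severity    */
    /* MCAR: documentation is unrelated to anything in the data */
    doc_mcar = rand('bernoulli', 0.75);
    /* MAR: documentation depends only on the observed covariate (gender) */
    doc_mar  = rand('bernoulli', 0.85*male + 0.65*(1-male));
    /* MNAR: documentation depends on the NIHSS value itself, with more
       severe strokes better documented */
    doc_mnar = rand('bernoulli', logistic(-0.5 + 0.15*nihss));
    output;
  end;
run;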
Missing data are common in clinical research.68,70 They are especially common in administrative datasets such as billing data, where certain variables may be completely unavailable, and in data from electronic health records, where variables are often incompletely documented.71 Research has shown that a complete case analysis when covariate data are missing can lead to biased estimates of patient-level outcomes.72-74 How a complete case analysis in the presence of missing data impacts hospital-level estimates is less obvious. One simulation study comparing hospital trauma-related mortality measures showed that a complete case analysis when risk-adjustment variable data are MNAR led to considerable changes in hospital-level mortality profiling.64 Using a complete case analysis to profile hospitals when missing data are present has also been shown to underestimate the proportion of poorly performing providers.75 Another simulation study examining the impact of missing data on profiling in P4P outcomes showed that between 11 and 21 percent of misclassified hospitals were attributable to missing data in risk adjustment.65

An analogous problem to excluding patients from hospital-level measures based on incomplete documentation is variation in administrative data coding. While there are already well-documented limitations to using administrative data in hospital profiling71,76-80, differences in coding of data can lead to differential exclusion of patients between hospitals. There are a number of examples that illustrate how variations in coding between hospitals impact hospital-level measures. An analysis of data from the United Kingdom showed that differential coding of comorbidities between hospitals may create biased case-mix adjustment and, in turn, biased hospital RSMRs.40 Another recent study demonstrated that excluding patients from pneumonia RSMR calculations due to variation in coding for pneumonia misclassified 28% of hospitals.81 Austin, et al. suggested that undercoding of significant comorbidities or severity indicators, which makes patients appear healthier than they actually are, can potentially misclassify hospitals.82 Using a "present-on-admission" indicator to distinguish between existing comorbidities and complications related to quality of care when risk-adjusting for patient health status showed that a quarter of hospital AMI mortality rankings were misclassified by 10% or more.83

In sum, there is a multitude of research showing that excluding patients from hospital-level measures, whether due to missing clinical data or administrative coding variation, can lead to inaccurate hospital profiling. However, it is unclear how different mechanisms of missing data, and the frequency at which missing data occur, impact the accuracy of hospital profiling. To our knowledge, this is the first study to assess how different mechanisms and frequencies of missing NIHSS data impact the accuracy of hospital profiling of stroke mortality measures.

Statement of Problem, Aims, and Outline

Currently, administrative data used to profile hospitals on CMS 30-day ischemic stroke RSMRs do not collect measures of stroke severity, such as NIHSS. When NIHSS is collected in clinical data, such as stroke registries, it is frequently missing, and little is known about what predicts NIHSS documentation. If NIHSS is to be included in risk-adjustment models, cases with missing NIHSS will be excluded from the calculation of the hospital 30-day RSMR for ischemic stroke. The resulting RSMR may be biased, depending on the mechanism and frequency of missing NIHSS data. Ultimately, biased RSMRs could lead to inaccurate hospital profiling, which may unfairly distribute financial incentives in P4P reimbursement models. The aims of this study are as follows:

1) To identify significant patterns or predictors of NIHSS documentation at the patient-level and hospital-level in an existing stroke registry.

2) To test for the presence and magnitude of selection bias in patients with documented NIHSS using the Heckman Selection Model.
3) To estimate the impact of the prevalence and mechanism of missing NIHSS data on the accuracy of hospital profiling of 30-day ischemic stroke RSMRs using computer simulation models.

The subsequent chapters of this dissertation are organized around answering questions for each of these aims.

What are the overall patterns or predictors of patient-level NIHSS documentation at the patient- and hospital-level? Chapter 2 will test the hypothesis that there are significant patient and hospital predictors of NIHSS documentation. Using data from the Michigan Stroke Registry, we will provide insight into patient or hospital characteristics that explain the documentation of NIHSS data in stroke patients. Analyses of NIHSS documentation to identify patterns and predictors will help identify the mechanism and pattern of missing NIHSS data.

Is the subset of patients with NIHSS documented a biased sample, and, if so, to what extent? Chapter 3 will assess the presence of selection bias in the documentation of NIHSS using the Heckman Selection Model. The Heckman Selection Model will be used as a diagnostic test for the presence of selection bias in patients with NIHSS documented, i.e., for whether patients with observed NIHSS data are systematically different from patients with unobserved NIHSS. While the previous aim helps to identify significant patterns and predictors of NIHSS documentation, this aim will provide statistical evidence for selection bias in NIHSS documentation based on patient stroke severity. The Heckman model also indicates the magnitude and direction of selection bias in patients with undocumented NIHSS data. Furthermore, if there is significant selection bias, it would suggest that missing NIHSS data are MNAR, or non-ignorable. (A minimal sketch of this model in SAS appears at the end of this chapter.) Jointly, the first and second aims will provide a clearer picture of the mechanism and pattern of NIHSS documentation, which will motivate the use of different missing data mechanisms in the computer simulations used in Aim 3.

How does the presence of missing data impact the accuracy of hospital performance profiling? What role do the prevalence and mechanism of missing NIHSS data play in the accuracy of hospital profiling? How does hospital case volume modify this relationship? Chapter 4 will assess the hypothesis that the accuracy of hospital profiling is degraded in datasets with missing NIHSS compared to fully documented data. This aim will illustrate how sensitive hospital profiling is when RSMRs are calculated in the face of missing data. Furthermore, it will illustrate which mechanisms and patterns of NIHSS documentation result in the most inaccurate hospital rankings at various frequencies of NIHSS documentation. Finally, we will assess how missing data impact profiling at different hospital ischemic stroke case volumes. These analyses will demonstrate the accuracy of hospital profiling based on risk-adjusted mortality models when an important risk adjustment variable is frequently undocumented. Furthermore, they will illustrate what role the mechanism of missing data plays in profiling accuracy. Finally, they will illustrate the importance of hospital volume as a vital modifier of the relationship between missing data and profiling accuracy. As Voltaire is famously quoted, "It is better to risk saving a guilty person than to condemn an innocent one." We seek to quantify just how many guilty hospitals are saved, and more importantly, how many innocent hospitals will be condemned.
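As a preview of the Aim 2 analysis referenced above, the following is a minimal SAS sketch of a Heckman selection model using PROC QLIM, with hypothetical variable names (the actual covariate specifications appear in Chapter 3). The selection equation models whether NIHSS was documented, and the outcome equation models the NIHSS score among documented cases; a significantly nonzero error correlation (reported by QLIM as _Rho) would indicate selection bias, i.e. non-ignorable missingness:

/* Sketch of a Heckman selection model; variable names are illustrative */
proc qlim data=msr;
  model documented = age female tpa arrival_ems / discrete;  /* selection */
  model nihss = age female atrial_fib prior_stroke           /* outcome   */
        / select(documented = 1);
run;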
CHAPTER 2: PATTERNS AND PREDICTORS OF NIHSS DOCUMENTATION

Aim 1 – Background

The National Institutes of Health Stroke Scale (NIHSS)50 is a commonly used measure of stroke severity collected in stroke trials and registries.51 NIHSS has been shown to be one of the strongest predictors of outcomes in ischemic stroke patients.54,56,84 Despite its clinical importance, complete documentation of NIHSS in clinical registries has yet to be achieved. While NIHSS documentation has improved recently, documentation was below 50% in the first 5 years of the Get With the Guidelines (GWTG) – Stroke national registry.54 Furthermore, measures of stroke severity are currently absent from administrative data.

The Centers for Medicare & Medicaid Services (CMS) will soon be adding 30-day measures of hospital-level ischemic stroke mortality and readmissions to its pay-for-performance incentive programs.5,6 Because of its importance as a clinical prognostic variable at the patient-level, it is believed that risk adjustment models used to calculate hospital-level performance metrics must include a measure of stroke severity,48,59 although evidence to support this claim has been mixed.53,58 Given that complete documentation of NIHSS has not been achieved, excluding patients with undocumented NIHSS from risk adjustment models may impact the validity of hospital-level performance measures. Furthermore, any bias in hospital-level measures may be aggravated if NIHSS data are missing not at random (MNAR).67,68

Because variation in NIHSS documentation has the potential to bias hospital-level ischemic stroke performance measures, the purpose of this study is to describe trends in NIHSS documentation in an existing multi-center clinical stroke registry, and to identify any significant patient- and hospital-level factors associated with NIHSS documentation. We will also attempt to determine the extent of bias in NIHSS scores by describing the relationship between NIHSS documentation and NIHSS scores at the hospital-level. In essence, we will try to determine the mechanism by which NIHSS is missing. We hypothesize that patients with documented NIHSS are not simply a random sample of all patients, and that missing NIHSS data may be MNAR.

Aim 1 – Methods

Data and Participants

The Michigan Stroke Registry (MSR) is a statewide clinical registry which originated as a prototype for the Paul Coverdell National Acute Stroke Registry, and has been described elsewhere.85 Currently, the MSR is used to provide a data-driven approach to improving the quality of stroke care in the State of Michigan.86,87 The MSR collects information on many different patient-level characteristics, including demographics, emergency medical services (EMS) and hospital admission information, and clinical information such as stroke severity, ambulatory status, and medical history. In addition, we obtained hospital characteristics from the American Hospital Association annual survey88 and the Paul Coverdell National Acute Stroke Registry hospital inventory. We used MSR data from 2009 to 2012 for this analysis. To increase the generalizability of our findings to a CMS ischemic stroke population, we applied a number of exclusions to the MSR data.
Ischemic stroke patients were included if they were aged 65 years or older, and excluded if they belonged to a hospital with <25 annual cases of ischemic stroke, which is the minimum number of cases for a hospital risk-standardized mortality rate (RSMR) to be calculated, as defined by CMS.18 We also excluded patients if the stroke occurred in a hospital inpatient setting. As this study was a secondary analysis of de-identified registry data, it was considered exempt from Institutional Review Board review. All analyses were conducted with the use of SAS version 9.3 (SAS Institute Inc, Cary, NC).

Predictor Variables

We examined a number of patient-level predictors of NIHSS documentation. Demographic characteristics included: age, gender (male vs. female), race (white, black, other, not documented), and insurance status (Medicare, Medicaid, private, no insurance). We also assessed emergency medical services (EMS) and hospital admission information, such as: place stroke occurred (at home vs. in a healthcare setting), arrival mode (EMS, private transportation, transferred), arrival to the ER (yes vs. no), symptoms resolved prior to arrival (yes vs. no), and tPA administration (yes vs. no). Finally, we also examined several clinical variables in this analysis, including: ability to ambulate pre-stroke, diabetes mellitus, congestive heart failure, peripheral artery disease, hypertension, current smoker, and history of prior stroke, transient ischemic attack/vertebrobasilar insufficiency (TIA/VBI), or myocardial infarction/coronary artery disease (MI/CAD). Hospital characteristics included bed size, annual stroke volume (<200, 200-600, 600+), urban vs. rural location, teaching status, presence of an acute stroke team, and Joint Commission primary stroke center status.63

Outcome Variable

The NIHSS is a composite measure of eleven symptom domains, including level of consciousness, horizontal eye movement, visual field testing, facial palsy, arm motor function, leg motor function, limb ataxia, sensory perception, language impairment, speech impairment, and extinction/inattention.50 (Table 1.1) The resulting score is an integer which ranges from 0 to 42, with 0 representing no stroke symptoms and 42 representing the most severe form of stroke. In patient-level analyses, we used a binary NIHSS documentation indicator (yes vs. no) as the outcome variable. In hospital-level analyses, we used the patient- and hospital-level average NIHSS score as the outcome variable. Hospital-level NIHSS documentation rates were calculated and categorized by tertiles of NIHSS documentation (<70%, 70-85%, and ≥85%) to represent low, moderate, and high documenting hospitals.

Statistical Analysis

First, a patient-level descriptive analysis of the data was conducted, which assessed the distribution of demographic, EMS and hospital admission, and clinical variables, as well as patient-level hospital characteristics in the sample, stratified by NIHSS documentation (yes vs. no). To identify patient-level factors associated with documentation, bivariate associations were assessed using chi-square tests and ANOVA for categorical and continuous variables, respectively. We also assessed differences in hospital characteristics, mean NIHSS score, mortality rates, and average length of stay (in days) between tertiles of NIHSS documentation rate. Fisher's Exact Test and ANOVA were used to test for any significant differences between tertiles for categorical and continuous variables, respectively.
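As an illustration of these descriptive comparisons, a minimal SAS sketch (with hypothetical dataset and variable names) might look like the following:

/* Bivariate tests of association with NIHSS documentation status:
   chi-square for categorical predictors, ANOVA for continuous ones */
proc freq data=msr;
  tables (female race insurance tpa)*nihss_doc / chisq;
run;

proc glm data=msr;  /* one-way ANOVA, e.g. for age */
  class nihss_doc;
  model age = nihss_doc;
run;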
We then tested for significant changes in hospital-level documentation rates over time with ANOVA, and illustrated them using box plots for each year (2009-2012).

Significant patient- and hospital-level predictors of NIHSS documentation at the patient-level were assessed using unadjusted and adjusted hierarchical logistic regression models, which accounted for clustering of data within hospitals. The modeling procedure was motivated by the multilevel modeling approach of Singer.89 First, a model with a hospital random intercept and no fixed effects was run to assess the within-hospital variation in NIHSS documentation. The hospital-level variance (σ_j) was used to calculate the intraclass correlation coefficient (ICC) in the model using the equation ICC = σ_j / (σ_j + π²/3). Then, we specified a fully saturated model, which included all patient- and hospital-level variables with p<0.20 in the previous bivariate analysis as fixed effects, as well as a hospital random intercept. Using a backward selection approach with stepwise deletion, we eliminated all non-significant (p>0.05) variables from the model. The final model contained significant patient and hospital fixed effects, and a hospital random intercept. We tested for the statistical significance of σ_j using a log-likelihood test. A priori hospital-level fixed-effect terms, including primary stroke center status and stroke volume, were retained in the final model regardless of their statistical significance.

To determine if NIHSS documentation is related to the NIHSS score, i.e. whether undocumented NIHSS is MNAR, we performed two analyses. First, Pearson and Spearman correlation coefficients were calculated to assess relationships between hospital-level NIHSS documentation and hospital-level NIHSS score, for each year (2009-2012). A significant correlation indicates that the level of documentation is associated with the observed NIHSS score, suggesting data may be MNAR. Second, we tested for significant differences in the patient-level distribution of NIHSS scores stratified by tertile of hospital-level NIHSS documentation rate using ANOVA and a Kruskal-Wallis test. Differences in patient-level NIHSS score distributions by tertile of hospital documentation rate were illustrated by overlaying smoothed frequency distributions (kernel density curves) for patients within each tertile. A shift in the distribution at lower levels of hospital-level documentation may also suggest data are MNAR.
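A minimal sketch of the null (random-intercept-only) model and the ICC calculation in SAS, with hypothetical dataset and variable names, follows; PROC GLIMMIX is one natural choice, though not necessarily the exact procedure used:

/* Null hierarchical logistic model for NIHSS documentation with a
   hospital random intercept */
proc glimmix data=msr method=laplace;
  class hospital_id;
  model nihss_doc(event='1') = / dist=binary link=logit solution;
  random intercept / subject=hospital_id;
  covtest 'hospital variance = 0' zerog;  /* likelihood-based test of sigma_j */
run;
/* The ICC is then computed by hand from the estimated covariance parameter:
   ICC = sigma_j / (sigma_j + (constant('pi')**2)/3)                          */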
43.2%) compared to those with NIHSS undocumented. Patients who had NIHSS documented also tended to be at home at the time of onset (90.6% vs. 86.5%, p<0.0001) and were more likely to arrive to the ER (89.1% vs. 85.9%, p<0.0001). (Table 2.1) There was a marked and significantly lower percent of patients whose symptoms had resolved by hospital arrival in those with NIHSS documented compared to undocumented (3.6% vs. 15.6%, p<0.0001). Another striking difference was in tPA administration rates between those with and without NIHSS documented (9.3% vs. 1.0%, p<0.0001). (Table 2.1) In regard to patient medical history, patients with NIHSS documented were slightly more likely to be ambulatory pre-stroke (96.0% vs. 92.6%, p<0.0001), had higher rates of atrial fibrillation (23.7% vs. 19.9%, p<0.0001) and dyslipidemia (47.3% vs. 40.4%, p<0.0001), and a lower rate of prior stroke (26.8% vs. 30.5%, p=0.0002). (Table 2.1) There were no significant differences in NIHSS documentation by age, gender, mode of arrival, history of diabetes mellitus, prior TIA/VBI, MI/CAD, congestive heart failure, peripheral artery disease, hypertension, or smoking status. There were a number of patient-level hospital characteristics which were associated with NIHSS documentation. (Table 2.1) Patients with NIHSS documented tended to be treated at hospitals with slightly fewer beds (p<0.0001) and fewer stroke discharges (p<0.0001). (Table 2.1) Modest differences in the proportion of patients with NIHSS documented were observed between rural and urban hospitals, teaching and non-teaching hospitals, hospitals with and without an acute stroke team, and hospitals with and without Joint Commission primary stroke center certification. (Table 2.1) Table 2.1. Patient demographics, EMS and admission information, medical history and discharge status in Ischemic Stroke patients 65 years of age or older, in the overall sample (n=10,717) and stratified by NIHSS Documentation status. (2009-2012) Variable Overall Sample Demographics Age, mean (SD) Female Race White Black Other Not Documented Insurance status Medicare Medicaid Private None EMS and Admission Place stroke occurred NIHSS Documentation Status Documented Undocumented n (%) n (%) 7,957 (74.3) 2,760 (25.8) p-value 78.6 (8.2) 4,345 (54.6) 5,897 (74.1) 1,295 (16.3) 93 (1.2) 672 (8.5) 3,884 (48.9) 378 (4.8) 3,619 (45.5) 61 (0.8) 78.8 (8.4) 1,512 (54.8) 1,884 (68.3) 646 (23.4) 23 (0.8) 207 (7.5) 1,364 (49.6) 181 (6.6) 1,189 (43.2) 17 (0.6) 0.2975 0.8773 <0.0001 - - <0.0001 0.0010 Table 2.1. (cont'd) Patient demographics, EMS and admission information, medical history and discharge status in Ischemic Stroke patients 65 years of age or older, in the overall sample (n=10,717) and stratified by NIHSS Documentation status.
(2009-2012) Variable At home In a healthcare setting Arrival Mode EMS Private Transfer Arrived in the ER Symptoms resolved tPA Administration Medical History Ambulatory Pre-Stroke Atrial Fibrillation Diabetes Mellitus Prior Stroke Prior TIA/VBI MI or CAD CHF Peripheral Artery Disease Dyslipidemia Hypertension Smoking Hospital Characteristics Bed size, Mean (SD) Acute stroke discharges <200 200-600 600+ Mean (SD) Rural Hospital Teaching Hospital Acute stroke team Joint Commission Primary Stroke Center NIHSS Documentation Status Documented Undocumented 7,208 (90.6) 2,386 (86.5) 755 (9.4) 375 (13.4) 3,892 (49.9) 1,310 (48.1) 2,673 (34.3) 996 (36.7) 1,230 (15.8) 411 (15.1) 7,092 (89.1) 2,372 (85.9) 279 (3.6) 414 (15.6) 743 (9.3) 27 (1.0) <0.0001 <0.0001 <0.0001 7,957 (96.0) 1,882 (23.7) 2,612 (32.8) 2,130 (26.8) 904 (11.4) 2,704 (34.0) 1,050 (13.2) 504 (6.3) 3,766 (47.3) 6,442 (81.1) 882 (11.1) 2,368 (92.6) 548 (19.9) 947 (34.3) 842 (30.5) 306 (11.0) 929 (33.7) 395 (14.3) 180 (6.5) 1,114 (40.4) 2,244 (81.3) 342 (12.4) <0.0001 <0.0001 0.1534 0.0002 0.6198 0.7572 0.1392 0.7281 <0.0001 0.6910 0.0629 505.1 (245.1) 758 (9.5) 3,339 (42.0) 3,860 (48.5) 569.3 (308.3) 1,223 (15.9) 7,347 (92.3) 6,713 (84.4) 578.6 (274.3) 211 (7.6) 801 (29.0) 1,748 (63.3) 671.9 (373.7) 290 (11.5) 2,605 (94.8) 2,426 (87.9) <0.0001 <0.0001 6,257 (78.6) 2,092 (75.8) 0.0020 p-value 0.0829 <0.0001 <0.0001 0.0003 <0.0001 Note: Categories with small n were excluded from the table, so cells may not add up to 100%. Hospital-level characteristics stratified by tertile of hospital NIHSS documentation rate (<70%, 70-85%, ≥85%) can be seen in Table 2.2. Median documentation rates in low, moderate, and high documenting hospitals were 52.7%, 74.8%, and 89.1%, respectively. (Table 2.2) Mean NIHSS scores were significantly different (p=0.0122) between low (mean=8.8), moderate (mean=7.1), and high (mean=6.7) documenting hospitals. (Table 2.2) There were no statistically significant differences in any of the hospital characteristics, tPA administration rates, mortality rates, or average length of stay across tertiles of hospital documentation. (Table 2.2) Although non-significant, low and moderate documenting hospitals had greater annual stroke volumes compared to high documenting hospitals (p=0.0703). Table 2.2. Michigan Stroke Registry hospital-level characteristics in the sample of 23 hospitals, stratified by tertile of hospital NIHSS documentation rate.
Variable | 1st Tertile: <70% (n=7) | 2nd Tertile: 70-85% (n=8) | 3rd Tertile: ≥85% (n=8) | p-value
Num. of Patients, n (%) | 2,663 (24.8) | 4,749 (44.3) | 3,305 (30.8) | -
Characteristics, n (%)
Bed Size* | 443 (88-675) | 407 (311-546) | 390 (211-407.5) | 0.5092
Annual Stroke Discharges | | | | 0.0703
  <200 | 2 (28.6) | 1 (12.5) | 2 (25.0) |
  200-600 | 2 (28.6) | 2 (25.0) | 6 (75.0) |
  600+ | 3 (42.9) | 5 (62.5) | 0 (0.0) |
Rural Hospital | 2 (28.6) | 1 (12.5) | 2 (25.0) | 0.8369
Teaching Hospital | 5 (71.4) | 8 (100.0) | 7 (87.5) | 0.2727
Acute Stroke Team | 6 (85.7) | 6 (75.0) | 7 (87.5) | 1.0000
Primary Stroke Center | 4 (57.1) | 7 (87.5) | 5 (62.5) | 0.5299
tPA Administration Rate (%)† | 8.4 (4.9) | 6.5 (3.6) | 6.3 (3.6) | 0.5460
Mortality Rate† | 6.2 (3.9) | 4.6 (1.3) | 4.4 (1.4) | 0.3029
Avg. Length of Stay (in days)† | 5.9 (1.7) | 4.8 (0.8) | 4.5 (0.8) | 0.1854
Hospital-Level NIHSS
NIHSS Score† | 8.8 (1.6) | 7.1 (1.2) | 6.7 (1.1) | 0.0122
NIHSS Score* | 8.7 (7.0-9.7) | 7.6 (5.8-8.0) | 6.9 (5.8-7.4) | -
NIHSS Documentation Rate† | 52.7 (14.5) | 74.8 (2.7) | 89.1 (2.7) | <0.0001
* Median (IQR), † Mean (SD)
As illustrated in Figure 2.1, hospital-level NIHSS documentation rates have significantly improved over time (p=0.0072), from a median of 66.8% (IQR: 52.4-76.3%) in 2009 to 86.8% (IQR: 71.7-92.8%) in 2012. Figure 2.1. Hospital-level NIHSS documentation rates over time. Table 2.3 presents the results of the final hierarchical logistic regression model fitted to predict patient-level NIHSS documentation (yes vs. no) based on patient- and hospital-level characteristics. After adjustment, patient-level predictors of NIHSS documentation included the stroke occurring at home (OR=1.22; 95% CI: 1.01, 1.48), mode of arrival (hospital transfer vs. private transport OR=1.29; 95% CI: 1.05, 1.58), ER presentation (OR=1.69; 95% CI: 1.36, 2.11), and if the patient was administered tPA (OR=11.46; 95% CI: 7.31, 17.99). NIHSS documentation was also predicted by pre-stroke ambulatory status (OR=1.75; 95% CI: 1.37, 2.23) and a medical history of atrial fibrillation (OR=1.17; 95% CI: 1.02, 1.34) and dyslipidemia (OR=1.15; 95% CI: 1.02, 1.28). Patients with a prior stroke (OR=0.86; 95% CI: 0.76, 0.97) and patients whose symptoms resolved by ER arrival (OR=0.13; 95% CI: 0.11, 0.16) were also at significantly reduced odds of NIHSS documentation. Although non-significant, large hospitals had reduced odds of documentation, and Joint Commission primary stroke centers had increased odds of documentation. We estimated a statistically significant hospital-level variance of $\sigma_j = 1.0930$ (p<0.0001), and calculated $ICC = 1.0930/(1.0930 + \pi^2/3) = 0.249$, or 24.9%, which suggests that roughly a quarter of the unexplained variation in NIHSS documentation can be attributed to the hospital level. Table 2.3. Unadjusted and adjusted odds ratios (and 95% CIs) for patient and hospital characteristics predicting NIHSS documentation (yes vs. no) and estimated hospital-level variation and intraclass correlation (n=10,717).
Variable Place Stroke Occurred Home Healthcare Setting Arrival Mode EMS Transfer Private Received in ER Symptoms Resolved tPA Administered Ambulatory Pre-Stroke History of Atrial Fibrillation History of Dyslipidemia History of Prior Stroke Year 2012 2011 2010 2009 Primary Stroke Center Stroke Discharges 600+ 200-600 <200 Estimated hospital-level variance, $\sigma_j = 1.0930$ Unadjusted OR (95% CI) p-value Adjusted OR (95% CI) 1.51 (1.32, 1.72) Ref 1.11 (1.01, 1.22) 1.12 (0.98, 1.27) Ref 1.34 (1.18, 1.53) 0.20 (0.17, 0.23) 10.43 (7.08, 15.34) 1.88 (1.56, 2.27) 1.25 (1.12, 1.39) 1.33 (1.22, 1.45) 0.83 (0.76, 0.92) 2.70 (2.38, 3.09) 1.83 (1.63, 2.06) 1.49 (1.33, 1.68) Ref 1.18 (1.06, 1.32) <0.0001 0.0830 0.0379 0.1091 <0.0001 <0.0001 <0.0001 <0.0001 <0.0001 <0.0001 0.0002 <0.0001 <0.0001 <0.0001 <0.0001 0.0022 1.22 (1.01, 1.48) Ref 1.08 (0.95, 1.21) 1.29 (1.05, 1.59) Ref 1.69 (1.36, 2.11) 0.13 (0.11, 0.16) 11.46 (7.31, 17.99) 1.75 (1.37, 2.23) 1.17 (1.02, 1.34) 1.15 (1.02, 1.28) 0.86 (0.76, 0.97) 3.03 (2.59, 3.55) 1.88 (1.63, 2.17) 1.39 (1.21, 1.61) Ref 1.86 (0.65, 5.30) 0.62 (0.52, 0.72) 1.16 (0.98, 1.38) Ref <0.0001 0.0881 - 1.09 (0.37, 1.81) <0.0001 ICC = Intraclass correlation p-value 0.0377 0.0520 0.2302 0.0179 <0.0001 <0.0001 <0.0001 <0.0001 0.0296 0.0198 0.0151 <0.0001 <0.0001 <0.0001 <0.0001 0.2320 0.4219 0.53 (0.15, 1.95) 0.3224 1.03 (0.30, 3.47) 0.9660 Ref $ICC = 1.0930/(1.0930 + \pi^2/3) = 24.9\%$ Figure 2.2 plots the hospital-level documentation rate against the mean hospital-level NIHSS score for all hospitals (n=23) in each year (2009-2012). The significant, negative Pearson (r = -0.44, p<0.0001) and Spearman (r = -0.39, p<0.0001) correlation coefficients indicate moderate correlation between hospital-level NIHSS documentation and NIHSS score. This suggests that at the hospital level, mean observed NIHSS scores were higher amongst hospitals with lower documentation of NIHSS. Figure 2.2. Scatter plot of aggregated mean hospital NIHSS score vs. hospital NIHSS documentation rate with fitted regression line (95% CI) in each year (2009-2012). Figure 2.3 overlays the patient-level distribution of NIHSS scores stratified by the tertile of hospital documentation (<70%, 70-85%, ≥85%). Both ANOVA (F=14.4, df=2, p<0.0001) and Kruskal-Wallis tests (chi-square=64.5, df=2, p<0.0001) found statistically significant differences in NIHSS score distributions between tertiles of hospital-level documentation rate. The kernel density curves confirm these findings, with lower levels of hospital-level NIHSS documentation resulting in slightly higher reported NIHSS scores (i.e. a "shift to the right"). Both of these findings suggest that missing NIHSS data may be MNAR. Figure 2.3. Kernel density curves for patient distribution of NIHSS score, stratified by tertile of hospital NIHSS documentation rate (<70%, 70-85%, ≥85%), with ANOVA and Kruskal-Wallis (KW) test results. Aim 1 – Discussion The purpose of this study was to investigate patient- and hospital-level patterns and predictors of NIHSS documentation. Our study confirmed our hypothesis that patients with documented NIHSS are not simply a random sample of all stroke patients, and suggests that missing NIHSS data may be MNAR. Our data also suggest that documentation of NIHSS is a reflection of both patient-level factors, including stroke severity, and the overall hospital-level documentation at the facility in which a patient is treated.
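For reference, the final random-intercept model described in the Methods might be specified in SAS roughly as follows; the dataset and variable names (msr, nihss_doc, hospital_id, and the listed fixed effects) are illustrative placeholders, and METHOD=LAPLACE is one reasonable choice for a true-likelihood fit.

   proc glimmix data=msr method=laplace;
      class hospital_id arrival_mode year stroke_volume;
      model nihss_doc(event='1') = home_onset arrival_mode er_arrival
            symptoms_resolved tpa ambulatory afib dyslipidemia prior_stroke
            year psc_status stroke_volume
            / dist=binary link=logit solution oddsratio;
      random intercept / subject=hospital_id;   /* hospital random intercept */
      covtest zerog;   /* likelihood-based test of the hospital-level variance */
   run;

The ICC reported in the Results then follows from the estimated hospital-level variance as $ICC = \sigma_j/(\sigma_j + \pi^2/3)$.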
From the hierarchical logistic regression model, we found that patients whose symptoms had resolved by arrival had roughly one-tenth the odds of documentation compared to patients still experiencing stroke symptoms. If it is assumed that resolution of symptoms is accurately recorded at ER arrival, then such patients could be imputed to NIHSS=0. In patients with observed NIHSS, the median (IQR) NIHSS for patients whose symptoms had resolved was, as expected, 0 (0-2). However, 32% of these patients had an NIHSS greater than 0. Thus, imputation may be a feasible solution for improving the overall documentation of NIHSS, given that 16% of undocumented cases had symptoms resolve by arrival. In a previous study, we found that documentation of NIHSS reflected patients who were candidates for tPA.66 In this study, we also found that tPA administration was much higher in patients with documented NIHSS than those with undocumented NIHSS (9.3% vs. 1.0%). This may also explain why patients who were received in the ER had greater odds of documentation, as they are typically initial candidates to receive tPA. Similarly, patients who were transferred had higher odds of documentation compared to patients who arrived by EMS or private transportation, as they most likely represent more severe patients. Any effect of EMS was most likely accounted for by the variable for arrival to the ER. Additionally, we found that patients with atrial fibrillation and dyslipidemia had higher rates of documentation. These factors may also be proxies for more severe strokes, as atrial fibrillation90 and dyslipidemia91 are significant predictors of stroke severity. No hospital-level characteristics significantly predicted documentation, although it appears that documentation is higher in hospitals with primary stroke center certification from the Joint Commission. Low power due to a small number of hospitals (n=23) with little between-hospital variability in hospital characteristics may explain this finding. We also found that a large, statistically significant proportion of the variation in NIHSS documentation in our model ($\sigma_j = 1.09$, p < 0.0001; ICC=25%) can be attributed to the hospital level, suggesting that patient-level NIHSS documentation is also a reflection of overall hospital-level NIHSS documentation. Analyses of hospital-level documentation and NIHSS scores also confirmed our hypothesis. At the hospital level, increased documentation of NIHSS was moderately correlated with lower mean NIHSS scores (Pearson correlation, r = -0.44; Spearman correlation, r = -0.39). This suggests that hospitals with lower documentation of NIHSS may be underreporting less severe strokes. This was also reflected in our kernel density curves, which showed a "right shift" in the patient distribution of NIHSS in lower documenting hospitals. Our data also show that hospital-level NIHSS documentation greatly improved from 2009 to 2012, which is a promising trend if NIHSS is to be used in risk adjustment models. If missing data were not associated with any characteristics, i.e., a truly random sample of patients, it would be considered missing completely at random (MCAR), and estimates from a complete case analysis would be less subject to bias.67,68 Since we identified significant predictors of NIHSS documentation, we can eliminate the possibility that missing NIHSS data is MCAR.
The other possibility is that data are missing at random (MAR), which is to say missingness is related to some observed variable, but not to the value of the missing data itself.67,68 In cases of either MAR or MNAR, estimates from a complete case analysis may be biased; however, statistical methods such as multiple imputation or maximum likelihood estimation are often used to correct bias from data that are MAR.68,92-95 It should be noted that no statistical method can distinguish between MAR and MNAR mechanisms of missing data. Furthermore, it is possible that missing NIHSS data arise from a combination of MAR and MNAR mechanisms. However, given that our hierarchical model suggests that characteristics of severe stroke patients are associated with documentation, and that increased hospital-level documentation is associated with a shift towards less severe patients, we suspect that NIHSS may be MNAR. Accurate risk adjustment in hospital profiling requires that variables used in the risk adjustment model be of sufficiently high quality.22 There is already concern that, due to poor documentation, the quality of NIHSS data may not be adequate for risk adjustment models.26 With the recent announcement that NIHSS is to be included in ICD-10 coding, there will be substantial pressure to include stroke severity in risk adjustment models using administrative data. Based on our evidence, it should be recognized that any hospital-level performance measure that includes NIHSS in risk adjustment is potentially biased if missing NIHSS data are present. Further research is needed to assess the extent of bias in hospital-level mortality measures when cases with undocumented NIHSS are excluded from risk adjustment models and profiling, especially if NIHSS data are MNAR. There are limitations in this study that should be considered. The sample of hospitals used in this study is a subsample of all Michigan hospitals, which may not be representative of all Michigan hospitals or hospitals nationwide. A greater proportion of MSR patients go to teaching hospitals (93% vs. 61%) and Joint Commission primary stroke center hospitals (78% vs. 65%) compared to patients in the GWTG-Stroke nationwide registry.96 Thus, patients in the MSR may be more similar to each other compared to what may be seen in the GWTG-Stroke registry or administrative claims data. Replication of this analysis in a larger sample of hospitals may provide more generalizable results, and provide better estimates of the hospital characteristics related to NIHSS documentation. Furthermore, it would be advantageous to repeat this analysis in the future, given the improving documentation of NIHSS over time. Finally, this analysis was conducted in a stroke registry setting, which has clearly defined data abstraction procedures. Further research should be done to assess the completeness and validity of NIHSS in administrative data. In summary, despite recent improvements in documentation of NIHSS, our evidence suggests that patients with documented NIHSS are a biased subsample of all ischemic stroke patients. Documentation of NIHSS is associated with more severe stroke patients, and is also a reflection of overall hospital-level documentation of NIHSS. Given that NIHSS is a strong predictor of patient outcomes, further study should be done to assess the degree of bias in hospital profiling when a subsample of patients is used to calculate hospital performance measures.
Unless complete documentation of NIHSS is achieved, this limitation should be considered when using NIHSS in risk adjustment models. CHAPTER 3: ASSESSING SELECTION BIAS IN PATIENTS WITH DOCUMENTED NIHSS USING THE HECKMAN SELECTION MODEL Aim 2 - Background The missing data problem is common in clinical research.68,70 Excluding observations with missing data from statistical models, i.e. performing a complete case analysis, has been shown to bias model estimates.72-74 Missing data are especially pervasive in administrative datasets such as billing data or electronic health records, where variables are frequently undocumented.71 Measures of stroke severity, such as the National Institutes of Health Stroke Scale (NIHSS), are strong predictors of patient outcomes.54,56,84 Currently, NIHSS is collected solely in clinical registries, where it is frequently underreported, and is absent from administrative data.54 However, it was recently announced that NIHSS is to be included in ICD-10 coding, with the intent to include NIHSS in stroke performance measures using administrative data. But if NIHSS is underreported in administrative data, then excluding patients without documented NIHSS may introduce bias into models of hospital performance if those patients are a biased subsample, i.e. if NIHSS data are missing not at random.67,68 Using the Heckman Selection Model, we will test for the presence of selection bias in patients with documented NIHSS. The Heckman Selection Model (hereinafter referred to as the Heckman model) was pioneered by James J. Heckman in 1979 to identify and correct for bias in study estimates resulting from a non-randomly selected sample.97 He illustrated that when estimating wages of women in the workforce, the population of women excluded housewives, who had self-selected out of the workforce. Thus, the distribution of wages was truncated because it excluded a group of women for whom wages were not sufficient for them to enter the workforce. Previously, other methods – such as identifying patterns and predictors of documentation – were used to provide evidence of selection bias, but ultimately, investigator intuition was used to identify potential selection bias. The Heckman model offered a method to estimate the magnitude of selection bias in the sample and, importantly, could then be used to adjust outcomes for the potential bias. While the Heckman model is commonly used in economics and the social sciences, it has been used sparingly in the biomedical sciences or health services research. Examples of its use to assess and control for survey nonresponse bias include assessments of medication use98, estimates of HIV prevalence99, and self-reported quality of life.100 The Heckman model consists of a two-equation model with a model predicting the outcome of interest – the outcome model – and a model predicting whether the outcome was observed or not – the selection model. The outcome model is a linear regression model with a normally distributed, continuous dependent variable and a set of independent predictors ($x_i$). The selection model is a probit model with a binary dependent indicator ($R_i = 1$ if the outcome is observed, $R_i = 0$ if unobserved) and a set of independent predictors, which typically include the predictors from the outcome model ($x_i$), as well as additional predictors of NIHSS documentation ($\omega_i$).
As opposed to the logistic model typically used for binary outcomes, the probit model is necessary because the Heckman model requires the two equations to have jointly normally distributed error terms. The overall model can be seen below.

Outcome model: $NIHSS^* = x_i\beta + \varepsilon_i$, where $NIHSS^*$ is the true score and $NIHSS = NIHSS^*$ when observed.

Selection model: $R_i^* = x_i\beta + \omega_i\gamma + u_i$, where $R_i = 1$ if $R_i^* > 0$ and $R_i = 0$ if $R_i^* \leq 0$.

Both selection and outcome models have error terms with mean zero: $\varepsilon_i \sim N(0, \sigma^2)$ and $u_i \sim N(0, 1)$. When the available data are a random sample of all data, i.e. no selection bias, the error terms are assumed to be independent, and the correlation between error terms is thus $\rho = corr(u_i, \varepsilon_i) = 0$. However, in the presence of selection bias, the available data are determined by a sample selection process, which means that the outcome model is dependent on the selection process; this is reflected through correlation between the error terms, i.e., $\rho = corr(u_i, \varepsilon_i) \neq 0$. Typically, the next step in the Heckman model is to obtain a correction factor termed the inverse Mills ratio using the error term correlation, and include the correction factor in the outcome model to adjust for selection bias. However, previous work has shown that using the correction factor may worsen, rather than improve, estimates, especially when significant selection bias has been found or if the selection model is incorrectly specified.101,102 For the purposes of this analysis, we are simply interested in the Heckman model's utility as a diagnostic test for selection bias, rather than using this estimate to adjust for selection bias. It is suggested that an a priori understanding of the possible drivers of selection bias, and of the direction and magnitude in which selection bias may occur, improves the validity of the method.101,103 In Chapter 2, we illustrated that NIHSS documentation may be greater in more severe stroke cases as compared to less severe strokes, but the differences observed were modest. The purpose of this aim is to provide further evidence that NIHSS data are MNAR, i.e., that NIHSS documentation is correlated with the NIHSS score, by using the Heckman Selection Model. We hypothesize that a significant correlation coefficient between the outcome model (patient-level NIHSS score) and selection model (patient-level NIHSS documentation) will be detected. Figure 3.1 illustrates the conceptual framework of the Heckman model for this analysis. Figure 3.1. Conceptual Framework of Heckman Selection Model in this analysis. Aim 2 – Methods Data and Participants We will again use data from the Michigan Stroke Registry (MSR) from 2009-2012 as described in Chapter 2. Briefly, the MSR is a statewide clinical registry which originated as a prototype for the Paul Coverdell National Acute Stroke Registry, and has been described elsewhere.85 Currently, the MSR is used to provide a data-driven approach to improving the quality of stroke care in the State of Michigan.86,87 The MSR collects information on many different patient-level characteristics including demographics, EMS and hospital admission information, and clinical information such as stroke severity, ambulatory status, and medical history. In addition, we obtained hospital characteristics from the American Hospital Association annual survey88 and the Paul Coverdell National Acute Stroke Registry hospital inventory. We used MSR data from 2009 to 2012 for this analysis.
To increase the comparability of our findings to a CMS ischemic stroke population, we applied a number of exclusions to the MSR data. Ischemic stroke patients were included if they were aged 65 years or older, and cases were excluded if they belonged to a hospital with <25 annual cases of ischemic stroke, which is the minimum number of cases for a hospital risk-standardized mortality rate (RSMR) to be calculated, as defined by CMS.18 We also excluded patients if the stroke occurred in a hospital inpatient setting. As this study was a secondary analysis of deidentified registry data, it was considered exempt from Institutional Review Board review. All analyses were conducted with the use of SAS version 9.3 (SAS Institute Inc, Cary, NC). Predictor Variables We examined a number of patient-level predictors in the outcome and selection models. Demographic characteristics included: age, gender (male vs. female), race (white, black, other, not documented), and insurance status (Medicare, Medicaid, private, none). We also assessed EMS and hospital admission information, such as: place stroke occurred (at home vs. in a healthcare setting), arrival mode (EMS, private, transferred), arrival to the ER (yes vs. no), symptoms resolved prior to arrival (yes vs. no), and tPA administration (yes vs. no). Finally, we also examined several clinical variables in this analysis, including: able to ambulate pre-stroke, diabetes mellitus, congestive heart failure, peripheral artery disease, hypertension, current smoker, and history of prior stroke, transient ischemic attack/vertebrobasilar insufficiency (TIA/VBI), or myocardial infarction/coronary artery disease (MI/CAD). Hospital characteristics were also examined and included annual stroke volume (<200, 200-600, 600+), urban vs. rural location, teaching status, presence of an acute stroke team, and Joint Commission primary stroke center status.63 Outcome Model Specification The dependent variable in the outcome model is the patient-level NIHSS score, which takes integer values on a 0-42 point scale. A primary assumption of the outcome model in the Heckman model is that the dependent variable be a normally distributed, continuous variable.97 To satisfy this assumption, we transformed the NIHSS score toward a normal distribution using a Box-Cox transformation, as the NIHSS distribution is highly right-skewed. The outcome model was specified using a backward selection process of predictor variables with stepwise deletion of non-significant predictors. Quasi-maximum likelihood estimation was used to produce robust estimates of standard errors to account for the clustering effect of patients within hospitals. Selection Model Specification The dependent variable in the selection model is the patient-level binary indicator of NIHSS documentation (documented vs. undocumented). The selection model was specified to include all significant predictors of NIHSS score, regardless of their statistical significance in the selection model. Additional significant predictors of NIHSS documentation were again derived from a backward selection process of the remaining predictor variables with stepwise deletion of non-significant predictors. Again, we calculated robust standard errors using quasi-maximum likelihood estimation to account for clustering of patients in hospitals.
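As one concrete illustration of the Box-Cox step above, a SAS sketch is shown below; the dataset name msr, the +1 shift (needed because NIHSS scores of 0 are not strictly positive), the lambda grid, and the placeholder regressor age are all assumptions for illustration rather than the study's actual specification.

   data msr2;
      set msr;
      nihss_plus1 = nihss + 1;   /* shift so all values are positive for Box-Cox */
   run;

   /* Estimate the Box-Cox lambda for the shifted NIHSS score */
   proc transreg data=msr2 details;
      model boxcox(nihss_plus1 / lambda=-2 to 2 by 0.05) = identity(age);
   run;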
Estimating the Correlation Coefficient Once the outcome and selection models were specified, we utilized the PROC QLIM procedure in SAS to estimate the correlation coefficient between the outcome model and selection model error terms. The PROC QLIM (qualitative and limited dependent variable model) procedure allows users to estimate the correlation between a simultaneously specified multivariable outcome and selection model.104 A statistically significant correlation coefficient would indicate the presence of selection bias in patients with documented NIHSS. The correlation coefficient ranges from -1 to +1, with 0 representing no selection bias and +/-1 representing strong selection bias. A positive correlation would indicate that as the NIHSS score increases, i.e. strokes are more severe, documentation increases. Conversely, a negative correlation indicates that as NIHSS score increases, documentation decreases. To investigate how the prevalence of missing NIHSS data impacts the amount of selection bias in the sample, we repeated the analysis using only data from 2009-2010, when documentation was lower (68%), and again using only data from 2011-2012, when documentation was greater (80%), and estimated the correlation coefficient between the models in each time period (2009-2010 and 2011-2012). Aim 2 - Results From the Michigan Stroke Registry, we used data from 10,717 ischemic stroke cases discharged from 23 hospitals for the analysis, of which 7,957 cases (74.3%) had NIHSS documented. The following variables were statistically significant independent predictors of NIHSS score in the outcome model: age (in years), gender (male vs. female), stroke occurred at home vs. in a healthcare setting, symptoms resolved prior to arrival, mode of arrival (EMS, private, transfer), tPA administration (yes vs. no), ambulatory pre-stroke (yes vs. no), presence of an acute stroke team (yes vs. no), and history of atrial fibrillation, prior stroke, dyslipidemia, and heart failure (yes vs. no). (Table 3.1) A positive beta coefficient indicates that the variable is associated with a more severe stroke, while a negative beta coefficient is associated with a less severe stroke. When these variables were included in the selection model, all were statistically significant predictors of documentation except age, gender, and history of heart failure. The selection model included the following additional significant predictors of NIHSS documentation: insurance status (Medicare, Medicaid, private, none), race (white, black, other, not documented), patient received in the ER (yes vs. no), year (2009, 2010, 2011, and 2012), hospital Joint Commission primary stroke center status (yes vs. no), and hospital stroke volume (<200, 200-600, 600+). (Table 3.1)
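A minimal sketch of this estimation in PROC QLIM, assuming a dataset msr with the Box-Cox-transformed score nihss_bc and documentation indicator documented (the listed predictors are illustrative placeholders; COVEST=QML requests the quasi-maximum likelihood covariance noted in the Methods):

   proc qlim data=msr covest=qml;
      /* Selection model: probit for whether NIHSS was documented */
      model documented = age male home_onset symptoms_resolved tpa year / discrete;
      /* Outcome model: transformed NIHSS, observed only when documented = 1 */
      model nihss_bc = age male home_onset symptoms_resolved tpa / select(documented = 1);
   run;

The fitted correlation between the two error terms (reported by QLIM as _Rho) is the quantity interpreted in Table 3.2 below.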
Table 3.1. Heckman Selection Model specifications for outcome and selection models in the full sample of n=10,717 stroke cases (2009-2012). Variable Intercept Age (per year) Gender (Male) Stroke Occurred at Home Symptoms Resolved Arrival Mode EMS Private Transfer tPA Administered Ambulatory Pre-stroke History of Atrial Fibrillation History of Prior Stroke Frequency % or Mean (SD) 78.7 (8.3) 54.7 Outcome Model Coefficient (SE) p-value 1.4828 (0.1261) <0.0001 0.0103 (0.0013) <0.0001 -0.0625 (0.0202) 0.0020 Selection Model Coefficient (SE) p-value 1.3778 (0.2089) <0.0001 -0.0022 (0.0015) 0.1547 -0.0308 (0.0240) 0.1992 89.5 -0.1612 (0.0357) <0.0001 0.1842 (0.0379) <0.0001 6.6 -0.8008 (0.0501) <0.0001 -0.9348 (0.0310) <0.0001 49.5 34.9 42.6 7.2 95.1 -0.0239 (0.0283) -0.6342 (0.0293) Ref 0.8149 (0.0393) -0.4972 (0.0536) 0.3986 <0.0001 <0.0001 <0.0001 -0.1055 (0.0412) -0.1030 (0.0416) Ref 1.1631 (0.0885) 0.3147 (0.0517) 0.0105 0.0134 <0.0001 <0.0001 22.7 0.2319 (0.0244) <0.0001 0.0943 (0.0303) 0.0018 27.7 0.1515 (0.0226) <0.0001 -0.0604 (0.0258) 0.0193 Table 3.1. (cont'd) Heckman Selection Model specifications for outcome and selection models in the full sample of n=10,717 stroke cases (2009-2012). Variable History of Dyslipidemia History of Heart Failure Acute Stroke Team Insurance Status Medicare Medicaid Private None Race White Black Other Not Documented Received in the ER Year 2009 2010 2011 2012 PSC Status Stroke Volume <200 200-600 600+ Frequency % or Mean (SD) 45.5 13.5 85.3 Outcome Model Coefficient (SE) p-value -0.1005 (0.0203) <0.0001 0.1609 (0.0299) <0.0001 0.1076 (0.0276) <0.0001 Selection Model Coefficient (SE) p-value 0.1263 (0.0239) <0.0001 -0.0327 (0.0245) 0.3433 -0.1458 (0.0362) <0.0001 49.1 5.2 45.0 0.7 - - 0.2944 (0.1220) 0.1043 (0.1299) 0.2496 (0.1225) Ref 0.0158 0.4222 0.0417 - 72.6 18.1 1.1 8.2 88.3 - - -0.0758 (0.0456) -0.3237 (0.0503) 0.0627 (0.1285) Ref 0.2424 (0.0431) 0.0964 <0.0001 0.6258 <0.0001 25.0 23.5 26.0 25.5 77.9 - - -0.5756 (0.0346) -0.3620 (0.0458) -0.2103 (0.0359) Ref 0.2211 (0.0294) <0.0001 <0.0001 <0.0001 <0.0001 9.0 38.6 52.3 - - 0.2675 (0.0443) 0.3344 (0.0283) Ref <0.0001 <0.0001 - Table 3.2 shows the estimated correlation coefficients between the error terms of the specified outcome and selection models. For the entire sample, we estimated a statistically significant correlation coefficient of ρ=0.11 (95% CI: 0.09, 0.13; p<0.0001). (Table 3.2) This is interpreted as weak, but statistically significant, selection bias. The positive sign on the correlation indicates that as NIHSS score increases, the probability of documentation also increases. When we restricted data to 2009-2010, when documentation was relatively lower (68%), we found a slightly higher correlation coefficient of ρ=0.13 (95% CI: 0.07, 0.20; p<0.0001), indicating a modest increase in selection bias when documentation was lower. Conversely, when we limited our data to 2011-2012, when documentation was better (80%), we found a slightly lower correlation coefficient of ρ=0.07 (95% CI: 0.05, 0.09; p<0.0001), indicating less selection bias when documentation had improved.

Table 3.2. Estimated correlation coefficient between error terms of outcome and selection models for the full sample (2009-2012), and by 2009-2010 and 2011-2012.
Sample | Total Num. | NIHSS Documented, n (%) | Estimated Correlation Coefficient (95% CI) | p-value
2009-2012 | 10,717 | 7,957 (74.3) | 0.11 (0.09, 0.13)a | <0.0001
2009-2010 | 5,197 | 3,554 (68.4) | 0.13 (0.07, 0.20)b | <0.0001
2011-2012 | 5,520 | 4,403 (79.8) | 0.07 (0.05, 0.09)c | <0.0001
a Outcome and selection model variables can be seen in Table 3.1.
b Outcome model variables: age, gender, stroke occurred at home, arrival mode, received in the ER, symptoms resolved by ER arrival, tPA administration, ambulatory pre-stroke, and history of atrial fibrillation, heart failure, stroke and myocardial infarction. Selection model variables: outcome model variables plus race, history of smoking, year, and hospital stroke volume, rural location, Joint Commission Primary Stroke Center status, presence of acute stroke team.
c Outcome model variables: age, stroke occurred at home, arrival mode, symptoms resolved by ER arrival, tPA administration, ambulatory pre-stroke, hospital presence of acute stroke team, and history of atrial fibrillation, heart failure, stroke, and TIA/vertebrobasilar insufficiency. Selection model variables: outcome model variables plus race, received in the ER, year, and hospital stroke volume, rural location, teaching status, Joint Commission Primary Stroke Center status.
Aim 2 – Discussion Using the Heckman model, we found evidence of selection bias in patients with documented NIHSS. Although statistically significant, the magnitude of the selection bias appears to be relatively weak (ρ=0.11). The positive correlation between stroke severity (NIHSS score) and NIHSS documentation suggests that as stroke severity increases, the probability of documentation also increases. In Aim 1, we concluded that more severe strokes are better documented. Our results in this study confirm those findings, again indicating that as patient-level stroke severity increases, the probability of documentation also increases. Furthermore, as expected, we found that when documentation of NIHSS was lower, the magnitude of selection bias slightly increased (ρ=0.13), and bias subsequently decreased slightly when documentation improved (ρ=0.07). In the traditional Heckman model process, we would subsequently calculate the inverse Mills ratio using the estimated correlation coefficient, and include this parameter in the model predicting patient-level NIHSS score to correct for the selection bias. However, since our aim is only to assess for the presence of bias, and not to estimate patient-level NIHSS score, we did not perform this step. Together, this evidence provides a compelling argument that documentation of NIHSS is associated with the NIHSS score itself, and that the probability of documentation increases as patient-level NIHSS increases. As such, it can be safely concluded that missing NIHSS data are missing not at random (MNAR). Therefore, any analysis that includes NIHSS when it is not completely documented is subject to selection bias; however, the extent of the bias appears to be modest and its implications have yet to be understood. Both analyses we conducted (Chapters 2 and 3) suggested that NIHSS data are MNAR, but selection bias appears to be relatively weak. In Aim 1, we showed a slight "right shift" in the distribution of patient-level NIHSS in low documenting hospitals. The average NIHSS score in high versus low documenting hospitals was 6.7 compared to 8.8, respectively. (Table 2.2) But would using a more severe subsample of stroke patients translate to biased estimates of hospital-level mortality? We will attempt to answer that question in the next aim. This particular analysis does have some limitations to consider. First, the Heckman model relies on an accurately specified selection model.103 Failure to specify a correct selection model may result in inaccurate assessment of, or correction for, selection bias.
Having a priori information about the possible direction of selection bias or what variables might predict selection may improve the validity of Heckman model estimates. The analysis in Aim 1 corroborates our findings, suggesting a similar magnitude and direction of selection bias, which improves the validity of our results. The Heckman model also requires that the dependent variable for the outcome model be a continuous, normally distributed variable.97 The NIHSS score is not normally distributed (as illustrated in Figure 2.3). However, we used a Box-Cox transformation to transform NIHSS into an approximately normal distribution. As this study used a relatively small subset of hospitals (n=23), further research should be done to improve the generalizability of our findings in a more representative sample of hospitals. Using the Heckman Selection Model, we were able to corroborate previous analyses which showed evidence of weak selection bias in patients with NIHSS documented. We also illustrated that as documentation of NIHSS improved, the magnitude of selection bias in our data was reduced. It is unclear if hospital-level performance measures (e.g. mortality) are biased when documentation patterns change with respect to patient stroke severity. In the next chapter, we will employ computer simulations to explore how the prevalence and mechanism of missing NIHSS data impact the accuracy of hospital performance profiling. CHAPTER 4: THE IMPACT OF MISSING NIHSS DATA ON THE ACCURACY OF HOSPITAL PROFILING Aim 3 – Background While it is widely accepted that using a complete case analysis in the presence of missing data may introduce bias into any given analysis68,69, how this impacts hospital-level estimates used for hospital profiling is less certain. There is some evidence that hospital-level measures of performance may be biased when missing data are present. One simulation study comparing risk-adjusted hospital trauma-related mortality measures showed that a complete case analysis when risk-adjustment variables were missing not at random (MNAR) led to considerable changes in hospital-level mortality profiling.64 Another simulation study examining the impact of missing data on profiling of pay-for-performance outcomes showed that between 11 and 21 percent of misclassification was attributable to missing data used in risk adjustment models.65 Studies have also shown that differential coding40 or undercoding82 of comorbidities and severity indicators between hospitals – which would cause "missing" data if variables were coded incorrectly – can bias hospital-level risk-standardized mortality rates. The addition of 30-day mortality and readmission measures for ischemic stroke into the Centers for Medicare & Medicaid Services pay-for-performance schemes5,6 has generated considerable contention regarding the contents of models used in risk adjustment. There is serious concern that excluding a measure of stroke severity, such as the National Institutes of Health Stroke Scale (NIHSS)50,51, will not adequately risk adjust hospital-level performance measures.53-56 Furthermore, it has been suggested that hospitals which tend to see a more severe case-mix of patients – such as tertiary referral centers or primary stroke centers – may be at greater risk of misclassification.46-48 With the announcement that NIHSS is to be included in ICD-10 coding, it will likely be included in future risk adjustment models for ischemic stroke outcomes.
Although documentation of NIHSS in clinical datasets has improved in recent years, it is still frequently missing in large clinical datasets.53 Therefore, it is essential to understand how missing NIHSS data may impact hospital-level estimates of mortality used in profiling schemes, especially if they are MNAR. In this study, we utilize computer simulations to illustrate how missing NIHSS data impact the accuracy of hospital performance profiling on ischemic stroke mortality. Specifically, we will assess how the prevalence of, and mechanism by which, NIHSS is missing impact our ability to classify hospital outliers, estimate hospital deviation in ischemic stroke mortality, and correctly rank-order hospitals on ischemic stroke mortality. We hypothesize that our ability to correctly identify outlier hospitals and rank-order hospitals will degrade as the prevalence of missing NIHSS increases, especially in situations where missingness is related to the severity of the stroke, i.e. is MNAR. Finally, because hospital case volume has previously been shown to impact profiling accuracy in myocardial infarction, we will also investigate how hospital ischemic stroke volume modifies our findings. Aim 3 – Methods To pursue our aims, data must be generated in such a way that the variation in patient case-mix and ischemic stroke mortality within and between hospitals reflects empirical estimates from real-world data. Briefly, a top-down approach for data generation was used, where a set of hospitals was generated with assigned components of case-mix and ischemic stroke mortality variation. Patients were then generated within each hospital, and assigned patient characteristics that reflect the underlying observed case-mix and mortality. We then replicated the generated dataset, and simulated missing NIHSS data within each dataset based on a different mechanism and prevalence of missing NIHSS data. Hospital-level outlier status and RSMRs were estimated from a complete case analysis of patients with observed NIHSS. This data generation scheme can be seen in Figure 4.1. Figure 4.1. Overview of data generation process for simulations. Section 1 - Parameter Generation for Simulations A series of analyses of 10,717 ischemic stroke patients 65 years of age and older from 23 hospitals in the Michigan Stroke Registry (MSR) were conducted to generate parameters needed for the computer simulations. The MSR is a statewide clinical registry which originated as a prototype for the Paul Coverdell National Acute Stroke Registry, and has been described elsewhere.85 Descriptive statistics of the sample can be seen in Table 2.1. Parameters needed for the simulation studies were generated in three distinct steps: first, we created a multivariable patient risk score for in-hospital mortality using MSR data. Second, we quantified the variation in the patient risk score between hospitals in the registry (this variation represents the differences in hospital case-mix). Finally, we estimated hierarchical model parameters for the in-hospital mortality model given the patient risk score and hospital random intercepts. Specific details of the steps are described below. Patient Risk Score: We used the Get With the Guidelines – Stroke (GWTG-Stroke) in-hospital mortality risk score for this analysis, which includes the NIHSS score.54 In-hospital mortality was used because the MSR does not have data on 30-day mortality.
However, for acute myocardial infarction patients, in-hospital mortality has been shown to correlate well with 30-day mortality.105 The GWTG-Stroke in-hospital mortality risk score was developed from the logistic model using the method described by Sullivan, et al.106, and contains nine clinical variables: patient age, NIHSS score categories (0-2, 3-5, 6-10, 11-15, 16-20, 21-25, and 26-42), mode of arrival, gender, and presence of atrial fibrillation, previous stroke or TIA, coronary artery disease, diabetes mellitus, or history of dyslipidemia. (Table 4.1) The score ranges from 0 to 109, and is shown in Table 4.1; the NIHSS score is by far the most important variable contributing to the total score.

Table 4.1. Get With the Guidelines-Stroke in-hospital mortality risk score variables, categories, and respective points.54
Age (in years): <60 = 0; 60-70 = 1; 70-80 = 5; ≥80 = 9
NIHSS Score: 0-2 = 0; 3-5 = 10; 6-10 = 21; 11-15 = 37; 16-20 = 48; 21-25 = 56; 26-42 = 65
Mode of Arrival: Private transport = 0; Did not present via ED = 16; Ambulance from scene = 12
Presence of (points if Yes / points if No): Male gender 0/3; Atrial fibrillation 5/0; Previous stroke or TIA 0/2; Coronary artery disease 5/0; Diabetes mellitus 2/0; History of dyslipidemia 0/2
Information taken from Smith, et al. 2010.

To allow for manipulation of the NIHSS score variable, we calculated the NIHSS risk score component separately from the rest of the risk score. Therefore, the total risk score ($TRS_{ij}$) for patient i in hospital j is the sum of the NIHSS score component ($NIHSS_{ij}$) and a non-NIHSS component – hereinafter referred to as the sub-risk score component ($SRS_{ij}$) – which contains the remaining eight variables.

(1) $TRS_{ij} = SRS_{ij} + NIHSS_{ij}$

In the MSR, the sub-risk score ($SRS_{ij}$) is normally distributed with mean 21.4 and standard deviation (SD) 8.3, i.e. $SRS_{ij} \sim N(21.4, 8.3^2)$. The distribution of NIHSS score categories in the 7,957 (74.3%) cases with documented NIHSS in the MSR can be seen in Figure 4.2. The mean (SD) and median (IQR) for patients with documented NIHSS were 7.3 (SD=7.8) and 4 (IQR=2-11), respectively.

Figure 4.2. Distribution of patient-level NIHSS score categories (0-2 through 26-42) in the Michigan Stroke Registry (n=7,957). [Bar chart of the percent of patients in each NIHSS score category.]

To measure the association between the $SRS_{ij}$ and $NIHSS_{ij}$ components, we used an ordinal regression model to predict NIHSS score categories given the patient $SRS_{ij}$. For simplicity in the simulation process, we used an ordered probit model, which has a normally distributed random error term, as opposed to the traditional ordinal logistic model, where the error term has a logistic distribution. The ordered probit model yields a beta coefficient for the sub-risk score, and six intercept terms, which reflect the cutoff points between the seven ordinal NIHSS categories. (Table 4.2)

Table 4.2. Results of ordered probit model of NIHSS category predicted by sub-risk score. (n=7,957)
Parameter: Estimate (Standard Error)
Intercept 1*: 0.63 (0.0353)
Intercept 2: 1.26 (0.0365)
Intercept 3: 1.79 (0.0381)
Intercept 4: 2.16 (0.0396)
Intercept 5: 2.55 (0.0419)
Intercept 6: 2.98 (0.0459)
Sub-Risk Score ($SRS_{ij}$): -0.050 (0.00152)
Note: All parameter estimates are statistically significant, p<0.0001. * Intercepts reflect cutoff points between seven ordinal NIHSS categories: 0-2, 3-5, 6-10, 11-15, 16-20, 21-25, and 26-42.
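A sketch of this model fit in SAS is shown below; nihss_cat (an ordinal category variable) and srs are illustrative names. PROC LOGISTIC with a probit link on an ordinal response fits the cumulative probit, i.e. the ordered probit described above.

   proc logistic data=msr;
      model nihss_cat = srs / link=probit;   /* cumulative (ordered) probit */
   run;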
Using the model intercepts shown in Table 4.2, a patient-level NIHSS category can be imputed by multiplying the generated sub-risk score by the sub-risk score model beta coefficient (β=-0.050). This step will be explained in more detail in the section describing the data simulation process. Between-Hospital Variation in Risk Score: To estimate the between-hospital variation in patient risk score, i.e. case-mix variation, we ran a variance components model to estimate the hospital-level variation in the sub-risk score component ($SRS_{ij}$), which was centered to have mean 0. The variance of the sub-risk score was made up of a hospital-level component, $\mu_j$ with variance $\sigma_\mu^2$, for hospital j, and a patient-level component, $\delta_{ij}$ with variance $\sigma_\delta^2$, for patient i in hospital j: $SRS_{ij} = \mu_j + \delta_{ij}$. It is assumed that the hospital and patient-level variance components are independent from one another. Thus, $var(SRS_{ij}) = \sigma_\mu^2 + \sigma_\delta^2$. From the variance components model, we estimated $\mu_j \sim N(0, \sigma_\mu^2 = 1.5)$ and $\delta_{ij} \sim N(0, \sigma_\delta^2 = 68.0)$. Using the formula for the intraclass correlation coefficient, $ICC = \sigma_\mu^2 / (\sigma_\mu^2 + \sigma_\delta^2) = 1.5/(1.5 + 68.0) = 0.022$, or 2.2%. This means that only 2.2% of the variation in the sub-risk score is attributed to between-hospital differences in the overall mean sub-risk score. Because the NIHSS component is estimated from the sub-risk score component, case-mix variation in NIHSS will also be reflected by variation in the sub-risk score. Between-Hospital Variation in In-Hospital Mortality: Using data from the MSR, a hierarchical logistic regression model was used to estimate the between-hospital variation in mortality, given the patient total risk score ($TRS_{ij}$). This model can be seen below (2), where $p_{ij}$ represents the probability of in-hospital mortality for patient i in hospital j, $\beta_0$ represents the overall log-odds of mortality, $\beta_1$ represents the change in log-odds of mortality given a one-unit increase in the total risk score for patient i in hospital j ($TRS_{ij}$), and $b_{0j}$ represents the random intercept for hospital j.

(2) $logit(p_{ij}) = \beta_0 + \beta_1 TRS_{ij} + b_{0j}$

When this model was run in the 7,957 ischemic stroke patients who had NIHSS recorded in the MSR, we obtained the following model estimates:

(3) $logit(p_{ij}) = -6.1 + 0.054\,TRS_{ij} + b_{0j}$

We estimated the distribution of the hospital random intercept as $b_{0j} \sim N(0, \sigma^2 = 0.13)$, where 0.13 represents the between-hospital variation in in-hospital mortality. The ICC from a logistic regression model is calculated using the equation107 $ICC = \sigma^2 / (\sigma^2 + \pi^2/3) = 0.13/(0.13 + \pi^2/3) = 0.039$, or 3.9%, which means that only 3.9% of the unexplained variation in in-hospital mortality is attributed to between-hospital differences. These estimated parameters were subsequently used to simulate a full dataset which mimics the between- and within-hospital variation in patient risk score and mortality. Section 2 – Generating Datasets for Simulations We simulated S=500 independent samples of patients within each hospital from the parameters generated in the previous section. In each sample, N=100 hospitals were simulated using n patients per hospital; each scenario reflected a unique combination of the NIHSS documentation rate (%), the mechanism of missing NIHSS data, and hospital stroke volume. To assess how the accuracy of performance profiling is modified by hospital ischemic stroke volume, independent simulations of hospital volumes of n=100, 300, and 500 patients were used, to represent low, moderate, and high volume hospitals.
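The overall simulation loop implied by this design might be organized with a SAS macro skeleton like the one below; the macro name and structure are purely illustrative scaffolding, not the study's actual program.

   %macro run_scenarios;
      /* hospital volumes of 100, 300, and 500 patients */
      %do vol = 100 %to 500 %by 200;
         /* S=500 simulated samples per scenario */
         %do s = 1 %to 500;
            /* 1) generate N=100 hospitals with &vol patients each;
               2) apply each missingness model (MCAR plus four MNAR variants)
                  at each documentation rate (30-90% and fully observed);
               3) fit the profiling model to complete cases and store RSMRs */
         %end;
      %end;
   %mend run_scenarios;
   %run_scenarios;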
Each hospital was assigned a random intercept for mortality, which represents its true deviation in mortality from the overall average, i.e., true hospital performance. Patient-level risk scores for mortality were then simulated to represent the within- and between-hospital variation in patient risk of mortality observed in the MSR. Finally, using the assigned hospital random intercept and patient-level risk score, we simulated a binary mortality outcome (alive/died) for each patient. Specific details for these three steps are described below. Assigned Hospital Random Intercept: From the analyses conducted in the MSR, we observed a normal distribution of hospital random intercepts, $b_{0j} \sim N(0, \sigma^2 = 0.13)$. Simulated hospitals were randomly assigned a random intercept from this normal distribution. The assigned random intercept represents a hospital's known deviation in mortality compared to the overall average, after adjusting for the patient's risk of mortality. As such, it represents a hospital's true in-hospital mortality compared to the average hospital, and was used as the gold standard for comparison with estimated hospital performance rankings. Assigned Patient Risk Score: To generate the total risk score for patient i in hospital j ($TRS_{ij}$), we first generated the non-NIHSS component of the risk score, i.e. the sub-risk score ($SRS_{ij}$). As previously mentioned, the sub-risk score has a hospital-level component, $\mu_j$, and a patient-level component, $\delta_{ij}$. For each hospital, $\mu_j$ was randomly drawn from the distribution $\mu_j \sim N(0, \sigma_\mu^2 = 1.5)$. Within each simulated hospital j, for patient i, $\delta_{ij}$ was randomly drawn from the distribution $\delta_{ij} \sim N(0, \sigma_\delta^2 = 68.0)$. The hospital and patient components were summed to create the $SRS_{ij}$, which was then centered on the observed mean from the MSR (mean=21.4), as seen in equation (4).

(4) $SRS_{ij} = (\mu_j + \delta_{ij}) + 21.4$

Next, we assigned each patient an NIHSS score category by multiplying the $SRS_{ij}$ by the beta coefficient for $SRS_{ij}$ from the ordered probit model, and adding a random error term, $\varepsilon$, drawn from an $N(0, 1)$ distribution.

(5) $\gamma_{ij} = -0.050\,SRS_{ij} + \varepsilon$

The estimate $\gamma_{ij}$ was then compared to the cutoff points derived from the ordered probit model intercepts, which refer to an imputed NIHSS category, as seen in Table 4.3.

Table 4.3. NIHSS category assignment cutoff intervals derived from the ordered probit model predicting NIHSS category given the patient sub-risk score.
Cutoff Interval | Assigned NIHSS Category | Risk Score Points* ($NIHSS_{ij}$)
γ ≥ -0.63 | 0-2 | 0
-0.63 > γ ≥ -1.26 | 3-5 | 10
-1.26 > γ ≥ -1.79 | 6-10 | 21
-1.79 > γ ≥ -2.16 | 11-15 | 37
-2.16 > γ ≥ -2.55 | 16-20 | 48
-2.55 > γ ≥ -2.98 | 21-25 | 56
-2.98 > γ | 26-42 | 65
* Risk Score Points from Smith, et al. 2010.54 Note: γ is calculated using the patient Sub-Risk Score (SRS).

Finally, the NIHSS risk score and sub-risk score components were summed to create the total risk score ($TRS_{ij}$), which ranges from 0 to 109 and mimics the distribution observed in the MSR. For instance, if a patient was assigned an $SRS_{ij} = 20$ and random error term $\varepsilon = 0$, the resulting $\gamma_{ij}$ is: $\gamma_{ij} = -0.050 \times 20 + 0 = -1.0$. Thus, the patient would be assigned to an NIHSS category of 3-5 (i.e. $-0.63 > \gamma_{ij} \geq -1.26$), and 10 points would be added to the sub-risk score for a total risk score of $TRS_{ij} = SRS_{ij} + NIHSS_{ij} = 20 + 10 = 30$.
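Steps (4) and (5) and the Table 4.3 assignment might look roughly as follows in a SAS data step; this is a sketch for a single sample at a moderate volume of n=300, with an arbitrary seed, not the study's actual program.

   data sim_patients;
      call streaminit(2015);
      do hospital = 1 to 100;                        /* N=100 hospitals */
         mu  = rand('normal', 0, sqrt(1.5));         /* hospital case-mix component */
         b0j = rand('normal', 0, sqrt(0.13));        /* true hospital random intercept */
         do patient = 1 to 300;                      /* e.g., moderate volume, n=300 */
            delta = rand('normal', 0, sqrt(68.0));   /* patient component */
            srs   = (mu + delta) + 21.4;             /* equation (4) */
            gamma = -0.050*srs + rand('normal', 0, 1); /* equation (5) */
            /* assign NIHSS category and risk-score points from Table 4.3 */
            if      gamma >= -0.63 then do; nihss_cat=1; nihss_pts=0;  end;
            else if gamma >= -1.26 then do; nihss_cat=2; nihss_pts=10; end;
            else if gamma >= -1.79 then do; nihss_cat=3; nihss_pts=21; end;
            else if gamma >= -2.16 then do; nihss_cat=4; nihss_pts=37; end;
            else if gamma >= -2.55 then do; nihss_cat=5; nihss_pts=48; end;
            else if gamma >= -2.98 then do; nihss_cat=6; nihss_pts=56; end;
            else                        do; nihss_cat=7; nihss_pts=65; end;
            trs = srs + nihss_pts;                   /* equation (1) */
            output;
         end;
      end;
   run;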
Generating In-Hospital Mortality: Using the logit model in equation (2), we input the assigned hospital random intercept and, for each of the n patients in the particular sample, generated the logit of the predicted probability of mortality ($p_{ij}$) from the $TRS_{ij}$. To reflect the 30-day mortality rates (~15%) used in CMS outcome metrics, as opposed to in-hospital mortality rates (~4%), we re-scaled the model intercept to generate an average mortality of 15% ($\beta_0 = -4.4$ vs. $-6.1$).

(6) $logit(p_{ij}) = -4.4 + 0.054\,TRS_{ij} + b_{0j}$

The inverse logit of this model gives the predicted probability of mortality ($p_{ij}$) for each patient in the sample. From this, we generated a patient-level binary mortality status (0 if alive, 1 if dead) using a random draw from the Bernoulli distribution. From these three steps, we generated patient samples within each simulated hospital which reflect empirical estimates of variation in case-mix and ischemic stroke mortality obtained from the MSR. Each patient has a generated risk score for mortality, including an NIHSS component, and a binary mortality indicator. In the next section, we discuss the models which were used to simulate missing NIHSS data in the fully observed dataset. Section 3 – Missing NIHSS Data Model Specification In Chapters 2 and 3, we provided evidence that NIHSS documentation is related to the patient NIHSS score. To replicate missing NIHSS data in our simulated data, we generated a mixture of different prevalences and mechanisms of NIHSS documentation. First, we simulated a mechanism where NIHSS documentation is completely unrelated to the NIHSS score, i.e., data are missing completely at random (MCAR). To replicate an MCAR model of NIHSS documentation, we generated a binary indicator of documentation by a random draw from a Bernoulli distribution. In addition to a fully observed dataset, we created datasets which varied the probability of documentation between 30 and 90% in increments of 10%. Next, we simulated mechanisms where NIHSS score category and NIHSS documentation are directly related (as NIHSS score category increases, documentation increases) and inversely related (as NIHSS score category increases, documentation decreases). These mechanisms represent a missing not at random (MNAR) mechanism of missing data, where the missingness in NIHSS is related to the value of the NIHSS score itself. Logistic regression models were used to estimate the probability of NIHSS documentation ($R_{ij}$) given the patient NIHSS category ($NIHSS_{ij}$).

(7) $logit(p(R_{ij} = 1 \mid NIHSS_{ij})) = \beta_0 + \beta_1 NIHSS_{ij}$

Because we cannot observe these models directly, we estimated the model intercept ($\beta_0$) – which represents the overall documentation rate – and the beta coefficient ($\beta_1$) for the NIHSS score – which indicates the estimated increase or decrease in the odds of documentation from moving up one NIHSS category (0-2, 3-5, 6-10, 11-15, 16-20, 21-25, and 26-42). The signs of the beta coefficients were manipulated to reflect direct and inverse relationships between NIHSS and NIHSS documentation. Additionally, in each scenario, we altered the values of the beta coefficient to reflect a relatively weaker and stronger effect of NIHSS category on documentation. The weak effect represents a 10% increase or decrease (beta = +/-0.095) in the odds of documentation as NIHSS category increases; the strong effect represents a 25% increase or decrease (beta = +/-0.225) in the odds of documentation as NIHSS category increases.
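Continuing the sketch above, the mortality draw (equation (6)) and an MNAR documentation indicator (equation (7)) could be generated as follows; the direct-weak scenario at roughly 70% overall documentation is shown, with the intercept and beta taken from Table 4.4 below, and nihss_cat coded 1-7 as in the earlier sketch.

   data sim_missing;
      set sim_patients;
      /* equation (6): mortality re-scaled to a ~15% overall rate */
      p_mort = logistic(-4.4 + 0.054*trs + b0j);
      died   = rand('bernoulli', p_mort);
      /* equation (7): MNAR documentation, direct-weak, ~70% overall
         (intercept 0.60, beta 0.095 per Table 4.4) */
      p_doc      = logistic(0.60 + 0.095*nihss_cat);
      documented = rand('bernoulli', p_doc);
   run;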
In total, four MNAR models were created (direct-weak, direct-strong, inverse-weak, inverse-strong). As with the MCAR model, we altered the model intercepts to reflect overall documentation rates of 30 to 90% in increments of 10%. All missing NIHSS model specifications (MCAR and MNAR) and their estimated NIHSS documentation rates can be seen in Table 4.4.

Table 4.4. Specification for missing NIHSS models, including model parameters and estimated documentation rates in each category of NIHSS.

                                                  Estimated NIHSS Documentation Rates by NIHSS Score Category
Scenario   Effect   % Doc.  Intercept   Beta      0-2   3-5   6-10  11-15  16-20  21-25  26-42

Missing Completely at Random
           -        90      -           -         90    90    90    90     90     90     90
           -        80      -           -         80    80    80    80     80     80     80
           -        70      -           -         70    70    70    70     70     70     70
           -        60      -           -         60    60    60    60     60     60     60
           -        50      -           -         50    50    50    50     50     50     50
           -        40      -           -         40    40    40    40     40     40     40
           -        30      -           -         30    30    30    30     30     30     30

Missing Not at Random – Direct Relationship
           Weak     90       2.00        0.095    89    90    91    92     92     93     93
           Weak     80       1.15        0.095    78    79    81    82     83     85     86
           Weak     70       0.60        0.095    67    69    70    73     74     76     77
           Weak     60       0.17        0.095    57    59    62    64     66     67     70
           Weak     50      -0.25        0.095    46    49    51    53     55     58     60
           Weak     40      -0.65        0.095    36    39    41    44     45     48     50
           Weak     30      -1.10        0.095    27    29    31    33     35     37     39
           Strong   90       1.65        0.225    87    89    91    93     94     95     96
           Strong   80       0.85        0.225    75    79    82    85     88     90     92
           Strong   70       0.29        0.225    63    67    72    76     80     84     87
           Strong   60      -0.15        0.225    52    58    63    68     73     77     81
           Strong   50      -0.58        0.225    41    47    52    58     63     68     73
           Strong   40      -1.45        0.225    23    27    32    36     42     47     53
           Strong   30      -2.00        0.225    15    17    21    25     30     34     40

Missing Not at Random – Inverse Relationship
           Weak     90       2.46       -0.095    92    91    90    89     88     87     86
           Weak     80       1.64       -0.095    83    81    79    78     77     74     73
           Weak     70       1.10       -0.095    73    71    70    67     65     64     61
           Weak     60       0.66       -0.095    64    61    60    57     54     53     50
           Weak     50       0.24       -0.095    54    51    49    47     44     42     38
           Weak     40      -0.16       -0.095    44    42    39    37     34     32     30
           Weak     30      -0.60       -0.095    33    31    29    27     25     24     22
           Strong   90       2.85       -0.225    93    92    89    87     85     82     77
           Strong   80       2.02       -0.225    86    83    80    76     71     65     62
           Strong   70       1.47       -0.225    77    74    69    64     59     53     48
           Strong   60       1.01       -0.225    69    64    58    53     47     43     35
           Strong   50       0.58       -0.225    59    53    48    42     36     32     26
           Strong   40       0.17       -0.225    49    43    37    33     28     23     19
           Strong   30      -0.28       -0.225    38    33    28    24     20     16     14

Note: The % rate of documentation at the patient level is determined by the specified missing data models (i.e. intercepts and beta coefficients).

Collectively, we simulated five models of NIHSS documentation: one MCAR model and four MNAR models (direct-weak, direct-strong, inverse-weak, inverse-strong). Each model was repeated to reflect seven overall rates of NIHSS documentation, 30% to 90% by 10%, plus a fully documented dataset. Finally, to determine the impact of hospital volume, we set hospital patient volumes at n=100, 300, and 500. In total, there were 5 missingness models x 8 documentation rates x 3 hospital volumes = 120 simulations, with S=500 samples per simulation of N=100 hospitals. In each permutation of missingness pattern, documentation rate, and hospital volume, hospitals were identified as "observed" outliers from their estimated hospital random intercepts, and were rank-ordered based on calculated risk-standardized mortality rates (RSMRs). The details on hospital outlier identification and RSMR profiling are outlined below.
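The full factorial design can be organized as a simple nested loop. A sketch follows; the `simulate_once` wrapper is hypothetical, standing in for the generation, case-deletion, and profiling steps of this chapter, and would return the estimated RSMRs for the N=100 hospitals in one replicate.

```python
MECHANISMS = ["MCAR", "direct-weak", "direct-strong", "inverse-weak", "inverse-strong"]
DOC_RATES = [0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 1.00]
VOLUMES = [100, 300, 500]

def run_grid(simulate_once, n_reps=500):
    # 5 mechanisms x 8 documentation rates x 3 volumes = 120 scenarios,
    # each replicated S = 500 times
    results = {}
    for mech in MECHANISMS:
        for rate in DOC_RATES:
            for vol in VOLUMES:
                results[(mech, rate, vol)] = [
                    simulate_once(mech, rate, n_hospitals=100, n_patients=vol)
                    for _ in range(n_reps)
                ]
    return results
```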
Section 4 – Hospital Profiling Methodology

At this point, we have generated datasets with fully observed NIHSS, and datasets with NIHSS missing under different mechanisms. These datasets were then used to profile hospital-level RSMRs as is done with real-life datasets, by including only patients with complete documentation of NIHSS. We utilized the hospital profiling methodology employed by CMS to calculate hospital 30-day ischemic stroke RSMRs, which employs a hierarchical logistic regression model.19,33 The hospital RSMRs were obtained as the ratio of predicted (P) to expected (E) mortality – the P/E ratio – multiplied by the overall unadjusted mortality rate (~15% for 30-day ischemic stroke mortality). The numerator of the P/E ratio is the predicted mortality in each hospital, given its case-mix and its hospital-specific deviation in mortality (i.e. hospital random intercept). The denominator of the P/E ratio is the expected mortality in that hospital, given the same case-mix, if it had the mortality of the average hospital (i.e. a hospital random intercept equal to 0).19,32 Hence, the predicted number is the number of deaths expected in that "specific" hospital.52 A P/E ratio of >1 represents poorer hospital performance than expected, and a P/E ratio of <1 represents better hospital performance than expected. The P/E ratio was then multiplied by the overall 30-day mortality rate (15%) to produce the hospital RSMR, which was subsequently rank-ordered from lowest (#1) to highest (#100) in each simulation scenario.
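A minimal sketch of this P/E calculation for a single hospital, assuming fitted quantities from the hierarchical logistic model; the function and variable names are illustrative, not from the CMS specification.

```python
import numpy as np

def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-x))

def hospital_rsmr(trs_j, b0_hat_j, beta0, beta_trs, overall_rate=0.15):
    """P/E ratio for one hospital, multiplied by the overall mortality rate.

    trs_j: risk scores of the hospital's patients with documented NIHSS
    b0_hat_j: the hospital's estimated random intercept from the HLM
    beta0, beta_trs: fitted fixed-effect intercept and risk-score coefficient
    """
    # Numerator: predicted deaths using the hospital's own random intercept
    predicted = expit(beta0 + beta_trs * trs_j + b0_hat_j).sum()
    # Denominator: expected deaths for the same case-mix at an average
    # hospital (random intercept = 0)
    expected = expit(beta0 + beta_trs * trs_j).sum()
    return (predicted / expected) * overall_rate
```

Rank-ordering the resulting 100 RSMRs from lowest to highest gives the observed performance ranking compared against the true ranking below.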
Section 5 – Assessments of Profiling Accuracy Using the Simulated Data

The primary assessment of this study is the accuracy of profiling (i.e. hospital RSMR rank order) under different scenarios of missing data and hospital volume. Hospital RSMR rank order is the primary method of profiling used in the CMS Hospital Value-Based Purchasing Program (HVBP).6 We determined accuracy in three different ways. First, we estimated the correlation between the true hospital rank order (as defined during data generation) and the observed rank order (as defined by the estimated RSMRs). Spearman rank correlation and Pearson correlation coefficients were both estimated between the true and observed performance rank order in each scenario of missing NIHSS data. This approach estimates profiling accuracy on a continuous scale, as opposed to the binary categorization used in the next assessment. Again, because we know each hospital's "true" performance, these correlations assess the validity of the RSMRs to accurately rank-order hospitals. These data were generated for each scenario of missing NIHSS data, stratified by hospital stroke volume.

Second, we assessed accuracy based on the ability of the HLM to correctly identify high and low hospital performers on mortality. We defined high/low performing hospitals as being in the top or bottom 5th percentile of rank order (i.e. 10% high/low performer prevalence) and 20th percentile of rank order (i.e. 40% high/low performer prevalence). These categorizations of performance have been frequently used in previous research.53,108,109 A hospital is considered a true high/low performing hospital if its rank-ordered, assigned random intercept is in the top/bottom 5th or 20th percentiles. We compared the true high/low performer status with the rank-ordered RSMRs, which were similarly categorized. Because we simulated "true" performance, we are able to calculate the sensitivity (Se), specificity (Sp), and predictive value positive (PVP) and negative (PVN) of the HLM to correctly identify high/low performers. Sensitivity represents the ability of the model to correctly classify a hospital as a high/low performer, given that it is in fact a true high/low performer. Specificity refers to the model's ability to correctly classify non-high/low performer hospitals, given that they are not high/low performers. The predictive value positive of the model represents the proportion of hospitals classified as high/low performers by the model which are known to be high/low performers. Conversely, the predictive value negative is the proportion of hospitals classified by the model as non-high/low performers which are known not to be high/low performers. These calculations (Table 4.5) were generated for each scenario of missingness and stratified by hospital stroke volume. We plotted the average Se, Sp, PVP, and PVN over all 500 replications for each scenario of missing data, stratified by hospital stroke volume.

Table 4.5. Calculations for sensitivity (Se), specificity (Sp), and predictive value positive (PVP) and negative (PVN) for true vs. observed high/low performer classification.

                                        True High/Low Performer Status*
Observed High/Low Performer Status†     Yes                   No                    Calculation
Yes                                     True Positive (A)     False Positive (B)    PVP = A / (A+B)
No                                      False Negative (C)    True Negative (D)     PVN = D / (C+D)
Calculation                             Se = A / (A+C)        Sp = D / (B+D)

Note: High/low performers were defined as being in the top/bottom 5th percentile or top/bottom 20th percentile of rank-ordered performance
* Determined from the assigned hospital random intercept in the data generation step (i.e. true performance)
† Determined by the estimated hospital RSMR from the HLM (i.e. observed performance)

Lastly, we estimated the average absolute change in rank-order position relative to each hospital's true rank position in each scenario of prevalence and mechanism of missing NIHSS data. In each sample (S=500), we calculated the absolute difference between the true hospital ranking and the observed hospital ranking from the rank-ordered RSMRs in each scenario of missing NIHSS. Next, hospitals were categorized by quintile of their true hospital ranking (i.e. 1-20, 21-40, 41-60, 61-80, and 81-100). In each quintile, we calculated the average absolute difference between the true and observed hospital rankings for each scenario of missing NIHSS data, averaged over the S=500 samples. We then plotted the average absolute difference between true and observed rankings for each prevalence and mechanism of missing NIHSS data, stratified by true quintile ranking and hospital stroke volume (n=100, 300, and 500).
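The three assessments can be sketched as follows. This is an illustration only: the observed ranks here come from a noisy stand-in, whereas in the actual simulations they come from the RSMRs estimated by the HLM.

```python
import numpy as np
from scipy.stats import spearmanr

# Illustrative stand-ins: true performance plus noise in place of fitted RSMRs
rng = np.random.default_rng(0)
true_perf = rng.normal(size=100)
obs_perf = true_perf + rng.normal(scale=0.5, size=100)
true_rank = true_perf.argsort().argsort() + 1  # ranks 1..100
obs_rank = obs_perf.argsort().argsort() + 1

# 1. Continuous-scale accuracy: Spearman rank correlation
rho, _ = spearmanr(true_rank, obs_rank)

# 2. High/low performer classification at a percentile cut (0.05 or 0.20)
def classification(true_rank, obs_rank, pct=0.05):
    n = len(true_rank)
    k = int(round(n * pct))
    true_out = (true_rank <= k) | (true_rank > n - k)  # true high/low performers
    obs_out = (obs_rank <= k) | (obs_rank > n - k)     # flagged by the model
    a = np.sum(true_out & obs_out)    # true positives
    b = np.sum(~true_out & obs_out)   # false positives
    c = np.sum(true_out & ~obs_out)   # false negatives
    d = np.sum(~true_out & ~obs_out)  # true negatives
    return {"Se": a / (a + c), "Sp": d / (b + d),
            "PVP": a / (a + b), "PVN": d / (c + d)}

# 3. Average absolute rank change within each quintile of true ranking
quintile = (true_rank - 1) // 20                 # 0..4 for ranks 1-20 ... 81-100
abs_change = np.abs(true_rank - obs_rank)
mean_change = [abs_change[quintile == q].mean() for q in range(5)]
```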
Aim 3 – Results

Accuracy of Hospital RSMR Rank-Order

The Spearman rank correlations between true and estimated hospital performance can be seen in Figure 4.3. Note that the mechanism of missing NIHSS data did not have an important effect relative to the effect of sample size, as dictated by hospital stroke volume and NIHSS documentation. In low stroke volume hospitals, the Spearman rank correlation coefficient between assigned and estimated random intercepts was moderate at best (ρ=0.72) when NIHSS was fully documented. As documentation decreased, the correlation fell to between ρ=0.52 and ρ=0.47, depending on the mechanism of missing NIHSS data. While some variation between mechanisms of missing NIHSS data was observed, at any given level of documentation the differences in correlations between mechanisms were at most 0.05. In moderate stroke volume hospitals, correlation was as high as ρ=0.87, but also fell as documentation was reduced. However, even at the lowest levels of documentation, there was moderate correlation between hospital random intercepts (ρ=0.70). In large stroke volume hospitals, correlation between rankings was high (ρ>0.80) in most scenarios of missing NIHSS data. Pearson correlation coefficients can be seen in Figure B.1, and were almost identical.

Figure 4.3. Spearman rank correlation coefficients between true rankings and RSMR rankings as NIHSS documentation increases under different mechanisms of missing NIHSS data. Results are stratified by hospital stroke volume.

Accuracy of High/Low Performer Classification

In general, as documentation increases, the number of true positives and true negatives increases, and the number of false negatives and false positives decreases (thus both Se and Sp increase). As hospital stroke volume increases, the number of true positives and negatives also increases, and the number of false positives and negatives decreases. There are no substantial differences in classification between mechanisms of missing NIHSS data.

Figure 4.4. Sensitivity of the HLM to classify hospitals as high/low performers based on top/bottom 5th (solid lines) and 20th (dashed lines) percentiles of mortality rank order as documentation of NIHSS increases under different mechanisms of missing NIHSS data. Results are stratified by hospital stroke volume.

The sensitivity of the hierarchical logistic regression model to classify hospitals as high/low performers according to estimated RSMRs, given that they are truly high/low performers, can be seen in Figure 4.4. As documentation of NIHSS increases, sensitivity increases. Also, sensitivity was substantially higher when classifying high/low performing hospitals based on the top/bottom 20th percentiles compared to the top/bottom 5th percentiles. It should be noted that sensitivity was never greater than 80% in any scenario of missing NIHSS data or hospital volume. Notably, even when documentation was complete, sensitivity in low volume hospitals was still worse than that of moderate and high volume hospitals at the lowest level of NIHSS documentation (30%). Differences in sensitivity between mechanisms of missing data were modest at each level of NIHSS documentation and hospital volume (<5%).

Figure 4.5. Specificity of the HLM to classify hospitals as non-high/low performers based on top/bottom 5th (solid lines) and 20th (dashed lines) percentiles of mortality rank order as documentation of NIHSS increases under different mechanisms of missing NIHSS data. Results are stratified by hospital stroke volume.

Figure 4.5 illustrates the specificity of the hierarchical model to identify non-outlier hospitals (i.e. not high/low performers). In contrast to sensitivity, the specificity of the HLM is much higher when classifying hospitals in the middle 90% (i.e., outliers are defined as the top/bottom 5th percentiles), and lower when using the middle 60% (i.e., outliers are defined as the top/bottom 20th percentiles). When classifying hospitals in the top/bottom 5th percentiles, specificity was greater than 90% in all combinations of documentation and hospital volume, with only modest reductions as documentation fell.
More substantial improvements in specificity were observed as documentation increased when hospitals were classified using the top/bottom 20th percentiles. Again, the mechanism of missing NIHSS data had little importance for specificity compared to the effect of sample size, as defined by hospital volume and NIHSS documentation, although differences between mechanisms were greater when classifying hospitals based on top/bottom 20th percentiles compared to 5th percentiles.

Figure 4.6. Predictive value positive of the HLM to classify hospitals as high/low performers based on top/bottom 5th (solid lines) and 20th (dashed lines) percentiles of mortality rank order as documentation of NIHSS increases under different mechanisms of missing NIHSS data. Results are stratified by hospital stroke volume.

Figure 4.7. Predictive value negative of the HLM to classify hospitals as non-high/low performers based on top/bottom 5th (solid lines) and 20th (dashed lines) percentiles of mortality rank order as documentation of NIHSS increases under different mechanisms of missing NIHSS data. Results are stratified by hospital stroke volume.

Figures 4.6 and 4.7 show the predictive value positive (PVP) and negative (PVN) of the HLM to classify high/low performers, respectively. Patterns and values of PVP were similar to those obtained for sensitivity; because we categorized high/low performance based on rank-order cutoffs, the numbers of false positives and false negatives are essentially the same. The same can be said for the similarity between PVN and specificity. Briefly, PVP was greater when classifying hospitals based on top/bottom 20th percentiles compared to 5th percentiles, due to the greater prevalence of high/low performers. Consequently, PVN was lower when classifying hospitals based on top/bottom 20th percentiles compared to 5th percentiles. As documentation of NIHSS increased, significant improvements in PVP and PVN were observed. PVP and PVN were highest in high volume hospitals and lowest in low volume hospitals. Again, the mechanism of missing NIHSS data had only a modest impact on PVP and PVN. The average hospital high/low performer classification (i.e. true/false positives, true/false negatives) for each prevalence and mechanism of missing NIHSS, stratified by hospital stroke volume, can be seen in Table A.1 (top/bottom 5th percentiles) and Table A.2 (top/bottom 20th percentiles).

Absolute Change in Hospital RSMR Rankings

Figure 4.8 shows the estimated magnitude of absolute change in observed rankings relative to the true (known) rankings, stratified by quintile of true hospital ranking. In general, the mechanism of missing NIHSS data did not have an effect, except at the lowest rates of NIHSS documentation. In low stroke volume hospitals, observed rankings of hospitals in the lowest and highest quintiles of true ranking changed as much as 25 positions on average when documentation was 30%. When documentation of NIHSS was complete, rankings of hospitals in the lowest and highest quintiles still changed as many as 14 positions on average. It should be noted that the results in Figure 4.8 are symmetrical, in that they are the same for the 1st and 5th quintiles and for the 2nd and 4th quintiles. Low volume hospitals in the second and fourth quintiles of true ranking changed on average 24 and 18 positions when documentation was 30% and 100%, respectively.
Similar patterns were observed in moderate and large stroke volume hospitals, but the average change was smaller compared to low stroke volume hospitals. At most, moderate volume hospitals changed 14 to 25 positions on average in the lowest and highest quintiles of true ranking when documentation was at 30%, and changed only 8 positions on average when NIHSS was fully documented. In large stroke volume hospitals, the average difference between true and observed hospital rankings was no more than 12 positions in the lowest and highest quintiles in any scenario of missing NIHSS data.

Figure 4.8. Average absolute change in hospital RSMR rankings (# of positions) as NIHSS documentation increases under different mechanisms of missing NIHSS data. Results are stratified by hospital size and quintile of true ranking.

Aim 3 – Discussion

In this study, we explored how current methods used to profile hospitals on ischemic stroke mortality are susceptible to inaccuracies when an important risk adjustment variable is missing. We imitated hospital-level rates of NIHSS documentation observed in the Michigan Stroke Registry, and theoretical mechanisms of missing NIHSS data motivated by previous analyses. To understand the importance of hospital stroke volume in our assessment, we conducted simulations with hospital stroke volumes of n=100, 300, and 500 ischemic strokes per hospital. Our main assessments were the ability of current methods to accurately rank-order hospitals according to their estimated risk-standardized mortality rate (RSMR), to correctly classify high/low performing hospitals, and to estimate the average change in hospital RSMRs in the presence of missing data.

Our primary finding was that the mechanism by which NIHSS was missing did not have a meaningful impact on the accuracy of hospital profiling per se, and was trumped by the much larger impact of the sample size determined by the level of NIHSS documentation and hospital size. We hypothesized that when NIHSS documentation was associated with stroke severity, i.e. missing not at random (MNAR), the accuracy of hospital profiling would diminish compared to a missing completely at random (MCAR) mechanism. On the whole, we found that the mechanism of missing NIHSS data did not lead to substantial differences in accuracy. Any observed differences in Se/Sp/PVP/PVN or in the correlation coefficients were less than 5% or less than ρ=0.05, respectively, and the difference in RSMR rank order between mechanisms was less than 4 positions on average in any scenario of missing data. However, the mechanisms with inverse relationships (i.e. as NIHSS score increased, documentation decreased) consistently had lower accuracy. This may be because, under this assumption of missing data, the more severe patients are missing more frequently, so the exclusion of these patients lowers the observed mortality in the hospital. As the rate of mortality and the differences in mortality between hospitals decrease, accurate discrimination between hospitals becomes more problematic. The fact that we did not find the mechanism of missing NIHSS data to be very important could be explained by the modest between-hospital variation in NIHSS (ICC=2%) observed in the MSR. A necessary condition for a variable to have a meaningful effect in a risk adjustment model is that it should vary significantly between hospitals26, and this has yet to be substantiated with regard to NIHSS.
In Chapter 2, we found only modest differences in overall NIHSS at the hospital level. If greater between-hospital variation in NIHSS were observed, the mechanism by which NIHSS is missing might play a larger role.

We found that reduced sample size – whether due to a lower NIHSS documentation rate or low hospital case volume – resulted in poorer profiling accuracy, as depicted by substantially reduced rank correlation and lower sensitivity and specificity. We hypothesize that changes in profiling accuracy based on sample size can be attributed, in part, to changes in the shrinkage of estimated random intercepts in the HLM, which is inversely related to sample size.110,111 Shrinkage is the phenomenon whereby estimated random intercepts in low volume hospitals are "shrunken" toward the mean of all hospitals.12,19,112 This is done because small volume estimates are presumed to be imprecise, and shrinkage accounts for the imprecision by stabilizing these estimates toward the overall mean.112 Because estimated random intercepts are utilized in calculating RSMRs, if there is greater shrinkage in low volume hospitals, subsequent RSMRs will also be "shrunken" toward the overall mortality rate.19,109,111,113

Figure 4.9. Illustrating the effect of shrinkage on RSMR distribution as depicted by range (i.e. minimum/maximum, solid lines), 5th/95th percentiles (dotted lines), and 25th/75th percentiles (dashed lines) of RSMRs. Estimates are the averages of 500 simulations for each of 100 hospitals.

To illustrate this phenomenon, we estimated the range (i.e. minimum, maximum), 5th/95th percentiles, and 25th/75th percentiles of estimated RSMRs for the 100 hospitals, averaged over all S=500 samples (Figure 4.9). The estimates are repeated for each scenario of NIHSS documentation (i.e. 30% to 100% by 10%) and hospital volume (n=100, n=300, and n=500 patients). Because our previous findings did not support a significant role for the missing NIHSS mechanism, here we only illustrate the MCAR mechanism. As sample size decreases, the plausible range of RSMR values decreases. Notably, while there are modest increases in the 25th/75th percentiles as documentation increases, there are much greater gains in the observed range of RSMRs (i.e. minimum and maximum RSMRs) and the 5th/95th percentiles. This illustrates the expansion of the RSMR distribution tails, indicating less shrinkage in the estimated RSMRs.

We believe that shrinkage due to small sample size, either through the NIHSS documentation rate or low hospital volume, is largely driving the reduced accuracy in RSMR profiling. Imagine that we are rank-ordering 100 hospital RSMRs, similar to our simulation methods. When sample size is small, the RSMRs for these hospitals will be more compressed around the overall mortality rate due to shrinkage. Any stochastic or random variability in these hospital RSMRs would lead to greater changes in profiling rank order, because the RSMRs are more closely grouped together. Conversely, when sample size is large, the same number of hospital RSMRs (n=100) are less "shrunken", and so are spread further apart. In this situation the same stochastic or random variability will be less impactful on RSMR rank order because the RSMRs are farther apart. Thus, as sample size is reduced, the accuracy of hospital performance profiling is also reduced.
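The direction of this effect can be seen in the closed-form empirical Bayes estimator from the linear mixed model, offered here only as an analogy (the logistic HLM has no closed form, but its estimates behave similarly):

$$\hat{b}_{0j} = \lambda_j \, \hat{d}_j, \qquad \lambda_j = \frac{\tau^2}{\tau^2 + \sigma^2 / n_j}$$

where $\hat{d}_j$ is hospital $j$'s unshrunken deviation from the overall mean, $\tau^2$ is the between-hospital variance, $\sigma^2$ is the within-hospital variance, and $n_j$ is the hospital's effective sample size. As $n_j$ falls, whether through low case volume or through the deletion of patients with undocumented NIHSS, the shrinkage factor $\lambda_j$ falls toward 0 and the estimate collapses toward the average hospital.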
A study by Silber et al. illustrated the phenomenon of shrinkage in the context of Hospital Compare outlier performance by showing that the hierarchical model frequently underestimates poor performance in small hospitals, whose mortality rates are moved close to the hospital average.112 Small sample size has long been a thorn in the side of hospital profiling.112,114-116 Even when perfect risk adjustment is achieved, at typical clinical case volumes much of the variation in performance measures is due to random noise, especially in centers with low case volumes (e.g. <100 annual ischemic strokes).109 An oft-cited benefit of the hierarchical model is its ability to produce more valid provider-specific estimates for low volume providers.22,27 We illustrate that even in the highest volume hospitals with complete documentation of NIHSS, the HLM approach still misses 2 of 10 hospitals in the top/bottom 5th percentiles of performance (Se=78%), and 3 of 10 (Se=68%) in low volume hospitals. If a broader definition of high/low performer is used (top/bottom 20th percentile), specificity in low volume hospitals becomes equally troubling, with more than 1 in 4 hospitals falsely identified as a high/low performing hospital (Sp=73%).

The hierarchical model assumes that the variation in mortality left after adjusting for case-mix can be attributed to differences in hospital quality.18 This study shows that a substantial amount of random noise unrelated to true hospital performance influences hospital profiling. It is important to note that variation in our simulations cannot be attributed to confounding, because our simulations achieved perfect case-mix adjustment. Until this noise can be accounted for, the accuracy of hospital profiling will remain suspect.

How we should interpret these findings relative to current policies regarding hospital profiling methods is less clear. Low volume hospitals have frequently been shown to have poorer patient outcomes in ischemic stroke117-119 and other clinical contexts.120,121 As such, profiling methods should be robust enough to accurately capture performance outliers in small sample size scenarios. We also showed that as the definition of high/low performer is expanded to include more hospitals, sensitivity and PVP increase, but at the expense of reduced specificity and PVN. How hospitals are classified as high/low performers directly affects model sensitivity and specificity. The cost of identifying more false positives or false negatives depends on one's viewpoint as a healthcare provider or consumer, and no correct answer exists.122 For a patient or payer, such as CMS, it may be more beneficial to identify all the truly poor performing hospitals, at the risk of falsely identifying average or good performing hospitals. On the other hand, hospitals may lose much-needed financial reimbursements or be unfairly stigmatized if they are incorrectly labeled as poor performers. Ultimately, both providers and consumers must be made aware of the limitations of current profiling methods to facilitate better interpretation of hospital profiling results.

There are some limitations and caveats to our study that should be considered. First, we simulated a 30-day mortality rate in our analysis, even though we did not have data on 30-day outcomes. However, current datasets which capture 30-day outcomes do not collect measures of stroke severity, so utilizing data with 30-day outcomes was not possible unless directly linked to administrative data.
With ICD-10 codes set to include NIHSS, evaluation of hospital profiling methods using administrative data which includes both 30-day outcomes and stroke severity could be conducted in the future. Second, we did not obtain bootstrapped standard errors and 95% confidence intervals of individual hospital RSMRs to assess the accuracy of identifying statistical outliers, which is the approach used in Hospital Compare.12 Future work should test the accuracy of performance outlier identification using this method. Third, we did not compare our findings with the diagnostic ability of the currently proposed CMS risk adjustment model, which is based on administrative data and does not include NIHSS.18 A direct comparison would help illustrate the benefits and limitations of the current CMS risk adjustment model relative to a model that includes NIHSS with various amounts of missing data. Fourth, while the models we specified to replicate missing NIHSS data were motivated by our analyses in Chapters 2 and 3, assessing the impact of missing data mechanisms relies on correct specification of the missing data model, which cannot be known with certainty. Fifth, in imputing the total risk score for individual patients, we assumed a linear relationship between the patient NIHSS component and the non-NIHSS variables (i.e. the sub-risk score). This relationship may not be accurately captured, and should be validated using other data sources. Finally, our simulation parameters were based on a hospital sample which did not have substantial variation in severity between hospitals (ICC = 2.2%). Future studies should assess how profiling accuracy is impacted when greater variation in stroke severity between hospitals is present, even though it remains unclear how much variation in severity actually exists.

In conclusion, the accuracy of hospital profiling of ischemic stroke mortality is in large part a reflection of the sample size used to calculate hospital-level estimates, and sample size is influenced by both documentation rates of key risk adjustment variables and hospital case volume. Our simulation work shows that a mechanism of NIHSS missingness which is associated with severity (MNAR) has only a minimal impact on hospital profiling accuracy. However, even when NIHSS was completely documented, significant limitations in the accuracy of current methods used to profile hospitals were evident, especially in low volume hospitals. This study is innovative because it quantifies how much less accurate profiling becomes as missing data proliferates, and how accuracy interacts with hospital case volume. It also has the advantage that, by using simulation methods, we were able to determine the true ranking of hospital performance with certainty and had no residual confounding by case-mix.

CHAPTER 5: DISCUSSION AND FUTURE DIRECTIONS

The overall aim of this study was to quantify the accuracy of hospital profiling when an important risk adjustment variable is missing. Specifically, using simulation-based methods, we investigated how hospital profiling based on ischemic stroke mortality is impacted when a strong predictor of mortality56, stroke severity (i.e. NIHSS), is frequently undocumented.53,54,56 Furthermore, we investigated how the mechanism by which NIHSS is missing impacts profiling accuracy, and how our findings are modified by hospital ischemic stroke volume.
To test the underlying hypothesis that ischemic stroke patients with NIHSS documented are not a random sample of all patients, we conducted a series of analyses to identify patient- and hospital-level characteristics associated with NIHSS documentation in an existing clinical stroke registry (the Michigan Stroke Registry). Additionally, we utilized the Heckman Selection Model as a diagnostic tool to assess the presence and magnitude of selection bias in the clinical registry.

Summary of Findings

Our analysis of the Michigan Stroke Registry (MSR) revealed a number of important findings. In Chapter 2, we found that, at the patient- and hospital-level, patients with less severe strokes were less likely to have NIHSS documented. Beyond that, we found that documentation of NIHSS was a reflection of overall hospital-level documentation. Roughly a quarter of the variation in documentation was attributed to the hospital in which the patient was treated (ICC=25%). To illustrate the scale of this hospital-level variability, ICCs associated with hospital-level mortality and readmission measures are typically below 5%.23,24,123,124 This indicates that NIHSS documentation has both patient-level and hospital-level attributes, but documentation was not found to be accounted for by hospital characteristics such as annual stroke volume or Joint Commission primary stroke center status, due to a lack of power at the hospital level. Notably, patients whose stroke symptoms had resolved by arrival to the ER had one tenth the odds of NIHSS documentation compared to patients who were still symptomatic upon arrival. Assuming that the absence of stroke symptoms is recorded with accuracy for these patients (who make up 6.5% of the registry), it would be reasonable to assume that these patients had an NIHSS of 0, which could be imputed into current registries with some confidence. We also found that patients who were administered tPA had higher rates of NIHSS documentation compared to non-tPA patients, which has been previously suggested.66 If patients were not administered tPA because they missed the window for treatment, they may have worse outcomes compared to patients who received tPA. Thus, excluding these patients because they are missing NIHSS may also bias hospital-level estimates of mortality.

When we applied the Heckman Selection Model to the same MSR data in Chapter 3, we found, as expected, evidence of selection bias in patients with documented NIHSS, although it was rather modest (correlation coefficient: ρ = 0.11). The positive correlation also indicates that as NIHSS increases (i.e. strokes are more severe), the probability of NIHSS documentation also increases. We repeated the analysis in time periods with lower (documentation = 67% in 2009-2010) and higher (documentation = 87% in 2010-2012) rates of NIHSS documentation to assess the impact of the prevalence of missing NIHSS data. Selection bias increased marginally when documentation was lower; conversely, when documentation was higher, selection bias decreased. Together, these analyses support the hypothesis that patients with documented NIHSS are not simply a random sample of all stroke patients at the patient- or hospital-level, and that hospital-level estimates using this sample may consequently be biased. How selection bias at the patient level translates to bias in hospital-level estimates is not clear, and this is what originally motivated our study.
We employed computer simulations to estimate the accuracy of hospital profiling based on ischemic stroke mortality under various mechanisms and prevalences of missing NIHSS data. Simulations were essential in this instance because they allowed us to assign a known (true) hospital-level mortality performance, which is impossible to determine in real-world conditions.29,82,108,109 Since true hospital performance is known, we can measure the diagnostic accuracy of profiling, using measures such as sensitivity, specificity, and predictive value positive and negative, by comparing true hospital performance with the performance estimated using current hospital profiling methods under various scenarios of data documentation.

There are some other benefits of computer simulations that should be noted. One benefit is that we employed risk adjustment models which were not subject to inadequate risk adjustment.108,109 This is because the fitted risk adjustment model was identical to the model used in the data generation process. Consequently, any hospital misclassification cannot be attributed to residual confounding from unmeasured case-mix differences, but only to random variation. Simulations also allow a variety of scenarios to be developed in order to explore the modifying effect of other variables (such as hospital volume), and are ideal for conducting sensitivity analyses of underlying parameters and assumptions.125 However, simulation studies can be difficult to understand, which can lead to confusion when interpreting results and drawing correct conclusions.126 They also rely on correct assumptions about real-world data, which should be justified at each step.125

The results from our simulation studies in Chapter 4 can be succinctly summarized as follows: 1.) the mechanism by which NIHSS is missing (i.e., MCAR, MNAR) plays only a minor role in the accuracy of profiling; 2.) because of its effect on sample size, the NIHSS documentation rate (where cases with missing NIHSS data are deleted) has a substantial impact on the accuracy of profiling; and 3.) the relationship between NIHSS documentation and profiling accuracy was exacerbated by hospital ischemic stroke volume. In sum, the mechanism by which NIHSS is missing is not as important in the context of profiling accuracy as the amount that is missing and the size of the hospitals in which it is missing. This study illustrates fundamental limitations of the profiling method by showing how the underlying sample size has a profound effect on the accuracy of performance profiling.

The first assessment in Chapter 4 was the ability of the hierarchical model to accurately estimate hospital rank order. We compared the rank order of true hospital performance to the estimated rank order generated from the RSMR estimates. This assesses the accuracy of profiling on a continuous scale, as opposed to the subsequent assessments, which dichotomized hospitals as either outliers (i.e., high/low performers) or non-outliers based on arbitrary cut points. We found that in moderate and high volume hospitals, correlation between the true and observed rankings was generally high (>0.80). But as documentation of NIHSS decreased, correlation between rankings also decreased, more markedly in moderate-sized hospitals. With perfect documentation, correlation between rankings in low volume hospitals was moderate (ρ=0.72), but dropped to almost ρ=0.50 when documentation was reduced to 30%.
The mechanism of missing NIHSS had a negligible effect on the correlation coefficients.

The next assessment in Chapter 4 was the ability of the hierarchical logistic model to correctly classify high/low performing hospitals, based on the true and estimated performance rank order. Two definitions of outlier hospitals were used: top/bottom 5th percentile and top/bottom 20th percentile hospitals. We found that, in general, as documentation of NIHSS was reduced, the model sensitivity, specificity, PVP, and PVN were all reduced. There was little variation in these measures between mechanisms of missing NIHSS at a given level of documentation and hospital volume. Sensitivity was never higher than 80% in any scenario and, as expected, was much higher when categorizing hospitals in the top/bottom 20th percentiles compared to the top/bottom 5th percentiles, because it is easier to classify hospitals as high/low performers when the category is defined more broadly. Conversely, specificity was much higher when categorizing hospitals into the top/bottom 5th percentiles. Again, the mechanism of missing NIHSS data had only modest effects. Similar effects were observed for PVP and PVN.

Our final analysis in Chapter 4 assessed the magnitude of change between true performance rankings and rankings based on calculated hospital risk-standardized mortality rates (RSMRs). We found that observed performance rank order (which ranged from 1 to 100 in each simulation) could change significantly compared to the true performance rank order, and this was especially evident in low volume hospitals. Changes in rankings between different mechanisms of missing NIHSS data were again only modest or almost non-existent. Even with perfect NIHSS documentation and perfect case-mix adjustment, hospitals in the top and bottom quintiles of true performance rankings changed on average 13 positions. As documentation of NIHSS was reduced, the average difference between observed and true performance rank order increased to almost 24 positions for low volume hospitals in the top (1-20) and bottom (81-100) quintiles of true performance rankings. While changes in position were not as volatile in moderate and high volume hospitals, hospitals still changed on average at least 5 positions in the top and bottom quintiles of true performance rankings. Again, these findings illustrate that random noise remaining after risk adjustment negatively impacts hospital profiling, especially when sample size is low, due to shrinkage of RSMR point estimates toward the mean.

Previous work in the GWTG-Stroke population showed that including NIHSS in risk adjustment improved model fit and reclassified a significant proportion of hospitals.53 However, more than half of ischemic stroke patients in GWTG-Stroke were excluded from this analysis because they did not have NIHSS documented. We showed that at this rate of NIHSS documentation, hospital RSMR rankings could change on average 9-16 positions in high volume hospitals, 12-18 positions in moderate volume hospitals, and 20-23 positions in low volume hospitals due to random variation alone. Given the great degree of inaccuracy at this level of reporting, significant changes in rankings are not unexpected.

Limitations

There are several limitations of this study. First, our analysis used the Michigan Stroke Registry (MSR), which has data on a limited number of hospitals and may not be representative of all stroke patients. A greater proportion of MSR patients go to teaching hospitals (93% vs.
61%) and Joint Commission primary stroke center hospitals (78% vs. 65%) compared to patients in the national GWTG-Stroke registry.96 Thus, patients in the MSR may be more similar to each other than what may be seen in the GWTG-Stroke registry, and are likely different from patients treated at all US hospitals. A repetition of our simulations using parameters estimated from a more comprehensive dataset, such as the national GWTG-Stroke registry data linked with Medicare claims data, would be useful in generalizing our results to data used in CMS pay-for-performance schemes. Access to Medicare claims data would also allow for a direct comparison with the risk adjustment model currently proposed to profile hospitals on ischemic stroke 30-day risk-standardized mortality, which was not done in this study.18 Linking Medicare claims data to GWTG-Stroke registry data may also allow for an evaluation of models with and without NIHSS on the proposed 30-day risk-standardized readmission measure for ischemic stroke.

With regard to our simulations, there are other limitations to consider. First, we simulated variation in patient- and hospital-level risk of mortality which reflects data observed in the MSR. However, this variation was not substantial (ICC = 2.2%), and may not reflect what is observed in most hospitals. Although this between-hospital variation in mortality is small, it is consistent with prior estimates in the literature, which are typically <5%.23,24,123,124 Additional simulations should be conducted to reflect greater between-hospital variation in risk, which may have important consequences for our findings. We also did not examine the accuracy of profiling as reported by the Hospital Compare program, which identifies hospitals with better- or worse-than-expected mortality rates based on a statistical test of the estimated RSMRs relative to the average hospital.12 Future work should examine how missing data impacts the accuracy of statistical outlier identification as used by the Hospital Compare program. However, a previous study has already shown that the methods used in Hospital Compare to identify outlier hospitals significantly underestimate poor performance in low volume hospitals due to the shrinkage phenomenon.112 Furthermore, the missing not at random (MNAR) mechanisms used in the simulations were motivated by findings in Chapters 2 and 3, but may not represent the actual missing data mechanism. Additional mechanisms, such as bimodal mechanisms or mechanisms related to other important covariates, should be explored to complement our analyses. However, our findings suggest that the mechanism by which data are missing may have minimal impact on performance profiling.

Including NIHSS in Risk Adjustment Models for Stroke Performance Measures

Advocates for including NIHSS in risk adjustment models for ischemic stroke performance measures will be energized by its addition to ICD-10 coding in administrative data.127 Given its importance in patient-level outcome prediction56, the enthusiasm is warranted. However, including it in risk adjustment models for hospital-level estimates of performance should be approached with caution, because it is frequently undocumented in clinical registries. How complete documentation of NIHSS will be in ICD-10 is unknown. But documentation of NIHSS has been improving in clinical registries, such as the Get With The Guidelines – Stroke national registry, where in recent years it has been as high as 70%.
It is likely that hospitals participating in clinical registries such as GWTG-Stroke represent a more engaged and trained subset of hospitals, and a concerted effort has been made by the GWTG-Stroke program to improve NIHSS documentation in participating hospitals. Hence, it may be unreasonable to expect that NIHSS documentation in hospitals not involved in such programs would achieve levels similar to those seen in more recent years of GWTG-Stroke. Since our study has shown that hospital-level documentation of NIHSS is a significant driver of patient-level documentation, and has a tremendous impact on the accuracy of ischemic stroke hospital profiling, eagerness to include NIHSS in risk adjustment should be tempered until NIHSS documentation has increased to an acceptable level, such as 80% or greater.

Our findings also showed that hospital-level NIHSS did not vary substantially between hospitals in our sample. Little hospital-level variation in NIHSS was also illustrated in a study of VA hospitals.58 The rationale for the addition of NIHSS as a risk adjustment variable is weakened if it does not vary sufficiently between hospitals to warrant inclusion.26 However, both the VA study sample and our study sample may not be representative of most hospitals. While it has been suggested that hospitals which see more severe strokes – such as tertiary referral centers or Joint Commission primary stroke centers – may be at greater risk of misclassification if stroke severity is not included in risk adjustment26,48,59, little evidence has been presented to support that claim. Further research should investigate the amount of between-hospital variation in stroke severity. Since sufficient between-hospital variation in patient-level variables is a prerequisite for inclusion in risk adjustment models, understanding the extent of between-hospital variation may help guide decisions about the need to include stroke severity in models for ischemic stroke mortality and readmissions. Analysis should also be done to understand whether variation is driven by hospital-level characteristics, such as tertiary referral center or Joint Commission primary stroke center status. These characteristics may be able to serve as proxies for stroke severity, which are easier to obtain than measures of stroke severity on every patient.

Simulation studies could be used to assess how modifying the variation in case-mix at the hospital level – particularly as it pertains to stroke severity – affects the accuracy of hospital profiling. The variation in case-mix in our simulations reflected observed differences in the MSR, but altering the parameters of our simulation would allow us to investigate the impact of greater variation in case-mix between hospitals. This could be achieved in two ways: 1.) by increasing the overall amount of variation in hospital-level case-mix, and 2.) by increasing the proportion of case-mix variation which can be attributed to the hospital level (i.e. the intraclass correlation of case-mix). This analysis would illustrate how the presence of missing NIHSS data impacts hospital profiling when greater disparities in case-mix between hospitals are present.

Critique of Current Profiling Methodologies

The analysis presented here highlights important drawbacks to current methods of hospital profiling in general. Pay-for-performance models assume that profiling methods can accurately compare hospitals on predetermined performance measures after accounting for patient case-mix.19,20
However, a growing body of literature suggests that current profiling methods are inadequate. Low sample size is a well-documented limitation of hospital profiling114-116, which is especially problematic in the context of stroke, given that low volume settings have been shown to have higher rates of mortality in ischemic stroke117-119 and in other clinical applications, such as surgical outcomes.120,121,128 Simulation studies have found that the accuracy of hospital report cards at case volumes typically seen in clinical settings is low, and deteriorates further in lower case volume hospitals.108,109

Our analysis echoed these concerns, showing that profiling accuracy is inextricably linked with provider sample size. By any measure of accuracy, the estimated RSMRs from the HLM used to profile hospitals become less accurate as sample size is reduced, whether through hospital volume or missing data. This is due to the effect of shrinkage of RSMR estimates toward the mean when sample size is small, which was illustrated in Figure 4.9. Even when documentation of NIHSS is 100%, our simulations show serious limitations in the accuracy of current profiling methods. Data in Table 5.1 illustrate the observed Se, Sp, PVP, and PVN across the hospital volumes under the scenario of complete NIHSS documentation. Even in these best-case scenarios, when the definition of high/low performer is strict (i.e. top/bottom 5th percentiles), sensitivity and PVP are quite poor, while specificity and PVN are generally high. If the definition of top/bottom performer is expanded to include more hospitals (i.e. changed from top/bottom 5th percentile to 20th percentile), sensitivity and PVP increase, but at the expense of specificity and PVN.

Table 5.1. Diagnostic ability of the hierarchical logistic model to identify hospital high/low performers when documentation of NIHSS is complete (i.e. no missing NIHSS data), stratified by definition of high/low performer and hospital stroke volume.

                              Top/Bottom 5th Percentiles     Top/Bottom 20th Percentiles
Diagnostic Measure            n=100    n=300    n=500        n=100    n=300    n=500
Sensitivity                   41%      58%      67%          60%      73%      78%
Specificity                   94%      95%      90%          73%      82%      85%
Predictive Value Positive     42%      59%      67%          60%      73%      78%
Predictive Value Negative     94%      95%      96%          73%      82%      85%

Note: Diagnostic measures calculated using data from Table A.1 (top/bottom 5th percentiles) and Table A.2 (top/bottom 20th percentiles)

These findings illustrate the need to apply optimal decision-making theory to guide hospital performance profiling benchmarks, by placing relative values/costs on identifying false positive vs. false negative high/low performers.129,130 By broadening the definition of high/low performer, i.e. changing from top/bottom 5% to 20%, we substantially improved the sensitivity and PVP of the HLM, but at the cost of specificity and PVN. Austin, et al. found that decisions about the significance level used to classify hospitals as performance outliers can lead to outlier designations which are more or less preferable to patients as opposed to providers, based on the values associated with false positive or false negative hospital outliers.122 Decisions should be made regarding which classification is more important, false negatives or false positives, and regarding the potential economic impact of these decisions in the context of pay-for-performance incentive structures.
Given that low hospital case volume will be omnipresent in any hospital profiling scheme, future research should explore solutions to improve the accuracy of performance profiling in these hospitals. In addition to traditional frequentist methods, Bayesian methods can be used to provide further evidence that a hospital may indeed be a performance outlier.20,122,131,132 Longitudinally profiling hospital performance may also be useful, especially as data collection in hospitals becomes routine.133 Further simulation studies may also provide insight into how many years of data should be pooled to profile hospitals accurately, especially in the case of low volume hospitals.

Although we used data from clinical registries for the simulations, our findings are equally relevant when considering the use of administrative data to profile hospitals. Administrative data has previously been shown to lack important prognostic indicators compared to clinical datasets.22,134-136 Even if risk adjustment models are developed in administrative datasets with model fit similar to clinical models, coding and documentation inconsistencies between hospitals can threaten the validity of hospital-level estimates.40 Krumholz, et al. outlined standards for using administrative data to profile hospitals, which specify that data must be sufficiently high-quality and timely.22 We illustrated how hospital-level performance profiling measures can be impacted when data are not sufficiently high quality, as in the case of NIHSS. A greater emphasis should be placed on the quality and completeness of data used in risk adjustment models, whether data from administrative sources or clinical registries are used.

If current profiling methods are going to continue to use administrative data, solutions to missing or miscoded data are needed. Missing data methods such as maximum likelihood estimation or multiple imputation may provide a solution to frequently undocumented or miscoded data. Multiple imputation has been shown to facilitate the identification of provider outlier status, but can be sensitive to the assumptions made about the reasons for missing data.75 Our results show that the amount of missing data is more problematic than the mechanism by which it is missing. Thus, any bias associated with imputing values using missing data methods, especially when data are MNAR, is probably outweighed by the gain in sample size. Future research could use simulation methods to compare the accuracy of hospital-level estimates generated using imputation of missing data to those calculated by a complete case analysis that excludes observations with missing data.
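As a hedged sketch of what such an imputation-based comparison might look like, assuming scikit-learn's IterativeImputer with posterior sampling as the imputation engine and Rubin's rules for pooling (none of this is from the source):

```python
import numpy as np
# enable_iterative_imputer must be imported to expose IterativeImputer
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

def pool_rubin(estimates, variances):
    """Pool m point estimates and within-imputation variances (Rubin's rules)."""
    m = len(estimates)
    q_bar = np.mean(estimates)             # pooled point estimate
    u_bar = np.mean(variances)             # average within-imputation variance
    b = np.var(estimates, ddof=1)          # between-imputation variance
    return q_bar, u_bar + (1 + 1 / m) * b  # pooled estimate, total variance

# X: hypothetical patient covariate matrix with NaN where NIHSS is undocumented.
# Each posterior-sampled imputation yields one completed dataset; refit the HLM
# on each, collect a hospital's estimate and its variance, then pool:
# completed = [
#     IterativeImputer(sample_posterior=True, random_state=s).fit_transform(X)
#     for s in range(20)
# ]
```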
As mentioned in Chapter 1, mortality-based performance measures already suffer from a plethora of limitations, including the inability to accurately discriminate between "good" and "bad" hospitals39, the sensitivity of RSMRs to risk adjustment model specification,41-43 and the fact that few deaths in hospitals may actually be preventable.44,45 Additionally, low variability in hospital-level performance measures between providers has also been shown to reduce the accuracy of hospital performance classification.137 In Hospital Compare – the Medicare hospital performance reporting system12 – hospitals in the top or bottom tier of performance have been shown to be not statistically different from at least one hospital in the middle tier of performance, suggesting that side-by-side comparisons using publicly reported performance profiling measures may be misleading to consumers.138 Our research extends this observation by illustrating that profiling accuracy is quite low in many instances, and that calculated RSMRs are subject to a substantial amount of random noise, especially when sample size is low, whether through low hospital volume or through missing data.

Future Directions

We propose a number of future directions as a result of our study, which can be summarized as follows. First, a better understanding of hospital-level variation in stroke severity is needed to assess its utility as a risk adjustment variable for hospital-level performance measures. Simulation studies may also illustrate how much between-hospital variation in stroke severity (leading to case-mix differences) is needed to impact hospital profiling accuracy. Altering the parameters in our simulations to mimic increased case-mix variation is also needed to understand its impact on our findings. Second, obtaining simulation parameters from a more comprehensive dataset, such as the GWTG-Stroke registry, would improve the generalizability of our findings. Linking this with claims data would allow us to compare an NIHSS risk-adjusted model with the current CMS model, obtain simulation parameters associated with 30-day outcomes, and evaluate the accuracy of the 30-day readmission measure (RSRR) as well. Third, if NIHSS is to be included in profiling, missing data methods (e.g. multiple imputation) should be explored to address the problem of missing data and its impact on profiling accuracy. Fourth, simulations should be done to obtain bootstrapped standard errors and 95% confidence intervals for estimated 30-day outcomes of individual hospitals, to evaluate the accuracy of the CMS method for identifying performance outliers as currently employed by the Hospital Compare program.12 Finally, we should explore other statistical methods, such as decision-making theory to guide outlier performance categorization, and Bayesian and longitudinal methods, to assess their utility for accurately profiling hospitals compared to the current HLM method.

Conclusion

In sum, there are significant concerns about the validity and reliability of current profiling methods which should be considered when developing policies that rely on accurate performance comparisons. But in spite of this evidence, healthcare stakeholders, such as CMS, are doubling down on pay-for-performance models which are tied to performance profiling. U.S.
Conclusion

In sum, there are significant concerns about the validity and reliability of current profiling methods, which should be considered when developing policies that rely on accurate performance comparisons. Yet, in spite of this evidence, healthcare stakeholders such as CMS are doubling down on pay-for-performance models that are tied to performance profiling. U.S. Secretary of Health and Human Services Sylvia Burwell announced that by 2018, 90% of Medicare fee-for-service payments will be tied to quality or value,139 which makes it all the more critical to have reliable and valid measures of quality and value. Unless methods to compare hospital performance are improved, a substantial proportion of hospitals could be unfairly punished for performance that may not actually be poor (i.e. low predictive value positive), and hospitals that are providing poor care may go undetected (i.e. low sensitivity or predictive value negative). It is important to note that we are not advocating for the abandonment of pay-for-performance models or hospital profiling, but simply suggesting that the intrinsic limitations of current methods should be recognized, and that further research, such as that outlined in the Future Directions section above, should be conducted to create more robust profiling methodology.

Ultimately, hospital performance profiling should be one method in a larger repertoire of tools to assess hospital quality of care. Healthcare is multidimensional and interdependent, and excelling in every category of hospital quality is important in its own right. While the statistical methodologies used to assess healthcare quality should continue to be improved upon, healthcare providers should strive to improve all aspects of care, rather than focusing on a handful of quality measures. Thomas H. Lee astutely conveyed this notion in a recent editorial140 when he stated: “Reliability matters. Safety matters. Efficiency matters. Patient experience matters. All of these dimensions of performance are intertwined, and interact to define the quality of an institution’s care.”

CHAPTER 6: SUMMARY

Pay-for-performance schemes, which are currently used as a model to improve the quality and value of care, rely on accurate comparisons of hospital performance, i.e. hospital profiling. Proposed measures to profile hospitals on 30-day ischemic stroke mortality and readmissions have been controversial because they lack a measure of stroke severity. The National Institutes of Health Stroke Scale (NIHSS) is a commonly used measure of stroke severity that is highly predictive of patient outcomes; however, it is frequently missing in large-scale clinical databases and is currently completely absent from administrative data. With the announcement that NIHSS is to be included in ICD-10 administrative coding, there will be pressure to include it in risk adjustment models for ischemic stroke outcomes. But if the subsample of patients with documented NIHSS is a biased sample of ischemic stroke patients, there is the potential that hospital-level estimates of mortality may also be biased, though the extent of this bias is unknown.

The main contribution of this study is a quantification of the impact that missing data on an important risk adjustment variable has on the accuracy of hospital profiling. We conclude that the accuracy of hospital profiling is strongly impacted by missing data, although not because the mechanism by which the data are missing is important. Rather, missing NIHSS data affect profiling because they result in a smaller “effective” hospital sample size, which has a much stronger effect on profiling accuracy due to the impact of shrinkage on estimated hospital random intercepts. Moreover, this study also illustrates limitations of current profiling methods, even when perfect documentation and risk adjustment are achieved.
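To make the shrinkage mechanism explicit: under a standard hierarchical model (stated here for the linear case; the logistic model we used behaves analogously), the estimated random intercept for hospital j is shrunk toward zero by a factor that depends directly on the hospital's effective sample size n_j,

\hat{b}_j \approx \lambda_j \, \bar{r}_j, \qquad \lambda_j = \frac{\tau^2}{\tau^2 + \sigma^2 / n_j},

where \tau^2 is the between-hospital variance, \bar{r}_j is hospital j's mean residual, and \sigma^2 / n_j is its sampling variance. As undocumented NIHSS reduces n_j, the shrinkage factor \lambda_j falls toward zero, RSMRs are pulled toward the grand mean, and random noise accounts for a larger share of the remaining between-hospital differences in RSMR.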
It is noteworthy that documentation of NIHSS is driven by a combination of both patient-level and hospital-level factors. Furthermore, we note that hospital-level variation in actual NIHSS scores in our sample of hospitals is not substantial, which, if true, would lessen the rationale for including NIHSS in risk adjustment models. These findings are important when considering covariates to be used in risk adjustment models, as well as the validity of profiling hospitals on ischemic stroke mortality and other hospital-level performance measures.

APPENDICES

Appendix A: Supplementary Tables

Table A.1. Average proportion (%) of hospital high/low performer classification for the top/bottom 5th percentile of rank-order (true positive, false positive, true negative, false negative) for different mechanisms of missing NIHSS data, stratified by hospital stroke volume (n=100, 300, and 500).

                                        Average Hospital High/Low Performer Classification (%)
Mechanism of          NIHSS            n=100                   n=300                   n=500
Missing NIHSS         Doc. Rate   TP   FN   TN    FP       TP   FN   TN    FP       TP   FN   TN    FP
MCAR                   30         2.3  7.7  82.4  7.6      3.9  6.1  83.9  6.1      4.8  5.2  84.8  5.2
                       40         2.8  7.2  82.8  7.2      4.4  5.6  84.4  5.6      5.2  4.8  85.2  4.8
                       50         3.2  6.8  83.2  6.8      4.7  5.3  84.7  5.3      5.6  4.4  85.6  4.4
                       60         3.3  6.7  83.3  6.7      5.1  4.9  85.1  4.9      5.9  4.2  85.9  4.2
                       70         3.5  6.5  83.5  6.5      5.3  4.7  85.3  4.7      6.1  3.9  86.1  3.9
                       80         3.8  6.2  83.8  6.2      5.6  4.4  85.6  4.4      6.4  3.6  86.4  3.6
                       90         3.9  6.1  83.9  6.1      5.8  4.2  85.8  4.2      6.5  3.5  86.5  3.5
                      100         4.2  5.8  84.2  5.8      5.9  4.1  85.9  4.1      6.7  3.3  86.7  3.3
MNAR                   30         2.4  7.6  82.5  7.5      4.1  5.9  84.1  5.9      4.9  5.1  84.9  5.1
Direct – Weak          40         2.8  7.2  82.8  7.2      4.5  5.5  84.5  5.5      5.4  4.7  85.4  4.7
                       50         3.1  6.9  83.1  6.9      4.9  5.1  84.9  5.1      5.7  4.3  85.7  4.3
                       60         3.3  6.7  83.3  6.7      5.2  4.8  85.2  4.8      5.9  4.1  85.9  4.1
                       70         3.6  6.4  83.6  6.4      5.4  4.6  85.4  4.6      6.3  3.7  86.3  3.7
                       80         3.7  6.3  83.7  6.3      5.6  4.4  85.6  4.4      6.3  3.7  86.3  3.7
                       90         3.9  6.1  83.9  6.1      5.8  4.2  85.8  4.2      6.5  3.5  86.5  3.5
                      100         4.0  6.0  84.0  6.0      5.9  4.1  85.9  4.1      6.7  3.4  86.7  3.4
MNAR                   30         2.0  8.0  82.5  7.5      3.7  6.3  83.7  6.3      4.5  5.5  84.5  5.5
Direct – Strong        40         2.6  7.4  82.7  7.3      4.2  5.8  84.2  5.8      5.1  4.9  85.1  4.9
                       50         3.3  6.7  83.3  6.7      5.0  5.0  85.0  5.0      5.8  4.2  85.8  4.2
                       60         3.4  6.6  83.4  6.6      5.3  4.7  85.3  4.7      6.0  4.0  86.0  4.0
                       70         3.6  6.4  83.6  6.4      5.5  4.5  85.5  4.5      6.3  3.7  86.3  3.7
                       80         3.7  6.3  83.7  6.3      5.6  4.4  85.6  4.4      6.4  3.6  86.4  3.6
                       90         3.9  6.1  83.9  6.1      5.8  4.2  85.8  4.2      6.6  3.4  86.6  3.4
                      100         4.0  6.0  84.0  6.0      5.9  4.1  85.9  4.1      6.7  3.4  86.7  3.4
MNAR                   30         2.3  7.7  82.6  7.4      3.8  6.2  83.8  6.2      4.7  5.3  84.7  5.3
Inverse – Weak         40         2.7  7.3  82.8  7.2      4.3  5.7  84.1  5.9      5.1  4.9  85.1  4.9
                       50         3.0  7.0  83.0  7.0      4.6  5.4  84.6  5.4      5.6  4.4  85.6  4.4
                       60         3.3  6.7  83.3  6.7      4.9  5.1  84.9  5.1      5.9  4.1  85.9  4.1
                       70         3.5  6.5  83.5  6.5      5.3  4.7  85.3  4.7      6.1  3.9  86.1  3.9
                       80         3.8  6.2  83.8  6.2      5.5  4.5  85.5  4.5      6.3  3.7  86.3  3.7
                       90         4.0  6.0  84.0  6.0      5.6  4.4  85.6  4.4      6.5  3.5  86.5  3.5
                      100         4.1  5.9  84.1  5.9      5.8  4.2  85.8  4.2      6.6  3.4  86.6  3.4
MNAR                   30         2.2  7.8  82.6  7.4      3.7  6.3  83.7  6.3      4.5  5.5  84.5  5.5
Inverse – Strong       40         2.5  7.5  82.7  7.3      4.1  5.9  84.1  5.9      5.1  4.9  85.1  4.9
                       50         2.8  7.2  82.8  7.2      4.5  5.5  84.5  5.5      5.5  4.5  85.5  4.5
                       60         3.2  6.8  83.2  6.8      4.9  5.1  84.9  5.1      5.7  4.3  85.7  4.3
                       70         3.4  6.6  83.4  6.6      5.1  4.9  85.1  4.9      6.0  4.0  86.0  4.0
                       80         3.8  6.2  83.8  6.2      5.5  4.5  85.5  4.5      6.2  3.8  86.2  3.8
                       90         3.9  6.1  83.9  6.1      5.6  4.4  85.6  4.4      6.5  3.5  86.5  3.5
                      100         4.1  5.9  84.1  5.9      5.9  4.1  85.9  4.1      6.6  3.4  86.6  3.4
Abbreviations: MCAR = missing completely at random, MNAR = missing not at random, Doc. = documentation, TP = true positive, FP = false positive, TN = true negative, FN = false negative.
Note: Sensitivity = TP/(TP+FN), Specificity = TN/(TN+FP), PVP = TP/(TP+FP), PVN = TN/(TN+FN).
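As a worked example of the note above, consider the MCAR scenario at a 30% documentation rate with n=100 in Table A.1 (TP = 2.3, FN = 7.7, TN = 82.4, FP = 7.6):

\text{Sensitivity} = \frac{TP}{TP+FN} = \frac{2.3}{2.3+7.7} = 23\%, \qquad \text{PVP} = \frac{TP}{TP+FP} = \frac{2.3}{2.3+7.6} \approx 23\%,

so in this scenario fewer than one in four true outlier hospitals is flagged, and fewer than one in four flagged hospitals is a true outlier.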
Table A.2. Average proportion (%) of hospital high/low performer classification for the top/bottom 20th percentile of rank-order (true positive, false positive, true negative, false negative) for different mechanisms of missing NIHSS data, stratified by hospital stroke volume (n=100, 300, and 500).

                                        Average Hospital High/Low Performer Classification (%)
Mechanism of          NIHSS            n=100                      n=300                      n=500
Missing NIHSS         Doc. Rate   TP    FN    TN    FP        TP    FN    TN    FP        TP    FN    TN    FP
MCAR                   30         19.1  20.9  39.4  20.6      23.4  16.6  43.4  16.6      25.9  14.1  45.9  14.1
                       40         20.1  19.9  40.2  19.8      24.7  15.3  44.7  15.3      27.4  12.6  47.4  12.6
                       50         20.9  19.1  41.1  18.9      25.8  14.2  45.8  14.2      28.3  11.7  48.3  11.7
                       60         21.5  18.5  41.5  18.5      26.7  13.3  46.7  13.3      29.1  11.0  49.1  11.0
                       70         22.2  17.8  42.2  17.8      27.3  12.7  47.3  12.7      29.8  10.2  49.8  10.2
                       80         22.9  17.1  42.9  17.1      28.0  12.0  48.0  12.0      30.3   9.7  50.3   9.7
                       90         23.5  16.5  43.5  16.5      28.6  11.4  48.6  11.4      30.9   9.1  50.9   9.1
                      100         23.9  16.1  43.9  16.1      29.1  10.9  49.1  10.9      31.2   8.8  51.2   8.8
MNAR                   30         19.4  20.6  39.7  20.3      23.8  16.2  43.8  16.2      26.4  13.6  46.4  13.6
Direct – Weak          40         20.5  19.5  40.6  19.4      25.0  15.0  45.0  15.0      27.6  12.4  47.6  12.4
                       50         21.3  18.7  41.4  18.6      26.1  13.9  46.1  13.9      28.5  11.5  48.5  11.5
                       60         21.9  18.1  41.9  18.1      26.9  13.1  46.9  13.1      29.4  10.6  49.4  10.6
                       70         22.6  17.4  42.6  17.4      27.6  12.4  47.6  12.4      30.0  10.0  50.0  10.0
                       80         23.1  16.9  43.1  16.9      28.1  11.9  48.1  11.9      30.5   9.5  50.5   9.5
                       90         23.7  16.3  43.7  16.3      28.6  11.4  48.6  11.4      30.8   9.2  50.8   9.2
                      100         24.1  15.9  44.1  15.9      29.0  11.0  49.0  11.0      31.2   8.8  51.2   8.8
MNAR                   30         18.1  21.9  39.8  20.2      22.6  17.4  42.6  17.4      25.0  15.0  45.0  15.0
Direct – Strong        40         19.7  20.3  39.9  20.1      24.4  15.6  44.4  15.6      26.8  13.2  46.8  13.2
                       50         21.6  18.4  41.6  18.4      26.5  13.5  46.5  13.5      29.0  11.0  49.0  11.0
                       60         22.2  17.8  42.2  17.8      27.4  12.6  47.4  12.6      29.6  10.4  49.6  10.4
                       70         22.8  17.2  42.8  17.2      27.8  12.2  47.8  12.2      30.1   9.9  50.1   9.9
                       80         23.2  16.8  43.2  16.8      28.2  11.8  48.2  11.8      30.6   9.4  50.6   9.4
                       90         23.7  16.3  43.7  16.3      28.6  11.4  48.6  11.4      30.9   9.1  50.9   9.1
                      100         24.1  15.9  44.1  15.9      29.0  11.0  49.0  11.0      31.2   8.8  51.2   8.8
MNAR                   30         18.6  21.4  39.8  20.2      23.3  16.7  43.3  16.7      25.5  14.5  45.5  14.5
Inverse – Weak         40         20.0  20.0  40.4  19.6      24.5  15.5  44.4  15.7      27.0  13.0  47.0  13.0
                       50         20.7  19.3  40.7  19.3      25.5  14.5  45.5  14.5      28.1  11.9  48.1  11.9
                       60         21.6  18.4  41.6  18.4      26.6  13.4  46.6  13.4      28.9  11.1  48.9  11.1
                       70         22.3  17.7  42.3  17.7      27.3  12.7  47.3  12.7      29.6  10.4  49.6  10.4
                       80         23.1  16.9  43.1  16.9      28.0  12.0  48.0  12.0      30.3   9.7  50.3   9.7
                       90         23.4  16.6  43.4  16.6      28.5  11.5  48.5  11.5      30.8   9.2  50.8   9.2
                      100         24.0  16.0  44.0  16.0      29.1  10.9  49.1  10.9      31.3   8.7  51.3   8.7
MNAR                   30         17.8  22.2  39.7  20.3      22.7  17.3  42.7  17.3      24.9  15.1  44.9  15.1
Inverse – Strong       40         19.5  20.5  40.2  19.8      24.0  16.0  44.0  16.0      26.6  13.4  46.6  13.4
                       50         20.5  19.5  40.5  19.5      25.2  14.8  45.2  14.8      27.7  12.3  47.7  12.3
                       60         21.5  18.5  41.5  18.5      26.1  13.9  46.1  13.9      28.7  11.3  48.7  11.3
                       70         22.0  18.0  42.0  18.0      27.0  13.0  47.0  13.0      29.5  10.5  49.5  10.5
                       80         22.6  17.4  42.6  17.4      28.0  12.0  48.0  12.0      30.3   9.7  50.3   9.7
                       90         23.5  16.5  43.5  16.5      28.6  11.4  48.6  11.4      30.6   9.4  50.6   9.4
                      100         24.0  16.0  44.0  16.0      29.2  10.8  49.2  10.8      31.3   8.7  51.3   8.7
Abbreviations: MCAR = missing completely at random, MNAR = missing not at random, Doc. = documentation, TP = true positive, FP = false positive, TN = true negative, FN = false negative.
Note: Sensitivity = TP/(TP+FN), Specificity = TN/(TN+FP), PVP = TP/(TP+FP), PVN = TN/(TN+FN).
Table A.3. Average absolute change in hospital RSMR rankings (# of positions) in different scenarios of missing NIHSS data, stratified by quintile of true hospital ranking and hospital stroke volume (n=100, 300, and 500).

                                 Average Absolute Change Between True and Observed Rankings (# of Positions)
Mechanism of          NIHSS            n=100                          n=300                          n=500
Missing NIHSS         Doc. Rate   1Q    2Q    3Q    4Q    5Q      1Q    2Q    3Q    4Q    5Q      1Q    2Q    3Q    4Q    5Q
MCAR                   30         23.4  23.0  22.7  23.4  21.8    15.1  18.5  19.2  18.6  13.8    11.7  16.2  17.0  15.8  10.5
                       40         21.3  21.7  22.0  22.0  19.5    13.2  17.3  18.0  17.3  12.0    10.2  14.8  15.7  14.4   9.0
                       50         19.5  20.9  21.2  21.1  18.0    11.9  16.1  17.0  15.9  10.7     9.0  13.8  14.7  13.2   8.1
                       60         18.1  20.4  21.1  20.6  16.5    10.6  15.2  16.2  14.9   9.7     8.2  12.9  13.8  12.4   7.3
                       70         17.0  19.7  20.4  19.7  15.6    10.0  14.5  15.4  14.2   8.9     7.7  12.2  13.0  11.6   6.7
                       80         16.2  18.9  19.9  19.2  14.4     9.3  13.7  14.7  13.6   8.3     7.1  11.5  12.4  10.9   6.2
                       90         15.3  18.7  19.4  18.6  13.6     8.7  13.2  14.3  13.0   7.7     6.7  11.0  11.9  10.4   5.8
                      100         14.6  18.1  19.0  18.0  13.0     8.3  12.7  13.7  12.4   7.3     6.3  10.5  11.5  10.0   5.5
MNAR                   30         22.4  22.9  22.3  22.6  21.3    14.4  18.1  19.0  18.2  13.2    11.3  15.6  16.5  15.5  10.2
Direct – Weak          40         20.3  21.8  21.5  21.9  19.1    12.7  16.9  17.5  16.7  11.6     9.8  14.1  15.5  14.1   8.9
                       50         18.7  20.8  21.1  20.9  17.3    11.3  16.0  16.6  15.5  10.2     8.9  13.3  14.2  13.1   7.9
                       60         17.6  20.3  20.5  20.3  16.1    10.5  15.0  16.0  14.6   9.4     8.2  12.4  13.5  12.3   7.1
                       70         16.5  19.8  20.0  19.7  15.1     9.8  14.4  15.3  14.1   8.7     7.7  11.8  12.9  11.5   6.6
                       80         15.7  19.2  19.7  19.2  14.3     9.2  13.9  14.6  13.4   8.2     7.1  11.3  12.2  11.0   6.2
                       90         14.9  18.7  19.1  18.5  13.6     8.6  13.2  14.1  12.8   7.6     6.7  10.8  11.7  10.5   5.8
                      100         14.3  18.2  18.8  18.0  12.9     8.3  12.9  13.7  12.4   7.3     6.4  10.4  11.3  10.1   5.6
MNAR                   30         24.7  23.6  22.4  23.5  24.0    16.2  19.3  20.0  19.6  15.0    12.7  16.8  17.8  16.8  12.0
Direct – Strong        40         21.6  22.4  22.0  22.4  20.6    13.6  17.6  18.3  17.7  12.4    10.7  15.0  16.2  15.0   9.8
                       50         18.3  20.8  20.8  20.9  17.0    11.0  15.5  16.3  15.4   9.9     8.5  13.0  13.9  12.6   7.7
                       60         17.1  20.1  20.3  20.1  15.7    10.2  14.7  15.6  14.4   9.1     7.9  12.2  13.2  12.0   6.9
                       70         16.3  19.5  20.0  19.4  15.0     9.5  14.2  14.9  13.7   8.5     7.3  11.6  12.7  11.4   6.5
                       80         15.5  19.0  19.4  18.8  14.2     9.0  13.6  14.6  13.3   8.1     7.0  11.1  12.2  10.8   6.1
                       90         14.8  18.6  19.0  18.5  13.5     8.6  13.3  14.1  12.9   7.6     6.7  10.7  11.7  10.4   5.8
                      100         14.3  18.2  18.8  18.0  12.9     8.3  12.9  13.7  12.4   7.3     6.4  10.4  11.3  10.1   5.6
MNAR                   30         23.9  23.1  22.4  23.5  22.6    15.4  18.7  19.5  18.9  14.2    12.4  16.3  17.3  16.5  11.0
Inverse – Weak         40         21.4  21.8  21.7  22.1  20.2    13.7  17.4  18.4  17.6  12.2    10.4  15.2  16.1  14.9   9.6
                       50         19.7  21.1  21.6  21.5  18.2    12.2  16.6  17.1  16.2  11.0     9.3  14.0  15.0  13.7   8.3
                       60         18.1  20.7  20.8  20.7  16.7    11.2  15.5  16.1  15.2   9.9     8.4  13.1  14.0  12.7   7.5
                       70         17.0  19.8  20.2  20.1  15.5    10.3  14.7  15.5  14.5   9.0     7.8  12.2  13.2  11.7   6.9
                       80         15.8  19.2  19.7  19.2  14.5     9.4  13.9  14.9  13.7   8.4     7.1  11.5  12.5  11.2   6.3
                       90         15.1  18.7  19.1  18.7  13.7     8.9  13.3  14.2  13.1   7.8     6.7  11.0  12.0  10.6   5.9
                      100         14.3  18.1  18.6  18.1  12.9     8.3  12.8  13.7  12.6   7.4     6.3  10.5  11.4  10.1   5.6
MNAR                   30         24.7  23.3  22.3  23.6  23.6    16.2  19.4  19.7  19.4  14.9    12.8  17.0  18.0  16.8  11.7
Inverse – Strong       40         22.3  22.4  22.0  22.7  20.9    14.3  18.0  18.6  17.8  12.9    11.1  15.6  16.6  15.3   9.9
                       50         20.3  21.8  21.8  21.8  18.8    12.5  16.8  17.7  16.7  11.5     9.9  14.4  15.4  14.1   8.6
                       60         18.7  20.7  21.0  20.7  17.1    11.6  15.6  16.8  15.6  10.2     8.9  13.4  14.3  13.0   7.8
                       70         17.4  20.2  20.3  20.2  15.9    10.5  14.8  16.0  14.6   9.3     7.9  12.4  13.6  12.1   7.0
                       80         16.4  19.4  19.8  19.4  14.8     9.7  14.0  15.1  13.8   8.5     7.2  11.7  12.7  11.4   6.4
                       90         15.3  18.6  19.2  18.8  13.7     8.9  13.4  14.4  13.1   7.9     6.8  11.1  12.1  10.8   6.0
                      100         14.3  18.1  18.6  18.1  12.9     8.2  12.7  13.7  12.4   7.4     6.3  10.5  11.4  10.1   5.6
Quintiles of true hospital ranking: 1Q: 1-20, 2Q: 21-40, 3Q: 41-60, 4Q: 61-80, 5Q: 81-100.

Appendix B: Supplementary Figures

Figure B.1. Pearson correlation coefficients between true rankings and RSMR rankings as NIHSS documentation increases under different mechanisms of missing NIHSS data. Results are stratified by hospital stroke volume.

Appendix C: IRB Determination

Appendix D: Example Data Generation SAS Code

/****************************************************************************
Title:  Data Generation for Simulation Modeling
Date:   11/10/14
Descr.: SAS code to generate data that is similar in structure to the
        Michigan Stroke Registry (MSR). Does not include changes in risk
        score distribution as noted by primary stroke center status.
        Does include differences in missing NIHSS frequency.
****************************************************************************/

/* Suppress log - nonotes=no log statements, notes=log statements */
options nonotes;

libname sim "L:\MASCOTS\Mike\Dissertation\Analysis\Simulation Runs";

/* Set # of samples to run (S) and hospitals (M) per sample */
%Let S=500;
%Let M=100;

data init;
   call streaminit(02052015);
   do Sampleid = 1 to &S;
      /* Number of patients per hospital */
      do hospid = 1 to &M;
         /* b0 --> assigned hospital random intercept = "true ranking" */
         b0 = rand("Normal", 0, sqrt(0.13));
         do vol = 500;
            hospSRS = rand("Normal", 0, sqrt(1.5));
            do rep = 1 to vol;
               /* SRS = sub-risk score */
               ptSRS = rand("Normal", 0, sqrt(68.0));
               muSRS = ptSRS + hospSRS;
               SRS = 21.4 + muSRS;
               output;
            end;
         end;
      end;
   end;
run;

data init;
   set init;
   PatID = _N_;
   if SRS<0 then delete;
   if SRS>44 then delete;

   /* Generate NIHSS categories and RS weights from eta - based on ordinal
      model cut points. Note: To change frequency of categories, adjust cut
      points as necessary */
   /* NIHSS */
   eps = rand("Normal", 0, 1);
   eta = -0.050*SRS + eps;
   if eta>=-0.63 then nih=1;
   if -0.63> eta >=-1.26 then nih=2;
   if -1.26> eta >=-1.79 then nih=3;
   if -1.79> eta >=-2.16 then nih=4;
   if -2.16> eta >=-2.55 then nih=5;
   if -2.55> eta >=-2.98 then nih=6;
   if eta<-2.98 then nih=7;

   if nih=1 then rsnih=0;
   if nih=2 then rsnih=10;
   if nih=3 then rsnih=21;
   if nih=4 then rsnih=37;
   if nih=5 then rsnih=48;
   if nih=6 then rsnih=56;
   if nih=7 then rsnih=65;

   /* Total Risk Score (TRS) and probability from algorithm calculated */
   TRS = SRS + rsNIH;
   logitphat = -4.4 + 0.054*TRS + b0;
   phat = exp(logitphat) / (1 + exp(logitphat));

   /* Calc of patient mortality - use parameters from registry model */
   died = rand("Bernoulli", phat);

   drop hospSRS ptSRS muSRS eta eps;

   do doc=30 to 100 by 10;
      output;
   end;
run;

data Miss;
   set init;

   /* Missingness scenario */
   /* MCAR */
   /*if doc=100 then obs=rand("Bernoulli", 1.00);
   if doc=90 then obs=rand("Bernoulli", 0.90);
   if doc=80 then obs=rand("Bernoulli", 0.80);
   if doc=70 then obs=rand("Bernoulli", 0.70);
   if doc=60 then obs=rand("Bernoulli", 0.60);
   if doc=50 then obs=rand("Bernoulli", 0.50);
   if doc=40 then obs=rand("Bernoulli", 0.40);
   if doc=30 then obs=rand("Bernoulli", 0.30);*/

   if doc=100 then obslow=1;
   if doc=90 then obslow=rand("Bernoulli", 1/(1+exp(-(2.00 + 0.095*nih))));
   if doc=80 then obslow=rand("Bernoulli", 1/(1+exp(-(1.15 + 0.095*nih))));
   if doc=70 then obslow=rand("Bernoulli", 1/(1+exp(-(0.60 + 0.095*nih))));
   if doc=60 then obslow=rand("Bernoulli", 1/(1+exp(-(0.17 + 0.095*nih))));
   if doc=50 then obslow=rand("Bernoulli", 1/(1+exp(-(-0.25 + 0.095*nih))));
   if doc=40 then obslow=rand("Bernoulli", 1/(1+exp(-(-0.65 + 0.095*nih))));
   if doc=30 then obslow=rand("Bernoulli", 1/(1+exp(-(-1.10 + 0.095*nih))));
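   /* Added note (not in the original appendix): OBSLOW and OBSHIGH appear to
      implement the weak and strong direct-MNAR documentation scenarios. In
      both, the log-odds of NIHSS documentation increase with the NIHSS
      category (by 0.095 per category for OBSLOW and 0.225 for OBSHIGH), and
      the intercepts appear tuned so that each DOC value produces the intended
      overall documentation rate. */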
blup statement); ods trace on; ods select solutionr /*parameterestimates*/ ; title "Model for P/E Ratio Rankings"; proc glimmix data=miss initglm ; where obslow=1; by doc SampleID; class hospid; model died(event='1') = TRS / dist=binary link=logit ddfm=bw ; random int / subject=hospid s; nloptions tech=nrridg; output out=gmxout1 pred(blup ilink)=phatpred pred(noblup ilink)=phatexp ; ods output "Solution for Random Effects"=solutionr /*"Solutions for Fixed Effects"=param*/; run; data solutionr; set solutionr; newvar=compress(subject,'hospid '); hospid=newvar*1; drop newvar subject effect;; run; 105 * Step 2: Calculate predicted, expected and observed deaths for each hospital. This is done by summing the predicted probability, expected probability, and observed deaths for each patient in the hospital.; proc means data=gmxout1 noprint; by doc SampleID hospid; output out=PE sum(phatpred)=pred sum(phatexp)=exp sum(died)=obsdied mean(b0)=b0 ; run; * Step 3: Calculate: PMR/EMR, P/E ratio, SMR-P/E, OMR/EMR, O/E ratio, SMRO/E, observed hospital random intercept and outlier status; data PElow; merge PE solutionr; by doc sampleid hospid; pmr=pred/_FREQ_*100; emr=exp/_FREQ_*100; PEratio=pmr/emr; SMRPE=15*peratio; omr=obsdied/_freq_*100; OEratio=omr/emr; SMROE=15*oeratio; hosp=put(hospid, 3.); obsb0=estimate; drop _type_ df estimate; run; /* Rankings of Hospitals */ title "True rankings"; proc rank data=pelow out=pelowrank; by doc sampleid; var b0; ranks b0rank; run; title "Observed RI Hospital rankings"; proc rank data=pelowrank out=pelowrank; by doc sampleid; var obsb0; ranks obsb0rank; run; title "RSMR - P/E rankings"; proc rank data=pelowrank out=pelowrank; by doc sampleid; var SMRPE; ranks SMRPErank; run; 106 title "RSMR - O/E rankings"; proc rank data=pelowrank out=pelowrank; by doc sampleid; var SMROE; ranks SMROErank; run; 107 Appendix E: Example Simulation Assessment SAS Code /********* Assessment of Observed v True Random Intercepts ****************/ /* Se, Sp, PVP, PVN for True Outlier status */ proc means data=pemain2 noprint; class size mechanism doc sampleid; output out=means3 sum(tp5rank)=tp5rank sum(tn5rank)=tn5rank sum(fn5rank)=fn5rank sum(fp5rank)=fp5rank sum(tp20rank)=tp20rank sum(tn20rank)=tn20rank sum(fn20rank)=fn20rank sum(fp20rank)=fp20rank sum(pe20)=pe20 sum(pe5)=pe5; run; data means4; set means3; if _type_ ne 15 then delete; Se5=(tp5rank/(tp5rank+fn5rank))*100; Se20=(tp20rank/(tp20rank+fn20rank))*100; Sp5=(tn5rank/(tn5rank+fp5rank))*100; Sp20=(tn20rank/(tn20rank+fp20rank))*100; PVP5=(tp5rank/(tp5rank+fp5rank))*100; PVP20=(tp20rank/(tp20rank+fp20rank))*100; PVN5=(tn5rank/(tn5rank+fn5rank))*100; PVN20=(tn20rank/(tn20rank+fn20rank))*100; pe20=pe20/_freq_*100; pe5=pe5/_freq_*100; run; proc means data=means4 noprint; class size mechanism doc; output out=acc2 mean(Se5)=Se5 stderr(Se5)=Se5Err mean(Se20)=Se20 stderr(Se20)=Se20Err mean(Sp5)=Sp5 stderr(Sp5)=Sp5Err mean(Sp20)=Sp20 stderr(Sp20)=Sp20Err mean(PVP5)=PVP5 stderr(PVP5)=PVP5Err mean(PVP20)=PVP20 stderr(pvp20)=PVP20Err mean(PVN5)=PVN5 stderr(PVN5)=PVN5Err mean(PVN20)=PVN20 stderr(PVN20)=PVN20Err mean(pe20)=pe20 stderr(pe20)=pe20err mean(pe5)=pe5 stderr(pe5)=pe5err mean(tp5rank)=TP5 mean(tn5rank)=TN5 mean(fp5rank)=FP5 mean(fn5rank)=FN5 mean(tp20rank)=TP20 mean(tn20rank)=TN20 mean(fp20rank)=FP20 mean(fn20rank)=FN20; run; data accuracy; set acc2; if _type_ ne 7 then delete; run; 108 /* Sensitivity */ ods graphics / imagefmt=png height=4in width=6.5in antialias=on antialiasmax=1000; ods listing device=png image_dpi=300; title 
/* Sensitivity */
ods graphics / imagefmt=png height=4in width=6.5in antialias=on antialiasmax=1000;
ods listing device=png image_dpi=300;
title font=arial;
proc sgpanel data=assessment2 noautolegend;
   panelby size / columns=3 novarname;
   loess x=doc y=se5_1 / nomarkers legendlabel="MCAR 5%" lineattrs=(thickness=2 color=black pattern=1);
   loess x=doc y=se5_2 / nomarkers legendlabel="Direct-Weak 5%" lineattrs=(thickness=2 color=dark_blue pattern=1);
   loess x=doc y=se5_3 / nomarkers legendlabel="Direct-Strong 5%" lineattrs=(thickness=2 color=light_blue pattern=1);
   loess x=doc y=se5_4 / nomarkers legendlabel="Inverse-Weak 5%" lineattrs=(thickness=2 color=dark_red pattern=1);
   loess x=doc y=se5_5 / nomarkers legendlabel="Inverse-Strong 5%" lineattrs=(thickness=2 color=light_red pattern=1);
   loess x=doc y=se20_1 / nomarkers legendlabel="MCAR 20%" lineattrs=(thickness=2 color=black pattern=4);
   loess x=doc y=se20_2 / nomarkers legendlabel="Direct-Weak 20%" lineattrs=(thickness=2 color=dark_blue pattern=4);
   loess x=doc y=se20_3 / nomarkers legendlabel="Direct-Strong 20%" lineattrs=(thickness=2 color=light_blue pattern=4);
   loess x=doc y=se20_4 / nomarkers legendlabel="Inverse-Weak 20%" lineattrs=(thickness=2 color=dark_red pattern=4);
   loess x=doc y=se20_5 / nomarkers legendlabel="Inverse-Strong 20%" lineattrs=(thickness=2 color=light_red pattern=4);
   rowaxis label="Sensitivity (%)" values=(20 to 80 by 5) grid;
   colaxis label="NIHSS Documentation Rate (%)" values=(30 to 100 by 10) valueattrs=(size=6);
   keylegend / position=bottom across=5 down=2 valueattrs=(size=6);
   format size size.;
run;

/* Specificity */
ods graphics / imagefmt=png height=4in width=6.5in antialias=on antialiasmax=1000;
ods listing device=png image_dpi=300;
title font=arial;
proc sgpanel data=assessment2 noautolegend;
   panelby size / columns=3 novarname;
   loess x=doc y=sp5_1 / nomarkers legendlabel="MCAR 5%" lineattrs=(thickness=2 color=black pattern=1);
   loess x=doc y=sp5_2 / nomarkers legendlabel="Direct-Weak 5%" lineattrs=(thickness=2 color=dark_blue pattern=1);
   loess x=doc y=sp5_3 / nomarkers legendlabel="Direct-Strong 5%" lineattrs=(thickness=2 color=light_blue pattern=1);
   loess x=doc y=sp5_4 / nomarkers legendlabel="Inverse-Weak 5%" lineattrs=(thickness=2 color=dark_red pattern=1);
   loess x=doc y=sp5_5 / nomarkers legendlabel="Inverse-Strong 5%" lineattrs=(thickness=2 color=light_red pattern=1);
   loess x=doc y=sp20_1 / nomarkers legendlabel="MCAR 20%" lineattrs=(thickness=2 color=black pattern=4);
   loess x=doc y=sp20_2 / nomarkers legendlabel="Direct-Weak 20%" lineattrs=(thickness=2 color=dark_blue pattern=4);
   loess x=doc y=sp20_3 / nomarkers legendlabel="Direct-Strong 20%" lineattrs=(thickness=2 color=light_blue pattern=4);
   loess x=doc y=sp20_4 / nomarkers legendlabel="Inverse-Weak 20%" lineattrs=(thickness=2 color=dark_red pattern=4);
   loess x=doc y=sp20_5 / nomarkers legendlabel="Inverse-Strong 20%" lineattrs=(thickness=2 color=light_red pattern=4);
   rowaxis label="Specificity (%)" values=(60 to 100 by 5) grid;
   colaxis label="NIHSS Documentation Rate (%)" values=(30 to 100 by 10) valueattrs=(size=6);
   keylegend / position=bottom across=5 down=2 valueattrs=(size=6);
   format size size.;
run;
legendlabel="Direct-Strong 5%" lineattrs=(thickness=2 color=light_blue pattern=1); loess x=doc y=pvp5_4 / nomarkers legendlabel="Inverse-Weak 5%" lineattrs=(thickness=2 color=dark_red pattern=1); loess x=doc y=pvp5_5 / nomarkers legendlabel="Inverse-Strong 5%" lineattrs=(thickness=2 color=light_red pattern=1); loess x=doc y=pvp20_1 / nomarkers legendlabel="MCAR 20%" lineattrs=(thickness=2 color=black pattern=4); loess x=doc y=pvp20_2 / nomarkers legendlabel="Direct-Weak 20%" lineattrs=(thickness=2 color=dark_blue pattern=4); loess x=doc y=pvp20_3 / nomarkers legendlabel="Direct-Strong 20%" lineattrs=(thickness=2 color=light_blue pattern=4); loess x=doc y=pvp20_4 / nomarkers legendlabel="Inverse-Weak 20%" lineattrs=(thickness=2 color=dark_red pattern=4); loess x=doc y=pvp20_5 / nomarkers legendlabel="Inverse-Strong 20%" lineattrs=(thickness=2 color=light_red pattern=4); rowaxis label="Predictive Value Positive (%)" values=(20 to 80 by 5) grid; 110 colaxis label="NIHSS Documentation Rate (%)" values=(30 to 100 by 10) valueattrs=(size=6); keylegend / position=bottom across=5 down=2 valueattrs=(size=6) ; format size size.; run; /* PVN */ ods graphics / imagefmt=png height=4in width=6.5in antialias=on antialiasmax=1000; ods listing device=png image_dpi=300; title font=arial; proc sgpanel data=assessment2 noautolegend; panelby size/ columns=3 novarname; loess x=doc y=pvn5_1 / nomarkers legendlabel="MCAR 5%" lineattrs=(thickness=2 color=black pattern=1); loess x=doc y=pvn5_2 / nomarkers legendlabel="Direct-Weak 5%" lineattrs=(thickness=2 color=dark_blue pattern=1); loess x=doc y=pvn5_3 / nomarkers legendlabel="Direct-Strong 5%" lineattrs=(thickness=2 color=light_blue pattern=1); loess x=doc y=pvn5_4 / nomarkers legendlabel="Inverse-Weak 5%" lineattrs=(thickness=2 color=dark_red pattern=1); loess x=doc y=pvn5_5 / nomarkers legendlabel="Inverse-Strong 5%" lineattrs=(thickness=2 color=light_red pattern=1); loess x=doc y=pvn20_1 / nomarkers legendlabel="MCAR 20%" lineattrs=(thickness=2 color=black pattern=4); loess x=doc y=pvn20_2 / nomarkers legendlabel="Direct-Weak 20%" lineattrs=(thickness=2 color=dark_blue pattern=4); loess x=doc y=pvn20_3 / nomarkers legendlabel="Direct-Strong 20%" lineattrs=(thickness=2 color=light_blue pattern=4); loess x=doc y=pvn20_4 / nomarkers legendlabel="Inverse-Weak 20%" lineattrs=(thickness=2 color=dark_red pattern=4); loess x=doc y=pvn20_5 / nomarkers legendlabel="Inverse-Strong 20%" lineattrs=(thickness=2 color=light_red pattern=4); rowaxis label="Predictive Value Negative (%)" values=(60 to 100 by 5) grid; colaxis label="NIHSS Documentation Rate (%)" values=(30 to 100 by 10) valueattrs=(size=6); keylegend / position=bottom across=5 down=2 valueattrs=(size=6) ; format size size.; run; 111 /* Spearman Correlation */ ods graphics / imagefmt=png height=4in width=6.5in antialias=on antialiasmax=1000; ods listing device=png image_dpi=300; title font=arial; proc sgpanel data=assessment2 noautolegend; panelby size/ columns=3 novarname; loess x=doc y=spcorr1 / nomarkers legendlabel="MCAR" lineattrs=(thickness=2 color=black pattern=1); loess x=doc y=spcorr2 / nomarkers legendlabel="Direct-Weak" lineattrs=(thickness=2 color=dark_blue pattern=1); loess x=doc y=spcorr3 / nomarkers legendlabel="Direct-Strong" lineattrs=(thickness=2 color=light_blue pattern=1); loess x=doc y=spcorr4 / nomarkers legendlabel="Inverse-Weak" lineattrs=(thickness=2 color=dark_red pattern=1); loess x=doc y=spcorr5 / nomarkers legendlabel="Inverse-Strong" lineattrs=(thickness=2 color=light_red 
/* Spearman Correlation */
ods graphics / imagefmt=png height=4in width=6.5in antialias=on antialiasmax=1000;
ods listing device=png image_dpi=300;
title font=arial;
proc sgpanel data=assessment2 noautolegend;
   panelby size / columns=3 novarname;
   loess x=doc y=spcorr1 / nomarkers legendlabel="MCAR" lineattrs=(thickness=2 color=black pattern=1);
   loess x=doc y=spcorr2 / nomarkers legendlabel="Direct-Weak" lineattrs=(thickness=2 color=dark_blue pattern=1);
   loess x=doc y=spcorr3 / nomarkers legendlabel="Direct-Strong" lineattrs=(thickness=2 color=light_blue pattern=1);
   loess x=doc y=spcorr4 / nomarkers legendlabel="Inverse-Weak" lineattrs=(thickness=2 color=dark_red pattern=1);
   loess x=doc y=spcorr5 / nomarkers legendlabel="Inverse-Strong" lineattrs=(thickness=2 color=light_red pattern=1);
   rowaxis label="Spearman Rank Correlation Coefficient" values=(0.4 to 1 by 0.1) grid;
   colaxis label="NIHSS Documentation Rate (%)" values=(30 to 100 by 10) valueattrs=(size=6);
   keylegend / position=bottom across=5 down=1 valueattrs=(size=6);
   format size size.;
run;

/* Pearson Correlation */
ods graphics / imagefmt=png height=4in width=6.5in antialias=on antialiasmax=1000;
ods listing device=png image_dpi=300;
title font=arial;
proc sgpanel data=assessment2 noautolegend;
   panelby size / columns=3 novarname;
   loess x=doc y=pearcorr1 / nomarkers legendlabel="MCAR" lineattrs=(thickness=2 color=black pattern=1);
   loess x=doc y=pearcorr2 / nomarkers legendlabel="Direct-Weak" lineattrs=(thickness=2 color=dark_blue pattern=1);
   loess x=doc y=pearcorr3 / nomarkers legendlabel="Direct-Strong" lineattrs=(thickness=2 color=light_blue pattern=1);
   loess x=doc y=pearcorr4 / nomarkers legendlabel="Inverse-Weak" lineattrs=(thickness=2 color=dark_red pattern=1);
   loess x=doc y=pearcorr5 / nomarkers legendlabel="Inverse-Strong" lineattrs=(thickness=2 color=light_red pattern=1);
   rowaxis label="Pearson Correlation Coefficient" values=(0.4 to 1 by 0.1) grid;
   colaxis label="NIHSS Documentation Rate (%)" values=(30 to 100 by 10) valueattrs=(size=6);
   keylegend / position=bottom across=5 down=1 valueattrs=(size=6);
   format size size.;
run;

/*********** Difference between True/Observed Rankings *********/
ods graphics / imagefmt=png height=7.5in width=6.5in antialias=on antialiasmax=1000;
ods listing device=png image_dpi=300;
title font=arial;
proc sgpanel data=difftable2 noautolegend;
   panelby size rankcat / colheaderpos=top layout=lattice onepanel novarname;
   loess x=doc y=pediff1 / nomarkers legendlabel="MCAR" lineattrs=(thickness=2 color=black pattern=1);
   loess x=doc y=pediff2 / nomarkers legendlabel="Direct-Weak" lineattrs=(thickness=2 color=dark_blue pattern=1);
   loess x=doc y=pediff3 / nomarkers legendlabel="Direct-Strong" lineattrs=(thickness=2 color=light_blue pattern=1);
   loess x=doc y=pediff4 / nomarkers legendlabel="Inverse-Weak" lineattrs=(thickness=2 color=dark_red pattern=1);
   loess x=doc y=pediff5 / nomarkers legendlabel="Inverse-Strong" lineattrs=(thickness=2 color=light_red pattern=1);
   rowaxis grid label="Absolute Change in Hospital RSMR Rankings (# of Positions)" values=(4 to 26 by 4) valueattrs=(size=6);
   colaxis label="NIHSS Documentation Rate (%)" values=(30 to 100 by 10) valueattrs=(size=6);
   keylegend / position=bottom valueattrs=(size=6);
   format size size. mechanism mechanism. rankcat rankcat.;
run;

BIBLIOGRAPHY

1. Go AS, Mozaffarian D, Roger VL, et al. Heart Disease and Stroke Statistics—2013 Update: A Report From the American Heart Association. Circulation. 2013;127(1):e6-e245.
2. AHRQ. Household component summary table. Table 4: Total Expenses and Percent Distribution for Selected Conditions by Source of Payment: United States, 2011. http://meps.ahrq.gov/mepsweb/data_stats/tables_compendia_hh_interactive.jsp?_SERVICE=MEPSSocket0&_PROGRAM=MEPSPGM.TC.SAS&File=HCFY2011&Table=HCFY2011_CNDXP_D&_Debug=. Accessed November 15, 2013.
3. Stepanova M, Venkatesan C, Altaweel L, Mishra A, Younossi ZM. Recent trends in inpatient mortality and resource utilization for patients with stroke in the United States: 2005-2009. Journal of Stroke and Cerebrovascular Diseases. 2013;22:491-499.
4. Wier LM, Andrews RM. The National Hospital Bill: The Most Expensive Conditions by Payer, 2008. Rockville, MD: Agency for Healthcare Research and Quality; 2011.
5. Centers for Medicare & Medicaid Services. Hospital Inpatient Quality Reporting Program. https://www.cms.gov/Medicare/Quality-Initiatives-Patient-Assessment-Instruments/HospitalQualityInits/HospitalRHQDAPU.html. Accessed January 8, 2014.
6. Centers for Medicare & Medicaid Services. Hospital Value-Based Purchasing. 2013; http://www.cms.gov/Medicare/Quality-Initiatives-Patient-Assessment-Instruments/hospital-value-based-purchasing/index.html?redirect=/hospital-value-based-purchasing. Accessed January 8, 2014.
7. O'Kane ME. Performance-Based Measures: The Early Results Are In. Journal of Managed Care Pharmacy. 2007;13(2 Suppl S-b):S3-S6.
8. Moses III H, Matheson DM, Dorsey E, George BP, Sadoff D, Yoshimura S. The Anatomy of Health Care in the United States. JAMA. 2013;310(18):1947-1964.
9. Sisko AM, Keehan SP, Cuckler GA, et al. National health expenditure projections, 2013-23: faster growth expected with expanded coverage and improving economy. Health Aff. (Millwood). 2014;33(10):1841-1850.
10. James J. Health Policy Brief: Pay-for-Performance. 2012; http://www.healthaffairs.org/healthpolicybriefs/brief.php?brief_id=78. Accessed November 11, 2014.
11. Centers for Medicare & Medicaid Services. Outcome Measures. http://www.cms.gov/Medicare/Quality-Initiatives-Patient-Assessment-Instruments/HospitalQualityInits/OutcomeMeasures.html. Accessed October 24, 2014.
12. Centers for Medicare & Medicaid Services. What is Hospital Compare? Hospital Compare 2013; http://www.medicare.gov/hospitalcompare/About/What-Is-HOS.html. Accessed January 14, 2014.
13. QualityNet. Measure Comparison (Inpatient Hospital Quality Measures). 2014; https://www.qualitynet.org/dcs/ContentServer?c=Page&pagename=QnetPublic%2FPage%2FQnetTier3&cid=1138900298473. Accessed November 19, 2014.
14. QualityNet. Measures: Hospital Value-Based Purchasing. 2014; http://www.qualitynet.org/dcs/ContentServer?c=Page&pagename=QnetPublic%2FPage%2FQnetTier3&cid=1228772237361. Accessed December 12, 2014.
15. Centers for Medicare & Medicaid Services. Outcome Measures. Hospital Quality Initiative 2014; http://www.cms.gov/Medicare/Quality-Initiatives-Patient-Assessment-Instruments/HospitalQualityInits/OutcomeMeasures.html. Accessed November 11, 2014.
16. Spivack SB, Bernheim SM, Forman HP, Drye EE, Krumholz HM. Hospital cardiovascular outcome measures in federal pay-for-reporting and pay-for-performance programs: a brief overview of current efforts. Circ Cardiovasc Qual Outcomes. 2014;7(5):627-633.
17. Centers for Medicare & Medicaid Services. Readmissions Reduction Program. 2014; http://www.cms.gov/Medicare/Medicare-Fee-for-Service-Payment/AcuteInpatientPPS/Readmissions-Reduction-Program.html. Accessed December 12, 2014.
18. Dorsey K, Grady JN, Wang Y, et al. 2014 Measures Updates and Specifications Report: Hospital-Level 30-Day Risk Standardized Mortality Measures. Yale New Haven Health Services Corporation/Center for Outcomes Research & Evaluation (YNHHSC/CORE); 2014.
19. Ash AS, Fienberg SE, Louis TA, Normand S-LT, Stukel TA, Utts J. Statistical Issues in Assessing Hospital Performance. COPSS-CMS White Paper Committee; 2012.
20. Normand S-LT, Glickman ME, Gatsonis CA. Statistical Methods for Profiling Providers of Medical Care: Issues and Applications. Journal of the American Statistical Association. 1997;92:803-814.
21. Iezzoni LI. Risk Adjustment for Measuring Health Care Outcomes. 3rd ed. Chicago, IL: Health Administration Press; 2003.
22. Krumholz HM, Brindis RG, Brush JE, et al. Standards for statistical models used for public reporting of health outcomes: an American Heart Association Scientific Statement from the Quality of Care and Outcomes Research Interdisciplinary Writing Group: cosponsored by the Council on Epidemiology and Prevention. Circulation. 2006;113:456-462.
23. Krumholz HM, Wang Y, Mattera JA, et al. An administrative claims model suitable for profiling hospital performance based on 30-day mortality rates among patients with heart failure. Circulation. 2006;113:1693-1701.
24. Krumholz HM, Wang Y, Mattera JA, et al. An administrative claims model suitable for profiling hospital performance based on 30-day mortality rates among patients with an acute myocardial infarction. Circulation. 2006;113:1683-1692.
25. Harrell FE, Lee KL, Mark DB. Multivariable prognostic models: Issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat. Med. 1996;15(4):361-387.
26. Katzan IL, Spertus J, Bettger JP, et al. Risk Adjustment of Ischemic Stroke Outcomes for Comparing Hospital Performance: A Statement for Healthcare Professionals From the American Heart Association/American Stroke Association. Stroke. 2014.
27. Daniels MJ, Gatsonis CA. Hierarchical Generalized Linear Models in the Analysis of Variations in Health Care Utilization. Journal of the American Statistical Association. 1999;94(445):29-42.
28. Normand S-LT, Shahian DM. Statistical and Clinical Aspects of Hospital Outcomes Profiling. Statistical Science. 2007;22:206-226.
29. Austin PC, Alter DA, Tu JV. The use of fixed- and random-effects models for classifying hospitals as mortality outliers: a Monte Carlo assessment. Med. Decis. Making. 2003;23:526-539.
30. Christiansen CL, Morris CN. Improving the Statistical Approach to Health Care Provider Profiling. Ann. Intern. Med. 1997;127(8 Part 2):764-768.
31. Goldstein H, Spiegelhalter DJ. League Tables and Their Limitations: Statistical Issues in Comparisons of Institutional Performance. Journal of the Royal Statistical Society. 1996;159(3):385-443.
32. Shahian DM, Torchiana DF, Shemin RJ, Rawn JD, Normand S-LT. Massachusetts Cardiac Surgery Report Card: Implications of Statistical Methodology. The Annals of Thoracic Surgery. 2005;80(6):2106-2113.
33. Jones HE, Spiegelhalter DJ. The Identification of “Unusual” Health-Care Providers From a Hierarchical Model. The American Statistician. 2011;65(3):154-163.
34. Ieva F, Paganoni AM. Detecting and visualizing outliers in provider profiling via funnel plots and mixed effect models. Health Care Manag. Sci. 2014.
35. Shahian DM, He X, Jacobs JP, et al. Issues in quality measurement: target population, risk adjustment, and ratings. Ann. Thorac. Surg. 2013;96(2):718-726.
36. Shahian DM, Iezzoni LI, Meyer GS, Kirle L, Normand S-LT. Hospital-wide Mortality as a Quality Metric: Conceptual and Methodological Challenges. Am. J. Med. Qual. 2012;27(2):112-123.
37. McCrum ML, Joynt KE, Orav EJ, Gawande AA, Jha AK. Mortality for publicly reported conditions and overall hospital mortality rates. JAMA Internal Medicine. 2013;173(14):1351-1357.
38. Jha AK, Li Z, Orav EJ, Epstein AM. Care in U.S. hospitals--the Hospital Quality Alliance program. The New England Journal of Medicine. 2005;353:265-274.
39. Mackenzie SJ, Goldmann DA, Perla RJ, Parry GJ. Measuring Hospital-Wide Mortality: Pitfalls and Potential. J. Healthc. Qual. 2014.
40. Mohammed MA, Deeks JJ, Girling A, et al. Evidence of methodological bias in hospital standardised mortality ratios: retrospective database study of English hospitals. BMJ. 2009;338.
41. Shahian DM, Wolf RE, Iezzoni LI, Kirle L, Normand S-LT. Variability in the measurement of hospital-wide mortality rates. The New England Journal of Medicine. 2010;363:2530-2539.
42. Iezzoni LI, Ash AS, Shwartz M, Daley J, Hughes JP, Mackiernan Y. Judging Hospitals by Severity-Adjusted Mortality Rates: The Influence of the Severity-Adjustment Method. Am. J. Public Health. 1996;86(10):1379-1387.
43. Iezzoni LI, Shwartz M, Ash AS, Hughes JS, Daley J, Mackiernan Y. Severity Measurement Methods and Judging Hospital Death Rates for Pneumonia. Med. Care. 1996;34(1):11-28.
44. Hogan H, Healey F, Neale G, Thomson R, Vincent C, Black N. Preventable deaths due to problems in care in English acute hospitals: a retrospective case record review study. BMJ Qual Saf. 2012;21(9):737-745.
45. Guru V, Tu JV, Etchells E, et al. Relationship Between Preventability of Death After Coronary Artery Bypass Graft Surgery and All-Cause Risk-Adjusted Mortality Rates. Circulation. 2008;117:2969-2976.
46. Centers for Medicare & Medicaid Services (CMS). Medicare Program; Hospital Prospective Payment System and Fiscal Year 2014 Rates. Fed. Regist. 2013;78(160):50495-51040.
47. Arnett DK. Letter to Centers for Medicare & Medicaid Services. Re: Docket No. CMS-1599-P. 2013. https://www.heart.org/idc/groups/heartpublic/@wcm/@adv/documents/downloadable/ucm_453664.pdf. Accessed June 25, 2013.
48. Fonarow GC, Alberts MJ, Broderick JP, et al. Stroke outcomes measures must be appropriately risk adjusted to ensure quality care of patients. Stroke. 2014;45(5):1589-1601.
49. Centers for Medicare & Medicaid Services. Quality Data Reporting Requirements for Specific Providers and Suppliers; Final Rule. In: Department of Health and Human Services, ed. Vol 78: Federal Register; 2013:50774-50906.
50. National Stroke Association. NIH Stroke Scale. 2014; http://www.stroke.org/site/PageServer?pagename=NIHSS.
51. Kasner SE. Clinical interpretation and use of stroke scales. The Lancet Neurology. 2006;5(7):603-612.
52. Bratzler DW, Normand SL, Wang Y, et al. An administrative claims model for profiling hospital 30-day mortality rates for pneumonia patients. PLoS ONE. 2011;6(4):e17401.
53. Fonarow GC, Pan W, Saver JL, et al. Comparison of 30-day mortality models for profiling hospital performance in acute ischemic stroke with vs without adjustment for stroke severity. JAMA. 2012;308(3):257-264.
54. Smith EE, Shobha N, Dai D, et al. Risk score for in-hospital ischemic stroke mortality derived and validated within the Get With the Guidelines-Stroke Program. Circulation. 2010;122(15):1496-1504.
55. Nedeltchev K, Renz N, Karameshev A, et al. Predictors of early mortality after acute ischaemic stroke. Swiss Med. Wkly. 2010;140(17-18):254-259.
56. Fonarow GC, Saver JL, Smith EE, et al. Relationship of National Institutes of Health Stroke Scale to 30-Day Mortality in Medicare Beneficiaries With Acute Ischemic Stroke. J Am Heart Assoc. 2012;1(1):42-50.
57. Teale EA, Forster A, Munyombwe T, Young JB. A systematic review of case-mix adjustment models for stroke. Clin. Rehabil. 2012;26(9):771-786.
58. Keyhani S, Cheng E, Arling G, et al. Does Inclusion of Stroke Severity in a 30-day Mortality Model Change Standardized Mortality Rates at VA Hospitals? Circulation. Cardiovascular Quality and Outcomes. 2012;5:508-513.
59. Kurth T, Elkind MV. Comparing hospitals on stroke care: The need to account for stroke severity. JAMA. 2012;308(3):292-294.
60. Friese CR, Earle CC, Silber JH, Aiken LH. Hospital characteristics, clinical severity, and outcomes for surgical oncology patients. Surgery. 2010;147(5):602-609.
61. Rosenberg AL, Hofer TP, Strachan C, Watts CM, Hayward RA. Accepting Critically Ill Transfer Patients: Adverse Effect on a Referral Center's Outcome and Benchmark Measures. Ann. Intern. Med. 2003;138(11):882-890.
62. Combes A, Luyt C-E, Trouillet J-L, Chastre J, Gibert C. Adverse effect on a referral intensive care unit's performance of accepting patients transferred from another intensive care unit. Crit. Care Med. 2005;33(4):705-710.
63. The Joint Commission. Advanced Certification for Primary Stroke Centers. 2015; http://www.jointcommission.org/certification/primary_stroke_centers.aspx. Accessed January 5, 2015.
64. Kirkham JJ. A comparison of hospital performance with non-ignorable missing covariates: An application to trauma care data. Stat. Med. 2008;27(27):5725-5744.
65. Ryan AM, Bao Y. Profiling provider outcome quality for pay-for-performance in the presence of missing data: a simulation approach. Health Serv. Res. 2013;48(2 Pt 2):810-825.
66. Reeves MJ, Smith EE, Fonarow GC, et al. Variation and Trends in the Documentation of National Institutes of Health Stroke Scale (NIHSS) in GWTG-Stroke Hospitals. Submitted to Circulation. Cardiovascular Quality and Outcomes. 2015.
67. Rubin DB. Inference and missing data. Biometrika. 1976;63(3):581-592.
68. Schafer JL, Graham JW. Missing data: Our view of the state of the art. Psychol. Methods. 2002;7(2):147-177.
69. Graham JW. Missing Data Analysis: Making It Work in the Real World. Annu. Rev. Psychol. 2009;60(1):549-576.
70. Altman DG, Bland JM. Missing data. BMJ. 2007;334:424.
71. Hersh WR, Weiner MG, Embi PJ, et al. Caveats for the Use of Operational Electronic Health Record Data in Comparative Effectiveness Research. Med. Care. 2013;51(8 Suppl 3):S30-S37.
72. Knol MJ, Janssen KJ, Donders AR, et al. Unpredictable bias when using the missing indicator method or complete case analysis for missing confounder values: an empirical example. J. Clin. Epidemiol. 2010;63(7):728-736.
73. Gorelick MH. Bias arising from missing data in predictive models. J. Clin. Epidemiol. 2006;59(10):1115-1123.
74. Demissie S, LaValley MP, Horton NJ, Glynn RJ, Cupples LA. Bias due to missing exposure data using complete-case analysis in the proportional hazards regression model. Stat. Med. 2003;22(4):545-557.
75. Gomes M, Gutacker N, Bojke C, Street A. Addressing missing data in patient-reported outcome measures (PROMS) - implications for the use of PROMS for comparing provider performance. Health Econ. 2015.
76. Hannan EL, Kilburn H, Jr., Lindsey ML, Lewis R. Clinical versus Administrative Data Bases for CABG Surgery: Does it Matter? Med. Care. 1992;30(10):892-907.
77. Shahian DM, Silverstein T, Lovett AF, Wolf RE, Normand SL. Comparison of clinical and administrative data sources for hospital coronary artery bypass graft surgery report cards. Circulation. 2007;115(12):1518-1527.
78. Hannan EL, Racz MJ, Jollis JG, Peterson ED. Using Medicare Claims Data to Assess Provider Quality for CABG Surgery: Does It Work Well Enough? Health Serv. Res. 1997;31(6):659-678.
79. Shojania KG, Forster AJ. Hospital mortality: when failure is not a good measure of success. CMAJ. 2008;179(2):153-157.
80. Nicholas LH, Dimick JB, Iwashyna TJ. Do Hospitals Alter Patient Care Effort Allocations under Pay-for-Performance? Health Serv. Res. 2011;46(1p1):61-81.
81. Rothberg MB, Pekow PS, Priya A, Lindenauer PK. Variation in Diagnostic Coding of Patients With Pneumonia and Its Association With Hospital Risk-Standardized Mortality Rates: A Cross-sectional Analysis. Ann. Intern. Med. 2014;160(6):380-388.
82. Austin PC, Tu JV, Alter DA, Naylor CD. The Impact of Under Coding of Cardiac Severity and Comorbid Diseases on the Accuracy of Hospital Report Cards. Med. Care. 2005;43(8):801-809.
83. Goldman LE, Chu PW, Bacchetti P, Kruger J, Bindman A. Effect of Present-on-Admission (POA) Reporting Accuracy on Hospital Performance Assessments Using Risk-Adjusted Mortality. Health Serv. Res. 2014.
84. Reeves D, Campbell SM, Adams J, Shekelle PG, Kontopantelis E, Roland MO. Combining multiple indicators of clinical quality: an evaluation of different analytic approaches. Med. Care. 2007;45:489-496.
85. Reeves MJ, Broderick JP, Frankel M, et al. The Paul Coverdell National Acute Stroke Registry: Initial Results from Four Prototypes. Am. J. Prev. Med. 2006;31(6 Suppl 2):S202-S209.
86. Michigan Department of Community Health (MDCH). Michigan Stroke Registry and Quality Improvement Program. 2014; http://www.michigan.gov/mdch/0,1607,7-1322945_5104_5279_57683-249680--,00.html. Accessed November 26, 2014.
87. Centers for Disease Control and Prevention (CDC). CDC State Heart Disease and Stroke Prevention Programs. 2013; http://www.cdc.gov/DHDSP/programs/stroke_registry.htm. Accessed November 26, 2014.
88. American Hospital Association. AHA Annual Survey Database Fiscal Year 2013. AHA Data Viewer 2014; http://www.ahadataviewer.com/book-cd-products/AHA-Survey/. Accessed November 26, 2014.
89. Singer JD. Using SAS PROC MIXED to Fit Multilevel Models, Hierarchical Models, and Individual Growth Models. Journal of Educational and Behavioral Statistics. 1998;24(4):323-355.
90. Wolf PA, Abbott RD, Kannel WB. Atrial fibrillation as an independent risk factor for stroke: the Framingham Study. Stroke. 1991;22(8):983-988.
91. Cholesterol, diastolic blood pressure, and stroke: 13 000 strokes in 450 000 people in 45 prospective cohorts. The Lancet. 1995;346(8991-8992):1647-1653.
92. Donders AR, van der Heijden GJ, Stijnen T, Moons KG. Review: a gentle introduction to imputation of missing values. J. Clin. Epidemiol. 2006;59(10):1087-1091.
93. Harel O, Zhou XH. Multiple imputation: review of theory, implementation and software. Stat. Med. 2007;26(16):3057-3077.
94. Janssen KJM, Donders ART, Harrell Jr FE, et al. Missing covariate data in medical research: To impute is better than to ignore. J. Clin. Epidemiol. 2010;63(7):721-727.
95. Schafer JL. Multiple imputation: a primer. Stat. Methods Med. Res. 1999;8(1):3-15.
96. Fonarow GC, Smith EE, Reeves MJ, et al. Hospital-Level Variation in Mortality and Rehospitalization for Medicare Beneficiaries With Acute Ischemic Stroke. Stroke. 2011;42(1):159-166.
97. Heckman JJ. Sample Selection Bias as a Specification Error. Econometrica. 1979;47(1):153-161.
98. Grotzinger KM, Stuart BC, Ahern F. Assessment and Control of Nonresponse Bias in a Survey of Medicine Use by the Elderly. Med. Care. 1994;32(10):989-1003.
99. Clark SJ, Houle B. Validation, Replication, and Sensitivity Testing of Heckman-Type Selection Models to Adjust Estimates of HIV Prevalence. PLoS ONE. 2014;9(11):e112563.
100. Sales AE, Plomondon ME, Magid DJ, Spertus JA, Rumsfeld JS. Assessing response bias from missing quality of life data: the Heckman method. Health Qual. Life Outcomes. 2004;2:49.
101. Stolzenberg RM, Relles AD. Tools for Intuition About Sample Selection Bias and its Correction. Am. Sociol. Rev. 1997;62:494-507.
102. Winship C, Mare RD. Models for Sample Selection Bias. Annual Review of Sociology. 1992;18:327-350.
103. Cuddeback G, Wilson E, Orme JG, Combs-Orme T. Detecting and Statistically Correcting Sample Selection Bias. Journal of Social Service Research. 2004;30(3).
104. SAS. The QLIM Procedure. 2015; http://support.sas.com/documentation/cdl/en/etsug/63939/HTML/default/viewer.htm#etsug_qlim_sect001.htm. Accessed March 3, 2015.
105. Drye EE, Normand S-LT, Wang Y, et al. Comparison of Hospital Risk-Standardized Mortality Rates Calculated by Using In-Hospital and 30-Day Models: An Observational Study with Implications for Hospital Profiling. Ann. Intern. Med. 2012;156(1):19-26.
106. Sullivan LM, Massaro JM, D'Agostino RB, Sr. Presentation of multivariate data for clinical use: The Framingham Study risk score functions. Stat. Med. 2004;23(10):1631-1660.
107. Snijders TA, Bosker RJ. Multilevel Analysis: An Introduction to Basic & Advanced Multilevel Modeling. 2nd ed. London, UK: Sage Publishers; 2012.
108. Austin PC, Reeves MJ. The Relationship Between the C-Statistic of a Risk-adjustment Model and the Accuracy of Hospital Report Cards: A Monte Carlo Study. Med. Care. 2013;00:1-10.
109. Austin PC, Reeves MJ. Effect of Provider Volume on the Accuracy of Hospital Report Cards: A Monte Carlo Study. Circ Cardiovasc Qual Outcomes. 2014.
110. Raudenbush SW, Bryk AS. Hierarchical Linear Models: Applications and Data Analysis Methods. 2nd ed. Thousand Oaks, California: Sage Publications, Inc.; 2002.
111. Sullivan LM, Dukes KA, Losina E. An Introduction to Hierarchical Linear Modelling. Stat. Med. 1999;18:855-888.
112. Silber JH, Rosenbaum PR, Brachet TJ, et al. The Hospital Compare Mortality Model and the Volume-Outcomes Relationship. Health Serv. Res. 2010;45(5 Pt 1):1148-1167.
113. Clark DE, Hannan EL, Raudenbush SW. Using a hierarchical model to estimate risk-adjusted mortality for hospitals not included in the reference sample. Health Serv. Res. 2010;45(2):577-587.
114. Hofer TP, Hayward RA. Identifying poor-quality hospitals: can hospital mortality rates detect quality problems for medical diagnoses? Med. Care. 1996;34:737-753.
115. Thomas JW, Hofer TP. Accuracy of Risk-Adjusted Mortality Rate As a Measure of Hospital Quality of Care. Med. Care. 1999;37:83-92.
116. Hofer TP, Hayward RA, Greenfield S, Wagner EH, Kaplan SH, Manning WG. The unreliability of individual physician "report cards" for assessing the costs and quality of care of a chronic disease. JAMA. 1999;281(22):2098-2105.
117. Lichtman JH, Leifheit-Limson EC, Jones SB, Wang Y, Goldstein LB. 30-Day risk-standardized mortality and readmission rates after ischemic stroke in critical access hospitals. Stroke. 2012;43(10):2741-2747.
118. Saposnik G, Jeerakathil T, Selchen D, et al. Socioeconomic status, hospital volume, and stroke fatality in Canada. Stroke. 2008;39(12):3360-3366.
119. Ogbu UC, Slobbe LC, Arah OA, de Bruin A, Stronks K, Westert G. Hospital Stroke Volume and Case-Fatality Revisited. Med. Care. 2010;48(2):149-156.
120. Halm EA, Lee C, Chassin MR. Is Volume Related to Outcome in Health Care? A Systematic Review and Methodologic Critique of the Literature. Ann. Intern. Med. 2002;137(6):511-520.
121. Birkmeyer JD, Siewers AE, Finlayson EV, et al. Hospital Volume and Surgical Mortality in the United States. N. Engl. J. Med. 2002;346(15):1128-1137.
122. Austin PC, Anderson GM. Optimal statistical decisions for hospital report cards. Med. Decis. Making. 2005;25:11-19.
123. Keenan PS, Normand S-LT, Lin Z, et al. An administrative claims measure suitable for profiling hospital performance on the basis of 30-day all-cause readmission rates among patients with heart failure. Circulation. Cardiovascular Quality and Outcomes. 2008;1:29-37.
124. Krumholz HM, Lin Z, Drye EE, et al. An Administrative Claims Measure Suitable for Profiling Hospital Performance Based on 30-Day All-Cause Readmission Rates Among Patients with AMI. Circulation. Cardiovascular Quality and Outcomes. 2011;4:243-252.
125. Burton A, Altman DG, Royston P, Holder RL. The design of simulation studies in medical statistics. Stat. Med. 2006;25(24):4279-4292.
126. Hodgson T, Burke M. On Simulation and the Teaching of Statistics. Teaching Statistics. 2000;22(3):91-96.
127. Journal of American Health Information Management Association Staff. Word from Washington: Sights Set on ICD-10-CM/PCS. 2015; http://journal.ahima.org/2015/01/30/word-from-washington-sights-on-icd-10-cmpcs/. Accessed February 2, 2015.
128. Krell RW, Staiger DO, Dimick JB. Reliability of Surgical Outcomes for Predicting Future Hospital Performance. Med. Care. 2014;52(6):565-571.
129. Weinstein MC. Clinical Decision Analysis. WB Saunders Co.; 1980.
130. DeGroot MH. Optimal Statistical Decisions. Wiley; 2004.
131. Austin PC. A comparison of Bayesian methods for profiling hospital performance. Med. Decis. Making. 2002;22:163-172.
132. Austin PC. Bayes rules for optimally using Bayesian hierarchical regression models in provider profiling to identify high-mortality hospitals. BMC Med. Res. Methodol. 2008;8:30.
133. Bronskill SE, Normand SL, Landrum MB, Rosenheck RA. Longitudinal profiles of health care providers. Stat. Med. 2002;21(8):1067-1088.
134. Jollis JG, Ancukiewicz M, DeLong ER, Pryor DB, Muhlbaier LH, Mark DB. Discordance of Databases Designed for Claims Payment versus Clinical Information Systems: Implications for Outcomes Research. Ann. Intern. Med. 1993;119(8):844-850.
135. Hammill BG, Curtis LH, Fonarow GC, et al. Incremental value of clinical data beyond claims data in predicting 30-day outcomes after heart failure hospitalization. Circ Cardiovasc Qual Outcomes. 2011;4(1):60-67.
136. Groene O, Kristensen SR, Arah OA, et al. Feasibility of using administrative data to compare hospital performance in the EU. International Journal for Quality in Health Care. 2014;26(S1):108-115.
137. Ding VY, Hubbard RA, Rutter CM, Simon GE. Assessing the accuracy of profiling methods for identifying top providers: performance of mental health care providers. Health Services and Outcomes Research Methodology. 2012;13:1-17.
138. Paddock SM, Adams JL, Hoces de la Guardia F. Better-than-average and worse-than-average hospitals may not significantly differ from average hospitals: an analysis of Medicare Hospital Compare ratings. BMJ Qual Saf. 2015;24(2):128-134.
139. Burwell SM. Setting Value Based Payment Goals - HHS Efforts to Improve US Health Care. N. Engl. J. Med. 2015;372(10):897-899.
140. Lee TH. Performance Metrics as Drivers of Quality: Getting to Second Gear. Circulation. 2015;131:967-968.