This is to certify that the thesis entitled

LEVEL OF INTERRATER AGREEMENT IN THE DIAGNOSIS OF DISABLING AND NON-DISABLING CEREBRAL PALSY

presented by Shariff Kamal Bishai has been accepted towards fulfillment of the requirements for the Master's degree in Epidemiology.

LEVEL OF INTERRATER AGREEMENT IN THE DIAGNOSIS OF DISABLING AND NON-DISABLING CEREBRAL PALSY

By Shariff Kamal Bishai

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE

Department of Epidemiology

2000

ABSTRACT

Introduction: Variation in the prevalence of cerebral palsy (CP) in low birthweight infants is of considerable importance in evaluating neonatal care. However, it is suspected that considerable interrater variability exists in the diagnosis of this disorder.

Methods: In a regional study of outcomes at age two among infants weighing <2 kg at birth, 51 subjects were selected (15 with disabling CP, 16 with non-disabling CP and 20 without CP) for a study of reliability of CP diagnosis. The screening neurologic forms (minus the final diagnosis) were reviewed by five experienced raters. Sampling fraction was not revealed to the raters.

Results: The kappa statistic for agreement by five raters with the study diagnosis was 0.65 (fair to good). The kappa for disabling CP was 0.87 (excellent), but for non-disabling CP was 0.36 (poor).
Kappas between pairs of raters varied from 0.41 to 0.87 for the three-way classification of cerebral palsy. Kappas for all pairwise comparisons of disabling CP vs. all other diagnoses ranged from 0.80 to 0.95, while kappas for all pairwise comparisons of any CP vs. all other diagnoses ranged from 0.35 to 0.92.

Conclusions: The high concordance between examiners found when a disability, in addition to a neurologic impairment, was present on examination suggests that incorporating measures of functional disability into the diagnosis of cerebral palsy would improve comparability between studies and centers.

Copyright by SHARIFF KAMAL BISHAI 2000

This is dedicated to my father:
Kamal Naguib Bishai
he had always been there for me, then it is my turn to be there for him, now he is gone, but again he is here for me, in my greatest dreams, thoughts, and most importantly in my heart.

ACKNOWLEDGEMENTS

I would like to thank Dr. Nigel Paneth for his great help and guidance throughout the completion of my degree. Although this thesis took me longer than I had hoped, Dr. Paneth always had a positive outlook about this project. I would also like to thank my parents, Kamal and Sophie, my brother, Amir, and my future bride, Natalie, for inspiring me and motivating me to finish my degree. I just have one regret: that my father will not be here to appreciate my work. But I hope he will know when I succeed with the goals and dreams that we had talked about. I would also like to thank the raters in this project: James Jetton, Dr. Peter Rosenbaum, Dr. Saroj Saigel, and Dr. Sue Broyles. James Jetton was also very helpful with the statistical analysis in my thesis. And finally, I would like to thank Dr. David Kaufman and Dr. Joseph Gardiner for being on my Thesis Committee.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
INTRODUCTION
METHODS
RESULTS
DISCUSSION
APPENDIX
BIBLIOGRAPHY

LIST OF TABLES

Table 1: Pairwise Comparisons of Kappas for Original Diagnosis and All Raters for Three-Way Classification of Cerebral Palsy (No CP, Non-Disabling CP, and Disabling CP)
Table 2: Pairwise Comparisons of Weighted Kappas for Original Diagnosis and All Raters for Three-Way Classification of Cerebral Palsy (No CP, Non-Disabling CP, and Disabling CP)
Table 3: Pairwise Comparisons of Kappas for Original Diagnosis and All Raters for Two-Way Classification of Disabling Cerebral Palsy vs. Any Other Diagnosis
Table 4: Pairwise Comparisons of Kappas for Original Diagnosis and All Raters for Two-Way Classification of Any Cerebral Palsy vs. Any Other Diagnosis
Table 5: Kappas by Diagnosis and Combined Totals for Kappa for the Two-Way Classification of Cerebral Palsy with Fleiss's Kappa Agreement Scale Category

LIST OF FIGURES

Figure 1: Scales for judging strength of Agreement for Kappa
Figure 2: Pairwise Comparisons of Kappas for Raters and the Original Diagnosis for the Three-Way Classification of CP
Figure 3: Pairwise Comparisons of Kappas for Raters and the Original Diagnosis for the Two-Way Classification of CP

INTRODUCTION

Cerebral palsy (CP) is among the most common of the severe disabilities of childhood. It is also among the most life-limiting of childhood disabilities, exceeding asthma, epilepsy, and blindness in its effects on activities of daily living (Stanley 1994). Approximately one school age child in 500 has CP, and more than 100,000 Americans under the age of 18 have neurologic abnormalities attributed to CP. CP can result from in utero brain damage or hemorrhagic-ischemic insults, from perinatal insults, or from postnatal insults such as meningitis, trauma, or prolonged ischemia occurring after birth but before the end of infancy (Albright 1996). Low birthweight, multiple pregnancy, hyperbilirubinemia, severe brain asphyxia, and neonatal difficulties are known to be risk factors for CP (Vining 1976). The rate of CP in infants with birthweights less than 1500 grams is 25 to 30 times higher than among full-term babies (Albright 1996). Notably, the incidence of CP is not declining in spite of major developments and improvements in obstetrical care, possibly because of the survival of more low birthweight neonates. The Advisory Council of the National Institute of Neurological Disorders and Stroke estimates the total cost of CP in the United States at $5 billion per year (Kuban 1994).

Definitions

The definition of cerebral palsy has been under much debate for decades.
An expert group met in Edinburgh in 1964 and defined cerebral palsy broadly as: ‘A disorder of movement and posture due to a defect or lesion of the immature brain.’ This group noticed that classification of CP using clinical observations reflected a wide divergence in the methods and techniques used by different specialties as well as by different cultures and countries (Bax 1964). CP can also be defined as a non-progressive disorder of motion and posture due to brain insult or injury occurring in the period of early brain growth, usually during gestation or within the first year of life (Vining 1976). Considerable heterogeneity in motor dysfunction can result from injury to the brain in early life. CP comprises a group of syndromes which are etiologically, pathologically and clinically heterogeneous (Stanley 1994). In 1992, Mutch et al. defined cerebral palsy as an umbrella term covering a group of non-progressive, but often changing, motor impairment syndromes secondary to lesions or anomalies of the brain arising in the early stages of development. They also stated that the estimated prevalence rate of the disorder depends crucially on the definitions used and on decisions made regarding which conditions are included and which excluded. These methods of classification rely primarily on clinical judgment and therefore make diagnosis a subjective choice.

Terminology

Cerebral palsy is classified by the predominant movement abnormality (spasticity, dystonia, athetosis, chorea, ataxia, hypotonia or mixed forms) and by the extremities affected (monoplegia, diplegia, triplegia, hemiplegia, and quadriplegia). Although the brain lesion is static, the resultant movement disorder may change, becoming either better or worse. Dystonia, athetosis, and chorea are dyskinetic movement disorders, whereas spasticity is considered a hypertonic disorder of movement.
Spasticity has been defined as “a motor disorder characterized by a velocity-dependent increase in tonic stretch reflexes (muscle tone) with exaggerated tendon jerks, resulting from hyperexcitability of the stretch reflex, as one component of the upper motor neuron syndrome.” Approximately two thirds of patients with CP have the spastic type, which typically affects the lower extremities more than the upper, and the flexors, adductors and internal rotators more than their antagonists. Spasticity can be assessed with the Ashworth scale, on which muscle tone is graded from 1 to 5. Treatment includes physical therapy and orthopedic surgeries to release contractures.

Dystonia is described as a sustained muscle contraction that results in twisting and repetitive movements or abnormal postures. It is present in 15-25% of patients with CP and may begin at 3 to 5 years of age or older. As with spasticity, dystonia is relatively unresponsive to oral medications. Dystonia persists through life but does not cause contractures because there is continual motion.

The athetoid form of CP was first defined in 1871 by Hammond as “an inability to retain the fingers and toes in any position in which they might be placed, and by their continual motion,” and is currently defined as distal, slow writhing motions of the appendicular musculature. Twenty-five percent of CP patients have some form of athetosis. Athetosis is most often seen in affected children born at term, and is often associated with normal intellect. It is a non-progressive but evolving disorder characterized by impairment of postural reflexes and non-rhythmical involuntary movements. Usually, there is damage to the basal ganglia (Albright 1996). Two main risk factors for athetosis are severe asphyxia at birth and hyperbilirubinemia in the newborn period causing kernicterus (an accumulation of unconjugated bilirubin in the thalamus of the brain leading to destruction of the internal capsule).
Kernicterus can also cause hypertonia (sometimes hypotonia), problems with lateral gaze, delayed motor development, and small head circumference.

Chorea is defined as “a state of excessive, spontaneous movements, irregularly timed, non-repetitive and abrupt in character” and is characterized by involuntary, abrupt, rapid, brief, unsustained, irregular movements. Patients with chorea cannot maintain voluntary contractions. The choreiform movements in CP often become apparent in the 3rd to 5th years of life. The choreiform type of CP occurs in 25% of patients, of whom 95% have an IQ of at least 70.

The 30-year survival rate in CP is approximately 87%, with better survival in patients with athetoid CP and worse survival in those with spastic quadriplegia, epilepsy, and severe mental retardation (Albright 1996).

Impairment, Disability, and Handicap

Affected children can be classified as having an impairment, a disability, and/or a handicap. These definitions also depend on who is using the term and on the type of study being done (i.e., financial, government, scientific, etc.). For instance, Social Security Disability Insurance (SSDI), used by the federal government, defines an impairment as one that results from anatomical, physiological, or psychological abnormalities which are demonstrable by medically acceptable clinical and laboratory diagnostic techniques; a disability as the inability to engage in any substantial gainful activity by reason of any medically determinable physical or mental impairment or impairments which can be expected to result in death or which have lasted or can be expected to last for a continuous period of not less than 12 months; and a disability as a previously established impairment or a state resulting from an impairment which keeps a person from being able to complete specific tasks (Mooney 1987). The most widely accepted alternative scientific definitions are those developed by Susser and followed by the World Health Organization.
Impairment is defined as a stable and persisting defect in the individual at the organic level which stems from a known or unknown molecular, cellular, physiological or structural disorder (Susser 1990). Simplified, the term impairment refers to the specific anatomical, generally organ-specific, deficit that constitutes the diagnostic entity. In CP, neurologic findings such as spasticity, dystonia, and choreiform movements constitute the impairment.

Susser defines disability as “a stable and persisting physical or psychological dysfunction at the personal level, by necessity again confined to the individual; this dysfunction stems from the limitation imposed by the impairment and by the individual’s psychological reaction to it” (Susser 1990). In simpler terms, a disability impedes an individual from completing a task necessary for ordinary functions. The inability to run, walk, jump, get dressed, and perform other tasks one can usually do alone constitutes the disabilities associated with CP.

Handicap is defined as “a persisting social dysfunction, a social role assumed by the impaired or disabled individual that is assigned by the expectations of society” (Susser 1990). Therefore, as stated above, one cannot have a handicap without having an impairment and disability. For CP, a handicap can encompass the fact that patients affected by CP cannot accomplish certain jobs requiring motor skills, and are therefore handicapped in a societal sense (e.g., a police officer unable to hold his/her weapon).

All of these definitions may be applied to diseases such as cerebral palsy, but the question is: what is CP? Is it an impairment, a disability, or a handicap? Is it possible that it is all three?

Diagnosis

The diagnosis of cerebral palsy often utilizes some type of motor screening test. A universal sign of cerebral palsy is delay in motor milestones (Molnar 1973).
Screening tests, such as the Denver Developmental Screening Test, are widely used in pediatrics, but this test has been reported to miss infants with CP when they are tested early in life. Another screening test, the Infant Motor Screen, includes 25 test items which assess muscle tone, primitive responses, automatic reactions and asymmetry of motor skills. The child is tested at four months, eight months, one year, and 16 months (Nickel et al. 1988). More definitive motor information can be obtained from the Motor Sub-score of the Bayley Scales of Infant Development. However, most diagnoses of CP are based on neurological examinations completed by physicians. In general, these examinations do not incorporate information about the functional abilities of the child’s motor system. Neurologic examination thus defines the impairment of CP, but not the disability.

Several factors contribute to potential variation in the diagnosis of CP:

1. The brain lesion is not progressive, but since the central nervous system is developing rapidly in the first five years, the motor manifestations of the lesion may change with age.
2. CP terminology is extracted from adult neurological terms, and therefore the application of these descriptions to developing neurology and pathology is only approximate.
3. Clinicians may be trained in different traditions of motor examination (Blair 1985).
4. Motor development in normal children shows a great range of variation.
5. Signs of CP may be observed and interpreted differently by different examiners.

Scrutton stated, “we do group them together [CP patients], and in an attempt to communicate about the disorder we have a number of subclassifications in common usage. They are not, nor can they ever be, an accurate description of the child’s disorder, but they will begin to mean nothing to any of us unless we have some clear agreement about their usage.”

Burns et al. tried to identify CP early in high-risk children.
They found that an assessment at one month failed to identify a number of the CP infants, whereas at four months there was overidentification of CP. They noticed that assessment at eight months was highly predictive (Burns 1989). The search for a good screening or diagnostic exam has been intense, and includes the Infant Motor Screen (Nickel 1989), hand-held dynamometry in motor neuron disease (Goonetilleke 1994), goniometric measurements (Stuberg 1988), as well as functional assessments of the young nervous system (Prechtl 1997). A test with both high sensitivity and high specificity is unlikely to be found for a disorder with such diverse manifestations, but this search indicates the importance of finding a better diagnostic tool and/or assessment of children for CP, as well as the great variability in diagnoses of cerebral palsy.

Purpose

The purpose of this study was to examine the extent to which individuals with competence in CP diagnosis differ in their classification of individual cases. We were also interested in whether a more constrained definition of CP, which includes assessment of the motor functionality of the child, makes for a more reliable diagnosis. The importance of this study lies in its potential to initiate a better, more objective diagnosis, whether by changing the definition of CP or by better education of physicians.

METHODS

The eligible study population consisted of infants weighing 501 to 2000 grams at birth who were born in the three central New Jersey counties of Monmouth, Ocean, and Middlesex between September 1, 1984 and June 30, 1987. Because of regionalization of perinatal services in central New Jersey, three neonatal intensive care units had clinical responsibility for virtually all low birth weight infants born in the tri-county region. A total of 1,105 babies, 83% of all eligible newborns in the three counties, were enrolled in this study, the Neonatal Brain Hemorrhage (NBH) study.
At age two, 725 children (80% of survivors) were examined. During the neurologic screening, a nurse or nurse practitioner specially trained for this study performed the following neurologic assessments: semi-quantitative assessment of tone and of extrapyramidal movements; tendon reflexes; shoulder flexibility; primitive reflexes; cranial nerve abnormalities; and goniometric measurements of knee, ankle, and hip ranges of motion. Each child had a neurologic screening form completed. The same form, except for goniometry, was completed by a study child neurologist for each child who did not clearly pass the exam and was therefore referred. Only two people, a nurse practitioner and a nurse, had the opportunity to neurologically assess these children over this three-year period. The same person trained each of the nurses in the same manner. In addition, each child neurologist was given a form listing the same information to be assessed, along with a strict protocol to follow in the assessment of each child (see the handout in the appendix).

The diagnosis of cerebral palsy in the NBH study was based on review of specific neurologic findings by the nurse and, if available, the study child neurologist. Medical records and maternal interview history of motor milestones were also recorded. Children were classified as having disabling CP when, in addition to specific neurologic findings, at least one of the following conditions was met: inability to walk five steps unaided by the age of two years; a score on the Bayley Psychomotor Developmental Index that was more than 1 standard deviation below the mental-development score on the Bayley Scales of Infant Development or the Stanford-Binet Intelligence Scales for Children; motor disability requiring physical therapy; motor disorder requiring surgical intervention; or the use of braces or other physical-assistance devices (Reuss 1996).
To study the reliability of CP diagnosis, 51 subjects were randomly selected from among the 725 examinees, 15 of whom had originally been classified in the study as having disabling cerebral palsy, 16 of whom were classified as having non-disabling cerebral palsy, and 20 as not having cerebral palsy. The five-page neurologic screening form used in the study was provided (except for the final diagnosis) to five raters located in three different countries. These five raters included two general pediatricians, one pediatrician/neonatologist, one pediatrician trained in child development and neurology, and one data analyst with interest in the subject matter and familiarity with the NBH database. All raters had experience with the diagnosis of CP. Raters were asked to classify each child as having disabling cerebral palsy, having non-disabling cerebral palsy, or not having cerebral palsy; no information was offered other than what was recorded on the nurse's screening form.

In a study of the reliability of arteriogram readings, Egglin and Feinstein (1996) reported that the order in which subjects are presented may influence the diagnosis. They concluded that a “context bias” effect operates in the evaluation of subjectively interpreted tests (Egglin 1996). In our study, all raters received the 51 neurologic screening forms in the same random order.

Agreement was assessed with the kappa statistic, as described by Cohen, which provides a chance-corrected measure of agreement for dichotomous or polychotomous choices. The kappa statistic is defined as follows:

1. Utilizing a categorical variable, it is possible to measure the reproducibility between surveys with

\[ \kappa = \frac{p_o - p_e}{1 - p_e} \]

where $p_o$ is the observed probability of concordance between the two surveys and $p_e$ is the expected probability of concordance between the two surveys,

\[ p_e = \sum_{i=1}^{c} a_i b_i \]

where $a_i$, $b_i$ are the marginal probabilities for the $i$th category in the $c \times c$ contingency table relating response at the two surveys.

2. Furthermore,

\[ se(\kappa) = \sqrt{ \frac{1}{N (1 - p_e)^2} \left\{ p_e + p_e^2 - \sum_{i=1}^{c} a_i b_i (a_i + b_i) \right\} } \]

where $N$ is the number of subjects. To test the one-sided hypothesis $H_0: \kappa = 0$ versus $H_1: \kappa > 0$, use the test statistic

\[ z = \frac{\kappa}{se(\kappa)} \]

which follows an $N(0,1)$ distribution under $H_0$.

3. Reject $H_0$ at level $\alpha$ if $z > z_{1-\alpha}$ and accept $H_0$ otherwise.

4. The exact p-value is given by $p = 1 - \Phi(z)$ (Rosner 1995).

If there are more than two raters and more than two possible outcomes, kappa is defined as follows. Let $x_{ij}$ be the number of ratings on subject $i$, $i = 1, \ldots, n$, into category $j$, $j = 1, \ldots, k$. Define $\bar{p}_j$ as the overall proportion of ratings in category $j$, $\bar{q}_j = 1 - \bar{p}_j$, and let $\hat{\kappa}_j$ be the kappa statistic for $k = 2$ when category $j$ is compared with the amalgam of all other categories. Kappa is (Landis and Koch 1977):

\[ \bar{\kappa} = \frac{\sum_j \bar{p}_j \bar{q}_j \hat{\kappa}_j}{\sum_j \bar{p}_j \bar{q}_j} \]

In the case where the number of raters per subject, $\sum_j x_{ij}$, is a constant $m$ for all $i$, Fleiss, Nee, and Landis (1979) derived the following formulas for the approximate standard errors. The standard error for testing $\hat{\kappa}_j$ against 0 is:

\[ s_0(\hat{\kappa}_j) \approx \sqrt{\frac{2}{n m (m - 1)}} \]

and the standard error for testing $\bar{\kappa}$ is:

\[ \bar{s} = \frac{\sqrt{2}}{\sum_j \bar{p}_j \bar{q}_j \sqrt{n m (m - 1)}} \sqrt{ \Big( \sum_j \bar{p}_j \bar{q}_j \Big)^2 - \sum_j \bar{p}_j \bar{q}_j (\bar{q}_j - \bar{p}_j) } \]

Four sets of kappa statistics were generated for each paired contrast:

1. Classification of the data into three categories: no CP, non-disabling CP and disabling CP (unweighted kappa).
2. Classification as above, but with the distance between disabling and non-disabling CP set at one half the distance from disabling CP to no CP (weighted kappa). This is explained further below.
3. Classification of the data into two categories: CP and no CP.
4. Classification of the data into two categories: disabling CP and other.

In order to lower the variation of the three-way comparison of diagnoses, the kappas were weighted. Weighting treats a rater's choice of Disabling CP instead of No CP as a larger disagreement than a choice of Non-Disabling CP instead of No CP.
The weighting scale used in this study was 1.0, 0.5, 0.0. A weight of 1 indicates perfect agreement. A weight of 0.5 means that the raters are in one-half agreement with each other; this happens when the raters are one category apart (i.e., one rater diagnoses Non-Disabling CP and the other diagnoses No CP). Finally, a weight of zero means the raters are in complete disagreement, which happens only when they are two categories apart (i.e., one rater diagnoses Disabling CP and the other diagnoses No CP).

The series of classifications by each of the reviewers was compared to the original diagnosis in the data set and with the classifications made by all other reviewers. Different groups have interpreted the meaning of kappa statistics differently, as seen in Figure 1 below. We follow Fleiss in viewing kappas above 0.75 as excellent, below 0.40 as poor, and from 0.40 to 0.75 as fair to good (Fleiss 1981).

The findings were tabulated using Stata software (version 5.0, Stata, College Station, Texas) to give kappas between raters, as well as between each rater and the original diagnosis, for the two- and three-way classifications of diagnoses. Kappas for the diagnosis categories (i.e., Disabling CP, Non-Disabling CP, No CP) were also calculated. The ranges for kappa are as shown in Figure 1.

Figure 1: Scales for judging strength of Agreement for Kappa. Landis and Koch scale: below 0.0, poor; 0.0 to 0.2, slight; 0.2 to 0.4, fair; 0.4 to 0.6, moderate; 0.6 to 0.8, substantial; 0.8 to 1.0, almost perfect. Fleiss scale: below 0.40, poor; 0.40 to 0.75, fair to good; above 0.75, excellent. (Reprinted from Seigel DG, Podgor MJ, Ramaley NA. Acceptable values of Kappa for comparison of two groups. Am J Epidemiology. 1992; 135: 571-578.) Reprinted in Paneth 1993.

Finally, pairwise comparisons of two-way classifications were calculated to assess the variability between raters when one diagnosis was made vs. any of the other diagnoses.
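The pairwise statistics described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Stata routine used in the study: the function names (`cohen_kappa`, `linear_weights`, `fleiss_category`) and the 3 x 3 table of counts are hypothetical, and the counts are invented so that they sum to 51 subjects.

```python
def cohen_kappa(table, weights=None):
    """Kappa for two raters from a c x c table of counts.

    table[i][j] is the number of subjects placed in category i by the
    first rater and in category j by the second. weights[i][j] is the
    agreement weight (1.0 = full agreement); None gives unweighted kappa.
    """
    c = len(table)
    n = sum(sum(row) for row in table)
    if weights is None:
        weights = [[1.0 if i == j else 0.0 for j in range(c)] for i in range(c)]
    a = [sum(table[i]) / n for i in range(c)]                       # row margins a_i
    b = [sum(table[i][j] for i in range(c)) / n for j in range(c)]  # column margins b_j
    # Observed and chance-expected (weighted) agreement.
    p_o = sum(weights[i][j] * table[i][j] for i in range(c) for j in range(c)) / n
    p_e = sum(weights[i][j] * a[i] * b[j] for i in range(c) for j in range(c))
    return (p_o - p_e) / (1 - p_e)


def linear_weights(c):
    """Weights 1 - |i - j| / (c - 1); for c = 3 these are 1.0, 0.5, 0.0."""
    return [[1 - abs(i - j) / (c - 1) for j in range(c)] for i in range(c)]


def fleiss_category(kappa):
    """Label a kappa with the cut-points adopted in the text (Fleiss 1981)."""
    if kappa > 0.75:
        return "excellent"
    if kappa < 0.40:
        return "poor"
    return "fair to good"


# Hypothetical counts (NOT study data).
# Category order: No CP, Non-Disabling CP, Disabling CP.
example = [
    [18, 2, 0],
    [3, 10, 3],
    [0, 1, 14],
]
unweighted = cohen_kappa(example)
weighted = cohen_kappa(example, linear_weights(3))
print(f"unweighted kappa = {unweighted:.4f} ({fleiss_category(unweighted)})")
print(f"weighted kappa   = {weighted:.4f} ({fleiss_category(weighted)})")
```

In this invented table the only disagreements are between adjacent categories, so the weighted kappa comes out higher than the unweighted one. Collapsing the table to 2 x 2 (for example, Disabling CP vs. everything else) and calling `cohen_kappa` without weights would give the two-way comparisons.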
RESULTS

Table 1 shows the unweighted kappas for the pairwise comparisons of the raters reviewing the neurologic screening forms and the original diagnosis for the three-way classification of No CP, Non-Disabling CP, or Disabling CP. Their kappas ranged from 0.4138 to 0.8750 (all p’s < 0.001). Weighting the kappas, as explained in the methods section, increased the agreement between raters. Table 2 shows the weighted kappas for the pairwise comparisons of the raters reviewing the neurologic screening forms and the original diagnosis for the three-way classification. Their kappas ranged from 0.6222 to 0.9144 (all p’s < 0.001).

Table 1
            Original  Rater 1  Rater 2  Rater 3  Rater 4  Rater 5
Original
Rater 1     0.7002
Rater 2     0.4138    0.5862
Rater 3     0.4536    0.6712   0.6531
Rater 4     0.6998    0.8750   0.5912   0.7400
Rater 5     0.5463    0.7119   0.7100   0.7291   0.7767

Table 1: Pairwise Comparisons of Kappas for Original Diagnosis and All Raters for Three-Way Classification of Cerebral Palsy (No CP, Non-Disabling CP, and Disabling CP)

Table 2
            Original  Rater 1  Rater 2  Rater 3  Rater 4  Rater 5
Original
Rater 1     0.7871
Rater 2     0.5598    0.6752
Rater 3     0.6222    0.7880   0.7476
Rater 4     0.7833    0.9144   0.7119   0.8274
Rater 5     0.6551    0.7844   0.7924   0.8238   0.8238

Table 2: Pairwise Comparisons of Weighted Kappas for Original Diagnosis and All Raters for Three-Way Classification of Cerebral Palsy (No CP, Non-Disabling CP, and Disabling CP)

Upon changing to a two-way classification, the kappas for Disabling CP vs. any other diagnosis were very high, consistently in the “excellent” range. Table 3 shows these data. When kappas were tabulated for Any CP vs. any other diagnosis, the kappas fell considerably. The range of these kappas was 0.3517 to 0.9215, as shown in Table 4.
Table 3
            Original  Rater 1  Rater 2  Rater 3  Rater 4  Rater 5
Original
Rater 1     0.9537
Rater 2     0.8496    0.8046
Rater 3     0.8111    0.8610   0.8496
Rater 4     0.9518    0.9057   0.8970   0.8555
Rater 5     0.8035    0.8561   0.9470   0.9017   0.8490

Table 3: Pairwise Comparisons of Kappas for Original Diagnosis and All Raters for Two-Way Classification of Disabling Cerebral Palsy vs. Any Other Diagnosis

Table 4
            Original  Rater 1  Rater 2  Rater 3  Rater 4  Rater 5
Original
Rater 1     0.6456
Rater 2     0.3517    0.5721
Rater 3     0.4711    0.7264   0.6617
Rater 4     0.6456    0.9215   0.5721   0.8046
Rater 5     0.5392    0.7260   0.6731   0.7606   0.8043

Table 4: Pairwise Comparisons of Kappas for Original Diagnosis and All Raters for Two-Way Classification of Any Cerebral Palsy vs. Any Other Diagnosis

Figure 2 shows a scatter plot of both the unweighted and the weighted kappas from the readers and the original diagnosis for the three-way classification. Agreement falls between the ranges of ‘moderate’ and ‘almost perfect’ on the Landis and Koch kappa agreement scale and between the ranges of ‘fair to good’ and ‘excellent’ on the Fleiss scale (Landis 1977; Fleiss 1981).

Figure 3, also in the appendix, shows a scatter plot of the pairwise comparisons for the two-way classification of cerebral palsy. As can be seen in the graph, kappas for Disabling CP vs. all other diagnoses were higher than those for Any CP vs. all other diagnoses.

It is important to note that raters 1 and 4 had the highest interrater reliability. They are located at the same institution, although they are of different professional backgrounds. Raters 1 and 2, although trained in the same professional discipline, presently reside in different countries and had lower agreement.

When agreement was assessed by diagnosis for the two-way classification of CP, the average kappa for classifying a child as having “Disabling” CP versus all other diagnoses was 0.8729. When children were classified as no-CP versus all other diagnoses, the average kappa was 0.6531.
Finally, when a child was diagnosed as having “Non-Disabling” CP versus all other diagnoses, the average kappa fell to 0.3661. Table 5 shows these results.

Table 5
Outcome                                    Kappa    Z      Pr > Z  Fleiss Category
Normal vs. all other diagnosis             0.6531   18.06  0.0000  Fair to Good
Non-Disabling CP vs. all other diagnosis   0.3661   10.13  0.0000  Poor
Disabling CP vs. all other diagnosis       0.8729   24.14  0.0000  Excellent
Combined Totals                            0.6541   24.64  0.0000  Fair to Good

Table 5: Kappas by Diagnosis and Combined Totals for Kappa for the Two-Way Classification of Cerebral Palsy with Fleiss’s Kappa Agreement Scale Category

DISCUSSION

With as many as one in every 500 school-aged children having cerebral palsy, it is important to be able to diagnose children correctly. This study points the way towards a clearer and more unified definition of cerebral palsy for the neurologist, pediatrician, or researcher. We found, for CP overall, a range of kappas from 0.4138 to 0.9144 between pairs of readers. However, the distinction between “Disabling” and “Non-Disabling” CP allows researchers to better understand the source of CP misclassification. In this study, we found that with the addition of “Disabling” the kappa reached the “excellent” range on the Fleiss scale for kappa agreement in Figure 1. Figure 3 shows the most compelling evidence of the value of incorporating the concept of “Disabling” in the definition of CP. Interrater agreement increased because of the inclusion of this term. As stated in the results, the location of the rater seems to have more of an effect on diagnosis than does professional discipline and training.

Other studies, such as that conducted by Badawi et al., have also tried to assess the interrater reliability of the diagnosis of cerebral palsy. In their study, they looked to see if the definition of cerebral palsy was the same across different cerebral palsy registries.
Badawi et al. aimed to measure trends in prevalence over time as well as to make prevalence comparisons between registers in different populations. As they noted, many syndromes and conditions are associated with CP. They postulated possible inclusion/exclusion criteria for CP registries with a clear coding system. Conditions that may be excluded are: hypotonia, metabolic disorders, syndromes with vascular defects, and encephaloceles. Conditions that should always be excluded are: neurodegenerative conditions, neuromuscular disorders, neural tube defects of the spine, and tumors. Their suggestions were intended to improve interobserver agreement in applying the term CP for the purposes of registers. They state, “it is an essential objective for a clear, valid, stable, and reproducible definition of CP in order to make scientific studies comparable” (Badawi 1998). A working definition of CP is important not only for scientific studies but also for affected individuals, both in understanding their diagnosis and in their societal role.

However, classifying cerebral palsy as an impairment, a disability, and/or a handicap is also a problem. By definition, since cerebral palsy stems from an organic, cellular core, it is definitely an impairment. Next, it could be a disability, depending on which definition is used. If the definition of Susser is used, which the World Health Organization later modified for its own definition, then children with CP who are unable to complete certain tasks would have an underlying disability (Susser 1990). In contrast, if the SSDI criteria are used, then a child with CP would not have a disability because, by the SSDI definition, a disability would prevent the child from working and making money. Obviously, a young child with or without CP will not be working, so no child could possibly have a disability (Mooney 1987).
Finally, it is unlikely that a child can, by definition, have a handicap during the infant and toddler years, because the child cannot yet have a societal role. Children at this age have little impact on society; therefore it is possible for the child to be impaired, or even disabled, but not handicapped. This does not mean that CP cannot become a handicap later on, when the individual becomes an adolescent. Therefore, the age of the subject with CP can change the complexion of any study that defines the impairment, disability, or handicap. These different choices of definitions add yet another dilemma in classifying these children. In a scientific study, the Susser definition should be utilized. Definitions for impairments, disabilities, and handicaps are essential for common results across studies. Again, as Badawi states, comparing different definitions leads to conclusions that can yield no scientific relevance.

Interobserver reliability is important in order to fully understand the variation in the diagnosis of cerebral palsy, but work in this area has been limited. Blair and Stanley (1985) found about 50% agreement (overall observed probability of agreement) when six clinicians were given the chance to diagnose a child with cerebral palsy. In that study, the clinicians had the opportunity to physically assess the child. All clinicians responsible for diagnosing cases for the Western Australian CP Register assessed children seen for neurologic screening on two occasions twelve months apart. The six clinicians saw the children within one hour of each other. The children were categorized by motor type and severity type. On the first occasion, the kappa for motor type was 0.10, with a probability of agreement of 0.40. One year later, when the children were seen the second time, the kappa for motor type had increased to 0.35, with a probability of agreement of 0.50.
Agreement on severity type did not change much over one year, with the probability of agreement increasing from 0.62 to 0.71 and the kappa increasing minimally from 0.38 to 0.40.

In a similar study conducted by Gowland et al., intraclass correlation coefficients for reliability were assessed, which varied from 0.84 to 0.92. Slightly different from kappa, an intraclass correlation coefficient is used to determine the strength of the relationship between two variables. Correlations may be positive or negative. The highest possible value of a correlation is 1.0, but correlations rarely approach that perfect level. Interestingly, they used a new criterion for their diagnoses (MSU Course KIN 871, 2000). Utilizing a gross motor performance measure (GMPM), which evaluates the quality of movement of children with CP, and a gross motor function measure (GMFM), which has 88 items in five dimensions (lying and rolling; sitting; crawling and kneeling; standing; and walking, running, and jumping), they noticed that it was much easier to diagnose children with CP. Two problems stand out with this study. The mean age of the children was 4.6 years, and during the assessment of children, the study diligently tried to reduce variability by keeping certain factors consistent. In both instances, diagnoses of CP would be much easier than in a random setting and with a younger child (Gowland 1995).

Using the GMPM and the GMFM, Palisano et al. found that interrater reliability was 0.55 for children less than 2 years of age (N=37) and 0.75 for children 2-12 years of age (N=40).
For children less than 2 years of age, using a tiered functional level system within the GMPM and GMFM, reliability was 0.88 for Level 1 (roughly correlating with non-disabling CP in our study) but fell to 0.44 for both Levels 4 and 5 (roughly correlating with disabling CP in our study). Thus, in this age group, agreement among observers was harder to reach as the disease manifested in a more disabling manner. From ages 2-12, agreement was 0.67 for Level 1 and 0.89 for Level 5 (Palisano 1997).

The trend for diagnosis through most of these studies is that the younger the child, the more difficult it is to diagnose disabling CP, but as the child gets older and develops, it becomes easier. For example, children at a younger age may be uncoordinated or have some manifestations of dystonia or other symptoms of the different types of CP, making their condition appear similar to disabling CP when in reality it is not. As the child grows, though, those with disabling CP will not outgrow their disability, making it easier to identify and classify them at an older age. On the other hand, as the child gets older it becomes ever more difficult to assess non-disabling CP. For example, as children develop, they may outgrow their disabilities and/or some of the more disabling symptoms of their CP, making them appear more like children without CP. Badawi et al. recognized this finding and used it in their study, choosing children under the age of five. This age was chosen because most cases of CP will become evident by age five, allowing signs that are going to resolve time to do so, and allowing many progressive symptoms of CP to become apparent (Badawi 1998).

Returning to our study, it is important to note that the five pediatricians rating study children are from different countries. We conclude that it is possible to assess children with disabling CP with higher confidence than those with non-disabling CP.
The kappas for children with disabling CP versus no-CP were 0.54 to 0.88, in the fair to excellent range on the Fleiss scale.

Adding a measure of gross motor behavior is a step in the right direction in clarifying the definition of CP. Boyce et al. produced these measures utilizing two simple features: function and performance. Gross motor behavior encompasses these two main features. Gross motor function describes the achievement of particular activities, for example, sitting independently for 10 seconds. Gross motor performance describes the quality of a motor activity, such as postural alignment or stability while sitting. Raters of different disciplines can study a child with a simpler scale that addresses conceptual issues, including observational context, static and dynamic quality, and scaling, as well as numerous methodological issues. The Gross Motor Performance Measure assesses alignment, coordination, dissociated movement, stability, and weight shift. The scale can be used across most cohorts because it was developed through a collaborative, multicenter, and multidisciplinary approach that used standard methodological steps in instrument development and, most importantly, consensual methods with therapists and experts. The culmination of this work is a new observational instrument for the evaluation of gross motor performance, or quality of movement, in children with CP (Boyce 1991). As stated by Palisano et al., this classification system has application for clinical practice, research, teaching, and administration.

In this study, the Neonatal Brain Hemorrhage data may overstate the level of disagreement among raters, as compared to CP derived from a cohort of infants born at term. Low birthweight infants often have minor motor and/or tonal abnormalities, which can be confused with mild CP, increasing the possibility of misclassification by the raters.
On the other hand, because the incidence and prevalence of CP in this study population are high, the diagnosis of CP might have been easier for the raters, thereby increasing the agreement and kappa between raters. It is also important to note the purity of the data collection. As stated before, only two raters were used in this study for the original neurologic screening. The same person trained the original raters identically. Furthermore, these raters had a strict protocol to follow in rating these children, allowing for definitive and objective diagnoses of CP for this study population.

Misclassification of CP can produce many disadvantages for the child and family. Diagnosis and management may be delayed, or excessive anxiety induced. The results of this study suggest that the reliability of the diagnosis of CP is substantially increased by relying not just on a measure of impairment, that is, specific findings on neurologic examination, but also on evidence of deficits in motor function that interfere with a child’s life. Such deficits include problems with walking, climbing, jumping, and ascending stairs (all assessed on the Bayley Motor Exam), and are also reflected in the need for assistive devices and corrective surgery. The presence of these deficits indicates that the child has not only an impairment but also a disability. Utilizing a tool such as the gross motor performance measure may make diagnosing CP a more objective process, rather than a subjective one. It is important to point out that little work has been done in assessing observer variability in the diagnosis of CP, and we encourage clinicians and investigators, such as Gowland et al., to examine the reliability of their diagnoses and the role of background factors, including motor functioning, in determining the reliability of the diagnosis of cerebral palsy.
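The by-diagnosis kappas reported in Table 5 summarize agreement among several raters at once. As a rough illustration only, the following Python sketch computes a Fleiss-type multi-rater kappa for each diagnostic category together with a combined value. It assumes that every subject is rated by the same number of raters; the function name, the input layout, and the example counts are hypothetical, and the estimator actually used to produce Table 5 may differ.

```python
def fleiss_kappa_by_category(counts):
    """Fleiss-type kappa per category and overall.

    counts[i][j] is the number of raters who assigned subject i to
    category j. Every row must sum to the same number of raters (m >= 2).
    """
    n_subjects = len(counts)
    m = sum(counts[0])
    if m < 2 or any(sum(row) != m for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])

    # Overall proportion of all ratings that fall in each category.
    p = [sum(row[j] for row in counts) / (n_subjects * m)
         for j in range(n_categories)]
    if any(pj <= 0.0 or pj >= 1.0 for pj in p):
        raise ValueError("kappa is undefined for a category never or always used")

    kappas = []
    for j in range(n_categories):
        # Pairs of raters that disagree about category j, summed over subjects.
        disagreement = sum(row[j] * (m - row[j]) for row in counts)
        chance = n_subjects * m * (m - 1) * p[j] * (1.0 - p[j])
        kappas.append(1.0 - disagreement / chance)

    # Combined kappa: category kappas weighted by p_j * (1 - p_j).
    weights = [pj * (1.0 - pj) for pj in p]
    overall = sum(w * k for w, k in zip(weights, kappas)) / sum(weights)
    return kappas, overall


# Hypothetical counts for six children and five raters.
# Columns: no CP, non-disabling CP, disabling CP.
example = [
    [5, 0, 0],
    [4, 1, 0],
    [0, 1, 4],
    [0, 0, 5],
    [1, 3, 1],
    [3, 2, 0],
]
per_category, combined = fleiss_kappa_by_category(example)
print(per_category, combined)
```

Computing one kappa per category, in addition to the combined value, is what makes it possible to see that agreement can be high for one diagnosis and low for another within the same set of ratings.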
APPENDIX

Figure 2: Pairwise Comparisons of Kappas for Raters and the Original Diagnosis for the Three-Way Classification of CP

Figure 3: Pairwise Comparisons of Kappas for Raters and the Original Diagnosis for the Two-Way Classification of CP

NBH Study: Screening Neurological Examination Coding Sheet

[Figure 2: scatter plot of unweighted and weighted kappas for each pairwise comparison, three-way classification of CP]

[Figure 3: scatter plot of kappas for each pairwise comparison, Disabling CP vs. all other diagnoses and Any CP vs. all other diagnoses]
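As a supplementary illustration of the pairwise comparisons plotted in Figures 2 and 3, the following Python sketch computes an unweighted Cohen's kappa for two raters, collapses a three-way diagnosis into a two-way one, and labels the result with the Fleiss (1981) agreement bands (below 0.40 poor, 0.40 to 0.75 fair to good, above 0.75 excellent). The function names, the diagnosis codes, and the example ratings are hypothetical and are not study data; this is a minimal sketch, not the analysis code used in the thesis.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters scoring the same subjects."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: share of subjects given the same label by both raters.
    p_obs = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the raters' marginal proportions, summed over labels.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    p_exp = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_exp == 1.0:
        raise ValueError("kappa is undefined when both raters use a single label")
    return (p_obs - p_exp) / (1.0 - p_exp)


def collapse(ratings, target):
    """Turn a multi-way diagnosis into 'target vs. any other diagnosis'."""
    return [label == target for label in ratings]


def fleiss_band(kappa):
    """Label a kappa using the Fleiss (1981) agreement scale."""
    if kappa > 0.75:
        return "Excellent"
    if kappa >= 0.40:
        return "Fair to Good"
    return "Poor"


# Hypothetical three-way ratings: "none", "nondisabling", "disabling".
rater_x = ["disabling", "disabling", "nondisabling", "none", "none", "nondisabling"]
rater_y = ["disabling", "disabling", "none", "none", "none", "nondisabling"]

three_way = cohen_kappa(rater_x, rater_y)
disabling_vs_other = cohen_kappa(collapse(rater_x, "disabling"),
                                 collapse(rater_y, "disabling"))
print(three_way, fleiss_band(three_way))
print(disabling_vs_other, fleiss_band(disabling_vs_other))
```

Repeating the calculation for every pair among the original diagnosis and the five raters would yield a lower-triangular matrix with the same layout as Tables 3 and 4.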