This is to certify that the thesis entitled

TEXT MINING INVESTIGATION OF SCALE ASSESSMENT WITHIN CLINICAL TRIALS

presented by Allison Renee Mentele has been accepted towards fulfillment of the requirements for the M.S. degree in Epidemiology.

Major Professor's Signature

Date

MSU is an Affirmative Action/Equal Opportunity Institution

TEXT MINING INVESTIGATION OF SCALE ASSESSMENT WITHIN CLINICAL TRIALS

By

Allison Renee Mentele

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE

Department of Epidemiology

2006

ABSTRACT

TEXT MINING INVESTIGATION OF SCALE ASSESSMENT WITHIN CLINICAL TRIALS

By Allison Renee Mentele

Many diagnostic assessments use scales to quantify patients' status. Text mining was used to obtain information from outside the assessment capability of the scales. SAS Enterprise Miner was used to investigate the actual words written by physicians during a course of sequential clinical trials. The concepts within the texts were extracted and classified based on a value assignment within domains outlined by clinicians. The text classification was correlated to the scales at each visit to investigate the relationship. The text classification corresponded strongly to scale changes, especially within specific scales. The classification of a random subset of documents was given to two clinical experts and four non-experts. The experts and the majority of non-experts showed similar agreement with the program's concept and value assignments. Agreement between groups was lowest among the experts for concept assignment.
However, the experts had higher agreement than the non-experts in the value assignment. Since the experts showed lower agreement in concept classification than the majority of the four non-experts, this process can offer an objective insight into the assessment of patients' status. A manual investigation would be too time-consuming given the volume of documents analyzed. Implementing this text mining process allows the main ideas in the documents to be understood quickly. Furthermore, the process outlined within this paper allows for the extraction of less prominent ideas within the given documents.

Acknowledgements

I would like to take the time to thank the following people for helping me complete this work. First, I would like to acknowledge my advisor, Dr. John B. Kaneene, who made this project possible and who guided me through this with numerous suggestions and critiques. Thank you for always turning things back to me so quickly and having the time to answer my questions. Next, I would like to thank my other committee members, Dr. Joseph Gardiner and Dr. David Todem, for all the time, advice and knowledge they have given me throughout this project. Also, I would like to acknowledge that I was part of a team working on this project. Our team consisted of five technology experts, a senior researcher and a team manager. Thank you, team, for all your help. Finally, I would like to acknowledge the faculty and staff in the Epidemiology department for your patience and guidance through all of the obstacles faced when completing this thesis. There are many other family members and friends who supported me through this, without whom I would not have gotten this far. Thanks to all of you for your support and encouragement.

TABLE OF CONTENTS:

LIST OF TABLES ................................................................................ V
LIST OF FIGURES ..............................................................................
VI
ABBREVIATIONS ............................................................................. VII
BACKGROUND .................................................................................. 1
LITERATURE REVIEW ........................................................................... 2
HYPOTHESIS & OBJECTIVES ............................................................... 9
INTRODUCTION ................................................................................ 10
METHODS ........................................................................................ 13
DATA PROCESSING ........................................................................... 13
SAS TEXT MINING BAG OF WORDS METHODOLOGY .................................... 14
CONFIDENTIALITY OF DATA SOURCE ....................................................... 16
TEXT MINING PROCESS ...................................................................... 16
ITERATIVE CLUSTERING ..................................................................... 18
MEASURE OF CONSISTENCY OF CLASSIFICATION ......................................... 19
RESULTS ......................................................................................... 21
OVERALL RESULTS ........................................................................... 21
RELATING SCALES TO TEXT ................................................................. 23
MEASURE OF CLASSIFICATION RESULTS ................................................... 25
DISCUSSION .................................................................................... 29
RELATING SCALES TO TEXT ................................................................. 29
MEASURE OF CLASSIFICATION RESULTS .................................................... 29
STRENGTHS AND WEAKNESSES ........................................................ 32
CONCLUSIONS AND FUTURE RESEARCH ............................................ 33
TABLES AND FIGURES ..................................................................... 34
ENDNOTES ..................................................................................... 47
BIBLIOGRAPHY .............................................................................. 50

List of Tables

Table 1- Comparisons of sentence value classification ........................................ 35
Table 2- Comparisons of classification of sentences to Taxonomy Concepts ............... 36
Table 3- Comparisons of classification of sentences to Taxonomy and Value by the Non-Expert group with the ICM and among themselves ............................. 37
Table 4- Comparisons of classification of sentences to Domain and Value by the Expert Group (2 members) and ICM ......................................................... 39
Table 5- Total Scale Ratings vs. Total Value assignment 1 and Value assignment 2 Comments ............................................................................................ 40
Table 6- Summation Table of Clusters from Summary v. Baseline Scale Scores .......... 41
Table 7- Compilation of Visit Concept & Scores ............................................... 42

List of Figures

Figure 1- SAS Text Mining Process .............................................................. 43
Figure 2- Text Mining Process Overview with Integration of Taxonomy ................... 44
Figure 3- IC Technique Applied to this Study ................................................... 45
Figure 4- IC Technique Applied to this Study ................................................... 46

Abbreviations

ICM — Iterative Clustering Methodology
ICD-9-CM — International Classification of Diseases, Ninth Revision, Clinical Modification
DRG — Diagnosis Related Group
SAS EM — SAS Enterprise Miner
SVD — Singular Value Decomposition

Background

Typically, a research project will produce, as a by-product, large amounts of written notes and material on the subject of interest.
These notes can accumulate into a large dataset, and their investigation is tedious and difficult. Text mining is a process to investigate these unstructured texts (notes) without having to manually read each one. This process is relatively new and not yet widely used in the biomedical sciences. The purpose is to extract meaningful and interesting concepts from a dataset consisting of words. The software groups the documents together based on vocabulary frequencies, and the resulting groups are known as "clusters." A North Carolina based software company, SAS, is a prominent developer of text mining applications. SAS has developed Enterprise Miner™, a text mining software package that was used for this project. SAS defines text mining as "… a process that employs a set of algorithms for converting unstructured text into structured data objects and the quantitative methods used to analyze these data objects" (1). Text mining enables qualitative data to become quantitative with pre-processing manipulation. The overall idea is that the researcher can pull and group ideas from a source without any previous knowledge of the context. This is where search engines and text mining differ: people pull documents based on an idea of what is within their content, whereas text mining works without this knowledge. For instance, text mining can be employed to search published articles, where the information is known to an extent; however, the correlation between the information contained in different articles is unknown.

Literature Review

There are many possibilities for the application of text mining. "Text-mining methods include analyzing associations and trends between categories of entities, such as correlations between names of researchers and research topics, genes and gene products, drug and compound effects and disease indications, and so on." (2) One highly pertinent application of text mining within the biomedical sciences is searching large databases.
By using this application it is possible to investigate large sources of information quickly for specific concepts within a general topic. The documents of a search are pulled based on a search engine approach but are then grouped based on the sub-concepts involved in the documents. Of the literature examined, three articles engaged the application of searching large databases with text mining and the resulting implications for researchers. The first paper applied text mining in the development of new drugs. TAKMI, software developed by IBM for the purpose of clustering documents in large databases, was used (3). The researchers involved in its development created large lists of pertinent biological nouns. These lists were used to pull topics from the databases (4). Often in biomedical literature, biological entities have many different literary forms. The combination of capital letters and abbreviations causes confusion within the program, as they are identified as different nouns. By incorporating this list, these variations were streamlined. This product demonstrated its ability in a remarkable way. Researchers searched published papers for biological interactions in order to develop new sites for the treatment of diseases. This paper used a search of Medline to illustrate the methodology. During this research, Medline contained 330,000 biomedical abstracts. Fifty-four papers were found to contain concepts dealing with the gene AML1. Leukemia was the most frequent association within these papers, at seventeen papers out of the fifty-four (5). Since AML1 has an established association with leukemia from previous research, the methodology is shown to work. Other potential treatment sites were then investigated with a similar methodology. A general search for leukemia abstracts produced 1,051 papers. Within these groups, signaling proteins STYKc and Terc were commonly mentioned.
Phenotypes commonly associated with leukemia were HMG-CoA lyase deficiency, hepatic lipase deficiency, and Miller-Dieker syndrome (6). These phenotypes share SAM and HATPase_c within their expression. The TAKMI software plots the phenotypes (vertical) and the signaling proteins (horizontal). Areas that indicate potential new treatment sites, such as HATPase_c, are interactive, and selecting these areas in the graph pulls the documents where these two entities are mentioned (7). Through an investigation of these particular documents, a possible new drug interaction site, and therefore treatment, could be developed. By employing such a process, a researcher could save time by narrowing down the number of papers of interest. Any search engine could pull the one thousand papers on leukemia. By employing this software, however, a researcher could quickly reduce the number of papers of interest as well as focus on a particular aspect of leukemia. This paper created a new way to examine the current research quickly and efficiently for alternate avenues of treatment capabilities. Another use of text mining is in the monitoring of public domain data to maintain up-to-date surveillance of someone's or something's activities. This can be used to examine new research of particular people and competitors' public information. Text mining was applied in many different monitoring areas to obtain a clear idea of other people's advancements. Specifically, these researchers monitored other researchers and institutions for patents; found papers to investigate hypotheses on biological entities; and integrated structured data with unstructured data to obtain a more complete picture of positive and negative treatment effects (8).
IBM developed new software, BioTeks, especially intended for the biomedical community, in which Medline, medical records, and patents can be searched and analyzed through a system that identifies biological entities such as genes, proteins, and drugs through a compiled biomedical dictionary (9). The creators of this software also compiled a series of synonym lists that included domains such as patents, bioinformatics and medical information (10). This was then used to extract needed information from large databases. A significant business application of text mining is that it will allow organizations to monitor competitors' websites for changes and developments. Competing researchers offer a similar circumstance, in which researchers around the world can investigate the current research topics of many experts in a certain field. Researchers mined ten college websites from a potential undergraduate applicant's perspective based on information about the university: vision statement, current news, and current research occurring (11). The software package TextAnalyst was used. The resulting clusters were: facility/school, research/staff/student, global, program/resource/tech/society and industry. With a graphical display of the schools' internal (faculty/staff/resources) vs. external (research/student/industry) clusters, it was possible to compare these universities to one another based on a multitude of criteria (12). This would be useful to determine where, and by whom, similar projects in a particular field are being undertaken. The above papers incorporated the idea of pulling information from large databases. As mentioned above, text mining can incorporate structured data with unstructured data. If such data are available, they allow the researcher to gain more perspectives and the research to become more reliable. The following section of articles deals with the applications of text mining in healthcare.
Text mining was used to investigate doctors' prescription habits and the implications for their patients (13). A patient's prescriptions were combined to form a string, a sentence consisting of all prescriptions for one person. These strings were mined to indicate a proper diagnosis based on medication intake. The software was applied to pull strings that contained a treatment for diabetes (14). This process indicated a statistically significant difference between the diagnosis of diabetes (as defined by diagnosis within the medical record) and persons receiving treatment. Not all patients who were receiving treatment for diabetes had a diagnosis within his or her medical record (15). Thus, the unidentified diabetes patients were at risk for more complications, since their medical records failed to contain a proper diagnosis. Contained within the same dataset were implications in the prescription habits of antibacterial medications. Vancomycin use is strongly advised for only 'the treatment of serious infection with beta-lactam-resistant organisms, or for treatment of infection in patients with life-threatening allergy to beta-lactam antimicrobials' (16). The data were available to match Vancomycin prescriptions to the white cell counts of the patients. Since many of the people receiving Vancomycin prescriptions had white blood cell counts less than ten, the prescribing physicians were making improper choices in their drug distributions (17). This provided a clear indication that prescription abuse occurred. Yet another application of text mining helped to enhance hospital ranking techniques. Creating prediction models on hospitals assumes uniform entry of secondary International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes from healthcare workers. This assumption is impossible to fulfill, as a multitude of different people enter these codes throughout the different hospitals, and each can have diverse training in such activities (18).
Researchers looked at the ICD-9 codes to see how hospitals assess a patient's risk level (19). This is used for hospital ranking (prediction), which can lead to hospitals being under-ranked for under-reporting risks. The secondary ICD-9 codes for heart Diagnosis Related Group (DRG) were treated as text and mined. These numbers provide an indication of a patient's risk. Then different hospitals were compared for differences in their risk factor identification. There was a significant difference in the hospitals' reporting of these risk factors (p < 0.0001) (20). The fewer risk factors reported, the lower the quality rating the hospital receives, because its patients will have a higher mortality than predicted after being improperly described as low risk when they are high risk (21). These researchers concluded that there is a significant difference in risk factor reporting among hospitals (22). The low-ranking hospitals can then establish training programs for the risk-factor-reporting staff, so that the rankings more accurately reflect the hospitals' actual standing. Text mining also has relevance within the healthcare profession as a tool to assist in patient diagnosis. Psychiatric diagnosis is often an uncertain process, as many disorders share similar symptoms. A certain diagnosis in psychiatry is much harder than in many other medical arenas, as most of the diagnostic information comes from the patient (23). Researchers obtained data from patients' medical records on the particular symptoms they were experiencing, and on social and behavioral issues, including common habits such as smoking, sexual interactions, and family concerns (24). A model was built based on the four classes of psychiatric disorders from ICD-10 (organic, psychoactive substances, schizophrenia, and affective mood disturbances) (25). Two hundred medical records were classified by a psychiatrist into the above four categories.
These medical records were then clustered using text mining to obtain the most prominent ideas within each group's medical records. The resulting clusters became the training criteria for new patients' classification (26). The software was able to diagnose the resulting dataset with greater than eighty percent accuracy based on post-analysis expert diagnosis (27). This application would allow for quick preliminary diagnosis of patients. Any area where there is recorded data in literary form is a prospective site for text mining. For instance, disease surveillance through the monitoring of nurse call centers is possible through applying text mining. In this paper, data were used from the 1993 Milwaukee outbreak, which was the largest water-borne outbreak in the United States (28). At this time, however, text mining software was unavailable, and a manual process was employed. The symptoms reported to the call centers were investigated to see if diarrhea-like symptoms increased during an outbreak of cryptosporidiosis and whether these calls could have identified the increase in incidence earlier than other surveillance methods (29). A four-fold increase in the standard deviation of calls with symptoms of diarrhea was noticed April 2nd, 1993 (30). The media reported the outbreak April 6th, and the department of public health released a statement on April 7th (31). Thus, monitoring the call centers could have allowed the public health department to more rapidly address the issue. This method was faster in indicating an outbreak than other established methods, like physician/hospital reporting (32). Another investigation, using data from call centers in Milwaukee, Albuquerque, and Boston, sought to determine whether seasonal variation existed for flu-like symptoms. The call centers reflected a seasonal increase of flu-like symptoms for the winter months, in particular November and December, indicating a definite seasonal variation.
As it was written in 1993, this paper mentioned that the United States and Canada had two hundred forty call centers established. If data mining methods were employed in conjunction with the nurse call centers, a national disease surveillance program could be efficiently created and monitored. These papers indicate the plethora of applications of text mining in many genres. Many of these applications deal with investigations of large databases, as this is a lengthy and needed process throughout any research topic. Text mining is particularly applicable to the healthcare field, as the multitude of literature is often impossible to personally investigate. Further, with the continued expansion of information on the internet, it is necessary to develop tools that help to consolidate the amount of information pertinent to a researcher. In general, anything that contains unstructured data can be mined to investigate its contents.

Hypothesis & Objectives

Since unstructured text removes expectations of content, all drug reactions can be captured. A scale set assesses particular manifestations of a disease. Without specific questions to elicit precise responses, the content of unstructured text has the potential to be very diverse. By removing the response constraints, previously unavailable ideas can be investigated. The specific objectives of this project were to determine whether there were unexpected outcomes within the physicians' notes of a clinical trial, and to identify patients with language indicating a status change. Once these patients were identified, the scales were investigated for a corresponding change from the previous evaluation. This study is not limited to a specific epidemiological study design. By implementing this methodology, a researcher can gain further insight into most investigations.

Introduction

The disease diagnosis process consists of the integration of several aspects of information.
Besides pathogenic testing, the process includes using information obtained from the patient; and in diseases in which a pathogen has yet to be identified, or a physical cause is unknown, this becomes the only data available for diagnosis. Such data are qualitative: 'long accepted as a productive partner in public health and evaluation research, qualitative research methods have begun to proliferate in health services research, clinical studies, health technology assessment and community-based intervention research. Such methods are preferred in field settings where the scope of work has yet to be determined, the relevant questions have still to be precisely formulated, local understandings are in flux and institutional arrangements unsettled' (33). Qualitative information is gathered through interviews with the patient, and quantitative measures (such as disease-specific scales) are employed for status examination. This becomes an issue, as the physician must incorporate a conversion method to align a patient's description into a numerical assessment. Data are defined as a numerical representation of a physical reality, whereas text is natural language that can convey any meaning (34). Since text is inherently more diverse in its meaning, the potential for more diverse information is higher. The probability of information becoming lost is increased during the conversion process from verbal into numerical information. Furthermore, the perspective of the doctor becomes the most critical part of the diagnosis process. His/her personal opinion of the patient becomes integrated into the diagnosis, thus making it more of a subjective experience. Text mining can be applied in instances where the diagnostic data are gathered and processed through language placed in an electronic format, to increase objectivity in diagnosis.
A preliminary investigation of PubMed articles dealing with a well-known mental disorder obtained two hundred eighty-six articles in which 'treatment' was a top-ten most frequent word. The parameters of the search criteria were: the disease name, a nine hundred article limit, and, by default, the most recently published article would appear first. Text mining was used to cluster the nine hundred abstracts, which then allowed for the investigation of word frequencies. Within the set of the two hundred eighty-six articles recovered, eighty-four (29%) abstracts indicated a potential application of text mining. This potential existed because the study included evaluation scales, and the outcome was measured by the analysis of these scales. By incorporating text mining into the study design, these studies would not have to convert patients' status to numbers. Text mining offers a new potential for incorporation of the patient's own words for analysis of status. Instead of incorporating the use of several scale sets, it would be possible to employ text mining in an investigation to obtain a more thorough assessment of the patients' status. By using physicians' notes during a clinical trial, it would be possible to statistically group the documents based on the ideas present. Once these groups are established, they can be investigated for unexpected comments. These unexpected comments would be ideas and concepts that would not be captured by a particular scale or set of scales. A pharmaceutical company, for instance, could investigate several potential secondary applications of a drug simultaneously. Also, text mining of the patients' reactions could lead to early warning surveillance of potentially serious adverse events. Scales do not capture side effects of drugs, as they are unexpected results. If an unexpected benefit resulted, the company could begin the steps to get approval for a secondary use.
Aside from using this methodology to investigate unexpected results, it could also be used to test the association between the scales and the concepts obtained throughout a clinical trial. To do this, it would be possible to group concepts which indicate a change in status of a patient, and then correlate these changes to the scales. If the notes indicate a change in status, the scales data from that week would be investigated to see if a change was captured by the scales data. Through interaction with experts in a clinical setting, an established set of major concept domains could be created in connection to a disease. The domains outlined by the experts could be expected manifestations of the disease and potential ideas found in the clinicians' notes. Text mining can be used to extract sentences with these concept ideas and cluster them together. The clusters could then be labeled to correspond with the domain outline. When such labeling is completed, the experts can validate the findings by repeating the procedure based on their knowledge, without use of the software. If the classifications of a non-expert who had training in the different domains are then compared with the labeling from text mining, a measure of accuracy can be produced.

Methods

Data Processing

One physician's note from one session often contained many different ideas and observations at the time of the patient interaction. It is necessary that these ideas are separated, as the ideas need to be distinct for clustering. The notes from one session could contain many sentences. The software uses all words of a single document together; therefore, the definition of what should be considered a document becomes important. Each sentence became a separate document, itemized by date and sequence number within the document. A program was used which recognizes breaks in sentences, such as a period followed by a capital letter.
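A minimal sketch of such a sentence splitter follows. The regular expression and record fields here are hypothetical illustrations, not the proprietary program actually used in the study; real clinical notes would need additional handling (abbreviations such as "Dr." would be split incorrectly by this rule).

```python
import re

# A break is declared where terminal punctuation is followed by
# whitespace and a capital letter, as described in the text.
SENTENCE_BREAK = re.compile(r'(?<=[.!?])\s+(?=[A-Z])')

def split_note(note_text, visit_date):
    """Split one physician's note into per-sentence records,
    itemized by visit date and sequence number."""
    sentences = [s.strip() for s in SENTENCE_BREAK.split(note_text) if s.strip()]
    return [
        {"date": visit_date, "seq": i, "text": s}
        for i, s in enumerate(sentences, start=1)
    ]

records = split_note("Patient sleeps well. Appetite is reduced.", "2004-03-15")
```

Each resulting record corresponds to one "document" in the clustering step, keyed by date and sequence number as the text describes.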
Many sentences were unrelated to the clinical domains of interest and were thus labeled as irrelevant. The data were broken up into visit and summary sets. Visit data differed from summary data in that the physician reported current status after each visit, whereas the summary was reported at the last visit and therefore reflected an overall status of the patient relative to baseline. The complete dataset involved almost 3,500 patients. The summary dataset, before sentences and ideas were broken up, contained over 4,000 records. After the splitting, there were 9,931 records. The visit dataset had over 30,000 records before the split and 71,587 after. The program separated sentences with differing ideas based on transition words such as 'however,' 'but,' and 'although.' When a sentence contained one of these words, the two ideas would be separated into two separate documents. The evaluation scale set used in this analysis was scored by the physicians, who assigned a value between one and ten per scale based on their observations. The scale set used contained about two dozen specific measures. The individual scale number became less important, as analysis was conducted on the scale change between visits. Thus, a change measure was created.

SAS Text Mining Bag of Words Methodology

The next step in text mining is the process of clustering sentences together. This "bag of words" method begins with a text simplification process in which language is streamlined to recognize similar ideas (Figure 1). In this process, nouns are converted to singular, verbs to present tense, and time, place, and titles are classified in a standard format. Parts of speech are tagged as: noun, verb, preposition, adjective, or adverb (35). By analyzing each document, in this case each sentence, a simple frequency count of each word is tallied. A stop, start, and synonym list can be applied at various times to highlight or ignore certain ideas.
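As an illustration of how such lists act on a single document, the following is a minimal hypothetical sketch: synonyms are merged onto a canonical term, stop words are discarded, and the document becomes a term-frequency vector. The word lists here are invented for the example; SAS EM ships its own defaults, which users can extend.

```python
from collections import Counter

# Illustrative lists only; not the study's actual lists.
STOP_LIST = {"as", "is", "the", "and", "of"}
SYNONYMS = {"educate": "teach", "instruct": "teach", "train": "teach"}

def vectorize(document):
    """Convert one document (a sentence) to a term-frequency vector,
    merging synonyms and dropping stop words."""
    words = document.lower().replace(".", "").split()
    terms = [SYNONYMS.get(w, w) for w in words if w not in STOP_LIST]
    return Counter(terms)

vector = vectorize("Educate the patient and train the family.")
```

Here "educate" and "train" both collapse to "teach," so the vector records two occurrences of the same idea rather than two unrelated terms.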
A stop list is applied at the beginning of an investigation to discard common words. These words do not assist in clustering, as they occur within each document and do not aid in differentiation. Examples of stop words are: "as," "is," "the," "and," "of." An optional start list restricts the words to be used in a clustering process. Start lists allow the users to input the ideas or words that they are interested in investigating. SAS clusters the documents containing these words from the start list. When results are reviewed for completeness, adjustments to the start list and synonym list can be made. A synonym list is used to reduce the number of terms in the documents by eliminating redundancy of language. SAS has a default list with common words. For example: teach, educate, instruct, train, etc. can all convey similar meaning. These lists can be updated for specific domain investigations.

SAS converts each document to a vector with each word tallied for frequency. To begin this process, the synonym list is applied to merge the columns of synonyms. The user then applies the stop list or start list. The stop list eliminates all words on the stop list from the document; the start list eliminates all words except the words on the start list. The documents are compared for word frequencies within a matrix of documents and vocabulary. A singular value decomposition (SVD) is performed on the (documents × words) matrix. The SVD process looks for combinations of words that provide the greatest variation among the documents (this is similar to principal component analysis in statistics, which finds the most significant factors involved in explaining variation among observations). "Single-value decomposition allows the arrangement of the space to reflect the major associative patterns in the data, and ignore the smaller, less important influences.
As a result, terms that did not actually appear in a document may still end up close to the document, if that is consistent with the major patterns of association in the data” (36). The result of this analysis is a set of clusters formed from the similarity of the texts (37). The user can select the number of word combinations (derived concepts) to carry forward into the clustering process. There are a number of optional clustering techniques in SAS EM, all of which attempt to find document vectors that are similar. The basic process involves aligning the topics of the documents (through the synonym list and stop list) and applying weights to the documents to illustrate how well they actually align. A chi-square test is performed on each cluster to determine the similarity of the documents in the cluster. The output of the clustering process assigns each document an identifying label indicating the cluster to which it belongs, the series of most frequent words or phrases within the cluster, and a chi-square measure of the consistency of the documents in the cluster.

Confidentiality of Data Source

Since the data were obtained through a pharmaceutical company, the degree of sensitivity and confidentiality increases. The clinical trials were conducted by a prominent drug company that is legally bound by HIPAA; therefore, the data were obtained in an ethical manner. This work was done by a third-party research organization under a confidential contract to a major pharmaceutical company for the purpose of better understanding how text mining could be used to gain additional value from the unstructured data found in clinical trials. No mention of the drug, disease, company, or improvement or worsening derivatives will be used to describe this project. The purpose of this project is to highlight the numerous potential epidemiological applications.

Text Mining Process

Figure 2 illustrates the process used for text mining physicians’ notes.
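The vectorization-and-SVD pipeline described above can be sketched end to end with numpy. The word lists, documents, and choice of k here are invented for illustration; numpy's `svd` stands in for SAS EM's internals, which differ in detail.

```python
import numpy as np

# Illustrative lists; the study's actual stop and synonym lists are not shown.
STOP = {"as", "is", "the", "and", "of", "will"}
SYNONYMS = {"educate": "teach", "instruct": "teach", "train": "teach"}

def terms(doc):
    """Apply the synonym list, then the stop list, to one document."""
    mapped = [SYNONYMS.get(w, w) for w in doc.lower().split()]
    return [w for w in mapped if w not in STOP]

docs = [
    "the nurse will educate the patient",
    "the nurse will instruct and train the patient",
    "the patient feels better and sleeps well",
    "the patient sleeps well",
]

# Document-by-term frequency matrix (documents x words)
vocab = sorted({w for d in docs for w in terms(d)})
X = np.array([[terms(d).count(w) for w in vocab] for d in docs], dtype=float)

# Singular value decomposition: X = U diag(s) Vt; keep only the k strongest
# patterns (derived concepts), as the analyst chooses in SAS EM.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
docs_k = U[:, :k] * s[:k]  # each document as a k-dimensional concept vector

# After synonym merging, documents 0 and 1 use near-identical vocabulary, so
# their concept vectors sit much closer together than the sleep-related ones.
```

The reduced vectors `docs_k` are the inputs that a clustering step then groups by similarity.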
The process begins with the domain experts identifying the basic concepts important to the domain in a taxonomy determined appropriate for the drug being studied. Each concept has an associated value, or was assigned “neutral” (no change in patient status). An example would be: “Patient communicates well.” Although this statement does not necessarily describe a change, it could be inferred that the patient may have been less communicative in the past. In this dataset, the experts outlined five specific domains of interest and an “other” concept classification (38).

There were two classifications of documents for the investigation. The first was the set of documents from each visit a patient had with a physician (visit). The second consisted of the final visit of a patient at the end of the study (summary). This analysis technique was applied to each set of documents. The visits were sequentially ordered so that a timeline of visits could be constructed for each patient and later combined with scale data. This methodology used SAS synonym processing to combine similar ideas under one taxonomy element as synonyms. A preliminary understanding of the concepts within the documents is required to relate the potential ideas of the documents to the taxonomy. If the physician describes actions that the domain experts highlight within the domain concepts, these can be added to the synonym list under the specific domain and value. The start list contains value indicators of domain concepts; an example would be “feels better” and “interacts well” under whatever domain would contain these ideas. Text mining was then employed to cluster the documents based on these term frequencies, while incorporating the specified start/stop lists to obtain clusters. Once clusters were obtained through the text mining process, they were reviewed by the domain experts for consistency.
Ideas that did not fit into the three outlined domains were labeled as “uncategorized by domains.” Uncategorized results were investigated. The ideas found in those documents were expected ideas, but were not captured by the scales or domains. These were issues that a doctor would want to document but that are typical throughout any epidemiological study. An example of such an idea would be weight issues; this would not usually be part of a scale assessment but could be monitored by the physician. The results of the clustering were reported, and the largest percentage of the documents (sentences) appeared in “irrelevant” clusters. Quality tests were also performed by comparing the text mining classification results with the results from samples classified by domain experts and non-experts in the field.

Iterative Clustering

More specifically, the iterative clustering (IC) technique was used to pull concepts from the collection. This is done through a constant evaluation of clusters of concepts obtained through text mining (Figures 3 and 4). First, the documents are processed using mostly default settings to see which ideas are present at the highest frequency. The easiest clusters to find are “irrelevant,” since these are mostly symbols (such as ellipses, dates, and page numbers) and trivial statements such as scheduling issues. The resulting clusters can easily be labeled as such and put aside, since they offer little insight into patients’ status. The “interesting” clusters can be scanned for words that might warrant further investigation. In this project, domain experts highlighted ideas that they thought would be present in the text. Domain experts are leaders in the field of study; these people were currently working on the drug and disease under study. The domain experts outlined three domains in which they expected comments to be found. All sentences that had not yet been assigned to a concept were processed with a stop list.
The resulting clusters were reviewed against the taxonomy. One element of the taxonomy was chosen, and a start list was generated to represent that concept. The documents were re-clustered with the start list, and the synonym list was reviewed for completeness. The documents selected by the start list were extracted and clustered with the stop list. (This allows all the terms of the sentences to be used in the next step.) Finally, the appropriate resulting clusters were labeled with the corresponding taxonomy element. The cyclical nature of this technique was observed as a stop list was applied to the leftover documents. The clustered product was examined for further interesting ideas, which were then compiled in a start list. After the start list was applied to the documents, the resulting documents were extracted and the stop list applied to form clusters. The “interesting” clusters were then labeled with the appropriate concept. This process was repeated with the “interesting” documents until no further interesting clusters resulted. There are residual documents that fail to align to the taxonomy outlined by the domain experts; these residuals contain uncategorized results of potentially high value.

Measure of Consistency of Classification

The ICM (iterative clustering methodology) is a process involving clustering and review to classify parsed sentences written by doctors about patient status into one of five concepts (four domains and “other”) and a value assignment for each domain, or “neutral.” It is important to know the accuracy of such a classification system so that the cost of various methods can be compared to the knowledge obtained from the classification. However, the definition of accuracy is problematic. The results obtained from the ICM were compared against several groups of humans, who took much longer to obtain their results than the time invested in the ICM.
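As a rough sketch of the iterative cycle described above (choose a taxonomy element, pull out the documents its start list selects, label them, and repeat on the leftovers): here simple keyword matching stands in for SAS EM's clustering, and the taxonomy elements and start lists are invented, not the study's.

```python
# Invented taxonomy elements and start lists, for illustration only.
TAXONOMY_START_LISTS = {
    "domain_mood":   {"feels", "mood"},
    "domain_social": {"interacts", "communicates"},
}

def iterative_clustering(documents):
    labeled, remaining = {}, list(documents)
    for concept, start_list in TAXONOMY_START_LISTS.items():
        # 1) "start list" pass: pull out documents mentioning the concept's terms
        selected = [d for d in remaining if start_list & set(d.lower().split())]
        # 2) label the selected cluster with the taxonomy element
        for d in selected:
            labeled[d] = concept
        # 3) repeat on the leftover documents with the next concept
        remaining = [d for d in remaining if d not in labeled]
    # residuals fail to align to the taxonomy: "uncategorized by domains"
    return labeled, remaining

labeled, residual = iterative_clustering([
    "Patient feels better today",
    "Patient interacts well with family",
    "Called to reschedule appointment",
])
```

In the real procedure each pass is a clustering run reviewed by analysts, and the residual documents are exactly the "uncategorized by domains" set discussed above.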
The classification of physicians’ comments resulting from the ICM was tested against two groups of professionals: one group consisted of four non-subject-domain experts (non-experts), and the other of two domain experts in the field. A stratified subset of sentences was randomly selected from the set classified by the ICM. After a short training period, the two groups (non-experts and experts) were given the subset and asked to classify each sentence with a concept and a value assignment.

Results

Overall results

Overall, the visit dataset contained almost 2,000 records from over six hundred individuals with concepts carrying a value 1 assignment, and 1,000 records from over six hundred individuals indicating a value 2 assignment. These were analyzed per protocol, since some protocols would contain more of one value assignment. As in any large-scale clinical trial, the types of studies included were double blind, relapse prevention, and open label. Since patients could be enrolled in several types of studies over time and have concepts in different domains, there is some overlap among the categories. The double blind study visit dataset contained 11,826 comments relating to almost 1,500 patients: a total of two hundred fifty-seven value 1 assignment comments relating to over one hundred patients, one hundred thirty-nine value 2 assignment comments relating to over one hundred patients, and 11,427 other comments relating to almost 1,500 patients. The breakdown of value 1 assignments across the domains of documents in the “visit” double blind dataset was as follows: one (domain 1), twenty-one (domain 2), thirteen (domain 3), one hundred six (general), and one hundred seventeen (uncategorized by the domains). The value 2 assignments in the “visit” double blind dataset comprised ten (domain 1), eleven (domain 2), twenty-two (domain 3), fifty-eight (general), and thirty-eight (uncategorized by the domains) documents.
The double blind study “summary” dataset contained 1,554 documents relating to over six hundred fifty patients: a total of twenty-three value 1 assignment comments relating to about twenty patients, thirty-one value 2 assignment comments relating to about thirty patients, and 1,500 other comments relating to over six hundred fifty patients. The “summary” double blind value 1 assignment dataset contained one (domain 1), four (domain 2), one (domain 3), and seventeen (uncategorized by the domains) documents. The “summary” double blind value 2 assignment dataset contained seven (domain 1), ten (domain 2), nine (domain 3), one (general), and four (uncategorized by the domains).

The open label protocols were broken down similarly. In the “visit” dataset there were 56,140 comments for over 2,500 patients. Of those, 1,384 comments were value 1 assignments for over five hundred patients, six hundred fifty-five comments were value 2 assignments for over four hundred patients, and 54,099 comments for over 2,500 patients were considered neither value assignment. The breakdown of the value assignments within each domain of the “visit” dataset was: seven hundred twenty-one value 1 and two hundred seventy-one value 2 assignments (uncategorized by the domains), thirty-eight value 1 and forty-seven value 2 assignments (domain one), one hundred ninety-seven value 1 and sixty-six value 2 assignments (domain two), one hundred eighty-three value 1 and one hundred thirty-nine value 2 assignments (domain three), and two hundred forty-seven value 1 and fifty-eight value 2 assignment “other” comments. The summary dataset had 4,818 comments pertaining to over 1,500 patients in the open label protocol. These were divided into one hundred nine value 1 assignment comments for almost ninety patients, eighty-seven value 2 assignment comments for almost seventy patients, and 4,622 “other” comments for over 1,500 patients.
More specifically, there were fifty-six value 1 and nineteen value 2 assignment comments (uncategorized by domains), three value 1 and nineteen value 2 assignment comments (domain one), thirteen value 1 and seventeen value 2 assignment comments (domain two), eight value 1 and twenty-seven value 2 assignment comments (domain three), and twenty-nine value 1 and five value 2 assignment comments (other).

The relapse prevention protocols were categorized the same way. Within the “visit” dataset there were 3,126 comments for almost four hundred persons. These broke down into one hundred twenty-nine value 1 assignment comments for almost fifty patients, one hundred twenty-eight value 2 assignment comments for almost eighty patients, and 2,869 “other” comments for almost four hundred patients. The numbers per domain were too small to offer any particular insights. About eighty percent of the comments uncategorized by the domains were value 1 assignments, over eighty percent of domain one was value 2 assignments, seventy percent of domain two was value 1 assignments, over sixty percent of domain three was value 2 assignments, and just under sixty percent of the general comments were value 2 assignments. The summary dataset had 1,078 comments for about three hundred fifty patients: two value 1 assignment comments for two patients, seventeen value 2 assignment comments for sixteen patients, and 1,059 other comments for three hundred fifty-three patients. The three domains contained only value 2 assignment comments, and the “uncategorized by domain” category had seventy percent value 2 assignment comments.

Relating scales to text

The scales used in the clinical trial had questions that could capture any of the three value assessments: value 1, value 2, or neutral. Overall, for the “visit” dataset, 64.6% of the comments were value 1 assignments and 59.7% were value 2 assignments.
The sum exceeds one hundred percent because a patient can have both value 1 and value 2 assignment comments. An example of this was analyzed within the “summary” dataset: there were a total of one hundred seventy-four patients within the value 1 assignment cluster and a total of one hundred sixty-seven in the value 2 assignment cluster, with nineteen patients found in both clusters.

The change in patient status was calculated as the change from the previous visit. If the value was decreasing, it was considered a value 1 assignment; if increasing, a value 2 assignment. Each visit with decreasing scales was grouped according to whether the textual value was a value 1 assignment. The patient visits that had a value 1 assignment in both scale and concept were added; the sum over the four hundred thirty-two such patient visits was -6,312. The set of value 1 assignment scale patient visits summed over the value 2 assignment comments (two hundred seventy patient-visits) was -1,960. This was repeated for value 2 assignment scale patient-visits: the value 2 assignment patient-visits (two hundred fifty-two instances) with value 1 assignment comments summed to 2,008, and the value 2 assignment patient-visits (three hundred twelve instances) with value 2 assignment comments summed to 3,007. The overall summation of value 1 assignment comments against all scale changes was -4,304; the overall summation of value 2 assignment comments against all scale changes was 1,048 (Table 5). The same analysis was performed on the “summary” dataset with the following results (Table 5). Each person with a specific domain concept identified within their visits was pulled, and their overall scale changes were calculated and summed across the patient set. Since there were three outlined domains and an “uncategorized by domain” category, this was calculated eight times across the “visit” and “summary” datasets.
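The change-measure rule stated above (a decrease from the previous visit is a value 1 assignment, an increase a value 2 assignment) can be written out as a short sketch; the visit scores here are hypothetical.

```python
def change_values(scale_by_visit):
    """Compute the change from the previous visit for one scale and assign a
    value per the rule in the text: a decrease is a value 1 assignment, an
    increase a value 2 assignment, and no change is neutral."""
    changes = []
    for prev, curr in zip(scale_by_visit, scale_by_visit[1:]):
        delta = curr - prev
        if delta < 0:
            changes.append((delta, "value 1"))
        elif delta > 0:
            changes.append((delta, "value 2"))
        else:
            changes.append((delta, "neutral"))
    return changes

# Hypothetical patient scored 7, 5, 5, 8 on one scale across four visits
change_values([7, 5, 5, 8])
# -> [(-2, 'value 1'), (0, 'neutral'), (3, 'value 2')]
```

Summing the deltas within each (scale value, comment value) pairing is what produces the totals reported for Table 5.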
The domain value (value 1 or value 2 assignment) was identified and compared to the scale change (Tables 6 and 7). The strongest correlation was between value 1 assignment comments and a decrease in scales; the correlation to value 2 assignment scales was less strong. The instances where the comment value assignment agreed with the scales were expected. The disagreements became the “interesting” instances, and the number of such sentences was small enough for the researchers to investigate manually. Without the ICM it would have been extremely labor-intensive to manually read the text dataset.

Measure of Classification Results

The taxonomy of the concepts was not precisely outlined by the experts; this was evident in the overlap of ideas among the domains obtained from the experts. The value consideration of a doctor’s comment is likely to be the most significant feature in the classification interpretation. Thus, the most critical test of a classification system is whether the value assignment (value 1, value 2, or neutral) matches human interpretation. “Researchers in many fields have become increasingly aware of the observer (rater or interviewer) as an important source of measurement error. Consequently, reliability studies are conducted in experimental or survey situations to assess the level of observer variability in the measurement procedures to be used in data acquisition” (39). There are three levels of detail in the classification analysis:

1. Most detailed: both the concept domain and the value (value 1, value 2, or neutral) assignment had to be the same.
2. Concept level: the classifications yield the same concept domain designation, but the classifiers might disagree on the value.
3. Value judgment: the value assignment is the only consideration; assignment to different concepts is not tested in this set.
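The three levels above can be expressed as a small comparison function; the domain and value labels in the example are placeholders, not labels from the study.

```python
def agreement_level(a, b):
    """Compare two classifications, each a (concept, value) pair, at the
    three levels of detail described in the text."""
    concept_match = a[0] == b[0]
    value_match = a[1] == b[1]
    return {
        "most_detailed": concept_match and value_match,  # level 1
        "concept_level": concept_match,                  # level 2
        "value_judgment": value_match,                   # level 3
    }

# Same value, different concept: only the value-judgment level matches
r = agreement_level(("domain 2", "value 1"), ("domain 3", "value 1"))
```

Applying this function pairwise across raters is what yields the separate value, concept, and complete-agreement rates reported below.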
If the non-experts agreed on the classification, the classification analysis was labeled “SAME.” Since the non-experts were a group of four, a classification for which a majority labeled either a concept or a value was labeled “VOTE.” There were only two experts, so for them only “SAME” could be used. Thus there were nine subsets:

1. Non-expert “SAME” (concept, value, both)
2. Non-expert “VOTE” (concept, value, both)
3. Expert “SAME” (concept, value, both)

Each group was presented with a stratified and balanced random selection of sentences from the study. By design, twelve sentences had been selected from each specific concept domain and value (value 1 or value 2 assignment). Another ninety-six sentences were selected from the “other” category with no presumption of value. The results, as defined for each of the nine sets, were compared with the results of the ICM classification. The kappa statistic, κ, was used to measure the similarity between two classification results. “Kappa is intended to give the reader a quantitative measure of the magnitude of agreement between observers” (40). Kappa traditionally takes into account the by-chance agreement of two observers; the formula used in this analysis did not. The formula used was: κ = (number of sentences classified the same between the two sets) / (average number classified by each method).

Value Assignments

All four non-experts classified the same twenty-eight sentences as value 2 assignments, thirty-three sentences as value 1 assignments, and fifty-eight as neutral. The numbers increased when the “VOTE” methodology was employed: at least three out of four non-experts classified forty-nine sentences as value 2 assignments, forty-eight as value 1 assignments, and seventy-six as neutral. When the ICM results were compared to the non-experts, forty-five sentences were classified as value 2 assignments both by the “VOTE” and by the ICM.
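The simplified kappa defined above is a one-line calculation (it divides matching classifications by the average number classified, without the chance correction of Cohen's kappa); the sketch below reuses the 61-of-192 figure from the complete-agreement results.

```python
def simple_kappa(n_same, n_by_method_a, n_by_method_b):
    """The agreement measure used in the text: sentences classified the same
    by both methods, divided by the average number classified by each method.
    Unlike Cohen's kappa, it does not subtract expected chance agreement."""
    return n_same / ((n_by_method_a + n_by_method_b) / 2)

# e.g. all four non-experts matched the ICM on 61 of the 192 test sentences,
# and both methods classified all 192
k = simple_kappa(61, 192, 192)  # ~0.32
```

When both methods classify every sentence, the denominator is just the number of sentences, so the measure reduces to the raw proportion of matches.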
Both the “VOTE” and the ICM classified forty-four sentences as value 1 assignments. Likewise, seventy-five sentences were classified as neutral by both methods (Table 1).

Concept Domain Assignments

The four non-experts had an average agreement with the ICM of 40% on concepts (Table 2). At least three of the four non-experts and the ICM agreed on an average of sixty percent. The experts and the ICM matched domains with fifty percent agreement.

Complete Agreement

The rate of agreement decreased significantly when exact matches were examined (Tables 3 and 4). The non-experts agreed with the ICM on average twenty-three percent of the time. The majority agreed more than twice as often, with an average of fifty-nine percent agreement with the ICM. Both experts agreed with the ICM an average of thirty-five percent of the time. Two value domains received no agreement from any of the three analysis groups. Out of one hundred ninety-two records, all four non-experts agreed with the ICM classification in sixty-one instances, which resulted in κ = 0.32. The “VOTE” technique labeled the domains and values as the ICM had one hundred nine times, indicating κ = 0.58. Overall, the non-experts unanimously agreed amongst themselves sixty-six of one hundred ninety-two times (κ = 0.34). As shown in Table 4, the two domain experts agreed with each other at a forty-nine percent frequency and with the ICM at a forty-six percent rate. The total agreement and vote scoring of the non-experts resulted in a higher rate of matches than the experts.

Discussion

Relating scales to text

There were instances of value 1 assignments in the scales where the comments highlighted value 2 assignment aspects. This could occur when a specific value 1 change was present in the patient but the doctor commented on different value 2 aspects.

Measure of Classification

It is important to recall that the domains outlined by the experts contained significant overlap. The ICM dealt with the overlap by selecting a specific domain for each of the overlapping ideas.
These were conveyed to the non-experts through a small PowerPoint presentation that illustrated the overlapping ideas within each domain. The presentation was not delivered in person, and therefore no questions were answered prior to the assignment. The presentation was not given to the experts; this could be why the two experts agreed less often than the “VOTE” of the four non-experts. Both the four non-experts and the two experts were assessed on unanimous agreement within their own group; however, a four-person agreement is harder to obtain than a two-person agreement.

Value Assignment

It is interesting that the value assignment classifications of the two experts agree at about the same rate as the non-experts agree at the “VOTE” level, which is similar to the ICM results. In the value agreement, the majority of the non-experts and the experts agreed with the ICM above 90% (Table 1); the four non-experts agreed unanimously at roughly 80%. As mentioned earlier, the perception of whether a patient is improving or not is the most critical factor. The best treatment for the patient lies within this assignment, and not necessarily within whether the domain manifestation of the disease is identified “correctly.”

Concept Domain Assignment

The concept agreement of the non-experts and experts with the ICM indicated that the non-experts (“VOTE”) agreed with the ICM more often than the experts did. Although the experts agreed at a fifty percent level, the non-experts and the ICM agreed at a sixty percent level. This could be due to the experts’ differing personal backgrounds.

Perfect Agreement

Perfect agreement on both the value and the concept domain was lower than in the other analyses, which was to be expected. The non-experts still reported a majority agreement with the ICM above fifty percent. Since the number of possible choices for each concept was larger, this is a strong agreement: each assignment had fifteen different possible combinations when both agreements were investigated.
If this process were implemented on a large scale, it might be beneficial to employ non-experts to conduct the analysis, since their objectivity may be higher. Since the domains were not explained to the experts as they were outlined to the non-experts, the overlapping ideas present may have caused issues. Also, the expert agreement of fewer than fifty percent indicates that this methodology could help to decrease subjectivity in classification.

Establishing this technique in medical treatment and diagnosis would allow for innovative changes in the profession. First, this new technique offers a different means of reaching a conclusion regarding the status of a patient, one that uses the exact wording of the person seeking help. This could help to group many patients at the same time to obtain preliminary diagnoses. Another application could be to streamline the training of physicians to obtain clearer manifestation domains within a disease. Pharmaceutical companies could use the reactions of patients as a warning system for dangerous drugs, seek other uses for a particular drug, or observe a particular subset of people who may react more favorably to a certain drug. Usually, doctors’ comments are discouraged in clinical trial settings; if doctors were encouraged to write comments, more complete data could be compiled, allowing more insight for the company.

Strengths and Weaknesses

A significant strength of this study is the introduction of a computer into clinical trial assessment. This decreases the amount of subjectivity within the company’s measurement, and the process is easily repeatable with the given data. As illustrated earlier, the experts’ classification of the domain values averaged around fifty percent. This indicates a strong difference of opinion between even the most involved and knowledgeable physicians. If the experts in the area disagree at that level, then the intervention of computers could make the process more objective.
The major weakness of this paper is that the actual data cannot be released to the public. The numbers presented are the actual numbers and the domains are consistent between tables, but the words analyzed are not shown. Albeit a weakness, the topic presents a new way to utilize all data collected during a clinical trial, and the analysis performed validates this as a new method for pharmaceutical assessment. The purpose of this study was to highlight this technology for use in a clinical trial. Another weakness of this paper is that the documentation used came from the physician. If the patients’ actual words had been used, subjectivity may have decreased; this would only be true for disorders in which the patients are able to verbalize their thoughts and feelings. In many other cases, the physician would have to offer an interpretation of the patient’s status, as the patient may not be able to communicate. Furthermore, the physicians were discouraged from writing comments altogether. If encouraged, however, a wider variety of comments and a more complete picture of a patient’s status could result.

Conclusions & Future Applications

Since some disease diagnoses and assessments do not have a biological test, this process could provide a statistical means of aligning the diagnosis process. As indicated by the comparison of the experts, there is not always agreement; this could be significant if the different domains corresponded to different diagnoses. A patient would be worried if two physicians consulting over his or her condition had such a low level of agreement. Further, this process could be used to obtain more information during a clinical trial. The additional cost of incorporating this methodology into a clinical trial would predominantly be software implementation or contracting, as was done in this project. The collection of data could be expanded to family members, as well as the patient, in order to encompass the overall status of the patient more completely.
The domain experts involved in these studies often do not have the technical expertise to operate these fairly complex software packages, since they are usually focused on a particular research area (41). This process illustrates the nexus of technology and expert knowledge; it allows the software to be utilized to its full potential while simultaneously decreasing subjectivity. Furthermore, as the article on nurse call centers implied, a national nurse-call hotline, in conjunction with this technological process, could allow symptoms of diseases to be monitored and an outbreak discovered more quickly. With regard to the recent pandemic scares, the increased efficiency and ability to monitor national symptoms for an outbreak could be very beneficial indeed.

Tables and Figures

[Tables 1-7 and Figures 2-4 appear here in the original; the scanned pages are rotated and their contents could not be recovered.]
[Figure 4 (p. 46) was likewise printed rotated and did not survive OCR; the residue has been removed.]

Endnotes

1. SAS Enterprise Miner 5.1 documentation, 2004.
2. R. Mack et al., “Text Analytics for Life Science Using the Unstructured Information Management Architecture,” IBM Systems Journal 43, no. 3 (2004): 490-515.
3. N. Uramoto et al., “A Text-Mining System for Knowledge Discovery from Biomedical Documents,” IBM Systems Journal 43, no. 3 (2004): 516-533.
4. Uramoto, 519.
5. Uramoto, 521.
6. Uramoto, 522.
7. Uramoto, 522.
8. Mack, 491.
9. Mack, 491.
10. Mack, 497.
11. Elaine Leong, Michael Ewing, and Leyland Pitt, “Analyzing Competitors’ Online Persuasive Themes with Text Mining,” Marketing Intelligence and Planning 22, no. 2 (2004): 187-200.
12. Leong, 193.
13. Patricia Cerrito, “Solutions to the Investigation of Healthcare Outcomes in Relationship to Healthcare Practice,” SUGI 29 Conference Proceedings, Montreal, May 9-12, 2004.
14. Cerrito, “Solutions to the Investigation of Healthcare Outcomes in Relationship to Healthcare Practice.”
15. Cerrito, “Solutions to the Investigation of Healthcare Outcomes in Relationship to Healthcare Practice.”
16. Cerrito, “Solutions to the Investigation of Healthcare Outcomes in Relationship to Healthcare Practice.”
17. Cerrito, “Solutions to the Investigation of Healthcare Outcomes in Relationship to Healthcare Practice.”
18. Patricia Cerrito, “Inside Text Mining: Text Mining Provides a Powerful Diagnosis of Hospital Quality Rankings,” Health Management Technology, March 2004.
19. Cerrito, “Inside Text Mining: Text Mining Provides a Powerful Diagnosis of Hospital Quality Rankings.”
20. Cerrito,
“Inside Text Mining: Text Mining Provides a Powerful Diagnosis of Hospital Quality Rankings.”
21. Cerrito, “Inside Text Mining: Text Mining Provides a Powerful Diagnosis of Hospital Quality Rankings.”
22. Cerrito, “Inside Text Mining: Text Mining Provides a Powerful Diagnosis of Hospital Quality Rankings.”
23. Stanley Loh, Jose Oliveira, and Mauricio Grameiro, “Knowledge Discovery in Text for Constructing Decision Support Systems,” Applied Intelligence 18 (2003): 357-366.
24. Loh, 359.
25. Loh, 361.
26. Loh, 364.
27. Loh, 364.
28. Jane Rodman, Floyd Frost, and Walter Jakubowski, “Using Nurse Hot Line Calls for Disease Surveillance,” Emerging Infectious Diseases 4, no. 2 (1998): 329-332.
29. Rodman, 329.
30. Rodman, 330.
31. Rodman, 331.
32. Rodman, 331.
33. Michaela Amering, Peter Stastny, and Kim Hopper, “Psychiatric Advance Directives: Qualitative Study of Informed Deliberations by Mental Health Service Users,” The British Journal of Psychiatry 186 (2005): 247-252.
34. Paul Losiewicz, Douglas Oard, and Ronald Kostoff, “Textual Data Mining to Support Science and Technology Management,” Journal of Intelligent Information Systems 15 (2000): 99-119.
35. Losiewicz, 105.
36. Scott Deerwester et al., “Indexing by Latent Semantic Analysis,” Journal of the American Society for Information Science 41, no. 6 (1990): 391-407.
37. Leong, 198.
38. Losiewicz, 113.
39. Richard Landis and Gary Koch, “The Measurement of Observer Agreement for Categorical Data,” Biometrics 33 (1977): 159-174.
40. Anthony Viera and Joanne Garrett, “Understanding Interobserver Agreement: The Kappa Statistic,” Family Medicine 37, no. 5 (2005): 360-363.
41. Losiewicz, 115.

Bibliography

Amering, Michaela, Peter Stastny, and Kim Hopper. “Psychiatric Advance Directives: Qualitative Study of Informed Deliberations by Mental Health Service Users.” The British Journal of Psychiatry 186 (2005): 247-252.

Cerrito, Patricia. “Inside Text Mining: Text Mining Provides a Powerful Diagnosis of Hospital Quality Rankings.” Health Management Technology, March 2004.

Cerrito, Patricia.
“Solutions to the Investigation of Healthcare Outcomes in Relationship to Healthcare Practice.” SUGI 29 Conference Proceedings, Montreal, May 9-12, 2004.

Deerwester, Scott, et al. “Indexing by Latent Semantic Analysis.” Journal of the American Society for Information Science 41, no. 6 (1990): 391-407.

Landis, Richard, and Gary Koch. “The Measurement of Observer Agreement for Categorical Data.” Biometrics 33 (1977): 159-174.

Leong, Elaine, Michael Ewing, and Leyland Pitt. “Analyzing Competitors’ Online Persuasive Themes with Text Mining.” Marketing Intelligence and Planning 22, no. 2 (2004): 187-200.

Loh, Stanley, Jose Oliveira, and Mauricio Grameiro. “Knowledge Discovery in Text for Constructing Decision Support Systems.” Applied Intelligence 18 (2003): 357-366.

Losiewicz, Paul, Douglas Oard, and Ronald Kostoff. “Textual Data Mining to Support Science and Technology Management.” Journal of Intelligent Information Systems 15 (2000): 99-119.

Mack, R., et al. “Text Analytics for Life Science Using the Unstructured Information Management Architecture.” IBM Systems Journal 43, no. 3 (2004): 490-515.

Rodman, Jane, Floyd Frost, and Walter Jakubowski. “Using Nurse Hot Line Calls for Disease Surveillance.” Emerging Infectious Diseases 4, no. 2 (1998): 329-332.

SAS Enterprise Miner 5.1 documentation, 2004.

Uramoto, N., et al. “A Text-Mining System for Knowledge Discovery from Biomedical Documents.” IBM Systems Journal 43, no. 3 (2004): 516-533.

Viera, Anthony, and Joanne Garrett. “Understanding Interobserver Agreement: The Kappa Statistic.” Family Medicine 37, no. 5 (2005): 360-363.