This is to certify that the thesis entitled

TEXT MINING INVESTIGATION OF SCALE ASSESSMENT WITHIN CLINICAL TRIALS

presented by Allison Renee Mentele has been accepted towards fulfillment of the requirements for the M.S. degree in Epidemiology.

Major Professor's Signature

Date

MSU is an Affirmative Action/Equal Opportunity Institution

TEXT MINING INVESTIGATION OF SCALE ASSESSMENT WITHIN CLINICAL TRIALS

By

Allison Renee Mentele

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE

Department of Epidemiology

2006

ABSTRACT

TEXT MINING INVESTIGATION OF SCALE ASSESSMENT WITHIN CLINICAL TRIALS

By Allison Renee Mentele

Many diagnostic assessments use scales to quantify patients' status. Text mining was used to obtain information from outside the assessment capability of the scales. SAS Enterprise Miner was used to investigate the actual words written by physicians during a course of sequential clinical trials. The concepts within the texts were extracted and classified based on a value assignment within domains outlined by clinicians. The text classification was correlated to the scales at each visit to investigate the relationship. The text classification corresponded strongly to scale changes, especially within specific scales. The classification of a random subset of documents was given to two clinical experts and four non-experts. The experts and the majority of non-experts showed similar agreement with the program's concept and value assignments. Agreement between groups was lowest among the experts for concept assignment.
However, the experts had higher agreement than the non-experts in the value assignment. Since the experts showed lower agreement in concept classification than the majority of the four non-experts, this process can offer an objective insight into the assessment of patients' status. A manual investigation would be too time-consuming given the volume of documents analyzed. Implementing this text mining process allows the main ideas in the documents to be understood quickly. Furthermore, the process outlined within this paper allows for the extraction of less prominent ideas within the given documents.

Acknowledgements

I would like to take the time to thank the following people for helping me complete this work. First, I would like to acknowledge my advisor, Dr. John B. Kaneene, who made this project possible and who guided me through this with numerous suggestions and critiques. Thank you for always turning things back to me so quickly and having the time to answer my questions. Next, I would like to thank my other committee members, Dr. Joseph Gardiner and Dr. David Todem, for all the time, advice and knowledge they have given me throughout this project. Also, I would like to acknowledge that I was part of a team working on this project. Our team consisted of five technology experts, a senior researcher and a team manager. Thank you, team, for all your help. Finally, I would like to acknowledge the faculty and staff in the Epidemiology department for your patience and guidance through all of the obstacles faced when completing this thesis. There are many other family members and friends who supported me through this, without whom I would not have gotten this far. Thanks to all of you for your support and encouragement.

TABLE OF CONTENTS:

LIST OF TABLES ................................................................................ V
LIST OF FIGURES ..............................................................................
VI
ABBREVIATIONS ............................................................................. VII
BACKGROUND .................................................................................. 1
LITERATURE REVIEW ........................................................................... 2
HYPOTHESIS & OBJECTIVES ............................................................... 9
INTRODUCTION ................................................................................ 10
METHODS ........................................................................................ 13
DATA PROCESSING ........................................................................... 13
SAS TEXT MINING BAG OF WORDS METHODOLOGY .................................... 14
CONFIDENTIALITY OF DATA SOURCE ....................................................... 16
TEXT MINING PROCESS ...................................................................... 16
ITERATIVE CLUSTERING ..................................................................... 18
MEASURE OF CONSISTENCY OF CLASSIFICATION ......................................... 19
RESULTS ......................................................................................... 21
OVERALL RESULTS ........................................................................... 21
RELATING SCALES TO TEXT ................................................................. 23
MEASURE OF CLASSIFICATION RESULTS ................................................... 25
DISCUSSION .................................................................................... 29
RELATING SCALES TO TEXT ................................................................. 29
MEASURE OF CLASSIFICATION RESULTS .................................................... 29
STRENGTHS AND WEAKNESSES ........................................................ 32
CONCLUSIONS AND FUTURE RESEARCH ............................................ 33
TABLES AND FIGURES ..................................................................... 34
ENDNOTES ..................................................................................... 47
BIBLIOGRAPHY .............................................................................. 50

List of Tables

Table 1- Comparisons of sentence value classification ........................................ 35
Table 2- Comparisons of classification of sentences to Taxonomy Concepts ............... 36
Table 3- Comparisons of classification of sentences to Taxonomy and Value by the Non-Expert group with the ICM and among themselves ............................. 37
Table 4- Comparisons of classification of sentences to Domain and Value by the Expert Group (2 members) and ICM ......................................................... 39
Table 5- Total Scale Ratings vs. Total Value assignment 1 and Value assignment 2 Comments ............................................................................................ 40
Table 6- Summation Table of Clusters from Summary v. Baseline Scale Scores .......... 41
Table 7- Compilation of Visit Concept & Scores ............................................... 42

List of Figures

Figure 1- SAS Text Mining Process .............................................................. 43
Figure 2- Text Mining Process Overview with Integration of Taxonomy ................... 44
Figure 3- IC Technique Applied to this Study ................................................... 45
Figure 4- IC Technique Applied to this Study ................................................... 46

Abbreviations

ICM — Iterative Clustering Methodology
ICD-9-CM — International Classification of Diseases, Ninth Revision, Clinical Modification
DRG — Diagnosis Related Group
SAS EM — SAS Enterprise Miner
SVD — Singular Value Decomposition

Background

Typically, a research project will produce, as a by-product, large amounts of written notes and material on the subject of interest.
These notes can accumulate into a large dataset, and their investigation is tedious and difficult. Text mining is a process to investigate these unstructured texts (notes) without having to manually read each one. This process is relatively new and not yet widely used in the biomedical sciences. The purpose is to extract meaningful and interesting concepts from a dataset consisting of words. The software groups the documents together based on vocabulary frequencies, and the resulting groups are known as "clusters." A North Carolina based software company, SAS, is a prominent developer of text mining applications. SAS has developed Enterprise Miner™, a text mining software package that was used for this project. SAS defines text mining as "… a process that employs a set of algorithms for converting unstructured text into structured data objects and the quantitative methods used to analyze these data objects" (1). Text mining enables qualitative data to become quantitative with pre-processing manipulation. The overall idea is that the researcher can pull and group ideas from a source without any previous knowledge of the context. This is where search engines and text mining differ: people pull documents based on an idea of what is within their content, whereas text mining works without this knowledge. For instance, text mining can be employed to search published articles, where the information is known to an extent; however, the correlation between the information contained in different articles is unknown.

Literature Review

There are many possibilities for the application of text mining. "Text-mining methods include analyzing associations and trends between categories of entities, such as correlations between names of researchers and research topics, genes and gene products, drug and compound effects and disease indications, and so on." (2) One highly pertinent application of text mining within the biomedical sciences is searching large databases.
By using this application it is possible to investigate large sources of information quickly for specific concepts within a general topic. The documents of a search are pulled based on a search engine approach but are then grouped based on the sub-concepts involved in the documents. Of the literature examined, three articles engaged the application of searching large databases with text mining and the resulting implications for researchers. The first paper applied text mining in the development of new drugs. TAKMI, software developed by IBM for the purpose of clustering documents in large databases, was used (3). The researchers involved in its development created large lists of pertinent biological nouns. These lists were used to pull topics from the databases (4). Often in biomedical literature, biological entities have many different literary forms. The combination of capital letters and abbreviations causes confusion within the program, as they are identified as different nouns. By incorporating this list, these variations were streamlined. This product demonstrated its ability in a remarkable way. Researchers searched published papers for biological interactions in order to develop new sites for the treatment of diseases. This paper used a search of Medline to illustrate the methodology. During this research, Medline contained 330,000 biomedical abstracts. Fifty-four papers were found to contain concepts dealing with the gene AML1. Leukemia was the most frequent association within these papers, at seventeen papers out of the fifty-four (5). Since AML1 has an established association with leukemia from previous research, the methodology is shown to work. Other potential treatment sites were then investigated with a similar methodology. A general search for leukemia abstracts produced 1,051 papers. Within these groups, signaling proteins STYKc and Terc were commonly mentioned.
Phenotypes commonly associated with leukemia were HMG-CoA lyase deficiency, hepatic lipase deficiency, and Miller-Dieker syndrome (6). These phenotypes share SAM and HATPase_c within their expression. The TAKMI software plots the phenotypes (vertical) and the signaling proteins (horizontal). Areas that indicate potential new treatment sites, such as HATPase_c, are interactive, and selecting these areas in the graph pulls the documents where these two entities are mentioned (7). Through an investigation of these particular documents, a possible new drug interaction site, and therefore treatment, could be developed. By employing such a process, a researcher could save time by narrowing down the number of papers of interest. Any search engine could pull the one thousand papers on leukemia. By employing this software, however, a researcher could quickly reduce the number of papers of interest as well as focus on a particular aspect of leukemia. This paper created a new way to examine the current research quickly and efficiently for alternate avenues of treatment capabilities. Another use of text mining is in the monitoring of public domain data to maintain up-to-date surveillance of someone's or something's activities. This can be used to examine new research of particular people and competitors' public information. Text mining was applied in many different monitoring areas to obtain a clear idea of other people's advancements. Specifically, these researchers monitored other researchers and institutions for patents; found papers to investigate hypotheses on biological entities; and integrated structured data with unstructured data to obtain a more complete picture of positive and negative treatment effects (8).
IBM developed new software, BioTeks, especially intended for the biomedical community, in which Medline, medical records, and patents can be searched and analyzed through a system that identifies biological entities such as genes, proteins, and drugs through a compiled biomedical dictionary (9). The creators of this software also compiled a series of synonym lists that included domains such as patents, bioinformatics and medical information (10). This was then used to extract needed information from large databases. A significant business application of text mining is that it will allow organizations to monitor competitors' websites for changes and developments. Competing researchers offer a similar circumstance, in which researchers around the world can investigate the current research topics of many experts in a certain field. Researchers mined ten college websites from a potential undergraduate applicant's perspective based on information about the university: vision statement, current news, and current research occurring (11). The software package TextAnalyst was used. The resulting clusters were: facility/school, research/staff/student, global, program/resource/tech/society and industry. With a graphical display of the schools' internal (faculty/staff/resources) vs. external (research/student/industry) clusters, it was possible to compare these universities to one another based on a multitude of criteria (12). This would be useful to determine where, and by whom, similar projects in a particular field are being undertaken. The above papers incorporated the idea of pulling information from large databases. As mentioned above, text mining can incorporate structured data with unstructured data. If such data are available, they allow the researcher to gain more perspectives and the research to become more reliable. The following section of articles deals with the applications of text mining in healthcare.
Text mining was used to investigate doctors' prescription habits and the implications for their patients (13). A patient's prescriptions were combined to form a string, a sentence consisting of all prescriptions for one person. These strings were mined to indicate a proper diagnosis based on medication intake. The software was applied to pull strings that contained a treatment for diabetes (14). This process indicated a statistically significant difference between the diagnosis of diabetes (as defined by diagnosis within the medical record) and persons receiving treatment. Not all patients who were receiving treatment for diabetes had a diagnosis within his or her medical record (15). Thus, the unidentified diabetes patients were at risk for more complications, since their medical records failed to contain a proper diagnosis. Contained within the same dataset were implications in the prescription habits of antibacterial medications. Vancomycin use is strongly advised for only 'the treatment of serious infection with beta-lactam-resistant organisms, or for treatment of infection in patients with life-threatening allergy to beta-lactam antimicrobials' (16). The data were available to match Vancomycin prescriptions to the white cell counts of the patients. Since many of the people receiving Vancomycin prescriptions had white blood cell counts less than ten, the prescribing physicians were making improper choices in their drug distributions (17). This provided a clear indication that prescription abuse occurred. Yet another application of text mining helped to enhance hospital ranking techniques. Creating prediction models on hospitals assumes uniform entry of secondary International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes from healthcare workers. This assumption is impossible to fulfill, as a multitude of different people enter these codes throughout the different hospitals, and each can have diverse training in such activities (18).
Researchers looked at the ICD-9 codes to see how hospitals assess a patient's risk level (19). This is used for hospital ranking (prediction), which can lead to hospitals being under-ranked for under-reporting risks. The secondary ICD-9 codes for heart Diagnosis Related Group (DRG) were treated as text and mined. These numbers provide an indication of a patient's risk. Then different hospitals were compared for differences in their risk factor identification. There was a significant difference in the hospitals' reporting of these risk factors (p < 0.0001) (20). The fewer risk factors reported, the lower the quality rating the hospital receives, because its patients will have a higher mortality than predicted after being improperly described as low risk when they are high risk (21). These researchers concluded that there is a significant difference in risk factor reporting among hospitals (22). The low-ranking hospitals can then establish training programs for the risk-factor-reporting staff, so that the rankings more accurately reflect the hospitals' actual standing. Text mining also has relevance within the healthcare profession as a tool to assist in patient diagnosis. Psychiatric diagnosis is often an uncertain process, as many disorders share similar symptoms. A certain diagnosis in psychiatry is much harder than in many other medical arenas, as most of the diagnostic information comes from the patient (23). Researchers obtained data from patients' medical records on the particular symptoms they were experiencing, and on social and behavioral issues, including common habits such as smoking, sexual interactions, and family concerns (24). A model was built based on the four classes of psychiatric disorders from ICD-10 (organic, psychoactive substances, schizophrenia, and affective mood disturbances) (25). Two hundred medical records were classified by a psychiatrist into the above four categories.
These medical records were then clustered using text mining to obtain the most prominent ideas within each group's medical records. The resulting clusters became the training criteria for new patients' classification (26). The software was able to diagnose the resulting dataset with greater than eighty percent accuracy based on post-analysis expert diagnosis (27). This application would allow for quick preliminary diagnosis of patients. Any area where there is recorded data in literary form is a prospective site for text mining. For instance, disease surveillance through the monitoring of nurse call centers is possible through applying text mining. In this paper, data were used from the 1993 Milwaukee outbreak, which was the largest water-borne outbreak in the United States (28). At this time, however, text mining software was unavailable, and a manual process was employed. The symptoms reported to the call centers were investigated to see if diarrhea-like symptoms increased during an outbreak of cryptosporidiosis and whether these calls could have identified the increase in incidence earlier than other surveillance methods (29). A four-fold increase in the standard deviation of calls with symptoms of diarrhea was noticed April 2nd, 1993 (30). The media reported the outbreak April 6th, and the department of public health released a statement on April 7th (31). Thus, monitoring the call centers could have allowed the public health department to more rapidly address the issue. This method was faster in indicating an outbreak than other established methods, like physician/hospital reporting (32). Another investigation, using data from call centers in Milwaukee, Albuquerque, and Boston, sought to determine whether seasonal variation existed for flu-like symptoms. The call centers reflected a seasonal increase of flu-like symptoms for the winter months, in particular November and December, indicating a definite seasonal variation.
As it was written in 1993, this paper mentioned that the United States and Canada had two hundred forty call centers established. If data mining methods were employed in conjunction with the nurse call centers, a national disease surveillance program could be efficiently created and monitored. These papers indicate the plethora of applications of text mining in many genres. Many of these applications deal with investigations of large databases, as this is a lengthy and needed process throughout any research topic. Text mining is particularly applicable to the healthcare field, as the multitude of literature is often impossible to personally investigate. Further, with the continued expansion of information on the internet, it is necessary to develop tools that help to consolidate the amount of information pertinent to a researcher. In general, anything that contains unstructured data can be mined to investigate its contents.

Hypothesis & Objectives

Since unstructured text removes expectations of content, all drug reactions can be captured. A scale set assesses particular manifestations of a disease. Without specific questions to elicit precise responses, the content of unstructured text has the potential to be very diverse. By removing the response constraints, previously unavailable ideas can be investigated. The specific objectives of this project were to determine whether there were unexpected outcomes within the physicians' notes of a clinical trial, and to identify patients with language indicating a status change. Once these patients were identified, the scales were investigated for a corresponding change from the previous evaluation. This study is not limited to a specific epidemiological study design. By implementing this methodology, a researcher can gain further insight into most investigations.

Introduction

The disease diagnosis process consists of the integration of several aspects of information.
Besides pathogenic testing, the process includes using information obtained from the patient; and in diseases in which a pathogen has yet to be identified, or a physical cause is unknown, this becomes the only data available for diagnosis. Such data are qualitative: 'long accepted as a productive partner in public health and evaluation research, qualitative research methods have begun to proliferate in health services research, clinical studies, health technology assessment and community-based intervention research. Such methods are preferred in field settings where the scope of work has yet to be determined, the relevant questions have still to be precisely formulated, local understandings are in flux and institutional arrangements unsettled' (33). Qualitative information is gathered through interviews with the patient, and quantitative measures (such as disease-specific scales) are employed for status examination. This becomes an issue, as the physician must incorporate a conversion method to align a patient's description into a numerical assessment. Data are defined as a numerical representation of a physical reality, whereas text is natural language that can convey any meaning (34). Since text is inherently more diverse in its meaning, the potential for more diverse information is higher. The probability of information becoming lost is increased during the conversion process from verbal into numerical information. Furthermore, the perspective of the doctor becomes the most critical part of the diagnosis process. His/her personal opinion of the patient becomes integrated into the diagnosis, thus making it more of a subjective experience. Text mining can be applied in instances where the diagnostic data are gathered and processed through language placed in an electronic format, to increase objectivity in diagnosis.
A preliminary investigation of PubMed articles dealing with a well-known mental disorder obtained two hundred eighty-six articles in which 'treatment' was a top-ten most frequent word. The parameters of the search criteria were: the disease name, a nine hundred article limit, and, by default, the most recently published article would appear first. Text mining was used to cluster the nine hundred abstracts, which then allowed for the investigation of word frequencies. Within the set of the two hundred eighty-six articles recovered, eighty-four (29%) abstracts indicated a potential application of text mining. This potential existed because the study included evaluation scales, and the outcome was measured by the analysis of these scales. By incorporating text mining into the study design, these studies would not have to convert patients' status to numbers. Text mining offers a new potential for incorporation of the patient's own words for analysis of status. Instead of incorporating the use of several scale sets, it would be possible to employ text mining in an investigation to obtain a more thorough assessment of the patients' status. By using physicians' notes during a clinical trial, it would be possible to statistically group the documents based on the ideas present. Once these groups are established, they can be investigated for unexpected comments. These unexpected comments would be ideas and concepts that would not be captured by a particular scale or set of scales. A pharmaceutical company, for instance, could investigate several potential secondary applications of a drug simultaneously. Also, text mining of the patients' reactions could lead to early warning surveillance of potentially serious adverse events. Scales do not capture side effects of drugs, as they are unexpected results. If an unexpected benefit resulted, the company could begin the steps to get approval for a secondary use.
Aside from using this methodology to investigate unexpected results, it could also be used to test the association between the scales and the concepts obtained throughout a clinical trial. To do this, it would be possible to group concepts which indicate a change in status of a patient, and then correlate these changes to the scales. If the notes indicate a change in status, the scales data from that week would be investigated to see if a change was captured by the scales data. Through interaction with experts in a clinical setting, an established set of major concept domains could be created in connection to a disease. The domains outlined by the experts could be expected manifestations of the disease and potential ideas found in the clinicians' notes. Text mining can be used to extract sentences with these concept ideas and cluster them together. The clusters could then be labeled to correspond with the domain outline. When such labeling is completed, the experts can validate the findings by repeating the procedure based on their knowledge, without use of the software. If the classifications of a non-expert who had training in the different domains are then compared with the labeling from text mining, a measure of accuracy can be produced.

Methods

Data Processing

One physician's note from one session often contained many different ideas and observations at the time of the patient interaction. It is necessary that these ideas are separated, as the ideas need to be distinct for clustering. The notes from one session could contain many sentences. The software uses all words of a single document together; therefore, the definition of what should be considered a document becomes important. Each sentence became a separate document, itemized by date and sequence number within the document. A program was used which recognizes breaks in sentences, such as a period followed by a capital letter.
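A minimal sketch of such a sentence splitter follows. The regular expression and record fields here are hypothetical illustrations, not the proprietary program actually used in the study; real clinical notes would need additional handling (abbreviations such as "Dr." would be split incorrectly by this rule).

```python
import re

# A break is declared where terminal punctuation is followed by
# whitespace and a capital letter, as described in the text.
SENTENCE_BREAK = re.compile(r'(?<=[.!?])\s+(?=[A-Z])')

def split_note(note_text, visit_date):
    """Split one physician's note into per-sentence records,
    itemized by visit date and sequence number."""
    sentences = [s.strip() for s in SENTENCE_BREAK.split(note_text) if s.strip()]
    return [
        {"date": visit_date, "seq": i, "text": s}
        for i, s in enumerate(sentences, start=1)
    ]

records = split_note("Patient sleeps well. Appetite is reduced.", "2004-03-15")
```

Each resulting record corresponds to one "document" in the clustering step, keyed by date and sequence number as the text describes.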
Many sentences were unrelated to the clinical domains of interest and were thus labeled as irrelevant. The data were broken up into visit and summary sets. Visit data differed from summary data in that the physician reported current status after each visit, whereas the summary was reported at the last visit and therefore reflected an overall status of the patient relative to baseline. The complete dataset involved almost 3,500 patients. The summary dataset, before sentences and ideas were broken up, contained over 4,000 records. After the splitting, there were 9,931 records. The visit dataset had over 30,000 records before the split and 71,587 after. The program separated sentences with differing ideas based on transition words such as 'however,' 'but,' and 'although.' When a sentence contained one of these words, the two ideas would be separated into two separate documents. The evaluation scale set used in this analysis was scored by the physicians, who assigned a value between one and ten per scale based on their observations. The scale set used contained about two dozen specific measures. The individual scale number became less important, as analysis was conducted on the scale change between visits. Thus, a change measure was created.

SAS Text Mining Bag of Words Methodology

The next step in text mining is the process of clustering sentences together. This "bag of words" method begins with a text simplification process in which language is streamlined to recognize similar ideas (Figure 1). In this process, nouns are converted to singular, verbs to present tense, and time, place, and titles are classified in a standard format. Parts of speech are tagged as: noun, verb, preposition, adjective, or adverb (35). By analyzing each document, in this case each sentence, a simple frequency count of each word is tallied. A stop, start, and synonym list can be applied at various times to highlight or ignore certain ideas.
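As an illustration of how such lists act on a single document, the following is a minimal hypothetical sketch: synonyms are merged onto a canonical term, stop words are discarded, and the document becomes a term-frequency vector. The word lists here are invented for the example; SAS EM ships its own defaults, which users can extend.

```python
from collections import Counter

# Illustrative lists only; not the study's actual lists.
STOP_LIST = {"as", "is", "the", "and", "of"}
SYNONYMS = {"educate": "teach", "instruct": "teach", "train": "teach"}

def vectorize(document):
    """Convert one document (a sentence) to a term-frequency vector,
    merging synonyms and dropping stop words."""
    words = document.lower().replace(".", "").split()
    terms = [SYNONYMS.get(w, w) for w in words if w not in STOP_LIST]
    return Counter(terms)

vector = vectorize("Educate the patient and train the family.")
```

Here "educate" and "train" both collapse to "teach," so the vector records two occurrences of the same idea rather than two unrelated terms.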
A stop list is applied at the beginning of an investigation to discard common words. These words do not assist in clustering, as they occur within each document and do not aid in differentiation. Examples of stop words are: "as," "is," "the," "and," "of." An optional start list restricts the words to be used in a clustering process. Start lists allow the users to input the ideas or words that they are interested in investigating. SAS clusters the documents containing these words from the start list. When results are reviewed for completeness, adjustments to the start list and synonym list can be made. A synonym list is used to reduce the number of terms in the documents by eliminating redundancy of language. SAS has a default list with common words. For example: teach, educate, instruct, train, etc. can all convey similar meaning. These lists can be updated for specific domain investigations.

SAS converts each document to a vector with each word tallied for frequency. To begin this process, the synonym list is applied to merge the columns of synonyms. The user then applies the stop list or start list. The stop list eliminates all words on the stop list from the document; the start list eliminates all words except the words on the start list. The documents are compared for word frequencies within a matrix of documents and vocabulary. A singular value decomposition (SVD) is performed on the (documents × words) matrix. The SVD process looks for combinations of words that provide the greatest variation among the documents (this is similar to principal component analysis in statistics, which finds the most significant factors involved in explaining variation among observations). "Single-value decomposition allows the arrangement of the space to reflect the major associative patterns in the data, and ignore the smaller, less important influences.
As a result, terms that did not actually appear in a document may still end up close to the document, if that is consistent with the major patterns of association in the data” (36). The result of this analysis is a set of clusters formed from the similarity of the texts (37). The user can select the number of word combinations (derived concepts) to carry forward into the clustering process. There are a number of optional clustering techniques in SAS EM, all of which attempt to find document vectors that are similar. The basic process involves aligning the topics of the documents (through the synonym list and stop list) and applying weights to the documents to illustrate how well they actually align. A chi-square test is performed on each cluster to determine the similarity of the documents in the cluster. The output of the clustering process assigns each document an identifying label indicating the cluster to which it belongs, the series of most frequent words or phrases within the cluster, and a chi-square measure of the consistency of the documents in the cluster.

Confidentiality of Data Source

Since the data were obtained through a pharmaceutical company, the degree of sensitivity and confidentiality increases. The clinical trials were conducted by a prominent drug company that is legally bound by HIPAA; therefore, the data were obtained in an ethical manner. This work was done by a third-party research organization under a confidential contract to a major pharmaceutical company for the purpose of better understanding how text mining could be used to gain additional value from the unstructured data found in clinical trials. No mention of the drug, disease, company, or improvement or worsening derivatives will be used to describe this project. The purpose of this project is to highlight the numerous potential epidemiological applications.

Text Mining Process

Figure 2 illustrates the process used for text mining physicians’ notes.
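The vectorization-and-SVD pipeline described above can be sketched end to end with numpy. The word lists, documents, and choice of k here are invented for illustration; numpy's `svd` stands in for SAS EM's internals, which differ in detail.

```python
import numpy as np

# Illustrative lists; the study's actual stop and synonym lists are not shown.
STOP = {"as", "is", "the", "and", "of", "will"}
SYNONYMS = {"educate": "teach", "instruct": "teach", "train": "teach"}

def terms(doc):
    """Apply the synonym list, then the stop list, to one document."""
    mapped = [SYNONYMS.get(w, w) for w in doc.lower().split()]
    return [w for w in mapped if w not in STOP]

docs = [
    "the nurse will educate the patient",
    "the nurse will instruct and train the patient",
    "the patient feels better and sleeps well",
    "the patient sleeps well",
]

# Document-by-term frequency matrix (documents x words)
vocab = sorted({w for d in docs for w in terms(d)})
X = np.array([[terms(d).count(w) for w in vocab] for d in docs], dtype=float)

# Singular value decomposition: X = U diag(s) Vt; keep only the k strongest
# patterns (derived concepts), as the analyst chooses in SAS EM.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
docs_k = U[:, :k] * s[:k]  # each document as a k-dimensional concept vector

# After synonym merging, documents 0 and 1 use near-identical vocabulary, so
# their concept vectors sit much closer together than the sleep-related ones.
```

The reduced vectors `docs_k` are the inputs that a clustering step then groups by similarity.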
The process begins with the domain experts identifying the basic concepts important to the domain in a taxonomy determined appropriate for the drug being studied. Each concept has an associated value, or was assigned “neutral” (no change in patient status). An example would be: “Patient communicates well.” Although this statement does not necessarily describe a change, it could be inferred that the patient may have been less communicative in the past. In this dataset, the experts outlined five specific domains of interest and an “other” concept classification (38).

There were two classifications of documents for the investigation. The first was the set of documents from each visit a patient had with a physician (visit). The second consisted of the final visit of a patient at the end of the study (summary). This analysis technique was applied to each set of documents. The visits were sequentially ordered so that a timeline of visits could be constructed for each patient and later combined with scale data. This methodology used SAS synonym processing to combine similar ideas under one taxonomy element as synonyms. A preliminary understanding of the concepts within the documents is required to relate the potential ideas of the documents to the taxonomy. If the physician describes actions that the domain experts highlight within the domain concepts, these can be added to the synonym list under the specific domain and value. The start list contains value indicators of domain concepts; an example would be “feels better” and “interacts well” under whatever domain would contain these ideas. Text mining was then employed to cluster the documents based on these term frequencies, while incorporating the specified start/stop lists to obtain clusters. Once clusters were obtained through the text mining process, they were reviewed by the domain experts for consistency.
Ideas that did not fit into the three outlined domains were labeled as “uncategorized by domains.” Uncategorized results were investigated. The ideas found in those documents were expected ideas, but were not captured by the scales or domains. These were issues that a doctor would want to document but that are typical throughout any epidemiological study. An example of such an idea would be weight issues; this would not usually be part of a scale assessment but could be monitored by the physician. The results of the clustering were reported, and the largest percentage of the documents (sentences) appeared in “irrelevant” clusters. Quality tests were also performed by comparing the text mining classification results with the results from samples classified by domain experts and non-experts in the field.

Iterative Clustering

More specifically, the iterative clustering (IC) technique was used to pull concepts from the collection. This is done through a constant evaluation of clusters of concepts obtained through text mining (Figures 3 and 4). First, the documents are processed using mostly default settings to see which ideas are present at the highest frequency. The easiest clusters to find are “irrelevant,” since these are mostly symbols (such as ellipses, dates, and page numbers) and trivial statements such as scheduling issues. The resulting clusters can easily be labeled as such and put aside, since they offer little insight into patients’ status. The “interesting” clusters can be scanned for words that might warrant further investigation. In this project, domain experts highlighted ideas that they thought would be present in the text. Domain experts are leaders in the field of study; these people were currently working on the drug and disease under study. The domain experts outlined three domains in which they expected comments to be found. All sentences that had not yet been assigned to a concept were processed with a stop list.
The resulting clusters were reviewed against the taxonomy. One element of the taxonomy was chosen, and a start list was generated to represent that concept. The documents were re-clustered with the start list, and the synonym list was reviewed for completeness. The documents selected by the start list were extracted and clustered with the stop list. (This allows all the terms of the sentences to be used in the next step.) Finally, the appropriate resulting clusters were labeled with the corresponding taxonomy element. The cyclical nature of this technique was observed as a stop list was applied to the leftover documents. The clustered product was examined for further interesting ideas, which were then compiled in a start list. After the start list was applied to the documents, the resulting documents were extracted and the stop list applied to form clusters. The “interesting” clusters were then labeled with the appropriate concept. This process was repeated with the “interesting” documents until no further interesting clusters resulted. There are residual documents that fail to align to the taxonomy outlined by the domain experts; these residuals contain uncategorized results of potentially high value.

Measure of Consistency of Classification

The ICM (iterative clustering methodology) is a process involving clustering and review to classify parsed sentences written by doctors about patient status into one of five concepts (four domains and “other”) and a value assignment for each domain, or “neutral.” It is important to know the accuracy of such a classification system so that the cost of various methods can be compared to the knowledge obtained from the classification. However, the definition of accuracy is problematic. The results obtained from the ICM were compared against several groups of humans, who took much longer to obtain their results than the time invested in the ICM.
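As a rough sketch of the iterative cycle described above (choose a taxonomy element, pull out the documents its start list selects, label them, and repeat on the leftovers): here simple keyword matching stands in for SAS EM's clustering, and the taxonomy elements and start lists are invented, not the study's.

```python
# Invented taxonomy elements and start lists, for illustration only.
TAXONOMY_START_LISTS = {
    "domain_mood":   {"feels", "mood"},
    "domain_social": {"interacts", "communicates"},
}

def iterative_clustering(documents):
    labeled, remaining = {}, list(documents)
    for concept, start_list in TAXONOMY_START_LISTS.items():
        # 1) "start list" pass: pull out documents mentioning the concept's terms
        selected = [d for d in remaining if start_list & set(d.lower().split())]
        # 2) label the selected cluster with the taxonomy element
        for d in selected:
            labeled[d] = concept
        # 3) repeat on the leftover documents with the next concept
        remaining = [d for d in remaining if d not in labeled]
    # residuals fail to align to the taxonomy: "uncategorized by domains"
    return labeled, remaining

labeled, residual = iterative_clustering([
    "Patient feels better today",
    "Patient interacts well with family",
    "Called to reschedule appointment",
])
```

In the real procedure each pass is a clustering run reviewed by analysts, and the residual documents are exactly the "uncategorized by domains" set discussed above.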
The classification of physicians’ comments resulting from the ICM was tested against two groups of professionals: one group consisted of four non-subject-domain experts (non-experts), and the other of two domain experts in the field. A stratified subset of sentences was randomly selected from the set classified by the ICM. After a short training period, the two groups (non-experts and experts) were given the subset and asked to classify each sentence with a concept and a value assignment.

Results

Overall results

Overall, the visit dataset contained almost 2,000 records from over six hundred individuals with concepts carrying a value 1 assignment, and 1,000 records from over six hundred individuals indicating a value 2 assignment. These were analyzed per protocol, since some protocols would contain more of one value assignment. As in any large-scale clinical trial, the types of studies included were double blind, relapse prevention, and open label. Since patients could be enrolled in several types of studies over time and have concepts in different domains, there is some overlap among the categories. The double blind study visit dataset contained 11,826 comments relating to almost 1,500 patients: a total of two hundred fifty-seven value 1 assignment comments relating to over one hundred patients, one hundred thirty-nine value 2 assignment comments relating to over one hundred patients, and 11,427 other comments relating to almost 1,500 patients. The breakdown of value 1 assignments across the domains of documents in the “visit” double blind dataset was as follows: one (domain 1), twenty-one (domain 2), thirteen (domain 3), one hundred six (general), and one hundred seventeen (uncategorized by the domains). The value 2 assignments in the “visit” double blind dataset comprised ten (domain 1), eleven (domain 2), twenty-two (domain 3), fifty-eight (general), and thirty-eight (uncategorized by the domains) documents.
The double blind study “summary” dataset contained 1,554 documents relating to over six hundred fifty patients: a total of twenty-three value 1 assignment comments relating to about twenty patients, thirty-one value 2 assignment comments relating to about thirty patients, and 1,500 other comments relating to over six hundred fifty patients. The “summary” double blind value 1 assignment dataset contained one (domain 1), four (domain 2), one (domain 3), and seventeen (uncategorized by the domains) documents. The “summary” double blind value 2 assignment dataset contained seven (domain 1), ten (domain 2), nine (domain 3), one (general), and four (uncategorized by the domains).

The open label protocols were broken down similarly. In the “visit” dataset there were 56,140 comments for over 2,500 patients. Of those, 1,384 comments were value 1 assignments for over five hundred patients, six hundred fifty-five comments were value 2 assignments for over four hundred patients, and 54,099 comments for over 2,500 patients were considered neither value assignment. The breakdown of the value assignments within each domain of the “visit” dataset was: seven hundred twenty-one value 1 and two hundred seventy-one value 2 assignments (uncategorized by the domains), thirty-eight value 1 and forty-seven value 2 assignments (domain one), one hundred ninety-seven value 1 and sixty-six value 2 assignments (domain two), one hundred eighty-three value 1 and one hundred thirty-nine value 2 assignments (domain three), and two hundred forty-seven value 1 and fifty-eight value 2 assignment “other” comments. The summary dataset had 4,818 comments pertaining to over 1,500 patients in the open label protocol. These were divided into one hundred nine value 1 assignment comments for almost ninety patients, eighty-seven value 2 assignment comments for almost seventy patients, and 4,622 “other” comments for over 1,500 patients.
More specifically, there were fifty-six value 1 and nineteen value 2 assignment comments (uncategorized by domains), three value 1 and nineteen value 2 assignment comments (domain one), thirteen value 1 and seventeen value 2 assignment comments (domain two), eight value 1 and twenty-seven value 2 assignment comments (domain three), and twenty-nine value 1 and five value 2 assignment comments (other).

The relapse prevention protocols were categorized the same way. Within the “visit” dataset there were 3,126 comments for almost four hundred persons. These broke down into one hundred twenty-nine value 1 assignment comments for almost fifty patients, one hundred twenty-eight value 2 assignment comments for almost eighty patients, and 2,869 “other” comments for almost four hundred patients. The numbers per domain were too small to offer any particular insights. About eighty percent of the comments uncategorized by the domains were value 1 assignments, over eighty percent of domain one was value 2 assignments, seventy percent of domain two was value 1 assignments, over sixty percent of domain three was value 2 assignments, and just under sixty percent of the general comments were value 2 assignments. The summary dataset had 1,078 comments for about three hundred fifty patients: two value 1 assignment comments for two patients, seventeen value 2 assignment comments for sixteen patients, and 1,059 other comments for three hundred fifty-three patients. The three domains contained only value 2 assignment comments, and the “uncategorized by domain” category had seventy percent value 2 assignment comments.

Relating scales to text

The scales used in the clinical trial had questions that could capture any of the three value assessments: value 1, value 2, or neutral. Overall, for the “visit” dataset, 64.6% of the comments were value 1 assignments and 59.7% were value 2 assignments.
The sum exceeds one hundred percent because a patient can have both value 1 and value 2 assignment comments. An example of this was analyzed within the “summary” dataset: there were a total of one hundred seventy-four patients within the value 1 assignment cluster and a total of one hundred sixty-seven in the value 2 assignment cluster, with nineteen patients found in both clusters.

The change in patient status was calculated as the change from the previous visit. If the value was decreasing, it was considered a value 1 assignment; if increasing, a value 2 assignment. Each visit with decreasing scales was grouped according to whether the textual value was a value 1 assignment. The patient visits that had a value 1 assignment in both scale and concept were added; the sum over the four hundred thirty-two such patient visits was -6,312. The set of value 1 assignment scale patient visits summed over the value 2 assignment comments (two hundred seventy patient-visits) was -1,960. This was repeated for value 2 assignment scale patient-visits: the value 2 assignment patient-visits (two hundred fifty-two instances) with value 1 assignment comments summed to 2,008, and the value 2 assignment patient-visits (three hundred twelve instances) with value 2 assignment comments summed to 3,007. The overall summation of value 1 assignment comments against all scale changes was -4,304; the overall summation of value 2 assignment comments against all scale changes was 1,048 (Table 5). The same analysis was performed on the “summary” dataset with the following results (Table 5). Each person with a specific domain concept identified within their visits was pulled, and their overall scale changes were calculated and summed across the patient set. Since there were three outlined domains and an “uncategorized by domain” category, this was calculated eight times across the “visit” and “summary” datasets.
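The change-measure rule stated above (a decrease from the previous visit is a value 1 assignment, an increase a value 2 assignment) can be written out as a short sketch; the visit scores here are hypothetical.

```python
def change_values(scale_by_visit):
    """Compute the change from the previous visit for one scale and assign a
    value per the rule in the text: a decrease is a value 1 assignment, an
    increase a value 2 assignment, and no change is neutral."""
    changes = []
    for prev, curr in zip(scale_by_visit, scale_by_visit[1:]):
        delta = curr - prev
        if delta < 0:
            changes.append((delta, "value 1"))
        elif delta > 0:
            changes.append((delta, "value 2"))
        else:
            changes.append((delta, "neutral"))
    return changes

# Hypothetical patient scored 7, 5, 5, 8 on one scale across four visits
change_values([7, 5, 5, 8])
# -> [(-2, 'value 1'), (0, 'neutral'), (3, 'value 2')]
```

Summing the deltas within each (scale value, comment value) pairing is what produces the totals reported for Table 5.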
The domain value (value 1 or value 2 assignment) was identified and compared to the scale change (Tables 6 and 7). The strongest correlation was between value 1 assignment comments and a decrease in scales; the correlation to value 2 assignment scales was less strong. The instances where the comment value assignment agreed with the scales were expected. The disagreements became the “interesting” instances, and the number of such sentences was small enough for the researchers to investigate manually. Without the ICM it would have been extremely labor-intensive to manually read the text dataset.

Measure of Classification Results

The taxonomy of the concepts was not precisely outlined by the experts; this was evident in the overlap of ideas among the domains obtained from the experts. The value consideration of a doctor’s comment is likely to be the most significant feature in the classification interpretation. Thus, the most critical test of a classification system is whether the value assignment (value 1, value 2, or neutral) matches human interpretation. “Researchers in many fields have become increasingly aware of the observer (rater or interviewer) as an important source of measurement error. Consequently, reliability studies are conducted in experimental or survey situations to assess the level of observer variability in the measurement procedures to be used in data acquisition” (39). There are three levels of detail in the classification analysis:

1. Most detailed: both the concept domain and the value (value 1, value 2, or neutral) assignment had to be the same.
2. Concept level: the classifications yield the same concept domain designation, but the classifiers might disagree on the value.
3. Value judgment: the value assignment is the only consideration; assignment to different concepts is not tested in this set.
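The three levels above can be expressed as a small comparison function; the domain and value labels in the example are placeholders, not labels from the study.

```python
def agreement_level(a, b):
    """Compare two classifications, each a (concept, value) pair, at the
    three levels of detail described in the text."""
    concept_match = a[0] == b[0]
    value_match = a[1] == b[1]
    return {
        "most_detailed": concept_match and value_match,  # level 1
        "concept_level": concept_match,                  # level 2
        "value_judgment": value_match,                   # level 3
    }

# Same value, different concept: only the value-judgment level matches
r = agreement_level(("domain 2", "value 1"), ("domain 3", "value 1"))
```

Applying this function pairwise across raters is what yields the separate value, concept, and complete-agreement rates reported below.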
If the non-experts agreed on the classification, the classification analysis was labeled “SAME.” Since the non-experts were a group of four, a classification for which a majority labeled either a concept or a value was labeled “VOTE.” There were only two experts, so for them only “SAME” could be used. Thus there were nine subsets:

1. Non-expert “SAME” (concept, value, both)
2. Non-expert “VOTE” (concept, value, both)
3. Expert “SAME” (concept, value, both)

Each group was presented with a stratified and balanced random selection of sentences from the study. By design, twelve sentences had been selected from each specific concept domain and value (value 1 or value 2 assignment). Another ninety-six sentences were selected from the “other” category with no presumption of value. The results, as defined for each of the nine sets, were compared with the results of the ICM classification. The kappa statistic, κ, was used to measure the similarity between two classification results. “Kappa is intended to give the reader a quantitative measure of the magnitude of agreement between observers” (40). Kappa traditionally takes into account the by-chance agreement of two observers; the formula used in this analysis did not. The formula used was: κ = (number of sentences classified the same between the two sets) / (average number classified by each method).

Value Assignments

All four non-experts classified the same twenty-eight sentences as value 2 assignments, thirty-three sentences as value 1 assignments, and fifty-eight as neutral. The numbers increased when the “VOTE” methodology was employed: at least three out of four non-experts classified forty-nine sentences as value 2 assignments, forty-eight as value 1 assignments, and seventy-six as neutral. When the ICM results were compared to the non-experts, forty-five sentences were classified as value 2 assignments both by the “VOTE” and by the ICM.
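The simplified kappa defined above is a one-line calculation (it divides matching classifications by the average number classified, without the chance correction of Cohen's kappa); the sketch below reuses the 61-of-192 figure from the complete-agreement results.

```python
def simple_kappa(n_same, n_by_method_a, n_by_method_b):
    """The agreement measure used in the text: sentences classified the same
    by both methods, divided by the average number classified by each method.
    Unlike Cohen's kappa, it does not subtract expected chance agreement."""
    return n_same / ((n_by_method_a + n_by_method_b) / 2)

# e.g. all four non-experts matched the ICM on 61 of the 192 test sentences,
# and both methods classified all 192
k = simple_kappa(61, 192, 192)  # ~0.32
```

When both methods classify every sentence, the denominator is just the number of sentences, so the measure reduces to the raw proportion of matches.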
Both the “VOTE” and the ICM classified forty-four sentences as value 1 assignments. Likewise, seventy-five sentences were classified as neutral by both methods (Table 1).

Concept Domain Assignments

The four non-experts had an average agreement with the ICM of 40% on concepts (Table 2). At least three of the four non-experts and the ICM agreed on an average of sixty percent. The experts and the ICM matched domains with fifty percent agreement.

Complete Agreement

The rate of agreement decreased significantly when exact matches were examined (Tables 3 and 4). The non-experts agreed with the ICM on average twenty-three percent of the time. The majority agreed more than twice as often, with an average of fifty-nine percent agreement with the ICM. Both experts agreed with the ICM an average of thirty-five percent of the time. Two value domains received no agreement from any of the three analysis groups. Out of one hundred ninety-two records, all four non-experts agreed with the ICM classification in sixty-one instances, which resulted in κ = 0.32. The “VOTE” technique labeled the domains and values as the ICM had one hundred nine times, indicating κ = 0.58. Overall, the non-experts unanimously agreed amongst themselves sixty-six of one hundred ninety-two times (κ = 0.34). As shown in Table 4, the two domain experts agreed with each other at a forty-nine percent frequency and with the ICM at a forty-six percent rate. The total agreement and vote scoring of the non-experts resulted in a higher rate of matches than the experts.

Discussion

Relating scales to text

There were instances of value 1 assignments in the scales where the comments highlighted value 2 assignment aspects. This could occur when a specific value 1 change was present in the patient but the doctor commented on different value 2 aspects.

Measure of Classification

It is important to recall that the domains outlined by the experts contained significant overlap. The ICM dealt with the overlap by selecting a specific domain for each of the overlapping ideas.
These were conveyed to the non-experts through a small PowerPoint presentation that illustrated the overlapping ideas within each domain. The presentation was not delivered in person, and therefore no questions were answered prior to the assignment. The presentation was not given to the experts; this could be why the two experts agreed less often than the “VOTE” of the four non-experts. Both the four non-experts and the two experts were assessed on unanimous agreement within their own group; however, a four-person agreement is harder to obtain than a two-person agreement.

Value Assignment

It is interesting that the value assignment classifications of the two experts agree at about the same rate as the non-experts agree at the “VOTE” level, which is similar to the ICM results. In the value agreement, the majority of the non-experts and the experts agreed with the ICM above 90% (Table 1); the four non-experts agreed unanimously at roughly 80%. As mentioned earlier, the perception of whether a patient is improving or not is the most critical factor. The best treatment for the patient lies within this assignment, and not necessarily within whether the domain manifestation of the disease is identified “correctly.”

Concept Domain Assignment

The concept agreement of the non-experts and experts with the ICM indicated that the non-experts (“VOTE”) agreed with the ICM more often than the experts did. Although the experts agreed at a fifty percent level, the non-experts and the ICM agreed at a sixty percent level. This could be due to the experts’ differing personal backgrounds.

Perfect Agreement

Perfect agreement on both the value and the concept domain was lower than in the other analyses, which was to be expected. The non-experts still reported a majority agreement with the ICM above fifty percent. Since the number of possible choices for each concept was larger, this is a strong agreement: each assignment had fifteen different possible combinations when both agreements were investigated.
If this process were implemented on a large scale, it might be beneficial to employ non-experts to conduct the analysis, since their objectivity may be higher. Since the domains were not explained to the experts as they were outlined to the non-experts, the overlapping ideas present may have caused issues. Also, the expert agreement of fewer than fifty percent indicates that this methodology could help to decrease subjectivity in classification.

Establishing this technique in medical treatment and diagnosis would allow for innovative changes in the profession. First, this new technique offers a different means of reaching a conclusion regarding the status of a patient, one that uses the exact wording of the person seeking help. This could help to group many patients at the same time to obtain preliminary diagnoses. Another application could be to streamline the training of physicians to obtain clearer manifestation domains within a disease. Pharmaceutical companies could use the reactions of patients as a warning system for dangerous drugs, seek other uses for a particular drug, or observe a particular subset of people who may react more favorably to a certain drug. Usually, doctors’ comments are discouraged in clinical trial settings; if doctors were encouraged to write comments, more complete data could be compiled, allowing more insight for the company.

Strengths and Weaknesses

A significant strength of this study is the introduction of a computer into clinical trial assessment. This decreases the amount of subjectivity within the company’s measurement, and the process is easily repeatable with the given data. As illustrated earlier, the experts’ classification of the domain values averaged around fifty percent. This indicates a strong difference of opinion between even the most involved and knowledgeable physicians. If the experts in the area disagree at that level, then the intervention of computers could make the process more objective.
The major weakness of this paper is that the actual data cannot be released to the public. The numbers presented are the actual numbers and the domains are consistent between tables, but the words analyzed are not shown. Albeit a weakness, the topic presents a new way to utilize all data collected during a clinical trial, and the analysis performed validates this as a new method for pharmaceutical assessment. The purpose of this study was to highlight this technology for use in a clinical trial. Another weakness of this paper is that the documentation used came from the physician. If the patients’ actual words had been used, subjectivity may have decreased; this would only be true for disorders in which the patients are able to verbalize their thoughts and feelings. In many other cases, the physician would have to offer an interpretation of the patient’s status, as the patient may not be able to communicate. Furthermore, the physicians were discouraged from writing comments altogether. If encouraged, however, a wider variety of comments and a more complete picture of a patient’s status could result.

Conclusions & Future Applications

Since some disease diagnoses and assessments do not have a biological test, this process could provide a statistical means of aligning the diagnosis process. As indicated by the comparison of the experts, there is not always agreement; this could be significant if the different domains corresponded to different diagnoses. A patient would be worried if two physicians consulting over his or her condition had such a low level of agreement. Further, this process could be used to obtain more information during a clinical trial. The additional cost of incorporating this methodology into a clinical trial would predominantly be software implementation or contracting, as was done in this project. The collection of data could be expanded to family members, as well as the patient, in order to encompass the overall status of the patient more completely.
The domain experts involved in these studies often do not have the technical expertise to operate these fairly complex software packages, since they are usually focused on a particular research area (41). This process illustrates the nexus of technology and expert knowledge; it allows the software to be utilized to its full potential while simultaneously decreasing subjectivity. Furthermore, as the article on nurse call centers implied, a national nurse-call hotline, in conjunction with this technological process, could allow symptoms of diseases to be monitored and an outbreak discovered more quickly. With regard to the recent pandemic scares, the increased efficiency and ability to monitor national symptoms for an outbreak could be very beneficial indeed.

Tables and Figures

[Tables 1-7 and Figures 2-4 appear here in the original; the scanned pages are rotated and their contents could not be recovered.]
[Figure 4 (p. 46) was likewise printed rotated and did not survive OCR; the residue has been removed.]

Endnotes

1. SAS Enterprise Miner 5.1 documentation, 2004.
2. R. Mack et al., “Text Analytics for Life Science Using the Unstructured Information Management Architecture,” IBM Systems Journal 43, no. 3 (2004): 490-515.
3. N. Uramoto et al., “A Text-Mining System for Knowledge Discovery from Biomedical Documents,” IBM Systems Journal 43, no. 3 (2004): 516-533.
4. Uramoto, 519.
5. Uramoto, 521.
6. Uramoto, 522.
7. Uramoto, 522.
8. Mack, 491.
9. Mack, 491.
10. Mack, 497.
11. Elaine Leong, Michael Ewing, and Leyland Pitt, “Analyzing Competitors’ Online Persuasive Themes with Text Mining,” Marketing Intelligence and Planning 22, no. 2 (2004): 187-200.
12. Leong, 193.
13. Patricia Cerrito, “Solutions to the Investigation of Healthcare Outcomes in Relationship to Healthcare Practice,” SUGI 29 Conference Proceedings, Montreal, May 9-12, 2004.
14. Cerrito, “Solutions to the Investigation of Healthcare Outcomes in Relationship to Healthcare Practice.”
15. Cerrito, “Solutions to the Investigation of Healthcare Outcomes in Relationship to Healthcare Practice.”
16. Cerrito, “Solutions to the Investigation of Healthcare Outcomes in Relationship to Healthcare Practice.”
17. Cerrito, “Solutions to the Investigation of Healthcare Outcomes in Relationship to Healthcare Practice.”
18. Patricia Cerrito, “Inside Text Mining: Text Mining Provides a Powerful Diagnosis of Hospital Quality Rankings,” Health Management Technology, March 2004.
19. Cerrito, “Inside Text Mining: Text Mining Provides a Powerful Diagnosis of Hospital Quality Rankings.”
20. Cerrito,
“Inside Text Mining: Text Mining Provides a Powerful Diagnosis of Hospital Quality Rankings.”
21. Cerrito, “Inside Text Mining: Text Mining Provides a Powerful Diagnosis of Hospital Quality Rankings.”
22. Cerrito, “Inside Text Mining: Text Mining Provides a Powerful Diagnosis of Hospital Quality Rankings.”
23. Stanley Loh, Jose Oliveira, and Mauricio Grameiro, “Knowledge Discovery in Text for Constructing Decision Support Systems,” Applied Intelligence 18 (2003): 357-366.
24. Loh, 359.
25. Loh, 361.
26. Loh, 364.
27. Loh, 364.
28. Jane Rodman, Floyd Frost, and Walter Jakubowski, “Using Nurse Hot Line Calls for Disease Surveillance,” Emerging Infectious Diseases 4, no. 2 (1998): 329-332.
29. Rodman, 329.
30. Rodman, 330.
31. Rodman, 331.
32. Rodman, 331.
33. Michaela Amering, Peter Stastny, and Kim Hopper, “Psychiatric Advance Directives: Qualitative Study of Informed Deliberations by Mental Health Service Users,” The British Journal of Psychiatry 186 (2005): 247-252.
34. Paul Losiewicz, Douglas Oard, and Ronald Kostoff, “Textual Data Mining to Support Science and Technology Management,” Journal of Intelligent Information Systems 15 (2000): 99-119.
35. Losiewicz, 105.
36. Scott Deerwester et al., “Indexing by Latent Semantic Analysis,” Journal of the American Society for Information Science 41, no. 6 (1990): 391-407.
37. Leong, 198.
38. Losiewicz, 113.
39. Richard Landis and Gary Koch, “The Measurement of Observer Agreement for Categorical Data,” Biometrics 33 (1977): 159-174.
40. Anthony Viera and Joanne Garrett, “Understanding Interobserver Agreement: The Kappa Statistic,” Family Medicine 37, no. 5 (2005): 360-363.
41. Losiewicz, 115.

Bibliography

Amering, Michaela, Peter Stastny, and Kim Hopper. “Psychiatric Advance Directives: Qualitative Study of Informed Deliberations by Mental Health Service Users.” The British Journal of Psychiatry 186 (2005): 247-252.

Cerrito, Patricia. “Inside Text Mining: Text Mining Provides a Powerful Diagnosis of Hospital Quality Rankings.” Health Management Technology, March 2004.

Cerrito, Patricia.
“Solutions to the Investigation of Healthcare Outcomes in Relationship to Healthcare Practice.” SUGI 29 Conference Proceedings, Montreal, May 9-12, 2004.

Deerwester, Scott, et al. “Indexing by Latent Semantic Analysis.” Journal of the American Society for Information Science 41, no. 6 (1990): 391-407.

Landis, Richard, and Gary Koch. “The Measurement of Observer Agreement for Categorical Data.” Biometrics 33 (1977): 159-174.

Leong, Elaine, Michael Ewing, and Leyland Pitt. “Analyzing Competitors’ Online Persuasive Themes with Text Mining.” Marketing Intelligence and Planning 22, no. 2 (2004): 187-200.

Loh, Stanley, Jose Oliveira, and Mauricio Grameiro. “Knowledge Discovery in Text for Constructing Decision Support Systems.” Applied Intelligence 18 (2003): 357-366.

Losiewicz, Paul, Douglas Oard, and Ronald Kostoff. “Textual Data Mining to Support Science and Technology Management.” Journal of Intelligent Information Systems 15 (2000): 99-119.

Mack, R., et al. “Text Analytics for Life Science Using the Unstructured Information Management Architecture.” IBM Systems Journal 43, no. 3 (2004): 490-515.

Rodman, Jane, Floyd Frost, and Walter Jakubowski. “Using Nurse Hot Line Calls for Disease Surveillance.” Emerging Infectious Diseases 4, no. 2 (1998): 329-332.

SAS Enterprise Miner 5.1 documentation, 2004.

Uramoto, N., et al. “A Text-Mining System for Knowledge Discovery from Biomedical Documents.” IBM Systems Journal 43, no. 3 (2004): 516-533.

Viera, Anthony, and Joanne Garrett. “Understanding Interobserver Agreement: The Kappa Statistic.” Family Medicine 37, no. 5 (2005): 360-363.