AN EVALUATION OF THE INDEXING METHODS EMPLOYED IN A COMPUTERIZED INFORMATION SYSTEM USED IN THE AREA OF SPECIAL EDUCATION

Thesis for the Degree of Ph. D.
MICHIGAN STATE UNIVERSITY
Robert Don Moon, Jr.
1972

This is to certify that the thesis entitled "An Evaluation of the Indexing Methods Employed in a Computerized Information System Used in the Area of Special Education" presented by Robert Don Moon, Jr. has been accepted towards fulfillment of the requirements for the Ph. D. degree in Education. Date: February 2L, 1972

ABSTRACT

AN EVALUATION OF THE INDEXING METHODS EMPLOYED IN A COMPUTERIZED INFORMATION SYSTEM USED IN THE AREA OF SPECIAL EDUCATION

By Robert D. Moon, Jr.

In 1966 the Council for Exceptional Children, in cooperation with the Educational Resources Information Center, established the CEC-ERIC Information Center. The Center was funded by the U. S. Office of Education's Bureau of the Handicapped. This study evaluates the indexing method used at the Center, compares the method with two alternative methods, analyzes the indexing vocabulary, describes changes in indexing procedures, and evaluates those changes.

The data base for the indexing evaluation was 2100 abstracts contained in Volume I of Exceptional Child Education Abstracts (ECEA), a computerized journal produced from the Center's information files. Three indexing methods were compared based on the results of questions written by staff members acquainted with the Center's indexing procedures and by professional educators not familiar with the procedures. Each group wrote 105 logical search questions to retrieve target documents. All questions were used with each of the three indexing methods. The computer searches were made using the Basic Indexing and Retrieval System (BIRS).

Indexing Method 1 (the method normally used at the Center) extracted terms from titles of document surrogates and used ERIC descriptors assigned by indexers.
Indexing Method 2 used terms extracted from the titles and abstracts, and Indexing Method 3 used terms extracted from the titles and abstracts together with ERIC descriptors assigned by indexers.

Estimated average recall for the Center staff was .73 for Method 1, .36 for Method 2, and .81 for Method 3. Estimated average recall for professional educators was .54 for Method 1, .77 for Method 2, and .80 for Method 3. Average Microprecision for the Center staff was .83 for Method 1, .76 for Method 2, and .74 for Method 3. Average Microprecision for professional educators was .94 for Method 1, .93 for Method 2, and .89 for Method 3.

Six null hypotheses were tested at the .01 level to determine if there were significant differences between the search results of the CEC-ERIC staff and professional educators. When based on estimated average recall, these tests indicated that the Center staff had significantly better results for Method 1, professional educators had significantly better results for Method 2, and there was no significant difference for Method 3. As measured by Average Microprecision, the professional educators had significantly better results for all three methods. These data suggest that the need for carefully controlled indexing languages is minimized in the field of education when sophisticated computer searching algorithms are available.

The vocabulary of the ERIC descriptors used to index Volume I of ECEA was compared with the vocabulary of the titles of abstracts in Volume I and with an empirically based thesaurus developed by a retrospective search of five years' literature in special education. These comparisons implied that the vocabulary found in the ERIC descriptors had as much or more similarity to the vocabulary based on the five-year retrospective search as did the titles.

A subjective analysis of the indexing terms used in Volume I of ECEA was performed by the indexing staff.
This resulted in the establishment of a subset of the ERIC Thesaurus to be used in indexing future volumes of ECEA. The use of this reduced list was evaluated by processing 20 search questions on Volumes I and II of ECEA. In 19 of the searches precision was greater for documents retrieved from Volume II than from Volume I. In the one case where this was reversed the precision was almost identical and greater than .9.

AN EVALUATION OF THE INDEXING METHODS EMPLOYED IN A COMPUTERIZED INFORMATION SYSTEM USED IN THE AREA OF SPECIAL EDUCATION

By Robert Don Moon, Jr.

A THESIS
Submitted to Michigan State University
in partial fulfillment of the requirements for the degree of
DOCTOR OF PHILOSOPHY
College of Education
1972

© Copyright by ROBERT DON MOON, JR. 1972

ACKNOWLEDGMENTS

This study would not have been possible without the assistance and cooperation of many individuals. I wish to express my sincere appreciation to my doctoral committee not only for their assistance in this study, but also for their contribution to my graduate program--to Dr. Dale Alum for the insights he has given me into curriculum development; to Dr. Louise Sause for the rich understanding of children and their development which she has shared with me; to Dr. John Vinsonhaler for the opportunity provided me to work under his direction at the Information Systems Laboratory; and to Dr. William Walsh, chairman of my graduate committee, for his able guidance in helping me to plan a very meaningful doctoral program. Dr. Walsh has also been most helpful in making suggestions which have aided in clarifying the presentation of this study. His continued availability to answer questions and be of assistance despite his very busy schedule has been greatly appreciated.

The Basic Indexing and Retrieval System (BIRS) used in this study was developed under the direction of Dr. John Vinsonhaler. It was while working for Dr.
Vinsonhaler at the Information Systems Laboratory that the study was first conceived. His continued technical assistance has also been greatly appreciated.

The majority of activities related to this study have taken place at the Council for Exceptional Children. It is difficult to conceive of an organization that could have provided a better environment for these activities. Those involved in a special way at the Council for Exceptional Children have been Mr. William Geer, Executive Secretary; Dr. June Jordan, the first director of the CEC-ERIC Information Center; Dr. Don Erickson, present director of the CEC-ERIC Information Center; Carl Oldsen, the editor of ECEA; and his staff.

I am also indebted to Mr. John Hafterson, who has contributed to this study both while he was at the Information Systems Laboratory at Michigan State and presently as a staff member at CEC. It has been especially helpful to have an individual with his technical capabilities with whom I could interact while I have been working on the project at CEC.

I am especially indebted to my wife Louise for typing and editorial assistance and to my children Bobby and Cami for their endurance and willingness to forego some activities while the study was being completed.

TABLE OF CONTENTS

CHAPTER                                                              PAGE
I. INTRODUCTION . . . 1
    Problem . . . 5
    Need for the Study . . . 6
    Objectives of the Study . . . 7
    Questions Examined . . . 8
    Definitions and Acronyms . . . 9
    Scope and Overview of the Study . . . 14
    Overview of Procedures . . . 14
II. RELATED LITERATURE . . . 16
    Systems and Systems Analysis . . . 16
    What is a System? . . .
17
    What is System Analysis? . . . 20
    Information Retrieval Systems . . . 23
    Indexing Methods--Content Analysis, Specification, and Control . . . 31
    Indexing Languages and Retrieval Systems . . . 32
    Methods of Machine Indexing . . . 41
    Evaluation of Indexing Methods . . . 59
    Descriptive Statistics for Document Retrieval . . . 62
    Relevance Judgment . . . 70
II. (cont'd)
    Comparison of Indexing Schemes
    An Overview of BIRS--Basic Indexing and Retrieval System
    The Executive Program--EXEC
    Task Management Program--TASK
    Translation Program--TRANS
    Information File Maintenance Program--IFMP
    Printed Indexing Program--PIP
    Printed Listing Program--PLP
    Descriptive Analysis Program--DAP
    Description File Maintenance Program--DFMP
    Description File Search Program--DFSP
    Information File Retrieval Program--IFRP
    Summary
III. THE DEVELOPMENT OF CEC-ERIC INFORMATION CENTER AND ITS PRESENT OPERATING STATUS
    A History and Description of Central ERIC
    Objectives of ERIC
    The Growth of ERIC
    The Future of ERIC
    The Development of the CEC-ERIC Clearinghouse
    The Early Operation of the Center
    The Establishment of Data Processing Procedures
    The Decision to Publish a Computerized Journal
    PAGE 76 77 78 78 80 80 81 81 81 82 86 87 89 90 91 93 94 95 98 101 101
III. (cont'd)
    An Overview of the Operating Procedures Used by the CEC-ERIC Information Center
    Legend and Nomenclature Model Developed for the CEC-ERIC Information Center
    Overview of the Information Center's Major Activities
    Overview of Major Input and Output
    Overview of Evaluation and Processing Modifications
    Overview and Model of the Information Center's Operation
    The Publication of Exceptional Child Education Abstracts
    Selective Publication
    Descriptive Statistics About the Present Operating Status of the CEC-ERIC Information Center
    The Center's Holdings--Types of Documents and Their Subjective Content
    Acquisition and Processing Rates
    Information Request Processing Statistics
    Processing Costs
    Summary
IV. PROCEDURES USED IN THE EVALUATION AND ANALYSIS OF THE INFORMATION CENTER INDEXING METHODS
    The Evaluation of the Indexing Procedures Used in Volume I of ECEA
    Questions Examined
    PAGE 104 104 107 107 110 112 112 115 119 121 122 124 125 128 130 131 132
IV. (cont'd)
    Selection of Target Documents
    Preparation of Questions to Retrieve Target Documents
    Relevance Judgments
    Measurement Techniques Employed in the Indexing Evaluation
    The Content Analysis of the Vocabulary Used in Indexing Volume I of ECEA
    Compilation of Indexing Terms Assigned to Volume I of ECEA
    Subjective Analysis by Indexers of Terms Used in Volume I of ECEA
    A Comparison of the Word Vocabulary Used in the Indexing Terms of Volume I of ECEA with Words Extracted from the Literature
    Analysis of the Vocabulary Used in Writing Questions to Retrieve Target Documents
    Analysis of Changes in Indexing Procedures Between Volume I and Volume II of ECEA
    Summary
V. RESULTS OF THE EVALUATION AND ANALYSIS OF THE INFORMATION CENTER INDEXING METHODS
    A Comparative Evaluation of Three Indexing Methods
    Questions Examined
    Indexing Methods Compared
    Results of the Comparison of Indexing Methods
    PAGE 134 134 136 136 137 139 140 140 143 143 145 146 147 148 149 149
V. (cont'd)
    Factors Important to the Analysis of Data Resulting from the Comparison of Indexing Methods
    Analysis of the Question Vocabulary
    Analysis of the Indexing Vocabulary Used in Volume I of ECEA
    Notation
    Results of Vocabulary Comparisons of Three Word Lists
    A Subjective Analysis of Terms Selected From the ERIC Thesaurus to Index Volume I of ECEA
    Results of Subjective Evaluation of the ERIC Descriptors Used in Volume I of ECEA
    The Effect of Indexing Procedure Changes in Volume II of ECEA
    Results of Indexing Procedure Changes
    Summary
VI. SUMMARY, CONCLUSIONS, RECOMMENDATIONS, AND IMPLICATIONS
    Summary
    Procedures Used at the CEC-ERIC Center
    A Comparison of Three Indexing Methods
    Comparison of Vocabulary of Three Word Lists
    Changes in Indexing Procedures
    Conclusions
    Results of Testing Null Hypothesis 1
    Results of Testing Null Hypothesis 2
    Results of Testing Null Hypothesis 3
    PAGE 164 167 168
VI. (cont'd)
    Results of Testing Null Hypotheses 4, 5, and 6
    Results of Testing Null Hypothesis 7
    Results of Testing Null Hypothesis 8
    Results of Testing Null Hypothesis 9
    Results of Testing Null Hypothesis 10
    Interpretation of the Results of the Comparison of Three Indexing Methods
    Reflections on Methodology Used in Comparing Indexing Methods
    Interpretation of the Vocabulary Comparisons
    Interpretation of the Effect of Changing Indexing Procedures
    Recommendations
    Data Related to Recommendation 1
    Recommendation 1
    Data Related to Recommendation 2
    Recommendation 2
    Observations Related to Recommendation 3
    Recommendation 3
    Observations Related to Recommendation 4
    Recommendation 4
    Observations Related to Recommendation 5
    Recommendation 5
    PAGE 185 185 186 186 187 188 190 192 194 195 195 196 196 197 197 197 198 198 199 199
VI. (cont'd)
    Implications . . . 200
    The Use of Controlled Indexing Vocabularies . . . 200
    An Evolving Thesaurus . . . 202
    Selective Publication From Information Files . . . 204
BIBLIOGRAPHY . . . 207
APPENDIX A . . .
214

LIST OF TABLES

TABLE                                                                PAGE
3.1  An Analysis of Information Requests Processed by the CEC-ERIC Information Center During the First Quarter, 1971 . . . 126
5.1  Descriptive Statistics Resulting From the Evaluation of Three Indexing Methods . . . 150
5.2  Data and Calculations Used in Testing Null Hypothesis 1 . . . 155
5.3  Data and Calculations Used in Testing Null Hypothesis 2 . . . 156
5.4  Data and Calculations Used in Testing Null Hypothesis 3 . . . 157
5.5  Data and Calculations Used in Testing Null Hypothesis 4 . . . 159
5.6  Data and Calculations Used in Testing Null Hypothesis 5 . . . 160
5.7  Data and Calculations Used in Testing Null Hypothesis 6 . . . 161
5.8  Data and Calculations Used in Testing Null Hypothesis 7 . . . 165
5.9  Data and Calculations Used in Testing Null Hypothesis 8 . . . 166
5.10  Data and Calculations Used in Testing Null Hypothesis 9 . . . 170
5.11  Results of Indexers' Subjective Analysis of Terms Used to Index Volume I of ECEA . . . 173
5.12  Data and Calculations Used in Testing Null Hypothesis 10 . . . 178
5.13  Search Results of Twenty Questions Used on Volume I and Volume II of ECEA . . . 179

LIST OF FIGURES

FIGURE                                                               PAGE
2.1  Input, Processing, and Output . . . 25
2.2  Input, Processing, and Output with Feedback . . . 25
2.3  A Portion of the ERIC Thesaurus . . . 38
2.4  Illustration of BIRS Word Extraction Techniques . . . 49
2.5  Examples of Permuted or Key-Word in Context (KWIC) Indexes . . . 53
2.6  The Partitioning of a Document Collection by a Search Question . . . 64
2.7  A Comparison of Various Types of Recall and Precision Averages . . . 67
2.8  An Overview of the Basic Indexing and Retrieval System (BIRS) . . . 79
2.9  Examples of Search Questions . . . 84
3.1  Flowcharting Symbols . . . 105
3.2  Overview of Information Center Major Activities . . . 108
3.3  Overview of Major Input and Output . . . 111
3.4  An Overview of the Information Center's Evaluation and Systems Modification Components . . . 113
3.5  An Overview and Model of the Information Center's Operations . . . 114
3.6  Sample ECEA Abstract . . . 116
3.7  Samples of ECEA Author and Subject Indexes . . .
117
3.8  Subject Content Description of Information Center Holdings Based on 5715 Acquisitions in Volumes I & II of ECEA . . . 123
4.1  A Description of Data and Descriptive Statistics Used in Comparing Various Indexing Methods . . . 138
5.1  Number of Target Documents Retrieved by Three Indexing Methods . . . 151
5.2  Average Microprecision for Three Indexing Methods . . . 152
5.3  Number of Relevant Documents Retrieved by Each Indexing Method . . . 153
1A  Flowcharting Symbols . . . 215
2A  Overview of Information Center Major Activities . . . 218
3A  Overview of Major Input and Output . . . 221
4A  An Overview of the Information Center's Evaluation and Systems Modification Components . . . 222
5A  An Overview and Model of the Information Center's Operations . . . 224
6A  Acquisition Control and Document Management . . . 226
7A  File Maintenance . . . 230
8A  File Processing for Exceptional Child Education Abstracts . . . 235
9A  A Computer Search - Predefined Process 9 . . . 239
10A  Information Request Processing . . . 241
11A  Procedures for Processing Selective Publications . . . 244

CHAPTER I: INTRODUCTION

There is a valid concern over the scientist's ability to keep abreast of the rapid growth of knowledge in his discipline. The realization of this problem is not new. In 1936 historian H. G. Wells presented a paper entitled "World Encyclopaedia" which stated his concern about the ineffective use and lack of coordination of knowledge.
In this paper he suggests, "Possibly all the knowledge and all the corrective ideas needed to establish a wise and stable settlement of the world's affairs in 1919 existed in bits and fragments, . . . ." He continues in descriptive terms to describe the human species as a "man of the highest order of brain, who through some lesions or defects or insufficiencies of his lower centres, suffers from the wildest uncoordinations; . . . ." Finally in his presentation Wells suggests a world encyclopaedia as a means to "solve the problem of that jigsaw puzzle and bring all the scattered and ineffective mental wealth of our world into something like a common understanding, . . . ."1

Subsequent to the presentation of H. G. Wells one finds in the literature with increasing frequency similar expressions of concern and suggestions for solutions. Alvin Weinberg succinctly summarizes this concern in the following statement:2

1H. G. Wells, "World Encyclopaedia," World Brain (Garden City, New York: Doubleday, Doran & Co., Inc., 1938), pp. 3-35. Paper read at the Royal Institution of Great Britain Weekly Evening Meeting, Friday, November 20, 1936.

    The ideas and data that are the substance of science and technology are embodied in the literature; only if the literature remains a unity can science itself be unified and viable. Yet because of the tremendous growth of literature, there is danger of science fragmenting into a mass of repetitious findings, or worse, into conflicting specialities that are not recognized as being mutually inconsistent. This is the essence of the "crisis" in scientific and technical information.

When one looks at the rate at which knowledge, or at least literature, is growing, the scope of the problem becomes staggering. As reported in 1965 there were approximately seven new papers published each year for every hundred previously published.
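The growth figures cited in this chapter follow a simple exponential model. As an illustrative aside (not part of the original text), the time needed for an exponentially growing literature to double at a constant annual growth rate r is ln 2 / ln(1 + r); the rates below are chosen to match the figures discussed here, and the exact values are this editor's arithmetic, not the thesis's.

```python
import math

def doubling_time(rate):
    """Years for an exponentially growing literature to double at annual growth `rate`."""
    return math.log(2) / math.log(1 + rate)

# About seven new papers per hundred existing per year (the 1965 figure):
print(round(doubling_time(0.07), 1))    # roughly ten years at that rate

# Conversely, a doubling every ~13.5 years corresponds to a long-run
# growth rate of roughly 5.3% per year:
print(round(doubling_time(0.0527), 1))
```

The seven-per-hundred figure thus implies a doubling time of about ten years, which is consistent with the chapter's observation that recent growth has been faster than the long-run trend of a doubling every thirteen and a half years.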
Since 1860 the general trend indicated an exponential increase in the literature, with the total literature doubling approximately every thirteen and a half years. There have been only a few noticeable interruptions in this growth; these occurred during World Wars I and II. Since World War I, with the exception of the World War II period, the rate of growth appears to be even more rapid than a doubling every thirteen and a half years.3

2Alvin Weinberg, Science, Government, and Information: The Responsibilities of the Technical Community and the Government in the Transfer of Information, President's Science Advisory Council (Washington: Government Printing Office, 1963), p. 7.

3Derek J. deSolla Price, "Network of Scientific Papers," Science, CXLIX (July 30, 1965), 510-515.

Individual disciplines have attempted to solve this problem with reference works such as Psychological Abstracts, Chemical Abstracts, and the Educational Index. Such reference works are a significant aid to the scientist; however, they are far from a total solution. As the literature has continued to increase, the volume of such journals has also increased, and the indexing of articles contained in the journals has become a more difficult problem. When the indexing is too broad, the scientist is still confronted with an awesome volume of articles to review. Often he would like only specific articles which contain information about several specific categories. When this is the case, it is necessary for him to look at the intersection of lists of articles, where each list may contain hundreds of individual articles.

Two advances in technology were recognized almost immediately to provide assistance in helping the scientist cope with the expanding volume of literature. These advances were microfilm and the computer. The contribution of microfilm was straightforward. It allowed a large volume of material to be stored in a small area and made it possible to have copies of documents for a small cost. Copyright laws are making it difficult to apply this medium to recent publications, thus preventing it from reaching its potential effectiveness. Being able to find specific information in an ever expanding volume of literature is a problem which remains whether material is put on microfilm or is in hard copy. It was almost immediately recognized that the computer provided a powerful tool to assist in solving this problem.

Methods of indexing which could be used by the computer were experimentally examined before computer technology was capable of implementing them on large collections of documents. One of the first such experiments related to the Library of Congress and was conducted about 1953. This involved a comparison of proposed coordinate indexing with subject heading systems then in use by the Library of Congress. The experiment involved approximately fifteen thousand documents.4

4C. D. Gull, "Seven Years of Work on the Organization of Materials in the Special Library," American Documentation, VII (October, 1956), 20-329.

In 1954
Other educational Information Centers are listed in the Directory of Educational Information Centers, published in 1969 by the U. S. Govern- ment Printing Office. (Document #FSS.212:12042.) One of the major problems in dealing with the retrieval of informa- tion is the ambiguity of language. The most effective systems for retrieving information generally deal with types of information which have a very technical and well-defined nomenclature. A classic example of this is the area of chemistry. The need for eliminating the ambi- guity of vocabulary in this area resulted in a conference held in 1930 to develop an effective system for naming chemical compounds.6 This 5Charles P. Bourne, "Evaluation of Indexing Systems," Annual Review of Information Science and Technology, Carlos A. Cuadra, editor (New York: Interscience Publishers, 1966), I, 176. 6Commission and the Council of the International Union of Chem- istry, "Definitive Report of the Commission on the Reform of Nomencla- ture of Organic Chemistry," Journal of American Chemical Society, LX (1933), 3905-25. S has helped Chemical Abstracts, related reference works, and their com- puterized retrieval systems to be one of the more effective information networks in existence.7 One has only to look at the literature concerning the research done on thesaurus refinement to become aware of the significance of this problem. The indexing systems used by various medically-oriented re- trieval systems have been the focus of considerable research on this problem.8 Despite the technical nature Of medical terminology, the lit- erature shows there is still a considerable problem in finding the best indexing methods. The area of education has a nomenclature which is much less struc- tured and more ambiguous than technical areas such as chemistry or medi- cine. 
One has only to examine the ERIC Thesaurus and observe the large number of broad terms, related terms, and narrow terms listed for a given concept to obtain a quick appreciation of the problem.9 Problem The increasing amount of material being published in the area of education has made it important to find better ways to store and dis- seminate information. 7F. A. Tate, "Handling Chemical Compounds in Information Systems," Annual Review of InfOrmation Science and Technology, Carlos A. Cuadra, editor (New York: Interscience Publishers, 1967), II, 285-310. 8John O'Connor, "Correlation of Indexing Headings and Title Words in Three Medical Indexing Systems," American Documentation, XV (April, 1964), 96-104; and Montgomery and D. R. Swanson, "'Machine' Like Index- iJlg by People," American Documentation, XIII (October, 1962), 359-66. 9Thesaurus Of ERIC Descriptors: Workin Copy Descriptor Listing) EHIIC Processing and RefErence Facility (Bet esda, Md.: Leasco Systems anti Research Corporation, August, 1971), pp. 1-244. 6 The Council for Exceptional Children has an Information Center 10 (CBC-ERIC Information Center) which is a part of the ERIC system, and it is envisioned that a major contribution of this study will be to de- )11 was used in de- scribe how BIRS (Basic Indexing and Retrieval System veloping an information system used by the Center. Specifically this study evaluates the indexing methods which are used at the CBC-ERIC Information Center and compares them with alter- native methods available to the Center. The results are examined for any implications that suggest ways of improving the indexing methods used at the Center and for implications which might be generalized to the total field of education. Need for the Study The effectiveness of any information retrieval system directly re- lates to the indexing methods used. 
If the literature in the field of education is to be more effectively used by researchers, it is important to locate and identify studies which contribute to a significant understanding and improvement of the nomenclature. This study will examine indexing methods and describe the overall procedures used at the CEC-ERIC Center for processing information about special education. The results will be examined for implications which may contribute to further studies relating to the total field of education.

10"All About ERIC," Journal of Educational Data Processing, VII (April, 1970), 51-129; and June B. Jordan, "CEC-ERIC-IMC: A Program Partnership in Information Dissemination," Exceptional Children, XXXV (December, 1968), 311-313.

11John F. Vinsonhaler, John M. Hafterson, Stuart W. Thomas, Jr. (editors), Basic Information Retrieval System Technical Manual (East Lansing, Michigan: Information Systems Laboratory, College of Education, Michigan State University, 1970), Vols. I-XII.

Many information centers and systems have been developed to help cope with the rapid growth of knowledge. Various disciplines, including education, are using new methods involving computers and microfilm. While there is considerable documentation about the various ways in which these techniques are used, the documentation for any single system is often contained in many places, fragmented and sketchy.

The CEC-ERIC Center uses two computerized systems for information handling: (1) BIRS and (2) a commercial system for computer typesetting.12 The Information Center has interfaced these in a unique manner, allowing for selective computerized publication.

The manner in which the CEC-ERIC Information Center is using computerized information retrieval and computerized publication has not been previously documented, nor have the indexing methods been evaluated.
By describing the procedures used at the Center and evaluating the indexing methods, this study provides information which may contribute to a better understanding of educational nomenclature and the dissemination of educational information.

Objectives of the Study

This study has the following objectives:

1. To document the development of the information system used by the CEC-ERIC Information Center.
2. To document the manner in which the CEC-ERIC Information Center uses the BIRS system and other computerized programs.
3. To evaluate the indexing methods used by the CEC-ERIC Information Center.
4. To recommend means for improving these indexing methods.
5. To examine the results of the study for implications about how the CEC-ERIC Information Center might improve its overall effectiveness.
6. To examine the results of the study for implications concerning improvement of communication within the total field of education.

12Exceptional Child Education Abstracts, II (November, 1970), see inside front cover.

Questions Examined

In the evaluation of the indexing methods the following questions are examined:

1. How effective is the indexing method used by the Information Center for:
   a. CEC-ERIC staff who are familiar with the Information Center's indexing system?
   b. Professional educators who are not familiar with the Information Center's indexing system?
2. How effective is a computerized indexing method which extracts terms from the titles and abstracts for:
   a. CEC-ERIC staff who are familiar with the Information Center's indexing system?
   b. Professional educators who are not familiar with the Information Center's indexing system?
3. How effective is the indexing method used at the CEC-ERIC Information Center when combined with machine indexing of abstracts for:
   a. CEC-ERIC staff who are familiar with the Information Center's indexing system?
   b. Professional educators who are not familiar with the Information Center's indexing system?
4.
Is the vocabulary of the terms used in the indexing method employed by the CEC-ERIC Information Center found in the literature of special education?

Definitions and Acronyms

The field of computerized information retrieval has been rapidly developing since about 1953. As with many new technical areas there is a certain amount of ambiguity in the nomenclature. The alphabetized definitions in this section have in most cases been chosen to reflect a consensus of the literature; however, it is possible to find some of the same or very similar ideas represented in the literature by terms other than those in the list of definitions. The several mathematical definitions have been given without the use of mathematical symbols, but are consistent with definitions using symbolic nomenclature.

For the reader wishing a more comprehensive introduction to the nomenclature of information retrieval, articles or books by Bourne, Swets, and Lancaster may provide a quick introduction,13 while a simple text on sets or symbolic logic should give the reader a further understanding of the mathematical terms.

13Charles P. Bourne, "Evaluation of Indexing Systems," Annual Review of Information Science and Technology, Carlos A. Cuadra, editor (New York: Interscience Publishers, 1966), I, 171-190; Donald W. King, "Design and Evaluation of Information Systems," Annual Review of Information Science and Technology, Carlos A. Cuadra, editor (Chicago: Encyclopaedia Britannica, 1968), 61-104; F. W. Lancaster, "MEDLARS: A Report on the Evaluation of Its Operating Efficiency," American Documentation, April, 1969, pp. 119-142; and John A. Swets, "Information-Retrieval Systems," Science, CXLI (July 19, 1963), 245-250.

The definitions and acronyms are meant to be a reference for the reader, and in this document the terms are consistently used as defined unless an exception is noted.
It may be helpful to a reader unfamiliar with information retrieval to read the definitions of the twelve terms in the following list in the numeric order specified.

1. Document Surrogate      7. Hit
2. Term                    8. Precision
3. Indexing Method         9. Recall
4. Information File       10. Set
5. Description File       11. Union
6. Target Document        12. Intersection

BIRS The Basic Indexing and Retrieval System is a generalized system of computer programs designed for storing, indexing, and retrieving information.14

CEC The Council for Exceptional Children.

DAP The Descriptive Analysis Program (one of the BIRS programs) aids the user with the task of indexing or classifying informational elements. DAP reads informational elements (abstracts) and searches them for descriptive terms or phrases.15

14John F. Vinsonhaler, John M. Hafterson, Stuart W. Thomas, Jr. (editors), Basic Information Retrieval System Technical Manual (East Lansing, Michigan: Information Systems Laboratory, College of Education, Michigan State University, 1970), Vols. I-XII.

15John F. Vinsonhaler, John M. Hafterson, and Stuart W. Thomas (editors), Basic Information Retrieval System Technical Manual (East Lansing, Michigan: Information Systems Laboratory, College of Education, Michigan State University, 1970), I, 110.

DFMP The Description File Maintenance Program (one of the BIRS programs) reads descriptions and access numbers of informational elements and stores them on the Description File (DFT) to provide an index to the contents of the Information File (IFT).16

DFSP The Description File Searching Program (one of the BIRS programs) is designed to read queries; to search the DFT for relevant informational elements; and to store the access numbers of the most relevant elements on the Question File (QFT).17

Description File A file containing descriptions of the information found on an information file. The object of such a file is to help retrieve information from an information file.
In this study the phrase "description file" will always refer to a computerized description file.

Document Surrogate A substitute or abridged representation of an original document.

ECEA Exceptional Child Education Abstracts, a journal of abstracts in the field of special education published by the Council for Exceptional Children.

ERIC The Educational Resources Information Center.

Estimated Average Recall The number of successful attempts to retrieve target documents by computerized searches divided by the total number of attempts. An attempt is a computer search to retrieve one target document.

16Ibid.

17Ibid.

EXEC The Executive Program (one of the BIRS programs) is designed to store and retrieve system components by augmenting the supervisory monitor.18

Hit A document retrieved by a computer search which is considered to be relevant to the computer search question.

IFMP The Information File Maintenance Program (one of the BIRS programs) maintains an Information File Tape (IFT) by reading and storing informational elements (abstracts) of arbitrary length. The IFMP may also be used to generate printed books.19

IFRP The Information File Retrieval Program (one of the BIRS programs) is designed to read queries and access numbers from the Question File Tape (QFT), and to generate reports with the information elements (abstracts) read from the IFT.20

IMC/RMC An Instructional Materials Center/Regional Media Center.

Indexing Method A method, procedure, or algorithm for selecting and assigning terms to describe a document or document surrogate. The indexing method may be a manual method involving human indexers who assign the terms, or a computerized method which extracts or selects terms to be assigned to documents or document surrogates. The computerized method may assign terms from a predetermined list according to an algorithm, or extract the terms according to an algorithm from any portion of the text of the document or document surrogate.

18Ibid.
19Ibid.

20Ibid.

Information File A file containing information. This file may or may not be stored on a computer storage device.

Intersection The intersection of two sets is the set of all objects common to both sets.

PIP The Printed Indexing Program (one of the BIRS programs) prepares a traditional subject index using informational elements read from cards or from the IFT.21

PLP The Printed Listing Program (one of the BIRS programs) provides printed books, i.e., listings of abstracts, ordered by the contents of the abstracts. The books produced by PLP are similar to those produced by the IFMP, except that the latter are ordered by the Information File Tape (IFT) access number.22

Precision The number of hits divided by the total number of documents retrieved in a computerized search.

Recall The number of hits divided by the number of documents in the information file which are relevant to the search question.

Set A well-defined collection of objects.

Target Document A document which is randomly selected from an information file to be used as the basis for writing a search question.

Term A word or phrase assigned to describe (used to index) a document or document surrogate.

Union The union of two sets is a new set of all objects belonging to either or both of the original sets.

21Ibid.

22Ibid.

Scope and Overview of the Study

This study describes the procedures used at the CEC-ERIC Information Center and attempts to provide sufficient detail and organization to allow others to use them as a model. The study also evaluates indexing methods used at the CEC-ERIC Information Center. The methods are evaluated in the context of a data file containing information about special education and an information retrieval system based upon BIRS. The data base used in the initial evaluation of indexing procedures is 2100 documents contained in Volume I of Exceptional Child Education Abstracts (ECEA).
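The set-theoretic measures defined above lend themselves to a brief computational illustration. The following Python sketch is an illustration added for clarity: the document identifiers are hypothetical, and no such code appears in BIRS. It computes precision and recall for a single search using union and intersection exactly as defined.

```python
# Illustrative sketch of the definitions above (hypothetical identifiers).
# A "set" is a well-defined collection of objects; Python's set type models it.
retrieved = {"doc01", "doc07", "doc12", "doc15"}  # documents returned by a search
relevant = {"doc07", "doc12", "doc21"}            # documents in the file relevant to the question

hits = retrieved & relevant    # intersection: objects common to both sets
either = retrieved | relevant  # union: objects in either or both sets

precision = len(hits) / len(retrieved)  # hits / total documents retrieved
recall = len(hits) / len(relevant)      # hits / total relevant documents in the file

print(sorted(hits))       # the two hits
print(precision, recall)  # 2/4 and 2/3
```

Here two of the four retrieved documents are hits, giving a precision of .50, while two of the three relevant documents were retrieved, giving a recall of about .67.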
Overview of Procedures

The indexing methods used at the CEC-ERIC Information Center involve manually selecting terms from the ERIC Thesaurus to index each abstract. The vocabulary of the terms assigned in Volume I of ECEA was compared with the collective vocabulary found in the titles of the 2100 documents. This vocabulary was also compared with the vocabulary of a thesaurus developed by Samuel Price for use in the area of special education.23

One hundred and five target documents (abstracts) were randomly selected from Volume I of ECEA, and computer-searchable description files of these abstracts were created. To search these files, questions were generated by members of the CEC-ERIC staff familiar with the in-house indexing procedures and by professional educators who had no knowledge of the ERIC Thesaurus or in-house indexing procedures. The search results for these questions were used to compare the effectiveness of various description files generated by different indexing methods.

23Samuel T. Price (comp.), Thesaurus of Descriptors for an Information Retrieval System in the Subject Matter Area of Special Education (Normal, Illinois: Special Education Instructional Materials Laboratory, Illinois State University, January, 1970).

A subjective analysis of the ERIC terms assigned to Volume I of ECEA was made by the CEC-ERIC indexers in an attempt to develop and refine the Thesaurus which has been used in indexing successive volumes. A series of search questions used by the Center to create selected bibliographies were used in searches of both Volumes I and II of ECEA. The individual responsible for editing the bibliographies determined the relevance of documents retrieved. Precision of documents retrieved over Volume I was compared with the precision of documents retrieved over Volume II to determine if it had been possible to improve the in-house indexing procedures.
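The procedure just outlined reduces estimated average recall to a simple tally: each search question is one attempt to retrieve one target document, and the measure is successes divided by attempts. A minimal Python sketch (the document identifiers are hypothetical; this is not the BIRS implementation) makes the computation explicit:

```python
# Estimated average recall as defined in this study: successful attempts to
# retrieve target documents divided by total attempts. Identifiers are hypothetical.
searches = [
    ("doc003", {"doc003", "doc011", "doc042"}),  # (target document, documents retrieved)
    ("doc008", {"doc002", "doc009"}),            # target document missed
    ("doc014", {"doc014"}),
    ("doc027", {"doc005", "doc027", "doc031"}),
]

successes = sum(1 for target, retrieved in searches if target in retrieved)
estimated_average_recall = successes / len(searches)
print(estimated_average_recall)  # 3 of 4 attempts succeed: 0.75
```

In the study proper the tally runs over 105 such attempts per group and indexing method, yielding the estimated average recall figures reported in the abstract.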
The theoretical base for the evaluation procedures used in this study is contained in the literature and is examined in detail in the following chapter.

CHAPTER II: RELATED LITERATURE

This chapter considers five specific areas of related literature. They are: (1) systems and systems analysis, (2) information retrieval systems, (3) indexing methods--content analysis, specification, and control, (4) the evaluation of indexing methods, and (5) an overview of the Basic Indexing and Retrieval System (BIRS).

The review of related literature has been designed to stress the importance of viewing an information system as a whole, including its environment, rather than concentrating on specific components without attention to their context. The chapter is organized so that it moves from general to specific, starting with broad concepts of systems and systems analysis, next moving to information retrieval systems as a particular category of systems, then examining the specific processes in an information retrieval system that are most important to this study, namely indexing methods and indexing evaluation, and finally giving an overview of BIRS--a specific example of a generalized computer information retrieval system. The overview of BIRS is included not only because it provides a specific example, but also because it is used by the CEC-ERIC Information Center and in the evaluation of the Center's indexing methods.

Systems and Systems Analysis

One has only to look at the literature in education, management, behavioral sciences, applied sciences, and other fields to discover that the terms "system," "system concept," and "system analysis" are used repeatedly, often without definition. The frequent use of these terms in popularized reporting of science and technology would imply that they are important and that most people understand their meanings.
However, an examination of literature related to system analysis reveals that there is not a consensus concerning the meanings of these terms, and as defined by various experts the meanings embrace broad concepts.

What is a System?

The question "What is a system?" does not have a simple answer and might be responded to by the question "What kind of system?" Texts on systems analysis speak of natural, man-made, mathematical, physical science, and engineering systems. When the term "system" is used in relation to mathematics it most often means a set of rules. When it is used in connection with the physical sciences, or natural systems, it is taken to mean a portion of the universe around which an imaginary boundary has been drawn for the purpose of study. In engineering the word "system" is interpreted as "meaning an organized working total, an assemblage of objects united by a form of regular interaction or interdependence."24

24Dimitris N. Chorafas, Systems and Simulation (New York: Academic Press, 1965), p. 4.

While mathematical systems might be considered man-made systems, they are distinctly different from the term system used in an engineering sense, in that a mathematical system deals only with ideas, whereas an engineering system usually deals with man-made systems involving real objects. The broadest definition of a system found in this study of related literature defined it to be "a set of interacting elements."25 This definition could include natural systems such as the solar system, or man-made systems with the exception of mathematical systems, where the elements (ideas) do not interact. In a mathematical system the elements exist only in man's mind, and have no substance or energy of their own with which to interact. In the science of information retrieval the concept of system or system analysis is most closely related to the terms as they are used in engineering.
Following are seven selected quotations relating to the term "system" which have been grouped for convenient examination:

1. A system is broadly defined as "a group of interdependent elements acting together to accomplish a predetermined task."26
2. An integrated assembly of interacting elements designed to carry out cooperatively a predetermined function.27
3. There has been a growing realization of the existence of an identifiable science of systems, comprising a body of concepts, methods, and above all, a philosophy of treating the whole rather than bits and pieces. The new field of systems science is as yet only loosely defined and has different meanings in different contexts. Its ultimate domain is still seen only dimly as compared to traditional disciplines such as physics, mathematics, and engineering.28

25Harry J. White and Selmo Tauber, Systems Analysis (Philadelphia: W. B. Saunders Company, 1969), p. 4.

26Ibid., p. ix (Preface by F. Gordon Smith).

27Harry J. White and Selmo Tauber, Systems Analysis (Philadelphia: W. B. Saunders Company, 1969), p. 3, citing R. E. Gibson, "A Systems Approach to Research Management, Part I," Research Management, V (1962), 215.

28White and Tauber, op. cit., p. 1.

4. A set of objects with relationships between the objects and their attributes.29
5. Although system is a general term used in many senses, it does convey a very important meaning not readily described in any other way. The word derives from a Greek verb meaning to place or set together, and Webster's New World Dictionary gives the definition: "A set or arrangement of things so related or connected as to form a unity or whole: as, a solar system, irrigation system, supply system."30
6. The concept of a man-made system usually includes the idea of optimizing certain parameters such as cost, efficiency, size, or reliability, in terms of criteria derived from externally imposed value systems.
The value systems are subjective and are based on a variety of factors such as economic, social, or even political. Adjustments or trade-off values between such considerations as cost, reliability, and prestige are frequently necessary.31

7. A system can contain within its structure a number of subsystems, each of which has all the attributes of a system when considered as an integrated collection of components.32

An examination of the above quotations reveals at least five important concepts:

1. The concept of a system relating to a whole, as indicated by statements 1, 2, 3, 4, and 5.
2. The concept of interacting components or elements, as indicated by statements 1, 2, and 4.
3. The concept of a system existing to accomplish a specified task or function, as indicated by statements 1 and 2.
4. The concept of optimizing according to predetermined parameters such as cost, efficiency, size, reliability, etc., as indicated by statement 6.
5. The concept that a system can contain subsystems, each of which has all the attributes of a system, as indicated by statement 7.

29A. Hall and R. Fagan, "Definition of a System," General Systems, Vol. V of Yearbook of the Society for General Systems (1956), p. 18.

30White and Tauber, op. cit., p. 3.

31Ibid., p. 4.

32Ralph Deutsch, System Analysis Techniques (Englewood Cliffs: Prentice-Hall, Inc., 1969), p. 2.

To summarize, the term system is used in many ways. It is used to refer to mathematical systems, physical or natural systems, and non-mathematical man-made systems, and in each of these areas the word "system" has a complex meaning with numerous implications. The meaning of the word most directly related to information science is its use in reference to man-made, non-mathematical systems which include the five above-mentioned concepts.
The fourth concept, the concept of parameters to be used in measuring the effectiveness of the system, is important because it is this component that establishes criteria for evaluation, modification, and redesign--functions related to systems analysis.

What is Systems Analysis?

Defining the term systems analysis is not a simple task, as emphasized by a statement in a recent book entitled Systems Analysis Techniques. The statement says:

A reasonable expectation from a book on systems analysis would be to find an introductory section which defines the term in an unambiguous fashion. This starting point, while admittedly desirable, is not feasible because the wide spectrum encompassing the field of systems analysis is still in its infancy and formative state.33

33Ibid., p. 1.

An examination of references to the terms "system analysis," "system concept," "systems science," and "systems engineering" indicates that there is considerable overlapping in the meaning of these terms, sufficient overlapping that it does not seem advisable to make a distinction in their meaning. The following are a number of selected quotations relating to these terms which are grouped for examination:

1. The new and promising discipline of system analysis seeks to determine the optimum means for accomplishing the task described in the problem statement.34
2. System analysis is an attempt to define the most feasible, suitable, and acceptable means for accomplishing a given purpose.35
3. System analysis is merely a study of a system--but it should be emphasized that one does not usually study a system as an end in itself. Rather, the explicit motivation of any system study is to generate information so that a decision can be made.36
4. Systems analysis: The analytic study of systems, where analytic is taken in its most general sense.37
5. Systems analysis, by its very meaning, cuts across academic departmental barriers; an interdisciplinary approach is therefore necessary, both in the marshaling of varied resources and in the manifold applications.38
6. Essentially the system concept is that of examining the overall interactions of a group of items rather than focusing attention on the operation of each of the component elements in turn.39
7. The system concept can be interpreted as stripping the nonessential details from a collection of interacting elements so that the structure of the interrelations is laid bare for study.40
8. Systems science--The science that is common to all large collections of interacting functional units that are combined to achieve purposeful behavior.41
9. Systems engineering--A process in which complex systems are idealized, designed and manipulated by conscious rational processes based upon the scientific method.42
10. Definitions of systems science and systems engineering generally include requirements for utility or at least directed behavior. By contrast, definitions of system tend to be more abstract.43

34Chorafas, op. cit., p. ix (Preface by F. Gordon Smith).

35Chorafas, op. cit., p. 2.

36Deutsch, op. cit., p. 8.

37White and Tauber, op. cit., p. 5.

38Ibid., p. 2.

39Deutsch, op. cit., p. 2.

40Ibid.

The ten statements illustrate the difficulty originally expressed in defining system analysis and the overlapping of various terms using the word "system." These statements about system analysis, system concept, system engineering, and system science appear to be typical of other statements found in the literature.

There are many concepts which are implied in the preceding statements. Specific concepts which can be identified in one or more of the statements are:

1. That system analysis involves the study of a system, as indicated by statements 3, 4, and 7.
2. That system analysis involves finding a best, optimum, or most feasible way to accomplish a given task, as indicated in statements 1, 2, 8, 9, and 10.
3. That the study or analysis is motivated by a problem or need for information to make a decision, as indicated by statements 1, 3, and 10.
4. That the study or analysis focuses on interaction between components of a system and their relationship to the system as a whole rather than examining components in isolation, as indicated by statements 6, 7, and 8.
5. That the study or analysis is rational or formalized, as indicated by statements 4 and 9.
6. That the study or analysis usually involves an interdisciplinary approach, as indicated by statement 5.

41White and Tauber, op. cit., p. 3, citing Institute of Electrical and Electronics Engineers, Systems Science Committee, Charter.

42White and Tauber, op. cit., p. 3.

43Ibid., p. 4.

One reason for the variety of perceptions about systems analysis may be that it is interdisciplinary, with some sources indicating that those doing systems analysis need to be "generalists" to insure an unbiased approach to the entire problem.44 The judgments involved in determining which of many approaches should be used, and what is representative data of the total available to be examined, have led others to ask the question "Does the product of such an effort deserve to be called a science or an art?"45 The answer to this question is not clear, for there are obviously elements of both.

44Deutsch, op. cit., p. 1; and White and Tauber, op. cit., p. ix.

45Chorafas, op. cit., p. 3.

Information Retrieval Systems

Those involved with information science are primarily concerned with man-made systems designed to accomplish a specific task. A simple definition consistent with the concept of man-made systems defines a
system as a "group of interdependent elements acting together to accomplish a predetermined task."46 A definition of system analysis given by Borko incorporates many of the concepts suggested by statements about systems and system analysis. This definition will serve as a reference for the term system analysis as it relates to this study.

Systems analysis is a formal procedure for examining a complex process or organization, reducing it to its component parts, and relating these parts to each other and the unit as a whole in accordance with an agreed upon performance criteria. Systems design is a synthesizing procedure for combining resources into a new pattern.47

By substituting appropriate synonyms one finds the statement indicates that systems analysis includes: (1) a formal procedure for studying a system (a complex process or organization), (2) reducing the system to its component parts for convenient study, (3) relating these parts to each other and studying their interaction, (4) emphasis upon the unit (system) as a whole and the relationship of the interacting components with this whole, and (5) the existence of a performance criteria by which the system may be judged.

Information retrieval systems in their simplest form might be illustrated by Figures 2.1 and 2.2.48 While these diagrams are provided with references, it is probably unnecessary because of their common use. Claire K. Schultz has constructed a similar diagram

46Harry J. White and Selmo Tauber, Systems Analysis (Philadelphia: W. B. Saunders Company, 1969), p. 3, citing R. E. Gibson, "A Systems Approach to Research Management, Part I," Research Management, V (1962), 215.

47Harold Borko, "Design of Information Systems and Services," Annual Review of Information Science and Technology, Carlos A. Cuadra, editor (New York: John Wiley & Sons, 1967), II, 37.

48White and Tauber, op. cit., p. 5.
FIGURE 2.1 INPUT, PROCESSING, AND OUTPUT

FIGURE 2.2 INPUT, PROCESSING, AND OUTPUT WITH FEEDBACK

entitled "Design of Basic Components of an Information Retrieval System," which also includes input, processing, and output. Under each of these three major components she has three subcomponents: materials, personnel, and equipment. She then proceeds to pose a series of questions for each of the categories which are designed to aid individuals in developing their own retrieval systems.49

Input, processing, and output, with or without feedback for systems modification, are perhaps as universal to data processing as any other single concept. This is presently illustrated by the categories found on an IBM Flowcharting template, which includes symbols for input, output, various processing procedures, and feedback via program modification.50

Meadow states, "Information retrieval is the process of recovering information-bearing symbols from storage places in response to requests from prospective users of information or from libraries on the users' behalf."51 Moreover, Artandi believes:

Document retrieval systems may be viewed as consisting of four major elements: input to the system, the file(s) that (is/are) searched, searching methods, and output of the system. While each of these four elements is an essential part of any effective system, they are subject to differences in emphasis in various systems and there are differences in the theories and techniques that relate to them.52

Vickery similarly recognizes a number of components and channels between the user of the information and the store. He states,

49Claire K. Schultz, "Do-It-Yourself Retrieval System Design," Special Libraries, LVI (December, 1965), 721.

50IBM Flowcharting Template, Form X20-8020.

51Charles T. Meadow, The Analysis of Information Systems (New York: John Wiley & Sons, 1967), p. 3.
52Susan Artandi, An Introduction to Computers in Information Science (Metuchen, N. J.: Scarecrow Press, Inc., 1968), p. 20.

"Retrieval is concerned with the structure and operation of devices to select documentary information from a store in response to questions."53 A store can be a library, abstract journal, textbook, etc. A retrieval device can be an index in a book, a library catalog, a mechanical selector, or an electronic data processing device.

These definitions could be applied to Figure 2.1 by classifying input as the user request for information, processing as retrieval of information from the store, and output as the information which is provided to the user making the request. If to this analogy we add the classification of feedback, which is concerned with the effectiveness of the information retrieved as well as the cost of retrieving the information, we have in a simple form the components that are involved in the analysis, design, and redesign of information retrieval systems. As the components of various real or ideal information systems are examined, these elements will occur repeatedly, with the modification that each of these broad components may involve multiple interacting sub-components.

Both Meadow and Vickery view information retrieval as part of a broader system of communication.54 Meadow states ". . . information retrieval is part of a complex communication system existing between the authors of information-bearing documents and their readers."55 In a diagram that Meadow states is highly oversimplified, he identifies a number of components and interactions between components of the information retrieval process as it relates to a library.

53B. C. Vickery, On Retrieval System Theory (London: Butterworths, 1965), p. 2.

54Ibid., p. 1; and Charles T. Meadow, The Analysis of Information Systems (New York: John Wiley & Sons, 1967), p. 3.

55Charles T. Meadow, The Analysis of Information Systems (New York: John Wiley & Sons, 1967), p. 3.
The components are:

1. Authors and publishers generating documents.
2. Library management.
3. Indexers.
4. Indexing files.
5. Document files.
6. Search assistance.
7. Patrons or users.

Among the interactions are:

1. Library management acquisitioning documents from authors and publishers.
2. Coordination of user needs and problems, and interaction between library management and those involved in search assistance.
3. Coordination of classification and indexing techniques, and interaction between library management and indexers.
4. Search requests from the patrons to those involved with search assistance.
5. Miscellaneous interactions relating to building index files and document files, and using these files to retrieve information for the patron or user.56

56Ibid., p. 11.

With the proper substitutions--for example, computerized files for manual files, computerized operations for some of the operations done by indexers, and information management for library management--the basic components could be generalized to the operation of various types of information retrieval processes.

Vickery identifies ten components of information retrieval which are very similar to the components and interactions found in the diagram provided by Meadow. The ten components identified by Vickery are:

1. A document store.
2. A description file.
3. A mechanism for indexing.
4. A mechanism for storing.
5. A mechanism for filing.
6. A mechanism for formalizing queries.
7. A mechanism for selecting appropriate documents to be retrieved.
8. A mechanism for retrieving the appropriate documents.
9. Rules for bibliographic description.
10. Rules for subject description.57

While not included in this list of ten components, he does mention that in the construction of an information system it is first necessary to select documents for inclusion in the store.
With the inclusion of this element and substitutions of appropriate synonyms, the similarities are striking between the components found in the diagram suggested by Meadow, the list provided by Vickery, and a diagram suggested by Lancaster.58 The major difference appears to be that the components suggested by Vickery and Lancaster pay less attention to the operation of an information depository than the components included in the diagram by Meadow.

57B. C. Vickery, On Retrieval System Theory (London: Butterworths, 1965), p. 11.

58Ibid.; Charles T. Meadow, The Analysis of Information Systems (New York: John Wiley & Sons, 1967), p. 11; and F. Wilfrid Lancaster, Information Retrieval Systems (New York: John Wiley & Sons, Inc., 1968), p. 4.

Salton provides a table which describes various types of information centers and the differences in their functions.59 In this table he refers to the functions described by the previous sources as well as suggests functions that relate to the integration of information, including bibliographies and analytical studies. Kochen provides models for information services which use means of dissemination other than a user asking questions and receiving documents as a response. Included in Kochen's suggestions are procedures for evaluation and synthesis of information, tutorial information service, and standing request lists for specific types of information to aid in current awareness.60 Kochen's work is primarily an examination of the searching and dissemination process and does not consider the total information system in its context.

Failure to examine a component in the context of a total system was typical of the majority of the literature reviewed. Aside from the articles by Kochen, which focused on the retrieval process, works cited in this section were primarily chosen because they relate components and their interaction to the whole rather than focusing upon functions independently. In reviewing other literature, it appears that the components incorporated in the works cited here include most, if not all, of those components mentioned by articles and

59Gerard Salton, Automatic Information Organization and Retrieval (New York: McGraw-Hill Book Company, 1968), p. 6.

60Manfred Kochen, "Systems Technology for Information Retrieval," The Growth of Knowledge, Manfred Kochen, editor (New York: John Wiley & Sons, 1967), pp. 352-372.
In reviewing other literature, it appears that the components incorporated in the works cited here include most, if not all, of those components mentioned by articles and 59Gerard Salton, Automatic Information Organization and Retrieval (New York: McGraw-Hill Book Company, 1968), p. 6. 6OManfred Kochen, "Systems Technology for Information Retrieval," The Growth of Knowledgo, Manfred Kochen, editor (New York: John Wiley 8 Sons, 1967), pp. 352-372. 31 books discussing Specific functions. Not directly included in an over- all description Of most information systems was feedback or the evalua- tion necessary for continuing system modification, redesign and improve- ment . Indexing Methods--Content Analysis, Specification, and Control Vickery indicates that the key Operation in retrieval is the de- scription of what the documents are about. ”This is the point at which research is most urgently needed fOr it is on adequate description that all ensuing operations in retrieval must rest."61 Similarly Fairthorne has commented that indexing is "the basic problem as well as the cost- liest bottleneck in information retrieval."62 The first four volumes of the Annual Review of Information Science and TechnologyP3 discussed indexing methods under chapters entitled, "Content Analysis, Specification, and Control." These three terms express what some assert are the components of indexing: content analysis, the process of determining what a document is "about"; specification, a process of assigning indexing terms to describe the document; and control, the process Of establishing and regulating the form and semantics of the descriptive labels making up the indexing 618. C. Vickery, On Retrieval System Theory (London: Butterworths, 1965), p. 36. 62R. A. Fairthorne, Towards Information Retrieval (London: Butterworths, 1961), p. 136. 67’Carlos A. Cuadra, editor, Annual Review of Information Science and Technology, Vols. I-IV, (New York: John Wiley 8 Sons, 1966-69). 
32 language used for specification.64 Content analysis, whether done by indexers or by a computer is dependent upon the indexing language which controls the procedures used for the specification of indexing terms. For example, faceted classifi- cation languages aid the indexer in the analysis of the documents through grouping similar terms, and control the language by lists of terms with rules indicating how they are to be assigned to the different facets.65 If the content analysis is computerized, the algorithms which are used to analyze the documents (determine what they are about and specify descriptive terms) are in essence an indexing language. The control of this language is established through the computerized algorithms and in some cases the interaction of these algorithms with authority lists stored in the computer. Indexing Languages and Retrieval Systems Vickery in discussing the Cranfield studies states that an indexing language . . is significant in determining the performance of a retrieval system, not only as a result of actual intellectual arrangement of indexing, but also very significant on the output side. It is very significant in the question analysis and definition. So my point in this connection is simply that one cannot tear a classification or an indexing language out of the context of a total retrieval system.6 64F. Baxendale, "Content Analysis, Specification and Control,” Annual Review of Information Science and Technology, Carlos A. Cuadra, editor, (New York: John Wiley 8 Sons, 1966), I, 71; and John R. Sharp, "Content Analysis, Specification, and Control," Annual Review of Information Science apoTechnology, Carlos A. Cuadra, editor (New York: John Wiley 8 Sons, 1967), II, 87. 658. C. Vickery, Faceted Classification Schemes (New Brunswick, N. J.: Rutgers University Press, 1966), p. 1-108. 66Ibid., p. 15. 
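The observation above, that a computerized extraction algorithm together with an authority list stored in the computer is in essence an indexing language, can be illustrated with a minimal modern sketch. The following Python fragment is illustrative only: the authority list, the "use for" synonym table, and the sample sentence are all invented, not drawn from any actual indexing vocabulary.

```python
# Minimal sketch of algorithmic content analysis under vocabulary control:
# the algorithm extracts candidate words, a "use for" table substitutes the
# preferred form of known synonyms, and only terms on the stored authority
# list may be assigned as descriptors. All vocabulary here is invented.

AUTHORITY = {"curriculum", "gifted", "reading"}   # permitted descriptors
USE_FOR = {"exceptional": "gifted"}               # synonym -> preferred term

def assign_descriptors(text: str) -> list[str]:
    descriptors = []
    for word in text.split():
        word = word.strip(".,;:").lower()
        word = USE_FOR.get(word, word)        # control: substitute the preferred form
        if word in AUTHORITY and word not in descriptors:
            descriptors.append(word)          # specification: assign authorized terms only
    return descriptors

print(assign_descriptors("A curriculum for exceptional children with reading problems"))
# -> ['curriculum', 'gifted', 'reading']
```

In such a scheme the algorithm and its stored lists together perform the control function that a printed thesaurus performs for human indexers.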
This section will attempt to categorize the various types of indexing languages and relate these to information retrieval systems having computerized components.

Meadow suggests eight categories of languages: (1) hierarchical classification, (2) subject headings, (3) fixed key words, (4) free key words, (5) tagged descriptors, (6) faceted terms, (7) phrases, and (8) natural languages.67 The first four of these range from very structured to very unstructured languages which have no rules for syntax, while the next four are grouped according to increasing sophistication of syntax.

These categories are expressed in different ways by other authors and by no means represent the total possibilities; for example, Vickery talks about faceted classification schemes,68 which include a combination of what Meadow describes as a hierarchical classification, which has no syntax, with a faceted structure, which does have syntax--thus a combination of categories 1 and 6. Some indexers might not consider the categories of "free key words" and "natural languages" true indexing languages, because they do not control the indexing terms which may be assigned. In contrast to the six remaining "controlled" language categories, Hyslop assigns controlled languages to three categories: (1) classification schemes, (2) subject heading authority lists, and (3) thesauri.69 Hyslop comments that "there are innumerable specialized vocabularies of hybrids of these three types that defy any attempt to force them into general categories."70

67Charles T. Meadow, The Analysis of Information Systems (New York: John Wiley & Sons, Inc., 1967), p. 47.

68B. C. Vickery, Faceted Classification Schemes (Vol. V of Systems for the Intellectual Organization of Information, ed. Susan Artandi; New Brunswick, N. J.: Rutgers University Press, 1966), pp. 1-108.

69Marjorie R. Hyslop, "Sharing Vocabulary Control," Special Libraries, December, 1965, p. 708.

70Ibid.

As can be seen, while attempts have been made to place indexing languages into neat compartments, there is no agreement upon the categories, and new categories can be made up through combinations of existing ones, as indicated by Vickery's faceted classification schemes. A detailed description of the various categories of indexing languages is beyond the scope of this review; however, an attempt will be made to describe briefly some of the categories mentioned.

Classification Schemes or Hierarchical Classification Schemes

Classification schemes are highly structured and show word associations by means of a hierarchy or family tree which leads the indexer from general terms at the top of the tree to successively more specific terms in succeeding lower levels. Usually there is a numerical or alphanumeric code which defines the unique term (branch of the tree) and which can be translated into a unique word description. Examples of such indexing languages are the Library of Congress system, the Dewey Decimal Classification system, and the Universal Decimal Classification (UDC), a modification of the Dewey Decimal system.71

71Charles T. Meadow, The Analysis of Information Systems (New York: John Wiley & Sons, Inc., 1967), pp. 22-25; and Marjorie R. Hyslop, "Sharing Vocabulary Control," Special Libraries, December, 1965, pp. 708-714.

These systems commonly have at least three tools for the construction of indexes. First is a classification schedule, which provides a visual map of the conceptual structure used to design the tree or hierarchy. The schedule is ordered according to the hierarchy established by the classification scheme. Second is an alphabetical index to the terms in the classification schedule.
Each term that appears in the classification schedule appears in alphabetical order with its numerical code identifying its position in the classification scheme. Third is a set of rules which usually describe the structure of the classification scheme and its alphabetical index and how they are to be used in selecting terms and applying notation symbols to documents.72

72B. C. Vickery, Faceted Classification Schemes (Vol. V of Systems for the Intellectual Organization of Information, ed. Susan Artandi; New Brunswick, N. J.: Rutgers University Press, 1966), pp. 40, 41.

Subject Headings or Authority Lists

Subject heading or authority lists usually consist of alphabetical lists of words and/or phrases which are acceptable for use as indexing terms. If the list contains terms with more than one word, the term may appear alphabetically listed under each word in the term. In this manner the array provides for limited word association by bringing together all terms containing a given word. There are many possible variations on such indexing languages, including sets of rules which may establish various levels of indexing, such as main entry terms plus various levels of subterms. If the list contains only single-word terms it is sometimes called a key word index. While some specific indexing languages in this category contain various levels of indexing terms with rules for specification, they do not contain an alphanumeric code of the kind associated with the highly structured hierarchical classification schemes.73

73Marjorie Hyslop, "Sharing Vocabulary Control," Special Libraries, December, 1965, p. 708; and Charles T. Meadow, The Analysis of Information Systems (New York: John Wiley & Sons, Inc., 1967), pp. 25-33.

The authority lists and classification schemes would be called precoordinate indexing languages, in that all of the word associations allowable have been established (precoordinated) before they are used by the indexer.
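The rotated listing described above, in which a multi-word heading is entered alphabetically under each of its words, can be sketched as follows. The headings are invented examples, not entries from any actual authority list.

```python
# Sketch of an authority-list array that files each multi-word heading
# under every word it contains, bringing together all headings that share
# a given word. Headings are invented examples.

HEADINGS = ["MENTAL RETARDATION", "MENTAL HEALTH", "HEALTH EDUCATION"]

def rotated_entries(headings):
    entries = [(word, heading) for heading in headings for word in heading.split()]
    return sorted(entries)        # alphabetical by entry word, then by heading

for word, heading in rotated_entries(HEADINGS):
    print(f"{word:12} {heading}")
```

In the output the two headings containing HEALTH file together under that word, giving the limited word association the text describes.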
However, the authority list allows for the addition of new terms with greater ease than the more highly structured classification schemes.74 The fact that an authority list does not require that the total system be designed in advance is one of its advantages. As perceived by Meadow, subject headings represent a ". . . loosening of the structure of a hierarchical language. Their use makes initial language design easier since there is less to predict, and makes future changes easier to implement because no elaborate structure need be disturbed by such a change."75

74Charles T. Meadow, The Analysis of Information Systems (New York: John Wiley & Sons, Inc., 1967), pp. 25-30.

75Ibid., p. 26.

Thesauri Indexing Languages

Indexing languages which use a thesaurus are a natural extension of subject heading languages. As with the subject heading languages, they have an alphabetized list of words and/or phrases which may be used as indexing terms. However, associated with each term is a list of narrow terms, broad terms, and related terms. This provides a limited hierarchical structure and allows terms which are related to one another to be displayed together.76 Hyslop says that, "although the hierarchies are not so discreetly displayed, they go beyond the confines of traditional classification array by permitting any term to appear in as many hierarchies as may be appropriate.

76Marjorie R. Hyslop, "Sharing Vocabulary Control," Special Libraries, December, 1965, pp. 708-710.
It is thus the more versatile of the three types of vocabularies in showing word association."77 Figure 2.3 is an example taken from the ERIC Thesaurus which shows five major terms and how they are displayed, with related terms designated by RT, broad terms designated by BT, narrow terms designated by NT, and "Use For" designated by UF.78

77Ibid., p. 709.

78Thesaurus of ERIC Descriptors (Bethesda, Maryland: ERIC Processing and Reference Facility, operated for U.S. Office of Education by Leasco Systems & Research Corporation, 1970), p. 82.

FAMILY ENVIRONMENT 160
  UF Home
     Home Conditions
     Home Environment
  BT Environment
  RT Family (Sociological Unit)
     Family Influence
     One Parent Family
     Permissive Environment

FAMILY FACTORS
  Use Family (Sociological Unit)

FAMILY HEALTH 250
  BT Health
  RT Family (Sociological Unit)
     Homemaking Education

FAMILY INCOME 220
  BT Income
  RT Family (Sociological Unit)
     Family Resources
     Family Status

FAMILY INFLUENCE 490
  UF Home Influence
  NT Fatherless Family
  RT Family (Sociological Unit)
     Family Counseling
     Family Environment
     Family Status
     Fatherless Family
     Motherless Family
     One Parent Family
     Parental Aspiration
     Parent Attitudes
     Parent Participation
     Parent Reaction
     Parent Role

FIGURE 2.3 A PORTION OF THE ERIC THESAURUS80

80Thesaurus of ERIC Descriptors (Bethesda, Maryland: ERIC Processing and Reference Facility, operated for U.S. Office of Education by Leasco Systems & Research Corporation, 1970), p. 82.

Free Key Word and Key Phrase Indexing

The hierarchical classification, subject heading, and thesauri indexing languages are fixed in that the number of subjects which can be described is equal to the number of defined terms. These languages ". . . are often called 'precoordinated' systems, in that whatever semantically meaningful descriptor combinations are allowed, have been made--the descriptors 'coordinated' to form terms by language designers."79 The major distinction that exists between free key word or free key phrase indexing and precoordinated languages is the point where coordination (words or phrases grouped to form indexing terms) takes place. In the free key word or free key phrase languages this coordination can be done by the indexer or a computer, thus allowing the opportunity to form new terms (combinations of words) at the time that the indexing is done. In these languages the classes of words that may be used are generally not restricted, except for the exclusion of conjunctions, prepositions, articles, and other non-content words.81 The major advantage of such a system is the ease with which new terms may be made up, while a major disadvantage is a lack of control to aid the searcher and indexer in using the same language.82

79Charles T. Meadow, The Analysis of Information Systems (New York: John Wiley & Sons, Inc., 1967), pp. 29, 30.

Indexing Languages with Syntax

Webster's Third New International Dictionary, 1967 edition, defines syntax in three ways:

1: connected system or order : orderly arrangement : harmonious adjustment of parts or elements 2a: sentence structure : the arrangement of word forms to show their mutual relations in the sentence b: the part of grammar that treats of the expression of predicative, qualifying, and other word relations according to established usage in the language under study--compare MORPHOLOGY 3a: SYNTACTICS b: the area of syntactics dealing specifically with the formal properties of languages or calculi--called also logical syntax.
An examination of these definitions might suggest that indexing languages having syntax would be able to show word relations, or the roles of specific words, through the arrangement of the language (perhaps by where a specific word appears in a string of words) or through modifying symbols which are linked to specific terms.

Meadow describes languages which use tagged descriptors: ". . . a descriptor has affixed to it another descriptor to describe the first. The role of the affixed might be to classify the basic descriptor, denoting it as a proper name or attribute, or an activity."83

81F. Wilfrid Lancaster, Information Retrieval Systems (New York: John Wiley & Sons, Inc., 1968), pp. 21, 30.

82Charles T. Meadow, The Analysis of Information Systems (New York: John Wiley & Sons, Inc., 1967), pp. 33, 34.

83Ibid., p. 33.

Vickery speaks of faceted classification schemes where the role of each facet might be identified by what he calls "facet indicators (somewhat comparable to role indicators)."84

A second type of syntactic mechanism which can be used in indexing languages is an indexing string of terms where the position of a given term in the string indicates the role that it plays. For example, in an inventory system, successive terms in a record might play the following roles: (1) item name, (2) style, (3) color, (4) quantity on hand, (5) unit price, and (6) total value. A system of this type might be designed in such a way that any term in the string could be used to arrange the total file in a specified alphabetic or numeric order. In this illustration each of the terms in the descriptive record would be called a facet.85

Another manner in which roles or facets may be defined is by indicating the portion of a document from which a term was extracted.
This procedure is possible in the BIRS system by using field names; for example, a search may specify that a person is looking for the term "Brown" in the author field and that the term has no meaning if it occurs in the descriptor field. In this case the searcher wants an author by the name of Brown, and finding the color "brown" in a portion of the text, the title, or the descriptor field would not be of value.86

84B. C. Vickery, Faceted Classification Schemes (Vol. V of Systems for the Intellectual Organization of Information, ed. Susan Artandi; New Brunswick, N. J.: Rutgers University Press, 1966), p. 58.

85Charles T. Meadow, The Analysis of Information Systems (New York: John Wiley & Sons, Inc., 1967), pp. 33, 34.

86John F. Vinsonhaler and John M. Hafterson (editors), Technical Manual for Basic Indexing and Retrieval System, BIRS 2.5, Appendix I (East Lansing: Educational Publications Services, College of Education, Michigan State University, January, 1969), pp. 2601-2641.

Another use of syntax also available in the BIRS system is the extraction of phrases from the text of a document or document surrogate. In this procedure all words are extracted except those contained in an exclusion list, thus maintaining their positional (modification) relationships with the other words in the sentence. In this way it is possible to search for terms with the syntax established by natural language through the co-occurrence of terms.87

In general, indexing languages involving syntax might be classified as (1) languages where the syntax occurs because of rules which are part of the indexing scheme and (2) languages where the syntax is the result of the grammatical structure of the text from which the terms are extracted.

Methods of Machine Indexing

Indexing languages may be used by machines, human beings, or machine-human combinations in specifying the terms which describe what a document is "about."
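Before turning to specific machine methods, the field-name device described above, in which the same word matches or fails depending on the field searched, can be sketched as follows. The record layout and the sample records are invented illustrations, not the actual BIRS description-file format.

```python
# Sketch of field-restricted searching: the word "brown" retrieves
# different documents depending on which field is searched. Records and
# layout are invented, not the BIRS file format.

records = [
    {"author": "Brown, J.", "title": "Color perception in infants",
     "descriptors": ["color", "perception"]},
    {"author": "Smith, A.", "title": "Brown bears of Alaska",
     "descriptors": ["brown", "bears"]},
]

def search(field, term):
    term = term.lower()
    hits = []
    for number, record in enumerate(records):
        value = record[field]
        text = " ".join(value) if isinstance(value, list) else value
        if term in text.lower():
            hits.append(number)
    return hits

print(search("author", "brown"))       # [0] -- the author named Brown
print(search("descriptors", "brown"))  # [1] -- the document indexed under "brown"
```

Restricting the match to a named field thus supplies a crude role indicator without any change to the terms themselves.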
The variety of procedures used in machine indexing is illustrated in a state-of-the-art report by Stevens which includes a 662-item bibliography.88 In a discussion of automated indexing Borko describes four types of procedures: (1) statistical indexing, (2) permutation indexing, (3) citation indexing, and (4) association indexing.89 The discussion which follows includes these four categories and an added category for procedures which extract and assign terms by algorithms which may use neither syntactical nor statistical analysis.

87Ibid., pp. 2401-2419.

88M. E. Stevens, Automatic Indexing: A State of the Art Report, NBS Monograph 91 (Washington: National Bureau of Standards, March 1965).

89Harold Borko, Automated Language Processing (New York: John Wiley and Sons, Inc., 1967), pp. 100-114.

Statistical Indexing

Luhn suggests a statistical procedure for determining the significance of words as it relates to their frequency of use. In his discussion he notes that words which occur very frequently usually have little descriptive significance and are often conjunctions, articles, or prepositions. It is also noted that words which occur very infrequently may have little significance because they are not commonly used, are misspellings, or are infrequently used synonyms for terminology more commonly found in the literature.90

90H. P. Luhn, "The Automatic Creation of Literature Abstracts," IBM Journal of Research and Development, II (1958), 159-165.

Meadow, in discussing Luhn's model, hypothesizes that "words are significant as subject descriptors, then, in proportion to the difference between their actual and expected frequency."91 The Descriptive Analysis Program of the 2.0 version of BIRS provides one of the better examples of how this hypothesis can be applied in the design of computer programs.92

91Charles T. Meadow, The Analysis of Information Systems (New York: John Wiley & Sons, Inc., 1967), p. 100.

92John F. Vinsonhaler (editor), Technical Manual for Basic Indexing and Retrieval System, BIRS 2.0 (East Lansing: Educational Publications Services, College of Education, Michigan State University, 1968), pp. 1201-1221.

Simmons et al. describe a system which is divided into two portions: a program called Indexer, which is used for indexing the full text, and a program called Protosyntax I, which is used in searching. In the example used to illustrate this system the total text of the Golden Book Encyclopedia was indexed. The procedures used by Indexer to process the text extracted all words not included in a list of approximately 300 articles, prepositions, conjunctions, and other non-content words. The frequency of occurrence of each content word was determined and space allowed to assign sufficient VAPS numbers to identify the location of each occurrence of the word in the text. (VAPS numbers indicate the Volume, Article, Paragraph, and Sentence where a term occurred.) The search program used the word index with its VAPS numbers to identify areas of the text that were likely to have a relationship to specific questions stated in English. Finally, the content and syntax of these portions of text were compared with the question to determine what part of the text should be identified as potentially useful.93

93Robert F. Simmons and Keren McConlogue, "Maximum Depth Indexing for Computer Retrieval of English Language Data," American Documentation, January, 1963, pp. 68-73; and Robert F. Simmons, Sheldon Klein, and Keren McConlogue, "Indexing and Dependency Logic for Answering English Questions," American Documentation, July, 1964, pp. 196-204.

The search program was included in the above discussion of indexing techniques because a major portion of the content analysis was done by this program. Because the search program uses syntactical techniques, the inclusion of this example under the heading of statistical indexing may not be entirely satisfactory, but was done on the basis of Borko's example94 and the fact that the program producing the index used only statistical and word-extraction techniques.

94Harold Borko, Automated Language Processing (New York: John Wiley and Sons, Inc., 1967), pp. 100-104.

Non-Syntactical, Non-Statistical Term Specification

There are methods of computerized indexing that would be difficult to place in either a statistical or syntactical category. These methods are generally based upon algorithms which use an authority list for extracting terms and, in some cases, for substituting appropriate synonyms.

One such method, described by Moon and Vinsonhaler, is based upon the assumption that the terms found in the titles of scientific articles have high descriptive value. The first step in the procedure was to generate an authority list of all descriptive terms found in the titles of the document file. A descriptive term was defined to be any term that did not appear on an exclusion list which contained articles, conjunctions, prepositions, and other words that were considered to be of little content value. The second step was to index the document surrogates by extracting from desired portions of the text all terms which were both in the authority list and in specified portions of text. In the example cited, the portions of text used were the title and abstract of articles.95

95R. D. Moon and John F. Vinsonhaler, "The Title-Generated Thesaurus: A Practical Method for Automated Indexing," in Shultz, L. (ed.), Proceedings of the Sixth Annual National Colloquium on Information Retrieval--The Information Bazaar (Philadelphia: The Medical Documentation Service of the College of Physicians, 1969).

Artandi describes a system used with medical articles that extracts all terms having characteristics which are considered to be unique to terms having descriptive content. The characteristics defined in the medical project were ". . . length of the character strings (organic compounds have long names); an alternating string of numbers, letters, and dashes; the presence of Greek letters in the strings and the presence as part of the name of such words as ethyl, methyl, propyl, etc."96 She further states that the indexing algorithm is to satisfy the following requirements:

. . . to recognize information in the text that should be indexed, to switch from a variety of text words to a controlled vocabulary, to create a standardized index record, to compute and assign weights to the index terms automatically, to create valid links between index terms, and to provide for expandability.97

96Susan Artandi, "Computer Indexing of Medical Articles--Project MEDICO," Journal of Documentation, September, 1969, p. 218.

97Ibid., pp. 214-223.

The two indexing schemes cited are typical of procedures that do not rely on statistical frequency or syntax to control term specification, but instead identify terms with high content value by their location in the text (for example, terms in titles, subheads, abstracts, conclusions, summaries, etc.), by their characteristics, by comparing them with an authority list, or through a combination of these.

While the procedures described by Artandi do not use statistical or syntactical analysis of the text to extract terms, techniques related to co-occurrence were applied to the terms extracted. An automatic algorithm generated links based on the assumption that "co-occurrence within a sentence is a satisfactory indication that the terms belong together within the context of the document."98

98Susan Artandi and Edward H. Wolf, "The Effectiveness of Automatically Generated Weights and Links in Mechanical Indexing," American Documentation, July, 1969, pp. 198-202.

In the evaluation of this technique it was found that the closer the terms occurred together within a sentence, the greater the probability that the terms actually should be linked together. The data indicated that the average number of terms between links that were judged relevant was 3.71 words, while the average number of terms between links that were judged irrelevant was 7.08 words.

Procedures also automatically assigned weights, based on frequency of occurrence, to words which possessed the characteristics described in the extraction criteria. (It should be noted that no frequency or statistical measures were used in specifying which terms could be used in indexing.) The evaluation indicated that weights assigned to terms by manual indexers were in agreement 71% of the time with those assigned by automatic procedures, and that 72% of the links automatically assigned on the basis of full-text scanning were considered relevant.99

99Ibid., p. 202.

The above system illustrates the difficulty of attempting to classify automated indexing methods; for while the procedures for term specification did not use statistical or syntactical methods, additional procedures which weighted terms by frequency and/or linked terms by co-occurrence within a sentence were included in the total indexing scheme.

Word Association, Co-Occurrence, Links and Roles

One weakness in some systems which extract key words from text is that they fail to maintain links between these words and the context from which they were extracted. When searching is done on these systems, irrelevant documents are sometimes retrieved because the words used by the searcher do not play the roles expected. For example, a searcher might formulate a question that reads "train and coach" with the intention of retrieving information about "train coaches," but instead receives information on how to train football coaches. If the words had been extracted in a manner such that "train" and "coach" were mechanically linked or co-occurred in the phrase "train coach," this difficulty could have been avoided. The problem described relates to precision, i.e., retrieving documents that are not relevant to the question, and it is sometimes solved by mechanically linking words together, by extracting phrases or co-occurring words, or by attaching tags to words which indicate their roles.

Doyle suggests a procedure for using statistics of word co-occurrence in the analysis of documents. He also suggests means by which association maps can be developed for frequently co-occurring words and presents two methods for using these maps in literature searching. As used by Doyle, co-occurring words are words which appear together (co-occur) in the text, and association maps are maps which graphically represent relationships between words, developed through statistical analysis of co-occurrence. An analysis of text reveals that some words co-occur with many different separate words, whereas other words may co-occur with only a few. This observation suggests that special significance may be placed upon words co-occurring with many other words. This is graphically displayed in Doyle's association maps.100

100Lauren B. Doyle, "Indexing and Abstracting by Association," American Documentation, October, 1962, pp. 378-390.

Borko indicates that computer programs exist which can analyze the co-occurrence of words and automatically draw association maps.101 Dale and Dale also describe a retrieval model using association of words, or "clumping," which has been programmed for a digital computer. Based on the results of experiments on a small document set, they suggest that the technique shows promise for larger document collections.102

101Harold Borko, Automated Language Processing (New York: John Wiley & Sons, Inc., 1967), pp. 112-114.

102A. G. Dale and N. Dale, "Some Clumping Experiments for Associative Document Retrieval," American Documentation, January, 1965, pp. 5-9.

Baxendale describes linguistic experiments at IBM that use syntactical procedures to identify and extract, from selected portions of the document--such as the title, diagram captions, paragraph headings, and sentences of abstracts--words associated together as noun phrases.
The sentence, "Since the ____ was ____ by the ____, all ____ must be ____," was used to illustrate how the syntax can predict where noun phrases might appear.103

103P. B. Baxendale, "Autoindexing and Indexing by Automatic Processes," Special Libraries, December, 1965, p. 718.

The Descriptive Analysis Program of BIRS has a number of options available to the user, including the ability to extract phrases in a way which obtains a result that has some similarity to the result described by Baxendale. By using an option to extract from selected portions of documents all words from sentences except those appearing on an exclusion list, it is possible to develop word strings to serve as descriptors.

To illustrate how this option may be used, Figure 2.4 uses the sentence "The extraction of words and phrases from selected portions of text is very important to automated indexing," with an exclusion list applied. In Figure 2.4 the sentence is first shown with the words which would be extracted; then the sentence is shown with these words removed, to illustrate the similarity to the approach described by Baxendale; and finally the extracted terms are shown as they would be recorded on a BIRS description file.

The extraction of words and phrases from selected portions of text is very important to automated indexing.

The ____ of ____ and ____ from ____ of ____ is very ____ to ____.

$extraction, words, phrases, selected portions, text, important, automated indexing$

FIGURE 2.4 ILLUSTRATION OF BIRS WORD EXTRACTION TECHNIQUES

The BIRS Description File Search Program allows phrases of up to ten words to be matched with these strings; however, the match must occur between $'s--in this case within a given sentence.
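The exclusion-list extraction illustrated in Figure 2.4 can be sketched as follows. The exclusion list shown is assumed for this example and is not the actual BIRS list; consecutive surviving words are kept together as phrases, so positional relationships such as "selected portions" are preserved.

```python
# Sketch of exclusion-list phrase extraction in the manner of Figure 2.4:
# every word is kept except those on the exclusion list, and consecutive
# surviving words remain joined as a phrase. The $-delimited output format
# follows the figure; the exclusion list itself is an assumption.

EXCLUDE = {"the", "of", "and", "from", "is", "very", "to"}

def extract(sentence):
    phrases, current = [], []
    for word in sentence.split():
        word = word.strip(".,").lower()
        if word in EXCLUDE:
            if current:                      # an excluded word ends the phrase
                phrases.append(" ".join(current))
                current = []
        else:
            current.append(word)             # surviving words stay joined
    if current:
        phrases.append(" ".join(current))
    return "$" + ", ".join(phrases) + "$"

print(extract("The extraction of words and phrases from selected portions "
              "of text is very important to automated indexing."))
# -> $extraction, words, phrases, selected portions, text, important, automated indexing$
```

Because the excluded words break the sentence into separate terms, a later phrase search can only match words that actually stood together in the original text.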
If it is desirable to reduce the length of the string within which a match can occur, it is possible for the Descriptive Analysis Program to insert additional $'s at punctuation marks such as commas, colons, and semicolons.

The procedures described in this section show how both statistical and syntactical methods can be used to extract words in ways that will help to maintain some of their original relationships. In the methods suggested by Doyle these procedures were used to develop association maps involving only word pairs. In the methods described by Baxendale and in those used by the BIRS Descriptive Analysis Program it is possible to have words associated together in groups larger than pairs. The procedures suggested by Doyle and those described by Dale and Dale used statistical techniques; the procedures described by Baxendale used a linguistic approach which considered the syntax of the language; and the procedures of the Descriptive Analysis Program used an exclusion list which could be developed to consider language syntax. In these methods words are linked by occurring together, with their grammatical roles designated by their context.

Document Association and Citation Indexing

Another type of association indexing relates to the clumping or grouping of documents according to similarity of content. Jones and Needham report on a computerized program which applies the automated classification techniques associated with the "theory of clumps" to document descriptions obtained from the ASLIB Cranfield Project. This particular program examines co-occurrence of terms based upon their appearing in different document descriptions rather than in the text of the documents. This results in a clustering of related documents rather than a clustering of words.
The authors indicate that evaluation of the programs and procedures is still being carried out, and they consequently draw no definite conclusions on the value of these procedures.104

104K. Sparck Jones and R. M. Needham, "Automatic Term Classifications and Retrieval," Information Storage and Retrieval, June, 1968, pp. 29-31.

A similar technique is reported by Perry which is based on "inclusion relationships existing between sets of features assigned to the document." This technique used sets of features to identify documents that have been indexed "by all or by only some of the total features of the sets." Perry calls the resulting index a "combined group co-ordinate index," and he indicates that where it was used on one set of holdings there were demonstrated advantages.105

The above two techniques utilized indexing terms assigned to specific documents to generate the clumps of similar documents. Another technique, citation indexing, employs the references cited by an article to generate clumps of similar documents or to do computer searching. In most reviews of literature a person will use citations only in a historical sense; i.e., he may find an article which is very relevant to the information he is seeking and examine the references cited in this article to obtain other articles. When this procedure is followed, all of the articles obtained from the references will be chronologically older than the original reference.

It is often desirable to move chronologically forward in the literature by examining all articles which have cited a pertinent article. This is possible with computer programs and involves a procedure where an article is described by the articles it cites; i.e., the descriptive terms assigned to the article are citations to other articles.
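The inversion that makes this forward movement possible can be sketched simply: record, for each article, the list of articles it cites, then invert the mapping so that a cited article retrieves its citers. The article names below are invented placeholders.

```python
# Sketch of citation indexing: each record lists the articles it cites;
# inverting that mapping lets the name of a cited article serve as a
# search key retrieving every later article that cites it. The article
# names are invented placeholders.

from collections import defaultdict

cites = {
    "Doe 1968": ["Luhn 1958", "Doyle 1962"],
    "Roe 1970": ["Luhn 1958", "Doe 1968"],
}

cited_by = defaultdict(list)
for article, references in cites.items():
    for reference in references:
        cited_by[reference].append(article)   # invert: cited -> citing

print(sorted(cited_by["Luhn 1958"]))   # every article in the file citing Luhn 1958
# -> ['Doe 1968', 'Roe 1970']
```

Searching the inverted table moves forward in time, since every retrieved article is necessarily newer than the one it cites.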
With this type of indexing the name of a specific article may serve as a computer search question to retrieve all articles which have cited that article.106 Price and Schiminovich describe a study where a computer program was used to do bibliographic coupling (citation indexing), and they indicate that the clustering process evaluated appeared adaptable to future use in developing a computer-generated classification scheme.107

105Peter Perry, "Combined Grouping for Coordinate Indexes," American Documentation, April, 1968, p. 142.

106Harold Borko, Automated Language Processing (New York: John Wiley and Sons, Inc., 1967), pp. 108-112; and Charles T. Meadow, The Analysis of Information Systems (New York: John Wiley & Sons, Inc., 1967), pp. 86, 119, 120.

Permuted Indexing

The term "permuted indexing" is commonly used interchangeably with "Key-Word-in-Context" or KWIC indexing. In 1960 Luhn suggested that an index allowing the user to see key words in their context would be of value for disseminating new information. He also described procedures whereby terms might be extracted from machine-readable documents or document surrogates and a KWIC index generated automatically by computers.108

The term "permuted index" is descriptive of the format generated by many computer programs which do this type of indexing. Figure 2.5-A provides an illustration of how an article titled "The KWIC Index Concept" would be permuted, and Figure 2.5-B then shows how it would be alphabetized and formatted. If the title "Indexing Consistency and Quality" were also included as part of the index, the merged output generated by the two phrases would appear as illustrated in Figure 2.5-C. Also included, but not shown in Figure 2.5-C, would be information identifying the document from which the titles were extracted.
107Nancy Price and Samuel Schiminovich, "A Clustering Experiment: First Step Towards a Computer-Generated Classification Scheme," Information Storage and Retrieval, August, 1968, pp. 271-280.

108H. P. Luhn, "Keyword-In-Context Index for Technical Literature," American Documentation, XI (1960), 288-295.

109Ibid., p. 271.

FIGURE 2.5
EXAMPLES OF PERMUTED OR KEY-WORD-IN-CONTEXT (KWIC) INDEXES
[Panels A through C show the titles "THE KWIC INDEX CONCEPT" and "INDEXING CONSISTENCY AND QUALITY" permuted on each key word and then merged; Panel D shows the alphabetized and formatted result:]

D
CONCEPT       1   THE KWIC INDEX CONCEPT
CONSISTENCY   2   INDEXING CONSISTENCY AND QUALITY
INDEX         1   THE KWIC INDEX CONCEPT
INDEXING      2   INDEXING CONSISTENCY AND QUALITY
KWIC          1   THE KWIC INDEX CONCEPT
QUALITY       2   INDEXING CONSISTENCY AND QUALITY

Luhn suggested an 11-character code which contained information concerning the name of the author or senior author, the year of publication, and the title of the document.109 If the documents in a KWIC index were part of an information file, as in the BIRS system, this code could be replaced by an access number. The output illustrated in Figure 2.5-D is similar to the output the BIRS system would generate for a KWIC index if the access numbers 1 and 2 had been assigned to the two titles. When this type of index is printed in the format of Figure 2.5-D rather than the permuted format of 2.5-C, some call it a Key-Word-out-of-Context (KWOC) index.110 This does not seem entirely consistent with seeing the word displayed in its context. The term KWOC is used in the BIRS system to denote an index where the word or phrase appears with access numbers but without any context.111 The KWIC index has the advantage of allowing the user to see the syntactical structure of the language around the key word.
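The alphabetized index of Figure 2.5-D can be sketched in a few lines of code: each title is rotated so that every key word becomes a sort key, stop words are excluded, and each entry carries the access number of its source title. This is a minimal illustrative sketch, not the BIRS implementation; the stop-word list and access numbers are assumptions for the example.

```python
# Minimal KWIC-style index in the spirit of Figure 2.5: every key word in
# a title becomes an alphabetized entry pointing back, via an access
# number, at the full title in which it appears.
STOP_WORDS = {"THE", "AND", "A", "AN", "OF", "IN", "FOR"}

titles = {
    1: "THE KWIC INDEX CONCEPT",
    2: "INDEXING CONSISTENCY AND QUALITY",
}

def kwic_index(titles, stop_words=STOP_WORDS):
    """Return (keyword, access_number, title) entries sorted by keyword."""
    entries = []
    for access, title in titles.items():
        for word in title.split():
            if word not in stop_words:   # Borko-style exclusion of function words
                entries.append((word, access, title))
    return sorted(entries)

for keyword, access, title in kwic_index(titles):
    print(f"{keyword:<12} {access}  {title}")
```

Run on the two titles above, this yields the six entries of Figure 2.5-D in the same order (CONCEPT, CONSISTENCY, INDEX, INDEXING, KWIC, QUALITY). The KWIC display itself keeps each key word embedded in its full title line, so the searcher sees the word in context.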
It, however, has some obvious limitations; for example, if one were to apply this type of indexing to the total document, each line of the document might generate four to ten lines of output. Borko suggests that, to cope with this problem, titles or some other small portion of the text rich in descriptive words might be the only part of the document indexed. To further reduce the amount of output he suggests excluding from the terms to be indexed classes of words such as conjunctions, prepositions, and articles.112

110Marguerite Fischer, "The KWIC Index Concept: A Retrospective View," American Documentation, April, 1966, pp. 63, 64.

111John F. Vinsonhaler and John M. Hafterson (editors), Technical Manual for Basic Indexing and Retrieval System, BIRS 2.5, Appendix I (East Lansing: Educational Publications Services, College of Education, Michigan State University, January, 1969), pp. 2201-2219.

112Harold Borko, Automated Language Processing (New York: John Wiley and Sons, Inc., 1967), p. 104.

Trend in Machine Indexing

In 1967 Sharp in the second Annual Review of Information Science and Technology noted that:

. . . ten years later we seem to have reached a period of disenchantment, not with machine methods generally, but with the idea that it is going to be easy. The particular point we seem to have reached is the realization that statistical techniques for textual analysis are inadequate.113

Borko and Wyllys, writing in the book Automated Language Processing, are very candid about the problems that exist in using computers to extract descriptive terms from documents or in doing automated abstracting. Borko remarks, "Thus we see that automated classification clearly supplements but does not replace manual systems of classification."
He further indicates that studies of automated indexing and classification are the results of "a real need" to improve the storage and retrieval of information and that while progress has been made much more needs to be done.114 In summarizing what has been done related to automated abstracting, Wyllys comments, "If it seems that relatively little has been accomplished in the field, it should be realized that very few people have concerned themselves with automated abstracting."115 While both Borko and Wyllys present interesting possibilities and point to a number of promising studies, neither indicates that there will be any dramatic solutions in the near future. Both present a picture which implies that solutions will come only through hard work.116

113John R. Sharp, "Content Analysis, Specification, and Control," Annual Review of Information Science and Technology (New York: John Wiley & Sons, Inc., 1967), II, 88.

114Harold Borko, "Indexing and Classification," Automated Language Processing (New York: John Wiley & Sons, Inc., 1967), pp. 122, 123.

115Ronald E. Wyllys, "Extracting and Abstracting by Computer," Automated Language Processing, Harold Borko, editor (New York: John Wiley and Sons, Inc., 1967), p. 160.

Salton takes a more positive view toward automated indexing than some. Observing a study which compared the National Library of Medicine's MEDLARS system, which uses human indexers, with the fully automatic SMART system, he states:

Fully automatic text analysis and search systems do not appear to produce a retrieval performance that is inferior to that obtained by conventional systems using manual document indexing and manual search formulations. While the manual indexing and search formulations can lead to exceptionally fine results when the indexer and/or searcher are completely aware of the relationships between the stored collection and the user needs, the search results are also very poor when the conditions are not met.
The automatic process on the other hand, with its exhaustive input data and complex analysis methods, performs very poorly only rarely, and may often produce completely satisfactory retrieval action.117

It should be noted that the SMART system is a laboratory system which utilizes "a variety of intellectual aids in the form of synonym dictionaries, hierarchical arrangement of subject identifiers, statistical and syntactical phrase generation methods and the like, in order to obtain content identification useful for the retrieval process,"118 whereas MEDLARS is an operating system with a data base of over 500,000 documents.119

A significant change in the use of automated indexing may be brought about through recent developments in hardware and software which permit more efficient use of on-line systems. Speaking to this point, Lancaster and Gillespie state:

There appears to be a very dramatic re-awakening of interest (somewhat quiescent for a few years, with a notable exception of Salton's work) in the design of systems incorporating automated indexing, automated classification, or automated search elaboration. Undoubtedly this renaissance has been at least partially prompted by the availability of on-line processing capabilities.120

One of the major problems in analyzing large amounts of data has been the cost of putting this data into the computer.

116Ibid., pp. 127-179; and Harold Borko, "Indexing and Classification," Automated Language Processing (New York: John Wiley & Sons, Inc., 1967), pp. 99-125.

117Gerard Salton, "A Comparison Between Manual and Automatic Indexing Methods," American Documentation, January, 1969, p. 70.

118G. Salton, E. M. Keen, and M. Lesk, "Design Experiments in Automatic Information Retrieval," The Growth of Knowledge, Manfred Kochen, editor (New York: John Wiley & Sons, Inc., 1967), p. 337.

119Gerard Salton, "A Comparison Between Manual and Automatic Indexing Methods," American Documentation, January, 1969, p. 70.
Commenting on this problem Taulbee states:

Naturally all automatic indexing procedures depend upon the existence of some representation of the document in machine-readable form, but this should not be a particular difficulty as improved page readers become available and as more and more publications produce machine-readable copy as a by-product of the printing process. It is believed that automatic indexing will be more economical and less time consuming.121

In attempting to cope with the large amounts of data and the complexities of content analysis, some large systems are utilizing indexers and abstractors to develop document descriptions which can be used for computer searching. Notable examples of such systems are the MEDLARS122 and the ERIC123 systems. In these systems the indexing terms and/or abstracts are generated by human judgments with the help of various indexing or abstracting guidelines.

The problem which is commonly discussed concerning systems that employ human judgment is the effect of indexer inconsistency upon operating characteristics.

120F. Wilfrid Lancaster and Constantine J. Gillespie, "Design and Evaluation of Information Systems," Annual Review of Information Science and Technology, Carlos A. Cuadra, editor (Chicago: Encyclopedia Britannica, Inc., 1970), V, 39.

121Orrin E. Taulbee, "Content Analysis, Specification, and Control," Annual Review of Information Science and Technology, Carlos A. Cuadra, editor (Chicago: William Benton, 1968), III, 120.

122F. Wilfrid Lancaster, "MEDLARS: Report on the Evaluation of its Operating Efficiency," American Documentation, April, 1969, pp. 119-142.

123Lee G. Burchinal, "The Educational Resources Information Center: An Emergent National System," Journal of Educational Data Processing, April, 1970, pp. 55-67.
Lancaster, reporting on the Cranfield Project, indicates that sixty per cent of failures to retrieve source documents could be traced to inconsistencies of indexing.124 Zunde and Dexter cite various studies on indexing inconsistency where the percentage of common terms assigned by two indexers ranges from ten to eighty per cent of the total number of unique terms assigned by the indexers.125 Many of the studies which examine indexer inconsistency do so by comparing results of two indexers, rather than examining the effect of the indexing upon precision and recall as done by the Cranfield Project.126

Cooper emphasizes the need to evaluate indexing. He is not concerned about how well human indexers agree with each other or with automated indexing, but rather about the effectiveness of a particular system in retrieving documents. He states, "The crucial question is therefore: what is the relationship, if any, between interindexer consistency and the level of retrieval performance achieved when the indexing is done by that method?"127

124F. Wilfrid Lancaster and J. Mills, "Testing Indexes and Index Language Devices: The ASLIB Cranfield Project," American Documentation, January, 1964, p. 7.

125Pranas Zunde and Margaret E. Dexter, "Indexing Consistency and Quality," American Documentation, July, 1969, p. 60.

126F. Wilfrid Lancaster and J. Mills, "Testing Indexes and Index Language Devices: The ASLIB Cranfield Project," American Documentation, January, 1964, p. 7.

127William S. Cooper, "Is Interindexer Consistency a Hobgoblin?," American Documentation, July, 1969, p. 268.

The references cited in this section have been chosen because they appear to be representative of many possible citations relating to trends in automated indexing. These as well as other references appear to support that:

1. Progress in automated content analysis has not come as easily or as quickly as some had originally predicted.

2. There has been progress and there is continuing progress in automated content analysis.
3. Developments in hardware and software which have made interactive on-line systems more realistic and efficient have created a new interest in automated content analysis.

4. Some feel the continuing development and use of optical scanners and computerized typesetting will reduce the cost of putting large amounts of data into the computer for analysis.

5. Because the problem of inputting large amounts of data has not been totally solved, some large systems such as ERIC and MEDLARS are using human-machine partnerships where the indexing and abstracting necessary for computer searches is done by people.

6. Indexing methods should primarily be evaluated on how effectively the total system can retrieve documents.

Evaluation of Indexing Methods

In this section the factors involved in the evaluation of information systems are identified, and those procedures specifically related to evaluation of indexing methods are examined in greater detail. The procedures and descriptive statistics described in this section are similarly discussed in a variety of sources. Closely related to this study is work done by Cleverdon as part of the Cranfield Project,128 by Lancaster in the evaluation of MEDLARS,129 which built upon the Cranfield Project, and by Salton on the totally automated information retrieval system called SMART.130 In many instances it would be possible to give multiple references for some of the statements which will be made; however, usually only a single reference will be given from sources related to one of these projects.

Salton identifies the following factors as being among those which are most important to information systems evaluation:

User population  Type of user, rate of requests, etc.

Collection  Coverage of collection, type of document available at input, reliability of abstracts, etc.

Indexing  Type of indexers, level and accuracy of indexers, depth of indexing required, complexity of indexing languages, etc.

*128C. W.
Cleverdon, F. W. Lancaster, and J. Mills, "Uncovering Some Facts of Life in Information Retrieval," Special Libraries, February, 1964, pp. 86-91; and Cyril Cleverdon, Jack Mills and Michael Keen, Factors Determining the Performance of Indexing Systems, Vol. 1: Design, Part 1: Text, and Part 2: Appendices (ASLIB Cranfield Research Project, Cranfield, Bedford, England, 1966), pp. 1-120 and 121-377.

*129F. Wilfrid Lancaster, "Evaluating the Performance of a Large Operating Retrieval System," Electronic Handling of Information, Allen Kent, Orrin E. Taulbee, Jack Belzer, and Gordon D. Goldstein, editors (Washington, D.C.: Thompson Book Company, 1967), pp. 199-216; and F. Wilfrid Lancaster, Information Retrieval Systems (New York: John Wiley & Sons, Inc., 1968), pp. 1-217.

*130Gerard Salton, Automatic Information Organization and Retrieval (New York: McGraw-Hill Book Company, 1968), p. 6; Gerard Salton, "The Evaluation of Automatic Retrieval Procedures--Selected Test Results Using the SMART System," American Documentation, July, 1965, pp. 209-222; and G. Salton, E. M. Keen, and M. Lesk, "Design Experiments in Automatic Information Retrieval," The Growth of Knowledge, Manfred Kochen, editor (New York: John Wiley and Sons, Inc., 1967), pp. 336-351.

*See bibliography for additional material by each of these authors related to their projects.

Analysis and search  Type of searching, power and complexity of search mechanism, search effort required, accuracy of search, etc.

Equipment and input-output  Type of store, type of input-output equipment, type or form of output

Operating efficiency  Cost considerations, service problems, time lag, and response time131

In choosing the factors that are most critical, Salton contends that the overriding consideration should be those factors that lead to user satisfaction, and that all other criteria are secondary or relate to this.
Thus, criteria relating to the management of a system would be considered only in "relation to their effect on the user criteria."132 The following criteria are similar to those identified by both Lancaster133 and Cleverdon134 as important to user satisfaction:

1. The relevance of the material contained in the information system to the user's overall needs.

2. The effort required by the user to request and obtain information from the system.

3. The average time that elapses between the time that a request is made and information is provided to the user.

4. The proportion of the total relevant material in the system which is retrieved in response to the user request (recall).

5. The proportion of the total material retrieved in response to a user's request which is relevant to that request (precision ratio).

6. The form in which information retrieved by search requests is presented to the users.

In the evaluation of indexing methods the factors which would be of greatest importance would be those relating to a system's ability to retrieve documents in response to search questions. Of the factors discussed by Cleverdon and Lancaster, recall and precision ratios are the two criteria that relate most directly to indexing evaluation.

131Gerard Salton, Automatic Information Organization and Retrieval (New York: McGraw-Hill Book Company, 1968), p. 282.

132Ibid.

133F. Wilfrid Lancaster, Information Retrieval Systems (New York: John Wiley and Sons, Inc., 1968), pp. 33, 34.

134C. W. Cleverdon, Identification of Criteria for Evaluation of Operational Information Retrieval Systems, Cranfield College of Aeronautics, England, November, 1964.
Lancaster and Gillespie explain:

Most investigators, from Cleverdon on, have expressed evaluation results in terms of the twin variables of recall ratio (the number of relevant documents retrieved over the total number of relevant documents in the collection) and precision ratio (the number of relevant documents retrieved over the total number of documents retrieved), although both ratios are sometimes given other names in the literature. For most practical purposes, these ratios are perfectly adequate parameters for expressing the results of a search.135

135F. Wilfrid Lancaster and Constantine J. Gillespie, "Design and Evaluation of Information Systems," Annual Review of Information Science and Technology, Carlos A. Cuadra, editor (Chicago: William Benton, 1970), V, 45.

Descriptive Statistics for Document Retrieval

When indexing is evaluated on the ability of search questions to retrieve documents, the descriptive statistics used involve four variables. Listed below are these variables, which result from the partitioning of a document collection by a search.

a. Documents that are not retrieved and not relevant.
b. Documents that are not retrieved but relevant.
c. Documents that are retrieved and relevant.
d. Documents that are retrieved but not relevant.136

The partitioning illustrated by Figure 2.6 and the variables designated by a, b, c, and d will be the basis for the discussion of the statistics used to describe document retrieval.

Recall and Precision Ratios for a Single Search Question. As related to Figure 2.6, the recall and precision ratios137 for a single search question are defined as:

Recall = (number of documents retrieved and relevant) / (total relevant in collection) = c / (b + c)    (Formula 2-1)

Precision = (number of documents retrieved and relevant) / (total retrieved) = c / (c + d)    (Formula 2-2)

Averages of Recall and Precision. In evaluating indexing methods it is desirable to use the results of a number of searches to calculate an average value of precision and recall.
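Formulas 2-1 and 2-2 can be expressed directly in code. The sketch below uses the b, c, d partition counts of Figure 2.6; the specific counts in the example are illustrative only, not taken from any study cited in this chapter.

```python
# Recall and precision for a single search, using the partition of
# Figure 2.6: b = relevant but not retrieved, c = retrieved and relevant,
# d = retrieved but not relevant.
def recall(b, c):
    """Formula 2-1: c / (b + c), the fraction of relevant documents retrieved."""
    return c / (b + c)

def precision(c, d):
    """Formula 2-2: c / (c + d), the fraction of retrieved documents relevant."""
    return c / (c + d)

# An illustrative search that retrieves 10 documents, 7 of them relevant,
# from a collection holding 10 relevant documents in all:
b, c, d = 3, 7, 3
print(recall(b, c))     # 0.7
print(precision(c, d))  # 0.7
```

Note that the variable a (not retrieved, not relevant) appears in neither ratio; both statistics ignore the correctly rejected documents.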
In determining these averages one possible procedure gives equal weight to each question in a set of searches, while a second gives equal weight to each document. In the following formulas b, c, and d are used as illustrated in Figure 2.6; m stands for the total number of searches, and i is an index number with different integer values from 1 to m to indicate results for different searches. For example, if i equaled 2 then c2 would stand for the number of relevant documents retrieved by the second search.

The method which gives equal weight to each search takes a mean of the recall ratios and precision ratios for each question, with the averages referred to as Average Macrorecall and Average Macroprecision. Average Macrorecall and Average Macroprecision are defined as follows:138

Average Macrorecall = (1/m) SUM(i=1 to m) [ci / (bi + ci)]    (Formula 2-3)

Average Macroprecision = (1/m) SUM(i=1 to m) [ci / (ci + di)]    (Formula 2-4)

The second procedure places equal weight upon the documents retrieved by treating the results of multiple questions in exactly the same manner as if they had resulted from one large search question. The resulting averages are called Average Microrecall and Average Microprecision and are defined as follows:139

Average Microrecall = SUM(i=1 to m) ci / SUM(i=1 to m) (bi + ci)    (Formula 2-5)

Average Microprecision = SUM(i=1 to m) ci / SUM(i=1 to m) (ci + di)    (Formula 2-6)

FIGURE 2.6
THE PARTITIONING OF A DOCUMENT COLLECTION BY A SEARCH QUESTION
[Overlapping regions show the documents retrieved by the search question and the documents relevant to the search question:]
a = Documents that are not retrieved and not relevant
b = Documents that are not retrieved but relevant
c = Documents that are retrieved and relevant
d = Documents that are retrieved but not relevant

136Gerard Salton, Automatic Information Organization and Retrieval (New York: McGraw-Hill Book Company, 1968), pp. 282, 283.

137Ibid., pp. 283, 284.

138Ibid., p. 299.

139Ibid.
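Formulas 2-3 through 2-6 can be sketched over a set of searches. The data below are hypothetical, chosen so that the second search retrieves a flood of nonrelevant documents; this shows how the micro averages, which weight documents rather than questions, are pulled down by one bad search far more than the macro averages are.

```python
# Macro vs. micro averages over m searches; each tuple is (b_i, c_i, d_i)
# in the notation of Figure 2.6. Search 2 is a deliberately poor search.
searches = [(2, 8, 2), (40, 10, 90), (1, 9, 5)]
m = len(searches)

# Formula 2-3: mean of the per-question recall ratios.
macro_recall = sum(c / (b + c) for b, c, d in searches) / m
# Formula 2-4: mean of the per-question precision ratios.
macro_precision = sum(c / (c + d) for b, c, d in searches) / m
# Formula 2-5: one pooled recall ratio over all searches.
micro_recall = (sum(c for b, c, d in searches)
                / sum(b + c for b, c, d in searches))
# Formula 2-6: one pooled precision ratio over all searches.
micro_precision = (sum(c for b, c, d in searches)
                   / sum(c + d for b, c, d in searches))

print(round(macro_recall, 3), round(micro_recall, 3))
print(round(macro_precision, 3), round(micro_precision, 3))
```

With these counts the macro recall is about .63 while the micro recall falls to about .39, because the large second search contributes one vote among three to the macro average but dominates the pooled document counts in the micro average.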
Figure 2.7 illustrates what happens when a search question retrieves a comparatively large number of documents which are unrelated to the information request. As can be seen, questions 1, 3, 4, and 5 retrieved many fewer documents than question 2, which in the illustration had very poor recall and precision. An examination of the data reveals that the effect of question 2, which retrieved proportionately more documents than the other questions, was greater on the Micro Averages than the Macro Averages. Determination of which average is most appropriate to evaluation depends on which is more important, the user request (in which case Macro Averages would be used) or distinguishing between relevant and nonrelevant documents (in which case Micro Averages would be used).140

FIGURE 2.7
A COMPARISON OF VARIOUS TYPES OF RECALL AND PRECISION AVERAGES

                                 Recall        Precision
   i      bi     ci     di     ci/(bi+ci)     ci/(ci+di)
   1       3      7      7        .70            .50
   2      17      3     37        .15            .075
   3       1      9      6        .90            .60
   4       3     17     17        .85            .50
   5       5     10      5        .67            .67
Totals    29     46     72       3.27           2.345

Total searches = m = 5

Average Macrorecall = (1/5)(3.27) = .654
Average Macroprecision = (1/5)(2.345) = .469
Average Microrecall = 46 / (29 + 46) = 46/75 = .613
Average Microprecision = 46 / (46 + 72) = 46/118 = .390

Estimates of Recall. In calculating recall in an experimental system with a small number of documents it is possible to look at every document in the file to determine its relevance to a specific question. With systems having thousands of documents this task is totally unrealistic, and procedures must be used to estimate values of average recall. There are at least five specific procedures mentioned in the literature for estimating recall or the number of documents in a file which are relevant to a specific question.

A first technique for estimating the number of documents in a file which are relevant to a question is to examine a random sample to determine what per cent of the documents in the sample are relevant. For example, if in a 1% sample there were 5 documents relevant to a question, it would be assumed that in the total file there would be 500 relevant documents.141 In large collections this, however, still has limitations. For if a 1% sample were taken of the MEDLARS system, which contains over half a million documents, it would require looking at more than 5,000 documents to estimate recall for a single question.

A second procedure involves using the retrieval of source or target documents as an estimate of recall. In this method source documents are used as a basis for writing questions to retrieve information from the file. Recall is then based on the assumption that the collection contains only source documents and is determined by the percentage of source documents retrieved when one question is written for each source document in a random sample.142 For example, if 100 source documents were used in writing 100 different questions and 75 of these questions retrieved the specific source document used as a basis for writing the question, the estimate of average recall would be .75.

140Ibid.

141Ibid.

142F. W. Lancaster and J. Mills, "Testing Indexes and Index Language Devices: The ASLIB Cranfield Project," American Documentation, January, 1964, pp. 4, 8.

A third approach, used by Lancaster in the evaluation of MEDLARS, was based upon questions written by users. The user is asked to list documents that he knows are related to the specific question. The information from the title and author is used to determine if any of the documents listed by the user are in the information file, and the percentage of these known relevant documents retrieved is used as an estimated recall. For example, if a user provided a list of documents related to a question, and if it were determined by using titles and authors that 20 of these were in the information file and 17 were
For example, if a user provided a list of documents related to a question, and if it were determined by using titles and authors that 20 of these were in the information file and 17 were 1411bid. 142E. W. Lancaster and J. Mills, "Testing Indexes and Index Language Devices: The ASLIB Cranfield Project," American Documentation January, 1964, pp. 4, 8. 69 retrieved by the search question, the recall for that question would be estimated to be 17 + 20 or .85.143 A fourth method is to use a KWIC index to examine the titles Of the documents in the file and thus locate documents that are relevant to a specific question. These documents are then used as a basis for esti- 144 In using this method an mating recall for that specific question, assumption must be made that the documents not found by using the KWIC index will be retrieved in the same proportion as documents found by using the KWIC index. A.ii£th_technique that can be utilized with systems having a variety of automated search procedures is to use an information request to formu- late questions that use as many of the different search procedures as possible. Results of these multiple searches for the same information request are examined and the aggregate of the relevant documents found is considered to contain all the relevant documents in the file.145 The method utilizing source documents was used by early studies of the Cranfield Project,146 and subsequently criticized by Swanson because the question might not be typical of user's requests.147 The procedure 143E. W. Lancaster, "Evaluating the Performance of a Large Operating Retrieval System" Electronic Handling of Information, Allen Kent, Orrin E. Taulbee, Jack Belzer, and Gordon D. Goldstein, editors, (Washington, D.C.: Thompson Book Company, 1967), pp. 201-204. 144Gerard Salton, Automatic InfOrmation Organization and Retrieval (New York: McGraw-Hill Book Company, 1968), p. 294. 1451b1d., p. 299. 146F. W. Lancaster and J. 
Mills, "Testing Indexes and Index Language Devices: The ASLIB Cranfield Project," American Documentation, January, 1964, pp. 4-13.

147Don R. Swanson, "The Evidence Underlying the Cranfield Results," The Library Quarterly, January, 1965, pp. 1-20.

was later defended by Cleverdon, where he declared:

Naturally, I accept that there must be a somewhat unnatural relationship between a question and a document on which it is based, but I am not prepared to concede that this relationship is such as to negate all the results. While we would not use questions of such type in a research test, I believe, for reasons argued elsewhere, that they can still be used satisfactorily in situations where time and cost are important considerations, as might be the case in an evaluation of an operational information-retrieval system, and I shall continue to believe this until experimental data are produced which show me to be incorrect.148

Relevance Judgment

One of the first questions that must be answered in the evaluation of an information system is the manner in which the relevance of documents to specific questions will be determined. Regardless of the objectivity used in other procedures, there appears to be no way to do this without using human judgments. Because of the subjective nature of these procedures there has been considerable work to determine the effect of these judgments on the evaluation of information systems. Cuadra, Katter, et al.
have made an extensive study of this area, and the result is succinctly described in the abstract:

Evidence has been developed that suggests that relevance judgments can be and are influenced by skills and attitudes of the particular judges used, the documents and document sets used, the particular information requirement statements, the instructions and setting in which the judgments take place, the concepts and definitions of relevance employed in these judgments, and the type of rating scale or other medium used to express the judgments.149

This study points out that various factors can influence the relevance assessment of judges; however, other studies indicate that while there is variance between judgments, rankings of particular documents retrieved in response to a request tend to be the same for different groups of judges.150 Salton feels that if consistent procedures are used when comparing various retrieval methods, the comparisons of the methods should not be affected because "bias introduced by individual faulty relevance judgments may be expected to be in the same direction for all methods." He does, however, acknowledge that absolute values for either recall or precision measures must be examined with considerable care because of their dependence upon subjective judgments.151

A study done by Lesk and Salton compared relevance assessments by individuals who compiled questions with relevance assessments of a second person who had compiled another request. The results indicated that while the overall agreement among relevance assessments was not high, this did not affect the relative performance of various retrieval methods.

148Cyril Cleverdon, "The Cranfield Hypotheses," The Library Quarterly, April, 1965, pp. 121-124.

149Carlos A. Cuadra, Robert V. Katter, Emory H. Holmes, and Everett M. Wallace, Experimental Studies of Relevance Judgments: Final Report (Santa Monica, California: System Development Corporation, June, 1967), I-III.
In other words, despite the inconsistency in judgment, the comparative rankings of methods from poor to best were not changed.152

150Orrin E. Taulbee, "Content Analysis, Specification, and Control," Annual Review of Information Science and Technology, Carlos A. Cuadra, editor (Chicago: William Benton, 1968), III, 107, citing Alan Rees and Douglas G. Schultz, principal investigators, A Field Experimental Approach to the Study of Relevance Assessments in Relation to Document Searching, Center for Documentation and Communication, School of Library Science, Case Western Reserve University, Cleveland, Ohio, October 1967, Vol. I (287 pp.); Vol. II, Appendices A-Q. Final Report to the National Science Foundation.

151Gerard Salton, Automatic Information Organization and Retrieval (New York: McGraw-Hill Book Company, 1968), p. 302.

152M. E. Lesk and G. Salton, "Relevance Assessments and Retrieval System Evaluation," Information Storage and Retrieval, December, 1968, pp. 343-359.

These studies tend to support the position that comparisons of different systems present difficulties because of the way relevance judgments affect numerical measures. Cuadra, Katter, et al. suggest that a variety of factors, including document sets, may influence relevance judgments. (See quotation on page 70.) If true, this tends to make suspect comparisons done on different information files.

Comparison of Indexing Schemes

Bourne in the first Annual Review provides a summary of the indexing evaluation projects prior to 1965. Most of these projects compare various indexing languages; however, says Bourne:

One point becomes clearer after viewing this literature, namely that it is extremely difficult to make meaningful generalizations about the performance of various indexing systems. In almost all experimental reports, the investigator worked with an indexing language different than that of other experiments.
Consequently, no one has ever had his test results verified, or expanded, or made more precise by another experimenter. Furthermore, the actual numerical values given for recall and relevance, or other factors for a particular indexing system would appear to have value only to that system.153

An examination of the literature for the most part still tends to support Bourne's position; however, there appear to be some other generalizations that are supported by a variety of articles. Of particular note are: (1) the fact that over-sophistication of indexing systems does not appear to be worthwhile, and (2) there exists an inverse relationship between recall and precision, such that in attempting to improve recall there is usually a decrease in precision and vice versa.

153Charles P. Bourne, "Evaluation of Indexing Systems," Annual Review of Information Science and Technology, Carlos A. Cuadra, editor (New York: John Wiley & Sons, Inc., 1966), I, 180.

Sharp in the second Annual Review of Information Science and Technology cites a series of authors to support his assertion that, The most useful product of any review is a positive conclusion that promises to have immediate application. An emergent feature of some of the investigations that are being carried out is that for many retrieval systems, attempts at over-sophistication are not worthwhile.154 Among the studies cited by Sharp is the Cranfield Project which compared four indexing languages--U.D.C., alphabetical, facet, and uniterm--where Sharp quotes Cleverdon and Keen's conclusion that "single term indexing languages are superior to any other type."155 After citing a number of other studies Sharp concludes that the studies ". . .
are all pointers to the probability that in small to medium systems where no special conditions exist, over elaboration of retrieval languages and their associated control devices cannot be justified."156 The fact that simple indexing methods, such as key-word indexes, appear to do as well or sometimes better than other indexing languages is important to machine indexing, because of their adaptability to computer programming.

The second generalization that appears appropriate is that there is a tradeoff between recall and precision. This generalization is supported by Lancaster and Mills in their report of the Cranfield Project157

154John R. Sharp, "Content Analysis, Specification, and Control," Annual Review of Information Science and Technology, Carlos A. Cuadra, editor (New York: John Wiley & Sons, Inc., 1967), II, 90. 155Ibid., p. 90, citing Cyril Cleverdon and Michael Keen, Factors Determining the Performance of Indexing Systems, Vol. 2: Test Results, ASLIB Cranfield Research Project, Cranfield, Bedford, England, 1966, pp. 1-299. 156John R. Sharp, "Content Analysis, Specification, and Control," Annual Review of Information Science and Technology, Carlos A. Cuadra, editor (New York: John Wiley & Sons, Inc., 1967), II, 90. 157F. W. Lancaster and J. Mills, "Testing Indexes and Index Language Devices: The ASLIB Cranfield Project," American Documentation, January, 1964, p. 9.

and also by Salton, Keen, and Lesk in their report on the results of the SMART system.158 Swets has recognized this tradeoff in his development of a descriptive statistic called "the normal operating characteristics curve" which considers the variables used in the calculation of both precision and recall.159 This tradeoff can be seen in a subjective way by considering what happens when the total file is retrieved in response to a question.
Obviously all documents that are in the file and relevant to that question are in the response, thus giving perfect recall. Yet the precision is reduced to the number of relevant documents in the file divided by the total documents in the file. This idea is clearly stated by Sharp. He says, "It is easy enough to ensure that all or most of what is of interest is recovered from a file by simply casting the net wide enough, but there is little purpose in doing this if the result is a set of retrieved documents of unmanageable size."160

In comparing various types of controlled indexing languages Hyslop concludes that a thesaurus type of control seems best for a computerized system.161

158G. Salton, E. M. Keen, and M. Lesk, "Design Experiments in Automatic Information Retrieval," The Growth of Knowledge, Manfred Kochen, editor (New York: John Wiley & Sons, Inc., 1967), pp. 344-346. 159John A. Swets, "Information-Retrieval Systems," The Growth of Knowledge, Manfred Kochen, editor (New York: John Wiley & Sons, Inc., 1967), pp. 174-184. 160John R. Sharp, "Content Analysis, Specification, and Control," Annual Review of Information Science and Technology, Carlos A. Cuadra, editor (New York: John Wiley & Sons, Inc., 1967), II, 115. 161Marjorie R. Hyslop, "Sharing Vocabulary Control," Special Libraries, December, 1965, pp. 708-714.

Artandi concludes that many issues remain unsolved but "the advent of interactive systems has placed a new emphasis on the thesaurus as an aid to the user as part of a user-oriented prompting apparatus," and later that ". . .
operation of indexing, query formulation, and vocabulary building cannot be regarded as isolated activities, and that good systems design should provide for their meaningful integration."162 Thus, while Bourne's contention seems to be supported that indexing evaluation has been done in such a manner that it is difficult to compare systems,163 there are some generalizations which can be made from conclusions dependent upon rankings or trends rather than a comparison of numerical values. These results tend to support the generalizations that:

1. There appears to be no clearly superior indexing scheme.
2. In determining the best procedures for any given system it is important to consider the total characteristics of the system being designed.
3. Some studies indicate that key-word systems which are applicable for machine use do as well as, or slightly better than, other more sophisticated indexing schemes.
4. Because of the apparent inverse relationship that exists between precision and recall, both statistics need to be reported if the evaluation of indexing methods is to be meaningful.

162Susan Artandi, "Document Description and Representation," Annual Review of Information Science and Technology, Carlos A. Cuadra, editor (Chicago: William Benton, 1970), V, 161. 163Charles P. Bourne, "Evaluation of Indexing Systems," Annual Review of Information Science and Technology, Carlos A. Cuadra, editor (New York: John Wiley & Sons, Inc., 1966), I, 180, 181.

An Overview of BIRS--Basic Indexing and Retrieval System

The BIRS system is examined because it provides an example of a computerized information system and at present is the system used by the CEC-ERIC Information Center. The term BIRS is used to refer to a series of computer programs written in Fortran IV which have been developed to assist in the computerized operations of information retrieval systems.
"The distinctive characteristics of BIRS are generality (applicability to many types of information system problems), portability_(ease of installation on many types of computer configurations), and usability (ease of usage by indi- viduals lacking computer science training)."164 The development of the programs has been jointly supported by Michigan State University and a series of federal grants from the U.S. Office of Education. The first ‘grant was obtained in May of 1966, with continuing support from grants through March of 1971.165 An examination of the documentation reveals that the outward appear- ance of the system has remained reasonably constant; however, the actual development has been an evolutionary process resulting in a series of refinements and the ability of the programs to run on different 164John F. Vinsonhaler, The Information Systems Laboratory: A Progiess Report fer 1969 ISL Report NO. 10. (East Lansing: Michigan State University, January 1970), P. 3. 1651bid., p. 10. 77 computers.166 Considerable effort has been made to ensure that later BIRS versions have been built in modular form to permit the replacement of modules by more efficient operating programs without affecting the overall appear- ance of a system to the user.167 The modular design and flexibility of the system makes this system ideal for those developing procedures for new information retrieval systems, as will be demonstrated in reporting this study. The overall programs are divided into four major categories: (1) systems file maintenance, (2) information storage, (3) information indexing, (4) and information retrieval. The first, systems maintenance, is composed of three programs: The EXECutive program (EXEC), the TASK maoogement program (TASK), and the TRANSlation program (TRANS). The single program which provides for information storage is the Information File Maintenance Program (IFMP). 
The four programs which make up information indexing are the Printed Indexing Program (PIP), the Printed Listing Program (PLP), the Descriptive Analysis Program (DAP), and the Descriptive File Maintenance Program (DFMP). The two programs which control information retrieval are the Descriptive File Searching Program (DFSP) and the Information File Retrieval Program (IFRP).168

166Ibid., p. 2; John F. Vinsonhaler (ed.), Technical Manual, Basic Indexing and Retrieval System, BIRS 2.0 (East Lansing: Educational Publications Services, College of Education, Michigan State University, January, 1968), pp. 101-111; and John F. Vinsonhaler and John M. Hafterson (eds.), Technical Manual for Basic Indexing and Retrieval System, BIRS 2.5, Appendix I (East Lansing: Educational Publications Services, College of Education, Michigan State University, January 1969), pp. 2001-3118. 167John F. Vinsonhaler and John M. Hafterson (eds.), Technical Manual for Basic Indexing and Retrieval System, BIRS 2.5, Appendix I (East Lansing: Educational Publications Services, College of Education, Michigan State University, January, 1969), pp. 2004-2014.

Figure 2.8 provides a diagram of the BIRS system showing the individual programs and the relationship that exists between these programs. This diagram again illustrates the concept of input, processing, and output, while the processing is here represented by the four fundamental operations: systems maintenance, information storage, information indexing, and information retrieval.

The Executive Program--EXEC The purpose of the Executive Program is to allow the user to use the various independent programs by means of a command language. The command language interacting with EXEC can call the various programs and indicate the specific task that these programs are to perform.
Because of the Executive Program, the series of programs with their commands looks to the user like a powerful and simple programming language designed for doing information storage and retrieval.169

Task Management Program--TASK The Task Management Program is designed to permit the user to combine a series of command statements to form predefined operations. By calling for a specific task the user initiates the sequence of commands similar to the way one would use a subroutine in a programming language such as Fortran.170

Translation Program--TRANS This program is designed to translate the standard Fortran IV source programs into various machine-dependent Fortran dialects.171

168Ibid., p. 2012. 169Ibid., p. 2833. 170Ibid., pp. 2834-2838. 171Ibid., pp. 2839-3848.

[Figure 2.8, "An Overview of the Basic Indexing and Retrieval System (BIRS)," appears here.]

Information File Maintenance Program--IFMP The purpose of the Information File Maintenance Program is to build and maintain information files from the data which is provided by the users. This program as well as other BIRS programs can process textual information as it might be formatted for a book or article. The symbol asterisk-dollar sign (*$) is used to indicate that the word or phrase following is a BIRS command. Users may initiate separate records with an *$ABSTRACT command and designate specific types of information within a record by a field delimiter (a special character which the user may define) followed by a field name or blank.
If there is a blank after the delimiter, the field must later be accessed by number, with the nth delimiter in a record indicating the beginning of the nth field. The primary function of designating fields is to make it possible to index and/or search a portion of a record rather than the total record.172

Printed Indexing Program--PIP The Printed Indexing Program allows the user to generate printed indexes of the data stored on the information files. The program has the ability--at the option of the user--to index single words or phrases from any designated field. In the indexes numbers follow specific terms to indicate the abstracts (records) on the information file which contain that term. For example, if the term "Early Childhood Education" were followed by the numbers 4, 17, 25, 107, and 110, it would indicate that the term is in each of these abstracts. Those using the system for information center operations commonly index title, author, and descriptor fields.173

172Ibid., pp. 2101-38. 173Ibid., pp. 2201-19.

Printed Listing Program--PLP The purpose of the Printed Listing Program is to list the content of the information file in alphabetical order, according to a criterion determined by the user. Ordinarily the ordering is done on a single term per abstract--the first term of a specific field--so that each abstract will appear only once in the ordered list; however, it is possible, by a command, to remove this restriction so that an abstract will appear once for each different term in a specified field.174

Descriptive Analysis Program--DAP The commands and functions of this program are similar to the Printed Indexing Program in that the results of the analysis are used by the Description File Maintenance Program to build a description file. As in the Printed Indexing Program,
words or phrases from any number of specific fields may be indexed.175

Description File Maintenance Program--DFMP The purpose of the Description File Maintenance Program is to use the results of the Descriptive Analysis Program in building and maintaining description files which are searched by the Description File Search Program. The program is very similar to the Information File Maintenance Program in that new files may be built, old files added to, or old descriptions replaced by new descriptions. As with the information file where abstracts are sequentially stored, the description file has descriptions of abstracts stored to form a one-to-one correspondence between the description on the description file and the abstract on the information file. The major purpose of the description file is to store the information needed for computer searching in a manner that will reduce the computer time required for searches.176

174Ibid., pp. 2301-5. 175Ibid., pp. 2401-19.

Description File Search Program--DFSP The following discussion of the Description File Search Program is more comprehensive than the descriptions of the other programs because of its relation to the evaluation procedures described in Chapter 4. This program allows a user to search descriptions of documents contained on the description file and generate a question file containing information about all documents which meet the criteria defined by a question. Available in this program is a language which allows for logical (Boolean) searches, relevance searches, and weighted relevance searches. All questions in the BIRS program begin with an *$QUESTION appearing separately on the first line followed by lines containing the user's request for information--a logical expression or logical expressions separated by commas.
The simplest logical expression is a word, the next simplest a phrase, with more complicated expressions being formed by joining and/or modifying words and phrases with the logical operators .AND., .OR., and .NOT.. Examples of logical expressions which might be used in a BIRS question are:

1. RETARDED
2. MENTALLY HANDICAPPED
3. MENTALLY HANDICAPPED .AND. BLIND
4. MENTALLY RETARDED .OR. MENTALLY HANDICAPPED
5. .NOT. BLIND
6. (MENTALLY RETARDED .OR. MENTALLY HANDICAPPED) .AND. DEAF .AND. .NOT. BLIND

176Ibid., pp. 2501-2519.

If expression 3 were used as a question a document would have to contain both the terms MENTALLY HANDICAPPED and BLIND before it would be retrieved. If expression 4 were used a document would be retrieved if it contained one or both of the terms MENTALLY RETARDED or MENTALLY HANDICAPPED. If expression 5 were used the document would not be retrieved if the term BLIND were in the document but would be retrieved if the term BLIND were not in the document.

When a user is uncertain of the order in which a computer will combine words or phrases to form a new expression, the user may indicate the order with parentheses. The parentheses are interpreted by the computer to mean that the words or phrases inside the parentheses are to be combined first. Expression 6 illustrates one of the most frequent uses--the combining of words or phrases by .OR. to indicate they have similar meanings. Expression 6 also illustrates the fact that when logical expressions are combined by .OR., .AND., and/or modified by .NOT. the resulting expression is also a logical expression. Another command available in the BIRS language is *$COMMENT which causes all information between this command and the next line beginning with an *$ to be printed as a comment on the BIRS report.
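The behavior of expressions 3, 4, and 5 can be sketched in a few lines of Python. This is a hypothetical illustration only: the document descriptions and access numbers are invented, and DFSP itself was a Fortran IV program, not this code.

```python
# Each document description is modeled as a set of index terms
# (invented documents for illustration).
docs = {
    1: {"MENTALLY HANDICAPPED", "BLIND"},
    2: {"MENTALLY RETARDED", "DEAF"},
    3: {"BLIND"},
}

# Expression 3: MENTALLY HANDICAPPED .AND. BLIND
q3 = lambda d: "MENTALLY HANDICAPPED" in d and "BLIND" in d
# Expression 4: MENTALLY RETARDED .OR. MENTALLY HANDICAPPED
q4 = lambda d: "MENTALLY RETARDED" in d or "MENTALLY HANDICAPPED" in d
# Expression 5: .NOT. BLIND
q5 = lambda d: "BLIND" not in d

# Access numbers of the documents satisfying a question.
hits = lambda q: [n for n, d in docs.items() if q(d)]

assert hits(q3) == [1]      # only document 1 has both terms
assert hits(q4) == [1, 2]   # either term suffices
assert hits(q5) == [2]      # documents containing BLIND are excluded
```

The parenthesized expression 6 composes in the same way, since combining logical expressions with .AND., .OR., and .NOT. again yields a logical expression.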
The *$COMMENT command and the *$QUESTION command followed by a question are used in Figure 2.9 to illustrate some of the types of questions recognized by DFSP.

*$COMMENT EXAMPLE 1
*$QUESTION MENTALLY HANDICAPPED
*$COMMENT EXAMPLE 2
*$QUESTION MENTALLY HANDICAPPED .AND. BLIND
*$COMMENT EXAMPLE 3
*$QUESTION (MENTALLY HANDICAPPED .OR. MENTALLY RETARDED) .AND. PHYSICALLY HANDICAPPED
*$COMMENT EXAMPLE 4
*$QUESTION (DEAF .OR. HARD OF HEARING .OR. HEARING IMPAIRED) .AND. (BLIND .OR. VISUALLY HANDICAPPED) .AND. .NOT. (MENTALLY HANDICAPPED .OR. MENTALLY RETARDED)
*$COMMENT EXAMPLE 5
*$QUESTION DEAF, BLIND, SPEECH HANDICAPPED, MENTALLY RETARDED
*$COMMENT EXAMPLE 6
*$QUESTION MENTALLY RETARDED=4, DEAF, BLIND, SPEECH HANDICAPPED
*$COMMENT EXAMPLE 7
*$QUESTION (MENTALLY HANDICAPPED .OR. MENTALLY RETARDED)=4, (DEAF .OR. HARD OF HEARING .OR. HEARING IMPAIRED), (BLIND .OR. VISUALLY HANDICAPPED), SPEECH HANDICAPPED

FIGURE 2.9 EXAMPLES OF SEARCH QUESTIONS

Logical Search Questions The first four questions in Figure 2.9 use single logical expressions and are examples of logical or Boolean search questions. The fourth question illustrates a more complicated expression written to retrieve documents that are about hearing problems and visual problems, but not about mental handicaps. The multiple words or phrases used for each of the concepts in question 4 illustrate how a list of similar terms may be placed inside parentheses and combined with .OR.'s when it is not certain which of several terms might have been used to describe a document.

Relevance Search Questions Another type of search question, a relevance question, is illustrated by Example 5 of Figure 2.9. In this question documents containing all four terms would be retrieved first, followed by documents containing only three of the terms, then by documents containing only two of the terms, and finally followed by documents containing just one of the four terms.
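The ranking behavior of a relevance question such as Example 5 can be sketched as follows. The documents are invented for illustration; this is not DFSP's actual implementation, only the ordering it describes.

```python
# The four terms of the relevance question in Example 5.
terms = {"DEAF", "BLIND", "SPEECH HANDICAPPED", "MENTALLY RETARDED"}

# Hypothetical document descriptions (access number -> index terms).
docs = {
    1: {"DEAF", "BLIND", "SPEECH HANDICAPPED", "MENTALLY RETARDED"},
    2: {"DEAF", "BLIND"},
    3: {"SPEECH HANDICAPPED"},
}

# Rank documents by how many of the question's terms they contain,
# most matches first -- the retrieval order the text describes.
ranked = sorted(docs, key=lambda n: -len(docs[n] & terms))
assert ranked == [1, 2, 3]  # four matches, then two, then one
```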
By use of other commands it is possible for a user to specify the minimum number of terms a document must have before it is retrieved.

Weighted Relevance Searches By weighting MENTALLY RETARDED in Example 6 it is given more importance than the combination of the other three terms. If a command is given to indicate that documents are not to be retrieved unless they have terms with combined weights of five or more, it will guarantee that only documents having the term MENTALLY RETARDED and one or more of the other terms will be retrieved. Example 7 illustrates a question which has used a number of logical expressions to expand each of the four concepts used in Examples 5 and 6. By writing questions where lists of words or phrases having similar meanings are joined by .OR. the number of possible matches for each concept has been expanded.

DFSP also has available the relational and arithmetic operators .EQ. (Equal to), .NE. (Not equal to), .GE. (Greater than or equal to), .GT. (Greater than), .LE. (Less than or equal to), .LT. (Less than), .PL. (Plus), .MI. (Minus), .DV. (Divided by), and .TM. (Times), which can be used for testing numerical values associated with descriptions. For example, if it were desired to retrieve all documents about the mentally retarded that had been published since 1969, the following question might be used:

*$QUESTION MENTALLY RETARDED .AND. DATE .GE. 1969

The other arithmetic operators will not be illustrated because they do not directly relate to this study.177

Information File Retrieval Program--IFRP The Information File Retrieval Program enables the user to specify the form of output (access numbers only, specific lines of given abstracts, or the total abstracts) desired and have it printed on a high-speed line printer, remote terminal, or other output device.178 The user might view the multiple commands available to the BIRS programs as a command language with input to a single program--the Executive Program.
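The guarantee described for Example 6 (a weight of 4 on MENTALLY RETARDED and a retrieval threshold of five) can be checked with a short sketch. The documents below are invented, and a simple weighted sum stands in for DFSP's internal scoring.

```python
# Term weights from Example 6: MENTALLY RETARDED=4, the rest default to 1.
weights = {"MENTALLY RETARDED": 4, "DEAF": 1, "BLIND": 1,
           "SPEECH HANDICAPPED": 1}
THRESHOLD = 5  # combined weight a document must reach to be retrieved

def score(doc_terms):
    """Sum the weights of the question terms the document contains."""
    return sum(w for term, w in weights.items() if term in doc_terms)

# Hypothetical documents (access number -> index terms).
docs = {
    10: {"MENTALLY RETARDED", "DEAF"},            # 4 + 1 = 5 -> retrieved
    11: {"DEAF", "BLIND", "SPEECH HANDICAPPED"},  # 1+1+1 = 3 -> rejected
    12: {"MENTALLY RETARDED"},                    # 4 alone  -> rejected
}
retrieved = [n for n, t in docs.items() if score(t) >= THRESHOLD]

# Only a document with MENTALLY RETARDED plus at least one other term
# can reach the threshold, exactly the guarantee stated in the text.
assert retrieved == [10]
```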
In this sense, the user is writing a series of commands which with other appropriate data are used as input to be processed by the system which generates the specified output, thus illustrating again the concept of input, processing, and output.

177Ibid., pp. 2601-30. 178Ibid., pp. 2101-25.

Summary

Information storage and retrieval systems represent one of many man-made systems designed to perform specific functions. As with other man-made systems they include interacting components, such that both the components and their interaction need to be considered in their design, analysis, and evaluation. While not a sufficient condition alone for an information system to operate successfully, good indexing procedures are necessary and by some are considered to be the single most important component. The ultimate criterion in the evaluation of indexing methods and their interactions with a total information system is the system's ability to effectively retrieve information.

The descriptive statistics most commonly used to describe the results of searches are various types of precision and recall averages. Two types of averages involving precision and recall are commonly used; the first gives equal weight to search questions while the second gives equal weight to each document that is retrieved and/or is relevant to any of the questions. In large systems it has been necessary to estimate recall because of the difficulty of examining the relevance of each document in a file to each search question. To cope with this problem a variety of methods have been developed for estimating recall, with no single method or procedure used for all circumstances.

The BIRS system serves as an example of a computerized information retrieval system with its interacting components. This is the system which is used by the CEC-ERIC Information Center and which was used in the evaluation of their indexing methods.
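The two kinds of averages described in the summary, one weighting each question equally and one weighting each retrieved document equally, can be sketched with hypothetical search results (the counts below are invented for illustration):

```python
# Results for two questions: (documents retrieved, relevant among them).
results = [(10, 9), (100, 10)]

# First average: equal weight to each search question (per-question
# precisions are averaged).
macro = sum(rel / ret for ret, rel in results) / len(results)

# Second average: equal weight to each retrieved document (all counts
# are pooled before dividing), as in the study's Microprecision.
micro = sum(rel for _, rel in results) / sum(ret for ret, _ in results)

assert round(macro, 3) == 0.5    # (0.9 + 0.1) / 2
assert round(micro, 3) == 0.173  # 19 / 110
```

Note how a single question that retrieves many documents dominates the second average but counts no more than any other question in the first, which is why both statistics are reported.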
Even when a single component is being analyzed or evaluated, it is important that it be considered within the context of the total system. The purpose of the following chapter is to describe the CEC-ERIC Information Center and its development so that the evaluation of the indexing method used at the Center may be considered in context.

CHAPTER III: THE DEVELOPMENT OF CEC-ERIC INFORMATION CENTER AND ITS PRESENT OPERATING STATUS

An information center is much more than a computerized system for storing and retrieving information. Thus, when evaluating a component of an information center's operations such as its indexing methods, it is important to have in perspective the total objectives and procedures of the center. The ability of the information center to meet its objectives is dependent upon a complex interaction of many variables, including the type of information it stores, the indexing methods it uses, the techniques available for retrieving information, and the needs of its users.

The CEC-ERIC Center has its origin in the ERIC network and has been affected in a continuing manner by the procedures and objectives of that system. Because of this relationship it would be difficult to view the CEC-ERIC operations in perspective without some knowledge of central ERIC. King states, "An intricate part of information systems research involves a description of the entire system including its environment and component parts."179 The major objective of this chapter is to describe the entire system including its environment and component parts. Specifically the chapter contains:

179Donald W. King, "Design and Evaluation of Information Systems," Annual Review of Information Science and Technology, Carlos A. Cuadra, editor (Chicago: William Benton, 1968), III, 63.

1. A description of central ERIC including its origin, objectives, and the relationship of CEC to the ERIC system.
2. A description of the historical development of the CEC-ERIC Information Center.
3. A description of the present operating procedures of the CEC-ERIC Information Center.
4. Descriptive statistics about the CEC-ERIC Center's present operating status.

A History and Description of Central ERIC

The ERIC system had its formal beginning in 1965 with its first publication, a Catalog of Selected Documents on the Disadvantaged.180 In June, 1965 North American Rockwell was awarded the initial contract for the management of the central ERIC facility including major responsibility for processing related to the publication of Research in Education181 (RIE). The first issue of RIE, ERIC's monthly announcement bulletin, was published approximately four months later. In November, 1965 a contract was awarded to the Bell & Howell Company to provide a service called Educational Document Reproduction Services (EDRS) which reproduced on microfiche and hard copy documents which were difficult to obtain and not copyrighted.182

180Lee G. Burchinal, "The Educational Resources Information Center: An Emergent National System," Journal of Educational Data Processing, April, 1970, pp. 59, 60. 181Ibid. 182Ibid.

Since 1966 local clearinghouses have been developed to handle information in specific topical areas and to provide central ERIC with abstracts of documents to be published in the ERIC journal, Research in Education. The documents which are not protected from reproduction by copyright laws are made available through EDRS in microfiche, hard copy, or both.183 The first eleven clearinghouses were established between March and June of 1966.184

Objectives of ERIC

The initial objectives of the ERIC program were stated as: Making hitherto unavailable or hard to find, but significant research and research related reports, papers, and other documents easily available to the educational community.
Interpreting and summarizing information for many reports in ways that educational decision-makers and practitioners could understand and use the emerging results from the national R&D effort. Strengthening existing channels of communication for putting R&D results into practice. Providing a base for developing a national education information network that can effectively link knowledge producers and users in education.185

Later objectives have been stated as:

1. Guarantee ready access to the world's English-language literature relevant to education. In information science terminology this is the documentation function of the program.
2. Generate new information products by reviewing, summarizing, and interpreting current information on priority topics.

183Ibid. 184Ibid. 185Lee G. Burchinal, "Evaluation of ERIC, June, 1968," available as ED 020 449 from the ERIC Document Reproduction Service. ERIC stands for the Educational Resources Information Center, a national education dissemination system designed and supported by the Office of Education, Department of Health, Education and Welfare.

This is the information analysis function of the system. Products include bibliographies, state-of-knowledge papers, critical reviews, and interpretive summaries.

3. Infuse information about educational developments, research findings, and outcomes of exemplary programs into educational planning and operations.186

A comparison of the early objectives with later objectives reveals a broadening of the scope of the ERIC system.
In the first set of objectives the major emphasis was on making accessible previously "unavailable or hard-to-find" documents about significant research in education,187 whereas in the latter set of objectives the scope has been broadened to "guarantee ready access to the world's English-language literature relative to education."188 Consistent with these broadened objectives ERIC in July of 1969 began publishing Current Index to Journals in Education (CIJE). The following two ambitious principles have been presented as relevant to the broadened documentation objectives of ERIC:

1. Educators should be able to turn to one comprehensive source to identify current, significant educational documents on any topic of interest.
2. Educators should be able to obtain desired reports quickly, again from one source, regardless of where the report originated.189

186Lee G. Burchinal, "The Educational Resources Information Center: An Emergent National System," Journal of Educational Data Processing, April, 1970, p. 56. 187Lee G. Burchinal, "Evaluation of ERIC, June, 1968," available as ED 020 499 from the ERIC Document Reproduction Service. 188Lee G. Burchinal, "The Educational Resources Information Center: An Emergent National System," Journal of Educational Data Processing, April, 1970, p. 56. 189Ibid.

The name of any discipline could be substituted for "education" in the above stated principles, and one would have presented the desires of many researchers regardless of their field of endeavor. By expanding these principles to a variety of disciplines, one would in essence have the World Encyclopedia described by H. G. Wells.190 The major problem in meeting these objectives is not related to technology but to copyright laws.
In attempting to apply these principles, ERIC has met with continued and firm resistance by publishers of professional journals to include copyrighted materials in RIE or to reproduce these materials in microfiche or hard copy for EDRS.191 Unless a solution is found to this problem the plan for educators to secure relevant information from a central source would seem impossible to implement.

The Growth of ERIC

Among the statistics used to measure an information network are the number of documents it disseminates and users it serves. An examination of such statistics leaves little doubt that the influence of ERIC is developing very rapidly. These statistics indicate that in July 1966, ERIC's document collection (not including articles) had 1,746 documents; in January 1967, 1,839 documents; in July 1967, 3,551 documents; in January 1968, 7,227 documents; and in June 1968, 12,324 documents.192

190H. G. Wells, World Brain (Garden City, New York: Doubleday, Doran & Co., Inc., 1938), pp. 3-34. 191Interview with Dr. Richard Dershimer, Executive Secretary of the American Educational Research Association, March 16, 1971.

The number of subscribers to Research in Education has grown from 209 in January, 1967 to 4,558 in May of 1968.193 The number of documents sold in microfiche has increased from 328,000 in 1967 to more than 7 million in 1969.194 Since 1969 articles indexed for the Current Index to Journals in Education have been included in the ERIC collection. In 1969 about 12,000 articles from 220 periodicals were indexed; in 1970 this increased to 18,000 articles from 500 journals.195 The above statistics, as well as data about the use of the ERIC clearinghouses,196 appear to strongly indicate that the impact of ERIC is growing.
The Future of ERIC

Some of the goals set by the ERIC staff for future development are: (1) the covering of new topical fields by improving existing clearinghouses and adding a limited number of new ones, (2) the expanding of the information analysis program, (3) the supporting of one-stop information centers with emphasis on the use of state agencies, (4) the expanding of acquisition efforts with emphasis on the use of state agencies to review reports and enter them in the ERIC system, (5) the further developing of computer searching capabilities including on-line terminal access to the ERIC file, (6) the using of commercially supported products to disseminate information contained in the ERIC files, and (7) the developing of means to insure that valid research data and information is applied to the improvement of educational programs.197

192 Lee G. Burchinal, "Evaluation of ERIC, June 1968," available as ED 020 499 from the ERIC Document Reproduction Service, Fig. 1.
193 Ibid., Fig. 3.
194 "Development of ERIC Through December, 1968," prepared under the direction of Lee G. Burchinal, Director, Division of Information Technology and Dissemination, Bureau of Research, U.S. Office of Education, Department of Health, Education and Welfare, Office of Education/Office of Information Dissemination, first printed in August 1969, revised February 1970, Fig. 1.
195 Lee G. Burchinal, "The Educational Resources Information Center: An Emergent National System," Journal of Educational Data Processing, April, 1970, pp. 60-63.
196 Statistics on ERIC accompanied by cover letter from Delmer J. Trester, System Coordinator, Department of HEW, Office of Education, addressed to Carl Oldsen, CEC-ERIC, February 16, 1971.

The Development of the CEC-ERIC Information Center

In the early part of 1966 The Council for Exceptional Children submitted a proposal to the U. S.
Office of Education titled "Handicapped Children and Youth ERIC Clearinghouse and Research Dissemination Project." The project began July 1, 1966 with Dr. June Jordan as the first director of the CEC-ERIC Information Center. There were three main programs:

Program I. Operation of an ERIC Clearinghouse for Handicapped Children and Youth.
Program II. Expansion of the Clearinghouse activities and the development of materials interpreting research results.
Program III. Implementation of research findings into educational programs for handicapped children.198

197 Lee G. Burchinal, "The Educational Resources Information Center: An Emergent National System," Journal of Educational Data Processing, April, 1970, pp. 60-63.
198 June B. Jordan, "Handicapped Children and Youth ERIC Clearinghouse on Research Dissemination," a proposal submitted to the U. S. Department of Health, Education and Welfare, Bureau of the Handicapped, (1966), p. 1.

The functions of the ERIC Clearinghouse (Program I) were to include: (a) identifying significant research and research related literature not readily available to the consumer, (b) abstracting and indexing such literature, (c) maintaining Clearinghouse material in a storage and retrieval system, and (d) participating in the development of an educational thesaurus.199

The activities of Programs II and III are an expansion of the normal activities of an ERIC Clearinghouse. These expanded activities included: (a) abstracting additional significant research relative to the education of handicapped children, (b) coordinating the activities of the Handicapped Children Instructional Materials Centers, (c) developing special materials to describe educational practices based upon research, and (d) implementing the use of developed materials through workshops and demonstrations for teachers, supervisors, and teacher educators.
The specific objectives set forth in the proposal were:

Objectives

The general objective of this proposed project is to provide a central, comprehensive clearinghouse information center in the education of handicapped children and youth. This center would be concerned with collecting, abstracting, testing, and evaluating literature and materials as well as developing materials, interpreting research, and disseminating information.

Objectives--Program I. ERIC Clearinghouse: Handicapped Children and Youth. The Clearinghouse would serve as one of the US Office of Education's ERIC satellites and would follow procedures required of such a unit. Specific objectives include:

1. Identify and collect research and research related literature not readily available for wide dissemination.

199 Ibid.

2. Evaluate the above material in terms of professional value.
3. Identify items for storage in the handicapped area Clearinghouse (material with limited interest) and items for storage in central ERIC.
4. Prepare abstracts of above material and index.
5. Serve as a depository for Handicapped Clearinghouse materials.
6. Develop a retrieval system for location of Clearinghouse materials.
7. Coordinate efforts with related professional activities--Instructional Materials Centers, CEC's regular publication program (Exceptional Children Abstracts and Biennial Review of Research on Exceptional Children).
8. Disseminate information on the operation and ERIC acquisitions to the profession.
9. Provide an information service and copies of ERIC materials (hard copy and/or microfiche) upon request.
10. Participate in the development of the Office of Education's educational thesaurus (PET).
11. Provide for continuous evaluation of the effectiveness of the Clearinghouse operation.

Objectives--Program II. Expansion of Clearinghouse Activities and Development of Materials Interpreting Research Results.
This aspect of the project is concerned with expanding the collection of literature, coordinating the efforts of Handicapped Children Instructional Materials Centers, and interpreting research results to the practitioner. Specific objectives include:

1. Provide in a central source a "library" of information on significant literature and materials related to the education of handicapped children.
2. Serve as the central communications center for the various Handicapped Children Instructional Materials Centers located throughout the country.
3. Survey the research related to the education of handicapped children and identify results that have implications for classroom teaching.
4. Translate above research information into educational practices and develop materials (literature, films, video tapes, etc.) which would illustrate desirable educational practices indicated by research evidence.
5. Provide for a continuous review of research literature to identify that which has relevance for classroom instruction of handicapped children.

Objectives--Program III. Implementation. The purpose of Program III is to implement research findings into the classrooms for handicapped children. Specific objectives include:

1. Use materials developed in Program II in workshops with practitioners.
2. Cosponsor workshops with special education departments in local and state school systems and in college teacher preparation programs.

The original proposal estimated that approximately 4,000 documents would be processed during the first year and 5,000 during the succeeding years. This was to be done by a staff which included a project director, an editorial secretary and two stenographer-clerks. Considerable importance was placed on field abstracting with the assistance of three university personnel at one-fourth time each and six graduate students at one-half time each.200
During the second year there was to be an additional staff associate at a professional level, the equivalent of one-fourth time of a university person, and two graduate students on a one-half time basis. At the end of the second year, Programs II and III would also include three full-time equivalents on the central staff and four full-time equivalents as part of the field staff.201

The Early Operation of the Center

In March of 1967 Volume I, No. 1 of a tabloid entitled Clearinghouse on Exceptional Children announced to potential users the opening of the ERIC Clearinghouse on Exceptional Children.202 Beginning with the second issue, this tabloid was printed as a section of the journal Exceptional Children and sent to its approximately 40,000 subscribers, with reprints available for special dissemination.203 Since

200 Ibid., pp. 2-5.
201 Ibid., pp. 4-6.
202 Clearinghouse on Exceptional Children, March, 1967, p. 1.
203 "Clearinghouse on Exceptional Children," Exceptional Children, Summer, 1967, p. 693.

Volume II, No. 1, it has appeared under a new title, "ERIC Excerpt."204

In the early part of 1967 arrangements were made for various organizations to do field abstracting. Included were graduate students and faculty from the University of Minnesota, the Alexander Graham Bell Association, and volunteers from the Association for the Gifted. In July of 1967 the first abstracts prepared by the CEC-ERIC Information Center appeared in the ERIC journal Research in Education.205

By the fall of 1967 the CEC-ERIC Clearinghouse had processed over 100 abstracts into the ERIC collection, had approximately 900 additional documents in some stage of abstracting or indexing, and was receiving an additional 150 to 200 new items per month.206 The October issue of "ERIC Excerpt" reported that the major focus of the staff would be on the following activities:

1. Processing abstracts into the Central ERIC collection.
2.
Building an extensive computer bank of abstracts of literature and instructional materials.
3. Publishing special education abstracts on a regular basis as a companion piece to Research in Education.
4. Developing bibliographies on special topic areas in special education.
5. Disseminating information on the use of instructional materials through a department in Exceptional Children, "IMC Network Report."
6. Preparing films or video tapes interpreting research in terms of educational practice.207

204 "ERIC Excerpt," Exceptional Children, October, 1967, pp. 143-148.
205 Research in Education, July, 1967.
206 "ERIC Excerpt," Exceptional Children, October, 1967, p. 143.
207 Ibid., p. 144.

At the time these statements were made it was unknown what the exact nature of the extensive computer bank of abstracts would be, how it would be used, how the abstracts not in ERIC would be published, and how the bibliography series would be produced. In April of 1968 the following quotation appeared:

Within the next few months plans of ERIC-CEC include the development of numerous bibliographies on various special education topics and expansion of services to individuals by arranging for computer assisted searches on highly specific questions.208

At the time this statement was made the Instructional Materials Center at Michigan State University was using the BIRS programs to perform many of the operations desired by the CEC-ERIC Information Center. The major problem involving the use of BIRS programs to assist the CEC-ERIC Information Center was that the programs had been developed for a Control Data Corporation 3600 computer located at Michigan State University. A similar computer was not available in the Washington area, nor did The Council for Exceptional Children have the trained staff necessary to maintain and run the programs. In February of 1968 Dr.
John Vinsonhaler received a grant titled "Improving the Dissemination of Instructional Materials, USOE Grant to develop BIRS for Information Management." One objective of the grant was to continue the development of BIRS so that it could run on a variety of computers including IBM System 360, models 40 and above. In the spring of 1968 negotiations were completed for the CEC-ERIC Information Center to use an IBM 360, model 40 located at George Washington University, thus making it possible for the Center to use the BIRS programs.

208 "ERIC Excerpt," Exceptional Children, April, 1968, p. 633.

The Establishment of Data Processing Procedures

One of the major advantages of the BIRS programs is the many alternatives they provide for maintaining and processing information files. This flexibility is especially useful when a user is not certain of the type of processing needed.

A weakness of the Information Center was the absence of an individual with the combination of skills necessary to develop procedures for computer processing. After making unsuccessful attempts to find an individual locally, help was requested from the BIRS project. In response, Dr. John Vinsonhaler, the MSU project director, provided the Center with the half-time consulting services of one of his staff members. The person provided was the author of this thesis, who at that time was working as a computer specialist with the BIRS project.

The Decision to Publish a Computerized Journal

It was originally felt by some CEC-ERIC staff members that a good way to meet the goal of providing information to the center's users in as effective and efficient a manner as possible would be for the Information Center to disseminate computer-readable information files to various computer centers. In examining this and alternative methods, the following factors and their interactions were examined:

1. The type of information disseminated by the CEC-ERIC Center.
2.
The technological capabilities in the Washington, D.C. area.
3. The manner in which data was being processed for use on the ERIC system.
4. The sophistication of users.
5. The locations of users and cost of mailing information.
6. The similarity of questions asked by the various users.
7. The time required to implement various alternatives.
8. The cost of the various alternatives.

During the time when alternatives were being analyzed, computer files containing representative information processed by the CEC-ERIC Center were established at Michigan State and George Washington Universities. These files served as a basis to examine various processing procedures and acquaint the Information Center staff with some of the potential problems of a computerized system.

In the early fall of 1968 the author met with Dr. June Jordan, Director of the CEC-ERIC Center, and Mr. William Geer, the Executive Secretary of The Council for Exceptional Children, to consider the alternatives. In this discussion the difficulty of maintaining searchable computer files at various installations around the country was considered and rejected on the basis that:

1. There were only a limited number of centers with the necessary equipment, personnel, and willingness to serve users.
2. The geographic distribution of users versus the potential locations of such centers would greatly limit the number of individuals who would have access to these services.
3. It would be difficult to provide different centers with the type of professional educational staff necessary to make the procedure effective.

A second alternative considered was the possibility of an on-line system. This was rejected because of:

1. The cost of providing communication lines.
2. The cost of terminal rentals.
3. The cost of having a computer or a portion of a computer dedicated to the information file.
4. The limited number of locations where terminals could be maintained with the available financing.
The third alternative, considered and accepted, was to maintain the computerized information file in such a manner that it would allow for computer-controlled publication of a journal as well as selective publication of any portion of the information files. This alternative made it possible to:

1. Publish a journal directly from the computer information files using computer-controlled typesetting.
2. Use the computer to generate author, subject, title, or other types of indexes to use in the journal.
3. Use computer searches to aid in answering difficult user requests.
4. Use computer searches to help organize and select documents about special topics which relate to commonly asked questions.
5. Use the computer to index and to control typesetting so that annotated bibliographies of the selected documents could be printed at a minimal cost.

At the close of the meeting, Mr. Geer indicated that the Center should move as rapidly as possible to develop procedures for publishing a computerized journal containing abstracts about exceptional children. This decision resulted in the publication of a new journal, Exceptional Child Education Abstracts, which first appeared in April, 1969.

An Overview of the Operating Procedures Used by the CEC-ERIC Information Center

The purpose of this section is to survey the procedures used at the Information Center and relate the indexing process to the other procedures, thus describing the context in which the indexing evaluation took place. While this section gives an overview, a more complete discussion has been provided in Appendix A for those desiring additional information about procedures used at the CEC-ERIC Information Center.

Legend and Nomenclature

The symbols used in the diagrammatic representation of CEC-ERIC's processing are commonly used in computer program and systems flowcharting. The descriptions of the symbols found in Figure 3.1 are those given on the cover of the IBM flowcharting template, form X20-8020.
In addition to the symbols described in Figure 3.1, the following alphanumeric legend is used to identify specific symbols in various figures:

1. C(N) stands for Connection number N, where N may be any number.
2. IP(N) stands for Input number N.
3. OP(N) stands for Output number N.
4. PP(N) stands for Predefined Process N.
5. SB(N) stands for Symbol N of a given figure. This notation will be used when it is necessary to identify a given symbol for discussion which is not specified in another way.

FIGURE 3.1 FLOWCHARTING SYMBOLS

INPUT/OUTPUT - Any function of an input/output device (making information available for processing, recording processing information, tape positioning, etc.).
DECISION - The decision function used to document points in the program where a branch to alternate paths is possible based upon variable conditions.
PREDEFINED PROCESS - A group of operations not detailed in the particular set of flowcharts.
PROGRAM MODIFICATION - An instruction or group of instructions which changes the program.
DOCUMENT - Paper documents and reports of all varieties.
MAGNETIC TAPE
FLOW DIRECTION - The direction of processing or data flow.
CONNECTOR - An entry from, or an exit to, another part of the program flowchart.

Model Developed for the CEC-ERIC Information Center

In the original proposal for the Information Center, Dr. Jordan included objectives that provided guidelines for the Center's development. A diagrammatic representation of a model for the Center's operation was contained in that proposal. Considering the lack of information about potential technological components, the model demonstrated remarkable insight and still bears considerable resemblance to the present functioning of the Information Center.209

Overview of the Information Center's Major Activities

The overview in Figure 3.2 is an outgrowth of the conference between the author, Dr. Jordan, and Mr.
Geer, where the decision was made to publish Exceptional Child Education Abstracts, and includes changes that have been made in the procedures as a result of that decision. This simplified overview divides the processing into six major activities: document acquisition, document management, file maintenance, file processing, information processing, and evaluation with system modification. The core of activities shown in Figure 3.2 is presented with greater detail in the later diagrams, Figures 3.3, 3.4, and 3.5,

209 June B. Jordan, "Handicapped Children and Youth ERIC Clearinghouse on Research Dissemination," a proposal submitted to the U. S. Department of Health, Education and Welfare, Bureau of the Handicapped, (1966), p. 5.

FIGURE 3.2 OVERVIEW OF INFORMATION CENTER MAJOR ACTIVITIES
[Flowchart: Activity 1, Document Acquisition (IP (1)); Activity 2, Document Management (PP (1)); Activity 3, File Maintenance (PP (2)), producing the Information File Tape, Description File Tape, and Printed Index File Tape; Activity 4, File Processing (PP (3)); Activity 5, Information Processing (PP (4)); Activity 6, Evaluation and System Modification (PP (5)).]

and related discussions. These activities are found in most information centers utilizing computer processing; however, the specific steps and resulting products may vary considerably from center to center. A brief description of each of these six major activities follows:

Activity 1 - Document Acquisition This activity includes the selection of documents which will be bought or acquired by other methods so that they may be examined to determine if they are appropriate for inclusion in the Information Center holdings.

Activity 2 - Document Management This activity includes examining documents to determine if they should be included in the Information Center data bank, the abstracting of documents, the indexing of documents, and the cataloging of documents.
Activity 3 - File Maintenance This activity includes punching document surrogates, storing the document surrogates on a computerized information file, and preparing computerized description files and printed index files.

Activity 4 - File Processing This activity includes computer processing of files to organize the information in a form that will be more useful and easier to disseminate.

Activity 5 - Information Processing This activity involves processing user requests, providing users with information, publishing new documents from the information contained on the computer files, and providing information to be used in evaluating the system. The activities in this section are primarily manual activities, but they may initiate computer file processing (Activity 4) as one of several steps in a procedure.

Activity 6 - Evaluation This activity involves examining the procedures used by the Center and, if appropriate, modifying these procedures to make the total operation of the Information Center more effective.

Overview of Major Input and Output

Figure 3.3 provides an overview of the input to the Information Center and the output generated by processing this input. As illustrated, documents are acquired (IP (1)) and processed in the document management activities (PP (1)) to generate copy for Research in Education (OP (1)) and Current Index to Journals in Education (OP (2)). All documents which will become part of the Information Center holdings, including those which are processed for RIE and CIJE, are then passed to file maintenance processing (PP (2)). In the file maintenance activity the documents are put in computer-readable form and various computer files are generated. These computer files provide input for the file processing (PP (3)), which generates the output for Exceptional Child Education Abstracts (OP (3)) and output for selected publication (OP (4)).
This output is in a form that allows for computer typesetting, computer-generated indexes, and printing with a minimum of effort. ECEA and the selected publications in turn become input for Information Processing (PP (4)). These publications and other diagramed input (IP (2), IP (3), and PP (3)) are used in providing information to users (OP (5)), in assisting staff members to generate new documents (OP (6)), and as input to the evaluation component (PP (5)).

FIGURE 3.3 OVERVIEW OF MAJOR INPUT AND OUTPUT

Overview of Evaluation and Processing Modifications

Figure 3.4 provides an overview of the continuing evaluation which is used to monitor and, if appropriate, modify the processing of the Information Center so that it may more effectively meet its objectives. Input to evaluation is provided from information processing (PP (4)), user evaluation (PP (6)), the project officer and advisory board (PP (7)), and the Instructional Materials Center/Regional Media Center (IMC/RMC) Network (PP (8)). The arrows going in both directions indicate that there is an interaction between the evaluation component and other components. The input from the various sources is processed by the evaluation component to determine if there are system modifications which should be made. The decision process is illustrated by symbols SB (1), SB (2), and in the system modification occurring to PP (4). The numbers 1, 2, 3, and 5 appearing within parentheses opposite arrows indicate that the same series of symbols, namely SB (1), SB (2), and SB (3), would appear at these points. This would also be connected to the preprocessing procedures PP (1), PP (2), PP (3), and PP (5), as is done in the later diagram, Figure 3.5. If no change is made, this fact is provided as input to the evaluation component as indicated by the connection C (1) to PP (5).
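The six-activity flow described above can be sketched as a simple processing pipeline. The following Python fragment is a minimal modern illustration only; every function name and record field is hypothetical, and it stands in for, rather than reproduces, the BIRS programs and Center procedures this chapter describes.

```python
# Illustrative sketch of the six major activities (Figure 3.2).
# All names and data are hypothetical placeholders.

def acquire_documents():
    """Activity 1: select and obtain candidate documents."""
    return ["doc_a", "doc_b"]

def manage_documents(docs):
    """Activity 2: screen, abstract, index, and catalog each document."""
    return [{"id": d, "abstract": f"abstract of {d}",
             "descriptors": ["handicapped"]} for d in docs]

def maintain_files(surrogates):
    """Activity 3: store document surrogates on the information file."""
    return {s["id"]: s for s in surrogates}

def process_files(info_file):
    """Activity 4: reorganize the file into a more disseminable form."""
    return sorted(info_file)  # e.g., an ordered list of accession ids

def process_information(index):
    """Activity 5: answer a user request from the processed files."""
    return [i for i in index if i.endswith("a")]

def evaluate(results):
    """Activity 6: examine results; decide whether to modify procedures."""
    return len(results) > 0

surrogates = manage_documents(acquire_documents())
index = process_files(maintain_files(surrogates))
print(evaluate(process_information(index)))  # -> True
```

The point of the sketch is only the feedback structure: output of Activity 5 is also input to Activity 6, which may in turn modify any earlier activity, exactly as the arrows in Figures 3.4 and 3.5 indicate.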
Overview and Model of the Information Center's Operation

Figure 3.5 provides an overview of the Information Center's processing. In this overview the six major activities can be seen in the center of the flowchart. The input and output operations shown in Figure 3.3 are present, as well as the evaluation procedures indicated in Figure 3.4. The model as presented indicates a continual flow of input, processing, output, and evaluation, resulting in appropriate systems modification. Figure 3.5 and the simplified Figures 3.2, 3.3, and 3.4 can be used as a reference to the more detailed steps involved in the Information Center's operations which are discussed in Appendix A.

FIGURE 3.4 AN OVERVIEW OF THE INFORMATION CENTER'S EVALUATION AND SYSTEMS MODIFICATION COMPONENTS

FIGURE 3.5 AN OVERVIEW AND MODEL OF THE INFORMATION CENTER'S OPERATIONS

The Publication of Exceptional Child Education Abstracts

The original reason for placing abstracts on the computer was to make it possible to use a computer to assist in searching the information found in the abstracts.
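Computer-assisted searching of abstract text can be illustrated with a small sketch. The records and the AND/NOT query form below are hypothetical stand-ins written for this illustration; they are not the BIRS query language, and the abstract texts are abbreviated paraphrases, not the Center's actual files.

```python
# Illustrative sketch of a logical search over stored abstracts.
# Records and query form are hypothetical, not the BIRS system.

abstracts = {
    "EC 01 0769": "cooperative agreements between special education and "
                  "rehabilitation services work study programs",
    "EC 01 0770": "language development in deaf preschool children",
}

def tokens(text):
    """Reduce an abstract to a set of lowercase word tokens."""
    return set(text.lower().split())

def search(must_have, must_not=()):
    """Return accession numbers whose abstract contains every term in
    must_have and none in must_not (a simple AND/NOT logical question)."""
    hits = []
    for accession, text in abstracts.items():
        words = tokens(text)
        if all(t in words for t in must_have) and \
           not any(t in words for t in must_not):
            hits.append(accession)
    return hits

print(search(["rehabilitation", "education"], must_not=["deaf"]))
# -> ['EC 01 0769']
```

A logical search question of the kind written by the study's judges combines required and excluded terms in just this way; the sketch omits the descriptor fields and weighting that a real system would also consult.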
As previously indicated, it was the original intent that the computer-searchable files would be made available to a number of centers; however, an analysis of the potential effectiveness of this means for disseminating information led to the decision to publish ECEA. In the system design which followed, an examination of the technological capabilities available in the Washington area suggested that the most efficient way to have both computer-searchable files and publish the abstract journal was to use computer-controlled typesetting. Thus in this process the computer information files became the data source for both computer searching and printing of abstracts in ECEA.

Figure 3.6 illustrates a typical abstract set in the format generated by the computer-controlled typesetting and provides a description of the various fields (types of information) contained in the abstract. The abstracts are stored on the information files in a manner that makes it possible to generate printed indexes for any of the fields identified in Figure 3.6 and to use some or all of the fields to generate computer-searchable description fields. Figure 3.7 provides examples of portions of subject and author indexes which were indexed by the computer and output so that another
Inc., New York; Indicates document is availablg Rehabilitation Services Administration in microfiche and hard copy. (DH EW). Washington, D. C. t EDRS mf.hc VRA-546T66 : Contract or grant number Descriptors: exceptional child educa- tion: cooperative programs: vocational rehabilitation; vocational education; ad- ministration: mentally handicapped: state agencies: cooperative education; educational coordination: cooperative Descriptors—subject programs: state federal aid: administra- I “ms which tive problems; communication prob- chancgefiz. content lcms: equalization aid: work study pro- ' ' ° grams: handicapped; cost effectiveness Five papers discuss cooperative work- study agreements between schools and vocational rehabilitation services in the western states. Areas discussed include the advantages of cooperative agree- ments. the forms and disadvantages Of Athird party agreements. basic concepts of ' the programs. and an outline form to use when applying for matching funds; the relationship of special education. rehabi- litation and cooperative plans. pro- grams. and agreements: and California's past and present work study programs for the mentally retarded. Also reviewed are research demonstrating the econom- ic feasibility of vocational training for the cducable mentally retarded in the public schools and communication prob- lems in work study programs. The conference summary considers the pur- poses. goals. cssencc of. and necessity for cooperative agreements. (MK); Abstractor's initials Summary FIGURE 3.6 SAMPLE ECEA ABSTRACT Abbott. Margaret 314. Abel. Georgie Lee And Others 42 l. Abraham. Willard 51. Ackerman. Nathan W And Others 55. Adamson. T M 266. Adler. Alfred 835. Adler. Edna P. Ed 530. Adler. Lenore Loeb 730. Adler. Manfred 747. Adler. Sol 661. Ahlersmeyer. Donald E 214. Aichhorn. August 141. Albee. George W 903. Aldrich. Robert A 718. Alkema. Chester Jay 892-893. Allen. K Eileen And Others 392. Allen. Robert M 81 l. Alonso. Lou And Others 609. 
FIGURE 3.7 SAMPLES OF ECEA AUTHOR AND SUBJECT INDEXES
[Portions of the computer-generated indexes from Exceptional Child Education Abstracts: the author index lists author names followed by the abstract numbers of their documents (e.g., Abbott, Margaret 314), and the subject index lists descriptors followed by the abstract numbers of the documents indexed under them (e.g., Ability Grouping 370).]
734. 764. 784. 846. 869. Advanced Placement 553. Affective Behavior 426. 897. 988. Africa 716. After School Activities 536. Age Differences 278. 280. 284. 366. 583. Age Groups 607. Agencies 22. 815. 822. 891. 971. Agency Role 269. 546. 597. Aggression 141. 183. 302-304. 436. 762. Agriculture 485. Alcoholism 79. 107. 717. Algebra 150. Amblyopia Ex Anopsia 983. April 1971 118 EHJBJECHiHVEflEK American Indians I48. 668. 940. American Literature 84. Amphetamines I89. Amputees 102. I78. 182. 386. 572. 803. Anatomy 572. 648. 830. Ancillary Services 385. Anesthesiology 401. Animal Behavior 461. Anne Sullivan Macy Service For Deaf Blind Persons 23. Annotated Bibliographies 8. 59. 68. I44. 204. 220. 240. 309. 668-669. 796. 822. 935. 962. Annual Reports 29. 40. 338. 340. 587. Anomalies 83. 182. 187-188. 358. 722- 725. 922. Anxiety 104. 107. I60. 273. 473. 482. 666. 731. 741. 745. 809. 813. 881. 909. Anxiety Scale For The Blind 482. Aphasia 119. 180. 330. 411. 413-414. 537. 648. 656-657. 662. 665. 685. 784. 799. Apraxia 248. Architectural Barriers 424. Architectural Programing 318. 848. Architecture 318. Area Centers For Services To Deaf Blind Children 625. Arithmetic. See Mathematics. Arizona 420. Arkansas 625. Arkansas School For The Deaf 73. Art 106. I83. 730. 892-893. Art Education 114. 618. 892. Art Materials 114. 604. 618. 893. Art Therapy 106. 892-893. Articulation (Speech) 32. 113. 206. 208. 366. 409-412. 416. 492. 578. 660. 662. 924. Asian History 999. Aspiration 989. Assistive Devices 102. Associa-Math Program 289-291. Association (Psychological) 730. Associative Learning 16. 262. 264. Asthma 182. Athletics 82. 570. Attendance 745. Attendant Training 100. 568. Attention Span 175. 398. 469. Attitude Tests 535. 615. Attitudes 55. I72. 255. 269. 375. 493. 506-507. 541. 615. 696. 768. 881. 931. 999. Audio Equipment 67. 73-74. 186. 719. 868. 1000. Audiology 120. 621. 797. 943. Audiometric Tests 98. 349. 515. 656. 943. 959. Audiometry 515. 621. 943. 959. 1000. 
Audiovisual Aids 32. I32. 144. I49. I86. 216. 236. 286. 410. 414. 663. 668. 773. 822.877. Audiovisual Centers 50. 353. Audiovisual Instruction 288. 585. 824. Audition (Physiology) 297-298. 492. 530. 739. 797. FIGURE 3.7 (cont'd) Auditory Agnosia I94. Auditory Perception 98. 180. 311. 348. 366. 368. 407. 413. 417. 455. 471. 520. 543. 701. 912. 982. 1000. Auditory Tests 98. 297. 415. 797. 943. 959. Auditory Training 18. 44. 180. 791. Aural Learning 524. Aural Stimuli 98. 229.471. Aurally Handicapped 4. 7. 14. 18. 31. 33. 40-41. 44.57-58.73. 74. 84-88. 92. 98. 113.132. 146-147. 149-152. 171. 181. 202. 208. 246. 253. 258. 294-298. 309-310. 331. 337.347. 349-351. 373. 407. 415. 417. 423. 498. 512. 515-516. 528. 530. 551. 564. 583-585. 607. 616. 624. 656. 682. 700. 712. 719. 735. 739. 761. 763. 773-780. 787. 789. 791-792. 797.824.838.851. 868.931. 943.952. 968. 970. 974. 981. 1000. Australia 167. 692. 722. 984. Authoritarianism 255. Authors 84. Autism 45. 222. 287. 487. 793-794. 813. 982. Autobiographies 362. 503. 580. Autoinstructional Programs 144. Beginning Reading 69. 329. 512. 525. 527. 732. Behavior 8. 277. 379. 436. 495. 617. 689. 978. 980. Behavior Change 1-2. 8-9. 20. 43. 153. 157. 207. 228. 243-244. 251-252. 354. 383. 388-392. 394-396. 445-446. 596- 597. 605. 617. 699. 715. 737. 752. 767. 771. 783. 821. 823. 827. 835. 842. 852. 857. 878. 898. 933. 949. 960. 980. Behavior Patterns 123. I29. 243. 277- 278. 303-304. 418. 439. 455. 469. 752. 762. 776. 779. 907. Behavior Problems 65. 261. 278. 281. 387. 392. 395. 445. 448. 462. 468. 473. 490. 514. 605. 617. 670. 699. 704. 737. 745. 771. 827. 835. 878. 930. 933. 962. Behavior Theories 6. 157. 239. 388. 396. 446. 597. 842. 897. 917. 980. Behavioral Objectives 617. 946. Behavioral Sciences 157. 229. 270. 393. 395. Bender Gestalt Test 376. 382. Bias 577. Bibliographies 9. 31. 53. 144. 219-220. 241. 245. 309-310. 621.668-669. 820- 822. 879. 935. Bibliothcrapy 669. Biochemistry 83. 308. 462. 808. 830. Biographies 24. 111. 
509-510. 612. 643. 700. 741.831. Biological Influences 265-266. 268. 814. 827. 972. Biological Sciences Curriculum Study 165. Biology 165. 889. 922. Birth Defects 182. Blackman. Leonard S 913. Blind I91. 269. 475-478. 655. 703. 157 119 computer could control the phototypesetting. Not illustrated in Fig- ure 3.7, but included in Exceptional Child Education Abstracts since Volume III, No. 2 is a title index. Selective Publication The rapid expansion of knowledge has made it increasingly apparent that not only must better ways be fbund to store and retrieve informa- tion, but that also better ways must be found to organize knowledge. The computerized search provides a powerful tool for bringing together documents in a file that have similar information. While computer searches can be used to retrieve information, the cost of retrieving information increases with the size of the file. The fact that the more such techniques are needed (for larger files) the more it costs provides an interesting paradox; however, not with- out solution. Analysis of user requests at the ERIC Information Center has indicated that there are often categories of similar requests. By categorizing the requests, it is possible to break large files into smaller subfiles by use of computer searches, thus reducing the cost of additional searches in special topic areas. The manner in which the files are prepared for the CBC-ERIC In- formation Center not only makes it possible to create new subfiles, but to publish these subfiles. Thus, if there are a number of requests that could be answered by using the same document, it is possible to directly publish these documents using computer typesetting and a very inexpensive offset process. As of August, 1971 The Council for Excep- tional Children had 59 separate bibliographies which have been pub- lished in this manner. 
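The subfile idea described above can be sketched in a few lines: stored category queries are run once against the full file, and each matching document is copied into a topical subfile that can answer later requests without a new full-file search. This is a minimal illustration, not the actual BIRS programs; the document identifiers, indexing terms, and category names are hypothetical.

```python
def build_subfiles(documents, category_queries):
    """Partition a document file into topical subfiles.

    documents        -- list of (doc_id, set_of_index_terms)
    category_queries -- dict mapping category name -> set of required terms
    Returns a dict mapping category name -> list of matching doc_ids.
    """
    subfiles = {name: [] for name in category_queries}
    for doc_id, terms in documents:
        for name, required in category_queries.items():
            if required <= terms:          # all required terms present
                subfiles[name].append(doc_id)
    return subfiles

# Hypothetical file of three document surrogates and two stored queries.
documents = [
    (1, {"deaf", "reading", "instruction"}),
    (2, {"gifted", "acceleration"}),
    (3, {"deaf", "audiometry"}),
]
category_queries = {
    "aurally handicapped": {"deaf"},
    "gifted": {"gifted"},
}
print(build_subfiles(documents, category_queries))
# {'aurally handicapped': [1, 3], 'gifted': [2]}
```

Once built, a subfile (or a bibliography printed from it) answers every later request in its topic area at no additional search cost.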
The latest operating statistics indicate that approximately one half of all user requests for information are being answered by the use of one or more of these bibliographies.210 This procedure provides savings by:

1. Using a single search to organize information to answer many user requests.

2. Reducing personnel time for processing requests that can be answered by the printed bibliographies.

3. Reducing the cost of mailing a printed bibliography versus a computer printout abstract. (The printed bibliographies may have as many as ten abstracts on the same amount of paper required to print a single abstract on the computer.)

As used by the CEC-ERIC Information Center, the process of selected publication has provided a powerful technique for the organization of knowledge and a reduction of costs when compared with running individual computer searches. Thus, by analyzing user needs it is possible to use the computer to serve individuals collectively at a considerable reduction in cost as compared with using the computer to serve them individually.

This section has provided a brief overview of the Information Center operations which places the indexing process and evaluation in a total context. Because of their importance to the Information Center operation and their unique nature, the most detail has been provided about the printing of ECEA, the selected publication of annotated bibliographies, and their use in answering information requests.

210Statistical information based on an analysis by Carl Oldsen, ECEA Editor, and his staff.

Descriptive Statistics about the Present Operating Status of the CEC-ERIC Information Center

The previous section described the procedures presently used at the Information Center and provided a model for continuing evaluation, development and modification of the operating system. This section describes the following categories of statistical information related to the present operating status of the Center:

1.
The Center's holdings--types of documents and their subject content.

2. The rate of acquiring and processing documents.

3. Information request processing.

4. Operating costs.

The major objective of this section is to provide descriptive statistics about the rate and scope of operations under normal conditions. Many changes in processing have occurred since the Center began operation; however, the changes have become less frequent since the publication of Exceptional Child Education Abstracts began in April, 1969. For this reason the information discussed is primarily concerned with data gathered since the initial publishing of ECEA, with greater emphasis placed on the more recent data.

The descriptive statistics presented are taken from data collected by the Information Center to monitor its operations and from costs described in accounting or budgetary records.

The Center's Holdings--Types of Documents and Their Subject Content

It is the present policy of the Information Center that all documents acquired and processed will eventually become part of ECEA. Documents which were acquired by the Center before the publication of ECEA and felt appropriate for ECEA were included in Volumes I and II. The documents not used in ECEA were discarded; thus the Center's total holdings are described by the abstracts in issues of ECEA. An analysis of Volumes I and II of ECEA indicates that 36.9% of the abstracts are of journal articles, 12.4% of research reports, 5.6% of curriculum guides, and 45.1% of books and other non-periodic documents.

The information in Figure 3.8 describes the subject content of the 5,715 acquisitions that have abstracts in Volumes I and II of ECEA.
No document was assigned to more than one of the categories represented in Figure 3.8 even though it contained information concerning multiple categories.211

FIGURE 3.8
SUBJECT CONTENT DESCRIPTION OF INFORMATION CENTER HOLDINGS BASED ON 5715 ACQUISITIONS IN VOLUMES I & II OF ECEA

    Administration (AD)                    149
    Disadvantaged (DS)                     206
    Deaf & Hard of Hearing (DH)            663
    Emotionally Disturbed (ED)             612
    Gifted (GC)                            320
    Learning Disabilities (LD)             514
    Multiply Handicapped (MH)               91
    Mentally Retarded (MR)                 777
    Physically Handicapped (PH)            269
    Educable Mentally Retarded (EMR)       217
    Trainable Mentally Retarded (TMR)      114
    Psychology (PS)                        194
    Special Education (SE)                 412
    Speech Impaired (SI)                   406
    Visually Handicapped (VH)              372
    All Others (XX)                        399

211Statistical information based on an analysis of 5,715 acquisitions in Volumes I and II of ECEA performed by Carl Oldsen, ECEA Editor, and his staff.

Acquisition and Processing Rates

The early operations of the Information Center were not typical of normal processing rates because ordering and processing included documents found in the literature prior to the current year. Because of this, the first two volumes of ECEA not only contain material from 1969-1970, but also considerable information published before these years. Beginning with Volume III almost all the information abstracted is recent material.

Carl Oldsen, ECEA editor, indicates that the Center is attempting to examine all sources of documents potentially relevant to special education and that processing has reached a steady rate of about 250 documents per month. Of these 250 documents approximately 50 were
processed for inclusion in Research in Education (RIE) and between 50 and 75 for inclusion in Current Index to Journals in Education (CIJE). All of the documents processed are eventually included in ECEA; thus, in an average issue of ECEA, which appears quarterly, one may expect to find about:

1. 150 document surrogates which also appear in RIE.

2. 200-225 document surrogates which are indexed only in CIJE.

3. 400-425 document surrogates which appear in neither RIE nor CIJE.

Thus, of a total of 750 abstracts appearing in each issue of ECEA, about 600 do not appear in RIE; however, about 225 of the 600 are indexed in CIJE.

Information Request Processing Statistics

While the processing of additions to the Center's holdings appears to have reached a stable rate, the number of user requests for information appears to be increasing as more individuals learn about the Center's capabilities. During the year of 1970 approximately 6,400 requests for information were received and processed by the Center. Of these, about 21.4% were processed during the first quarter, 30.8% during the second quarter, 22.3% during the third quarter, and 25.5% during the fourth quarter.212

212CEC-ERIC Information Center, "Processing Costs & Formulas," an unpublished summary prepared by the Center, September, 1970, under the direction of Carl Oldsen.

During the first quarter of 1971, 2,176 information requests were processed, as compared to 1,380 during the first quarter of 1970. If it were assumed that the first quarter represented a fourth of the information requests that will be processed during 1971, the projected total would be about 8,700 requests, a projected increase of about 36% over the previous year. Even though the number of requests is increasing, the Center has been able to cope with this without increasing staff through greater use of the computer and the computer-generated bibliographies.
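The projection above is simple arithmetic; as a quick check, using the 1970 total and the first-quarter 1971 count given in the text:

```python
# Quick check of the projection described above: assume the first
# quarter accounts for a fourth of the year's information requests.
requests_1970 = 6400       # approximate total for calendar year 1970
requests_q1_1971 = 2176    # processed during the first quarter of 1971

projected_1971 = requests_q1_1971 * 4
increase_pct = (projected_1971 - requests_1970) / requests_1970 * 100
print(projected_1971, round(increase_pct))   # 8704 36
```

The projected total of 8,704 is the text's "about 8,700," and the increase over 6,400 rounds to 36%.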
Table 3.1 indicates the number of requests received, the type of responses made, and the type of users making the requests during the first quarter of 1971. There are more responses than requests because some requests have several questions which require different responses. The information related to user categories, types of responses, and the way in which requests were received appears to be similar to previous quarterly reports except for a general increase in all categories.

TABLE 3.1
AN ANALYSIS OF INFORMATION REQUESTS PROCESSED BY THE CEC-ERIC INFORMATION CENTER DURING THE FIRST QUARTER, 1971

Total Requests Made to Clearinghouse During Report Period:

                      Phone    Letter    Visits    TOTAL
    Jan.                 50       573        14      637
    Feb.                 57       601         7      665
    Mar.                 49       816         9      874
    Total for Qtr.      156     1,990        30    2,176

[Table 3.1 also reports, by month, the types of responses made (reference, nonsubject; reference, subject; spot bibliographies and literature searches; general questions on ERIC; other, including the mailing list) and a general breakdown of users (teachers; teacher educators; supervisors and consultants; psychologists and social workers; educational decision makers; research and development specialists; information professionals and dissemination specialists; professional organizations; students; federal government and public agencies; parents; unidentified).]

Processing Costs

The figures presented in this section have been taken from operations for the calendar year 1970, which included the majority of the processing for Volume II of ECEA. In determining the cost of various operations, all salaries, supervisory time, rental of office space, supporting services and miscellaneous overhead items were included. For example, to obtain the cost of acquisitioning documents, the total number of documents processed for Volume II of ECEA was divided into the total cost of salaries, purchasing the documents, supervisory overhead, and supportive clerical functions. The total cost of these functions, $36,150, divided by 3,615 documents results in an average cost of $10 per document.

Cost of Abstracting, Indexing, and Cataloging

Calculating abstracting, indexing, and cataloging costs as described above results in the following: (1) The cost of abstracting, indexing, and cataloging 3,615 documents (not including special processing for RIE or CIJE) averaged $22.20 per document. (2) The cost of special processing of 600 documents for use in RIE, statistical reports prepared for ERIC, and all supportive functions averaged $24.50 per document. (3) The special processing of 1,000 documents which were indexed for use in CIJE averaged $7.60 per document. (4) If the special processing related to CIJE and RIE were included in the total cost of abstracting, indexing, and cataloging, and distributed over the total 3,615 documents, the average cost of these functions was $27.50 per document.

Cost of Answering Search Requests

The cost of answering 6,000 search requests averaged $10 per request or $5 per response. The cost per response is lower because, on the average, a user's request for information requires two responses; i.e., an average of two questions are asked in a single user request for information. These costs include all personnel costs, overhead figures, computer costs for running searches, and mailing costs.
They do not include the cost of documents or special materials that were sent in response to requests. Obtaining an accurate value for these materials is difficult because many of them are obtained free; however, estimates suggest the value of materials sent averaged $3 to $5 per response.

Costs Related to Printing ECEA

Of particular interest to the computerized publication operation is the cost of printing ECEA. The cost based on a per-abstract average for printing 2,000 journals was reported as:213

    ECEA, per abstract:
        Keypunch                  $ 1.40
        Computer                    5.00
        Photodata*                  2.10
        Printing                    3.15
        Total                     $11.65

    ECEA printing:                $3.15 per abstract; $14.30 per page
    Photodata typesetting:        $2.10 per abstract; $8.00 per page
    Computer time:                $5.00 per abstract

Based on 3,615 abstracts and 2,000 copies of Volume II of ECEA, the printing costs average approximately $21 per single copy of one volume. Included in the computer costs were costs for building information files; generating special author, title, and subject indexes for ECEA; and building computer-searchable description files for use in answering user requests for information.

The figures presented are pessimistic in that they attempt to include every item that might possibly be related to the specified cost. In attempting to compare these figures with those of a similar operation, it would be necessary to have information about how both sets of figures were calculated, as well as information concerning the quality and quantity of services offered by the different information centers.

213Ibid.

*Computer-controlled typesetting.

Summary

The CEC-ERIC Information Center had its origin with the Educational Resources Information Center (ERIC) Network and was established and began operation early in 1967. Many of the procedures developed at the Information Center were influenced by the development of ERIC, including the indexing processes which use the ERIC Thesaurus.
In meeting the unique needs of this specific Information Center, procedures for selective publication utilizing computer searches and computer-controlled phototypesetting have been developed. These procedures augment the computer and hand searching ordinarily used in answering requests at information centers.

The number of documents being processed has become relatively stable at approximately 3,000 documents per year (all documents processed have abstracts in ECEA). The number of requests for information has been steadily increasing; however, there does not appear to be sufficient data to estimate the future rate of information request processing.

The costs which are described are those most common to information center processing or those unique because of the computerized publication done by the CEC-ERIC Information Center. The process involved in calculating these costs took a conservative approach which included all related and overhead costs.

CHAPTER IV: PROCEDURES USED IN THE EVALUATION AND ANALYSIS OF THE INFORMATION CENTER INDEXING METHODS

The first two of six objectives stated in Chapter I--to document the development of the information system used by the CEC-ERIC Information Center and to document the manner in which the CEC-ERIC Information Center uses the BIRS system and other computerized programs215--were accomplished in the previous chapter. The information provided in this documentation described the environment in which the third objective, the evaluation of the indexing methods used by the CEC-ERIC Information Center, took place. The objective of this chapter is to describe the procedures which were used in this evaluation, with the results and interpretation of the evaluation being reported in the following chapter. The remaining three objectives, related to recommendations for the Information Center's operation as well as implications for similar studies, are examined in the last chapter.
The procedures described in this chapter had two distinct phases which took place simultaneously. One phase involved the evaluation of the indexing methods used in Volume I of ECEA as determined by measures of Average Macroprecision, Average Microprecision, and estimates of average recall for questions written to retrieve randomly selected target documents.

215Additional information concerning the Information Center's computer processing is found in Appendix I.

A second phase of the procedures involved:

1. A comparison of the vocabulary of the terms used in indexing Volume I of ECEA with (a) the vocabulary of the collective titles of the document surrogates included in Volume I, and (b) the vocabulary of the Thesaurus developed by Samuel Price216 for indexing documents in special education.

2. An analysis of the frequency with which different indexing terms were assigned to the documents abstracted for Volume I of ECEA.

3. An analysis of ambiguity in assignment of terms having similar meanings.

4. The development of a subset of the ERIC Thesaurus to be used in indexing successive volumes of ECEA.

5. A preliminary evaluation of the effect of applying refined procedures and a subset of the ERIC Thesaurus to Volume II.

216Samuel T. Price, Thesaurus of Descriptors for an Information Retrieval System in the Subject Matter Area of Special Education (Normal, Illinois: Illinois State University, Special Education Instructional Materials Laboratory, 1970), pp. 1-465.

The Evaluation of the Indexing Procedures Used in Volume I of ECEA

Presently all requests for information which require computer searches are processed by one of several Information Center staff members who translates the user request into a computer-searchable question. These staff members and those involved in indexing and abstracting have become familiar with the terms in the ERIC Thesaurus which were used in indexing Volume I of ECEA. The computer indexing now used in preparing description files for searches uses terms assigned by the indexer and terms extracted from the titles. The BIRS programs have the ability to use other computer indexing methods where success in computer searching may be less dependent upon a person's knowledge of the types of terms assigned by the Center's indexers.

Questions Examined

The following questions were examined taking into consideration the above conditions:

Question 1  As measured by Average Macroprecision, Average Microprecision, and estimated average recall, how effective is the indexing method used by the Information Center for:

a. CEC-ERIC staff who are familiar with the Information Center's indexing system?
b. Professional educators who are unfamiliar with the Information Center's indexing system?

Question 2  How effective is a computerized indexing method using terms extracted from the titles and abstracts for:

a. CEC-ERIC staff who are familiar with the Information Center's indexing system?
b. Professional educators who are unfamiliar with the Information Center's indexing system?

Question 3  How effective is the indexing method used at the Information Center when combined with machine indexing of abstracts for:

a. CEC-ERIC staff who are familiar with the Information Center's indexing system?
b. Professional educators who are unfamiliar with the Information Center's indexing system?

These questions correspond to Questions 1, 2, and 3 on pages 8 and 9 of the introduction. In the computerized indexing methods considered, the following three sources of indexing terms were used:

Source 1  Indexing terms selected from the ERIC Thesaurus to describe the documents abstracted for Volume I of ECEA. These terms were manually selected by indexers and are called descriptors.
Source 2  Words extracted by the computer from the titles of the documents whose abstracts were included in Volume I of ECEA.

Source 3  Words extracted by the computer from the abstract (summary) of the document surrogates.

When the computer is used to extract terms from the title or abstract, all terms not appearing on an exclusion list containing such words as a, an, the, and, or, to, by, of, with, and other non-descriptive articles, adjectives, conjunctions or prepositions are included as indexing terms.

The three indexing methods employed in this evaluation used the following combinations of terms selected from the above sources:

Indexing Method 1  Terms from titles and descriptors.
Indexing Method 2  Terms from titles and abstracts.
Indexing Method 3  Terms from titles, abstracts, and descriptors.

Indexing Method 1 corresponds to the indexing procedures examined in Question 1, Indexing Method 2 to those examined in Question 2, and Indexing Method 3 to those examined in Question 3. The evaluation of these indexing methods used measures of Average Macroprecision, Average Microprecision, and estimates of average recall. These measures were used to compare the search results of questions written to retrieve target documents by CEC-ERIC staff versus those written by professional educators.

Selection of Target Documents

The data base for the evaluation of the indexing procedures used at the CEC-ERIC Information Center was the 2100 abstracts contained in Volume I of ECEA. (The information contained in each abstract is illustrated in Figure 3.9, page 133.) A stratified random sample of 105 documents was selected by using random number tables to specify five documents from each sequence of 100 documents. For example, five abstracts were randomly selected from abstracts 1 through 100, five from 101 to 200, five from 201 to 300, and so on.
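The extraction rule described above for Sources 2 and 3 can be sketched as follows. The exclusion list shown is a small illustrative sample; the actual list used by the BIRS programs is not reproduced here.

```python
import re

# Illustrative exclusion list only; the real BIRS list is longer.
EXCLUSION_LIST = {"a", "an", "the", "and", "or", "to", "by", "of",
                  "with", "in", "for", "on", "at"}

def extract_terms(text):
    """Return every word in the title or abstract that is not on the
    exclusion list; the survivors become the indexing terms."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in EXCLUSION_LIST]

title = ("An Evaluation of the Indexing Methods Employed "
         "in a Computerized Information System")
print(extract_terms(title))
# ['evaluation', 'indexing', 'methods', 'employed',
#  'computerized', 'information', 'system']
```

Every non-excluded word in the title (Source 2) or abstract (Source 3) thus becomes a searchable indexing term, with no human selection involved.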
The documents thus selected were placed on an information file which was indexed to create a computer-searchable description file for each of the three methods described in the previous section.

Preparation of Questions to Retrieve Target Documents

Two sets of 105 questions were written to retrieve the 105 target or source documents. The first set of questions was written by ten CEC-ERIC Information Center staff members who were familiar with the indexing procedures at the Center. The second set was written by seven professional educators from Andrews University who were not familiar with either the indexing procedures used at the Information Center or the ERIC Thesaurus which was used by indexers. Each of the professional educators had a doctor's degree in education and was familiar with the terminology used in the field of special education.
Both groups knew that multiple indexing methods had been employed in creating the computer searchable description files but were unaware of how those files were made.

The basic reason for having two groups write questions was to provide one set of questions generated by individuals familiar with the Center's indexing procedures (the CEC-ERIC Information Center staff members) and a second set by individuals who were not familiar with the indexing procedures. The evaluation was in no way meant to examine the skill of the individuals in writing questions. The training was designed to develop the skills of the two groups so that the major difference, as related to this study, was their degree of knowledge about the indexing language used at the Information Center.

Relevance Judgments

Three judges familiar with writing logical search questions and with educational literature were used to rate the relevance of the responses to specific questions. In rating the relevance they gave a rating of 0 if the document surrogate retrieved by a question had no relationship to that question, 1 if it had a moderate relationship to the question, and 2 if it had a very direct and obvious relationship to the question. When rating the relevance they were provided with the two sets of questions and a list of the documents each question retrieved. They were instructed to read a single abstract and then compare it with all questions which had retrieved that abstract; thus, Abstract 1 was read and rated for all questions retrieving Abstract 1, followed by Abstract 2, and so on.

The sum of the ratings of the three judges had possible values ranging from zero to six. An abstract was considered to be relevant to a question if the combined score of the judges was three or greater; if a document obtained a rating of exactly 3, it was counted as relevant only if that score was the result of three 1's.
Thus if a document obtained a rating of 3 by means of one judge rating it 0, a second rating it 1, and a third rating it 2, it was not considered relevant.

Measurement Techniques Employed in the Indexing Evaluation

The units of measurement used in this study were based on the retrieval results for the search questions. Each question was considered as a question written either by a person who was familiar or one who was not familiar with the Information Center's indexing methods. Aside from these differences all questions were considered to have been written by individuals with equivalent skills.

Figure 4.1 illustrates the type of data that was collected and the descriptive statistics used in measuring the effectiveness of the previously described indexing methods. The procedures used for calculating Average Microprecision and Average Macroprecision are described in the related research. The calculations of estimated average recall were obtained by dividing the number of target documents retrieved by the total number of target documents which could be retrieved. For example, if 80 of a possible 105 target documents were retrieved, the estimated recall would be 80 divided by 105.

For each indexing method a chi-square test was applied to determine (1) if the number of target documents retrieved with one set of search questions was significantly different from the number of target documents retrieved by the other set of questions and (2) if the ratio of relevant to non-relevant documents retrieved with one set of questions was significantly different from the ratio retrieved by the other set. The null hypothesis in each case asserted that there was no significant difference at the .01 level.

The Content Analysis of the Vocabulary Used in Indexing Volume I of ECEA

Because of the relationship which exists between the Information Center and the ERIC Network, the descriptive terms assigned by indexers have come from the ERIC Thesaurus.
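The relevance rule and the recall estimate described above can be expressed directly. This is a sketch of the stated rules only; the ratings and counts used in the examples are hypothetical.

```python
def is_relevant(ratings):
    """Three judges each rate a retrieved document 0, 1, or 2. The
    document is relevant if the combined score is 3 or more, except
    that a total of exactly 3 counts only when it is the result of
    three 1's (a 0 + 1 + 2 = 3 does not qualify)."""
    total = sum(ratings)
    if total > 3:
        return True
    if total == 3:
        return sorted(ratings) == [1, 1, 1]
    return False

def estimated_average_recall(targets_retrieved, total_targets):
    """E.g. 80 of a possible 105 target documents -> 80/105."""
    return targets_retrieved / total_targets

print(is_relevant([1, 1, 1]))   # True
print(is_relevant([0, 1, 2]))   # False
print(round(estimated_average_recall(80, 105), 2))  # 0.76
```

The asymmetry in the total-of-3 case reflects the study's requirement that a borderline score represent moderate agreement among all three judges rather than one strong and one zero rating.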
In addition to the terms from the Thesaurus, other "identifier" terms, which usually contained information such as names of institutions, names of specific tests, or geographic locations, were assigned.

[Figure 4.1: A description of the data and descriptive statistics used in comparing the various indexing methods. For each indexing method (1, 2, and 3) and each group (CEC-ERIC staff and professional educators), the figure lists the Total No. of Documents Retrieved, No. of Relevant Documents Retrieved, Average Microprecision, Average Macroprecision, No. of Target Documents Retrieved, and Estimated Average Recall.]

The ERIC Thesaurus was developed by a group of experts from many areas of education,217 with the result that there are often terms having closely related meanings. Because of the multiple possibilities for indexing the same concept, two indexers have often used different terms for indexing the same idea, thus making the retrieval of documents more complicated.

The way in which the ERIC Thesaurus was developed also raised questions about how well the terms selected from that Thesaurus to index Volume I of ECEA represent the vocabulary used in the field of special education. Specifically, the procedures in this section were developed to examine the question, "Is the vocabulary of the terms used in indexing Volume I of ECEA found in the literature of special education?"

Compilation of Indexing Terms Assigned to Volume I of ECEA

As a part of the indexing procedures used in preparing ECEA for publication, indexing terms are selected from the ERIC Thesaurus and assigned to the descriptors field.
A list of the terms assigned by the various indexers to abstracts contained in Volume I was compiled, including information concerning when each term was first used and the number of times the term was used. For example, "16--MEDICAL RESEARCH--166" would indicate that the term MEDICAL RESEARCH was used 16 times in the 2100 abstracts found in Volume I and that it was first used in Abstract No. 166.

217 James L. Eller and Robert L. Panek, "Thesaurus Development for a Decentralized Information Network," American Documentation, July, 1968, pp. 213-220.

Subjective Analysis by Indexers of Terms Used in Volume I of ECEA

The list of terms used in Volume I of ECEA served as the basis for a subjective analysis by the indexers. To assist in this analysis each term was put on a single IBM card with the information concerning the number of times the term was used and the abstract to which the term was first assigned. Also, a Key-Word-In-Context index was prepared to aid in grouping terms with similar meanings.

The indexers examined each term that had been assigned in Volume I of ECEA, comparing that term with similar terms. If the indexers could not establish significant differences in the meaning of similar terms, a decision was made concerning which of the terms should be kept on a list for use in future indexing. If a professional staff member felt that one term was generally used more often than the other in the special education vocabulary, this term was kept. If, however, there was no preference, the term which had been used most often in the indexing of Volume I was kept.
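The compilation described above pairs each descriptor with its frequency of use and the abstract in which it first appeared. A minimal sketch, using hypothetical term assignments:

```python
# Sketch of compiling, for each descriptor, the number of times it was
# used and the abstract in which it first appeared, producing entries
# like "16--MEDICAL RESEARCH--166". The assignments below are invented.
def compile_term_usage(assignments):
    """assignments: iterable of (abstract_number, term) pairs, in the
    order the terms were assigned. Returns {term: (count, first_abstract)}."""
    usage = {}
    for abstract_no, term in assignments:
        if term in usage:
            count, first = usage[term]
            usage[term] = (count + 1, first)
        else:
            usage[term] = (1, abstract_no)
    return usage

pairs = [(166, "MEDICAL RESEARCH"), (201, "EXCEPTIONAL CHILD RESEARCH"),
         (315, "MEDICAL RESEARCH")]
usage = compile_term_usage(pairs)
count, first = usage["MEDICAL RESEARCH"]
print(f"{count}--MEDICAL RESEARCH--{first}")  # 2--MEDICAL RESEARCH--166
```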
The result of these procedures was a subset of the terms used in indexing Volume I of ECEA which has served as an authority list for indexing successive volumes of ECEA.218

A Comparison of the Word Vocabulary Used in the Indexing Terms of Volume I of ECEA with Words Extracted from the Literature

Two lists of words extracted from the literature of special education were used for comparisons with the words found in the descriptive terms used in indexing Volume I of ECEA. The first list of words was the collective vocabulary found in the 2100 titles of documents abstracted for Volume I of ECEA. The second list of words was the vocabulary of the terms in a Thesaurus prepared for use in special education by Samuel Price. This Thesaurus was developed by using a computer to extract terms from the literature of special education, thus providing the Thesaurus an empirical base.219

218 Thesaurus for Exceptional Child Education (Arlington, Virginia: CEC-ERIC Information Center on Exceptional Children, 1971), 12 pp.

The words found in the titles and in the Thesaurus of Samuel Price were compared with the words in the indexing terms used by the Information Center by means of intersections of the various lists. Proper nouns, conjunctions, prepositions, and other non-descriptive function words were removed from the word lists before the comparisons were made. The letters A, B, C, and D are used as follows to stand for the lists involved in the comparisons or other types of analysis:

A = All content words found in the titles of Volume I of ECEA.
B = All content words found in the Thesaurus developed for special education by Samuel Price.
C = All content words found in the indexing terms assigned to documents in Volume I of ECEA.
D = All content words found in the reduced list of indexing terms developed through the analysis of the indexing terms used in Volume I of ECEA.

In addition to these word lists, lists of word roots were generated by a special computer program.
This program was able to take a list of words and reduce each word to a root which is used in computer searches involving the BIRS programs. For example, ACCELERATED, ACCELERATING, and ACCELERATION would all reduce to the root ACCELERATE. When one of the above lists is reduced to root form, it will be referred to with an R in front of the letter; thus the list of all roots from list A will be referred to as RA, from B as RB, from C as RC, and from D as RD.

219 Samuel T. Price, "The Development of a Thesaurus of Descriptors for an Information Retrieval System in Special Education" (unpublished doctoral dissertation, University of Pittsburgh, 1969), abstract.

Using this terminology, with the symbol ∩ standing for "intersection," ∪ standing for "union," and n(S) standing for the number of objects in the set S, the following ratios were examined:

1. n(A ∩ B) / n(A ∪ B)
2. n(RA ∩ RB) / n(RA ∪ RB)
3. (The number of words in A ∪ B which also have a root in RA ∩ RB) / n(A ∪ B)
4. n(A ∩ C) / n(A ∪ C)
5. n(RA ∩ RC) / n(RA ∪ RC)
6. (The number of words in A ∪ C which also have a root in RA ∩ RC) / n(A ∪ C)
7. n(B ∩ C) / n(B ∪ C)
8. n(RB ∩ RC) / n(RB ∪ RC)
9. (The number of words in B ∪ C which also have a root in RB ∩ RC) / n(B ∪ C)

The ratio of the number of terms found in the intersection to the number of terms found in the union of two lists was used as a basis for comparing the similarity of the vocabularies of the various lists. Specifically, the ratios involving the intersection of the two lists which were extracted from the literature were used as a basis for determining what proportion of similar words or roots might be expected in such intersections. This result was then compared with the results of other intersections involving the non-empirically-based words from the indexing terms used in Volume I of ECEA.
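The similarity ratios and the root reduction can be sketched as follows. The suffix-stripping rules here are a crude, assumed stand-in for the BIRS rooting program, which is not reproduced in the text, and the word lists are illustrative.

```python
# Sketch of the vocabulary-overlap measure: the similarity of two word
# lists is n(X ∩ Y) / n(X ∪ Y), computed on the words themselves and on
# their roots. The suffix rules below are a crude stand-in for the BIRS
# rooting program, chosen only so the ACCELERATE example works.
def root(word):
    for suffix, repl in (("ATION", "ATE"), ("ATING", "ATE"),
                         ("ATED", "ATE"), ("IES", "Y"), ("S", "")):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)] + repl
    return word

def overlap_ratio(x, y):
    """n(X ∩ Y) / n(X ∪ Y) for two sets of words."""
    x, y = set(x), set(y)
    return len(x & y) / len(x | y)

A = {"ACCELERATED", "READING", "CHILDREN"}
B = {"ACCELERATION", "READING", "TEACHERS"}
RA = {root(w) for w in A}
RB = {root(w) for w in B}
print(overlap_ratio(A, B))    # 0.2  (only READING shared among 5 words)
print(overlap_ratio(RA, RB))  # 0.5  (ACCELERATE and READING shared among 4 roots)
```

As the example shows, rooting can raise the measured similarity of two lists whose words differ only in inflection, which is the point of comparing ratios 1 and 2 (and their counterparts) in the study.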
Analysis of the Vocabulary Used in Writing Questions to Retrieve Target Documents

Two comparisons were made of the question vocabulary used by the professional educators with the question vocabulary used by the CEC-ERIC staff. The first examined the proportion of phrases (terms with more than one word) versus single-word terms. The second examined the proportion of question terms contained in the ERIC descriptors used by each group. A chi-square test was used to determine if there was a significant difference at the .01 level between the two groups for each of the comparisons. Because the professional educators had not seen the ERIC terms used to index Volume I of ECEA, their use of these terms would tend to support the assertion that these terms are used in the field of special education.

Analysis of Changes in Indexing Procedures Between Volume I and Volume II of ECEA

As a result of the analysis done by the indexers on the vocabulary used for indexing Volume I of ECEA, some changes in indexing procedures were implemented in Volume II. These procedures involved reducing the number of terms assigned to index a specific document and using a more controlled list of terms for indexing later portions of Volume II. The list of terms used was the subset developed in the analysis of the terms used in Volume I. The results of this analysis were not available until approximately the first half of Volume II had been indexed; thus the list was used only in the last half.

In indexing Volume I there had been a general rule that an indexer should assign any indexing term which was possibly related to the content of a document, even if the document contained only a small amount of information related to the term. The results of computer searches had indicated that many documents were retrieved which did not contain sufficient information about the question to be useful.
Subsequently, the indexers decided that in Volume II a term would not be assigned to a document unless the document contained considerable information related to that term.

This change was partially a result of the Information Center's use of computer searching to assist in answering user requests. When the information files were relatively small, all documents with any information about a topic might be sent. As the information files grew, those answering requests often read the abstracts and reduced the number sent to only those abstracts that contained considerable information about a requested topic.

To compare the effect of reducing the number of terms assigned to describe a document, identical searches were performed on Volume I and Volume II of ECEA. These searches were a part of the normal processing done to develop the selective bibliographies. The results of the searches were edited by the person in charge of producing the bibliographies to assure that only documents containing considerable information about the topic were retained. The person who wrote the questions and did the editing was unaware that the results would be used in an analysis of the indexing procedures. An analysis for each search compared the precision for documents extracted from Volume I with the precision of documents extracted from Volume II. A sign test was used in examining 21 searches to determine if the precision of documents retrieved from Volume I versus Volume II was significantly different at the .01 level.

Summary

The purposes of the procedures described in this section were to:

1. Determine how effective various computerized indexing methods available for use by the CEC-ERIC Information Center were with:

   a. Staff members familiar with the indexing language used at the Center.
   b. Professional educators who are not familiar with the indexing language.
2. Examine the vocabulary used in the indexing terms assigned to Volume I of ECEA to determine if it was similar to that which was extracted by various means from the literature of special education.

3. Determine whether changes in indexing procedures between Volumes I and II of ECEA had affected the precision of searches made on the two volumes.

The results of using these procedures are reported in the following chapter.

CHAPTER V: RESULTS OF THE EVALUATION AND ANALYSIS OF THE INFORMATION CENTER INDEXING METHODS

This chapter is divided into four major sections, each of which describes the results for a specific aspect of the total study. The procedures for the first section of this chapter are described in the first section of Chapter 4, while the procedures for the last three sections are discussed in the second section of Chapter 4.

The first section describes the results of an evaluation of the indexing method used in Volume I of ECEA and compares its effectiveness with two alternative methods. The measures of effectiveness of the three indexing methods are based on the computerized retrieval of randomly selected target documents. The computer search questions used to retrieve the documents were written by CEC-ERIC staff members familiar with the Center's indexing method and by professional educators who were unfamiliar with the Center's indexing method. The statistical measures used in the comparison are Average Macroprecision, Average Microprecision, and estimated average recall.

The second section describes the analysis of the indexing vocabulary based on a comparison of the vocabulary found in the ERIC terms used in indexing Volume I of ECEA with:

1. The vocabulary found in the collective titles of the document surrogates included in Volume I.

2.
The vocabulary found in a thesaurus developed by Samuel Price for indexing documents in special education.220

The third section describes a subjective analysis, done by indexers, of the ambiguity in the assignment of similar ERIC descriptors to Volume I of ECEA. This analysis resulted in the development of a subset of the ERIC Thesaurus which has been used in indexing successive volumes of ECEA.221

The fourth section describes the preliminary results of an evaluation of the effect of applying refined indexing procedures to Volume II of ECEA. This evaluation was based upon the precision of 20 search questions which were written to retrieve documents from both Volume I and Volume II. The searches were part of the Center's normal processing done to identify abstracts to be included in selected bibliographies.

A Comparative Evaluation of Three Indexing Methods

A detailed description of the procedures used to obtain the results described in this and the following sections is found in the previous chapter under appropriate subheadings. The procedures used in calculating Average Macroprecision, Average Microprecision, and estimated average recall are discussed on pages 63 through 70 of the review of related literature.

220 Samuel Price, Thesaurus of Descriptors for an Information Retrieval System in the Subject Matter Area of Special Education (Normal, Illinois: Illinois State University, Special Education Instructional Materials Laboratory, 1970), pp. 1-465.

221 Thesaurus for Exceptional Child Education (Arlington, Virginia: CEC-ERIC Information Center on Exceptional Children, 1971), 12 pp.

Questions Examined

The results reported in this section relate to three questions first stated on pages 8 and 9 of the introduction and later restated in an expanded form on page 133 of Chapter 4.
The specific questions examined are:

Question 1  As measured by Average Macroprecision, Average Microprecision, and estimated average recall, how effective is the indexing method used by the Information Center for:

a. CEC-ERIC staff who are familiar with the Information Center's indexing system?
b. Professional educators who are unfamiliar with the Information Center's indexing system?

Question 2  How effective is a computerized indexing method using terms extracted from the titles and abstracts for:

a. CEC-ERIC staff who are familiar with the Information Center's indexing system?
b. Professional educators who are unfamiliar with the Information Center's indexing system?

Question 3  How effective is the indexing method used at the Information Center when combined with machine indexing of abstracts for:

a. CEC-ERIC staff who are familiar with the Information Center's indexing system?
b. Professional educators who are unfamiliar with the Information Center's indexing system?

Indexing Methods Compared

The results relating to these three questions are found in Table 5.1, with specific portions of the results presented graphically in Figures 5.1, 5.2, and 5.3. In the Figures and Tables, indexing methods 1, 2, and 3 correspond to the indexing methods employed in Questions 1, 2, and 3. Specifically, the indexing methods may be defined as follows:

Indexing Method 1  This method used terms manually assigned from the ERIC Thesaurus by the indexers and terms extracted by the computer from the titles of the document surrogates.

Indexing Method 2  This method used terms extracted by the computer from the titles and abstracts of the document surrogates.

Indexing Method 3  This method used terms manually assigned from the ERIC Thesaurus by the indexers and terms extracted by the computer from the titles and abstracts of the document surrogates.
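The three description files differ only in which term sources they combine for each document. A sketch, assuming hypothetical title, abstract, and descriptor terms for a single document:

```python
# Sketch of how the three indexing methods combine term sources for one
# document. The title, abstract, and descriptor terms below are invented
# for illustration; the extraction step itself is not shown.
def index_document(title_terms, abstract_terms, descriptors, method):
    if method == 1:          # titles + assigned ERIC descriptors
        return set(title_terms) | set(descriptors)
    if method == 2:          # titles + abstracts
        return set(title_terms) | set(abstract_terms)
    if method == 3:          # titles + abstracts + descriptors
        return set(title_terms) | set(abstract_terms) | set(descriptors)
    raise ValueError("method must be 1, 2, or 3")

title = ["reading", "instruction"]
abstract = ["remedial", "reading", "programs", "children"]
descriptors = ["READING DIFFICULTY", "EXCEPTIONAL CHILD EDUCATION"]
print(sorted(index_document(title, abstract, descriptors, 1)))
```

Method 3's description file is, by construction, the union of the other two, which is why its recall can only meet or exceed each of them on the same questions.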
TABLE 5.1  DESCRIPTIVE STATISTICS RESULTING FROM THE EVALUATION OF THREE INDEXING METHODS

                                          CEC-ERIC   Professional
                                            Staff     Educators
METHOD 1 (Terms from Titles and Descriptors)
  Total No. of Documents Retrieved            209         143
  No. of Relevant Documents Retrieved         175         135
  Average Microprecision                     .836        .945
  Average Macroprecision                     .960        .945
  No. of Target Documents Retrieved            77          57
  Estimated Average Recall                   .732        .54

METHOD 2 (Terms from Titles and Abstracts)
  Total No. of Documents Retrieved            133         206
  No. of Relevant Documents Retrieved         102         191
  Average Microprecision                     .767        .927
  Average Macroprecision                     .900        .973
  No. of Target Documents Retrieved            38          81
  Estimated Average Recall                   .362        .77

METHOD 3 (Terms from Titles, Descriptors, and Abstracts)
  Total No. of Documents Retrieved            305         296
  No. of Relevant Documents Retrieved         227         265
  Average Microprecision                     .743        .895
  Average Macroprecision                     .905        .974
  No. of Target Documents Retrieved            85          84
  Estimated Average Recall                   .81         .80

[Figure 5.1: Number of target documents retrieved by the three indexing methods, for the CEC-ERIC staff and the professional educators.]

[Figure 5.2: Average Microprecision for the three indexing methods, for the CEC-ERIC staff and the professional educators.]

[Figure 5.3: Number of relevant documents retrieved by each indexing method, for the CEC-ERIC staff and the professional educators, with the number of target documents retrieved also shown.]

Results of the Comparison of Indexing Methods

The examination of the three previously stated questions implies three corollary questions which consider whether or not there is a significant difference between the effectiveness of the three indexing methods when the search results of questions written by the CEC-ERIC staff members are compared with the search results for questions written by professional educators. In examining these corollary questions a chi-square test of significance was used to test the following three null hypotheses:

Null Hypothesis 1  For indexing method 1 there is no significant difference at the .01 level between the observed and expected number of target documents retrieved from questions written by CEC-ERIC staff members versus questions written by professional educators.

Null Hypothesis 2  For indexing method 2 there is no significant difference at the .01 level between the observed and expected number of target documents retrieved from questions written by CEC-ERIC staff members versus questions written by professional educators.

Null Hypothesis 3  For indexing method 3 there is no significant difference at the .01 level between the observed and expected number of target documents retrieved from questions written by CEC-ERIC staff members versus questions written by professional educators.

In Tables 5.2, 5.3, and 5.4 the upper value in each cell having multiple data is the observed value, while the lower value is the expected value calculated by using the marginal totals. The value of chi-square necessary to reject the null hypothesis at the .01 level is 6.64. The values obtained for chi-square were such that the first two null hypotheses were rejected, while the third failed to be rejected.

TABLE 5.2  DATA AND CALCULATIONS USED IN TESTING NULL HYPOTHESIS 1

Statement of Null Hypothesis 1  For indexing method 1 there is no significant difference at the .01 level between the observed and expected number of target documents retrieved from questions written by CEC-ERIC staff members versus questions written by professional educators.

OBSERVED AND EXPECTED VALUES (observed / expected)

                           Target Documents   Target Documents
                              Retrieved        Not Retrieved    Totals
  CEC-ERIC Staff               77 / 67            28 / 38         105
  Professional Educators       57 / 67            48 / 38         105
  Totals                         134                 76            210

Cell contributions to chi-square: 1.49, 2.635, 1.49, 2.635; chi-square = 8.25.

Since the value of chi-square is greater than 6.64, Null Hypothesis 1 is rejected at the .01 level.

TABLE 5.3  DATA AND CALCULATIONS USED IN TESTING NULL HYPOTHESIS 2

Statement of Null Hypothesis 2  For indexing method 2 there is no significant difference at the .01 level between the observed and expected number of target documents retrieved from questions written by CEC-ERIC staff members versus questions written by professional educators.

OBSERVED AND EXPECTED VALUES (observed / expected)

                           Target Documents   Target Documents
                              Retrieved        Not Retrieved    Totals
  CEC-ERIC Staff               38 / 59.5          67 / 45.5        105
  Professional Educators       81 / 59.5          24 / 45.5        105
  Totals                         119                 91             210

Cell contributions to chi-square: 7.8, 10.2, 7.8, 10.2; chi-square = 36.0.

Since the value of chi-square is greater than 6.64, Null Hypothesis 2 is rejected at the .01 level.

TABLE 5.4  DATA AND CALCULATIONS USED IN TESTING NULL HYPOTHESIS 3

Statement of Null Hypothesis 3  For indexing method 3 there is no significant difference at the .01 level between the observed and expected number of target documents retrieved from questions written by CEC-ERIC staff members versus questions written by professional educators.

OBSERVED AND EXPECTED VALUES (observed / expected)

                           Target Documents   Target Documents
                              Retrieved        Not Retrieved    Totals
  CEC-ERIC Staff               85 / 84.5          20 / 20.5        105
  Professional Educators       84 / 84.5          21 / 20.5        105
  Totals                         169                 41             210

Cell contributions to chi-square: .003, .00125, .003, .00125; chi-square = .0085.

Since the value of chi-square is less than 6.64, Null Hypothesis 3 is not rejected.

A second set of corollary questions implied by the evaluation considered whether there was a statistically significant difference in the Average Microprecision resulting from search questions written by CEC-ERIC staff members as compared with search questions written by professional educators. The three null hypotheses implied by this set of corollary questions are:

Null Hypothesis 4  For indexing method 1 there is no significant difference at the .01 level between the observed and expected number of relevant documents retrieved from questions written by CEC-ERIC staff members versus questions written by professional educators.

Null Hypothesis 5  For indexing method 2 there is no significant difference at the .01 level between the observed and expected number of relevant documents retrieved from questions written by CEC-ERIC staff members versus questions written by professional educators.

Null Hypothesis 6  For indexing method 3 there is no significant difference at the .01 level between the observed and expected number of relevant documents retrieved from questions written by CEC-ERIC staff members versus questions written by professional educators.

The data used in testing null hypotheses 4 through 6 and the resulting values of chi-square may be found in Tables 5.5, 5.6, and 5.7. As may be noted, all the values for chi-square are greater than 6.64. Thus, null hypotheses 4 through 6 were rejected at the .01 level of significance.
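The chi-square computation used in these tables (expected cell values from the marginal totals, each cell contributing (observed - expected)^2 / expected, compared against the 6.64 critical value for one degree of freedom at the .01 level) can be sketched as follows; the example reproduces Table 5.2 within rounding.

```python
# 2x2 chi-square as used in testing the null hypotheses: expected cell
# values come from the marginal totals, and each cell contributes
# (observed - expected)^2 / expected to the statistic.
def chi_square_2x2(table):
    """table: [[a, b], [c, d]] observed frequencies."""
    row = [sum(r) for r in table]
    col = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    n = row[0] + row[1]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2

CRITICAL_01 = 6.64  # chi-square, 1 degree of freedom, .01 level

# Table 5.2: target documents retrieved / not retrieved by each group.
observed = [[77, 28], [57, 48]]
chi2 = chi_square_2x2(observed)
print(round(chi2, 2), chi2 > CRITICAL_01)  # 8.25 True -> reject Null Hypothesis 1
```

Running the same function on the Table 5.4 observed values ([[85, 20], [84, 21]]) gives a statistic well below 6.64, matching the failure to reject Null Hypothesis 3.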
Factors Important to the Analysis of Data Resulting from the Comparison of Indexing Methods

In analyzing the results there were three factors, with their interactions, which were considered. These factors were the differences between the two groups writing questions, the conditions under which the evaluation took place, and the vocabulary used in the questions written by each group. While all the differences in the comparisons would have to be attributed to the types of search questions which were written, the specific questions which must be answered in attempting to interpret the results are these:

1. Are there any observable differences in the search questions written by the two groups?

2. Can these differences be attributed to differences within the groups or to the conditions under which the evaluation took place?

TABLE 5.5  DATA AND CALCULATIONS USED IN TESTING NULL HYPOTHESIS 4

Statement of Null Hypothesis 4  For indexing method 1 there is no significant difference at the .01 level between the observed and expected number of relevant documents retrieved from questions written by CEC-ERIC staff members versus questions written by professional educators.

OBSERVED AND EXPECTED VALUES (observed / expected)

                           Relevant Documents   Non-Relevant Documents
                               Retrieved             Retrieved         Totals
  CEC-ERIC Staff               175 / 184               34 / 25           209
  Professional Educators       135 / 126                8 / 17           143
  Totals                          310                     42             352

Cell contributions to chi-square: .44, 3.24, .64, 4.78; chi-square = 9.10.

Since the value of chi-square is greater than 6.64, Null Hypothesis 4 is rejected at the .01 level.

TABLE 5.6  DATA AND CALCULATIONS USED IN TESTING NULL HYPOTHESIS 5

Statement of Null Hypothesis 5  For indexing method 2 there is no significant difference at the .01 level between the observed and expected number of relevant documents retrieved from questions written by CEC-ERIC staff members versus questions written by professional educators.

OBSERVED AND EXPECTED VALUES (observed / expected)

                           Relevant Documents   Non-Relevant Documents
                               Retrieved             Retrieved         Totals
  CEC-ERIC Staff               102 / 116               31 / 17           133
  Professional Educators       191 / 177               15 / 29           206
  Totals                          293                     46             339

Cell contributions to chi-square: 1.69, 11.54, 1.11, 6.76; chi-square = 21.10.

Since the value of chi-square is greater than 6.64, Null Hypothesis 5 is rejected at the .01 level.

TABLE 5.7  DATA AND CALCULATIONS USED IN TESTING NULL HYPOTHESIS 6

Statement of Null Hypothesis 6  For indexing method 3 there is no significant difference at the .01 level between the observed and expected number of relevant documents retrieved from questions written by CEC-ERIC staff members versus questions written by professional educators.

OBSERVED AND EXPECTED VALUES (observed / expected)

                           Relevant Documents   Non-Relevant Documents
                               Retrieved             Retrieved         Totals
  CEC-ERIC Staff               227 / 250               78 / 55           305
  Professional Educators       265 / 242               31 / 54           296
  Totals                          492                    109             601

Cell contributions to chi-square: 2.15, 9.62, 2.18, 9.62; chi-square = 23.75.

Since the value of chi-square is greater than 6.64, Null Hypothesis 6 is rejected at the .01 level.

Groups Writing Search Questions  The two groups were selected and trained so that the major difference between the groups would be that the CEC-ERIC staff was familiar with the ERIC Thesaurus and the indexing procedures used at the Information Center, while the professional educators were not familiar with the Thesaurus or the procedures.

Conditions under Which the Evaluation Took Place  When the evaluation took place the following conditions existed:

1. Neither group was aware of the computer indexing method that would be used to make the documents computer searchable.

2. Both groups were given instructions to use each target document (a title and abstract) to write one question that would retrieve that document or similar documents.

3. Both groups were aware that methods could be used which might generate indexing terms that would not be part of the title or abstract used as the basis for writing the search question.
4. Both groups understood that the evaluation was based upon the retrieval of documents which were similar to the descriptions (titles and abstracts). They had no assurance that a specific target document would or would not be in each file searched.

Analysis of the Question Vocabulary

The reason for examining the question vocabulary of the two groups was to determine if there was evidence to support the position that a major factor affecting the results was the CEC-ERIC staff's knowledge of the ERIC Thesaurus and the procedures used to index ECEA. Two questions were considered in this analysis. The first was "What proportion of the terms used in the questions written by the CEC-ERIC staff and the professional educators were phrases (more than one word)?" The second was "What proportion of the terms used in questions written by the CEC-ERIC staff and the professional educators were found in the ERIC Thesaurus?"

A term was defined to be contained in the ERIC Thesaurus if it was identical to a term in the Thesaurus or it was contained within a term in the Thesaurus. For example, if an individual used MENTALLY HANDICAPPED and the term EDUCABLE MENTALLY HANDICAPPED was in the ERIC Thesaurus, the term MENTALLY HANDICAPPED would be considered as contained in the Thesaurus. The two null hypotheses implied by these questions are:

Null Hypothesis 7  There is no significant difference at the .01 level between the observed and expected number of phrases used in questions written by CEC-ERIC staff versus professional educators.

Null Hypothesis 8  There is no significant difference at the .01 level between the observed and expected number of terms contained in the ERIC Thesaurus which were used in questions written by CEC-ERIC staff versus professional educators.

As indicated in Tables 5.8 and 5.9, both hypotheses were rejected at the .01 level. An examination of the questions and the data used in the calculation of the two null hypotheses indicates that:

1.
Both groups used words and phrases as question terms in the search questions which they wrote.

2. The professional educators used a significantly higher proportion of single-word terms than did the CEC-ERIC staff.

3. The CEC-ERIC staff used a significantly higher ratio of terms contained in the ERIC Thesaurus. About 95% of the terms used by the CEC-ERIC staff were contained in the Thesaurus versus only 65% of those used by professional educators.

Analysis of the Indexing Vocabulary Used in Volume I of ECEA

The indexing terms assigned to describe abstracts contained in Volume I of ECEA were selected from the ERIC Thesaurus. Because the ERIC Thesaurus embraces many topical areas of education,222 there exists a question as to how well its vocabulary reflects the literature in any of the specific areas. The objective of this section is to examine the question, "To what extent do the terms selected from the ERIC Thesaurus to index Volume I of ECEA reflect the vocabulary used in the literature of special education?"

222James L. Eller and Robert L. Panek, "Thesaurus Development for a Decentralized Information Network," American Documentation, July, 1968, pp. 213-220.

TABLE 5.8
DATA AND CALCULATIONS USED IN TESTING NULL HYPOTHESIS 7

Statement of Null Hypothesis 7

There is no significant difference at the .01 level between the observed and expected number of phrases used in questions written by CEC-ERIC staff versus professional educators.

OBSERVED AND EXPECTED VALUES
(observed above expected in each cell)

                              Words--Terms      Phrases--Terms
                              of Word           of Word Length
                              Length 1          2 or Greater      Totals

Questions by                    184                273              457
Staff                           205                252

Questions by                    203                203              406
Professional Educators          182                224

Totals                          387                476              863

Cell Contributions to Chi-Square

    2.15    1.75
    2.42    1.97

X2 = 8.29

Since the value of chi-square is greater than 6.64, Null Hypothesis 7 is rejected at the .01 level.

TABLE 5.9
DATA AND CALCULATIONS USED IN TESTING NULL HYPOTHESIS 8

Statement of Null Hypothesis 8

There is no significant difference at the .01 level between the observed and expected number of terms contained in the ERIC Thesaurus which were used in questions written by CEC-ERIC staff versus professional educators.

OBSERVED AND EXPECTED VALUES
(observed above expected in each cell)

                              Question Terms     Question Terms
                              Found in ERIC      Not Found in
                              Thesaurus          ERIC Thesaurus    Totals

Staff                           435                 22               457
                                371                 86

Professional                    266                140               406
Educators                       330                 76

Totals                          701                162               863

Cell Contributions to Chi-Square

    11.0    53.5
    12.4    62.0

X2 = 138.9

Since the value of chi-square is greater than 6.64, Null Hypothesis 8 is rejected at the .01 level.

Two sources of vocabulary were used as a basis for examining the indexing terms assigned from the ERIC Thesaurus to describe the abstracts in Volume I of ECEA. These sources were the vocabulary found in the collective titles of the abstracts in Volume I of ECEA and the terms contained in a thesaurus prepared by Samuel Price. This Thesaurus defined the words and language of special education by conducting a five-year retrospective search of the professional literature in that field.223

223Samuel T. Price, "The Development of a Thesaurus of Descriptors for an Information Retrieval System in Special Education" (unpublished doctoral dissertation, University of Pittsburgh, 1969), abstract.

Notation

Part of the following notation has been defined in previous chapters, but is repeated here for the convenience of the reader.

Content Word = A word which is not found in an exclusion list containing articles, prepositions, conjunctions, and other words which have been subjectively determined not to have content.

A = The set of all content words found in one or more titles of the abstracts in Volume I of ECEA.

B = The set of all content words found in one or more terms of the Samuel Price Thesaurus.

C = The set of all content words found in one or more of the ERIC descriptors used to index abstracts in Volume I of ECEA.

A ∪ B = The union of sets A and B.

A ∩ B = The intersection of sets A and B.

n(S) = The number of elements contained in the set S.
RS = The set of all roots of the words which belong to the set S.

∈ = "is an element of," or "is a member of."

| = "such that."

{ } = Symbols used to set apart (embrace) a set.

In addition to the above notation, AB will stand for a special set such that for a word to belong to this set it first must be found in either A or B and second must have a root which is found in those roots common to both A and B. Where w stands for any word and r(w) stands for its root, the set AB may be more precisely defined as follows:

AB = {w | w ∈ (A ∪ B) and r(w) ∈ (RA ∩ RB)}

Results of Vocabulary Comparisons of Three Word Lists

Because the Thesaurus developed by Samuel Price was based on a retrospective search of five years' literature in special education,224 it was used as a criterion vocabulary. The comparisons examined the proportion of words in two lists which had roots that were the same as words contained in Samuel Price's Thesaurus. The specific problem was to find a basis for evaluating the results of comparisons of the words in the ERIC descriptors used to index Volume I of ECEA with the words in the Samuel Price Thesaurus. To accomplish this, a second list of words extracted from the literature of special education was used to estimate what proportion of words from two lists extracted from the literature might be expected to have similar roots.

224Ibid.

A set of words available and appropriate for this comparison was the set of words found in the titles in Volume I of ECEA. In making the comparison and analyzing the results a chi-square test was used to examine the following null hypothesis:

Null Hypothesis 9

There is no significant difference at the .01 level between the observed and expected number of words in set A with roots in set RB versus the number of words in set C with roots in set RB.
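Null Hypothesis 9, like every other hypothesis in this study, is decided by a 2 × 2 chi-square test with one degree of freedom (critical value 6.64 at the .01 level): expected cell values are computed from the row and column totals, and each cell contributes the squared deviation of observed from expected, divided by expected. A minimal sketch of the computation behind Tables 5.7 through 5.12, using the observed counts from Table 5.10; small differences from the printed cell contributions reflect rounding of the expected values in the original tables.

```python
def chi_square_2x2(observed):
    """Pearson chi-square for a 2x2 contingency table.

    observed: [[a, b], [c, d]] -- counts for two groups by two outcomes.
    Returns (chi2, expected), where expected holds the cell values
    implied by the row and column totals.
    """
    (a, b), (c, d) = observed
    row = [a + b, c + d]
    col = [a + c, b + d]
    total = sum(row)
    expected = [[row[i] * col[j] / total for j in range(2)] for i in range(2)]
    chi2 = sum(
        (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(2)
        for j in range(2)
    )
    return chi2, expected

# Observed values from Table 5.10: words in sets A and C with and
# without roots in set RB.
chi2, exp = chi_square_2x2([[1111, 1300], [888, 862]])
# With one degree of freedom, the .01 critical value is 6.64, so the
# null hypothesis is rejected whenever chi2 > 6.64.
```

Running this on the Table 5.10 counts gives a chi-square near the printed 8.71, well above the 6.64 cutoff.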
The analysis of Null Hypothesis 9 found in Table 5.10 resulted in its rejection, thus indicating that there was a significant difference in the number of words in set A having roots in the Samuel Price Thesaurus versus words in set C having roots in the Samuel Price Thesaurus. An examination of the observed and expected values used in calculating the total for chi-square indicated that there were more words from list C (words found in the terms selected from the ERIC Thesaurus to be used in indexing Volume I of ECEA) having roots in the Samuel Price Thesaurus than were expected.

Other analyses and comparisons of the vocabulary of word lists A, B, and C provided the following results:

Result 1:  n(A) = 2411
Result 2:  n(RA) = 1761
Result 3:  n(RA)/n(A) = .73
Result 4:  n(B) = 1660
Result 5:  n(RB) = 1426
Result 6:  n(RB)/n(B) = .87
Result 7:  n(C) = 1750

TABLE 5.10
DATA AND CALCULATIONS USED IN TESTING NULL HYPOTHESIS 9

Statement of Null Hypothesis 9

There is no significant difference at the .01 level between the observed and expected number of words in set A with roots in set RB versus the number of words in set C with roots in set RB.

OBSERVED AND EXPECTED VALUES
(observed above expected in each cell)

                      Words With          Words With Roots
                      Roots in Set RB     Not in Set RB      Totals

Words in Set A          1111                1300               2411
                        1158                1253

Words in Set C           888                 862               1750
                         841                 909

Totals                  1999                2162               4161

Cell Contributions to Chi-Square

    1.90    1.76
    2.62    2.43

X2 = 8.71

Since the value of chi-square is greater than 6.64, Null Hypothesis 9 is rejected at the .01 level.

Result 8:  n(RC) = 1404
Result 9:  n(RC)/n(C) = .80
Result 10: n(A ∩ B) = 715
Result 11: n(RA ∩ RB)/n(RA ∪ RB) = 670/2517 = .266
Result 12: n(AB)/n(A ∪ B) = 1247/3356 = .372
Result 13: n(A ∩ C)/n(A ∪ C) = 1039/3185 = .326
Result 14: n(RA ∩ RC)/n(RA ∪ RC) = 869/2296 = .379
Result 15: n(AC)/n(A ∪ C) = 1613/3185 = .506
Result 16: n(B ∩ C)/n(B ∪ C) = 646/2764 = .234
Result 17: n(RB ∩ RC)/n(RB ∪ RC) = 623/2207 = .282
Result 18: n(BC)/n(B ∪ C) = 1009/2764 = .365

A Subjective Analysis of Terms Selected from the ERIC Thesaurus to Index Volume I of ECEA

Individuals doing computer and hand searches on the information contained in Volume I of ECEA became aware of a lack of precision resulting from the ambiguous assignment of similar indexing terms. Some of the indexers suggested that one reason for this ambiguity was that often the ERIC Thesaurus contained several terms which could be used for indexing the same concept. As a result of these observations, a decision was made to have the indexers subjectively examine the indexing terms assigned to Volume I of ECEA. The result of this examination was a subset of terms used in Volume I which is now serving as a basis for indexing successive volumes. The major objective of this section is to examine the vocabulary of the subset of terms which resulted from the indexing staff's subjective evaluation of the terms used in indexing Volume I of ECEA.

In the subjective analysis of the ERIC descriptors used in Volume I of ECEA, indexers compared each term with other terms which they felt had similar meanings. When the distinction between the concepts suggested by similar terms was unclear, the indexers examined abstracts to which the terms had been assigned. If it was still not apparent why one term should have preference in indexing a particular concept, the term retained was the one a professional staff member judged to be most common to the literature of special education. Occasionally, when no decision could be made on this basis, it was decided to retain the term which had been used most frequently in indexing the documents of Volume I.
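Results 10 through 18 all reduce to two operations on word sets: an intersection-over-union ratio, and the construction of the special sets AB, AC, and BC from roots common to both lists. A minimal sketch follows; the small word lists and the suffix-stripping root function are illustrative stand-ins invented for the example, since the BIRS root-reduction algorithm and the full word lists are not reproduced here.

```python
def overlap_ratio(s, t):
    """n(S ∩ T) / n(S ∪ T): shared words over the combined vocabulary."""
    return len(s & t) / len(s | t)

def common_root_words(s, t, root):
    """The set ST of Chapter V: words in S ∪ T whose root lies in RS ∩ RT."""
    shared_roots = {root(w) for w in s} & {root(w) for w in t}
    return {w for w in s | t if root(w) in shared_roots}

# Illustrative (hypothetical) word lists and a toy root function that
# strips a final "s"; the actual BIRS reduction is more elaborate.
A = {"handicapped", "children", "learning", "tests"}
B = {"handicapped", "child", "curriculum", "test"}
root = lambda w: w[:-1] if w.endswith("s") else w

ratio = overlap_ratio(A, B)          # exact-word overlap, as in Results 10-18
ab = common_root_words(A, B, root)   # words whose root appears in both lists
```

Here "tests" and "test" count as sharing a root, so AB captures more of the vocabulary than the exact-word intersection does, which is the effect the chapter's Results 12, 15, and 18 measure.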
Results of Subjective Evaluation of the ERIC Descriptors Used in Volume I of ECEA

This subjective evaluation resulted in removing 1126 descriptors and leaving 1318 descriptors of the original list of 2444 descriptors. The original list of descriptors had 1750 words containing 1440 roots. The remaining list had 1061 unique words with 862 roots.

An analysis of the terms assigned by the indexers to index the abstracts in Volume I of ECEA is found in Table 5.11. This analysis shows the number of terms used once, twice, three times, etc. which were (a) left after the indexers examined the terms, (b) removed when the indexers examined the terms, and (c) the number of identifiers assigned.

TABLE 5.11
RESULTS OF INDEXERS' SUBJECTIVE ANALYSIS OF TERMS USED TO INDEX VOLUME I OF ECEA

*Ratio = number of ERIC descriptors retained divided by the number of ERIC descriptors assigned.

Number of Times
a Term Was          Number of     ERIC           ERIC
Assigned to         Identi-       Descriptors    Descriptors
Volume I            fiers         Retained       Removed       Total    *Ratio

 1                    403            218            463         1084      .32
 2                     56            139            184          379      .43
 3                     24            125            137          286      .48
 4                      9             76             75          160      .50
 5                      8             69             47          124      .59
 6                      5             51             36           92      .53
 7                      3             49             28           80      .64
 8                      4             41             19           64      .68
 9                      1             34             22           57      .60
10                      1             38             12           51      .76
11                      2             29             10           41      .74
12                      3             25              9           37      .74
13                      1             25              7           33      .78
14                                    21              8           29      .72
15                                    26              9           35      .81
16                                    20              7           27      .74
17                                    14              3           17      .82
18                                    16              2           18      .89
19                                    14              5           19      .74
20                                    13              2           15      .87
21                      3             11              6           20      .64
22                                     7              3           10      .70
23                                     8              2           10      .80
24                                    11              2           13      .85
25                                     2                           2     1.0
26                                    13              1           13      .93
27                                     4              1            5      .80
28                                    11              3           14      .79
29                                    12              1           13      .92
30                                     6              2            8      .75
31                                     4              1            5      .80
32                                    11                          11     1.0
33                                     4                           4     1.0
34                                    10              1           11      .91
35                                     8              2           10      .80
36                                     5              1            6      .83
37                                     3              2            5      .60
38                                     4              1            5      .80
39                                     3                           3     1.0
40                                     3              1            4      .75

[The rows for terms assigned 41 through 710 times are illegible in the source copy and are omitted here.]

Totals                523           1318           1126         2967      .534

Identifiers were special indexing terms which were not in the ERIC Thesaurus but were assigned because of special meaning to a particular document. For example, an identifier might be the name of a test or institution.

The results of this analysis indicated that if a term had been used one, two, or three times, the probability of its being removed in the subjective evaluation by the indexers was greater than the chance of its remaining. All terms which had been used four or more times had a greater probability of remaining than of being removed. The results shown in Table 5.11 have some similarity to Luhn's suggestion that terms used very frequently and terms used very infrequently have little indexing value. In this case the indexers' subjective judgment concerning the value of indexing terms supports the part of Luhn's suggestion that terms used infrequently have little indexing value.225

The Effect of Indexing Procedure Changes in Volume II of ECEA

Two major changes were made during the indexing of Volume II of ECEA. In Volume I the indexers had been told to assign a descriptor to describe an abstract even if the abstract contained very little information related to the descriptor. This resulted in individuals retrieving abstracts which had very little information related to their question.
As a result of this observation, the decision was made that in future volumes a descriptor would not be assigned to an abstract unless the document contained considerable information related to the concept described by the indexing term.

225H. P. Luhn, "The Automatic Creation of Literature Abstracts," IBM Journal of Research and Development, II (1958), 159-165.

A second change, initiated part way through Volume II, was the use of an authority list226 developed as a result of the indexers' subjective evaluation of the ERIC descriptors used in Volume I. The reason the list was not used sooner was that the subjective evaluation took place during the time that the first two issues of Volume II were being indexed.

226Thesaurus for Exceptional Child Education (Arlington, Virginia: CEC Information Center on Exceptional Children, 1970), pp. 1-10.

Results of Indexing Procedure Changes

In an initial attempt to evaluate the effect of the change in indexing procedures, a set of 20 questions was used in searching both Volume I and Volume II, with separate values of Average Microprecision and Average Macroprecision being calculated for each volume. The questions were written by one of the staff members to retrieve documents for use in a bibliography series. For each question the staff member determined which of the documents retrieved had considerable information about the question and which did not. The documents retained then served as a basis for developing special bibliographies about that topical area. At the time that the evaluation was made the staff member was unaware that the results would be used in the evaluation.

Retrieval results for Volume I showed an Average Microprecision of .54 and an Average Macroprecision of .60. For Volume II the Average Microprecision was .60 and the Average Macroprecision .68. A chi-square
test of statistical significance was used to examine the following null hypothesis:

Null Hypothesis 10

There is no significant difference between the observed and expected number of relevant documents retrieved by 20 search questions from Volume I versus the number retrieved by the same questions from Volume II.

The data and calculations used in examining this hypothesis are found in Table 5.12. As may be observed, the hypothesis was rejected at the .01 level. An examination of the data reveals that more relevant documents than expected were retrieved from Volume II.

The data resulting from each of the 20 questions are found in Table 5.13. As revealed by the data in this table, the precision was greater for documents retrieved from Volume II than from Volume I for every question except Number 7, and in this question the precision was almost equal: .97 for Volume I and .966 for Volume II. In other words, in 19 out of 20 questions there was a greater precision for documents retrieved from Volume II than from Volume I. The probability of this happening by accident is less than one in fifty thousand.

Summary

An examination of the data indicated the following results:

1. CEC-ERIC staff had significantly better retrieval of target documents than professional educators when the indexing method used terms from titles and ERIC descriptors.

2. The professional educators had significantly better retrieval of target documents when the indexing method used terms from titles and abstracts.

TABLE 5.12
DATA AND CALCULATIONS USED IN TESTING NULL HYPOTHESIS 10

Statement of Null Hypothesis 10
There is no significant difference between the observed and expected number of relevant documents retrieved by twenty search questions from Volume I versus the number retrieved by the same questions from Volume II.

OBSERVED AND EXPECTED VALUES
(observed above expected in each cell)

                 Relevant    Not Relevant    Totals

Volume I           1449          1226          2675
                   1546          1129

Volume II          2243          1469          3712
                   2146          1566

Totals             3692          2695          6387

Cell Contributions to Chi-Square

    6.08    8.34
    4.38    6.00

X2 = 24.80

Since the value of chi-square is greater than 6.64, Null Hypothesis 10 is rejected at the .01 level.

TABLE 5.13
SEARCH RESULTS OF TWENTY QUESTIONS USED ON VOLUME I AND VOLUME II OF ECEA

                RESULTS FOR VOLUME I               RESULTS FOR VOLUME II
Question    Documents   Documents                Documents   Documents
No.         Retrieved   Relevant    Precision    Retrieved   Relevant    Precision

 1              10           9        .9             18          17        .94
 2             171         122        .71           301         240        .798
 3              13          12        .92            13          13       1.0
 4              24          20        .83            29          27        .93
 5             189          92        .487          300         179        .596
 6             256         112        .438          287         144        .502
 7             108         105        .97           115         111        .966
 8             144          48        .333          264         108        .41
 9             154          91        .592          155          97        .623
10             126          61        .484          145          81        .56
11             186          40        .216          219          59        .27
12             232          90        .388          338         174        .516
13              37           9        .244           55          34        .63
14              88          55        .625          115          87        .756
15             191         111        .582          278         199        .718
16             129          84        .65           169         122        .722
17             247         133        .538          360         140        .388
18             114         100        .876          219         194        .885
19             154          76        .494          191         104        .545
20             102          79        .774          141         113        .801

Totals        2675        1449      12.051         3712        2243      13.556

3. There was no significant difference in the retrieval of target documents when the indexing method used terms from titles, abstracts, and ERIC descriptors.

4. Professional educators had significantly better Average Microprecision for all three indexing methods than did the CEC-ERIC staff.

5. Professional educators used significantly more single-word terms in questions than did CEC-ERIC staff.

6. CEC-ERIC staff used significantly more terms found in the ERIC Thesaurus than did the professional educators.

7.
The subjective analysis by the CEC-ERIC indexers of the ERIC descriptors used in indexing Volume I of ECEA resulted in

a. The reduction of 2444 ERIC descriptors having 1750 words and 1440 roots to 1318 ERIC descriptors having 1061 words and 862 roots.

b. More terms which were used 1, 2, or 3 times in Volume I were removed from the list than were retained.

c. If a term had been used to index 4 or more documents in Volume I, it was more likely to be retained than removed.

8. The changes in indexing procedures between Volume I and Volume II resulted in significantly greater precision in Volume II.

The following chapter--Summary, Conclusions, Recommendations, and Implications--analyzes the procedures and results of this study, states possible conclusions implied by the results, makes specific recommendations concerning possible changes in the Information Center, and examines the implications of this study that may pertain to future studies.

CHAPTER VI: SUMMARY, CONCLUSIONS, RECOMMENDATIONS, AND IMPLICATIONS

As indicated by the title, this chapter is divided into four major sections. The first summarizes the procedures and results of this study, the second discusses the conclusions implied by the results of testing ten null hypotheses, the third makes specific recommendations concerning possible changes in operations at the Information Center, and the fourth examines the implications of this study for future studies and for improving communication in the field of education.

Summary

The increase in publishing in the area of education has made it important to find better ways to store, organize, and disseminate this information. In the area of special education the Council for Exceptional Children has cooperated with the ERIC system in developing the CEC-ERIC Information Center.
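The "one in fifty thousand" figure quoted earlier for the Volume I versus Volume II precision comparison is a one-sided sign test: under a chance model in which either volume is equally likely to show the higher precision on each question, the probability that Volume II wins on at least 19 of 20 questions can be checked directly. A sketch; 21/2^20 is indeed on the order of one in fifty thousand.

```python
from math import comb

# Sign test for the precision comparison of Table 5.13: probability
# that at least 19 of 20 fair "coin flips" favor the same volume.
p = sum(comb(20, k) for k in (19, 20)) / 2 ** 20   # 21 / 1,048,576
```

The exact value is about 2.0 × 10⁻⁵, which justifies treating 19 wins out of 20 as far beyond what chance alone would produce.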
Procedures Used at the CEC-ERIC Center

The initiation of computer processing at the Center was made possible by the use of the Basic Indexing and Retrieval System (BIRS) and resulted in an automated abstract journal, Exceptional Child Education Abstracts. Abstracts stored on computerized information files serve as the basis for computer-controlled typesetting for ECEA, computer-generated indexes for ECEA, computerized searching to aid in answering user requests, and the development of special annotated bibliographies. The bibliographies are also printed using computer-controlled typesetting.

A Comparison of Three Indexing Methods

The effectiveness of three indexing methods was compared for two groups of individuals. The first group was ten CEC-ERIC staff members familiar with the Center's indexing procedures, while the second group was seven professional educators not familiar with the Center's indexing procedures. The three indexing methods used in the comparisons were the following: Indexing Method 1 (the method normally used at the Center) used ERIC descriptors assigned by indexers and computer-extracted terms from the titles of the documents. Indexing Method 2 used computer-extracted terms from titles and abstracts of documents. Indexing Method 3 combined Methods 1 and 2, using computer-extracted terms from titles and abstracts as well as ERIC descriptors.

One hundred five target documents were randomly selected from the file of 2100 abstracts used for Volume I of ECEA, and a set of 105 questions (one question to retrieve each target document) was written by each group. Training and technical assistance were provided so that each group demonstrated similar technical competence in writing computer search questions.

The results indicated that the CEC-ERIC staff retrieved significantly more target documents than professional educators when Indexing Method 1 was used.
This result was reversed for Method 2, while in Method 3 there was no significant difference in the number of target documents retrieved by either group. The professional educators retrieved 191 relevant documents, including 81 target documents, by Method 2, while the CEC-ERIC staff retrieved only 175 relevant documents, including 77 target documents, by Method 1 (the method used at the Center). The professional educators had significantly better Average Microprecision for all three methods than the CEC-ERIC staff and retrieved more relevant documents (target and non-target documents) by Methods 2 and 3 than did the CEC-ERIC staff. This data tends to suggest that the need for carefully controlled indexing languages is minimized in the field of education when sophisticated computer searching algorithms are available.

Comparison of Vocabulary of Three Word Lists

The words in a thesaurus developed by Samuel Price through a retrospective analysis of five years' special education literature227 were used as a criterion in comparison with: (1) words extracted from the titles of the 2100 abstracts in Volume I of ECEA and (2) words in the ERIC descriptors used to index these abstracts. The results indicated that the words in the ERIC descriptors had as much or more in common with the vocabulary used by Samuel Price as did the words in the collective titles.

227Samuel Price, Thesaurus of Descriptors for an Information Retrieval System in the Subject Matter Area of Special Education (Normal, Illinois: Illinois State University, Special Education Instructional Materials Laboratory, 1970), pp. 1-465.

Changes in Indexing Procedures

A subjective analysis by indexers of the ERIC descriptors assigned to Volume I of ECEA resulted in a reduced list of descriptors for use in indexing subsequent volumes of ECEA. It also resulted in a change
of indexing procedures which required an abstract to have more information about a subject before an ERIC descriptor would be assigned. A preliminary evaluation indicated that the change in indexing procedures between Volume I and Volume II resulted in increased precision for computerized search questions.

Conclusions

In evaluating the indexing methods used at the CEC-ERIC Information Center and analyzing the vocabulary of the ERIC descriptors assigned to abstracts of Volume I of ECEA, ten null hypotheses were examined. In each case a chi-square test of statistical significance with one degree of freedom was used to determine if the null hypothesis should be accepted or rejected at the .01 level. Under these conditions any value of chi-square greater than 6.64 would cause the null hypothesis to be rejected.

The first eight null hypotheses were used to examine the retrieval effectiveness of three indexing methods for CEC-ERIC staff versus professional educators. The ninth null hypothesis was used to examine whether words in the ERIC descriptors used to index Volume I of ECEA were typical of those in the literature of special education. The tenth was used to examine whether changes in indexing procedures between Volume I and Volume II had affected the Average Microprecision.

Results of Testing Null Hypothesis 1

The value of chi-square resulting from the data used to test Null Hypothesis 1 was 8.25; thus the hypothesis was rejected at the .01 level. The rejection of the hypothesis and the examination of the data resulted in the conclusion that the CEC-ERIC staff retrieved significantly more target documents when Indexing Method 1 was used than did the professional educators.

Results of Testing Null Hypothesis 2

The value of chi-square resulting from the data used to test Null Hypothesis 2 was 36.0; thus the hypothesis was rejected at the .01 level.
The rejection of the null hypothesis and the examination of the data resulted in the conclusion that the professional educators retrieved significantly more target documents when Indexing Method 2 was used than did the CEC-ERIC staff.

Results of Testing Null Hypothesis 3

The value of chi-square resulting from the data used to test Null Hypothesis 3 was .0085; thus the hypothesis was accepted. The conclusion resulting from this acceptance was that there was no significant difference in the number of target documents retrieved by professional educators versus CEC-ERIC staff when Indexing Method 3 was used.

Results of Testing Null Hypotheses 4, 5, and 6

The values of chi-square for Null Hypotheses 4, 5, and 6 respectively were 9.10, 21.10, and 23.75; thus these hypotheses were all rejected at the .01 level. The rejection of these hypotheses and the examination of the data resulted in the conclusion that for all three indexing methods the professional educators had significantly greater Average Microprecision than did the CEC-ERIC staff.

Results of Testing Null Hypothesis 7

The value of chi-square resulting from the data used to test Null
Results of Testing Null Hypothesis 8 The value of chi-square resulting from the data used to test Null Hypothesis 8 was 138.9, thus the hypothesis was rejected at the .01 level. The rejection of the hypothesis and the examination of the data resulted in the conclusion that the CEC-ERIC staff used more terms fOund in the ERIC Thesaurus than did the professional educators. The results and conclusions related to this null hypothesis were inter- preted to indicate that a major difference between the two groups was their knowledge of the ERIC Thesaurus. Results of Testing Null Hypothesis 9 The value of chi-square resulting from the data used to test Null Hypothesis 9 was 8.71, thus the hypothesis was rejected at the .01 level. The rejection of the hypothesis and the examination of the data resulted 187 in the conclusion that the vocabulary (words) found in the ERIC descrip- tors used to index Volume I of ECEA had significantly greater similarity to the vocabulary of the Samuel Price Thesaurus228 than did the vocabu- lary found in the titles of the abstracts contained in Volume I of ECEA. This was interpreted to mean that the vocabulary of the ERIC descriptors used in indexing Volume I of ECEA was representative of the field of special education. Results of Testing Null Hypothesis 10 The value of chi-square resulting from the data used to test Null Hypothesis 10 was 24.80, thus the hypothesis was rejected at the .01 level. The rejection of the hypothesis and the examination of the data resulted in the conclusion that in this comparison the Average MicrOpre- cision fer Volume II was significantly greater than the Average Micro- precision fOr Volume I. This was interpreted to mean that the change in indexing procedures between Volume I and Volume II had resulted in an increase in Average Microprecision. In addition to the data used in testing Null Hypothesis 10 the pre- cision of each question for Volume I and Volume II was compared. 
In 19 out of 20 questions the precision was higher in Volume II than in Volume I. In the single instance where this was reversed the precision was extremely high for both volumes (.97 for Volume I and .966 for Volume II). The chance that the precision should be higher in one volume than the other in 19 out of 20 cases is less than one in fifty thousand.

228Ibid.

Interpretation of the Results of the Comparison of Three Indexing Methods

The question vocabulary used by the two groups and the apparent advantage of the CEC-ERIC staff in retrieving target documents when Indexing Method 1 was used would tend to support the assertion that the major difference between the two groups was their knowledge of the CEC indexing procedures. Of particular interest to the interpretation of the results is the fact that this apparent advantage for Method 1, gained by a knowledge of the CEC indexing procedures, appeared to be a disadvantage when other methods were used. With Indexing Method 2, which used terms from titles and abstracts, the professional educators did significantly better in retrieving target documents. Not only did they do better, but surprisingly they retrieved more target documents using Method 2 (81) than the CEC-ERIC staff retrieved using Method 1 (77)--the method to which they were accustomed. Still more remarkable is the fact that the professional educators had significantly better Average Microprecision for all three indexing methods than did the CEC-ERIC staff.

The comparisons between Methods 1 and 2 raise the question, "What is the advantage in using ERIC descriptors when those trained to use them retrieved fewer relevant documents by Indexing Method 1 (using terms from titles and ERIC descriptors) than did professional educators with computer-extracted indexing terms from titles and abstracts?" This question becomes even more germane when it is remembered that professional educators also had greater Average Microprecision for both methods.
More specifically, it would tend to suggest that the use of controlled indexing vocabularies needs to be reconsidered in light of the computer searching methods now available.

To examine this question in perspective it is necessary to consider the results from Indexing Method 3. In this method there was not a significant difference in the number of target documents retrieved by either group. However, the CEC-ERIC staff retrieved 8 additional target documents as compared to the method to which they were accustomed (Method 1), while the professional educators retrieved only three additional target documents as compared to Method 2. This in itself might not be a sufficient reason to consider using the combination method--the method which used terms from titles, abstracts, and ERIC descriptors--but when the total number of relevant documents retrieved (both target and non-target documents) is considered, the advantage becomes more apparent. The CEC-ERIC staff was able to retrieve 227 relevant documents by this method as compared to 175 by Method 1, an increase of 52 relevant documents. Professional educators were able to retrieve 265 documents by this method as compared to 191 by Method 2, an increase of 74 documents. As expected from other studies, the recall obtained by professional educators showed an inverse relationship to Average Microprecision.229 In other words, as the number of relevant documents retrieved increased from 135 to 191 to 265, the Average Microprecision decreased from .945 to .927 to .895.

Before assuming that the combination method (Method 3) should be adopted by the Information Center it is important to consider that the

229F. W. Lancaster and J. Mills, "Testing Indexes and Index Language Devices: The ASLIB Cranfield Project," American Documentation, January, 1964, p. 9; and G. Salton, E. M. Keen, and M.
Lesk, "Design Experiments in Automatic Information Retrieval," The Growth of Knowledge, Manfred Kochen, editor (New York: John Wiley & Sons, Inc., 1967), pp. 344-346.

cost of the computer indexing and the computer searching is about fifty per cent more than with Method 1.

There remains the disturbing problem of attempting to explain why a knowledge of the CEC indexing procedures which use the ERIC Thesaurus would result in less Average Microprecision for all methods and the retrieval of fewer relevant documents in every method except the one specifically used in the Information Center. There appear to be no certain answers; nevertheless it is worth noting that those not having the knowledge of the ERIC Thesaurus used significantly more single word terms in their questions and significantly fewer terms found in the ERIC Thesaurus than the CEC-ERIC staff.

It should be noted that the results might have been significantly different if it had not been for the sophisticated searching algorithms of the BIRS system which tend to assist individuals having knowledge of the content area but little knowledge of computer searching techniques. While difficult if not impossible to prove, it would appear that the capability of these algorithms to compare a word or portion of a phrase to a total phrase and to reduce words to their root form may have played an important role. The implications of the term reduction algorithms will be further seen in a later section where the results of the vocabulary analysis of three word lists related to special education are examined.

Reflections on Methodology Used in Comparing Indexing Methods

Swanson raised a question about using target documents as a basis for writing search questions, indicating that there may be some type of relationship between the target document and the question that would not exist in an ordinary information request.
The Cranfield project searches (about which this criticism was first made) were made to get measures of recall for use in evaluating the effectiveness of four different indexing methods.230 While Cleverdon conceded that there might be unnatural relationships between the questions and the target document, he did not concede that this relationship was sufficient to rule out using this method in all other studies.231

In this study the objective was not only to compare different indexing methods, but to analyze the effect of these methods on two different groups of individuals. The only way to effectively do this was to provide each group of individuals with identical descriptions of target documents. In this manner both groups perceived that they were looking for the same information. The implication of these questions could be that the results from the indexing methods using titles and abstracts were positively affected because people were presented with titles and abstracts.

Response to Questions

It seems reasonable that the manner in which information is presented will influence the type of questions written and that the type of questions written will influence the search results for different indexing methods. To make the comparison required for this study some method of presenting information had to be chosen. The method used for giving information to both groups was chosen because it was felt that it was a natural way of presenting information (information about documents is commonly communicated through titles and abstracts) and because this is the way information is presented to users of ECEA.

230 Don R. Swanson, "The Evidence Underlying the Cranfield Results," The Library Quarterly, January, 1965, pp. 1-20.

231 Cyril Cleverdon, "The Cranfield Hypotheses," The Library Quarterly, April, 1965, pp. 121-124.
Response to Implications

If this procedure did make one method look better, it apparently did so only for those who had not previously used ERIC descriptors. While these questions provide for interesting speculation, it was not the objective of this study to consider them. A more pragmatic implication from the above two questions would be: If being presented with titles and abstracts did affect the results in a positive manner, and since ECEA users are normally presented with titles and abstracts (ECEA is an abstract journal), then why not use an indexing method which extracts terms from the titles and abstracts--the method that is most congruent with the format of ECEA?

Interpretation of the Vocabulary Comparisons

The analysis of Hypothesis 9 resulted in its rejection, thus indicating that there was a significant difference in the number of words in set A having roots in the Samuel Price Thesaurus versus words in set B having roots in the Samuel Price Thesaurus. An examination of the observed and expected values used in calculating the total for chi-square indicated that there were more words from list C (words found in the terms selected from the ERIC Thesaurus to be used in indexing Volume I of ECEA) having roots in the Samuel Price Thesaurus than were expected. Because the words found in the titles of the abstracts were directly extracted from the literature of special education, it is assumed that words from a list which have as much or more in common with the Samuel Price Thesaurus as these are representative of the vocabulary used in special education.

An examination of results 1 through 18 found on pages 169 and 171 provides the basis for at least three interesting observations and possible interpretations. Included in these are:

1. The number of different words having the same root was smaller in the controlled vocabularies than in the free vocabularies.
Specifically there were 87 percent as many roots as words in the Samuel Price Thesaurus; 80 percent as many in the ERIC terms used for indexing Volume I of ECEA; and 73 percent as many in the titles of Volume I. A possible interpretation of this may be that when a thesaurus is developed by a single individual there is more consistency in the word forms used than when a thesaurus is developed by a group, and that still less consistency results when words come from many authors.

2. Results 10, 11, and 12 for the intersections and unions of A and B; results 13, 14, and 15 for the intersections of A and C; and results 16, 17, and 18 for the intersections of B and C illustrate the importance of reducing words to roots to assist in information retrieval. As is noted in each set of three comparisons, the ratio of the number of elements in the intersection divided by the number of elements in the union increases as the reduction of words to roots plays a more important role.

3. The largest ratios involving the comparison of sets A to B, A to C, and B to C resulted when the words of the titles of Volume I were compared to the words found in the ERIC descriptors used to index Volume I of ECEA (sets A to C). The most likely explanation for this would be the fact that sets A and C describe the same set of documents, whereas the words in the Thesaurus of Samuel Price were extracted from a different set of documents.

Interpretation of the Effect of Changing Indexing Procedures

The rejection of Null Hypothesis 10 and the data found in Table 5.13 leave little doubt that changes in indexing procedures between Volume I and Volume II resulted in an increase in precision for searches done on Volume II. When the preliminary examination of the effect of a change in indexing procedures was done there was no practical way to determine or estimate recall.
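The less-than-one-in-fifty-thousand probability cited for the Volume I versus Volume II precision comparison is a sign test over the twenty paired questions; a minimal sketch of that calculation:

```python
from math import comb

# Sign test: probability that precision is higher in one volume on at
# least 19 of 20 questions when each direction is equally likely
# (p = 0.5 under the null hypothesis of no real difference).
n, k = 20, 19
p = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
print(p)  # about 2.0e-5, i.e. roughly one chance in fifty thousand
```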
However, data was examined to determine if the expected number of relevant documents retrieved by the 20 questions was proportional to the number of abstracts in Volume I and Volume II. In Volume I, 1449 relevant documents were retrieved from a file of 2100 abstracts. For this proportion to be maintained in Volume II, which contained 3615 abstracts, about 2500 relevant documents would need to be retrieved. Approximately 90% or 2143 relevant documents were actually retrieved. If the number of relevant documents compared to the file size were used to estimate changes in recall, this data would suggest that whatever the recall in Volume I, it would be only 90% of that figure for Volume II. If the assumptions used in these calculations are accepted, it could be concluded that an increase in precision did, in this case, result in an apparent decrease in recall.

A second alternative that should be considered is that the content of Volume I and Volume II is not similar. It should be noted that there were more historical documents in Volume I, thus the acquisition policies were not identical. On the other hand, both the historical and recent documents obtained for Volume I were acquired under the same philosophy and acquisition criteria as documents acquired for Volume II. There is no simple way to determine whether the number of relevant documents retrieved in Volume II represents a drop in recall or a difference in file content between Volume I and Volume II. However, based upon results of other research and knowledge of the file content, the author feels that the most probable interpretation is that there was a reduction in recall.

Recommendations

Two specific types of recommendations will be made. The first directly relates to the data and results of this study while the second relates to observations made during the study.
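The proportional recall estimate used in the interpretation above can be reproduced directly from the figures in the text:

```python
# Volume I: 1449 relevant documents retrieved from 2100 abstracts.
# If Volume II (3615 abstracts) yielded relevant documents at the same
# rate, about 2500 would be expected.
vol1_relevant, vol1_abstracts = 1449, 2100
vol2_abstracts = 3615

expected = vol1_relevant / vol1_abstracts * vol2_abstracts
print(round(expected))  # 2494, i.e. about 2500

actual = 2143  # relevant documents actually retrieved in Volume II
```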
Data Related to Recommendation 1

The results of changing the indexing procedures between Volume I and Volume II improved the precision of search results with some apparent loss of recall. When the size of information files increases this is usually a desirable result. However, in some instances, it may be important to sacrifice precision for recall, such as when a search is being made on a topic about which the file has little information. It is possible by using the facilities of BIRS to have improved precision when desired and in other cases to sacrifice precision for improved recall. Recommendation 1 relates to procedures which could be implemented to meet this objective without major changes in the indexing processes or cost.

Recommendation 1

It is recommended that the CEC-ERIC Information Center give consideration to assigning ERIC descriptors to three separate fields, employing three levels of indexing. In the first field or level at most two ERIC descriptors would indicate what the document is primarily about. In the second field or level ERIC descriptors would indicate topics about which the document has considerable information. In the third field or level ERIC descriptors would indicate topics about which the document contained only marginal amounts of information. Depending upon the desired retrieval results any or all of the above levels could be searched. If high precision were important, no more than the first two indexing levels would be used. If recall were the most important factor all three levels would be used. It might also be desirable to include in ECEA first, second, and third level indexes.

Data Related to Recommendation 2

The data indicated that additional terms extracted from abstracts improved the recall of both the CEC-ERIC staff and the professional educators.
It also indicated that professional educators using terms from titles and abstracts were able to retrieve documents with greater Average Microprecision and recall than the CEC-ERIC staff did by using the method to which they were accustomed. Because the BIRS system makes it possible to search on any combination of fields which have been indexed and included on the description file, the addition of terms from abstracts would not have to change any of the techniques presently used.

Recommendation 2

Because of the apparent advantages for both the CEC-ERIC staff and professional educators, it is recommended that consideration be given to including terms extracted from the abstracts as part of the description file.

Observations Related to Recommendation 3

Each user obtaining data from the Center is sent one or two simple questionnaires which attempt to gather data about the user and how well he was served. If part of the information given to the user is a bibliography, the questionnaire attempts to gather information about how well this bibliography met the user's need. The other questionnaire attempts to evaluate how well other types of materials answered the user's specific questions. About ten percent of these questionnaires are returned, with the cumulative results presenting a positive picture. The question which is unanswered is "Would those who have not returned the questionnaires respond in the same way as those who have returned the questionnaires?"

Recommendation 3

Considering the low percentage of the questionnaires returned, it is recommended that consideration be given to using one or two possible techniques to gather information about users. The first technique would offer (perhaps for a limited time) those who return the questionnaires a bonus publication, with the hope that this would increase returns to 70 percent or more. The second technique would use telephone interviews with a random sample of users to obtain more representative data.
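The second technique in Recommendation 3 amounts to simple random sampling; a minimal sketch, in which the user list, seed, and sample size are all hypothetical:

```python
import random

# Hypothetical mailing list of Center users; a random subset is drawn
# for telephone interviews so that responses are representative of the
# whole user population rather than only of those who mail back forms.
users = [f"user-{i:03d}" for i in range(1, 201)]

random.seed(1972)  # fixed seed so the draw is reproducible
interviewees = random.sample(users, 20)  # interview 10 percent of users
print(len(interviewees))  # 20
```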
Observations Related to Recommendation 4

Regardless of how many users are served by an information center, certain costs remain fixed. Because of this the cost per information request can be reduced by increasing the number of users.

While some data is presently collected concerning the characteristics of those using the Information Center, little if any data has been collected concerning the proportion of potential users who know about the Center or who use the Center.

Recommendation 4

To aid in making decisions that may help improve cost effectiveness, and aid in better serving users, it is recommended that consideration be given to conducting a study, a part of which would include:

1. Defining the characteristics of individuals who are considered to be potential users of the Information Center.

2. Estimating the number and location of individuals who have the defined characteristics.

3. Collecting from a random sample of these individuals information including the following:

a. Whether or not they are aware of the services provided by the Information Center.

b. What their information needs are in the area of special education.

c. Whether they have ever used the Information Center.

d. If they have used the Information Center, how effectively it served their needs.

e. If they have used the Information Center, how they first found out about the Center.

f. If they have not used the Center, where they obtain the type of information provided by the Center.

The information collected would be used to determine if there are appropriate modifications in procedure that might help potential users to become aware of the Center, improve the Center's services to users, or make acquisition policies more congruent with users' information needs.

Observations Related to Recommendation 5

A subjective analysis was made of the vocabulary used in the Samuel Price Thesaurus232 which was not found in the ERIC descriptors used in indexing Volume I of ECEA.
This analysis suggested that a major category of terms not included in the ERIC descriptors were terms of a technical nature--especially those related to medical literature.

Recommendation 5

It is recommended that the Information Center give careful consideration to its acquisition procedures and policies, including the type of medical literature that might be of value to those working in special education. If the study suggested in Recommendation 4 is carried out, it might aid in the examination of the acquisition procedures.

232 Ibid.

Implications

The results of this study and the procedures developed at the CEC-ERIC Information Center appear to have two potentially important implications related to improving communication within the field of education and in other areas. The first implication relates to the use of controlled indexing vocabularies when powerful computer searching algorithms are available, and the second relates to the use of searchable information files in the publication of selected materials.

The Use of Controlled Indexing Vocabularies

The rationale which is sometimes used to support various types of controlled indexing languages is that the restricted vocabulary provides the communication linkage between the indexer and the user. The following results from this study relate to this assumption:

1. CEC-ERIC staff retrieved 175 relevant documents (target and non-target) and 77 target documents by Indexing Method 1. This was the indexing method that they were accustomed to and the method which depended most on the controlled vocabulary of the ERIC descriptors.

2. Professional educators retrieved 191 relevant documents (target and non-target documents) including 81 target documents by Indexing Method 2 (the method which extracted terms from titles and abstracts without the use of a controlled vocabulary).

3.
Professional educators not familiar with the controlled vocabulary (the ERIC descriptors) had significantly better precision on all three indexing methods than the CEC-ERIC staff and retrieved more relevant documents on Methods 2 and 3 than did the staff.

In other words, the controlled vocabulary appeared to benefit the CEC-ERIC staff only when they were using Indexing Method 1. When using other methods their knowledge of the ERIC descriptors appeared to be of no advantage and was perhaps a disadvantage. Those not familiar with the ERIC descriptors were able to retrieve more relevant documents with greater precision by Methods 2 and 3 than could the ERIC staff by any method, including the method to which they were accustomed.

The analysis of these results cannot help but raise serious doubts concerning the value of using a controlled vocabulary when flexible and powerful searching algorithms are available. Specifically, it raises the question "What is the advantage of using ERIC descriptors (a controlled vocabulary) when those trained to use them retrieved fewer relevant documents by the indexing method designed to utilize the ERIC descriptors than professional educators retrieved by an indexing method which uses terms extracted from titles and abstracts?"

One answer is that the descriptors are used for generating printed indexes for both ERIC and CEC-ERIC publications. These indexes would be much more cumbersome if a controlled vocabulary was not used. A second answer is that the ERIC descriptors, when added to terms extracted from titles and abstracts, did increase the total number of relevant documents retrieved by professional educators from 191 to 265. A third answer to this question is that if the searching algorithms available in BIRS had not been used an entirely different result might have been obtained.
These algorithms minimize the need for a controlled vocabulary by reducing words to root forms and by matching words or small phrases with larger phrases. The results of this study would tend to support the position that when such algorithms are available the value of a controlled vocabulary is minimized.

These results not only have implications for those establishing new information systems, but also imply the need for further studies. Specifically, studies are needed to compare the interaction between controlled indexing vocabularies and the type of searching algorithms used in this study versus algorithms which make only exact matches. It is possible to design such a study that will build on the data and questions used in this study.

An Evolving Thesaurus

A problem common to computerized information retrieval systems which use controlled thesauri is the dependency of those writing questions on the thesauri. Because of this dependency, a human interface has often been used between the actual user and the information system. When this is done, an additional subjective judgment is added which may reduce the chance of the user getting the information desired. In this study the indexing and searching algorithms of the BIRS system made it possible for individuals not familiar with a thesaurus to get results comparable to or better than those obtained by individuals familiar with a thesaurus. This would tend to support the position that systems can be developed which eliminate the need for a human interface. In analyzing the algorithms to determine how they facilitated this result, the most apparent reason was that it was not necessary for terms to match exactly. For example, the terms MENTALLY RETARDED and MENTAL RETARDATION would be considered a match; similarly, MENTALLY RETARDED and EDUCABLE MENTALLY RETARDED.
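This kind of inexact matching can be sketched as follows; the crude suffix-stripping stemmer and the subset-matching rule are illustrative assumptions for the sketch, not the actual BIRS algorithms:

```python
# Two index terms match if the root forms of one phrase's words are a
# subset of the other's. The stemmer below is deliberately crude.
def stem(word):
    word = word.lower()
    for suffix in ("ation", "ally", "al", "ed", "ly"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def terms_match(a, b):
    roots_a = {stem(w) for w in a.split()}
    roots_b = {stem(w) for w in b.split()}
    return roots_a <= roots_b or roots_b <= roots_a

print(terms_match("MENTALLY RETARDED", "MENTAL RETARDATION"))          # True
print(terms_match("MENTALLY RETARDED", "EDUCABLE MENTALLY RETARDED"))  # True
print(terms_match("MENTALLY RETARDED", "MENTALLY HANDICAPPED"))        # False
```

Under these assumptions MENTALLY and MENTAL reduce to the same root, and RETARDED matches RETARDATION, while HANDICAPPED shares no root with RETARDED, which mirrors the matching behavior described in the text.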
An analysis of the results of questions written by those familiar with the ERIC Thesaurus established that one failure resulted from individuals using the term MENTALLY RETARDED when the thesaurus contained the term MENTALLY HANDICAPPED. Because handicapped and retarded do not have similar roots, it is necessary for a human decision to be made before the computer can equate such terms.

One reason for the use of thesauri is to show relationships between terms. For example, if a term is not in the thesaurus, it may suggest an alternative term, or it may identify for a given term corresponding broad terms, narrow terms, and related terms. If such relations were permanently stored in a computer and used by appropriate indexing and searching algorithms, it might be possible to further improve search results.

One means of giving an empirical base to these relationships would be to analyze user requests. If a user wrote a question which contained a term that was not indexed in any document, this would be noted by the computer and the term stored on a separate file for later analysis. Individuals familiar with the indexing terms and the information files could then examine lists of such terms to decide if there were indexing terms to which terms on the list could be equated. Such a system could be refined by continuing analysis of user search results and would also provide valuable empirical data concerning the terminology actually used by users.

The techniques described are applicable both to interactive and batch processing systems. An advantage of an interactive system would be its ability to aid the user in learning to write search questions more effectively. However, as demonstrated in this study, individuals can be trained with a minimum of effort to write good computer questions for batch processing.
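The term-logging scheme described above can be sketched as follows; the file name, term set, and function are hypothetical, not part of BIRS:

```python
# Query terms that match no indexing term are appended to a review file
# so staff can later decide whether they should be equated with
# existing indexing terms. The indexed-term set is hypothetical.
indexed_terms = {"MENTALLY HANDICAPPED", "READING INSTRUCTION"}

def search(query_terms, log_path="unmatched_terms.txt"):
    hits = [t for t in query_terms if t in indexed_terms]
    misses = [t for t in query_terms if t not in indexed_terms]
    with open(log_path, "a") as log:
        for term in misses:
            log.write(term + "\n")
    return hits

print(search(["MENTALLY RETARDED", "READING INSTRUCTION"]))
# ['READING INSTRUCTION']; MENTALLY RETARDED goes to the review file
```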
Additional studies examining this approach both on systems using batch processing and interactive terminals might lead to systems which are more user oriented, thus eliminating intermediary personnel used to translate user requests into computer search questions.

Selective Publication from Information Files

The problem of finding better ways to deal with the rapidly expanding amount of printed information is not solely one of finding better ways to store and retrieve information. Also important to this problem is finding better ways to disseminate organized information. The publication of Exceptional Child Education Abstracts and the related activities using the computerized abstract files provides a model which may aid in dealing with the rapid expansion of knowledge.

By having abstracted information ECEA shares with similar journals the advantage of presenting readers minimal data to help them determine if they should read a specific document. Because the abstracts in ECEA are on a computer file, other alternatives for coping with the rapid expansion of information are available. Included in the procedures made available and used by the CEC-ERIC Information Center are:

1. The use of the computer to generate indexes for inclusion in the abstract journal or in other selected publications.

2. The use of computers to control typesetting of both the abstracts and the computer-generated indexes.

3. The use of computer searching to assist in answering information requests of users.

4. The use of the computer to retrieve data for, and control typesetting of, selected publications.

By using these procedures there are a number of advantages that may not be immediately apparent. First, by having the abstracts published in a journal rather than making them available only through computer searches, the information becomes more accessible.
Second, by having more comprehensive indexes available through computer indexing, it is less likely that those using the journal will need specific computer searches. Third, by being able to organize, index, and publish selected bibliographies, it is possible to answer many user requests without a specific computer search, thus again minimizing the number of specific searches. (At present more than half of all user requests are answered by one or more of the Center's 59 computer-generated bibliographies.) Fourth, a single keying operation provides both for generating computer searchable files and input for computer controlled typesetting.

Not used but also available is the capability of using computer controlled equipment to generate microfilm images of files. This could be done at a minimal cost and would reduce the amount of space required to store abstract journals or special bibliographies.

The variety of ways with which the Center uses a single information file to organize and disseminate knowledge about special education provides a model which may suggest some possible answers to the problem of the rapid growth of knowledge. With some imagination it is possible to envision procedures whereby Wells' "universal encyclopedia"233 might become a reality. If one information file can be used as flexibly as the ECEA file, this can be done with other files. If single documents can be indexed and abstracted, information files may also be indexed and abstracted. Thus, by searching one file it would be possible to identify other information files that would be most likely to contain the types of data needed.

233 H. G. Wells, "World Encyclopedia," World Brain (Garden City, New York: Doubleday, Doran & Co., Inc., 1938), pp. 3-35. Paper read at the Royal Institution of Great Britain Weekly Evening Meeting, Friday, November 20, 1936.

SELECTED BIBLIOGRAPHY

A. BOOKS

Artandi, Susan. An Introduction to Computers in Information Science. Metuchen, N.
J.: Scarecrow Press, 1968. 145 pp.

Borko, Harold. Automated Language Processing. New York: John Wiley and Sons, Inc., 1967. 386 pp.

Chorafas, Dimitris N. Systems and Simulation. New York: Academic Press, 1965. 487 pp.

Cleverdon, C. W. Identification of Criteria for Evaluation of Operational Information Retrieval Systems. Cranfield, Bedford, England: Cranfield College of Aeronautics, November, 1964.

, Jack Mills, and Michael Keen. ASLIB Cranfield Research Project, Factors Determining the Performance of Indexing Systems, Vol. 1. Design, Part 1. Text, Part 2. Appendices. Cranfield, Bedford, England: College of Aeronautics, 1966. 337 pp.

Cuadra, Carlos A., Robert V. Katter, Emory H. Holmes, and Everett M. Wallace. Experimental Studies of Relevance Judgments: Final Report. 3 vols. Santa Monica, California: System Development Corporation, June, 1967.

Deutsch, Ralph. System Analysis Techniques. Englewood Cliffs: Prentice-Hall, Inc., 1969. 464 pp.

Directory of Educational Information Centers. U. S. Government Printing Office, 1969. Document No. FSS.212:12042.

Directory of Federally Supported Information Centers. Clearinghouse for Federal, Scientific, and Technical Information, April, 1968. PB 477050.

Fairthorne, R. A. Towards Information Retrieval. London: Butterworths, 1961. 211 pp.

Hayes, R. M., and Joseph Becker. Handbook of Data Processing for Libraries. New York: Wiley-Hayes-Becker Publications, a subsidiary of John Wiley & Sons, Inc., 1970.

Lancaster, F. Wilfrid. Information Retrieval Systems. New York: John Wiley & Sons, Inc., 1968. 217 pp.

Meadow, Charles T. The Analysis of Information Systems. New York: John Wiley & Sons, Inc., 1967. 301 pp.

Salton, Gerard. Automatic Information Organization and Retrieval. New York: McGraw-Hill Book Company, 1968.

Vickery, B. C. Faceted Classification Schemes. Vol. V of Systems for the Intellectual Organization of Information. Edited by Susan Artandi. New Brunswick, N. J.: The Rutgers University Press, 1966. 108 pp.
On Retrieval System Theory. London: Butterworths, 1965. 183 pp.

Wells, H. G. World Brain. Garden City, New York: Doubleday, Doran & Co., Inc., 1938.

White, Harry J., and Selmo Tauber. Systems Analysis. Philadelphia: W. B. Saunders Company, 1969. 492 pp.

B. ARTICLES AND PERIODICALS

"All About ERIC," Journal of Educational Data Processing, VII (April, 1970), 51-129.

Artandi, Susan. "Computer Indexing of Medical Articles," Journal of Documentation, XXV (September, 1969), 185-282.

"Document Description and Representation," Annual Review of Information Science and Technology, Carlos A. Cuadra, editor. Chicago: William Benton, 1970. V, 143-168.

, and Edward H. Wolf. "The Effectiveness of Automatically Generated Weights and Links in Mechanical Indexing," American Documentation, July, 1969, pp. 198-202.

Baxendale, P. B. "'Autoindexing' and Indexing by Automatic Processes," Special Libraries, LVI (December, 1965), 715-719.

"Content Analysis, Specification and Control," Annual Review of Information Science and Technology, Carlos A. Cuadra, editor. New York: John Wiley & Sons, 1966. I, 71-106.

Borko, Harold. "Design of Information Systems and Services," Annual Review of Information Science and Technology, Carlos A. Cuadra, editor. New York: John Wiley & Sons, 1967. II, 35-62.

Bourne, Charles P. "Evaluation of Indexing Systems," Annual Review of Information Science and Technology, Carlos A. Cuadra, editor. New York: Interscience Publishers, 1966. I, 171-190.

Burchinal, Lee G. "The Educational Resources Information Center: An Emergent National System," Journal of Educational Data Processing, VII (April, 1970), 55-67.

"Clearinghouse on Exceptional Children," Exceptional Children, Summer, 1967, pp. 693-694.

Clearinghouse on Exceptional Children, March, 1967.

Cleverdon, Cyril W. "The Cranfield Hypotheses," Library Quarterly, XXXV (April, 1965), 121-124.

, F. W. Lancaster, and J. Mills.
"Uncovering Some Facts of Life in Information Retrieval," Special Libraries, LV (February, 1964), 86-91.

Commission and the Council for the International Union of Chemistry. "Definitive Report of the Commission on the Reform of Nomenclature of Organic Chemistry," Journal of the American Chemical Society, LV (1933), 3905-25.

Cooper, William S. "Is Interindexer Consistency a Hobgoblin?" American Documentation, July, 1969, pp. 268-278.

Cuadra, Carlos A. (ed.). Annual Review of Information Science and Technology. 5 vols. New York: John Wiley & Sons, 1966-69.

Dale, A. G., and N. Dale. "Some Clumping Experiments for Associative Document Retrieval," American Documentation, January, 1965, pp. 5-9.

de Solla Price, Derek J. "Network of Scientific Papers," Science, CXLIX (July 30, 1965), 510-515.

Doyle, Lauren B. "Indexing and Abstracting by Association," American Documentation, October, 1962, pp. 378-390.

Eller, James L., and Robert L. Panek. "Thesaurus Development for a Decentralized Information Network," American Documentation, July, 1968, pp. 213-220.

"ERIC Excerpt," Exceptional Children, October, 1967, pp. 143-148; April, 1968.

Exceptional Child Education Abstracts, II (November, 1970).

Fischer, Marguerite. "The KWIC Index Concept: A Retrospective View," American Documentation, XVII (April, 1966), 57-70.

Gibson, R. E. "A Systems Approach to Research Management," Part I, Research Management, V (1962), 215 pp.

Gull, C. D. "Seven Years of Work on the Organization of Materials in the Special Library," American Documentation, VII (October, 1956), 320-329.

Hall, A., and R. Fagen. "Definition of a System," General Systems, Vol. I of Yearbook of the Society for General Systems, 1956.

Hyslop, Marjorie R. "Sharing Vocabulary Control," Special Libraries, LVI (December, 1965), 708-714.

Jordan, June B. "CEC-ERIC-IMC: A Program Partnership in Information Dissemination," Exceptional Children, XXXV (December, 1968), 311-313.

King, Donald W.
"Design and Evaluation of Information Systems," Annual Review of Information Science and Technology, Carlos A. Cuadra, editor. Chicago: Encyclopedia Britannica, 1968. III, 61-104.

Kochen, Manfred. "Systems Technology for Information Retrieval," The Growth of Knowledge, Manfred Kochen, editor. New York: John Wiley & Sons, 1967. pp. 352-372.

Lancaster, F. W. "Evaluating the Performance of a Large Operating Retrieval System," Electronic Handling of Information, Allen Kent, Orrin E. Taulbee, Jack Belzer, and Gordon D. Goldstein, editors. Washington, D. C.: Thompson Book Company, 1967. pp. 199-216.

________. "MEDLARS: Report on the Evaluation of its Operating Efficiency," American Documentation, April, 1969, pp. 119-142.

________, and Constantine J. Gillespie. "Design and Evaluation of Information Systems," Annual Review of Information Science and Technology, Carlos A. Cuadra, editor. Chicago: Encyclopedia Britannica, Inc., 1967. V, 33-70.

________, and J. Mills. "Testing Indexing and Index Language Devices: The ASLIB Cranfield Project," American Documentation, XV (January, 1964), 4-13.

Lesk, M. E., and G. Salton. "Relevance Assessments and Retrieval System Evaluation," Information Storage and Retrieval, December, 1968, pp. 343-359.

Luhn, H. P. "The Automatic Creation of Literature Abstracts," IBM Journal of Research and Development, II (1958), 159-165.

________. "Keyword-In-Context Index for Technical Literature," American Documentation, XI (1960), 288-295.

Montgomery, Christine, and D. R. Swanson. "Machinelike Indexing by People," American Documentation, XIII (October, 1962), 359-66.

Moon, R. D., and J. F. Vinsonhaler. "The Title-Generated Thesaurus: A Practical Method for Automated Indexing," Proceedings of the Sixth Annual National Colloquium on Information Retrieval - The Information Bazaar. Philadelphia: The Medical Documentation Service of the College of Physicians, 1969.

O'Connor, John.
"Correlation of Indexing Headings and Title Words in Three Medical Indexing Systems," American Documentation, XV (1964), 96-104.

Perry, Peter. "Combined Grouping for Coordinate Indexes," American Documentation, XIX (April, 1968), 142-145.

Price, Nancy, and Samuel Schiminovich. "A Clustering Experiment: First Step Towards a Computer-Generated Classification Scheme," Information Storage and Retrieval, IV (August, 1968), 271-280.

Research in Education, July, 1967.

Salton, Gerard. "A Comparison Between Manual and Automatic Indexing Methods," American Documentation, January, 1969, pp. 61-71.

________. "The Evaluation of Automatic Retrieval Procedures--Selected Test Results Using the SMART System," American Documentation, July, 1965, pp. 209-222.

________, E. M. Keen, and M. Lesk. "Design Experiments in Automatic Information Retrieval," The Growth of Knowledge, Manfred Kochen, editor. New York: John Wiley & Sons, Inc., 1967. pp. 336-351.

Schultz, Claire K. "Do-It-Yourself Retrieval System Design," Special Libraries, LVI (December, 1965), 720-723.

Sharp, John R. "Content Analysis, Specification and Control," Annual Review of Information Science and Technology, Carlos A. Cuadra, editor. New York: John Wiley & Sons, 1967. II, 87-122.

Simmons, Robert F., Sheldon Klein, and Keren McConlogue. "Indexing and Dependency Logic for Answering English Questions," American Documentation, July, 1964, pp. 196-204.

________, and Keren L. McConlogue. "Maximum Depth Indexing for Computer Retrieval of English Language Data," American Documentation, January, 1963, pp. 68-73.

Sparck Jones, Karen, and Roger M. Needham. "Automatic Term Classifications and Retrieval," Information Storage and Retrieval, IV (June, 1968), 91-100. (Presented at the First Cranfield International Conference on Mechanized Information Storage and Retrieval Systems, College of Aeronautics, Cranfield, England, 29-31 August, 1967.)

Swanson, Don R.
"The Evidence Underlying the Cranfield Results," The Library Quarterly, XXXV (January, 1965), 1-20.

Swets, John A. "Information-Retrieval Systems," Science, CXLI (July 19, 1963), 245-250.

Tate, F. A. "Handling Chemical Compounds in Information Systems," Annual Review of Information Science and Technology, Carlos A. Cuadra, editor. New York: Interscience Publishers, 1967. II, 285-310.

Taulbee, Orrin E. "Content Analysis, Specification, and Control," Annual Review of Information Science and Technology, Carlos A. Cuadra, editor. Chicago: William Benton, 1968. III, 105-136.

Wyllys, Ronald E. "Extracting and Abstracting by Computer," Automated Language Processing, Harold Borko, editor. New York: John Wiley & Sons, Inc., 1967. pp. 127-180.

Zunde, Pranas, and Margaret E. Dexter. "Indexing Consistency and Quality," American Documentation, XX (July, 1969), 259-267.

C. REPORTS, TECHNICAL MANUALS, AND UNPUBLISHED MATERIAL

Burchinal, Lee G. Development of ERIC Through December, 1968. Division of Information Technology and Dissemination, Bureau of Research, U. S. Office of Education, Department of Health, Education and Welfare, Office of Education/Office of Information Dissemination. First printed in August, 1969; revised February, 1970.

Evaluation of ERIC, June, 1968. Report from U.S. Department of Health, Education and Welfare, Office of Education, Bureau of Research. Available as ED 020449. Bethesda, Maryland: ERIC Document Reproduction Service, 1968.

CEC-ERIC Information Center. "Processing Costs & Formulas," an unpublished summary prepared under the direction of Carl Oldsen. September, 1970.

Jordan, June B. "Handicapped Children and Youth ERIC Clearinghouse on Research Dissemination," a proposal submitted to the U.S. Department of Health, Education and Welfare, Bureau of the Handicapped, 1966. 8 pp.

Oldsen, Carl, ECEA editor. Unpublished statistical information based on an analysis of 5,715 acquisitions in Volumes I and II of ECEA.
________. Unpublished statistical information on user requests.

Price, Samuel T. "The Development of a Thesaurus of Descriptors for an Information Retrieval System in Special Education." Unpublished doctoral dissertation, University of Pittsburgh, 1969, abstract.

________ (comp.). Thesaurus of Descriptors for an Information Retrieval System in the Subject Matter Area of Special Education. Normal, Illinois: Special Education Instructional Materials Laboratory, Illinois State University, January, 1970.

Rees, Allan, and Douglas C. Schultz, principal investigators. A Field Experimental Approach to the Study of Relevance Assessments in Relation to Document Searching. Final Report to the National Science Foundation. Cleveland: Center for Documentation and Communication, School of Library Science, Case Western Reserve University, October, 1967. I, 287 pp.; II, Appendices A-Q.

Stevens, M. E. Automatic Indexing: A State of the Art Report. NBS Monograph 91. Washington: National Bureau of Standards, March, 1965.

Thesaurus of ERIC Descriptors: Working Copy Descriptor Listing. ERIC Processing and Reference Facility. Bethesda, Maryland: Leasco Systems and Research Corporation, August, 1971. 224 pp.

Thesaurus for Exceptional Child Education. Arlington, Virginia: CEC-ERIC Information Center on Exceptional Children, 1971. 12 pp.

Trester, Delmer J., System Coordinator, Department of HEW, Office of Education. Statistics on ERIC accompanied by cover letter to Carl Oldsen, CEC-ERIC, February 16, 1971.

Vinsonhaler, John F. The Information Systems Laboratory: A Progress Report for 1969. ISL Report No. 10. East Lansing: Michigan State University, January, 1970.

________ (ed.). Technical Manual, Basic Indexing and Retrieval System BIRS 2.0. East Lansing: Educational Publications Services, College of Education, Michigan State University, January, 1968.

________, John M. Hafterson, and Stuart W. Thomas, Jr. (editors). Basic Indexing and Retrieval System Technical Manual. 10 vols.
East Lansing, Michigan: Information Systems Laboratory, College of Education, Michigan State University, 1970.

________, and John M. Hafterson (editors). Technical Manual for Basic Indexing and Retrieval System, BIRS 2.5, Appendix 1. East Lansing: Educational Publications Services, College of Education, Michigan State University, January, 1969.

Weinberg, Alvin. Science, Government, and Information: The Responsibilities of the Technical Community and the Government in the Transfer of Information. President's Science Advisory Council. Washington: Government Printing Office, 1963.

APPENDIX A

A Description of the Operating Procedures Used by the CEC-ERIC Information Center

The procedures described in this Appendix expand upon the overview of the operating procedures found in Chapter 3. Some material from Chapter 3 is duplicated so that the Appendix may be read in its entirety without referring to other sections of the text.

Legend and Nomenclature

The symbols used in the following diagrammatic representations of CEC-ERIC's processing are those commonly used in computer program and systems flowcharting. Occasional liberties are taken by using a single symbol to imply more than is commonly done in computer programming; in these cases the operations represented by the symbol are described verbally. The descriptions of the symbols found in Figure 1A are those given on the cover of an IBM flowcharting template, Form X20-8020. In addition to the symbols in Figure 1A, the following alphabetic legend will be used to identify specific symbols in the various figures:

1. C (N) stands for Connection number N, where N may be any number and the connection may be between flowcharts or within the same flowchart.

FIGURE 1A. FLOWCHARTING SYMBOLS

INPUT/OUTPUT - Any function of an input/output device (making information available for processing, recording processing information, tape positioning, etc.).
PROCESSING - A group of program instructions which perform a processing function of the program.

DECISION - The decision function, used to document points in the program where a branch to alternate paths is possible based upon variable conditions.

PREDEFINED PROCESS - A group of operations not detailed in the particular set of flowcharts.

PROGRAM MODIFICATION - An instruction or group of instructions which changes the program.

CLERICAL OPERATION - A manual offline operation not requiring mechanical aid.

DOCUMENT - Paper documents and reports of all varieties.

MAGNETIC TAPE

PUNCHED CARD - All varieties of punched cards including stubs.

KEYING OPERATION - An operation utilizing a key-driven device.

FLOW DIRECTION - The direction of processing or data flow.

CONNECTOR - An entry from, or an exit to, another part of the program flowchart.

OFFPAGE CONNECTOR - A connector used instead of the connector symbol to designate entry to or exit from a page.

(FIGURE 1A, cont'd)

2. D (N) stands for Decision number N.

3. IP (N) stands for Input number N.

4. OP (N) stands for Output number N.

5. PP (N) stands for Predefined Process N.

6. S (N) stands for Step N.

7. SB (N) stands for Symbol N of a given figure. This will be used when it is necessary to identify for discussion a given symbol which is not identified in another way.

Sometimes the number N may be a decimal number such as 1.1 or 1.12. This is used to tie closely related operations together. For example, S1 would stand for Step 1, and S1.1 would stand for a small step within the major Step 1. Additional points past the decimal indicate further refinement of steps. For example, Steps S7.11, S7.12, and S7.13 would all have functions in common with those represented by S7.1. This legend allows the reader to identify similar steps appearing in various portions of the flowchart.
Overview of the Information Center's Major Activities

Figure 2A provides a simplified overview of the Information Center's processing by dividing it into six major activities: document acquisition, document management, file maintenance, file processing, information processing, and evaluation with system modification. The core activities shown in Figure 2A are presented in greater detail in the later diagrams, Figures 3A, 4A, and 5A. The activities described are found in most information centers utilizing computer processing; however, the specific steps and products making up these broadly defined activities will vary considerably from center to center.

[Figure 2A. Overview of Information Center Major Activities: Activity 1, Document Acquisition; Activity 2, Document Management; Activity 3, File Maintenance (producing the Information File Tape, Description File Tape, and Printed Index File Tape); Activity 4, File Processing; Activity 5, Information Processing; Activity 6, Evaluation and System Modification.]

Briefly, these activities can be described in the following manner:

Activity 1 - Document Acquisition. This activity includes the selection of documents which will be bought or otherwise acquired so that they may be examined to determine if they are appropriate for inclusion in the Information Center holdings.

Activity 2 - Document Management. This activity includes examining documents to determine if they should be included in the Information Center data bank, and the abstracting, indexing, and cataloging of documents.

Activity 3 - File Maintenance. This activity includes keypunching document surrogates, storing the surrogates on a computerized information file, and preparing computerized description files and printed index files.
Activity 4 - File Processing. This activity includes computer processing of files to organize the information in a form that will be more useful and easier to disseminate.

Activity 5 - Information Processing. This activity involves processing user requests, providing users with information, publishing new documents from information contained on the computer files, and providing information to be used in evaluation of the system. The activities in this section are primarily manual, but they may initiate computer file processing (Activity 4) as one of the several steps in a procedure.

Activity 6 - Evaluation. This activity involves examining the procedures used by the Center and, if appropriate, modifying these procedures to make the total operation of the Information Center more effective.

Overview of Major Input and Output

Figure 3A provides an overview of the major input to the Information Center and the output generated as a result of processing that input. Documents are acquired (IP (1)) and processed in the document management activities (PP (1)) to generate copy for Research in Education (OP (1)) and Current Index to Journals in Education (OP (2)). All documents which will eventually become part of the Information Center holdings, including those processed for RIE and CIJE, are then put in the form used on the Center's information files and passed to file maintenance processing (PP (2)). In the file maintenance activity the documents are put in computer-readable form and various computer files are generated. These computer files provide input for file processing (PP (3)) and output for selected publications (OP (4)). This output is in a form that allows for computer typesetting, computer-generated indexes, and printing with a minimum of effort. ECEA and the selected publications in turn become input to information processing (PP (4)).
These, with input from CEC publications, user requests, and additional file processing, are utilized in providing information to users (OP (5)) and in assisting staff members in generating new documents (OP (6)).

Overview of Evaluation and Processing Modifications

Figure 4A provides an overview of the continuing evaluation which is used to monitor and, if appropriate, modify the processing of the Information Center.

[Figure 3A. Overview of Major Input and Output.]

[Figure 4A. An Overview of the Information Center's Evaluation and Systems Modification Components.]

Input to the evaluation component is provided from information processing (PP (4)), user evaluation (PP (6)), the project officer and advisory board (PP (7)), and the IMC/RMC Network (Instructional Materials Center/Regional Media Center Network) (PP (8)). The arrows going in both directions indicate that there is an interaction between evaluation and the other components. The input from the various sources is processed by the evaluation component to determine if there are system modifications which should be made. The flow of decisions is illustrated by symbols SB (1) and SB (2) and in the modification occurring to PP (4). The numbers 1, 2, 3, and 5 appearing within parentheses opposite arrows indicate that the same series of symbols, namely SB (1), SB (2), and SB (3), would appear at these points and be connected to the predefined processes PP (1), PP (2), PP (3), and PP (5), as done in the later diagram, Figure 5A. If no change is made, this serves as feedback to the evaluation procedures, as indicated by the connection C (1) to PP (5).
Overview and Model of the Information Center's Operation

Figure 5A provides an overview of the Information Center's processing. In this overview the six major activities can be seen in the center of the flowchart. The input and output operations shown in Figure 3A are present, as well as the evaluation procedures indicated in Figure 4A. The model as presented indicates a continual flow of input, output, evaluation, and appropriate system modification to improve the operating procedures.

[Figure 5A. An Overview and Model of the Information Center's Operations.]

Figure 5A and the more simplified Figures 2A, 3A, and 4A can be used as references as the more detailed steps involved in the Information Center's operations are discussed in the following sections.

Acquisition Control and Document Management

Figure 6A provides a detailed description of the steps involved in acquiring documents and preparing document surrogates to be placed on the Center's information file. The four sources of input are identified in Figure 6A as IP (1.1) through IP (1.4) and are an expansion of IP (1) found in the overview of Figure 5A. Step 2 through Step 5 relate to the predefined process PP (1) of Figure 5A, with the output OP (1) and OP (2) being identical to that in Figure 5A.

Step 1 - Acquisition of Documents. Documents which become part of the Information Center holdings are obtained from four major sources, identified in Figure 6A as IP (1.1) through IP (1.4). These sources are:

1. Journals containing articles which will be processed for use in CIJE. These journals are divided into two categories:

a.
Journals where all articles are automatically processed for use in CIJE.

b. Journals containing articles which are examined to determine if they are relevant for processing in CIJE.

2. Journals which contain articles that are not considered for CIJE but are considered for ECEA.

3. Documents ordered from various publishers as a result of publishers' announcements and literature reviews.

4. Documents which are donated or suggested by various sources to the Information Center. Two major sources of contribution are:

a. Documents relating to research projects contributed by the U. S. Department of Health, Education and Welfare.

b. Documents relating to instructional procedures and media recommended by the IMC/RMC Network.

[Figure 6A. Acquisition Control and Document Management.]

Step 2 - Order Control. The second step is to process orders in a way that will prevent a document from appearing more than once on an information file. To prevent this, it is necessary to determine that documents ordered are not already part of the Information Center's holdings and that documents coming to the Center through orders and donations do not contain duplicates. This is done by filling out a form on all documents which are obtained from commercial sources or which are donated to the Center. Included on the form is information about the author, publisher, title, and number of pages. This information is keypunched and placed on a computerized file which is sorted to generate separate listings in title order, author-publisher order, and author-title-publisher order.
These listings and similar sorted lists of documents already processed are used to prevent duplication.

Step 3 - Cataloging. The third step for documents not used in CIJE is cataloging. This includes placing on a processing form the title, the author, the source where the document may be obtained, the publication date, and the number of pages, and assigning an EC number. The EC number is a six-digit number in which the first two digits refer to a volume number of Exceptional Child Education Abstracts and the last four digits to an abstract number. For example, if a document had the number EC 03 1234, this would indicate that the document surrogate is abstract 1234 in Volume III of Exceptional Child Education Abstracts.

Step 4 - Indexing. In the fourth step indexers assign terms from the ERIC Thesaurus1A to describe the document. All documents which are processed for use in CIJE are sent to Central ERIC after these indexing terms have been assigned, as indicated by S (4.1) and OP (1) of Figure 6A. Since the beginning of Volume II, the Information Center has used a subset of the ERIC Thesaurus to prevent the unnecessary proliferation of terms with similar meanings.2A

Step 5 - Abstracting. In Step 5 an abstract is written for each document or, where permission has been granted by specific journals, the author abstract is used. For all documents except journal articles processed for CIJE, the indexing and abstracting are done simultaneously. The documents processed for inclusion in CIJE also differ in processing order in that the cataloging is not done until the document has been indexed and abstracted. The reason for this is that the copy sent to CIJE contains only indexing and bibliographic information and does not contain a summary (abstract). Regardless of the original acquisition source, all document surrogates which become a part of the Information Center files or are published in ECEA are processed according to the same criteria.
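The EC numbering scheme described in Step 3 amounts to a simple two-part encoding, which can be sketched as a pair of small routines. This is an illustration only; the function names are invented and are not part of the Center's actual programs.

```python
def make_ec_number(volume, abstract):
    """Build a six-digit EC number: two digits of volume, four of abstract."""
    return f"EC {volume:02d} {abstract:04d}"

def parse_ec_number(ec_number):
    """Recover the ECEA volume and abstract number from an EC number."""
    _, volume, abstract = ec_number.split()
    return int(volume), int(abstract)

# EC 03 1234 identifies abstract 1234 in Volume III of ECEA.
volume, abstract = parse_ec_number("EC 03 1234")
```

The fixed-width, zero-padded digits are what let the number be split mechanically, which matters for the sequence checks performed later in file maintenance.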
At the point in processing indicated by C (2), all document surrogates contain the information which will be keypunched for inclusion on the Center's information files. Copies of document surrogates which are to become part of Research in Education are sent to Central ERIC as indicated by OP (2).

1A Thesaurus of ERIC Descriptors, ERIC Processing and Reference Facility, operated for the U. S. Office of Education by Leasco Systems & Research Corporation, 4833 Rugby Avenue, Bethesda, Maryland, 1970, p. 82.

2A Thesaurus for Exceptional Child Education (Arlington, Virginia: Council for Exceptional Children, Information Center on Exceptional Children, 1970), pp. 1-10.

File Maintenance

Figure 7A provides a detailed description of the steps involved in preparing computer-readable copy and placing this copy on the Center's information files. The steps described correspond to predefined process PP (2), which generates the IFT (Information File Tape), DFT (Description File Tape), and PIFT (Printed Index File Tape) of the overview presented in Figure 5A.

Step 6 - Preparation of Computer-Readable Copy. Step 6 is a sequence of small tasks involving repeated keypunching, computer processing, and proofreading. It involves keypunching the document surrogates, which include the information resulting from the cataloging, indexing, and abstracting steps. The documents are initially punched in as free a format as possible, with distinct types of information (fields) designated by an equals sign followed by a letter. Each document surrogate is separated by an *$ABSTRACT card followed by the last four digits of the EC number assigned in the cataloging step.

In Step 6.21 the keypunched cards are read into Preprocessor Program I, which adds a sequence number to each line. For example, if there are a total of 3,500 cards, the sequencing would run from 1 to 3,500.
The program then provides a listing, printing one abstract per page with the abstract number and the sequence information, and punches a new deck of cards including the sequencing information on the right side of each card.

[Figure 7A. File Maintenance.]

The listing is proofread (Step 6.31) and corrections are sent for keypunching (Step 6.12), where they are punched and inserted into the new deck that was punched by Preprocessor Program I. The sequencing information on the right side of the card is used to locate and insert the corrections. In Step 6.22 the corrected deck is used as input to Preprocessor Program II, which:

1. Converts the two-letter field codes into full field names;

2. Inserts control codes to be used in computer typesetting;

3. Sequences the document surrogates, placing the abstract number followed by the card number within that abstract in the right portion of the line;

4. Checks each EC number to see that it corresponds with the abstract number;

5. Prints the abstract number on a new page;

6. Prints an error message for each field code or EC number inconsistent with what is expected;

7. Prints the correct abstract number (or corrected abstract number) on the right of each abstract record;

8. Punches a new deck containing the complete field names, codes for computer typesetting, and sequencing information; and

9. Provides a listing to be used in proofreading.

In Step 6.32 the new listing is proofread and corrections sent to the keypunchers.
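The checking side of Preprocessor Program II (items 1, 4, and 6 in the list above) can be sketched roughly as follows. The two-letter codes and field names here are invented for illustration; the actual BIRS code table is not reproduced in the text.

```python
# Hypothetical field-code table; the real BIRS codes are not listed in the text.
FIELD_NAMES = {"TI": "TITLE", "AU": "AUTHOR", "DE": "DESCRIPTORS"}

def expand_field_code(card):
    """Convert a leading two-letter field code (e.g. '=TI') to its full name,
    reporting an error for any unrecognized code."""
    if card.startswith("="):
        code = card[1:3]
        if code in FIELD_NAMES:
            return "=" + FIELD_NAMES[code] + card[3:]
        print(f"ERROR: unrecognized field code {code!r}")
    return card

def ec_matches_abstract(ec_number, abstract_number):
    """Check that the last four digits of the EC number equal the abstract number."""
    return int(ec_number[-4:]) == abstract_number
```

A mismatch from `ec_matches_abstract` corresponds to the error messages examined in the proofreading steps.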
In Step 6.13 the corrections are keypunched and inserted by a data processor, who then uses the corrected deck as input to Preprocessor Program II (Step 6.23), which generates a file tape (Temporary File Tape 1) and a listing indicating the corrections made in Step 6.13. In Step 6.33 the listing is proofread and any corrections keypunched for use in the editing program, S (6.24), which can replace entire abstracts or change lines within abstracts as needed to generate a corrected file tape (Temporary File Tape 2).

In the first two keypunching/preprocessing/proofreading operations, abstracts are handled in batches of 50. In the third sequence of keypunching/preprocessing/proofreading (S (6.13), S (6.23), and S (6.33)), ten of the previous batches are grouped together to create a batch of 500 abstracts. The number of the first abstract in the batch is indicated and used as a parameter by the preprocessing program to check the following sequential abstracts to determine if they have the correct abstract and EC numbers. Any disagreement with what is expected is printed out as part of an error message, which is examined in the proofreading Step 6.33.

Step 7 - The Creation of an Information File Tape. In Step 7 a systems utility program adds the 500 abstracts to the information file tape containing the previous abstracts for the current volume of ECEA. This utility program is designated SIFMP, standing for Surrogate Information File Maintenance Program. As was previously mentioned, BIRS (Basic Indexing and Retrieval System) is designed in a modular format so that portions of programs or entire programs can be replaced by more economical special-purpose programs. Step 7 is an example in which a systems utility program was used to replace an operation which might have been done at greater cost by the BIRS Information File Maintenance Program (IFMP).
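The batch sequence check described above, which takes the first abstract number of the batch as a parameter, might look like this. This is a guess at the logic, not the Center's actual program:

```python
def check_batch_sequence(first_abstract, abstract_numbers):
    """Verify that a batch of abstracts runs consecutively from the starting
    number supplied as a parameter; return error messages for any mismatch."""
    errors = []
    for offset, found in enumerate(abstract_numbers):
        expected = first_abstract + offset
        if found != expected:
            errors.append(f"ERROR: expected abstract {expected:04d}, found {found:04d}")
    return errors

# A batch of 500 would be checked as check_batch_sequence(501, [...500 numbers...]).
```

The returned messages correspond to the printed error messages examined in proofreading Step 6.33.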
The Information File Tape (IFT) created in Step 7 is in a line-image format containing all the information that will appear in the published abstracts in ECEA, together with the information necessary to control the computer typesetting in a later step. This information file tape, with the information file tapes from previous volumes, is the data base for a variety of Information Center activities. Because of the importance of these tapes and the work required in their preparation, multiple backups of the tapes are kept in separate locations.

When the BIRS system was first developed, random-access disk storage did not have the wide use and lower cost it does today. Many of the operations which relate to the information file could now be done more efficiently using disk storage. Work is presently being done to develop an information file maintenance module which will take advantage of disk storage. Because of BIRS's modular structure, this can be done in a way that will not change the user's procedures, again illustrating the advantage of a modular design.

Step 8 - The Creation of a Description File Tape. In Step 8 the BIRS Descriptive Analysis Program (DAP) is used to extract descriptive terms from designated fields of the information file. In the processing at the CEC-ERIC Information Center the terms are extracted from the title field, the author field, the descriptors field, the date-of-publication field, and a categories field. The information selected by DAP is put on a temporary file which is processed by the BIRS Descriptive File Maintenance Program (DFMP) to generate a description file tape or to add new information to an existing Description File Tape (DFT).

There is a one-to-one correspondence between the information on the DFT and the IFT; i.e., for each abstract on the IFT there is a description on the DFT, such that description 1 on the DFT corresponds to abstract 1 on the IFT.
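Treating each surrogate as a record of named fields, the DAP/DFMP extraction that preserves this one-to-one correspondence can be sketched as below. The field names follow the text; the dictionary record layout is an assumption made for illustration.

```python
def build_description_file(information_file,
                           fields=("TITLE", "AUTHOR", "DESCRIPTORS",
                                   "PUBLICATION DATE", "CATEGORIES")):
    """Extract the designated fields from every surrogate on the information
    file, so that description n corresponds to abstract n."""
    return [{name: record.get(name, "") for name in fields}
            for record in information_file]
```

Because the abstract text itself is dropped, each description record is much smaller than its surrogate, which is what makes the description file faster to search.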
The purpose of the DFT is to provide a subset of the information contained on the IFT which will be useful in finding documents in a computerized search. By using a tape with less information (the DFT) for searching, the speed of computer searching is increased. The DFT serves as the basis for all computerized searches done by the BIRS programs, and as with the IFT, multiple backups of this tape are kept in separate locations.

Step 9 - Preparation of Printed Indexes. In Step 9 the BIRS Printed Indexing Program (PIP) is used to select information from the information file tape and order this information to create a printed indexing file tape (PIFT). As with the Descriptive Analysis Program, information from selected fields can be extracted and used to create any number of different indexes from the information file.3A The Information Center primarily uses the program to provide indexes of the author field, the title field, and the descriptors field, all of which are published as part of Exceptional Child Education Abstracts.

File Processing for ECEA

Once information files have been established, there are many ways that these files can be processed to organize the information and generate new products. Figure 8A illustrates the steps taken in file

3A John F. Vinsonhaler, John M. Hafterson, and Stuart W. Thomas, Jr. (editors), Basic Indexing and Retrieval System Technical Manual (East Lansing, Michigan: Information Systems Laboratory, College of Education, Michigan State University, 1970), V, 901-956.
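A printed index of the sort PIP produces, such as the descriptor index, can be sketched as an inversion of the chosen field, mapping each value to the abstracts that carry it. The record layout is again an assumption made for illustration, not the BIRS file format.

```python
def printed_index(records, field):
    """Invert the chosen field: map each value appearing in that field to the
    sorted list of EC numbers of the abstracts that carry it."""
    index = {}
    for record in records:
        for value in record[field]:
            index.setdefault(value, []).append(record["EC"])
    # Alphabetize entries, as a printed index would be.
    return {value: sorted(ec_list) for value, ec_list in sorted(index.items())}
```

The same routine applied to the author or title field would yield the other two indexes published in ECEA.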
[FIGURE 8A: FILE PROCESSING FOR EXCEPTIONAL CHILD EDUCATION ABSTRACTS]

processing to generate the journal Exceptional Child Education Abstracts. In the overview of the Information Center's major activities this would occur as part of the predefined process PP (3) labeled "file processing." With slight modification the steps illustrated in Figure 8A could be used to generate a variety of products, some of which will be discussed in a later section.

Step 10 and Step 11 - Preparation of Input for Computer Typesetting

In Step 10 an index reformatting program reads the printed indexing file tape (PIFT), adds coded information to be used in computer typesetting, and punches out a deck which will be utilized in computer typesetting. In Step 11 an abstract reformatting program reads the IFT, selects all or portions of specified abstracts from the tape, and provides a deck of punched cards in the form that will be used in computer typesetting. The program used in this step does not change the abstract in any manner except to delete portions which are not to be printed; however, it would be possible to have a program restructure the abstract to fit a new format if needed.

Step 12 - Preparation of Copy for Printers

The input of punched cards from Steps 10 and 11 is used by a computer program run on an IBM 1130 to punch a paper tape which is input for a phototypesetter. The paper tape contains information concerning the various type faces that are to be used, the width of the line, and how the line is to be set.
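The Step 11 behavior described above, copying a surrogate unchanged except for deleting the portions not to be printed, can be sketched as follows. This is a hypothetical illustration, not the original program; the field names, including `internal_notes`, are assumptions.

```python
def reformat_for_typesetting(surrogate, unprinted_fields=("internal_notes",)):
    """Return a copy of the document surrogate with unprinted fields removed;
    all remaining text passes through unmodified, as the text describes."""
    return {k: v for k, v in surrogate.items() if k not in unprinted_fields}
```

Note that the function returns a new record rather than altering the input, mirroring the fact that the IFT itself is never changed by this step.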
The camera-ready offset copy generated in Step 12.1 is given a final proofreading in Step 12.2, corrections needed are reset by the phototypesetter in Step 12.3, and the corrected copy is prepared in page format and sent to the printer in Step 12.4.

In Step 12.5 copy which is not computer generated (ads, instructions about how to use the journal, etc.) is prepared by the editorial staff and sent to the printers to be merged with the computer-generated copy.

Step 13 - Printing and Binding

In Step 13 the camera-ready pages provided by the computer-controlled phototypesetting and the additional copy provided by the editor are used to prepare the offset plates. ECEA is then printed, bound, and sent to the Council for Exceptional Children for distribution.

Processing Information Requests

The major objective of any information center is to disseminate information to its users in a form that will be most effective. An examination of the overview of the Center's activities found in Figure 3A shows how the various inputs and outputs of the Center revolve around this activity, designated PP (4). Even the evaluation activity described in Figure 4A obtains information from this step and exists solely to determine how the total procedures may be modified to more effectively carry out the information processing and dissemination procedures.

The Information Center processes two major categories of requests: (1) requests made by CEC-ERIC staff members involved in generating new documents or organizing information so that it may be more effectively used by those outside the Center; and (2) requests made by a variety of individuals outside the Center.
Included in the various types of users that are not on the CEC-ERIC staff are: (1) educational administrators and decision makers, (2) federal and public agencies, (3) parents, (4) psychologists, (5) public officials, (6) research and development specialists, (7) social workers, (8) special education supervisors and consultants, (9) staff members of professional organizations, (10) students, (11) teacher educators, and (12) teachers.

Often the requests from users outside the Center can be answered by documents which have been prepared by the CEC-ERIC staff or reprints of CEC publications; however, when needed the Center has a set of powerful computer search programs to aid in answering difficult questions. Figure 9A illustrates a predefined operation used at the Center for computer searches which will later be referred to as Predefined Process PP (9). In this figure an information request IP (3) is first translated into a computer-searchable question by one of the CEC-ERIC staff members processing user requests, SB (5). Next the question is read by the Description File Search Program (DFSP), which uses information stored on the Description File Tape (DFT) to determine which documents are most closely related to the question. The results of the search, with instructions concerning the format of output desired (a list of access numbers or a computer printout of the total document surrogate), are placed on a question file tape or disk which is read by the Information File Retrieval Program (IFRP). If the full text of the document surrogate is requested, IFRP obtains this information from the Information File Tape (IFT). If not, the access numbers and original questions are output to a line printer or a temporary storage device OP (6). The total search operation, excluding the information request IP (3) and the output generated OP (6), will hereafter be referred to as PP (9).
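The two-stage search just described can be sketched as follows. This is a hypothetical illustration, not the BIRS implementation: a search function (standing in for DFSP) matches a question against the compact description file and yields access numbers, and a retrieval function (standing in for IFRP) uses those numbers to pull full surrogates from the information file only when requested. The simple AND-matching of terms is an assumption; BIRS supported richer logical questions.

```python
def search_descriptions(description_file, question_terms):
    """Return access numbers of descriptions containing every question term
    (an illustrative stand-in for the logical questions DFSP accepts)."""
    wanted = {t.lower() for t in question_terms}
    return [d["access_number"] for d in description_file
            if wanted.issubset(set(d["terms"]))]

def retrieve_surrogates(information_file, access_numbers, full_text=True):
    """Fetch full surrogates from the information file, or, if only a list
    of access numbers was requested, return the numbers directly."""
    if not full_text:
        return access_numbers
    by_number = {r["access_number"]: r for r in information_file}
    return [by_number[n] for n in access_numbers]
```

Searching the small description records and touching the large information file only for the final hits is exactly why the DFT speeds up searching, as the text notes in Step 8.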
Detailed information concerning the various searching alternatives available as a part of DFSP can be obtained from the BIRS documentation.4A

[FIGURE 9A: A COMPUTER SEARCH - PREDEFINED PROCESS 9]

The overall procedures used in answering an information request are outlined in Figure 10A. When a request is received, IP (3), it is examined to determine if it is relevant to the information stored at the Center (D6), and if not a letter is sent to the person requesting information stating why the Center cannot process the request. If the request concerns the information contained at the Center, the decision (D7) is made whether it can best be processed by a computer search or by a hand search.

If a computer search is indicated, it follows the procedures indicated in PP (9). (See Figure 9A for detail.) The output from the computer search OP (6) is edited, S (15.1), to determine if it is meaningful and, depending on the nature of the request, items that are inappropriate may be removed. If after editing it is determined that the search was successful, a report of the search results is sent to the user, and statistical information about (1) the type of question, (2) the type of individual requesting information, and (3) the type of information sent is transmitted for processing, S (16). This information is used to assist in providing quarterly reports to ERIC and in determining how users may be served better.

If Decision 7 indicated that a hand search was best, it is determined if there are obvious documents available that can be sent, if a hand search of ECEA indexes is most appropriate, or if there is a selected bibliography with an index reference that could be sent (Step 14.2).

4A John F. Vinsonhaler, John M. Hafterson, Stuart W. Thomas, Jr.
(editors), Basic Information Retrieval System Technical Manual (East Lansing, Michigan: Information Systems Laboratory, College of Education, Michigan State University, 1970), I-XII.

[FIGURE 10A: INFORMATION REQUEST PROCESSING. The connector CS also connects to the evaluation procedures PP (5) of Figures 5A and 6A; the connector is not shown at the figures.]

The information gathered from the hand search is edited to determine which information is most appropriate, S (15.2), and a decision is made, D (8), as to whether or not the search was successful. If it is determined that the search was successful, the report and statistical data are processed in the same manner as in the computer search.

If it is indicated at D (8) in either a hand search or a computer search that there is not sufficient relevant data to warrant sending a report to the user, an alternate search method is attempted--a hand search if a computer search was first done, or a computer search if a hand search was first attempted. The results of the alternative search method are edited, the statistical information processed, and a report is sent to the user. If the second search was no more successful than the first, the report to the user may not contain documents but merely state that the Center was unable to find information relevant to the user's question.

The statistical data collected and processed as part of Step 16 is used by staff members to help determine what types of new documents would be most valuable to users and, if appropriate, these are developed by Center staff or commissioned to experts outside the Center.
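The fallback logic described above, trying the preferred search method first and attempting the alternate method once before conceding, can be sketched as follows. This is a hypothetical illustration of the routing only; the two search functions are stand-ins for the Center's computer and hand searches, and the return format is an assumption.

```python
def process_request(request, computer_search, hand_search, prefer_computer=True):
    """Run the preferred search; if it finds nothing, try the alternate
    method once, then report either results or failure to the user."""
    first, second = ((computer_search, hand_search) if prefer_computer
                     else (hand_search, computer_search))
    results = first(request)
    if not results:
        results = second(request)  # the alternate method is attempted once
    if results:
        return {"status": "success", "documents": results}
    # Neither method succeeded: the report states no relevant data was found.
    return {"status": "no relevant information found", "documents": []}
```

In the Center's actual procedure the output of each search is also hand-edited and statistical data is logged for Step 16; those steps are omitted here for brevity.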
Selective Publication

The manner in which the files are prepared for the CEC-ERIC Information Center not only makes it possible to create new subfiles, but to publish these subfiles. Thus if there are a number of requests that could be answered by using the same documents, it is possible to publish these documents directly using computer typesetting and a very inexpensive offset process. As of August 1971 the Council for Exceptional Children has 59 separate bibliographies which have been published in this manner.

Step 17 - Selection of Topical Subfiles

Figure 11A illustrates the procedure used for creating the selected bibliographies. In Step 17.1 statistical information which has been gathered from user information requests and processed in Step 16 of Figure 10A is analyzed to determine what bibliographies will be of greatest use. Step 17.2 involves a computerized search which generates a list of access numbers that are used to locate abstracts in ECEA so they may be examined (Step 17.3) to determine their relevance to the selected topic. All documents that are relevant to the type of information desired are selected for the Special Information File Tape (SIFT) generated in S (17.4) and the Special Printed Index File Tape (SPIFT) generated in S (17.5). As the Center's holdings increase, the search question is rerun on the new holdings and the edited results added to the SIFT and the SPIFT. These two tapes are used as the basis for continued updating of the printed bibliography in this topical area.

Step 18 - Selection of Abstracts to be Printed

While it is possible to publish all of the relevant document surrogates on a special topic, as the file grows larger even the subfiles contain more documents than can be published and distributed at a reasonable cost. For this reason it has been determined that bibliographies which are given away as answers to requests for information will contain no more than 100 abstracts.
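The incremental maintenance of a topical subfile described in Step 17 can be sketched as follows. This is a hypothetical illustration, not the SIFT/SPIFT tape processing itself: the stored topic question is rerun over only the new accessions, the hits are hand-edited for relevance, and the survivors are appended to the subfile. The function arguments are assumptions for illustration.

```python
def update_subfile(subfile, new_holdings, topic_question, edit):
    """Rerun the stored topic question on new holdings only, hand-edit the
    hits (editing may drop irrelevant ones), and append the survivors."""
    hits = [doc for doc in new_holdings if topic_question(doc)]
    subfile.extend(edit(hits))
    return subfile
```

Keeping the question and running it only against new holdings means the subfile never has to be rebuilt from scratch as the Center's collection grows.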
The criteria by which 100 or fewer abstracts are selected from sometimes a thousand or more relevant abstracts are:

1. Availability
2. Recency
3. Information value
4. Author's reputation
5. Classical content5A

[FIGURE 11A: PROCEDURES FOR PROCESSING SELECTIVE PUBLICATIONS]

Step 19 - Preparation of Input for Computer Typesetting

In Step S (19.1) the abstracts selected in S (18) are indexed by EC numbers; in Step S (19.2) cards of the index are punched for use in phototypesetting; and in Step S (19.3) cards of the selected abstracts are punched for use in phototypesetting.

Step 20 - Computer-Controlled Typesetting

Step 20 consists of computer-controlled phototypesetting, the preparation of offset plates, printing, and binding. This step is indicated as PP (10) and is almost identical to the procedures used for the printing of ECEA.

The bibliographies and their indexes provide a powerful tool that is used in answering about 55% of all information requests. By having a selected topic with an index to that topic, it is often possible to answer search requests by using a single item in the index of a special bibliography. The bibliography can be sent with the particular index term or terms circled and a covering letter indicating that the user should look at the abstracts specified by the circled indexing terms.

If the 100 abstracts in the printed bibliography contain too few to answer the specific search request, the special printed index generated in S (17.5) can be used by the person answering user requests to find additional items from the total subfile. Often, when using the index of a selected bibliography, an individual can obtain in less time the same results as a computer search.
5A CEC Information Center, Educational Resources Information Center, Newsletter, June 23, 1971, p. 2.
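The five selection criteria listed in Step 18 can be sketched as a ranking that keeps at most 100 abstracts. This is a hypothetical illustration only: the thesis does not say how the criteria were weighted or scored, so the lexicographic ordering and the field names used here are assumptions.

```python
def select_abstracts(candidates, limit=100):
    """Rank candidate abstracts by the five selection criteria, in the
    order listed, and keep at most `limit` of them."""
    def score(a):
        # Lexicographic ranking: availability first, then recency, etc.
        return (a.get("available", False),
                a.get("year", 0),                  # recency
                a.get("information_value", 0),
                a.get("author_reputation", 0),
                a.get("classical_content", False))
    return sorted(candidates, key=score, reverse=True)[:limit]
```

In practice the Center's staff applied these criteria by hand; a scoring function like this only illustrates how the stated priorities could trim a thousand relevant abstracts down to the published 100.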