MULTI-TRAIT MULTI-METHOD STUDY OF ADULT
TEMPERAMENT

Thesis for the Degree of M.A.
MICHIGAN STATE UNIVERSITY
HOWARD MARK ERMAN
1977

 


ABSTRACT

MULTI-TRAIT MULTI-METHOD STUDY OF ADULT TEMPERAMENT

By

Howard Mark Erman

Though child temperament has been repeatedly studied over
the last 15 years, there have been few studies of adult temperament.
Studies demonstrating temperament influences on adult behavior are
necessary for any general theory of temperament. However, before
such evidence can be developed, two interrelated problems must first
be met: (1) there must be agreement on what traits can be called
temperaments; and (2) there is a need for an adult temperament
scale firmly grounded in a theory of temperament. An adult temperament
scale is an essential research tool. This study addresses these two
interrelated problems.

Temperament as defined by Allport (1961) and elucidated by
Buss and Plomin (1975) leads to five theoretical guidelines for an
adult temperament scale: (1) temperament categories should be compatible
with child development research; (2) the categories should
be based on empirical evidence of long-term stability in adults;
(3) the categories should reflect style rather than content; (4) the
categories should include broad personality dispositions rather than
specific acts; and (5) the categories of temperament should be independent
of one another.

Early attempts at studying temperament, particularly Sheldon
(1942), were riddled with methodological errors which later invalidated
many of the conclusions. The more recent studies, particularly
Thomas et al. (1963, 1968), have been successful because they established
careful methodological controls. Many of the precautions in
the successful studies are part of normal test construction precautions,
such as running careful reliability checks. However, some precautions
were unique to the problem of measuring temperament. By contrasting
the failed studies of temperament with the successful studies, two
empirical guidelines for an adult temperament scale can be derived:
(6) the unit of temperament analysis, namely a measure of the presence
or absence of stable behavioral tendencies, should differ from the units
of the questionnaire data; and (7) the best units for questionnaire
data are neutral and concrete descriptions of common, everyday beha-
viors. In the following two examples, the second meets these empiri-
cal guidelines while the first does not:

Example 1. I tend not to be impulsive.

Example 2. I do not run out of toothpaste at home
because I keep a spare tube on hand.

Among currently available adult temperament measures, the only
test that meets the five theoretical criteria--the EASI-III Temperament
Survey developed by Buss and Plomin (1975)--fails to meet the empirical
guidelines. Its title is an acronym for its four categories of temperament:
Emotionality, Activity, Sociability, and Impulsivity.

A new temperament scale, the Temperament Scale-Erman or TS-E,
was specifically developed to meet all seven temperament scale guide-
lines. Its four categories of temperament are the same as the EASI-
III but the definitions of these categories are modified to reflect
the work of Bronson (1969, 1971). Existing scales and the descriptive
work of Buss and Plomin (1975) provided some items; however, most
items were created in small group meetings where individuals were given
a definition of a temperament category, asked to name people they knew
who might be at one of the extremes of the temperament continuum, and
then asked to cite the specific behaviors that led them to categorize
the people they had named. Items that appeared sex-biased or biased by
social desirability were eliminated. The resulting questionnaire used 20 items
per scale, all forced-choice, true-false items. Special instructions
were developed to control for cases of the impossible forced choice.

Subjects and Procedure

A multi-trait multi-method study of the four temperament
traits was conducted; however, the entire multi-trait multi-method
matrix was not generated because it would have placed excessive time
demands on subjects. Multi-trait refers here to the four temperament
traits studied.

The first method was a reliability study of the Temperament
Scale-Erman. This study was primarily designed to see if the four
scales were independent; independence would demonstrate discriminant
validation. In addition the internal reliability of each scale was
measured. The reliability study of the 80-item TS-E used 71 male
and 111 female students enrolled in an undergraduate Abnormal Psychology
class at Michigan State University.

The second method was a peer-nomination study. This was
used to test criterion validity of the EASI-III and of an expanded
(25 items per scale) TS-E. A group of advanced undergraduate students,
labeled "seekers," nominated two "subjects" whom they considered as
high and low, respectively, on a given dimension of temperament. If
seeker perceptions were matched by subject test scores, this would be
a form of convergent validity. Different seekers were used for each
temperament scale: 25 seekers for Activity; 7 for Emotionality;
11 for Sociability; and 12 for Impulsivity. Each scale was analysed
by a matched t-test. In addition, a two-way within-subject ANOVA was
conducted; it was based on two levels of subjects (high, low) and two
tests (TS-E, EASI-III) for each seeker.
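
As an illustration of the matched t-test described above, the following Python sketch computes the statistic from paired scores, one high-nominated and one low-nominated subject per seeker. The function name `matched_t` and all scores are hypothetical and were invented only to show the arithmetic; they are not data from this study. The seven pairs merely mirror the number of seekers used for Emotionality.

```python
import math
from statistics import mean, stdev


def matched_t(high_scores, low_scores):
    """Return (t, degrees of freedom) for a matched-pairs t-test.

    Each position holds the two subjects nominated by one seeker:
    the "high" nominee's scale score and the "low" nominee's scale score.
    """
    if len(high_scores) != len(low_scores):
        raise ValueError("each seeker must contribute one high and one low score")
    diffs = [h - l for h, l in zip(high_scores, low_scores)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two pairs are needed")
    sd = stdev(diffs)  # sample standard deviation of the pair differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical scale scores, invented for illustration only.
high = [14, 12, 15, 11, 13, 16, 12]
low = [9, 10, 8, 11, 7, 10, 9]
t, df = matched_t(high, low)
print(f"t({df}) = {t:.2f}")
```

A positive t that exceeds the critical value for its degrees of freedom would indicate that the scale separates high-nominated from low-nominated subjects in the direction the seekers predicted.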

Results or Findings

 

The reliability study showed low correlations between indi-
vidual TS-E scales, ranging from -.13 to .08. This is a form of
discriminant validation: these low correlations indicate that the
four scales of the TS-E measure independent personality characteris-
tics. Internal reliability, as measured by coefficient alpha, ranged
from .44 to .60 for the TS-E scales. These reliabilities are high
considering the relatively short length of the TS-E scales (20 items),
the true-false (0,1) nature of scale items, the use of behavioral
rather than attitudinal items, and above all the fact that this was
the first time the TS-E was used.
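
For readers unfamiliar with coefficient alpha, the following Python sketch shows how it can be computed for true-false (0,1) items. The function name `coefficient_alpha` and the small response matrix are hypothetical and were made up for illustration; they are not TS-E data. With dichotomous items and population variances, this computation coincides with the Kuder-Richardson formula 20.

```python
from statistics import pvariance


def coefficient_alpha(responses):
    """Coefficient alpha for a list of respondents.

    Each respondent is a list of item scores (here 0 or 1), and every
    respondent must answer the same number of items.
    """
    k = len(responses[0])
    if k < 2 or any(len(r) != k for r in responses):
        raise ValueError("need at least two items and equal-length rows")
    # Variance of each item across respondents.
    item_vars = [pvariance([r[i] for r in responses]) for i in range(k)]
    # Variance of the total scale score across respondents.
    total_var = pvariance([sum(r) for r in responses])
    if total_var == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical answers of six respondents to a four-item scale.
answers = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
]
alpha = coefficient_alpha(answers)
print(f"alpha = {alpha:.2f}")
```

Alpha rises as the items covary more strongly relative to their individual variances, which is why short scales of concrete behavioral items tend to yield modest values.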

For the criterion validity study, the following findings were
significant (p < .05).

1. Criterion validity, in the form of differentiating high
nominated subjects from low subjects, was established for
all four scales of the Temperament Scale-Erman.

2. Criterion validity was also established for the following
EASI-III scales: Impulsivity, Sociability, Activity.

3. When a TS-E scale was used in conjunction with an EASI-III
scale, all four temperament scales were able to differentiate
high subjects from lows.

4. The TS-E Impulsivity scale was better than the EASI-III
at discriminating high nominated subjects from lows.

5. Scores were higher on the TS-E Emotionality scale than on
the EASI-III Emotionality scale.

Implications and Conclusions

 

This study demonstrated the existence of four independent
temperament traits. It also established criterion validation for two
paper and pencil tests which meet the theoretical guidelines for temperament.
The two tests thus provide a way of solving the two interrelated
problems of (1) defining and (2) measuring adult temperament.

The TS-E, which meets the empirical guidelines, is already
better than the EASI-III in differentiating high nominated subjects
from low subjects. The psychometric properties of the TS-E can be
improved by lengthening each scale and weeding out poor items. Ways
to improve the TS-E are outlined and important future validity tests
of the TS-E are presented. If an improved TS-E passes these validity
tests, then a vast range of adult temperament issues becomes open to
simple and efficient study.
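
The effect of lengthening a scale on its reliability can be roughly projected with the Spearman-Brown formula. The thesis does not name this formula; it is brought in here only as a standard illustration, and it assumes the added items are comparable in quality to the existing ones. The function name `spearman_brown` and the doubling factor are hypothetical choices; the two starting values are the lowest and highest alphas reported above.

```python
def spearman_brown(reliability, length_factor):
    """Project reliability when a scale is lengthened by length_factor.

    length_factor = 2.0 means twice as many items. The projection assumes
    the new items behave like the existing ones (an assumption, not a
    finding of the study).
    """
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


# Reported TS-E alphas (.44 and .60) with a hypothetical doubling of length.
for alpha in (0.44, 0.60):
    projected = spearman_brown(alpha, 2.0)
    print(f"alpha {alpha:.2f} -> projected {projected:.2f}")
```

Such projections are only a planning aid; the reliability of a lengthened TS-E would still have to be established empirically.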

MULTI-TRAIT MULTI-METHOD STUDY OF ADULT TEMPERAMENT

By

Howard Mark Erman

A THESIS

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

MASTER OF ARTS

Department of Psychology

1977

Copyright by

HOWARD MARK ERMAN
1977

I would like to dedicate this thesis to my wife, Mary Corcoran.
Her understanding and support, but above all her love, were central to
this, much as they are central to so many other things in my life.


ACKNOWLEDGMENTS

The completion of this project depended on help from a large
number of people. Professor Gary Stollak, the chairman of my commit-
tee, first suggested to me that there was a need for a new adult
temperament scale. He remained throughout the project a source of
encouragement and sound advice. My original interest in temperament
can be traced back to one of the best classes I have had in graduate
school, a developmental seminar run by Professor Ellen Strommen. I
remembered again why I thought of Ellen as a superb teacher when I
read the useful comments she wrote about this project.

Professor Lawrence Messe suggested the method for analysing
the validation study. He helped me interpret the data, steered me
back to the central issues of my research when I wandered off into
the kinds of by-ways where novices are apt to stray, and he provided
useful comments on the final draft. But above all I want to thank
Larry for his faith in me and in the value of this project; at
times when I despaired of ever succeeding or completing anything,
he provided a personal interest which carried me through. This
concern was invaluable.

Professor Charles Hanley, though the last member to join my
master's committee, was certainly first in terms of contributions to
this research. When I started this project, I knew very little about
test construction; Professor Hanley was an expert, but more importantly
for me, a patient expert. He convinced me to use true-false (0,1)
items; he alerted me to dangers such as social desirability factors
in the wording of items; he helped me to interpret the reliability
study results; and, most importantly, he suggested the validation
method used in the study. Professor Hanley never let me lose sight
of the major issues of the study or of the theoretical meaning of
its findings. He also suggested the title of the thesis. All the
members of my committee were accessible for my questions, but as
this list of contributions suggests, it was to Professor Hanley that
I turned most frequently. I found him stimulating, supportive, and,
though he might be among the last to acknowledge this, warm and
friendly.

My wife Mary handled all the computer work in this thesis.
She also handled me, patiently listening when I provided long and
tedious explanations of what I was doing, and providing me with both
technical and personal assistance.

I am particularly thankful to four undergraduates who pro-
vided invaluable time and assistance at each step of the project.
Gayle Baugh, Gil Simon, and Fred Simon joined at the very beginning
of the project and worked all winter quarter. They made critical
contributions in deciding what the four categories would be, they
helped pull useful items from old scales, and they helped run the
small groups devoted to writing new items. We then met as a group
to review every item to see if the item was appropriate for a given
temperament, and to eliminate any item which seemed sex-biased or
biased by a social desirability factor. They also helped run the
reliability study. In the spring quarter, Fred Simon was replaced
by Geri Sims. The validation study took all quarter to run and
Geri, Gil, and Gayle gave countless hours to ensuring its success.
In terms of their intelligence and their dedication to research,
these four undergraduates could match most graduate students.

Finally, I want to thank the large number of personal friends
who attended small group meetings devoted to writing new items for
the scales. These included Linda Cohen, David and Peggy Hayes,
Andrew and Paula McNitt, Professor Brian and Sally Silver, Meredith
Taylor, and once again, my wife Mary. Since writing items is rarely
an exciting prospect, I assume they helped out of personal friendship;
I am grateful.

TABLE OF CONTENTS

LIST OF TABLES

Chapter

I. STATEMENT OF THE PROBLEM AND REVIEW OF THE LITERATURE

      Theoretical Guidelines for Adult Temperament Questionnaire
      Category Determination in Traditional Measures of Adult Temperament
         Sixteen Personality Factor (16 PF) of Cattell and Eber (1962)
         California Psychological Inventory (CPI) (Gough, 1964)
         The Guilford-Zimmerman Temperament Survey (Guilford & Zimmerman, 1949)
         Thorndike Dimensions of Temperament (Thorndike, 1963)
         Thurstone Temperament Schedule (Thurstone, 1953)
         Johnson Temperament Analysis (Johnson, 1944)
      Empirical Guidelines for an Adult Temperament Questionnaire
      EASI-III (1975): A Temperament Scale That Meets the Theoretical Guidelines

II. DEVELOPMENT OF THE TEMPERAMENT SCALE-ERMAN (TS-E)

      Choosing the Categories
      Writing the Items
      Format of the Questionnaire

III. A MULTI-TRAIT MULTI-METHOD STUDY OF TEMPERAMENT

      The Reliability Study
         Method
         Results
      The Validity Study
         Method
         Results
      Summary of Results

IV. DISCUSSION

      TS-E or EASI-III?
      Further Development of the TS-E
      Behavior Validation of the TS-E

APPENDICES

A. TEMPERAMENT SCALE-ERMAN (TS-E)
B. TEMPERAMENT SCALE-ERMAN (TS-E): BY TEMPERAMENT CATEGORY

LIST OF REFERENCES

LIST OF TABLES

Table

1. Definitions of the Four Categories of Temperament in
   the Temperament Scale-Erman (TS-E)

2. Reliability Study: Inter-Scale Correlations for Male,
   Female, and Pooled Subjects on TS-E

3. Reliability Study: Means, Standard Deviations, and
   Internal Reliability for TS-E

4. Means and Standard Deviations of Reliability and
   Validity Studies

5. Results of Matched T-Tests for Individual Scales,
   Validity Study

6. Results of Two-Way Analysis of Variance, Validity Study

CHAPTER I

STATEMENT OF THE PROBLEM AND REVIEW OF THE LITERATURE

While the last 15 years have seen an increasing study of infant
temperament (Birns, Barten, & Bridger, 1969; Carey, 1970; Freedman &
Keller, 1963; Graham, Rutter, & George, 1973; Thomas, Chess, Birch,
Hertzig, & Korn, 1963; Thomas, Chess, & Birch, 1968; and Wilson &
Lewis, 1974), there have been few studies of adult temperament. The
study of adult temperament could provide critical evidence for any
general theory of temperament and its influence on behavior. Should
evidence for consistency of specifiable characteristics of tempera-
ment emerge, a broad range of issues would then merit further inves-
tigation including the effects of temperament on marital interaction
and satisfaction, on job choice, on child caregiving behavior, and
on both treatment and therapist of choice in clinical settings.

In part the lack of adult temperament studies is due to the
lack of agreement on what traits can be labeled "temperaments" and
also due to the lack of an adequate instrument to measure these
temperaments, particularly in questionnaire form. These two problems--
defining the categories and creating a questionnaire measure--go hand
in hand. Of course evidence of adult temperament should not be limited
to measurement just through paper and pencil tests; adult temperament
should also be measurable via naturalistic observation and
in laboratory measurement. However, the efficiency of a self-administered
paper and pencil test would provide an essential
research tool for the study of adult temperament. Guidelines for
constructing such an adult temperament scale can be drawn from a
theoretical consideration of temperament and a review of past
attempts at the empirical study of temperament.

Theoretical Guidelines for Adult
Temperament Questionnaire

 

While several scales with the word "temperament" in the title
now exist, these and related scales, with one exception, are conceptually
flawed, reflecting an ambiguity in the common operational
definitions of temperament.1 To understand the problem in available
tests, consider Allport's (1961) definition of temperament, often
cited as the definition most accepted by psychologists (Buss, 1973).

Temperament refers to the characteristic phenomena of an
individual's nature, including his susceptibility to emotional
stimulation, his customary strength and speed of response, the
quality of his prevailing mood, and all peculiarities of fluctuation
and intensity of mood, these being phenomena regarded
as dependent on constitutional makeup and therefore largely
hereditary in origin (p. 34).

Buss and Plomin (1975) point out three basic components in
this definition: (1) temperament involves style rather than content;
(2) temperament includes only broad personality dispositions rather
than extremely specific or narrow acts; and (3) temperament has an
hereditary component.

 

1 Of course this flaw might only invalidate these tests for
studies of temperament; they could be reasonably valid for studying
other facets of adult personality.

Most scales of adult temperament reflect only the first two
components. Such scales focus on temperament as the how of behavior,
the style or manner in which a person acts, as contrasted with
the what or why of behavior (ability or motives, respectively).
These scales also try to tap dispositions or innate patterns as
broadly defined rather than in specific acts. However, adult tem-
perament scales have not been constructed with an hereditary com-
ponent in mind. There is one exception--Buss and Plomin's EASI-III
scale (1975); its title is an acronym for the four categories of
temperament it measures: Emotionality, Activity, Sociability, and
Impulsivity.

Buss and Plomin (1975) suggest that heritability is a criti-
cal component in order to distinguish temperament from other features
of personality which derive from experiences; they suggest that any
personality disposition be labeled temperament only if it contains
an hereditary component. While beginning with inheritance as the
critical criterion for labeling a personality disposition as being
a temperament, they then suggest that heritability implies four addi-
tional criteria, namely: stability during childhood, retention into
maturity, adaptive value, and presence in our animal forebears.

Based on this reasoning, any scale of adult temperament
should include only categories that meet the following
guidelines:

1. The temperament categories should be compatible with
child development research.

2. The categories should be based on empirical evidence of
long-term stability in adults.

3. The categories should reflect style rather than content.

4. The categories should include broad personality dispositions
rather than specific acts.

There is one additional theoretical guideline that is inde-
pendent of the above line of reasoning: if there are different cate-
gories representing different aspects of temperament, each category
should be relatively uncorrelated with other categories. Not all
categories need be totally uncorrelated all of the time; after all,
there may be certain personality types or certain age periods in
which two categories of temperament become moderately correlated
with one another. However, if temperament is to be a meaningful
concept, categories neither too large nor too small are needed. The
heritability guidelines, #1 and #2 above, ensure that categories are
not too large. But unless categories are independent of one another,
they can be infinitely subdivided into increasingly more minute
components. When Sheldon (1942) first grappled with temperament, he
assumed that potential temperaments could be found in any English-language
adjective that was generally applied to people; hundreds of
potential temperaments had to be considered. If there were really
hundreds of temperaments, then the concept of temperament would become
so unwieldy as to be useless. Accepting categories only when they
are uncorrelated with one another ensures a meaningful level of
analysis. Hence a fifth guideline:

5. The categories of temperament should be independent of
one another.

Category Determination in Traditional
Measures of Adult Temperament

While traditional scales of adult temperament meet the last
three guidelines, they rarely meet the first two. Instead, as will
be seen below in a review of some of these scales, the choices of
temperament categories are based on factor analysis, criterion analy-
sis, or some combination of the two.

Sixteen Personality Factor (16 PF)
of Cattell and Eber (1962)

 

 

The "16 PF" scale was constructed using factor analysis.
Cattell (1951) has argued that the factors he derived are the
"primary source traits" or building blocks of personalities; Cattell
also suggested that the "natural history" of these source traits
should be investigated, including their life course and stability.
However, many of the categories are clearly incompatible with child
development temperament studies, so that even if long-term stability
were demonstrated, the factors might still be primarily or entirely
learned. Factor H, shyness, is said to be "largely hereditarily
determined." However, while the high end of Factor H is said to be
"shy, withdrawing, cautious"--all conceivably hereditary--this person
is also said to "usually have inferiority feelings," which are
unlikely to be inherited. Other factors appear to have an entirely
learned component. Consider the following three:

Factor G--Expedient (evades rules, feels few obligations) vs.
Conscientious (persevering, staid, rule-bound); Factor N--
Forthright (natural, artless, sentimental) vs. Shrewd (calculating,
worldly, penetrating); Factor Q1--Conservative (respecting
established ideas, tolerant of traditional difficulties)
vs. Experimenting (critical, liberal, analytical, free-thinking)
(Cattell & Eber, 1962).

These categories could only evolve by analysing the behavior of highly
socialized individuals; they would be meaningless in the study of
infants. The technical title for Factor G is weaker superego strength
vs. stronger superego strength; by any commonly accepted definition,
superego, even if relatively stable, is based on experience rather
than heredity.

California Psychological Inventory
(CPI) (Gough, 1964)

 

 

Although the CPI was designed to define and then measure both
social and personal descriptive personality constructs of wide
relevance, no claim was made (unlike for the 16 PF scale) that the 18
scales are the basis for the measurement of the total personality.
The scales were constructed by an empirical technique in
three steps: (1) a criterion dimension is defined;
(2) items which appear to bear on the defined criterion are then
written; (3) the items are then validated on a relevant criterion
group independently chosen, usually by a peer nomination method.

However useful the CPI may be for measuring adult personality,
it is simply not useful as an adult temperament scale. First, its
scales are not independent; Socialization and Dominance correlate
+.65; Socialization and Self-Control about +.50; the Responsibility
scale usually correlates with the Capacity for Status scale about
+.35. Second, it shows only modest stability over even 1 year.
Test-retest correlations over a year's time with high school students
ranged from .44 to .77 for females and from .38 to .74 for males.
Eleven of the scales for females and 14 of the scales for males had
test-retest correlations below .70. The most serious defect is that
the scales have no relevance to child temperament
research: the purpose of the Tolerance scale is "to identify persons
with permissive, accepting, and non-judgmental social beliefs
and attitudes," while the purpose of the Sense of Well Being scale
is "to identify persons who minimize their worries and complaints,
and who are relatively free from self-doubt and disillusionment."
The scales consist in part or in their entirety of learned beha-
vioral styles.

The Guilford-Zimmerman Temperament
Survey (Guilford & Zimmerman, 1949)

 

Using a strictly empirical approach, namely a factor analysis
of items in a personality inventory, Guilford and Zimmerman obtained
14 factors of temperament; each factor was then divided into smaller,
homogeneous units by a "rational approach" in which the items for a
given factor were further regrouped using a combination of content
inspection and statistical correlation.

The same criticisms which apply to Cattell's research apply
here. Since the factors are based neither on a theory nor on any
external longitudinal study, there is neither evidence for enduring
stability nor a theory to suppose they should be enduring. Since
the factors are derived by an empirical study of adult responses to
adult test items--test items designed to measure features of adult
personality but having no developmental theoretical framework--there
is no reason for the categories to have any relevance to the study
of child temperament. Child temperament categories should at least
have the theoretical possibility of preceding learning even if in
adults they later show some additional learned component. The issue
is one of degree: with too large a learned component the char-
acteristic could no longer be labeled a "temperament" even if it
shows life-long stability. An individual's native language, such as
German rather than English, has life-long stability for most people;
they retain their accent even when they have left their native land
and rarely speak their native language. However, this primary lan-
guage, being entirely learned, could not be considered a temperament.

Many of the categories used by Guilford and Zimmerman are
wholly or primarily based on learned experiences. The Objectivity
factor includes (1) egocentrism, (2) ideas of reference, (3) unwarranted
sympathy, and (4) hypersensitivity. The Masculinity vs. Femininity
factor includes (1) fearfulness, (2) inhibition of emotional experi-
ence, (3) masculine vocational interest, (4) masculine avocational
interest, (5) disgustfulness, and (6) sympathy. All of these, with
the possible exception of fearfulness, would be entirely learned;
they are thus unlikely to be temperaments.

Even when a Guilford-Zimmerman factor might appear to
be heritable, it includes a subfactor which would be
socialized. For example, in the General Activity factor, liking
for action is a subfactor. Of course having adults show such a
learned subfactor (liking for action) is compatible with the primary
factor (General Activity) being a temperament, if the subfactor
does not comprise too large a component of the primary factor.
This learned subfactor would not be a problem if studies revealed
that in adults a learned subfactor later became attached to the
heritable factor: in the above case, adults who had inherited a
highly active temperament might usually learn to accept their temperament
and later report they "like action."

The learned subfactor is only a problem when the primary
factor is defined as including a learned subfactor component, as in
the above case: then there is no way of ever sorting out under what
circumstances the learned behavior is or is not associated with the
temperament. Intervening and antecedent variables cannot be
explored. Consider the following analogy: imagine an I.Q. test
where liking for reading was a subfactor of the test. Now it is
possible that most people with high native intelligence will also
show some degree of liking to read, while individuals with low
native intelligence will not show such a preference. The proposi-
tion that intelligence has some partial genetic component could
remain true even if liking to read were both entirely learned and
frequently associated with high I.Q. After all, school and families
tend to reward success in reading and punish failure, so individuals
with high I.Q. scores might learn to "like to read." However, high
I.Q. could be an antecedent variable for liking to read and liking
to read need not always be associated with high I.Q.: there are many
students in the public schools who do not perform up to their potential
ability exactly because they do not like to read despite a high
I.Q. If liking to read were defined as a component of I.Q. tests,
then the relation between the variables would be lost, since low
scores on liking to read would lower overall I.Q. scores. Similarly,
when liking for action is defined as part of General Activity, the
possibility of someone showing high General Activity but not liking
it is precluded from study.

Twin studies are the traditional means of measuring herita-
bility and when the scales listed above have been used in twin studies
investigating the heritability of personality features, the results
have been uneven (Vandenberg, 1967; Buss & Plomin, 1975). These
inconsistent findings are probably caused by the failure to explicitly
consider heritability when the scales of the above tests were defined,
so that even if a category could theoretically be inherited, the
heritable component in the scale is diluted by items in the scale
which clearly tap learned attitudes or behaviors.

Thorndike Dimensions of
Temperament (Thorndike, 1963)

While the Thorndike Dimensions of Temperament test has many
innovative features in format and structure, discussed later, the
hypothetical traits that it attempts to measure are based on the
factor analytic work of Guilford and Zimmerman, and the objections
to their work noted above also apply to Thorndike.

Thurstone Temperament Schedule
(Thurstone, 1953)

The seven categories of the Thurstone Temperament Schedule
were based on a reanalysis of the schedule of Guilford. At the
same time, interest and personality questionnaires were surveyed and,
after items relating to abnormal behavior and psychiatric categories
were eliminated, a file of several thousand items was accumulated.
After duplications and items that did not match the categories
derived from the factor analysis were eliminated, 320 items remained in a
questionnaire which was filled out by 198 adults. The 20 most
discriminating items for each scale were retained, for a total of
140 items.

The same problems noted for the Cattell 16 PF scale and the
Guilford-Zimmerman Temperament Survey also apply here.

Johnson Temperament Analysis
(Johnson, 1944)

 

 

The Johnson Temperament Analysis measures nine different
traits, which are defined as "a constellation of behavior patterns
and behavior tendencies sufficiently coherent to be measured and
effectively used." The nine were chosen because they seemed useful
to the areas where the test might be used, including vocational
counseling, marriage counseling, diagnosis, and criminology.

The traits are not independent: Depressive correlates with
Nervous at +.74, Critical with Subjective at +.72. A more serious
problem is that the definition of traits does not include long-term
stability as a criterion for choice; Nervousness, for example, "may
be a temporary condition brought on by the onset of much worry,
fatigue," etc. Finally, the traits are not relevant to child
development research.

Empirical Guidelines for an Adult
Temperament Questionnaire

There have been two large-scale observational studies specifically
aimed at the study of temperament. The first, a study of
adult temperament by Sheldon (1942), had serious methodological flaws
and has been generally discredited. The second, a study of child
temperament by Thomas et al. (1963, 1968), provided the method-
ology and the results which spurred many of the recent investiga-
tions of infant temperament. Contrasting the two studies to see
how the second avoided the problems of the first can provide addi-
tional guidelines for considering an adult temperament scale.

On the assumption that any adjective generally applied to
people was a potential temperament, Sheldon began by sifting through
long lists of such adjectives. This labor led to 650 alleged tem-
perament traits, which, by assiduous sifting and classification,
were then reduced to 50. Each of these 50 alleged traits was con-
verted to a seven-point scale. Sheldon then began a systematic
study of some 33 male graduate students and academicians. Each
subject was extensively studied for 1 year, including 20 analytic
interviews plus observation of the subjects in their daily routines
and their social interactions. All ratings in this massive initial
study were done by Sheldon. Finally, intercorrelations were run for
the entire series of traits--a total of 1,225 correlations.² A
search for patterns in these intercorrelations led to the discovery
of three basic factors; all traits within a factor had a positive
correlation of at least +.60 with other traits within the cluster
and a negative correlation of at least -.30 with all traits in
the other two clusters. Of the original 50 traits, 22 met this
criterion: six traits in Group I, seven in Group II, and nine in
Group III; these groups were later renamed Viscerotonia, Somatotonia,
and Cerebrotonia, respectively. These 22 traits became the core of
the Scale of Temperament; however, 4 additional years were spent
revising and expanding this scale seven or eight times, eventually
arriving at 20 traits in each cluster which still met the initial
correlation criterion.

² This was done before computers were available. Sheldon's comment
on this immense labor is poignant: "The tedious element of the
job did not lie in hunting and finding traits, however, nor in hunting
subjects to be rated. . . . The pain lay in the statistical analysis
of large masses of data that accumulated" (p. 16).
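
The size of this task, and the clustering rule itself, can be made
concrete with a short sketch. The count of 1,225 is simply the number
of distinct pairs among 50 traits. The function below is only an
illustration of the stated thresholds (+.60 within a cluster, -.30
against the other clusters); the trait labels, cluster names, and
correlation values are invented for the example and are not Sheldon's
data.

```python
from math import comb

# Number of distinct pairwise correlations among the 50 rated traits.
N_TRAITS = 50
n_pairs = comb(N_TRAITS, 2)  # 50 * 49 / 2


def meets_criterion(trait, clusters, corr, within_min=0.60, between_max=-0.30):
    """Check one trait against the clustering rule described in the text.

    The trait must correlate at least `within_min` with every other trait
    in its own cluster and at most `between_max` with every trait in the
    remaining clusters.
    """
    own = next(name for name, members in clusters.items() if trait in members)
    for name, members in clusters.items():
        for other in members:
            if other == trait:
                continue
            r = corr[frozenset((trait, other))]
            if name == own and r < within_min:
                return False
            if name != own and r > between_max:
                return False
    return True


# Hypothetical toy data: two clusters, three traits (all values invented).
clusters = {"I": ["a", "b"], "II": ["c"]}
corr = {
    frozenset(("a", "b")): 0.65,
    frozenset(("a", "c")): -0.35,
    frozenset(("b", "c")): -0.10,
}

print(n_pairs)
print({t: meets_criterion(t, clusters, corr) for t in "abc"})
```

In the toy data only trait "a" passes: "b" and "c" fail because their
correlation with each other (-.10) is not negative enough.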

Sheldon's large-scale study of temperament repeated the same
methodology used to develop the scale: subjects had to be studied
for over a year in as many situations as possible and in addition
interviewed for no less than 20 separate 1-hour sessions, preceded
by 1- or 2-hour-long sessions devoted to gaining rapport with the
subject. In this study, Sheldon claimed to discover a relationship
between body type and temperament.

The Thomas et al. New York Longitudinal Study attempted to
trace the emergence of temperament patterns from birth in order to
uncover the relation, if any, between infant temperament and child
behavior disorders. Over a 12-year period, 141 children from 85
families were studied; the families were fairly homogeneous on most
socioeconomic status variables. Most temperament data came from
parent interviews beginning shortly after the infant's birth; later
additional data were collected from parent interviews, from direct
observations of the infant in school at least once a year, and from
observations made when the child was brought in for standard psychological
testing at ages three and six. By inductive content
analysis of parent interviews from the first 20 children, a total of
nine temperament categories were obtained. To rate infants on
these categories, a three-point scale was designed for each category.

While the Sheldon study was riddled with methodological
flaws, the New York Longitudinal Study included four procedures
designed to ensure objectivity, validity, and reliability of results.
Reviewing these four procedures, and contrasting them with Sheldon's
errors, provides additional guidelines for constructing an adult
temperament scale.

First, to avoid "halo" effects, Thomas et al. made sure that
different observers handled the phases of data collection for any
particular infant. It is the charge of distortions due to halo
effects that is at the heart of various critiques and studies which
have largely discredited Sheldon. The relation between body type
and temperament that Sheldon claimed to discover was actually an
artifact of having the same rater, usually Sheldon, rate subjects
on both body type and temperament. Even if the raters did not
deliberately try to match temperaments to body types, they may have
been influenced by their knowledge of the body type categories;
indeed Sheldon specifically says that a well-trained temperament
observer "should have the advantage of the constitutional analysis"
(p. 56). When other studies attempted to duplicate Sheldon's work
while having different people rate a subject on the two scales, the
correlations largely disappeared (Cortes & Gatti, 1965; Walker, 1962).

The second methodological safeguard of Thomas et al. was
periodic checks during the course of the study on inter- and intra-
observer and interviewer reliabilities. The only reliability study
which Sheldon reports being conducted during the course of the study
was a re-rating of 83 cases 1 year after the initial rating; though
the test-retest correlation was +.96, all the ratings were done by
Sheldon, suggesting confounding effects due to both the "halo"
effect and memory effects. Another reliability study of the instru-
ment was conducted by having a class of graduate students rate one
another on the three components of temperament; reliabilities ranged
from .17 to .94. Sheldon notes that the four best raters had a
mean correlation of +.90 with Sheldon and +.86 with one
another. Of course these were not the raters used in his study;
had the four worst raters in his graduate class been the raters in
his study, the mean reliability would have been about .27.

The third safeguard used by Thomas et al. was a careful
attempt to record both a child's first response to a stimulus and
all subsequent exposures until a stable pattern was clear. In other
words, developmental patterns were carefully scrutinized. Sheldon
could not exactly repeat this safeguard, for adults, after all, have
usually already had some exposure to most ordinary daily stimuli and
their patterns of response are already set. However, Sheldon did
not even consider either heritability or developmental possibilities
from infancy when he picked his initial 50 traits for study. Of
course his final theory, relating body type to temperament, is
highly developmental. But he claimed to build the theory from his
observations, and not to make the observations to test his theory.
The difference is critical. His three temperaments, later labeled
Viscerotonia, Somatotonia, and Cerebrotonia, were each initially
comprised of a total grab-bag of unrelated traits. Some traits in
a cluster were physiologically based; others socially based. The
cluster later called Viscerotonia began with six traits: relaxed
posture, love of physical comfort, greed for affection, deep sleep,
and need of people when troubled. If we ignore the later morpho-
logical theory and consider only these traits, there is no reason
to assume the social behaviors stem from the physiological rather
than the reverse. Perhaps a learned response causes all of these;
for example, these traits might be a variant of what psychoanalytic
theory calls the "oral" personality. Or consider the cluster later
called Cerebrotonia, which initially stemmed from correlation among
the following traits: restrained movements, fast reactions, socio-
phobia, inhibited social address, vocal restraint, poor sleep habits,
youthful intent and manner, and need for solitude when in trouble.
Perhaps individuals with youthful intent and manner are more
likely to be physically immature and hence self-conscious; all the
other traits may reflect being nervous and self-conscious. The
point is that when a grab-bag of unconnected traits is correlated,
there is no way of knowing a chronological sequence of connection;
temperament--implying hereditary or constitutional origins--is only
one possible explanation among many possibilities. Finally, the
fact that the 3 temperament clusters each eventually grew to include
20 traits instead of the original 5 or 8 provides no additional
evidence for an underlying temperament; these additional traits
are an artifact of the methodology, being largely variants of
the original traits in the cluster. Thus, to the pleasure of digestion
is added a second trait called love of eating, and a third
variant called socialization of eating; to greed for affection there
is a second variant called orientation to people and a third called
need for people when troubled.

The first three safeguards in the Thomas et al. methodology
have some relevance to adult self-report scales. Self-report scales
of adult temperament should also be based on developmental categories,
as was argued earlier. Scales naturally should show respectable
reliability too. And while self-report scales do not have the prob-
lem of "halo" effects, they should be designed to minimize response
sets, such as social desirability (more on this below).

However, Thomas et al. developed a fourth safeguard and this
safeguard provides a particularly critical guideline in considering
adult temperament. Objectivity was sought by having interviews focus
on details of ordinary daily living, such as eating, sleep, play,
etc., with behaviors described in purely factual rather than judg-
mental terms whenever possible. The need for objective behavior
descriptions rather than judgments or attitudes when attempting to
judge temperament was already apparent to the earliest temperament
researchers. Kretschmer (1925), the first scientist to attempt a
temperament study, noted,

If we ask a peasant woman "Was your brother nervous and
peace-loving, energetic, etc.?" we shall often get a vague and
uncertain answer. If, on the other hand, we ask: "What did
he do when he was a child, if he had to go alone in the dark
hayloft?" or "How did he behave himself when there was a row
up at the pub on a Sunday evening?" then perhaps this same
woman will give us concise and unequivocal information, which
. . . bears the stamp of trustworthiness. . . . I have laid
particular stress on this point, that as much as possible
[questions] should be asked in this concrete manner and that
direct questions on characterology should only be used to fill
out the picture, to fill in the time, and to serve as control
questions scattered about among the concrete accounts (pp. 111-2).

Despite his weaknesses in so many other areas of methodology,
Sheldon also occasionally strikes this same note. In explaining the
trait called "Love of Privacy" he notes: ". . . Ignore any superficial
statement of verbal attitude. Most people say they like to
be alone. Study the individual's habits and his history" (p. 73).

This point is essential, as Thomas et al. (1968) also
noted:

The parent and teacher interviews focused on the details
of daily living during feeding, play, sleep, etc. Behavior
was described in factual descriptive terms with a concern not
only for what the child did but how he did it. Statements as
to the presumed meaning of the child's behavior were considered
unsatisfactory for primary data. When such interpretative
statements were made by a parent or teacher, the interviewer
pressed for an actual description. Thus, to a parental report
that "the baby hated his cereal," or that "he loved his bath,"
the question was always posed, "What did he do specifically
that made you think he loved or hated it?" Similarly, if a
teacher commented that "this child always gets angry if he
doesn't get his way," she was asked to give several examples
with detailed descriptions of the manner in which the anger
was expressed. If a staff observer reported that a child "was
afraid to ask the teacher for help," she was instructed to spell
out in detail the incidents she had observed and describe the
behavior she had interpreted as "fear" (p. 15).

There are two key points to this methodological safeguard.
First, the unit of analysis, namely the rating on an abstract concept,
which in this case is a temperament category, is different from the
unit of the primary data, which is a discrete objective behavior.
Second, the behaviors chosen are neutral descriptions of common,
everyday behaviors. Both points are also useful for an adult tem-
perament questionnaire.

The first advantage of separating the unit of analysis from
the unit of original data is that Thomas, Chess, and Birch were able
to avoid alternative meanings offered by similar evaluative words
describing behavior. While one mother might say "hated his bath"
to mean the child sulked and showed no pleasure, a second might say
"hated his bath" to mean the child actively struggled to get out of
the water. Though both mothers initially described their child in
similar ways, it would be meaningless to say the two children had
the same styles of response to new stimuli. The same problem exists
in querying adults: three adults who answer yes to "I often feel I
am bursting with energy," an item from the EASI scale, might mean
three very different things. For one it might mean very physical
activities such as jogging a few miles, while for the second it might
mean intellectual activity, such as working through a difficult mathe-
matical or crossword puzzle, and for a third a great deal of small-
muscle physical activity, such as carving a miniature in soap, where
little physical energy is expended.

The second advantage in separating the unit of analysis from
the unit of the data is that the analysis should be relatively free
of attitude bias. Attitude bias could be either a very idiosyncratic
attitude or a general "response set" bias. An example of an
idiosyncratic bias would be a mother who herself hated giving someone
else a bath. Without deliberately distorting her response, she
might focus on trivial variations in the child's behavior as proof
that the child also "hated his bath." However, when the behaviors
of the child are themselves described, there may be no evidence that
this child "hated his bath" more than other children whose mothers
described them as "liking" their bath.

The same distortion process might stem from a more general
"response set," such as social desirability. If it is commonly
thought that a child who is responsive when held is so because
he has been well cared for, and if mothers like to think that they
do a good job caring for their child--both attitudes which might
reasonably exist--then mothers who say "my child is responsive when
being held" may be influenced by a social desirability response set.
Even though they are not deliberately lying, these mothers might focus
on small behaviors which give credence to their views, behaviors
which are ignored as too trivial by mothers lacking this social
desirability response set for child responsiveness. The distortions
due to response sets or idiosyncratic value systems can be weeded
out by having the interviewer make sure the descriptions are broken
down into behaviors and then using the behaviors alone as the primary
data from which to extract judgments about responsiveness, reactions
to stimuli or whatever abstract temperament category is being con-
sidered.

In a temperament questionnaire, there is no interviewer
available to extract specific behaviors from either evaluative
statements or generalities. Instead the items themselves have to be
in the form of behaviors. The more general or abstract the behavior
description, the less likely a person is to know how to answer, and
hence the more likely he is to be influenced by attitudes or response
sets. It may be difficult to respond to a statement describing a
whole category of behaviors, such as this item from the EASI scale:
"I like to be busy all of the time." However, when the behavior is
very specific, people can probably decide whether in fact they do
the behavior, as in this item: "If I were going from the first
floor to the third floor in an office building, I would rather ride
an elevator than take the stairs."

The difficulty in responding to a general description of a
category of behaviors is further compounded when people are asked
to compare themselves to others. One person's judgment about where he
stands in relation to others might be quite different from another
person's judgments although both display the same behaviors. Again,
it should become easier to respond if the behaviors are specific
enough. While it might be difficult for someone to make a judgment
about this EASI item "I have fewer fears than most people my age,"
most people would know whether or not "I tend to be frightened
during loud thunderstorms."

Of course since we do not actually see the respondents per-
forming the behaviors which comprise the items of a questionnaire,
there remains the possibility that we are only tapping the respondent's
attitude toward the behaviors rather than an objective inventory
of his/her performance. Once again response sets might distort
the results. However, this can at least be partially compensated
for by choosing as behaviors neutral descriptions of common, every-
day behaviors. A yes-no response to everyday behavior should be
less likely to draw on a response set than a yes-no response to a
general description of a category of behaviors. If the items them-
selves are then carefully chosen so as not to be laden with emotional
or desirability meanings, then we at least reduce the probability
that we are tapping an attitude toward behaviors more than the beha-
viors themselves.

All of the above arguments about controlling for the effects
of attitude in response choices deal with cases where the respondent
is not deliberately altering his response. In other words, the
respondent is not deliberately lying. When the respondent is delib-
erately lying, then interviews provide no advantage over question-
naires. Both are at a disadvantage and the only way to discover if
a temperament is present is by actually observing an individual in
a host of different social and private situations--the arduous
methodology developed by Sheldon.

However, if the lying stems not from a general hostility
to the test-taking situation but rather from a dislike of the temperamental
characteristics being measured, then once again there may
be an advantage to a temperament test containing only behavior items.
The advantage, which needs empirical proof, would be that subjects
taking a test consisting only of specific behavioral items may be
less likely to know exactly what the test is attempting to measure
than would subjects taking a test such as the EASI, which asks general
behavioral questions. This argument is a variation on the social
desirability response set argument but now applied to people who
deliberately change their answers according to what they would like
to be rather than to people whose responses to items where they lack
definite answers are unconsciously affected by their attitudes.
Consider Sociability as a temperament: on the EASI Temperament
Survey the five items which measure this temperament are:

1. I make friends very quickly.
2. I am very sociable.
3. I tend to be shy.
4. I usually prefer to do things alone.
5. I have many friends.

When people read these items, they probably know that this is an
attempt to measure sociability (though once again this assertion
needs empirical verification). Imagine someone who is in fact not at
all sociable but hates himself for being a recluse; such a person
might deliberately distort his answers to the EASI scale. Now con-
sider the following items which are behaviorally more specific:

1. People approach me to get acquainted before I approach them.
2. I would rather see a ball game alone than stay home
   alone and watch the game on TV.
3. When I am happy I sometimes smile or say hello to people
   I hardly know.
4. I enjoy telling my friends about an interesting experience.

Someone might answer these questions without immediately knowing that
they are designed to tap sociability. If their purpose is less mani-
fest, they are probably more likely to be answered truthfully by our
recluse who hates himself for being asocial. Empirical support of
this argument would involve two findings: (1) subjects have an
easier time guessing the purpose of a test when the items describe
classes of behavior rather than specific behaviors; and (2) knowing
the purpose of a test alters one's responses.

To make this argument clearer, consider the following
analogy. There are many people with high mathematical aptitude who
hate mathematics; this can be attested to by any psychology professor
teaching an introductory graduate statistics course. If we asked
such people, "Do you like mathematics?" they would probably answer
"No," and this would be quite true, since it reflects their attitude
toward mathematics. If we asked them, "Are you mathematical?" or
"Are you mathematically inclined?" they might again answer "No";
in this second case, their answer would be a false answer. This
could be simply demonstrated: if we gave them a mathematical apti-
tude test, a test consisting of mathematical problems, they might
perform well above the mean, which is to say they would perform mathe-
matical behaviors quite well. The analogy is to people who claim
sociability when asked abstractly but would attest to no sociable
behaviors when given a list of specifics.

There remains one additional and central argument for creating
an adult temperament questionnaire consisting of specific everyday
behaviors. If adult temperament is a meaningful concept, it should
actually manifest itself in objective behaviors of ordinary daily
life; if it is not so manifested, then either the concept lacks the
broad personality disposition which we noted earlier was part of
Allport's definition, or else temperament accounts for but a trivial
percentage of the variance in behaviors. It would be like people
with high mathematical aptitude being no better at arithmetic or
graph reading than people with low aptitudes; in this case, "high
mathematical aptitude" becomes a meaningless concept. Similarly,
everyone could probably rate themselves somewhere along a five-point
Likert-type scale ranging from "very fearful" to "very calm."
However, if in an everyday fear-producing situation, such as an
extremely violent storm, 10 people who rate themselves "very fearful"
show no more signs of being afraid than 10 people who rate themselves
"being calm," then being fearful is either a useless concept
or not a temperament. Since the actual display of behaviors such as
fearfulness is the key to the validity of the concept, a question-
naire to measure temperament is best off starting with the objective,
specific behaviors rather than the abstract categories.

Such a position does not imply that all people who display
a particular temperament, such as fearfulness, will consistently
show the same fear responses when in the same situations. If there
are 10 behaviors which we expect fearful people to show, we might
expect any one person to show only four such behaviors, for example,
while a second person might show an entirely different set of four
fearful behaviors. With 20 fearful people, each person might show a
completely different set of four fearful behaviors. This would
lower the correlation between the 10 items. However, despite a low
correlation between any two items, the items remain useful if
20 "calm" people show few or none of the 10 "fearful" behaviors.

Based on the methodological issues raised by the empirical
studies of temperament reviewed above, the following guidelines
for an adult temperament questionnaire can be added to those cited
previously:

6. The unit of temperament analysis, namely a measure of the
presence or absence of stable behavior tendencies, should
differ from the units of the questionnaire data.

7. The best units for questionnaire data are neutral and
concrete descriptions of common, everyday behaviors.
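
Guidelines 6 and 7 can be illustrated with a minimal scoring sketch in
which the data are yes-no answers to specific behaviors and the unit of
analysis is a category total derived from them. The item wordings are
taken from the examples discussed above; the item identifiers, the
category each item is assigned to, and the keying direction are
assumptions made for illustration only and are not the scoring rules of
any published scale.

```python
# Each entry: item id -> (assumed category, item text, assumed key).
# A key of +1 means a "yes" counts toward the category; -1 means a "no" does.
ITEMS = {
    "thunder": (
        "fear",
        "I tend to be frightened during loud thunderstorms.",
        +1,
    ),
    "elevator": (
        "activity",
        "If I were going from the first floor to the third floor in an "
        "office building, I would rather ride an elevator than take the stairs.",
        -1,
    ),
    "hello": (
        "sociability",
        "When I am happy I sometimes smile or say hello to people I hardly know.",
        +1,
    ),
    "story": (
        "sociability",
        "I enjoy telling my friends about an interesting experience.",
        +1,
    ),
}


def score(responses):
    """Turn yes/no answers (True/False per item id) into category totals."""
    totals = {}
    for item_id, answered_yes in responses.items():
        category, _text, key = ITEMS[item_id]
        keyed = answered_yes if key > 0 else not answered_yes
        totals[category] = totals.get(category, 0) + int(keyed)
    return totals


# One hypothetical respondent.
example = {"thunder": True, "elevator": True, "hello": True, "story": False}
print(score(example))
```

The respondent never rates "fearfulness" or "sociability" directly; those
judgments exist only in the totals computed from the behavior items.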

EASI-III (1975): A Temperament Scale That Meets
the Theoretical Guidelines

The EASI-III (1975), developed by Buss and Plomin, uses
categories which meet the theoretical guidelines outlined above but
which fail to meet the empirical guidelines. The EASI-III attempts
to measure four characteristics of temperament: activity, socia-
bility, emotionality, and impulsivity. These four categories were
chosen because on theoretical grounds they meet the heritability cri-
teria and because some empirical evidence for their heritability
exists (Buss, 1973). On purely theoretical grounds these four basic
categories were each further subdivided as follows: activity into
tempo and vigor; emotionality into general emotionality, fear, and
anger; impulsivity into inhibitory control, decision time, sensation
seeking, and persistence; and sociability into general sociability
and affection.

The EASI-III has only been used once, this being a study in
which husbands and wives rated themselves and their spouses. A test-retest
reliability over 2 to 3 months was obtained for the self-report
of 32 women; the reliability ranged from .57 to .95 with an
average reliability of .79. A factor analysis of the four basic
categories demonstrated their complete orthogonality. Such complete
orthogonality is unusual in a personality questionnaire which was not
originally constructed around orthogonal factors, as was the 16 PF,
for example. The orthogonality of the four basic scales stems from
a high average correlation among the items of each scale; in part
this high average correlation of items in a given scale is due to
the use of a five-point Likert-type scale for rating each item, a
method which effectively raises the average correlation among items
(Nunnally, p. 534). This high average correlation may also reflect
the wording of the items; rather than being specific behaviors, the
items are general and abstract descriptions of behavior tendencies,
often being almost the same statement slightly reworded. Consider
the following sets of items:

I. General Sociability
1. My spouse makes friends very quickly.
2. My spouse is very sociable.
3. My spouse has many friends.
4. My spouse tends to be a loner.

II. General Emotionality
1. My spouse gets excited easily.
2. My spouse is somewhat emotional.
3. My spouse frequently gets upset.

With statements such as these, it is easy for subjects to uncover the
common core of meanings in the items, and hence be consistent.

The factor analysis of these subdivisions did not consistently
verify their theoretical independence. The tempo and vigor
components of the activity scale consistently loaded on to one factor.
Emotionality split into fear and anger components, with the third
theoretical category, general emotionality, splitting between these
two. Three of the four impulsivity components were not well-defined
factors, persistence being the exception. Although the two socia-
bility factors were independent, Buss and Plomin (1975) concluded
that the affection subdivision really measured need for affection
and hence did not belong in a temperament survey.

A validity measure of this instrument can be obtained by
comparing a subject's self-report score with the rating of this same
subject by the spouse. For example, if a husband rates his wife,
we can consider the husband to be nominating a subject at different
levels of the temperaments being measured; if the self-reports of
the wife match the husband's perceptions, then the self-report
instrument is a valid measure of temperament style as perceived by
others. Here the results were not as good: the rater agreement
between a spouse's self-report and the rating by the other spouse
ranged from .36 to .75, with an average of .51.

Plomin (1974) ran a further analysis comparing a subject's
self-report with that same subject's rating of his or her spouse;
for example, the wife's self-report was correlated with the wife's
rating of the husband. These comparisons ranged from .45 to -.24,
with an average correlation of almost zero, suggesting that a
subject's rating of a spouse was not merely a projection of the
subject's own personality.
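
The two comparisons just described differ only in which pair of score
lists is correlated. A minimal sketch follows, assuming Pearson
correlations (the coefficient is not named here) and using invented
ratings for five hypothetical couples on a single scale; the variable
names are likewise hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


# Invented scores for five hypothetical couples on one temperament scale.
wife_self = [1, 2, 3, 4, 5]           # wives' self-reports
husband_rates_wife = [2, 1, 4, 3, 5]  # husbands' ratings of their wives
wife_rates_husband = [3, 5, 1, 4, 2]  # wives' ratings of their husbands

# Rater agreement: same target (the wife), two different raters.
agreement = pearson(wife_self, husband_rates_wife)

# Projection check: same rater (the wife), two different targets.
projection = pearson(wife_self, wife_rates_husband)

print(round(agreement, 2), round(projection, 2))
```

A high first value with a near-zero second value is the pattern that
would support reading the spouse rating as more than a projection of
the rater's own self-description.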

However, it is still unclear why there is not a higher level
of agreement between spouse ratings and self-reports. Plomin suggests that
this lack of agreement is due to two factors, the first being that
each spouse lacked an absolute standard for their ratings, and the
second being method variance when a subject's self-report is cor-
related with another person's rating of that same subject.

There are two other possible contributing factors. First,
the EASI-III may only be a crude instrument when measuring tempera-
ment differences in the middle ranges of temperament; if this is
true, then a new validity study has to be run to demonstrate that
the EASI-III can at least differentiate between extreme temperament
groups. The second explanation is more central: the items of
EASI-III are very general and abstract descriptions of behavior
rather than descriptions of specific behaviors, and this can cause
all of the difficulties reviewed above in the discussion of Empirical
Guidelines for a temperament scale. If a social desirability
response set is more likely to be activated by general and abstract
descriptions of behavior than by neutral descriptions of specific
behaviors, and if such a response set is also more likely to be
activated during a self-description than during a description of
someone else, then the distortions in the EASI-III remain a function
of the instrument design even when a subject's rating of a second
person is not merely a projection of the first subject's own per-
sonality. Furthermore, the error factor which Plomin attributes to
the raters' lack of an absolute standard to measure temperament may
be exaggerated by the use of general rather than specific behavior
items.

In conclusion, although the Buss and Plomin EASI-III appears
to hold some promise as an instrument for measuring adult temperament,
it still needs to demonstrate its validity.

CHAPTER II

DEVELOPMENT OF THE TEMPERAMENT SCALE-ERMAN (TS-E)

The Temperament Scale-Erman (TS-E) was developed to meet the
seven guidelines outlined above. The next section includes a discussion
of how this scale was constructed.

Choosing the Categories

The first task in developing the TS-E was choosing categories
which met the theoretical guidelines. To meet the heritability
guidelines, longitudinal studies of adults were reviewed.
Bronson (1969, 1971), using the Berkeley longitudinal data, found two
relatively stable dimensions of behavior. The first, Expressive-Outgoing
versus Reserved-Withdrawn, and referred to as "emotional
expressiveness," is defined as (a) a continuum from ebullience to
depression and (b) differences in the extent to which interactions
with other people serve as a focus of involvement. The second dimen-
sion, Placid-Controlled vs. Reactive-Explosive and referred to as
"reactivity control," is defined as (a) differences in the readiness
to act or in prevailing tension and (b) tendency to contentiousness
vs. phlegmatic behavior.

The first dimension had a mean persistence over ages 5 to 16
of .73 for boys and .65 for girls. The second dimension had a mean
persistence over these ages of .55 for boys and .48 for girls.
These two dimensions accounted for over half of the variance in all
rated behaviors over this time span.

Since these two factors are orthogonal, they meet guideline
number five. These two factors are described as "Central orienta-
tions: characteristic sets of attitudes or response tendencies
which affect to a large degree the individual's interactions with
his environment." As such, these two dimensions meet guidelines
number three and four. The stability noted above meets guideline
number one. Since the research begins at age five, it is only
tenuously related to child development research. However, Bronson
notes that her categories correspond to categories used in the
investigations of Escalona and Heider (1959) and Thomas, Chess,
Birch, Hertzig, and Korn (1963).

A large component of Bronson's emotional expressiveness
dimension is sociability. While Bronson notes that Scarr (1969)
showed sociability has a large hereditary component, Bronson herself
nevertheless believes that learning is more parsimonious than genetic
mediation as an explanation for the stability of these two dimensions.
Bronson specifically proposes that either: (1) these orientations
receive constant reinforcement from the environment or (2) once
adopted by an individual these orientations just tend to persist
unless forced to alter by a major disruption of their adaptive value.

However, nothing in Bronson's work precludes a genetic com-
ponent, and since the temperament scale currently designed is only
meant to aid further research into whether such a genetic component
exists, these categories still remain useful.

The second major source for temperament categories was
Buss et al. (1973, 1975). Following an extended review of published
temperament studies, they derived four categories which meet the
theoretical guidelines for temperament. These four categories are
Emotionality--the level of arousal, which corresponds roughly to
intensity of reaction; Activity--the sheer amount of response output;
Sociability--the tendency to approach others; and Impulsivity--
quickness of response (Buss, 1973). Their twin study indicates these
four dimensions are independent and have a heritable component.
Buss's Emotionality and Sociability dimensions appear to correspond
to Bronson's reactivity control and emotional expressiveness dimen-
sions, respectively.

The work of Thomas et al. (1963, 1968) is the largest longi-
tudinal study specifically aimed at exploring temperament. They
used nine categories of temperament, which appear to meet the first
four theoretical guidelines. However, their published data only
cover the first 5 years of life, so conclusive evidence of stability
remains unavailable. Their nine categories also fail to meet guide-
line five, namely that the categories of temperament should be
independent of one another. The nine categories of temperament have
intercorrelations which range from -.49 to +.48; the Intensity cate-
gory has significant and positive correlations on 31 of the 40 cor-
relations computed. A factor analysis of the categories revealed
three major factors; no information is available on the variance
accounted for by these three factors. Factor A was primarily
comprised of the mood, intensity, approach/withdrawal, and adaptability
categories. Factor A is particularly important because children
high in Factor A appear to be at high risk for later behavior dis-
order. Factor B was primarily comprised of threshold, rhythmicity,
intensity, and adaptability. In Factor C, the largest components
were activity and intensity.

Although Factor C may correspond to Buss's Activity dimen-
sion, Thomas et al.'s first two factors do not appear to correspond
to the categories found by Bronson and by Buss et al. This may be
a coding artifact due to Thomas et al.'s failure to include a spe-
cifically Sociable component. The approach-withdrawal category is
defined as reaction to new stimulus, be it animate or inanimate.
Examples include, on the one hand, loving new toys and disliking
the first taste of orange juice, while on the other, crying when
strangers approach and enjoying a visit to the doctor. Mood is
defined as the amount of joyful and friendly behavior as opposed to
crying and unfriendly behavior. Again animate and inanimate examples
are mixed: smiling at strangers or hitting girls on the playground,
and fussing before going to sleep or crying when food the baby does
not like is presented. If a sociability component were added,
Factors A and B might correspond more closely to Bronson's reactivity
control and emotional expressiveness dimensions.

Scholom (1975) provides additional evidence that the nine
Thomas et al. categories can be reduced in number. He studied tem-
perament in parents and their children by using scales based on
Thomas et al.'s nine categories along with the Thorndike Dimensions
of Temperament as a second scale for adults. A factor analysis
uncovered three factors present in both adults and children; these
he labeled Mood, Energy, and Consistency. Energy and Consistency
appear consistent with Buss et al.'s Activity and Impulsivity dimen-
sions. Mood appears to be an amalgam of what Buss et al. call
Sociability and Emotionality.

For the sake of research consistency, the final categories
used for the Temperament Scale-Erman (TS-E) were four in number and
given the same names as the Buss et al. categories. They are defined
as Buss et al. define them, but also expanded in definition to
include concepts from Bronson and Thomas et al. Table 1 gives the
definitions of the four scales in the TS-E.

Buss et al. also postulated theoretical reasons for sub-
headings in each of the four categories. For reasons noted earlier
(see pp. 5-11), only the fear and anger components in the Emotionality
scale were retained. Actually the Bronson reactivity control dimen-
sion is closest to just the Buss et al. anger component; fear may
eventually be best considered a separate temperament. However, in
the current TS-E, fear was subsumed in the Emotionality scale.

Writing the Items

 

Once the four categories of temperament had been determined,
items were written for each of the scales. Some items were drawn
from existent scales. With the assistance of four undergraduate
psychology majors, every item in the Thorndike Dimensions of Tempera-
ment scale, the Thurstone Temperament Schedule, the Johnson Temperament
Analysis, and the EASI-III was reviewed to see if it appeared to be
related to one of our four categories of temperament.

Table 1: Definitions of the Four Categories of Temperament in the
Temperament Scale-Erman (TS-E)

 

Definition of the Impulsivity Scale

Impulsivity: Planful and persistent vs. impulsive and dis-
tractible. Impulsivity measures the quickness of our response. On
the planful end, it involves planning ahead what you do and then
completing it on time. On the far impulsive end, it involves doing
things on the spur of the moment, easily dropping one thing to move
on to other activities, never completing tasks.

Definition of the Emotionality Scale

Emotionality: Placidity vs. explosiveness. Emotionality
refers to differences in our readiness to act, in our prevailing
level of tension. We range from being placid or phlegmatic on the
one hand to being explosive or contentious on the other hand. These
differences may also be differences in our level of arousal, which
correspond roughly to the intensity of our reaction. On the explo-
sive end of the scale are the people who, in a given situation, are
most likely to express anger or fear.

Definition of the Sociability Scale

Sociability: Ebullient vs. reserved or depressed. This is
a measure of our tendency to approach others. It measures differ-
ences in the extent to which interaction with other people serves as
the focus of our involvement. This continuum ranges from ebullience
on the one end and goes toward being reserved on the other, ending
in depression or social withdrawal. These descriptive words are
used within the social context: someone who is friendly when meet-
ing others but depressed when alone would not be depressed on this
scale.

Definition of the Activity Scale

Activity: Active vs. lethargic. The activity scale measures the
sheer amount of response output. It refers to the level, tempo, and
frequency of our motor and
muscular activities. Some people are full of energy, on the go,
quick to get things done, ready at a moment's notice. Others are
slow, easily tired, less productive than others, and like to move
at a leisurely pace.

 


At the same time, original items were written at more than a
dozen small group meetings exclusively devoted to this endeavor.
Membership in these small groups changed constantly from meeting to
meeting, and included graduate and undergraduate students, both
psychology and non-psychology majors, as well as professors1 and
spouses of graduate students and professors. Participants in a
given session were read one of the definitions of temperament and
then asked to spend a few minutes thinking of people they knew who
fit at one of the far ends of the trait being defined. Participants
were then asked to think of specific behaviors which made them
place these people at the ends of the continuum. These sessions
proved the most productive source of items.

A final source of items was Buss and Plomin's A Temperament
Theory of Personality (1975). After defining a given temperament,
Buss and Plomin provide a description of a hypothetical man who
falls at an extreme end of the temperament continuum. These des-
criptions often consisted of concrete behaviors which could be
transformed into TS-E items.

These three sources led to a pool of several hundred
items. Any item which was not behaviorally specific was discarded;
most items drawn from existent scales failed this criterion and
were therefore discarded. In addition, any items which were spe-
cific to the activities of only one sex as well as items that appeared

to contain a social desirability element were discarded. All forced-choice
items were carefully reviewed to make sure each of the two choices
was equally desirable or undesirable. Decisions on sex role beha-
vior, social desirability, behavioral specificity, and relevance to
the appropriate scale were made by my four undergraduate assistants
and myself; any item not receiving unanimous consent was discarded.

1 Professor Charles Hanley independently supplied some very
useful items.

From the pool of items still remaining, the 20 items that
intuitively best matched a given scale were chosen. These 20 items
were then rewritten so that 10 items had to be answered true and
10 items answered false to achieve the maximum score in a given
direction. Due to a coding error, the Emotionality scale was not
quite equally divided; a maximum high score needed 11 true and 9
false answers. Finally, tables of random numbers were consulted to
arrange the order in which the items appeared in the questionnaire.
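
The balancing and ordering steps just described can be sketched in a few lines of Python. This is an illustration only, not the procedure actually used: the study consulted printed tables of random numbers, whereas the sketch substitutes a seeded pseudo-random shuffle. The names `key_balance` and `build_questionnaire` and the placeholder item texts are hypothetical.

```python
import random

def key_balance(pool):
    """Return (n_true_keyed, n_false_keyed) for one scale's item pool.

    pool: list of (item_text, keyed_true) pairs, where keyed_true is True
    when answering "true" scores in the high direction of the scale.
    """
    n_true = sum(1 for _, keyed_true in pool if keyed_true)
    return n_true, len(pool) - n_true

def build_questionnaire(scales, seed=0):
    """Pool the items of all scales and arrange them in random order.

    scales: dict mapping scale name -> item pool (see key_balance).
    A seeded shuffle stands in for the printed random-number tables.
    """
    items = [(name, text, keyed_true)
             for name, pool in scales.items()
             for text, keyed_true in pool]
    random.Random(seed).shuffle(items)
    return items

# Placeholder pools: 20 items per scale, 10 keyed true and 10 keyed false.
scales = {
    name: [(f"{name} item {i + 1}", i < 10) for i in range(20)]
    for name in ("Activity", "Impulsivity", "Sociability", "Emotionality")
}
questionnaire = build_questionnaire(scales)
```

Applied to the Emotionality scale as it was actually printed, a check like `key_balance` would report 11 true-keyed and 9 false-keyed items rather than the intended 10 and 10.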

Format of the Questionnaire

 

The basic decision in choosing the format for a personality
questionnaire is whether response choice will be limited to just
two possibilities, usually true-false or yes-no. The most common
additional choice is a neutral response. Adding a neutral response
supposedly reduces examinee resentment that arises when responding
in only one direction or the other to a given item seems to present
an impossible choice. Among the personality tests reviewed earlier,
the §£I_is the only test that has forced-choice items; the Guilford-
Zimmerman, the 16 PF, the Thurstone Temperament Schedule, and the
Johnson Temperament Analysis all provide a middle response choice,
while the EASI-III attaches a Likert-type scale to each item that
permits five response choices.

Thorndike also assumed that extended response freedom was
needed to maintain examinee acceptance of the test, but he developed
an innovative alternative in the Dimensions of Temperament scale.
His scale consists of 15 sets of items, with each set containing
10 items. The items of a given set contain one item which relates
to each of the 10 dimensions of temperament which the test measures;
hence the 10 items per set. In a given set of items, examinees
pick the four items which are most like them, and the four items
which are most unlike them; two items are omitted. Though an ingen-
ious format, the possibility remains that one or two scales may get
significantly fewer responses from a given subject. Subject scores
on these two scales would then be an inaccurate measure of temperament.
Although the problem of examinee resentment to forced-choice
items is a sound reason for including at least a third, neutral
response possibility, responses were limited to true or false. The
decision was based on the following considerations. First, neutral
responses are unscored, and since the original questionnaire is
shorter than most traditional personality scales, if large numbers
of items were unscored, too few items might be answered to assign a
meaningful temperament score to a subject. Second, the scales were still
in the developmental stages: if a number of items were consistently
passed over even though the instructions included no such option,
those items could always be dropped. Third, a neutral response
category could be added at a later time. The fourth reason was that
the items were carefully constructed to be neutral descriptions of
behavior and to be free of social desirability. Care was given to
balance the choices in a given item. Finally, attaching a Likert-type
scale to each item was rejected because such an approach would have made
the scale unnecessarily cumbersome, in effect adding a second scale
on top of the first. Since the TS-E was closer to a personality
inventory than to an attitude scale, high inter-item correlations
on a given scale were not an issue of concern.

There nevertheless remained the problem of the impossible
choice--items describing behaviors in which the examinee never
indulged. To take a rather blatant example, consider TS-E item #12:

If I am following a recipe, I sometimes have to interrupt
my cooking because I discover I am out of an ingredient.

 

There are surely many men and women who have never followed a recipe.
To circumvent this problem, two steps were taken. First, the instruc-
tions were worded so that items were answered "true" if they were
true or mostly true and likewise "false" if false or mostly false.
In this the instructions were modeled after the instructions for
the MMPI. These instructions provide examinees with greater lati-
tude. The second step was breaking up many items into two clauses,
a dependent clause beginning "If..." followed by the independent
clause. The dependent clause is underlined and subjects were
instructed to assume that this part of the item was actually true
whether or not it actually was true. Having assumed the underlined
clause true, they were then free to answer the independent clause
true or false. If they never actually performed the underlined
behavior, they were asked to imagine that they were performing that
behavior and then to pick among the choices of the independent
clause, perhaps by thinking of a behavior analogous to the under-
lined clause. In the example given above, subjects who had followed
recipes would answer based on actual experiences. Subjects who had
no such experiences would answer the statement by drawing on an
analogous experience--perhaps following a manual to repair a car
or instructions to construct a toy. The actual instructions for the
TS-E were worded as follows:

Please answer the following questions true or false. Answer
every question; answer true if the statement is true or
mostly true for you. Answer false if false or mostly false
for you.

Some of the statements will ask you to pretend that you are
in a situation. These statements begin:

"If . . . ."
An example is the following:

"If I were to buy a car, I would buy a big car rather
than a small car."

For these statements, try to pretend you are doing what is
described in the underlined part of the sentence, then answer
the rest of sentence true or false, based on how you typically
act or would expect to act. In the above example, answer the
statement even if you have never bought a car and do not plan
to buy a car; pretend you are buying a car and think how you
would act in such a situation.

 

These instructions remove the obstacle of impossible forced
choices while at the same time preserving a zero-one response for any
item, excluding those that a respondent chooses not to answer at all.

Such instructions have a potential drawback. Since respondents
were asked to imagine how they would act for items in which
they have never performed the action described in the dependent
"If . . ." clause, some capacity for imaginative self-projection is
needed for accurate answers. The relation of responses to some
capacity for imaginative thought would be a problem if this imaginative
dimension accounted for a substantial portion of the response
variance. The scales would then be unable to pass any validation
study based on the nomination method.

CHAPTER III

A MULTI-TRAIT MULTI-METHOD STUDY OF TEMPERAMENT

A multi-trait multi-method study of temperament was conducted
using the Temperament Scale-Erman; however, the entire multi-trait
multi-method matrix was not generated because it would have placed
excessive time demands on subjects. Multi-trait refers to the four
types of temperament that the TS-E is supposed to measure. Two
methods comprise the multi-method criterion. The first method is
the paper and pencil TS-E questionnaire. If the four scales showed
little or no correlation with one another, this would indicate that
the four scales are measuring different features of personality;
this would be a form of discriminant validity. This study also pro-
vides a measure of reliability of the TS-E; hence it is called the
Reliability Study. The second method was a peer-nomination method
using high and low nominated subjects for each scale; if the TS-E
and the EASI-III are able to match the nominators in distinguishing
high nominees from the lows, this would provide convergent validity
for the existence of the four temperaments. This study simultan-
eously tested the criterion validity of the EASI-III and the TS-E;
hence this second study is called the Validity Study.


The Reliability Study

 

Method

The reliability of the 80-item TS-E was examined by having
71 men and 111 women complete the test. All subjects were students
in an undergraduate Abnormal Psychology class at Michigan State
University. The test was completed during class time. Students
were free to leave the class early and not take the test, but were
urged to stay in the interest of research. All students were assured
anonymity and no identifying information was included on the com-
pleted tests; however, as questionnaires were turned in, they were
sorted according to sex of subject. At the time they took the test,
students were not told what the test was attempting to measure but
3 weeks later, when results were compiled, the test was explained
to them. Students were told they would not be given their individual
test scores since the TS-E was still in the experimental stage and
individual scores would be meaningless.

Results

The four scales were analyzed for inter-scale correlations.
Men and women first were analyzed separately and then the data were
pooled. The results appear in Table 2. All correlations were very
low, indicating that each scale was measuring an independent per-
sonality characteristic. In only one case did a correlation exceed
.15 in absolute value: for male subjects, Activity correlated with
Emotionality at -.34.

Table 2: Reliability Study: Inter-Scale Correlations for Male,
Female, and Pooled Subjects on TS-E

Scale           Activity   Impulsivity   Sociability   Emotionality

I. Men: N = 71
Activity           -          -.06          -.13           -.34
Impulsivity                    -            -.02            .05
Sociability                                  -             -.04
Emotionality                                                 -

II. Women: N = 111
Activity           -           .07           .06            .00
Impulsivity                    -             .06            .09
Sociability                                  -              .14
Emotionality                                                 -

III. Men & Women: N = 182
Activity           -           .02          -.02           -.13
Impulsivity                    -             .02            .07
Sociability                                  -              .08
Emotionality                                                 -

 

Means, standard deviations, and internal reliability as
measured by coefficient alpha were also obtained for each scale;
Table 3 shows the results for men, women, and pooled subjects. In
all cases the internal reliability was between .42 and .60. These
modest internal reliabilities reflect the low correlations between
items of a given scale; such low correlations would be expected,
given the true-false (0, 1) nature of the scale.
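
To make the link between modest alphas and low inter-item correlations concrete, the sketch below computes coefficient alpha from a matrix of 0/1 item scores and uses the standard formula for standardized alpha, k*r / (1 + (k - 1)*r), to back out the mean inter-item correlation r implied by a given alpha. The function names are hypothetical and the toy data are invented for illustration; nothing here reproduces the original computations.

```python
from statistics import pvariance

def cronbach_alpha(responses):
    """Coefficient alpha for a list of subjects, each a list of 0/1 item scores."""
    k = len(responses[0])
    # Variance of each item across subjects.
    item_vars = [pvariance([row[j] for row in responses]) for j in range(k)]
    # Variance of the total scale score across subjects.
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

def standardized_alpha(k, mean_r):
    """Standardized alpha for k items with mean inter-item correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)

def mean_r_for_alpha(k, alpha):
    """Invert standardized_alpha: the mean inter-item correlation implied by alpha."""
    return alpha / (k - alpha * (k - 1))

# Invented toy data: two items that always agree give alpha = 1.
toy = [[1, 1], [0, 0], [1, 1], [0, 0]]
toy_alpha = cronbach_alpha(toy)

# Mean inter-item correlation implied by alpha = .55 on a 20-item scale.
implied_r = mean_r_for_alpha(20, 0.55)
```

Under these assumptions, a 20-item scale with a standardized alpha of .55 implies a mean inter-item correlation of only about .06, which is the sense in which the alphas in Table 3 reflect low correlations between items.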


Table 3: Reliability Study: Means, Standard Deviations, and Internal
Reliability for TS-E

                                             Standardized
Scale            Mean      S.D.     Alpha     Item Alpha

I. Males: N = 71
Activity        10.90      2.70      .43         .44
Impulsivity      9.74      3.10      .59         .60
Sociability     10.81      3.02      .63         .59
Emotionality     7.09      2.87      .57         .51

II. Females: N = 111
Activity        10.82      2.75      .47         .46
Impulsivity      9.44      2.98      .53         .53
Sociability     11.68      2.73      .56         .55
Emotionality     7.68      2.75      .44         .45

III. Males & Females: N = 182
Activity        10.85      2.72      .44         .43
Impulsivity      9.56      3.02      .55         .55
Sociability     11.35      2.87      .60         .57
Emotionality     7.46      2.80      .48         .48

Maximum score = 20.

The Validity Study

 

Method

The four scales of the TS-E were each expanded by five
items, creating a TS-E of 100 items. The expanded four scales were
then validated by a peer nomination method. The Buss EASI-III
scales were validated at the same time. For clarity, students who
did the nominating are referred to as the "seekers"; the people
who were nominated and filled out the scales are referred to as the
"subjects"; the complete EASI-III and TS-E are each "tests"; and
each test is composed of four "scales."

Seekers were given one of the four scale definitions that
appear in Table 1 and asked to think of two people whom they know,
one of whom might fit on the very high end of the scale, and the
other on the very low end of the scale. A given seeker thus only
aided in the validation of one of the four scales. For example, a
seeker would be given the definition of Sociability that appears in
Table 1 and would try to think of someone "high" on Sociability and
someone "low" on this dimension. If scores on the scale discrimi-
nated between subjects in the direction perceived by the seekers,
then the scale would be validated.

To ensure that subjects were being nominated to meet the
scale definitions and not just to match the relevant behaviors des-
cribed by items on the Temperament Scale-Erman, seekers were not
initially shown the TS-E. The seekers were reapproached a few days
after they received the scale definition. If the seeker could think
of two appropriate subjects, the seeker was then given two copies of
the entire TS-E and two copies of the Buss EASI-III scale that
matched the scale definition used for the nomination. The seeker
gave these instruments to the subjects to complete. Thus, although a
subject nominated for high sociability would only aid in the criterion
validation of the Sociability scale of the TS-E, he/she would fill
out the entire TS-E; however, he/she would only fill out the Socia-
bility scale of the EASI-III. The entire EASI-III was not used so
that time demands did not over-burden subjects.

Some seekers could only think of one appropriate subject,
either "high" or "low" on a scale. In such cases, seekers were
asked to give the questionnaire to the one nominated subject, as
well as to any other person they randomly chose, categorized as
"other."

Seekers were asked to identify the initials and the sex of
the subject they had nominated; the subject's name was not recorded,
so confidentiality of results could be assured. All questionnaires
were coded so the scale being validated by a subject and the sub-
ject's nominated position on that scale, "high," "low," or "other,"
could be identified. Seekers were told always to give questionnaires
to "low" subjects. To make sure that the subject nominated had
filled out the appropriate questionnaire, as seekers returned ques-
tionnaires they were always asked the initials of the subject who
had completed the questionnaire.

The seekers who aided in the validation of the Sociability,
Emotionality, and Impulsivity scales were all advanced undergraduate
psychology majors. They received no course credit for nominating
subjects. Seekers were explicitly asked not to participate rather
than give the questionnaire to subjects whom the seekers did not
think appropriate to the ends of the defined scale categories.

The numbers of seekers originally used for these three scales were:
12 for Sociability, 13 for Impulsivity, and 9 for Emotionality.
One seeker for the Emotionality scale and one for the Impulsivity
scale only returned data from a single subject and therefore had to
be dropped from the analysis. In addition, one subject from each of
the Sociability and Emotionality pools failed to complete the approp-
riate EASI-III scale, and had to be dropped from the analysis comparing
the two scales. Thus in the analysis, the pairs of subjects
used were as follows: 11 for Sociability, 12 for Impulsivity, and
7 for Emotionality. Finally, six seekers found "other" subjects
instead of a "high" or a "low" subject; these "other" subjects
replaced two "low" and one "high" Sociable subject and two "low"
and one "high" Impulsive subject.

Since the definition of activity is easier to grasp, stu-
dents in an introductory psychology course were used as seekers for
the Activity scale. Students received extra credit for participat-
ing in the study as seekers. These students were told they would
be allowed to participate in the study for full credit even if they
could not think of people who could be placed on either the high
or the low ends of the activity scales; in such a case, they could
give the questionnaire to any two people, defined as "others."
However, the need to clearly identify subjects as either "high"
active, "low" active, or "other" was carefully stressed. Twenty-
eight seekers were originally used to aid in the validation of the
Activity scale; three seekers for the Activity scale returned data
from only a single subject and therefore had to be dropped from the
analysis.

The EASI-III was scored by assigning a score of zero if the
Likert-type scale for an item was marked at the extreme "low"
direction of the scale and four if at the extreme "high" end.
Intermediate positions, moving from low to high, were scored one,
two, and three, respectively. The TS-E was scored by assigning a
score of one to any item answered in the "high" direction and
zero to any in the "low" direction. To make scores from the scales
on each test comparable, all scores were converted to percentage
scores based on the maximum possible raw score for a given scale.
Two subjects did not complete every item on the TS-E; for these
subjects, the maximum score consisted of the total completed items
for a given scale.
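
The scoring rules above can be written out as a short Python sketch. The function names and the example answers are hypothetical; the sketch assumes that an unanswered TS-E item is recorded as `None` and is simply left out of both the raw score and the maximum, as described for the two subjects with incomplete tests.

```python
def tse_percent(answers, key):
    """Percentage score on one TS-E scale.

    answers: list of True/False responses, with None for an unanswered item.
    key:     list of True/False giving the answer that scores in the
             "high" direction for each item.
    """
    scored = [answer == keyed for answer, keyed in zip(answers, key)
              if answer is not None]
    if not scored:
        raise ValueError("no items answered on this scale")
    # One point per item answered in the high direction; the maximum is
    # the number of items the subject actually completed.
    return 100.0 * sum(scored) / len(scored)

def easi_percent(ratings):
    """Percentage score on one EASI-III scale.

    ratings: list of Likert positions coded 0 (extreme low) to 4 (extreme high).
    """
    return 100.0 * sum(ratings) / (4 * len(ratings))

# Invented examples, not study data.
tse_example = tse_percent([True, False, None, True], [True, True, True, False])
easi_example = easi_percent([4, 0, 2, 2, 2])
```

In the first example only one of the three answered items is in the high direction, so the score is one third of the maximum; the second example sums to 10 of a possible 20 points.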

The analysis was planned to answer a number of questions.
The first question was:

1. Can a scale from a single test differentiate the high
nominated subjects from the low nominated subjects?

To answer this question, separate matched t-tests were conducted
for each scale of each test. The matched t-test for a given scale
used only the pair of subjects nominated by the seeker for that
scale. For the TS-E scales, the only subject scores considered
were those scores on the scale for which the subject had been nomi-
nated. Thus for a pair of subjects nominated as high and low on the
activity dimension, their TS-E scale scores for Emotionality,
Impulsivity, and Sociability were ignored.
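
As a minimal sketch of the matched t-test described above, the following Python computes the t statistic on the within-pair differences between each seeker's high and low nominee. The function name and the scores are hypothetical, and the sketch stops at the t statistic and its degrees of freedom; significance would be judged by comparing t with a tabled critical value.

```python
import math
from statistics import mean, stdev

def matched_t(high_scores, low_scores):
    """Matched-pairs t statistic for high versus low nominees.

    high_scores[i] and low_scores[i] are the percentage scores of the two
    subjects nominated by seeker i. Returns (t, degrees_of_freedom).
    """
    if len(high_scores) != len(low_scores):
        raise ValueError("each seeker must contribute one high and one low score")
    diffs = [h - l for h, l in zip(high_scores, low_scores)]
    n = len(diffs)
    # Mean difference divided by its standard error.
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Invented percentage scores for three seekers' pairs.
t_value, df = matched_t([60, 70, 55], [40, 50, 45])
```

A positive t indicates that the high nominees outscored their matched low nominees, which is the direction required for the scale to be validated.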

The next set of questions was the following:

2. Do the two tests together distinguish high nominated
subjects from low subjects?

3. Is one test better than the other in distinguishing high
nominated subjects from low subjects?

4. Do subjects tend to answer more items in a positive
direction on one test than on the other?


Question 3 is a critical question for deciding the relative
merits of an EASI-III scale versus a TS-E scale. To answer ques-
tions 2-4, a within subject two-way analysis of variance was con-
ducted; it was based on two levels of subjects (high, low) and two
tests (EASI-III, TS-E) for each seeker. The main effect of subject,
or Subject Differentiation, provided the answer to question 2.
The main effect of tests, or Test Strength, provided the answer to
question 4. Finally, the interaction of subject and tests, or Test
Differences, provided the answer to question 3. Again, only matched
pairs were used.
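
The three effects can be illustrated with a short Python sketch. It assumes the standard repeated-measures model in which each effect is tested against its own interaction with seekers; in a 2 x 2 design where both factors vary within seeker, each F ratio then has 1 and n - 1 degrees of freedom and equals the square of a one-sample t computed on a per-seeker contrast. The function names and data are hypothetical, and this is not claimed to be the program used for the original analysis.

```python
import math
from statistics import mean, stdev

def contrast_f(values):
    """F ratio (1, n - 1 df) testing whether a per-seeker contrast has mean zero."""
    n = len(values)
    t = mean(values) / (stdev(values) / math.sqrt(n))
    return t * t, (1, n - 1)

def two_by_two_within(rows):
    """Two-way within-seeker ANOVA via contrasts.

    rows: one tuple per seeker,
          (high_easi, high_tse, low_easi, low_tse), in percentage scores.
    """
    # Subject Differentiation: high nominee versus low nominee, both tests pooled.
    subject = [(he + ht) - (le + lt) for he, ht, le, lt in rows]
    # Test Strength: TS-E versus EASI-III, both nominees pooled.
    test = [(ht + lt) - (he + le) for he, ht, le, lt in rows]
    # Test Differences: high-low gap on the TS-E minus the gap on the EASI-III.
    interaction = [(ht - lt) - (he - le) for he, ht, le, lt in rows]
    return {
        "subject_differentiation": contrast_f(subject),
        "test_strength": contrast_f(test),
        "test_differences": contrast_f(interaction),
    }

# Invented data for three seekers, chosen so the arithmetic is easy to follow.
anova = two_by_two_within([(0, 1, 0, 0), (0, 2, 0, 0), (0, 3, 0, 0)])
```

Writing the interaction as a difference of high-low gaps makes its reading explicit: a significant Test Differences effect means one test separates the nominees more widely than the other.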

Results

The means and standard deviations of each scale in the
validity study are presented in Table 4. For the purposes of com-
parison, Table 4 also includes the results of the reliability study
converted to percentage maximum scores. In every scale for each
test, the mean of high subjects is above the mean for low subjects.
For the TS-E, the reliability data provide some tentative norms of a
scale in a random population: Table 4 reveals that the high mean
was always above the reliability study mean, while the low mean was
always below the reliability study mean. However, for individual
subject pairs in the validity study, some high-low reversals occurred
for every scale except the TS-E Sociability Scale.2 To statistically
determine how well each scale worked, the statistical analysis out-
lined earlier is reported in Tables 5 and 6.

2 When only high-low pairs, rather than high-other or low-
other pairs are considered, then the TS-E Impulsivity scale also
shows no crossovers. Furthermore, when high-low pairs alone are
considered, the TS-E Impulsivity and Sociability scales had a domain
of high subject scores that never intersected the domain of low sub-
ject scores. That is, the high subject who scored lowest still scored
above all low subjects. This was not true for the other two TS-E
scales or for any of the EASI-III scales.

Table 4: Means and Standard Deviations of Reliability and Validity Studies [a]

                                     Validity Study                  Reliability Study
                                     TS-E Scale:     EASI-III:       TS-E (N = 182):
Scale          Subjects          N   Mean (S.D.)     Mean (S.D.)     Mean

Impulsivity    High Imp. [b]    12   60.58 (20.60)   56.25 (18.64)
               Low Imp.         13   36.00 (15.75)   41.77 (14.82)   47.78
               Remainder [c]    93   41.85 (16.94)   50.90 (17.87)
               Total           118

Sociability    High Soc.        12   62.00 ( 8.94)   60.00 (10.55)
               Low Soc.         12   31.33 (17.88)   38.33 (16.14)   56.75
               Remainder        94   52.83 (17.61)   50.77 (18.04)
               Total           118

Activity       High Act.        27   62.81 (12.64)   64.11 (15.41)
               Low Act.         26   46.77 (13.85)   39.73 (11.67)   54.25
               Remainder        65   58.89 (15.01)   49.05 (17.45)
               Total           118

Emotionality   High Emot.        9   49.78 (15.76)   57.44 (17.62)
               Low Emot.         7   29.14 (11.25)   39.00 (12.65)   37.28
               Remainder       102   39.08 (15.31)   50.61 (17.92)
               Total           118

[a] To make the reliability and validity studies comparable, all means
and standard deviations are based on percentage maximum scores. Thus in the
validity study, raw scores were first divided by the 25 items per scale; for
the reliability study, the results in Table 3: III have been here divided by
20. Scores are printed above as percentage scores.

[b] "High" and "low" include the six subjects defined as "others," who
replaced two low and one high Sociability subjects and two low and one high
Impulsivity subjects.

[c] "Remainder" consists of all subjects who participated in the val-
idity study excluding those nominated for the scale listed in column 1.
Therefore "remainder" is never a random population.

Matched t-test for individual scales. Table 5 presents the results
of the separate matched t-tests for each scale of each test in the
validity study. This answers Question 1. In the Temperament
Scale-Erman, all four scales were able to differentiate significantly
(p < .05) the nominated high subjects from the nominated low subjects.
Three scales of the EASI-III test were able to differentiate high
subjects from low subjects. These were the Impulsivity, Activity, and
Sociability scales.
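
A matched t-test reduces each seeker's high/low pair to a single
difference score and tests whether the mean difference departs from
zero. The following is a minimal sketch of that computation, not the
analysis program used in this study. The scores in `high` and `low`
are hypothetical illustrations (they are not data from the validity
study), and the function name `matched_t` is chosen here only for the
example.

```python
import math
from statistics import mean, stdev


def matched_t(high_scores, low_scores):
    """Return (t, degrees of freedom) for a matched-pairs t-test.

    Position i in each list is one matched pair: the high and the low
    subject nominated by the same seeker.
    """
    if len(high_scores) != len(low_scores):
        raise ValueError("each high subject needs a matched low subject")
    diffs = [h - l for h, l in zip(high_scores, low_scores)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    # Standard error of the mean difference (sample SD of the differences).
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1


# Hypothetical percentage scores for five matched pairs (not thesis data).
high = [60, 72, 48, 64, 56]
low = [40, 44, 36, 52, 32]
t, df = matched_t(high, low)
print(f"t({df}) = {t:.2f}")
```

The resulting t would then be compared with the critical value for
n - 1 degrees of freedom at the chosen significance level.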

Table 5: Results of Matched T-Tests for Individual Scales, Validity Study

[Table values are illegible in the source copy.]

 

ANOVA results. Table 6 presents the results of the two-way
analysis of variance for each scale in the validity study. For
every scale, the two tests together were able to distinguish
significantly (p < .05) high nominated subjects from low subjects (see
Table 6, column 1, Subject Differentiation); so Question 4 can be
answered affirmatively for each scale. Questions 2 and 3 for each
scale are discussed together below (see Table 6, columns 2 and 3,
respectively).

Results for the Impulsivity scale showed no differences
between the TS-E and the EASI-III on the level of scores obtained by
nominated impulsive/planful subjects (Question 2). However, the
Impulsivity scale of the TS-E is significantly better (p < .05) than
its counterpart on the EASI-III in differentiating high impulsive
subjects from low subjects (Question 3).

On the Sociability scale, a strong trend (p < .06) indicates
that nominated sociable subjects receive a higher score on the TS-E
than on the EASI-III. Comparing the two tests' abilities to
differentiate high sociable subjects from low sociable subjects reveals
that neither test is significantly better. There is a very weak
trend (p < .18) indicating the superiority of the TS-E over the
EASI-III.

On the Activity scales, nominated active subjects tend to
score higher on the Temperament Scale-Erman than on the EASI-III;
but this trend in Test Strength differences was very weak (p < .12).
More importantly, a strong trend (p < .07) indicates the EASI-III
is superior to the TS-E in differentiating high active subjects
from low.

Finally, on the Emotionality scale, nominated Emotional subjects
scored an average of 14 points higher on the EASI-III than on the TS-E
(p < .05). However, the two tests were identical in their ability to
differentiate high Emotional subjects from low Emotional subjects.

In a separate analysis, the Fear subscale was dropped from the TS-E
Emotionality scale in order to compare a pure Anger TS-E with the
EASI-III Emotionality scale. There were no significant differences in
regard to question 2 or question 3.

Table 6: Results of Two-Way Analysis of Variance, Validity Study

   Test Strengths(a)       Subject Differentiation(b)   Test Differences(c)
   Grand Mean   F          Grand Mean   F               Grand Mean   F
   (Standard    (p <)      (Standard    (p <)           (Standard    (p <)
   Error)                  Error)                       Error)

I. Impulsivity Scale (N=12 pairs)

    -2.47       F=.47       -29.46      F=8.56           -7.42       F=6.00
   ( 3.62)      (p < .51)   (10.07)     (p < .01)        (3.03)      (p < .03)

II. Sociability Scale (N=11 pairs)

    -5.53       F=4.42      -38.05      F=19.52          -7.84       F=2.11
   ( 2.63)      (p < .06)   ( 8.61)     (p < .001)       (5.40)      (p < .18)

III. Activity Scale (N=25 pairs)

     4.89       F=2.63      -24.58      F=35.14           5.12       F=3.65
   ( 3.02)      (p < .12)   ( 4.15)     (p < .0001)      (2.68)      (p < .07)

IV. Emotionality Scale (N=7 pairs)

   -14.04       F=9.56      -23.58      F=6.15             .10       F=.0002
   ( 4.54)      (p < .08)   ( 9.49)     (p < .05)        (8.12)      (p < .97)

V. Anger Subscale (N=7 pairs)

     4.75       F=.98       -28.99      F=9.03            5.35       F=.34
   ( 4.80)      (p < .36)   ( 9.65)     (p < .02)        (9.27)      (p < .58)

(a) Main effect of tests, answering question 4: Does one test
show higher scores? + Grand Mean = higher TS-E score; - Grand Mean =
higher EASI-III score.

(b) Main effect of subjects, answering question 2: Can the two
tests together distinguish high nominated subjects from low subjects?

(c) Interaction, answering question 3: Is one test better than
the other? + Grand Mean = EASI-III scale is superior; - Grand Mean =
TS-E scale is superior.
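
An F ratio with one degree of freedom in its numerator equals the
square of the corresponding t, that is, the squared ratio of an effect
to its standard error. The sketch below assumes that the grand means
and standard errors printed in Table 6 are the effect and error terms
of the reported tests (an assumption, since the computing formulas are
not reproduced here) and recomputes the Impulsivity row from the
printed values. The function name `f_from_effect` is illustrative only.

```python
def f_from_effect(grand_mean, standard_error):
    """Square of t = effect / standard error; equals F when df1 = 1."""
    t = grand_mean / standard_error
    return t * t


# Printed (grand mean, standard error, F) values, Impulsivity scale, Table 6.
impulsivity_row = {
    "Test Strengths": (-2.47, 3.62, 0.47),
    "Subject Differentiation": (-29.46, 10.07, 8.56),
    "Test Differences": (-7.42, 3.03, 6.00),
}

for effect, (gm, se, printed_f) in impulsivity_row.items():
    recomputed = f_from_effect(gm, se)
    print(f"{effect}: recomputed F = {recomputed:.2f}, printed F = {printed_f:.2f}")
```

Small differences between recomputed and printed values in other rows
would be expected from rounding of the printed means and standard
errors.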

II. Summary of Results

All of the results might be summarized as follows.

Reliability Study

1. The low correlation between individual TS-E scales,
   ranging from -.13 to .08, indicates that the four scales
   of the TS-E measure independent characteristics.

2. Internal reliability, as measured by coefficient alpha,
   ranged from .44 to .60 for the TS-E scales. These modest
   reliabilities reflect the relatively short length of the
   TS-E scales and the true-false (0,1) nature of scale items.
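
Coefficient alpha compares the sum of the item variances with the
variance of the total score. A minimal sketch for true-false (0,1)
items follows; the response matrix is hypothetical and is not data
from the reliability study, and the function name `coefficient_alpha`
is chosen here for illustration.

```python
from statistics import pvariance


def coefficient_alpha(responses):
    """Coefficient alpha for a subjects-by-items matrix of 0/1 scores.

    responses: list of rows, one row per subject, one 0/1 entry per item
    (1 = answered in the keyed direction).
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = [pvariance([row[i] for row in responses]) for i in range(k)]
    total_variance = pvariance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores do not vary")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: six subjects, four items (not thesis data).
example = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
    [1, 0, 1, 1],
]
print(f"alpha = {coefficient_alpha(example):.2f}")
```

With dichotomous items each item variance is p(1 - p), so it can never
exceed .25; this is one reason short true-false scales tend to show
modest alphas.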

Validity study: the following findings were significant (p < .05).

1. All four TS-E scales demonstrated criterion validity.

2. The following scales of the EASI-III test demonstrated
   criterion validity: Impulsivity, Activity, and Sociability.

3. All four temperament scales showed criterion validity when an
   EASI-III scale was used in conjunction with a TS-E scale.

4. The TS-E Impulsivity scale was better than the EASI-III
   Impulsivity scale in differentiating high nominated
   subjects from low subjects.

5. EASI-III Emotionality scale scores were higher than TS-E
   Emotionality scale scores.

CHAPTER IV

DISCUSSION

This research has demonstrated the existence of the four per-
sonality traits which comprise the TS-E. It has also demonstrated
criterion validity for two paper and pencil tests which meet the
theoretical guidelines for temperament and which measure the four
personality traits. Having two such measures should provide essen-
tial tools for future research aimed at determining how large, if
any, is the genetic component of temperament, how strongly temperament
affects general behavior, and how stable temperament remains over a
lifetime.

The Temperament Scale-Erman was specifically designed to meet
empirical as well as theoretical guidelines. These guidelines were
derived by contrasting failed studies of temperament with successful
studies (see pp. 12-26 of this thesis). The guidelines were the
following:

1. The unit of temperament analysis, namely a measure of the
   presence or absence of stable behavior tendencies, should
   differ from the units of the questionnaire data.

2. The best units for questionnaire data are neutral and
   concrete descriptions of common, everyday behaviors.

Since the EASI-III also met criterion validity without meeting these
empirical guidelines, an essential question remains: is a test which
meets these empirical guidelines a better test of temperament than
one which does not? In particular, is the TS-E better than the
EASI-III?

Discussion of this question will be followed by two sections
that discuss ways to improve the TS-E and important future tests of
the TS-E.

TS-E or EASI-III?

 

Is the TS-E better than the EASI-III? For at least one of
the four categories of temperament the answer is clear: the TS-E
Impulsivity scale is significantly better than the EASI-III Impul-
sivity scale. In addition, while the TS-E Emotionality scale was
able to differentiate significantly high subjects from low subjects,
the EASI-III Emotionality scale was not.

Three other features of this study suggest that meeting the
empirical guidelines, as the TS-E did, was quite useful. The first
was the comments of students who took part in the reliability study.
After the reliability study was completed, the author spoke with about
two dozen students who had completed the questionnaire. While most
students reported that the questionnaire was interesting to complete,
none of them was able to identify accurately what the questionnaire
was measuring. The most frequent guesses were that it was assessing
obsessiveness, a topic recently covered in the class lectures, or sex
differences. This combination of interest and uncertainty matched
one of the aims in constructing behavioral items: controlling social
desirability. Most people would probably rather appear sociable
than unsociable and active rather than inactive, for example.
Attitude-type measures, such as the EASI-III, may be affected by such
factors. By using behavioral items, any such social desirability
factors can be avoided.

The value of the behavioral items was also suggested by the
peculiar drop-out pattern from the Emotionality validity scale.
Before potential seekers looked at either of the two tests, they were
first asked to think of their two subjects, a high and a low for a
given scale. Every scale originally had a minimum of 15 seekers.
The Activity and Sociability scales lost a few seekers who never
returned the questionnaires, usually because the seeker forgot about
the study or because the seeker's subjects were unavailable.
The Impulsivity scale suffered the largest drop-out rate among seekers;
this was partly because it was the last scale to be validated in the
study and thus ran into end-of-term time conflicts for the students.
In addition, the Impulsivity scale suffered from a self-selection
factor: seekers reported that their high impulsive subjects never
completed the questionnaire, probably because they were high
impulsives.

However, the Emotionality scale was a peculiar case. Here
many seekers who had already thought of their two subjects dropped
out of the study just after they first looked over the two
questionnaires. These seekers were concerned that when their high
Emotional subjects saw the EASI-III scale, the subjects would know they
had been chosen because they were explosive, and the seekers were
afraid their subjects would explode at them. Though these incidents
are only anecdotal, they suggest that subjects who fill out the EASI-III
can immediately know what the test purports to measure. Such knowledge
might lead to any of the distortions discussed on pp. 12-26,
distortions which should not occur with the behavioral-item TS-E.

However, the value of the TS-E format is best suggested by
its having done so well the first time it was used. The EASI-III has
already been pretested; as its name implies, it is a revision of an
earlier form. In addition, it currently leaves little room for
further improvement. Poor items already had been discarded and the
current scales already have large internal reliabilities, large
part-whole correlations, and most importantly, already form four inde-
pendent factors. The EASI-III is close to the ceiling of its improv-
ability. By comparison, while the TS-E is only at the floor of its
improvability, the TS-E is already clearly superior to one EASI-III
scale and at least as good as two others. If poor items are discarded
and if the weaker scales are lengthened, then the TS-E should markedly
improve.
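
How much lengthening might help can be gauged with the Spearman-Brown
prophecy formula, which projects the reliability of a scale lengthened
by a factor n from its current reliability r as nr / (1 + (n - 1)r).
The sketch below applies it to the lowest and highest coefficient
alphas reported for the TS-E scales (.44 and .60). It assumes that the
added items would be parallel to the existing ones, which real items
satisfy only approximately, and the doubling factor is purely
illustrative rather than a proposal made in this thesis.

```python
def spearman_brown(reliability, length_factor):
    """Projected reliability when a scale is lengthened by length_factor."""
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Reported TS-E coefficient alphas; doubling the scale is a hypothetical choice.
for alpha in (0.44, 0.60):
    projected = spearman_brown(alpha, 2)
    print(f"alpha {alpha:.2f} -> projected {projected:.2f} at double length")
```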

Finally, some indirect evidence, referred to in footnote 2,
p. 51, suggests that the presence of matched pairs using "other"
subjects may lead to an underestimate of the value of the TS-E
Sociability scale. When these "other" pairs are eliminated, the TS-E
Sociability scale and the TS-E Impulsivity scale are the only two
scales among all the TS-E and EASI-III scales where there are no
high-low reversals in a matched pair and where the domain of all low
subject scores never intersects the domain of high subject scores.
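
The two criteria just named are distinct, and a short sketch may make
the difference concrete. A reversal is counted within a pair, while
the domain criterion compares all low scores with all high scores. The
pairs below are hypothetical (they are not scores from the validity
study), and the function names are chosen only for this illustration.

```python
def count_reversals(pairs):
    """Number of matched pairs in which the low nominee outscored the high."""
    return sum(1 for high, low in pairs if low > high)


def domains_intersect(pairs):
    """True if any low score reaches the range occupied by the high scores."""
    highs = [high for high, _ in pairs]
    lows = [low for _, low in pairs]
    return max(lows) >= min(highs)


# Hypothetical (high, low) percentage scores, one tuple per seeker.
clean = [(64, 28), (72, 40), (56, 36)]
no_reversal_but_overlap = [(80, 60), (50, 30)]

for name, pairs in (("clean", clean), ("overlap", no_reversal_but_overlap)):
    print(name, count_reversals(pairs), domains_intersect(pairs))
```

The second data set shows that a scale can be free of within-pair
reversals and still have one seeker's "low" subject score above another
seeker's "high" subject.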

The contention that it is item types--behavioral versus
attitudinal--which differentiate the TS-E from the EASI-III would
still need empirical proof. In particular, a simple Q-sort experiment
needs to be run. Subjects would be given a deck of cards; on
each card would appear either an EASI-III item for a given scale or a
TS-E item for that scale. Subjects would then be asked to sort the
cards according to whether the item described a specific behavior or
gave a general description of behavior. If most TS-E item cards and few
EASI-III item cards appeared in the specific behavior pile, this would
clearly demonstrate that subjects in fact perceive differences between
the two types of items.

Further Development of the TS-E

A test-retest study of the TS-E needs to be run. Such a
study would probably reveal levels of reliability for the TS-E scales
considerably above the internal reliabilities reported earlier,
demonstrating that the first reliability study found the minimal or
floor levels of reliability for each scale. At the conclusion of
such a study, subjects could also be asked to guess what they thought
the test was attempting to measure. If earlier predictions about the
benefits of behavior-specific items are correct, then subjects should
have a more difficult time guessing the exact purpose of the TS-E
than of a non-behavior-specific test.
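
Test-retest reliability is conventionally expressed as the correlation
between the scores the same subjects obtain on two administrations. A
minimal sketch of that correlation follows; the two score lists are
hypothetical, since no retest data have yet been collected, and the
function name `pearson_r` is illustrative.

```python
import math


def pearson_r(first, second):
    """Pearson correlation between two administrations of the same scale."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length lists of at least two scores")
    n = len(first)
    mean_x = sum(first) / n
    mean_y = sum(second) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(first, second))
    sxx = sum((x - mean_x) ** 2 for x in first)
    syy = sum((y - mean_y) ** 2 for y in second)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary on both administrations")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical percentage scores for six subjects tested twice.
first_testing = [36, 48, 52, 60, 64, 80]
second_testing = [40, 44, 56, 56, 68, 76]
print(f"test-retest r = {pearson_r(first_testing, second_testing):.2f}")
```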


TS-E scales could also be improved. If the TS-E had two
scales as good as the EASI-III scales and two scales significantly
better than comparable EASI-III scales, then the TS-E might be the
single instrument of choice for measuring its four categories of
temperament. By concentrating on the Activity and Sociability
scales, this level of achievement could likely be reached. Without
too many changes the TS-E Activity scale could be strengthened
until it is as good as the EASI-III Activity scale, while the TS-E
Sociability scale could be developed until it is significantly better
than its EASI-III counterpart.

The first step in improving the TS-E involves "weeding out"
poor items, particularly from the Activity and Sociability scales.
Items should only be discarded if they fail to differentiate high
subjects from lows over a number of repeated studies. Instead of
immediately running new studies, however, the responses to the first
reliability and validity studies could be reanalysed using split-
half methods.

Before any final decision on dropping items, a new reliability
and validity study using a new set of subjects would then have to be
run. In this new study, the "discarded" Activity and Sociability
items would not actually be discarded. However, a new set of items
equal in number to the "discarded" items would be added to the two
scales. In addition, 5 to 10 new items would be added to all four
scales. If any "discarded" item again failed, then these failed
items could actually be discarded for good. New items which appeared
unable to differentiate high subjects from low subjects could not
be discarded until this entire procedure for eliminating items was
repeated.
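
The retention rule described above, under which an item is dropped
only after failing in two separate studies, can be sketched as
follows. The discrimination index used here (the difference between
high and low nominees in the proportion answering in the keyed
direction) and the cutoff of .20 are assumptions made for illustration;
this thesis does not fix a particular statistic or threshold. The
responses are hypothetical.

```python
def discrimination(high_keyed, low_keyed):
    """Keyed-response rate of high nominees minus that of low nominees."""
    return sum(high_keyed) / len(high_keyed) - sum(low_keyed) / len(low_keyed)


def failing_items(high_by_item, low_by_item, minimum_gap=0.20):
    """Items whose discrimination falls below the (hypothetical) cutoff."""
    return {
        item
        for item in high_by_item
        if discrimination(high_by_item[item], low_by_item[item]) < minimum_gap
    }


def items_to_discard(first_study_failures, second_study_failures):
    """Discard for good only the items that failed in both studies."""
    return set(first_study_failures) & set(second_study_failures)


# Hypothetical keyed responses (1 = keyed direction) for three items.
study1_high = {8: [1, 1, 1, 0], 14: [1, 0, 1, 0], 35: [1, 1, 0, 1]}
study1_low = {8: [0, 0, 1, 0], 14: [1, 0, 1, 0], 35: [0, 1, 1, 1]}
study2_high = {8: [1, 1, 1, 1], 14: [0, 1, 0, 0], 35: [1, 1, 1, 0]}
study2_low = {8: [0, 1, 0, 0], 14: [0, 1, 1, 0], 35: [0, 0, 1, 0]}

first = failing_items(study1_high, study1_low)
second = failing_items(study2_high, study2_low)
print(sorted(items_to_discard(first, second)))
```

An item that fails once but recovers in the second study (item 35 in
the hypothetical data) is kept, which is the point of requiring
repeated failure.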

Behavior Validation of the TS-E

 

When the TS-E has been improved, other kinds of studies need
to be undertaken. These would determine whether scores on the TS-E
can be used to predict behavior. Several areas need to be explored:
would high or low scores on a scale predict specific behaviors in an
experimental situation? Would groups known to be high or low on a
given dimension consistently score in an expected direction?

For the first type of study, the TS-E could be given to a
large sample. Then high and low scoring subjects for a given
temperament could be called back to perform in a predetermined
experimental situation. For example, in an acquaintance process study,
if subjects were asked to sit in a waiting room along with an
experimenter accomplice, then those subjects who had scored high on the
Sociability scale might be expected to initiate conversations with the
accomplice, while subjects who had scored low on Sociability would be
expected to sit quietly. Similarly, in a situation inducing frustration,
subjects scoring high on Emotionality would be expected to get more
upset than subjects who had scored low on Emotionality. Various
appropriate experiments could be conducted with subjects scoring at the
extremes of each scale.

In the second type of study, the TS-E would be given to known
populations. For example, Sociability would be tested by giving the
TS-E to a group of public relations employees and a group of forest
fire watchers. The first group needs to be relatively more sociable
to succeed at their job, while the second group needs to be relatively
less sociable, so the first group ought to score higher than the
second on the TS-E Sociability scale. In some ways this study is
similar to the validity study reported in this paper, but there is
one crucial exception: in the present study, high and low
categorization was based on the relative judgment of an individual
seeker, while in the future study, comparing public relations employees
with forest fire watchers, there are probably absolute levels of
sociability above or below which one is no longer suitable for the
chosen professions. Thus if the TS-E were accurate, no forest fire
watcher should receive a higher Sociability score than any public
relations employee. In the present study, when a "low" nominated
subject scored above some other "high" nominated subject, there is
no way of knowing whether the test is inaccurate or whether the
two different seekers who did the nominating had very different
ideas about "high" or "low."

Consider the following extreme example for the Activity
seekers: if one seeker were a seminary student and the other were a
football player, each might in fact choose the highest and lowest
active subjects from his or her circle of friends, yet it is possible
that the seminary student's "high" active is in fact less active than
the football player's "low" active. By using known populations, such
as the public relations employees for sociability, it is possible to
control for relative differences in seeker nominations.

When the TS-E can demonstrate behavior prediction, then
clearly the vast range of temperament issues is open to simple and
efficient study.

APPENDICES

APPENDIX A

TEMPERAMENT SCALE-ERMAN (TS-E)

Please answer the following questions true or false. Answer every
question; answer true if the statement is true or mostly true for you.
Answer false if false or mostly false for you.

Some of the statements will ask you to pretend that you are in a
situation. These statements begin:

"If . . . ."

An example is the following:

"If I were to buy a car, I would buy a big car rather than a
small car."

For these statements, try to pretend you are doing what is described
in the underlined part of the sentence, then answer the rest of the
sentence true or false, based on how you typically act or would expect
to act. In the above example, answer the statement even if you have
never bought a car and do not plan to buy a car; pretend you are buying
a car and think how you would act in such a situation.

  1. I never buy clothes on the spur of the moment.

  2. I will sometimes take out two hours to talk to someone.

  3. If I hear a tornado warning, I rarely bother to take cover.

  4. If someone cuts into a line I'm waiting on, I would be more likely
     to say nothing than to complain to them.

  5. When I was in high school, I preferred to go out with friends and
     stir up excitement rather than to go to a movie with them.

  6. If I hear strange noises downstairs, I am more likely to call a
     friend than to ignore the noises and go to sleep.

  7. Sometimes I hit or kick vending machines that take my money and
     give me no product.

  8. There are times of the day I need time to just sit and do nothing.

  9. From the time I finished high school, I've known what career I
     wanted.

 10. If I am with a group of friends and an old friend we have not seen
     for years joins us, I would be less likely than my friends to
     give him/her a big hug.

 11. I am able to work long hours without feeling tired.

 12. If I am following a recipe, I sometimes have to interrupt my
     cooking because I discover I am out of an ingredient.

 13. I sometimes feel I have to get away from people for a while.

 14. If I have dinner with friends, I find that I eat my meal more
     slowly than my friends eat.

 15. If I am wakened from an afternoon nap by people repeatedly honking
     their car horn outside my window, I would be more likely to first
     yell at the people in the car than to first close my window.

 16. If I started a garden, I would plant the seeds in precisely the
     time of year that is best for each type of seed.

 17. If I walk with people my own height, I find I usually walk quicker
     than they do.

 18. I tend to be free from stage fright in speaking or performing in
     public.

 19. When I arise in the morning, I usually do my regular morning tasks
     in the same order. (Tasks would include washing, brushing teeth,
     eating breakfast, dressing, etc.)

 20. If I had a little extra money, I would be more likely to buy a big
     meal than to save it.

 21. If I played a violin, I would rather play alone than play in a
     quartet.

 22. If I have a good idea, I like to mull it over before sharing it
     with others.

 23. If I had volunteered to help a church or political campaign and
     had a choice between two equally boring jobs, I would prefer to
     sit alone and stuff envelopes rather than sit alone and do
     telephone canvassing.

 24. I find I often hurry to get places even when there is plenty of
     time.

 25. I tend to be annoyed by out of the ordinary noises.

 26. If a TV program becomes extremely scary, I turn the channel.

 27. If a group of friends gather in my room or apartment and begin to
     sing a song that irritates me, I am more likely to let them finish
     the song than to insist they stop.

 28. I am able to rest when there are unexpected noises and movements
     about me.

 29. If I play with a group of young children I like, I prefer to play
     a quiet card game rather than a running game such as tag.

 30. If I am driving after a hard day is over, I will try to pass more
     cars on the highway than I usually do.

 31. I rarely mind if people drop in on me without calling ahead first.

 32. If I must lose weight, I would prefer some kind of exercise rather
     than diet.

 33. I would rather do a crossword puzzle than play scrabble.

 34. I have an easy time starting a conversation with strangers at a
     party.

 35. I like to take a nap during the day if it is possible.

 36. If I am selecting what to wear on a cold winter day, I decide
     based on what enhances my appearance rather than what protects
     my health.

 37. If I hurry to catch a bus and just miss it, I'd be more likely to
     shrug it off than to do a slow burn.

 38. I cry very easily during sad movies.

 39. If I have a letter to mail, I will sometimes carry it for a few
     days before I mail it.

 40. I like to be off and running as soon as I wake up in the morning.

 41. I am frightened during loud thunderstorms.

 42. Six hours of sleep is enough for me most nights.

 43. If I have to speak to a group, I tend to pace a great deal.

 44. Considering my income and needs, I tend to go into debt more than
     I should.

 45. If I had to pick a job, I would pick a job where I sit and count
     the number of cars passing through an intersection over a job
     where I clean every door handle in an office building.

 46. If I have to leave the house for the day, I listen to weather
     reports so I will know what to wear.

 47. If I go on a trip, I prefer having my itinerary planned ahead of
     time.

 48. I often run up stairs taking two at a time.

 49. If I have to file federal income tax, I file as early as January
     or February.

 50. If I go to a movie and someone nearby bothers me with their
     talking, I am more likely to ignore him than to ask that person
     to stop.

 51. I enjoy spending an evening alone by myself.

 52. If I were a good swimmer and were by a beach or pool, I would
     still prefer lying in the sun over going for a long swim.

 53. If I play golf, I would rather play as a twosome than play a
     solitary game trying to beat par.

 54. If, after completing a large grocery shopping, a check-out clerk
     yells at me because I've left my check book at home, I would be
     more likely to quietly apologize than to yell back.

 55. When I do not have enough sleep, I become irritable.

 56. If I were camping and saw a snake, I would be more likely to stay
     and see what kind it is than to run away.

 57. I prefer interesting work where I sit at a desk over work in which
     there is vigorous activity.

 58. I don't run out of toothpaste at home because I keep a spare tube.

 59. If I drive alone in a car with a radio in it and I cannot find my
     favorite radio station, I am more likely to change stations
     repeatedly than to settle on one station.

 60. I would rather go to a quiet party than watch an excellent
     television show alone.

 61. If I were going from the first floor to the third floor in an
     office building, I would rather ride an elevator than take the
     stairs.

 62. I prefer to ride rather than walk when the distance is moderate.

 63. During my vacations I would prefer just resting to sports or
     sightseeing.

 64. If I see a big dog barking, I will often cross the street to
     avoid it.

 65. Having people around sometimes gets on my nerves.

 66. If I have an alarm clock, I often forget to set the alarm clock
     on days when I should set it.

 67. When I am happy I sometimes smile or say hello to people I
     hardly know.

 68. I enjoy telling my friends about an interesting experience.

 69. I often make up lists of tasks and errands to do.

 70. If I have a library book checked out, I almost always return the
     book to the library before the day on which a fine is charged.

 71. I tend to fidget at a long lecture or sermon even when I am
     interested in the subject.

 72. When I have a problem I would rather talk to a friend than mull
     it over by myself.

 73. If I won a thousand dollars ($1,000) in a state lottery, I would
     spend most of it on a spree rather than save most of it.

 74. If I change a light bulb, I always unplug the lamp even if it is
     already turned off.

 75. People approach me to get acquainted before I approach them.

 76. If I went to a football game, I would be more likely to sit
     quietly than to scream and yell aloud.

 77. I usually punch elevator buttons several times in a row rather
     than just once.

 78. If I go alone to a restaurant, I would prefer waiting fifteen
     minutes for an empty table over being seated immediately at a
     table with a stranger.

 79. I would rather go to see a ball game alone than stay home alone
     and watch the game on TV.

 80. If there is a good movie on TV, I would enjoy watching it with
     friends more than watching it alone.

 81. I sometimes forget to bring along enough money or my checks when I
     go shopping.

 82. I would rather listen to records alone than go to a concert alone.

 83. If a traveling encyclopedia salesman came to my house, I would be
     more likely to listen to his complete sales pitch than to quickly
     tell him to stop and go away.

 84. If I saw a pair of policemen come out of their car with guns
     drawn, I would be more likely to immediately duck for cover than
     I would be to first wait and see about what was going on.

 85. During my leisure time, I quickly get bored just sunbathing or
     lying in a shaded, grassy area.

 86. I usually say the first thing that pops into my head without first
     considering the consequences.

 87. I often do a lot of physical exercise.

 88. If I have been sick and confined to bed, I would be more likely to
     start my usual activities as quickly as possible rather than take
     an extra day to relax.

 89. If I am on an express check-out lane for people purchasing fewer
     than ten items and someone with a full grocery cart is in front
     of me, I would be more likely to tell that person off than to say
     nothing.

 90. If I were riding on a train, I would rather have a stranger sit
     next to me than sit alone for the whole trip.

 

 91. If I had plants to care for, I would remember to water the plants
     every time they needed to be watered.

 92. If I joined a group of five strangers gathered to talk about
     personal problems, I would be more likely to be the very last to
     talk than the very first.

 93. I would rather relax and go fishing than go on a vigorous hike.

 94. If I am about to leave on a trip, I will carefully plan exactly
     what clothing I will take along.

 95. In a heated discussion, I am often the first person to raise my
     voice.

 96. If I am about to buy something expensive, I will first check out
     the price at many different stores rather than buy at the most
     convenient store.

 97. If a neighbor's party is too loud and is keeping me awake, I am
     more likely to complain to them (or to the police) than to just
     ignore the noise.

 98. If I were lost in a strange city, I would rather ask directions
     from strangers than consult a simple map.

 99. If I stopped to join a crowd which had gathered to hear a street
     band perform wonderful music, I would be more likely to share my
     enthusiasm with strangers than I would be to keep my enthusiasm
     to myself.

100. I prefer parties where people sit and talk over parties filled
     with activities.
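
Scoring a scale from these answers requires a key of the kind given in
Appendix B. As a minimal sketch, the Activity scale could be scored as
shown below, using the item numbers listed under ACTIVITY in Appendix B
and expressing the result as a percentage of the 25-item maximum, the
metric used for the validity study. The function name and the
dictionary format for a subject's answers are assumptions made for this
example, not part of the original scoring materials.

```python
# Item numbers keyed for the Activity scale, as listed in Appendix B.
ACTIVITY_TRUE_KEYED = (11, 17, 24, 32, 40, 42, 43, 48, 71, 77, 85, 87, 88)
ACTIVITY_FALSE_KEYED = (8, 14, 29, 35, 45, 52, 57, 61, 62, 63, 93, 100)


def activity_percent(answers):
    """Percentage-of-maximum Activity score.

    answers: dict mapping item number (1-100) to True or False.
    """
    raw = sum(1 for item in ACTIVITY_TRUE_KEYED if answers[item])
    raw += sum(1 for item in ACTIVITY_FALSE_KEYED if not answers[item])
    n_items = len(ACTIVITY_TRUE_KEYED) + len(ACTIVITY_FALSE_KEYED)  # 25
    return 100.0 * raw / n_items


# Hypothetical subject who answers "true" to every item.
all_true = {item: True for item in range(1, 101)}
print(activity_percent(all_true))
```

Because about half of the Activity items are keyed false, a subject who
simply answers "true" throughout lands near the middle of the scale
rather than at either extreme.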

APPENDIX B

TEMPERAMENT SCALE-ERMAN (TS-E) BY TEMPERAMENT CATEGORY

ACTIVITY
(True = High Activity)

  1. (11) I am able to work long hours without feeling tired.

  2. (17) If I walk with people my own height, I find I usually walk
          quicker than they do.

  3. (24) I find I often hurry to get places even when there is plenty
          of time.

  4. (32) If I must lose weight, I would prefer some kind of exercise
          rather than diet.

  5. (40) I like to be off and running as soon as I wake up in the
          morning.

  6. (42) Six hours of sleep is enough for me most nights.

  7. (43) If I have to speak to a group, I tend to pace a great deal.

  8. (48) I often run up stairs taking two at a time.

  9. (71) I tend to fidget at a long lecture or sermon even when I am
          interested in the subject.

 10. (77) I usually punch elevator buttons several times in a row
          rather than just once.

 11. (85) During my leisure time, I quickly get bored just sunbathing
          or lying in a shaded, grassy area.

 12. (87) I often do a lot of physical exercise.

 13. (88) If I have been sick and confined to bed, I would be more
          likely to start my usual activities as quickly as possible
          rather than take an extra day to relax.

(False = High Activity)

 14. (8)   There are times of the day I need time to just sit and do
           nothing.

 15. (14)  If I have dinner with friends, I find that I eat my meal
           more slowly than my friends eat.

 16. (29)  If I play with a group of young children I like, I prefer to
           play a quiet card game rather than a running game such as
           tag.

 17. (35)  I like to take a nap during the day if it is possible.

 18. (45)  If I had to pick a job, I would pick a job where I sit and
           count the number of cars passing through an intersection
           over a job where I clean every door handle in an office
           building.

 19. (52)  If I were a good swimmer and were by a beach or pool, I
           would still prefer lying in the sun over going for a long
           swim.

 20. (57)  I prefer interesting work where I sit at a desk over work in
           which there is vigorous activity.

 21. (61)  If I were going from the first floor to the third floor in
           an office building, I would rather ride an elevator than
           take the stairs.

 22. (62)  I prefer to ride rather than walk when the distance is
           moderate.

 23. (63)  During my vacations I would prefer just resting to sports
           or sightseeing.

 24. (93)  I would rather relax and go fishing than go on a vigorous
           hike.

 25. (100) I prefer parties where people sit and talk over parties
           filled with activities.

SOCIABILITY
(False = High Sociability)

  1. (10)  If I am with a group of friends and an old friend we have
           not seen for years joins us, I would be less likely than my
           friends to give him/her a big hug.

  2. (13)  I sometimes feel I have to get away from people for a while.

  3. (21) If I played a violin, I would rather play alone than play
          in a quartet.

  4. (22) If I have a good idea, I like to mull it over before sharing
          it with others.

  5. (23) If I had volunteered to help a church or political campaign
          and had a choice between two equally boring jobs, I would
          prefer to sit alone and stuff envelopes rather than sit
          alone and do telephone canvassing.

  6. (33) I would rather do a crossword puzzle than play scrabble.

  7. (51) I enjoy spending an evening alone by myself.

  8. (65) Having people around sometimes gets on my nerves.

  9. (75) People approach me to get acquainted before I approach them.

 10. (78) If I go alone to a restaurant, I would prefer waiting fifteen
          minutes for an empty table over being seated immediately at a
          table with a stranger.

 11. (82) I would rather listen to records alone than go to a concert
          alone.

 12. (92) If I joined a group of five strangers gathered to talk about
          personal problems, I would be more likely to be the very last
          to talk than the very first.

(True = High Sociability)

 13. (2)  I will sometimes take out two hours to talk to someone.

 14. (18) I tend to be free from stage fright in speaking or performing
          in public.

 15. (34) I have an easy time starting a conversation with strangers
          at a party.

 16. (53) If I play golf, I would rather play as a twosome than play a
          solitary game trying to beat par.

 17. (60) I would rather go to a quiet party than watch an excellent
          television show alone.

 18. (67) When I am happy I sometimes smile or say hello to people I
          hardly know.

19. (58) I enjoy telling my friends about an interesting
experience.

20. (72) When I have a problem I would rather talk to a friend
than mull it over by myself.

21. (79) I would rather go to see a ball game alone than stay
home alone and watch the game on TV.

22. (80) If there is a good movie on TV, I would enjoy watching
it with friends more than watching it alone.

23. (90) If I were riding on a train, I would rather have a
stranger sit next to me than sit alone for the whole trip.

24. (98) If I were lost in a strange city, I would rather ask
directions from strangers than consult a simple map.

25. (99) If I stopped to join a crowd which had gathered to hear
a street band perform wonderful music, I would be more likely
to share my enthusiasm with strangers than I would be to
keep my enthusiasm to myself.

IMPULSIVITY
(False = High Impulsivity)

1. (1) I never buy clothes on the spur of the moment.

2. (9) From the time I finished high school, I've known what
career I wanted.

3. (16) If I started a garden, I would plant the seeds in
precisely the time of year that is best for each type of seed.

4. (19) When I arise in the morning, I usually do my regular
morning tasks in the same order. (Tasks would include washing,
brushing teeth, eating breakfast, dressing, etc.)

5. (46) If I have to leave the house for the day, I listen to
weather reports so I will know what to wear.

6. (47) If I go on a trip, I prefer having my itinerary planned
ahead of time.

7. (49) If I have to file federal income tax, I file as early as
January or February.

8. (59) I don't run out of toothpaste at home because I keep a
spare tube.

9. (69) I often make up lists of tasks and errands to do.

10. (70) If I have a library book checked out, I almost always
return the book to the library before the day on which a fine is
charged.

11. (91) If I had plants to care for, I would remember to water
the plants every time they needed to be watered.

12. (94) If I am about to leave on a trip, I will carefully plan
exactly what clothing I will take along.

13. (96) If I am about to buy something expensive, I will first
check out the price at many different stores rather than buy at
the most convenient store.

(True = High Impulsivity)

14. (5) When I was in high school, I preferred to go out with
friends and stir up excitement rather than to go to a movie with
them.

15. (12) If I am following a recipe, I sometimes have to
interrupt my cooking because I discover I am out of an
ingredient.

16. (20) If I had a little extra money, I would be more likely to
buy a big meal than to save it.

17. (31) I rarely mind if people drop in on me without calling
ahead first.

18. (36) If I am selecting what to wear on a cold winter day, I
decide based on what enhances my appearance rather than what
protects my health.

19. (39) If I have a letter to mail, I will sometimes carry it
for a few days before I mail it.

20. (44) Considering my income and needs, I tend to go into debt
more than I should.

21. (59) If I drive alone in a car with a radio in it and I
cannot find my favorite radio station, I am more likely to change
stations repeatedly than to settle on one station.

22. (66) If I have an alarm clock, I often forget to set the
alarm clock on days when I should set it.

23. (73) If I won a thousand dollars ($1,000) in a state lottery,
I would spend most of it on a spree rather than save most of it.

24. (81) I sometimes forget to bring along enough money or my
checks when I go shopping.

25. (86) I usually say the first thing that pops into my head
without first considering the consequences.

EMOTIONALITY
(True = High Emotionality)

1. (6) If I hear strange noises downstairs, I am more likely to
call a friend than to ignore the noises and go to sleep.

2. (7) Sometimes I hit or kick vending machines that take my
money and give me no product.

3. (15) If I am wakened from an afternoon nap by people
repeatedly honking their car horn outside my window, I would be
more likely to first yell at the people in the car than to first
close my window.

4. (25) I tend to be annoyed by out-of-the-ordinary noises.

5. (26) If a TV program becomes extremely scary, I turn the
channel.

6. (30) If I am driving after a hard day is over, I will try to
pass more cars on the highway than I usually do.

7. (38) I cry very easily during sad movies.

8. (41) I am frightened during loud thunderstorms.

9. (55) When I do not have enough sleep, I become irritable.

10. (64) If I see a big dog barking, I will often cross the
street to avoid it.

11. (74) If I change a light bulb, I always unplug the lamp even
if it is already turned off.

12. (84) If I saw a pair of policemen come out of their car with
guns drawn, I would be more likely to immediately duck for cover
than I would be to first wait and see about what was going on.

13. (89) If I am on an express check-out lane for people
purchasing fewer than ten items and someone with a full grocery
cart is in front of me, I would be more likely to tell that
person off than to say nothing.

 

14. (95) In a heated discussion, I am often the first person to
raise my voice.

15. (97) If a neighbor's party is too loud and is keeping me
awake, I am more likely to complain to them (or to the police)
than to just ignore the noise.

(False = High Emotionality)

16. (3) If I hear a tornado warning, I rarely bother to take
cover.

17. (4) If someone cuts into a line I'm waiting on, I would be
more likely to say nothing than to complain to them.

18. (27) If a group of friends gather in my room or apartment and
begin to sing a song that irritates me, I am more likely to
let them finish the song than to insist they stop.

19. (28) I am able to rest when there are unexpected noises and
movements about me.

20. (37) If I hurry to catch a bus and just miss it, I'd be more
likely to shrug it off than to do a "slow burn."

21. (50) If I go to a movie and someone nearby bothers me with
their talking, I am more likely to ignore him than to ask that
person to stop.

22. (54) If, after completing a large grocery shopping, a
check-out clerk yells at me because I've left my checkbook at
home, I would be more likely to quietly apologize than to yell
back.

23. (56) If I were camping and saw a snake, I would be more
likely to stay and see what kind it is than to run away.

24. (76) If I went to a football game, I would be more likely to
sit quietly than to scream and yell aloud.

25. (83) If a traveling encyclopedia salesman came to my house, I
would be more likely to listen to his complete sales pitch
than to quickly tell him to stop and go away.
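
The True/False keys above suggest a simple scoring rule: a response counts toward a scale when it matches the keyed direction for that item. The following is only a minimal sketch of that rule, assuming one point per keyed response; the scoring procedure actually used is not shown in this appendix. The function name and the item ids are hypothetical placeholders, not the questionnaire's own numbering.

```python
def score_scale(responses, true_keyed, false_keyed):
    """Count the responses that match the keyed direction for one scale.

    responses: dict mapping an item id to a bool (True/False answer).
    Items missing from `responses` are skipped, not guessed.
    """
    score = 0
    for item in true_keyed:
        # A "True" answer to a true-keyed item adds one point.
        if responses.get(item) is True:
            score += 1
    for item in false_keyed:
        # A "False" answer to a false-keyed item adds one point.
        if responses.get(item) is False:
            score += 1
    return score


# Hypothetical item ids for illustration only.
EMOTIONALITY_TRUE = ["e1", "e2"]
EMOTIONALITY_FALSE = ["e3", "e4"]

answers = {"e1": True, "e2": False, "e3": False, "e4": True}
emotionality = score_scale(answers, EMOTIONALITY_TRUE, EMOTIONALITY_FALSE)
print(emotionality)
```

Under this assumption, each scale score is the number of keyed responses, so a 25-item scale would range from 0 to 25.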

LIST OF REFERENCES

Allport, G. W. Pattern and growth in personality. New York: Holt,
Rinehart and Winston, 1961.

 

Birns, B., Barten, S., & Bridger, W. H. Individual differences in
temperamental characteristics of infants. Transactions of the
New York Academy of Sciences, 1969, 31(8), 1071-1082.

 

Bronson, W. C. Stable patterns of behavior: The significance of
enduring orientations for personality development. In Hill,
J. P. (Ed.), Minnesota Symposia on Child Psychology, Vol. 2.
Minneapolis: The University of Minnesota Press, 1969.

 

Bronson, W. C. Adult derivations of emotional expressiveness and
reactivity-control: Developmental continuities from childhood
to adulthood. In Jones, M. C., Bayley, N., MacFarlane,
J. W., & Honzik, M. P. (Eds.), The course of human development.
Waltham: Xerox College Publications, 1971.

Buss, A. H., & Plomin, R. A temperament theory of personality
development. New York: John Wiley & Sons, 1975.

 

Buss, A. H., Plomin, R., & Willerman, L. The inheritance of tempera-
ment. Journal of Personality, 1973, 41(4), 513-524.

 

Carey, W. B. A simplified method for measuring infant temperament.
Journal of Pediatrics, 1970, 77, 188-194.

 

Cattell, R. B. Personality: A systematic and factual study. New
York: McGraw-Hill, 1951.

Cattell, R. B., & Eber, H. W. Sixteen personality factor questionnaire
"the 16 PF" manual for forms A and B. Champaign, Illinois:
Institute for Personality and Ability Testing, 1962.

 

Cortes, J. B., & Gatti, F. M. Physique and self-description of
temperament. Journal of Consulting Psychology, 1965, 29,
417-431.

 

Escalona, S., & Heider, G. Prediction and outcome. New York: Basic
Books, 1959.

 


Freedman, D., & Keller, B. Inheritance of behavior in infants.
Science, 1963, 140, 196-198.

Gough, H. G. Manual for the California Psychological Inventory (CPI).
Palo Alto, California: Consulting Psychologists Press, Inc.,
1964.

 

Graham, P., Rutter, M., & George, S. Temperamental characteristics
as predictors of behavior disorders in children. American
Journal of Orthopsychiatry, 1973, 43(3), 328-335.

 

Guilford, J. P. Personality. New York: McGraw-Hill, 1959.

 

Guilford, J. P., & Zimmerman, W. S. The Guilford-Zimmerman Temperament
Survey manual of instructions and interpretations. Beverly
Hills, California: Sheridan Supply Company, 1949.

 

Johnson, R. H. Manual of the Johnson Temperament Analysis. Los
Angeles: Test Bureau, 1944.

 

Kagan, J., & Moss, H. A. Birth to maturity: A study in psychological
development. New York: John Wiley, 1962.

 

Kretchmer, E. Physique and character: An investigation of the nature
of constitution and of the theory of temperament, translated
by W. J. H. Sprott. New York: Harcourt, Brace and Company,
Inc., 1925.

 

Nunnally, J. C. Psychometric theory. New York: McGraw-Hill, 1967.

 

Plomin, R. A temperament theory of personality development: Parent-
child interactions. Unpublished Ph.D. dissertation, The
University of Texas at Austin, 1974.

 

Scarr, S. Social introversion-extroversion as a heritable response.
Child Development, 1969, 40, 823-832.

 

Scholom, A. H. The relationship of infant and parent temperament to
the prediction of child adjustment. Unpublished Ph.D.
dissertation, Michigan State University, 1975.

 

 

Sheldon, W. H. The varieties of temperament: A psychology of
constitutional differences. New York: Harper & Brothers, 1945.

 

Thomas, A., Chess, S., Birch, H. G., Hertzig, M. E., & Korn, S.
Behavioral individuality in early childhood. New York: New
York University Press, 1963.

Thomas, A., Chess, S., & Birch, H. G. Temperament and behavior dis-
orders in children. New York: New York University Press,
1968.

 


Thorndike, R. L. Thorndike Dimensions of Temperament manual. New
York: Psychological Corporation, 1966.

 

Thurstone, L. L. Examiner manual for the Thurstone Temperament
Schedule. Chicago: Science Research Associates, Inc., 1953.

 

Vandenberg, S. G. Hereditary factors in normal personality traits.
In J. Wortis (Ed.), Recent advances in biological psychiatry,
Vol. IX. New York: Plenum Press, 1967.

 

Vandenberg, S. G. The hereditary abilities study: Hereditary com-
ponents in a psychological test battery. American Journal of
Human Genetics, 1962, 14, 220-237.

 

 

Walker, R. N. Body build and behavior in young children: I. Body
build and nursery school teachers' ratings. Monographs of
the Society for Research in Child Development, 1962, 27.

Wilson, C. D., & Lewis, M. Temperament: A developmental study in
stability and change during the first four years of life.
Research bulletin 74-3. Princeton, New Jersey: Educational
Testing Service, 1974.

 
