DETECTION OF TRENDS
IN
WATER QUALITY PARAMETERS

By

Robert Hale Montgomery

A THESIS

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

MASTER OF SCIENCE

Department of Resource Development

1981

ABSTRACT
DETECTION OF TRENDS

IN
WATER QUALITY PARAMETERS

By
Robert Hale Montgomery

With the advent of standards and criteria for water quality param-
eters, there has been an increasing concern about the changes of these
parameters over time. Thus, sound statistical methods dealing with the
detection of trends in water quality parameters are needed.

The method presented provides: 1) formulation of a problem
(hypothesis), 2) selection of water quality parameter(s) and data,
3) data analysis techniques, and 4) statistical tests for detection of
trends. A review of water quality parameters and certain topics in
statistics is also provided. The techniques are explained in a
non-statistical manner so that they can be used by those not well
versed in statistical theory.

ACKNOWLEDGMENTS

I would like to express my deep thanks to the members of my committee,
Dr. Kenneth Reckhow, Dr. Stanley Zarnoch, and Dr. Eckhart Dersch,
for their assistance, advice, and criticism. Dr. Reckhow, my major pro-
fessor, served as the cornerstone of my graduate career. His knowledge
and professional attributes set the standards that I will attempt to
uphold. Dr. Zarnoch's knowledge and friendship played a vital role in
allowing me to grasp the numerous statistical concepts. Dr. Dersch pro-
vided the impetus to show that research must be applied to management
needs in order to be valuable.

Special thanks go to Janine Jurack for the long and tedious job of
typing this document and to Paul Schneider for the preparation of
figures. Also, I would like to thank my fellow graduate students V. David
Lee, Micheal Beaulac, Jonathon Simpson, and Ralph Ancil for the
interactions we had at Michigan State University.

Finally, I dedicate this thesis to my present and future family,
for without them none of this would have been possible.

This research was supported by a grant from the National Oceanic

and Atmospheric Administration, grant number 03-78-301-109.

TABLE OF CONTENTS

LIST OF FIGURES

CHAPTER

I. INTRODUCTION
      Study Objectives
      Water Quality Management
      Trend Theory

II. WATER QUALITY
      Water Quality Parameters
      Data Selection

III. STATISTICS
      Introduction
      Descriptive Statistics
         Measures of Central Tendency
         Measures of Dispersion
         Measures of Relationship
      Probability Distributions
         Normal or Gaussian Distributions
      Hypothesis Testing

IV. TREND DETECTION METHODS
      Hypothesis Formation
      Data Preparation
      Data Analysis
         Graphical Techniques
         Distribution Tests
         Homogeneous Variance
         Examination of Outliers
         Time Dependency of Data
         Transformations
      Statistical Tests for Trend Detection
         Normal and Independent Data
         Symmetric and Independent Data
         Dependent Data
      Time Series Modeling
      Application
         Linear Trend Example
         Step Trend Example

V. CONCLUSIONS

LIST OF REFERENCES

APPENDICES
      A. WATER QUALITY PARAMETER DATA SETS
      B. STATISTICAL TABLES

LIST OF FIGURES

FIGURE

 1. Water quality management structure as suggested by Ward (1973)
 2. Water quality data uses (Sherwani and Moreau, 1975)
 3. Important water quality parameters and related uses
    (modified from Sherwani and Moreau (1975))
 4. Major communities of biological organisms used in
    biological monitoring (Weber, 1980)
 5. Effect of distribution shape on mean (Y), median (M), and mode (M)
 6. Normally distributed Y's given X (independent data assumption)
 7. Frequency function for a continuous random variable (Y)
 8. Frequency function for a discrete random variable (Y)
 9. The normal density function, with mean (μ) and
    standard deviation (σ)
10. An outcome distribution
11. Errors in decisions based on tests of hypotheses
12. A typical power curve for n and for a larger n
13. Relation between probability of errors, Type I error
    and Type II error
14. Flow diagram for Trend Detection Methods
15. Time series plot of mean annual ammonia concentrations
    for Lake Ontario
16. Cusum plot of step, linear, and no trend
17. Normal, skewed, kurtotic distributions
18. Time series plot of nitrate concentration (data set A.2)
19. Cusum plot of nitrate concentrations (data set A.2)
20. Time series plot of total phosphorus concentration
    (data set A.3)
21. Cusum plot of total phosphorus concentrations (data set A.3)
22. Plot of total phosphorus residuals (data set A.3)

Chapter I
INTRODUCTION

Study Objectives

 

The earth is endowed with numerous natural resources available for
man's use. However, long-term maintenance of the environment is a pre-
requisite to the continuing usage of natural resources. Thus, society
is responsible for the protection and propagation of the environment for
future generations. As society becomes more technologically oriented,
the use of qualitative knowledge in decision planning can be balanced
and supported by quantitative information. The use of quantitative tech-
niques in water quality planning, for example, provides the necessary
"hard" evidence that planners and managers have lacked in recent years.
An especially weak area of water quality management is quantitative meth-
ods dealing with trend detection in water quality parameters. Trend de-
tection techniques are vital in the development of planning, policy and
management of water resource systems.

The objective of this study is to provide sound statistical methods
dealing with the detection of trends in water quality parameters. The
trend detection method is based on strong statistical techniques, but
attempts to explain the techniques in a non-statistical manner. The in-
tention of the study is to make the trend detection method available for
use by persons not well versed in statistical theory.

The remaining sections of Chapter 1 describe water quality
management, the application and usefulness of trend detection techniques to
water quality management, and provide a general review of trend theory.


Chapter 2 discusses water quality parameters. The emphasis is on data
availability and on selecting the appropriate parameter for a given
management decision or concern. Chapter 3 provides a general review of
the areas of statistics that apply to the use of trend
detection techniques, particularly those dealing with time series data
and analysis. The trend detection method is explained in Chapter 4
and all phases necessary for its use are discussed (i.e., hypothesis
testing, data preparation, data analysis, statistical tests, and time
series modeling). Two applications of the trend detection method are
presented at the end of Chapter 4. The conclusions and recommendations
on the trend detection method are discussed in Chapter 5. There are also
two Appendices: A) containing water quality parameter data sets, and
B) containing statistical tables needed for the statistical methods.

Water Quality Management

 

As the demand for water has increased over the years, there has been a
profound expansion of the goals among water quality management agencies.
First, it was protection of the public health. Later, as both the
amount and variety of waste discharges grew, concern for the protection
of other beneficial uses was expressed. Recently, esthetic or social
goals of water quality have been added to the growing list of objec-
tives for water quality management. The result is an emphasis on re-
source oriented management instead of management oriented to use. Thus,
the concern has shifted from pollutants to the water resource itself.

In this context, resource protection becomes the object of public policy,
and the various beneficial uses appear as subsystems which must be
managed in a coordinated and integrated fashion to achieve resource
policy objectives (McGauhey, 1968).

Water quality management programs, or agencies, have two broad
objectives: prevention and abatement. The prevention objective is
related to maintaining the existing "good" water quality, while abate-
ment refers to reducing or moderating existing pollution conditions.
These two objectives can be subdivided into seven basic activities,
which are (Ward, 1973):

1. Planning

2. Research

3. Aid Programs

4. Technical Assistance

5. Regulation

6. Legal Enforcement

7. Data Collection, Processing, and Dissemination

Planning, research and aid programs are generally classified as pre-
vention, while technical assistance, regulation and legal enforcement
are classified as abatement. Data collection, processing and dissemin-
ation provide support to the first six activities. Figure 1 illus-
trates this type of water quality management structure.

The planning activity in water quality management has received
increased emphasis in recent years, at both the state and federal
level. The state level emphasis has been on program planning, while
at the federal level, emphasis has primarily been on project planning.
Project and program planning are similar in that a plan for future
action is being developed, but the plans differ in the amount of detail
required. Program planning deals with broad definitions in program and
work plans. Consequently, general overall trends in water quality best
serve the program planning function at the state level. Project
planning entails specific planning related to detailed evaluation of future
water quality effects as a result of specific causes (e.g., waste
treatment plant location, number of pollution sources). Thus, the change
in a specific parameter over time is the concern in project planning.
The detection of trends in water quality parameters is extremely
important for both program and project planning.

[Figure 1. Water quality management structure as suggested by Ward (1973); diagram not reproduced.]

The general objective of research is to conduct studies and inves-
tigations on issues relating to water quality. Included in the research
objective is the implementation of research findings. All too often,
research results are not put to an effective use. It is in the research
phase that the detection of trends in water quality parameters is
carried out.

The activities grouped under aid programs are: 1) accepting and
supervising loans and grants, 2) processing applications for and
administering loans or grants, and 3) certifying the need for
money for various projects. The information used to verify a need for
a certain project often deals with how a parameter has changed or will
change when the project is implemented. Thus, past trends can
provide the impetus to initiate a project, and future changes can be used
to evaluate the effectiveness of the project.

The objective of technical assistance is to advise, consult, and
cooperate on technical matters concerning water quality. Technical
assistance may be given to other agencies (federal, state, local) or
to private enterprises. In order for technical assistance to be given,
valid information about water quality parameters is needed. A major
form of information disseminated concerns the changes in water quality
parameters over time and methods of determining (quantifying) the
change.

The objective of water quality regulations is the maintenance of
water quality. This objective has traditionally been the most impor-
tant to public agencies given the money and manpower devoted to its
accomplishment. Regulation entails the development of water quality
standards and the routine surveillance of water quality to ensure
compliance with the established standards. Regulation also involves
developing pollution abatement requirements and establishing procedures
to assure that future actions do not cause violations of standards.
Important uses of trend detection for regulation are: 1) to evaluate the
progress of abatement programs toward meeting standards, and 2) to
identify emerging water quality problem areas so that prevention can be
effected before abatement programs are required.

The object of legal enforcement is to enforce water quality stan-
dards when standards cannot be maintained by persuasion. The need
for quantitative factual information on water quality parameters for
legal evidence is paramount. Until recently, environmental lawyers
had only qualitative information to support their position. Subjec-
tive evaluations, however, may be easily contradicted by the opposing
side's experts. With the use of statistical techniques to provide
quantitative information on water quality trends the courts should
have a clear path to follow (i.e., what the data dictate).

Successful accomplishment of the first six activities outlined depends
upon good data collection, processing and dissemination. The first
step toward good water quality data is to select a representative
sample. Following proper sample preparation, handling, and laboratory
procedures, with quality control, the results of the analysis are
processed. Data processing involves screening, verification,
interpretation, indexing, storage and retrieval of the data. Data
dissemination is accomplished through the generation of reports. The
importance of proper sampling design cannot be overstressed. For a detailed
review of sampling design, consult Cochran (1977), Beckers et al. (1973),
Sanders et al. (1976), and Ward (1973).

All phases of planning, development and operation of a water
quality management system require data to quantify existing states of
the system, to forecast future changes and trends, and to predict the
response of the system to intervention. Evaluating alternative ways
of combating perceived conditions of water quality deterioration
requires objective measurements that: 1) describe the observed
conditions, and 2) can be related to appropriate measures of water resource
management. A number of parameters have been observed which might
provide objective criteria to study trends in water systems. An impor-
tant concern with water quality parameter selection for trend detec-
tion analysis is whether the parameter provides the best measure of the
perceived condition or qualities of concern. The kind of data required
and their spatial and temporal resolution depend upon the nature of the
problem and type of intervention under consideration. A partial list
of needs and uses of water quality data is shown in Figure 2 (Sherwani
and Moreau, 1975).

Figure 2. Water quality data uses (Sherwani and Moreau, 1975)

   Use categories: Public Interest; Planning; Regulation and Control

   Uses listed:
      Public health
      Aesthetics
      Nuisance
      Ecological balance
      Conservation
      Natural state preservation
      Recreation
      Water and related land use planning
      Economic planning
      Urban planning
      Identification of sources
      Fate of pollutants
      Description of present state of quality
      Prediction of water quality
      Evaluation of trends
      Available control strategies and tactics
      Measurement of progress in pollution abatement
      Episodic effects
      Non-degradation policy
      Research
      Legislation
      Public hearings
      User-oriented reports

Relatively few studies have been conducted on trends in the quality
of the nation's waters. Past water quality trend analyses have dealt
primarily with river water quality (Holman, 1971; EPA, 1974;
Lettenmaier, 1977), and only a few with lake trend analysis (Rockwell
et al., 1979; Chapra, 1980; Dobson, 1980). The lack of water quality
trend studies is due to a number of difficulties that interfere with
a truly adequate statistical analysis of time series data. The major
difficulties that affect the use of proper statistical procedures are
(Holman, 1971):
1. Short records of water quality data.
2. Techniques of observation and analysis have
changed over the years.

3. Changing location or frequency of observations.

4. Correlations that relate specific variables to
limnologic and hydrologic behavior are rarely
available.

5. Natural background often hides water quality
trends.

6. Explanation of trends requires a knowledge of
the economy and land use in the area.

Most measures of national growth suggest that demands on water
resources are increasing. It is these increasing demands for water
resources that usually result in deterioration of water quality. Thus,
even without proof, one might assume that water quality conditions are
getting worse. The problem is to precisely determine the relationships
between the pressures posed by society and the responses of water quality
parameters. The main interest for the management of a water resource is
not its present state, but how the state of the system has changed and
will change in the future. Thus, the use of trend detection techniques

to quantify the change in water quality parameters over time is critical.


By providing information on water quality trends, one can determine
whether a given water resource is improving, deteriorating or station-
ary under the current conditions. Therefore, management and policy
decisions can be developed, with a degree of certainty, based on the
changing system. For example, if a lake with excellent water quality
were exhibiting a declining trend, emphasis could be placed on an effort
to control the situation. Conversely, an improving trend in a lake with
poor water quality is evidence of good recent management. If planning
is to be based on sound information and the effectiveness of management
alternatives is to be judged, then observational tools (e.g., trend
detection methods) must be designed and implemented in water quality
management.

Trend Theory

 

Lake water quality parameters are subject to continual change
over time. The inputs to the lake, outputs from the lake, and the
lake itself are variable. Thus, lake water quality data arise from a
nonstationary process. To provide a representation of the changing
conditions over a period of time, a time sequence of measurements on
water quality parameters is needed. The data exist in a time series,
which preserves the order of occurrence. The measurements may be taken
at approximately regular intervals or values may be obtained as averages
over fixed periods. The measurements should be taken over a sufficiently
long period of time using similar methods. Usually most existing
water quality data are: 1) not measured over sufficient periods of
time, 2) measured at different times and time intervals, and 3) variable,
due to changes in sampling methods and laboratory techniques. Thus,
inherent problems exist in trend detection of water quality parameters.
In order for a trend to appear, the data from a time series must
display a non-random pattern. In a non-random series the observations
cannot be explained by purely random variation at a given level of
significance. Non-randomness can arise from: 1) presence of a trend,
2) cyclicality, and 3) serial correlation. A series exhibits a trend if
the values of its members show a tendency to increase or decrease. A
cyclic, or periodic, series occurs when the values rise and fall in a
regular fashion. Serial correlation occurs when a value is dependent
upon previous values.
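
As a rough illustration of the third source of non-randomness, the sketch below estimates a lag-one serial correlation coefficient for a series. The function name `lag1_serial_correlation` and the two sample series are assumptions made for illustration; they are not part of the method described in this thesis. Values near zero are consistent with independent observations, while values approaching +1 or -1 suggest that each value depends on the one before it.

```python
from statistics import fmean


def lag1_serial_correlation(series):
    """Estimate the lag-one serial correlation coefficient of a series."""
    n = len(series)
    if n < 3:
        raise ValueError("need at least three observations")
    mean = fmean(series)
    deviations = [x - mean for x in series]
    # Sum of products of each deviation with the deviation that follows it.
    numerator = sum(a * b for a, b in zip(deviations, deviations[1:]))
    # Sum of squared deviations over the whole series.
    denominator = sum(d * d for d in deviations)
    if denominator == 0:
        raise ValueError("series is constant")
    return numerator / denominator


# Hypothetical series: a steady rise and a strict alternation.
steadily_rising = [float(v) for v in range(1, 11)]
alternating = [1.0, -1.0] * 5

r_rising = lag1_serial_correlation(steadily_rising)
r_alternating = lag1_serial_correlation(alternating)
```

A steadily rising series gives a clearly positive coefficient, which shows why a trend and serial correlation can be hard to tell apart from this statistic alone.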

A time series can be considered to be made up of two parts:
1) a systematic or deterministic component changing over time in a
regular and predictable way, and 2) a random component superimposed
on the regular part (Sherwani and Moreau, 1975). The changes in the
deterministic part are those attributable to fundamental changes in the
nature of the process itself. The random component represents short-
term fluctuations due to transitory factors. Often, a trend may be
obscured by superimposed random variation. Also, it is difficult to
detect small trends in a short series.

A general time series equation is

$$x_t = \mu_t + e_t \tag{1}$$

where: $x_t$ = time series of data
       $\mu_t$ = the trend
       $e_t$ = random variation (noise term)

Two of the most common trends are step and linear trends
(Lettenmaier, 1977). A step trend is an instantaneous jump in the
mean level at some point in time, i.e.,

$$x_t = \mu_1 + [\mu_2 - \mu_1]_T + e_t \tag{2}$$

where: $\mu_1$ = true mean of first part of record
       $\mu_2$ = true mean of second part of record
       $[1]_T$ = function with value zero for $t \le T$
                 and 1 for $t > T$, where $T$ is the start of the
                 step trend (so $[\mu_2 - \mu_1]_T$ is zero up to
                 time $T$ and $\mu_2 - \mu_1$ afterwards)

A linear trend is simply a uniform increase in the mean level:

$$x_t = \mu + \frac{t}{T}\,\Delta\mu + e_t \tag{3}$$

where: $t$ = time length of each time unit
       $T$ = total time length of time series
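
To make the step and linear trend models concrete, the following sketch generates artificial series from equations (2) and (3). The function names, the parameter `change_point` (standing in for the step time T), the use of the record length n as the total time length, and all numerical values are assumptions chosen for illustration only; the noise is drawn as independent normal values with mean zero.

```python
import random


def step_trend_series(n, mu1, mu2, change_point, sigma, rng):
    """Equation (2): mean mu1 up to change_point, mu2 afterwards, plus noise."""
    return [
        mu1 + (mu2 - mu1) * (1.0 if t > change_point else 0.0) + rng.gauss(0.0, sigma)
        for t in range(1, n + 1)
    ]


def linear_trend_series(n, mu, delta_mu, sigma, rng):
    """Equation (3): the mean rises uniformly by delta_mu over the n time units."""
    return [mu + (t / n) * delta_mu + rng.gauss(0.0, sigma) for t in range(1, n + 1)]


rng = random.Random(1981)  # fixed seed so the sketch is repeatable
step = step_trend_series(n=20, mu1=10.0, mu2=12.0, change_point=10, sigma=0.5, rng=rng)
linear = linear_trend_series(n=20, mu=10.0, delta_mu=2.0, sigma=0.5, rng=rng)

# With sigma = 0 the deterministic part of each model is visible directly.
noiseless_step = step_trend_series(6, 1.0, 3.0, 3, 0.0, random.Random(0))
noiseless_linear = linear_trend_series(4, 0.0, 2.0, 0.0, random.Random(0))
```

Increasing `sigma` relative to the trend magnitude shows how quickly a trend becomes obscured by random variation, especially in a short series.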

For large sample sizes, or when the variance of $e_t$ is known, the
power of the classical two-tailed t-test (Breiman, 1973) against step
and linear trends, where the noise terms $e_t$ are statistically
independent and normally distributed with mean zero and variance
$\sigma_e^2$, is

$$1 - \beta = F(N_T - w_{1-\alpha/2}) \tag{4}$$

where: $F$ = cumulative distribution function of a standard
             normal probability distribution
       $w_{1-\alpha/2}$ = standard normal quantile at probability
             level $1 - \alpha/2$

$$N_T = \frac{T_r \sqrt{n}}{2\sigma_e} \quad \text{(for a step trend)} \tag{5}$$

$$N_T = \frac{T_r\,[n(n+1)(n-1)]^{1/2}}{\sqrt{12}\; n\, \sigma_e} \approx \frac{T_r \sqrt{n}}{\sqrt{12}\,\sigma_e} \quad \text{(for a linear trend)} \tag{6}$$

$$T_r = \text{trend magnitude} = \begin{cases} |\mu_1 - \mu_2| & \text{for step trend} \\ \Delta\mu & \text{for linear trend} \end{cases} \tag{7}$$

Trend equations can be expanded to include seasonal movement or modi-
fied to a multiplicative system (log or polynomial) instead of the
additive system, as in the step and linear trend (consult Lettenmaier
(1977) and Sherwani and Moreau (1975) for more information).
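
The power expressions in equations (4) through (6) can be evaluated numerically, as in the sketch below. It is only an illustration under stated assumptions: n is taken to be the number of observations (the text does not define it), the function names `trend_number` and `trend_test_power` are hypothetical, and the standard normal distribution is supplied by Python's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def trend_number(trend_magnitude, n, sigma_e, kind="step"):
    """N_T from equation (5) for a step trend or equation (6) for a linear trend."""
    if n < 2 or sigma_e <= 0:
        raise ValueError("need n >= 2 and sigma_e > 0")
    if kind == "step":
        return trend_magnitude * sqrt(n) / (2.0 * sigma_e)
    if kind == "linear":
        return trend_magnitude * sqrt(n * (n + 1) * (n - 1)) / (sqrt(12.0) * n * sigma_e)
    raise ValueError("kind must be 'step' or 'linear'")


def trend_test_power(trend_magnitude, n, sigma_e, alpha=0.05, kind="step"):
    """Power 1 - beta from equation (4)."""
    quantile = _STANDARD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return _STANDARD_NORMAL.cdf(trend_number(trend_magnitude, n, sigma_e, kind) - quantile)


# Hypothetical case: trend magnitude equal to one noise standard deviation.
power_step_20 = trend_test_power(1.0, 20, 1.0, kind="step")
power_step_40 = trend_test_power(1.0, 40, 1.0, kind="step")
power_linear_20 = trend_test_power(1.0, 20, 1.0, kind="linear")
```

For a fixed trend magnitude and noise level, the power grows with the length of record, and a step trend is easier to detect than a linear trend of the same magnitude because its trend number is larger.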

A step trend is a permanent increase or decrease in the value of
the water quality parameter. This may, for example, result from a
permanent change in land use or the construction of a waste treatment
plant. A linear trend is a steady upward (or downward) movement in a
water quality parameter. This may, for example, result from a change
in agricultural or urban runoff (i.e., nutrient loads).

The classical hypothesis testing framework may be used to deter-
mine the existence or nonexistence of trends in water quality param-
eters. The null hypothesis, H0, is that no trend exists, while the
alternative hypothesis, H1, is that a trend does exist. The choice
between H0 and H1 is made on the basis of a test statistic, computed
from the data. The test statistic is compared to a probability
outcome distribution, and H0 is either accepted or rejected at a given
level of confidence.
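
The testing framework above can be sketched for a suspected step trend as follows. This is a simplified illustration, not the procedure developed later in the thesis: it assumes independent observations, a known change point that splits the record into "before" and "after" groups, and samples large enough that a standard normal distribution can stand in for the t distribution. The function name and the data values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean, variance


def step_trend_z_test(before, after, alpha=0.05):
    """Large-sample two-tailed test of H0: no change in mean level.

    Returns the test statistic and True when H0 is rejected.
    """
    if len(before) < 2 or len(after) < 2:
        raise ValueError("each group needs at least two observations")
    # Standard error of the difference between the two sample means.
    standard_error = sqrt(variance(before) / len(before) + variance(after) / len(after))
    z = (fmean(after) - fmean(before)) / standard_error
    critical_value = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z, abs(z) > critical_value


# Hypothetical concentrations before and after a management action.
before = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95] * 5
after = [2.0, 2.1, 1.9, 2.0, 2.05, 1.95] * 5

z_shift, reject_shift = step_trend_z_test(before, after)
z_same, reject_same = step_trend_z_test(before, before)
```

Here H0 (no trend) is rejected for the shifted record and retained when the two groups are identical; with short records the t distribution, not the normal approximation, would be the appropriate reference.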

CHAPTER II
WATER QUALITY

The quality of water is affected by many factors, some of which
are 1) nutrients, 2) organic material, 3) toxic chemicals, 4) dissolved
and suspended solids, 5) dissolved oxygen, and 6) pH. The number of
factors required to specify water quality is limitless, changing both
the physical, chemical and biological environment and the socio-economic
activities within the water system. Any consideration of water quality
should also include water quantity, because quality and quantity of
water are interrelated. A knowledge of ambient water quality is
required to: 1) obtain reliable and accurate information on the
current status of water quality required for planning of water
resources, 2) provide reliable data to assess long-term trends and
overall changes in water quality, 3) determine the degree to which water
quality is improved as a result of pollution abatement measures,
4) indicate problem areas requiring corrective actions, and 5) determine
the extent of compliance and non-compliance with water quality standards
(Sherwani and Moreau, 1975).

Water Quality Parameters

Water quality management seeks to ensure levels of pollutants in
surface waters which will not interfere with desired uses. Thus, water
quality depends directly on the use of the water. There are numerous
physical, chemical, and biological parameters which are significant in
determining water quality. Figure 3 lists some parameters and the role
they play for various water uses.

Water quality parameters may be general or specific purpose
parameters. The general purpose parameters provide basic information about
water quality. Some general purpose parameters are: 1) dissolved
oxygen, 2) pH, 3) temperature, and 4) conductivity. Specific purpose
parameters relate to definite processes, activities, or pollution sources
affecting water quality. Examples of specific purpose parameters are:
1) biological oxygen demand (BOD), 2) fecal coliforms, and 3) toxic
chemicals.

There are numerous parameters that affect water quality and
discussion of all of them is not possible here. Therefore, only the
major parameters are discussed below. For more information on these
parameters and others, consult: 1) EPA (1976), 2) Wetzel (1975),
3) Hutchinson (1957), and 4) McNeely (1979).

Figure 3. Important water quality parameters and related uses
(modified from Sherwani and Moreau (1975))

    Parameter                    Use

     1. Temperature              Aquatic life, industrial use, assimilative
                                 capacity, recreation
     2. Turbidity                Drinking water, recreation, industrial use
     3. pH                       Industrial use, aquatic life, recreation
     4. Dissolved Oxygen         Aquatic life, aesthetics, industrial use,
                                 assimilative capacity
     5. BOD                      Food and beverage industries, recreation,
                                 assimilative capacity
     6. Suspended Solids         Aesthetics, photosynthesis, reservoir
                                 capacity depletion, hydroelectric power
                                 generation, navigation
     7. Total Dissolved Solids   Irrigation, water supply, industrial use
     8. Coliform                 Direct-contact water-based recreation,
                                 water supply, food and beverage
                                 industries, irrigation
     9. Nutrients                Eutrophication, aesthetic degradation,
                                 secondary effects on aquatic life
    10. Organics                 Water supply, industrial use, aquatic life
    11. Heavy Metals             Water supply, aquatic life
    12. Radioactivity            Water supply
    13. Oil                      Recreation, industrial use
    14. Color                    Aesthetics, water supply, recreation,
                                 industrial use
    15. Conductivity             General parameter of water quality
    16. Chlorophyll              Biological activity

Alkalinity refers to the quantity and types of compounds, mainly
inorganic carbon, which collectively shift the pH to the alkaline side
of neutrality. The forms of inorganic carbon in fresh waters are:
1) free carbon dioxide (CO2), 2) carbonic acid (H2CO3), 3) bicarbonate
(HCO3-), and 4) carbonate (CO3=). These four species of
inorganic carbon are in equilibrium with one another, based primarily
on pH. As the pH increases, the dominant species moves from free carbon
dioxide to carbonate ions. Alkalinity is a measure of the buffering
capacity of water. Since pH has a direct effect on organisms and an
indirect effect on the toxicity of certain pollutants in water, the
buffering capacity of water is very important to water quality. Long
term trends in alkalinity are extremely important in those areas
receiving acid rain. Acid rain tends to lower the natural pH of waters
with low buffering capacity. Thus, monitoring of alkalinity changes
may serve as an indicator of deteriorating water quality due to acid
rain. Another important parameter for trend analysis in water quality
is the hypolimnetic inorganic carbon accumulation. The accumulation of
inorganic carbon in the hypolimnion of a lake can be used to estimate
indirectly the organic production in a lake's epilimnion and metalimnion
(for more information consult Wetzel, 1975).
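
The shift in the dominant inorganic carbon species with pH, described above, can be illustrated with a simple equilibrium calculation. The sketch below is an assumption-laden illustration and not part of the original text: it lumps free carbon dioxide and carbonic acid into one "CO2" pool, uses commonly quoted dissociation constants for dilute fresh water near 25 °C (pK1 = 6.35, pK2 = 10.33, supplied here as assumed defaults), and ignores temperature and ionic strength effects. The function name is hypothetical.

```python
def carbonate_fractions(ph, pk1=6.35, pk2=10.33):
    """Fractions of total inorganic carbon present as CO2, HCO3- and CO3=.

    pk1 and pk2 are assumed dissociation constants, not values from the text.
    """
    h = 10.0 ** (-ph)    # hydrogen ion concentration
    k1 = 10.0 ** (-pk1)  # first dissociation constant
    k2 = 10.0 ** (-pk2)  # second dissociation constant
    denominator = h * h + k1 * h + k1 * k2
    return {
        "CO2": h * h / denominator,
        "HCO3-": k1 * h / denominator,
        "CO3=": k1 * k2 / denominator,
    }


def dominant_species(ph):
    """Name of the species with the largest fraction at the given pH."""
    fractions = carbonate_fractions(ph)
    return max(fractions, key=fractions.get)


acidic = dominant_species(4.0)
near_neutral = dominant_species(8.0)
strongly_alkaline = dominant_species(12.0)
```

Under these assumed constants, free carbon dioxide dominates in acidic water, bicarbonate dominates over the pH range of most natural lakes, and carbonate dominates only in strongly alkaline water.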

The use of biological organisms for evaluating water quality is
an old concept that only recently has received attention. The response
of sensitive biological indicator organisms to an environmental stress
provides an early indicator of changing water quality. Biological
organisms: 1) are uniquely sensitive to multiple environmental stresses
operating consecutively or simultaneously, and 2) integrate the effects
of environmental stresses over time. Thus, the ability to selectively
accumulate, biomagnify, and show the synergistic effects of exposure
to environmental stresses gives bioindicators useful properties for
determining trends in water quality. Figure 4 lists the major
communities of biological organisms and the parameters used to describe them
for water quality evaluation. One problem with the use of biological
organisms is the difficulty of obtaining quantitative counts and
levels for the organisms. The use of biological organisms for detection
of trends in water quality has tremendous possibilities, especially
when dealing with water systems affected by numerous stresses. For
more information on biological organisms and monitoring, consult: 1) Worf
(1980), and 2) Cairns et al. (1977).

Figure 4. Major communities of biological organisms used in
biological monitoring (Weber, 1980)

    Community            Parameter

    Plankton             Counts and identification
                         Chlorophyll a
                         Biomass as ashfree weight

    Periphyton           Counts and identification
                         Chlorophyll a
                         Biomass as ashfree weight

    Macrophyton          Areal coverage
                         Identification
                         Biomass as ashfree weight

    Macroinvertebrate    Counts and identification
                         Biomass as ashfree weight
                         Flesh tainting
                         Toxic substances in tissue

    Fish                 Toxic substances in tissue
                         Counts and identification
                         Biomass as wet weight
                         Condition factor
                         Flesh tainting
                         Age and growth

Organic carbon is of two forms, particulate or dissolved, and
results from outside of the lake (allochthonous) or from within the lake
(autochthonous). The dissolved to particulate organic carbon ratio
approximates 6:1 to 10:1 in most natural water bodies (Wetzel and Rich,
1973). The biochemical transformations of particulate and dissolved
organic matter by microbial metabolism are fundamental to the dynamics of
nutrient cycling and energy flux within aquatic ecosystems. Thus,
trends in organic carbon may provide valuable information on possible
changes in water quality.

Iron is an essential trace element required by both plants and
animals. In some marl lakes where iron is precipitated by the highly
alkaline conditions, iron may be the limiting factor for algal growth.
Iron is also a vital oxygen transport element in the blood of all
vertebrate and some invertebrate animals. Iron exists in solution in
water as either the ferrous (Fe++) or ferric (Fe+++) state. Amounts
of iron in solution in natural water, and rate of oxidation of Fe++ to
Fe+++, as occurs in oxygenated waters, are dependent primarily on:
1) pH, 2) Eh (redox potential), and 3) temperature. As the hypolimnion
of a lake goes anaerobic, ferric iron is reduced to ferrous iron. This
reduction in the state of iron causes a release of iron-bound phosphorus
from the sediments, because ferrous iron is quite soluble. Thus, the
changing concentrations of both forms of iron over time are very important
in lakes. For more information, consult: 1) Hutchinson (1957), and
2) Stumm and Morgan (1970).

Microbiological organisms (bacteria, viruses, etc.) have been used
to determine the safety of water for drinking, swimming, and shellfish
harvesting, for it is well known that water may serve as a medium for
the transfer of disease (EPA, 1976). The exact relationship between
numbers of specific disease-causing organisms in water and the potential
for transmission is unknown. However, the numbers and biomass of
bacteria tend to increase with increasing concentrations of inorganic
and organic compounds in lakes (i.e., change from oligotrophic to
eutrophic lake). The seasonal distribution of bacterial populations
is highly variable between lakes and within a lake between years. In
some cases, bacterial populations are correlated with numbers of
phytoplanktonic algae (Wetzel, 1975). Bacteria of the coliform group are
usually considered the primary indicators of fecal contamination from
warm-blooded animals.

Nitrogen occurs in fresh water in numerous forms: 1) ammonium
(NH4+), 2) nitrite (NO2-), 3) nitrate (NO3-), 4) dissolved molecular
nitrogen (N2), and 5) a large number of particulate and dissolved
organic compounds (Wetzel, 1975). Ammonia is a pungent, gaseous,
alkaline compound of nitrogen and hydrogen and is highly soluble in
water. Ammonia is generated primarily from decomposition of organic
matter by heterotrophic bacteria and as an excretory product of
animals. It is present
primarily as NH4+ and as undissociated NH4OH (NH3 is sometimes used),
the latter being highly toxic (Trussell, 1972). The proportions of
NH3 to NH4OH are dependent on 1) pH, 2) temperature, and 3) ionic
strength (salinity). Ammonia usually increases as lake productivity
increases or when the hypolimnion is anaerobic. Nitrite (NO2-) levels
in natural lake waters are usually very low, while nitrate (NO3-) is
much more common. Nitrogen may often be the limiting factor in algal
productivity, especially when large amounts of phosphorus have been
added to the lake by man's activities. Vollenweider (1968) found a
direct correlation, with some exceptions, between productivity of algae
and average concentrations of nitrogen. Since nitrogen is a dominant
factor in lake systems, the analysis of trends in nitrogen
concentrations is extremely useful in the examination of water quality.

Oxygen is a fundamental parameter in lakes and is a major parameter
for evaluating water quality. Dissolved oxygen has a direct effect on
the maintenance of aquatic life. Insufficient dissolved oxygen in the
water causes: 1) decrease in numbers and kinds of aquatic life, 2) de-
composition of organic materials, and 3) release of nutrients from the
sediments and formation of anaerobic gases (e.g., hydrogen sulfide and
methane). Numerous criteria for dissolved oxygen have been established
for most forms of aquatic life (EPA, 1976). While mean oxygen concentra-
tions are mainly used for water quality, oxygen deficits may also pro-
vide valuable information. The oxygen deficit, in lakes, is the differ-
ence in the amount of oxygen present at the beginning and at the end of
stratification below a given depth. The oxygen deficit reflects the
amount of organic matter synthesized by measuring the rate of oxygen
utilization, and thus provides an indirect estimate of lake productivity.
The use of oxygen as a parameter for water quality trend analysis is
extremely effective.

pH is a measure of the hydrogen ion activity in water and results
from the dissociation of water to H+ and OH- ions. The pH usually is
defined as the logarithm of the reciprocal of the concentration of free
hydrogen ions (i.e., pH = log (1/[H+])). In truth, pH measures the
activity of the hydrogen ion and not the concentration. The pH of
natural waters is governed mainly by the carbonate system (discussed in
inorganic carbon). pH affects the dissociation of weak acids and bases
of many toxic compounds. The solubility of metal compounds in bottom
sediments or suspended material also is affected by pH. As with
alkalinity and inorganic carbon, pH may provide valuable information on
water quality in areas receiving acid rain.

Phosphorus is one of the major nutrients required for algal and
macrophyte nutrition and is often the limiting factor for productivity
in lakes. When phosphorus is limiting, increased supplies of phosphorus
have led to a condition of accelerated eutrophication or aging of waters.
The majority of phosphorus in lake water is bound organically in organic
phosphates and cellular constituents (Wetzel, 1975). The only
significant form of inorganic phosphorus in natural waters is
orthophosphate (PO4=, H2PO4-). Phosphorus is often stored, consolidated
in lake sediments (i.e., phosphorus sink), with some being released
under anaerobic
conditions. The amount of phosphorus, and other nutrients, retained by
a lake is a function of: 1) the phosphorus loading to the lake, 2) the
volume of the euphotic zone (zone that receives light), 3) the extent of
biological activities, 4) the lake detention time, and 5) the level of
discharge (outflow) from the lake (EPA, 1976). Phosphorus is probably
the most studied of all lake parameters, and only a small fraction is
presented here. For more information on phosphorus in natural waters
consult: 1) Hutchinson (1957), 2) Wetzel (1975), and 3) Hynes (1970).

Physical parameters describe the physical characteristics of lakes.
Some physical parameters are: 1) temperature, 2) inflow and outflow,
3) lake volume, 4) lake and shoreline area, 5) mean depth, 6) hydraulic
detention time, and 7) sediment characteristics. Changes in the physical
state of a lake may have definite effects on lake water quality. For
example, reduction in lake volume may decrease the hypolimnion of the
lake. Thus, an oxygen depletion may occur due to the small volume of
water (and oxygen) in the hypolimnion. The usefulness of physical
parameters is increased when dealing with reservoirs, where physical
characteristics are easily controlled. For more information on physical
parameters consult: 1) Hutchinson (1957) and 2) Wetzel (1975).

The salinity of inland waters is made up primarily of the major
anions, bicarbonate (HCO3-), carbonate (CO3=), sulfate (SO4=), and
chloride (Cl-), and the major cations, calcium (Ca++), magnesium (Mg++),
sodium (Na+), and potassium (K+). The proportions of major ions in
natural waters tend towards Ca > Mg ≥ K and CO3 > SO4 > Cl, with Na and
Cl having larger concentrations in soft waters (Wetzel, 1975).
Magnesium, sodium, potassium, and chloride are relatively conservative
ions and undergo minor spatial and temporal changes within a lake.
Calcium, inorganic carbon, and sulfate are non-conservative (dynamic)
ions and their concentrations are strongly influenced by biotic
activities. Soft and hard waters refer to conditions of low and high
salinity levels, respectively, in natural waters. The ratios of
monovalent:divalent cations and the proportions of cations influence the
metabolism of many organisms. Therefore, salinity can indirectly affect
seasonal population succession and productivity of certain algae and
macrophytes.

Silica (SiO2) is a major component in algal production, especially
for diatom algae which utilize silica for cell wall formation. Seasonal
population dynamics of diatoms can greatly influence silica concentra-
tions in natural waters (Lund, 1949; 1950). In a lake dominated by
diatoms, sedimenting diatom frustules can accumulate within the sedi-
ments and be lost permanently to the system. Thus, silica concentrations
will exhibit decreasing trends and affect diatom populations. The
concentration of silica is very important for water quality trends in
lakes where diatom algae are present (e.g., Lake Michigan).

Other water quality parameters, and references for information on
each of the parameters, are: 1) toxic compounds (EPA, 1976),
2) micronutrients (boron, calcium, cobalt, molybdenum, zinc, etc.)
(Wetzel, 1975), 3) dissolved and suspended solids (Hutchinson, 1957), and
4) organic matter (Hutchinson, 1957). The water quality parameters
discussed above are far from all-inclusive, but are usually the major
parameters that are used for evaluating water quality and for which
sufficient data have been collected to allow good trend analysis.

Data Selection

 

The process of selecting data for use in water quality trend
analysis consists of four phases:

1. Determine desired output
2. Determine availability of data
3. Choose data to fit needs
4. Place data in desired form

The first three are discussed in this section, while the last, data
preparation, is discussed in Chapter 4.

The first step in data selection is to determine the desired output,
or the information that is needed. When dealing with water quality
trends, for example, the desired output may be the changes in the overall
quality of water or changes in a specific parameter. Thus, the desired
output may determine the exact parameter to use. For example, if the
management concern is whether a lake is maintaining its ability to
sustain trout populations, trends in temperature and dissolved oxygen,
the two critical factors for trout, should be analyzed. Often, the
desired output may be achieved by analyzing one of a number of
parameters. For example, when trends in overall water quality are the
desired output, phosphorus, nitrogen, oxygen, or other parameters may
be examined. Also, when overall water quality is of interest, the
analysis of trends in trophic state indicators may provide more
information than does any single parameter. Some trophic state
indicators are given by: 1) Carlson (1977), 2) Walker (1979), and
3) Reckhow (1980), who provides a review of trophic state indicators.

Once the desired output has been defined, the data that are
available must be determined. The major concern with data availability
is which parameters have been sampled. Water quality data have been
collected by countless individuals for numerous reasons. Ideally, one
would like to use one's own data. However, when dealing with water
quality trend analysis, especially annual trends, one will often need to
use data from other sources. There are numerous water quality data
banks, for example: 1) STORET (EPA), 2) WATSTORE (USGS), 3) NAWDEX
(USGS), and 4) several state data systems (see Edwards (1980) for a
review of water quality data systems). An important issue when dealing
with different or multiple data sources is whether the data are mutually
compatible. Similar sampling designs, sampling devices, laboratory
techniques, and methods are prerequisites for combining data. A major
concern in time series analysis is the need for evenly spaced data over
time, without missing values. Water quality data usually are unevenly
spaced in time, with missing values. Thus, the combination of data from
different sources may be necessary. Usually, the more data one has,
the more information one can obtain. However, extreme care must be
taken when combining data sets from various sources.

Based on the desired output and the availability of data, data
should be chosen for trend detection analysis. Ideally, one has a
definite parameter in mind based on the desired output and excellent
data on this parameter collected over time. In reality, the choice
of a parameter(s), and the choice of data on this parameter to use, are
highly subjective. It may be necessary to use the parameter desired,
with low sample size, or use another parameter (related to that desired,
limnologically or by correlation) with a larger sample size. The choice
may be eased by analyzing a few parameters and using all of
the results in the management and planning of water resources. In
closing, the selection of an appropriate parameter(s) and associated
data is extremely important in the application of trend detection
analysis, for the information produced from the analysis can only be
applied on the basis of the data used.

CHAPTER III
STATISTICS

Introduction

 

In order to apply trend detection techniques, a general knowledge
of statistics is necessary. The key statistical topics relevant to the
application of time series analysis are descriptive statistics, proba-
bility distributions, and hypothesis testing. The information in this
chapter provides the basic theory of these statistical concepts in an
informal manner. The first section presents the statistical aspects
of descriptive statistics. The application (when and where to use
descriptive statistics) is discussed in the beginning of Chapter 4.
The second section on probability distributions is a theoretical
presentation which is needed to understand hypothesis testing. Finally,
hypothesis testing is discussed and provides the basis of statistical
trend detection techniques.

This chapter is not meant to serve as a substitute for a statis-
tical textbook, but is intended to present only the necessary areas of
statistics needed to properly apply and analyze the trend detection
techniques presented in Chapter 4. The following books are suggested
for a more complete treatment of these subjects: 1) classical
statistics (Sokal and Rohlf, 1969; Bhattacharyya and Johnson, 1977;
Neter and Wasserman, 1974), 2) nonparametric statistics (Conover, 1971;
Siegel, 1956; Hollander and Wolfe, 1973), and 3) data analysis and
regression (Mosteller and Tukey, 1977; Reckhow and Chapra, 1980; Tukey,
1977; Reckhow, 1980; Chatterjee and Price, 1977).

Descriptive Statistics

In order to apply statistical methods, a concise description of the
data is necessary. This can be achieved by performing arithmetic opera-
tions on the data to obtain values for one or more descriptive measures
or statistics.

There are three categories of statistics used to describe random
variables in a particular population: 1) measures of central tendency
(location of an ordinary value), 2) measures of dispersion (relative
distance of extreme values from a central value), and 3) measures of
relationship between variables (degree of similarity or dissimilarity
in magnitude).

Measures of Central Tendency

In most sets of data there is a tendency for the observed values
to group themselves about some central value. This central value is
characteristic of the data and may be used to describe the central ten-
dency of the data's distribution. The statistics that describe this
phenomenon are measures of location or central tendency. Common mea-
sures of location include the arithmetic mean, median, and mode.

The average, or arithmetic mean, is the most frequently used of all
statistical measures. The population arithmetic mean (or simply mean)
of a particular random variable X in a population is usually denoted
by the Greek letter $\mu$ or $\mu_X$. The estimate of $\mu_X$ for
samples of size n is:

$$ \hat{\mu}_X = \bar{X} = \sum_{i=1}^{n} X_i / n \tag{8} $$

where: $\hat{\mu}_X = \bar{X}$ = estimate of population mean (sample mean)
       $X_i$ = ith observation
       n = sample size (i.e., number of X's).

The mean is calculated by summing all the individual observations
(X1, X2, X3, ..., Xn) of a sample and dividing this sum by the number of
items (n) in the sample. If, for example, three total phosphorus
concentration measurements of 1, 2, and 6 mg/l were taken from a lake,
their sample mean concentration is:

$$ \bar{X} = \frac{1 + 2 + 6}{3} = \frac{9}{3} = 3 \text{ mg/l} $$

The median is the value of a random variable that ranks midway
between the largest and smallest values. It can also be defined by the
value of the variable (in an ordered array) that has an equal number of
items on either side of it, (i.e., divides the frequency distribution in
half).

If n is an odd integer, the sample median is the [(n + 1)/2]th number
in the ordered array. For example, consider this array of oxygen
concentrations: 1, 1, 2, 3, 7, 8, 11, 12, 14, 19, 20 mg/l. It has 11
observations and the (11 + 1)/2, or sixth, number is equal to 8. Hence,
the median oxygen concentration of this set of values is 8 mg/l. If n
is even, the median is calculated as the midpoint between the (n/2)th
and the [(n/2) + 1]th variate. Thus, from a sample of 6 oxygen
concentrations, 5, 8, 10, 11, 12, 13 mg/l, the median would be the
midpoint between the 6/2 = 3rd and the [(6/2) + 1] = 4th value. For this
example, the median is 10.5 mg/l. The median is especially useful when
the random variates exhibit skewed (asymmetric) distributions (Mosteller
and Tukey, 1977), for example, when dealing with lake phosphorus
concentrations in a eutrophic lake.

The mode refers to the value occupied by the greatest number of
individuals in a frequency distribution. When applied to a frequency
distribution it is the value of the variable where the probability
density function peaks. Given a set of nitrate concentrations of 1,
2, 3, 3, 3, 4, 4, 5 mg/l, the mode would be equal to 3 mg/l.
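
The three examples above can be checked with a short sketch. It uses
Python's standard `statistics` module; the helper name `sample_median`
and the variable names are illustrative assumptions, not part of the
original method. The helper spells out the odd and even cases exactly as
they are described in the text.

```python
from statistics import mean, median, mode

# Example values taken from the text (all in mg/l).
phosphorus = [1, 2, 6]
oxygen_odd = [1, 1, 2, 3, 7, 8, 11, 12, 14, 19, 20]
oxygen_even = [5, 8, 10, 11, 12, 13]
nitrate = [1, 2, 3, 3, 3, 4, 4, 5]


def sample_median(values):
    """Median by the rule in the text (hypothetical helper name)."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: the [(n + 1)/2]th value of the ordered array (1-based).
        return ordered[(n + 1) // 2 - 1]
    # Even n: midpoint between the (n/2)th and the [(n/2) + 1]th values.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


print("mean phosphorus:", mean(phosphorus))
print("median oxygen (n = 11):", sample_median(oxygen_odd))
print("median oxygen (n = 6):", sample_median(oxygen_even))
print("mode nitrate:", mode(nitrate))
```

The library function `median` gives the same results as the hand rule
for both oxygen arrays.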

The midrange is the average of the largest and smallest value.
The midrange provides a quick and easy measure of location but is
subject to extreme variation from sample to sample unless the samples
are quite large.

The geometric mean is the antilog of the arithmetic mean of the
logarithms of a set of values, and it is always as small or smaller
than the arithmetic mean of the same set of values. The geometric mean
is computed as:

$$ \mathrm{G.M.}_X = \operatorname{antilog} \left[ \frac{1}{n} \sum_{i=1}^{n} \log X_i \right] \tag{9} $$

The harmonic mean is the reciprocal of the arithmetic mean of the
reciprocals of a set of values and is always as small or smaller than
the geometric mean of the same set of values. The harmonic mean is
computed as:

$$ \frac{1}{\mathrm{H.M.}_X} = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{X_i} \tag{10} $$
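
As a minimal sketch of the midrange, geometric mean (Equation 9), and
harmonic mean (Equation 10), the following applies each definition to
the three phosphorus values used earlier (1, 2, and 6 mg/l). The
function names are illustrative assumptions, and base-10 logarithms are
assumed for the antilog. The last line checks the ordering stated in the
text: harmonic mean <= geometric mean <= arithmetic mean.

```python
import math


def midrange(values):
    """Average of the largest and smallest value."""
    return (max(values) + min(values)) / 2


def geometric_mean(values):
    """Antilog of the arithmetic mean of the logarithms (positive values only)."""
    mean_log = sum(math.log10(x) for x in values) / len(values)
    return 10 ** mean_log


def harmonic_mean(values):
    """Reciprocal of the arithmetic mean of the reciprocals."""
    return len(values) / sum(1 / x for x in values)


concentrations = [1, 2, 6]  # mg/l, from the phosphorus example in the text
arithmetic = sum(concentrations) / len(concentrations)
geometric = geometric_mean(concentrations)
harmonic = harmonic_mean(concentrations)

print("midrange:", midrange(concentrations))
print("arithmetic, geometric, harmonic:", arithmetic, geometric, harmonic)
print("ordering holds:", harmonic <= geometric <= arithmetic)
```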

In making a decision as to which measure of location should be used
with a given set of data, a primary consideration is the intended use
of the measure once selected. In addition, the advantages and
disadvantages inherent in each of the measures of location should be
known. If the distribution of the data is symmetrical and unimodal, the
mean, median, and mode are identical, but as the distribution becomes
skewed, differences among these measures will occur. This is
illustrated in Figure 5.

In addition to the intuitive appeal for the use of the mean, the
mean also has smaller variability from sample to sample, is easier to
work with mathematically, and has more desirable properties in connection
with probability distributions than other measures. However, the mean
is sensitive to extreme values, particularly when n is small.

In general, the mean is recommended when the distribution is
normal or uniform. Robust statistics (e.g., median) are preferred when
the data distribution is skewed or irregularly shaped or when
insufficient information is available, i.e., n is small (Mosteller and
Tukey, 1977; Tukey, 1977). If it is questionable whether the
distribution is normal, it is desirable to use more than one measure of
location to provide more information.

Figure 5. Effect of distribution shape on mean, median, and mode.

Measures of Dispersion

In most sets of statistical data the numerical values will not be
identical, but will be scattered or dispersed to some degree. The
statistics used to measure this characteristic of the data are measures of
variation or dispersion. Several sets of data could have the same, or
nearly the same, mean, median, or mode, but vary considerably in the
level of dispersion around a central value. Thus, a more complete
description of the data results when we evaluate one of the measures of
variation in addition to one or more of the measures of location.

The range is the difference between the largest and smallest values
in the data. Using data set A.1 (found in Appendix A), the range is 7.1.
Although the range is easy to calculate and is commonly used as a rough-
and-ready measure of variability, it is generally not a satisfactory
measure of variation for three reasons. First, the calculation involves
only two of the observations, regardless of sample size. Therefore, it
utilizes only a fraction of the available information concerning varia-
tion in the data. Secondly, since the range tends to become larger as
sample size increases, it is improper to compare ranges from two sets of
data with different sample sizes. Finally, the range is very unstable
except in small sample sizes. With repeated samples taken from the same
source, the ranges will exhibit more variation from sample to sample
than will other measures of variation. However, the use of the range
differs from the other measures in that it provides a relatively good
measure of variation for small numbers of observations.

Among measures of variation, the standard deviation and its square,
the variance, are almost universally accepted as the most useful
dispersion statistics. The standard deviation for a particular random
variable X in a population usually is denoted by the lower case Greek
letter sigma ($\sigma$), and the variance by $\sigma^2$ or $\sigma_X^2$.
The estimate of $\sigma^2$ from a particular sample of size n is:

$$ \hat{\sigma}^2 = s^2 = \left[ \sum_{i=1}^{n} X_i^2 - \left( \sum_{i=1}^{n} X_i \right)^2 / n \right] / (n - 1) \tag{11} $$

Using data set A.1, estimates of the variance and standard
deviation of the data are as follows:

$$
\begin{aligned}
s^2 &= [(20.8^2 + \dots + 15.9^2) - (20.8 + 20.9 + \dots + 15.9)^2/11] / 10 \\
    &= [4727.37 - (226.9)^2/11] / 10 \\
    &= (4727.37 - 4680.32) / 10 \\
s^2 &= 4.70 \\
s &= \sqrt{4.70} \\
s &= 2.17
\end{aligned}
$$
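
The computational form of Equation 11 can be sketched as follows. Data
set A.1 is not reproduced in this section, so the five values below are
hypothetical and serve only to show the arithmetic; the function name
`sample_variance` is likewise an illustrative assumption. The result is
compared with `statistics.variance`, which uses the same n - 1 divisor.

```python
import math
from statistics import variance


def sample_variance(values):
    """Equation 11: [sum(X^2) - (sum X)^2 / n] / (n - 1)."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


# Hypothetical observations (not data set A.1).
observations = [4.0, 7.0, 6.0, 3.0, 10.0]

s2 = sample_variance(observations)  # variance estimate
s = math.sqrt(s2)                   # standard deviation, same units as X

print("variance:", s2)
print("standard deviation:", s)
print("library variance:", variance(observations))
```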

Strictly speaking, only one parameter, $\sigma$ or $\sigma^2$, is needed
to describe the dispersion in X. However, the squared form is much
easier to work with mathematically (variances ascribed to independent
causal agents are additive) while the unsquared form has the advantage
of being expressed in the same units as X, the variable measured. Thus,
both are usually calculated and used.

When the data are presented in an ordered array, the interquartile
range is the difference between the value at the 75 percent level and
the value at the 25 percent level. The interquartile range provides a
description of the dispersion in the central half of the distribution.
Since the interquartile range, like the median, is based on order statis-
tics, it is robust in situations with extreme data (i.e., outliers) and
skewed distributions. These percent levels can be altered to accommo-
date more or less of the variable as desired. The interquartile range
for data set A.1 is 3.55.

The mean or median absolute deviation is computed by:

$$ \mathrm{A.D.} = \frac{\sum_{i=1}^{n} |X_i - \bar{X}|}{n} \tag{12} $$

where: A.D. = mean or median absolute deviation
       $\bar{X}$ = mean or median

The value of $\bar{X}$ is either the mean or median depending on whether
the mean or median absolute deviation is desired. The choice between
these two is equivalent to the choice between the mean and median. The
mean and median absolute deviations for data set A.1 are 1.42 and 1.38,
respectively.
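
A brief sketch of the order-based and absolute-deviation measures (range,
interquartile range, and Equation 12) is given below. The nine values
are hypothetical, not data set A.1. The text does not state which
percentile interpolation rule was used, so the "inclusive" method of
`statistics.quantiles` is an assumption; other rules can give slightly
different 25 and 75 percent levels.

```python
from statistics import mean, median, quantiles


def absolute_deviation(values, center):
    """Equation 12: average absolute distance from a chosen center."""
    return sum(abs(x - center) for x in values) / len(values)


# Hypothetical, right-skewed concentrations in mg/l (not data set A.1).
concentrations = [1, 2, 2, 3, 3, 4, 5, 8, 17]

data_range = max(concentrations) - min(concentrations)

# Quartiles by the "inclusive" interpolation rule (an assumed choice).
q1, _, q3 = quantiles(concentrations, n=4, method="inclusive")
iqr = q3 - q1

mean_ad = absolute_deviation(concentrations, mean(concentrations))
median_ad = absolute_deviation(concentrations, median(concentrations))

print("range:", data_range)
print("interquartile range:", iqr)
print("mean absolute deviation:", mean_ad)
print("median absolute deviation:", median_ad)
```

In this skewed sample the single large value inflates the range far more
than the interquartile range, which is the robustness property noted in
the text.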

The coefficient of variation is a measure of relative, rather than
absolute, variation, since it is a unitless quantity. It is calculated
by:

$$ \mathrm{C.V.} = \frac{s}{\bar{X}} \tag{13} $$

and can be expressed as a ratio or a percentage. Its primary advantage
is that it is independent of the unit of measurement and can therefore
be used to compare the relative variations of two or more sets of data,
regardless of the units involved. The coefficient of variation for data
set A.1 is .105.
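
The unit independence of the coefficient of variation can be illustrated
with a small sketch. The phosphorus values are hypothetical and the
function name is an illustrative assumption; the same sample is
expressed in mg/l and in ug/l, and the ratio s / mean is computed for
each.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Equation 13: sample standard deviation divided by the sample mean."""
    return stdev(values) / mean(values)


# Hypothetical total phosphorus sample, expressed in two different units.
phosphorus_mg_per_l = [0.02, 0.03, 0.05, 0.04, 0.06]
phosphorus_ug_per_l = [1000 * x for x in phosphorus_mg_per_l]

cv_mg = coefficient_of_variation(phosphorus_mg_per_l)
cv_ug = coefficient_of_variation(phosphorus_ug_per_l)

print("C.V. in mg/l:", cv_mg)
print("C.V. in ug/l:", cv_ug)
```

Both calls return the same ratio (up to floating-point rounding), even
though the standard deviations themselves differ by a factor of 1000.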

Standard deviations of various statistics are generally known as
standard errors. The standard error of a statistic, for example the
mean, is the standard deviation of a distribution of means for samples
of a given sample size n. The standard error is used as a measure of
the reliability of an estimate. The following are the estimates of
standard error for the mean and median, respectively (for standard
errors of other estimators, consult Sokal and Rohlf (1969)).

$$ s_{\bar{X}} = \frac{s}{\sqrt{n}} \tag{14} $$

$$ s_{med} = (1.2533) \, s_{\bar{X}} \tag{15} $$

where: $s_{\bar{X}}$ = standard error of mean
       $s_{med}$ = standard error of median

The estimates for $s_{\bar{X}}$ and $s_{med}$ for data set A.1 are .65
and .82, respectively. It should be noted that $s_{\bar{X}}$ is valid
for any population with finite variance and $s_{med}$ for large samples
from normal populations.
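
Equations 14 and 15 can be checked against the values quoted for data
set A.1, using only the summary figures given in the text (s = 2.17 and
n = 11). The function names are illustrative assumptions.

```python
import math


def standard_error_mean(s, n):
    """Equation 14: standard deviation divided by the square root of n."""
    return s / math.sqrt(n)


def standard_error_median(s, n):
    """Equation 15: 1.2533 times the standard error of the mean."""
    return 1.2533 * standard_error_mean(s, n)


# Summary values quoted in the text for data set A.1.
s_a1 = 2.17
n_a1 = 11

se_mean_a1 = standard_error_mean(s_a1, n_a1)
se_median_a1 = standard_error_median(s_a1, n_a1)

print("standard error of mean:", round(se_mean_a1, 2))
print("standard error of median:", round(se_median_a1, 2))
```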

Measures of Relationships

Measures of relationship commonly used are correlation and re-
gression. There has been much confusion on the subject matter of cor-
relation and regression for several reasons. First, the mathematical
relations between the two methods of analysis are similar. Second,
earlier statistical texts did not make a sufficiently clear distinction
between the two approaches. Finally, while the approach chosen by an
investigator may be correct in terms of his intentions, the data avail-
able for analysis may be such as to make one or the other technique
inappropriate.

Regression is intended to describe the dependence of a variable
Y on an independent variable X. Regression equations are used to lend
support to hypotheses regarding the possible causation of changes in Y
by changes in X; for prediction of Y in terms of X; and for explaining
some of the variation of Y by X, by using the latter variable as a
statistical control.

Correlation, by contrast, is concerned largely with whether two
variables are interdependent or covary (i.e., vary together). One
variable is not expressed as a function of the other, hence no
distinction between dependent and independent variables is made. The
intent is to estimate the degree to which these variables vary together.

Regression is the proper measure of relationship when a random
variable Y is dependent on, or caused by, one or more controllable or
fixed variates $X_i$, which are said to be independent. Ordinary methods
of linear regression require a proposed model to be linear in the
parameters but not necessarily in the relation of Y to X. The simple
linear statistical model is:

$$ Y = \beta_0 + \beta_1 X + e \tag{16} $$

where: Y = dependent variable
       $\beta_0$ = origin (extrapolated value of Y when X is fixed at zero)
       $\beta_1$ = slope (average change in Y per unit change in X)
       X = independent or fixed variable
       e = random error

The random error is composed of two basic parts: 1) failure of the
linear form to describe properly the relation between Y and X (i.e.,
nonlinear bias) and 2) random contributions of latent variables to Y.

Regression is based on four assumptions (Sokal and Rohlf, 1969).

1. The independent variable X is measured without error
   (i.e., the X values are known). This means that only Y, the
   dependent variable, is a random variable, and X does not vary
   at random.

2. The expected value for the variable Y for any given X is
   described by the linear function $\mu_Y = \beta_0 + \beta_1 X$.
   Another way of stating this assumption is that the
   parametric means $\mu_Y$ of the values of Y are a function
   of X and lie on a straight line described by this equation.

3. For any given value of X the Y's are independently and
   identically distributed. The distributions must be normal
   when confidence intervals or hypothesis tests
   are needed. Figure 6 illustrates this assumption. By taking
   repeated measurements each year, a frequency distribution of
   nutrient concentrations (Y) at each value of the independent
   variate (X = time (years)) is generated. Due to the inherent
   variability in lakes and in time, it is obvious that a frequency
   distribution of values of Y (nutrient concentration) around
   the expected value will result. This assumption states that
   these sample values must be independently and identically
   distributed.

4. The variance of Y, given X, is equal for all X's. This means
   that variances of the samples along the regression line are
   homoscedastic. Thus, the variance around the regression line
   is constant and independent of the magnitude of X and Y.

Figure 6. Normally distributed Y's given X.

Commonly, least squares estimation is used to derive estimators for
the regression parameters $\beta_0$ and $\beta_1$. The general idea is
to minimize, for samples of size n, the sum of the squared random errors
of Equation 16 with respect to $\beta_0$ and $\beta_1$. This is so the
estimators obtained specify an estimator of Y, given X, which has the
smallest variance of any linear equation having unbiased estimators of
the same parameters. An estimate for $\beta_1$ is:

$$ \hat{\beta}_1 = b_1 = \frac{\sum_{i=1}^{n} X_i Y_i - \left( \sum_{i=1}^{n} X_i \right) \left( \sum_{i=1}^{n} Y_i \right) / n}{\sum_{i=1}^{n} X_i^2 - \left( \sum_{i=1}^{n} X_i \right)^2 / n} = \frac{sp_{XY}}{ss_X} \tag{17} $$

where: $sp_{XY}$ = sum of cross products of X and Y
       $ss_X$ = sum of squares of X

$\beta_0$ can then be estimated by:

$$ \hat{\beta}_0 = b_0 = \bar{Y} - b_1 \bar{X} \tag{18} $$

Given a linear model, an observation Y may be partitioned into
three parts: 1) the mean, 2) the deviation of the regression line
from the mean (regression effect), and 3) the deviation of the
observation from the regression line (error). Therefore, the sum of
squares of Y may be partitioned into a sum of squares caused by
regression and a residual (error) sum of squares, which measures the
failure of the observed values to fall exactly on the regression line.
The equation takes the form:

$$ ss_Y = ss_R + ss_e \tag{19} $$

where: $ss_Y$ = sum of squares of Y
       $ss_R$ = regression sum of squares
       $ss_e$ = residual (error) sum of squares

When $ss_e$ is divided by n - 2 degrees of freedom, the result is an
unbiased estimate of the variance of Y, given X, if the linear model is
correct, i.e.,

$$ s^2_{Y|X} = ss_e / (n - 2) \tag{20} $$

where: $s^2_{Y|X}$ = variance of Y given X

Usually, the error sum of squares is obtained by subtracting the
regression sum of squares from the total sum of squares for Y. The
regression sum of squares is computed as:

$$ ss_R = b_1 \, sp_{XY} \tag{21} $$

Also note that:

$$ ss_R / ss_Y = (sp_{XY})^2 / (ss_X \, ss_Y) = r^2_{XY} \tag{22} $$

where: $r^2_{XY}$ = coefficient of determination (represents percentage
       of the total variation explained by the model)

This ($r^2_{XY}$) shows that the proportion of the total sum of squares
of Y attributable to linear regression on X is the square of the
correlation between X and Y.
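
Equations 17 through 22 can be sketched as a single function. The six
yearly values below are hypothetical (they are not data set A.2), and
the function and key names are illustrative assumptions. The sketch also
computes the residual sum of squares directly from the fitted line, so
that the partition in Equation 19 can be checked numerically.

```python
def fit_line(x, y):
    """Least squares fit using the corrected sums of Equations 17-22."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sp_xy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
    ss_x = sum(a * a for a in x) - sum_x ** 2 / n
    ss_y = sum(b * b for b in y) - sum_y ** 2 / n
    b1 = sp_xy / ss_x                    # Equation 17
    b0 = sum_y / n - b1 * sum_x / n      # Equation 18
    ss_r = b1 * sp_xy                    # Equation 21
    ss_e = ss_y - ss_r                   # Equation 19 rearranged
    return {
        "b0": b0, "b1": b1,
        "ss_x": ss_x, "ss_y": ss_y, "sp_xy": sp_xy,
        "ss_r": ss_r, "ss_e": ss_e,
        "var_y_given_x": ss_e / (n - 2),  # Equation 20
        "r2": ss_r / ss_y,                # Equation 22
    }


# Hypothetical example: years recoded 1..6 and a nutrient concentration.
years = [1, 2, 3, 4, 5, 6]
concentration = [10, 12, 15, 15, 18, 20]

fit = fit_line(years, concentration)

# Residual sum of squares computed directly from the fitted line.
direct_ss_e = sum(
    (obs - (fit["b0"] + fit["b1"] * t)) ** 2
    for t, obs in zip(years, concentration)
)

print("b0, b1:", fit["b0"], fit["b1"])
print("ss_e by subtraction:", fit["ss_e"])
print("ss_e from residuals:", direct_ss_e)
print("r squared:", fit["r2"])
```

The two error sums of squares agree up to rounding, which is the content
of Equation 19.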

In order to develop interval estimates or tests of hypotheses for
the parameters and predictions, the validity of the assumptions of
normality, homogeneous variance, and linearity must be checked. The
procedures for satisfying this requirement are in the next chapter.
Adjustments may be made to the data for: 1) non-normality by
transformations (discussed in the next chapter), 2) heterogeneous
variance by transformations or weighted regressions, and
3) nonlinearity by transformations and adding higher degrees of
polynomials to the model (consult an advanced regression analysis text
for more information concerning these problems).

Given satisfaction of the assumptions, the $(1 - \alpha)$ 100 percent
confidence interval estimates may be computed from:

$$ b_0 \pm t_{\alpha/2,\, n-2} \, (s_{Y|X}) \sqrt{(1/n) + \bar{X}^2 / ss_X} \tag{23} $$

$$ b_1 \pm t_{\alpha/2,\, n-2} \, (s_{Y|X}) / \sqrt{ss_X} \tag{24} $$

where: $t_{\alpha/2,\, n-2}$ = value from t distribution
       $\alpha$ = alpha level, i.e., desired level of significance

The expression to the right of $t_{\alpha/2,\, n-2}$ is the standard
error of estimate for $b_0$ and $b_1$, respectively.

Tests of hypotheses about parameters (e.g., H: $\beta_1 = \beta_1'$) are
computed from:

$$ t = (b_1 - \beta_1') / [(s_{Y|X}) / \sqrt{ss_X}] \tag{25} $$

where: $\beta_1'$ = the slope you are testing for

The appropriate critical levels are $\pm t_{\alpha/2,\, n-2}$ for a
two-tailed test and $t_{\alpha,\, n-2}$ for a one-tailed test. The
theory of hypothesis testing is discussed in a later section for those
unfamiliar with the subject.
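
A minimal sketch of Equations 23 through 25 follows. All input numbers
are hypothetical, and the function names are illustrative assumptions.
The critical t value is passed in by the user rather than computed; the
value 2.228 used here is the tabled two-tailed value for alpha = .05
with 10 degrees of freedom (n = 12).

```python
import math


def intercept_interval(b0, s_yx, ss_x, n, x_bar, t_crit):
    """Equation 23: confidence interval for the intercept."""
    half_width = t_crit * s_yx * math.sqrt(1 / n + x_bar ** 2 / ss_x)
    return b0 - half_width, b0 + half_width


def slope_interval(b1, s_yx, ss_x, t_crit):
    """Equation 24: confidence interval for the slope."""
    half_width = t_crit * s_yx / math.sqrt(ss_x)
    return b1 - half_width, b1 + half_width


def slope_t_statistic(b1, beta1_tested, s_yx, ss_x):
    """Equation 25: t statistic for H: beta_1 = beta1_tested."""
    return (b1 - beta1_tested) / (s_yx / math.sqrt(ss_x))


# Hypothetical regression summary for n = 12 observations.
b0, b1 = 5.0, 2.0
s_yx, ss_x = 1.5, 9.0
n, x_bar = 12, 6.5
t_crit = 2.228  # tabled t for alpha/2 = .025 and n - 2 = 10 df

slope_low, slope_high = slope_interval(b1, s_yx, ss_x, t_crit)
int_low, int_high = intercept_interval(b0, s_yx, ss_x, n, x_bar, t_crit)
t_stat = slope_t_statistic(b1, 0.0, s_yx, ss_x)

print("95% interval for slope:", slope_low, slope_high)
print("95% interval for intercept:", int_low, int_high)
print("t for H: beta_1 = 0:", t_stat)
print("reject at alpha = .05 (two-tailed):", abs(t_stat) > t_crit)
```

In this made-up case the interval for the slope excludes zero and the t
statistic exceeds the critical value, so the two procedures lead to the
same conclusion, as they must.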

The following is an example of developing a linear regression model
and estimation of parameters for data set A.2. In this example all
assumptions are assumed valid, which in reality, is probably not the
case.

When using time as the independent variable it is convenient to
recode the time units into whole numbers from 1 to n. For this example,
the years 1968-1979 are recoded as 1 to 12. The first step in developing
a regression model is to calculate the slope ($b_1$) of the equation
(Equation 17):

$$b_1 = \frac{\sum X_i Y_i - (\sum X_i)(\sum Y_i)/n}{\sum X_i^2 - (\sum X_i)^2/n}$$

$$b_1 = \frac{(1 \cdot 215 + 2 \cdot 237 + \dots + 12 \cdot 335) - (1 + 2 + \dots + 12)(215 + 237 + \dots + 335)/12}{(1^2 + 2^2 + \dots + 12^2) - (1 + 2 + \dots + 12)^2/12}$$

$$= \frac{22877 - (78 \cdot 274)}{650 - 507} = \frac{1505}{143} = \frac{sp_{XY}}{ss_X}$$

$$b_1 = 10.52$$

The intercept ($b_0$) can then be solved (Equation 18):

$$b_0 = \bar{Y} - b_1 \bar{X}$$

$$b_0 = 274 - (10.52 \cdot 6.5) = 205.62$$

Therefore, the regression model is Y = 205.62 + 10.52X + e.
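The slope and intercept calculation above can be sketched in a few lines of Python. This is only an illustration: it uses the sums reported in the example (the individual observations of data set A.2 are not reproduced here), and the function name `fit_line` is a hypothetical helper, not part of the original method.

```python
# Sums reported in the worked example for data set A.2 (n = 12 years).
n = 12
sum_x = 78        # 1 + 2 + ... + 12
sum_y = 3288      # 215 + 237 + ... + 335
sum_xy = 22877    # 1*215 + 2*237 + ... + 12*335
sum_x2 = 650      # 1^2 + 2^2 + ... + 12^2


def fit_line(n, sum_x, sum_y, sum_xy, sum_x2):
    """Least-squares line Y = b0 + b1*X from raw sums (hypothetical helper)."""
    sp_xy = sum_xy - sum_x * sum_y / n   # corrected sum of products
    ss_x = sum_x2 - sum_x ** 2 / n       # corrected sum of squares of X
    b1 = sp_xy / ss_x                    # slope
    b0 = sum_y / n - b1 * sum_x / n      # intercept: Y-bar minus b1 * X-bar
    return b0, b1, sp_xy, ss_x


b0, b1, sp_xy, ss_x = fit_line(n, sum_x, sum_y, sum_xy, sum_x2)
print(f"sp_XY = {sp_xy:.0f}, ss_X = {ss_x:.0f}")
print(f"b1 = {b1:.2f}, b0 = {b0:.2f}")
```

Because the code carries the unrounded slope forward, the intercept it prints (about 205.59) differs slightly from the hand calculation, which rounds the slope to 10.52 first.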

The quantity $ss_Y$ is calculated in the same way as $ss_X$, with Y's
substituted for X's (Equation 17, denominator).

$$ss_Y = (215^2 + 237^2 + \dots + 335^2) - [(215 + 237 + \dots + 335)^2/12]$$

$$= 917452 - (3288^2/12)$$

$$= 917452 - 900912$$

$$ss_Y = 16540$$

The regression sum of squares ($ss_R$) is (Equation 21):

$$ss_R = b_1\,sp_{XY} = 10.52 \cdot 1505 = 15832.6$$

Then, the residual (error) sum of squares is calculated (Equation 19):

$$ss_e = ss_Y - ss_R = 16540 - 15832.6 = 707.4$$

The variance of Y given X is (Equation 20):

$$s^2_{Y|X} = ss_e/(n - 2) = 707.4/10 = 70.74$$

The coefficient of determination ($r^2_{XY}$), which represents the
percentage of the total variation explained by the model, is calculated
(Equation 22):

$$r^2_{XY} = \frac{ss_R}{ss_Y} = \frac{15832.6}{16540} = .957$$

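Finally, standard errors for the estimates $b_0$ and $b_1$ are calculated
(Equations 23 and 24), using $s_{Y|X} = \sqrt{70.74} = 8.41$:

$$s_{b_0} = (s_{Y|X})\sqrt{(1/n) + \bar{X}^2/ss_X} = 8.41\sqrt{(1/12) + 6.5^2/143} = 5.18$$

$$s_{b_1} = (s_{Y|X})/\sqrt{ss_X} = 8.41/\sqrt{143} = .703$$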
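The remaining steps of the example (Equations 19 through 25) can be checked with a short script. This is a sketch under stated assumptions: it starts from the corrected sums in the example, carries the unrounded slope forward (so the residual sum of squares comes out near 700.7 rather than the hand-calculated 707.4), and takes the tabled value t(.025, 10) = 2.228 as an input rather than computing it.

```python
import math

# Corrected sums from the worked example (data set A.2).
n = 12
sp_xy, ss_x, ss_y = 1505.0, 143.0, 16540.0
x_bar = 6.5
t_crit = 2.228   # tabled two-tailed t value for alpha = .05, n - 2 = 10 df

b1 = sp_xy / ss_x                      # slope (Equation 17)
ss_r = b1 * sp_xy                      # regression sum of squares (Equation 21)
ss_e = ss_y - ss_r                     # residual sum of squares (Equation 19)
s2_yx = ss_e / (n - 2)                 # variance of Y given X (Equation 20)
s_yx = math.sqrt(s2_yx)
r2 = ss_r / ss_y                       # coefficient of determination (Equation 22)

# Standard errors of the intercept and slope (Equations 23 and 24).
se_b0 = s_yx * math.sqrt(1 / n + x_bar ** 2 / ss_x)
se_b1 = s_yx / math.sqrt(ss_x)

# 95 percent confidence interval for the slope, and t for H0: beta1 = 0.
ci_b1 = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
t_stat = (b1 - 0.0) / se_b1            # Equation 25 with a tested slope of zero

print(f"r2 = {r2:.3f}, s_b0 = {se_b0:.2f}, s_b1 = {se_b1:.3f}")
print(f"95% CI for b1: ({ci_b1[0]:.2f}, {ci_b1[1]:.2f}), t = {t_stat:.2f}")
```

Since the computed t greatly exceeds the critical value, the hypothesis of zero slope would be rejected for these data, provided the regression assumptions hold.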


If the joint distribution of X and Y is a bivariate normal
distribution, one of its parameters is the product moment correlation
coefficient $\rho$ (rho), or simply, the correlation coefficient.

The correlation coefficient $\rho$ is a measure of the linear covariation
of the variables, that is, it measures the degree of linear association
between them. It may vary from -1 to +1, inclusive, and is a
dimensionless quantity. As $\rho$ increases in absolute value, so does the
linear association. A positive correlation means as one variable
increases, the other increases. A negative correlation means as one
variable increases, the other decreases. Since $\rho$ measures only linear
relationship, the variables may be perfectly related in a curvilinear
relationship, and $\rho$ could be equal to zero.

The common estimate for $\rho$ is:

$$r = \frac{sp_{XY}}{\sqrt{ss_X\,ss_Y}} \tag{26}$$

The estimator of $\rho$ is nearly unbiased for large sample sizes but
slightly underestimates in small samples, especially for small magnitudes
of correlation (Kendall and Stuart, 1967). For small samples
(4 < n < 15) r may be adjusted (Olkin and Pratt, 1958) to provide an
estimate that is nearly unbiased,

$$r^* = r\,[1 + (1 - r^2)/2(n - 4)] \tag{27}$$

where: $r^*$ = adjusted correlation coefficient

To test if $\rho = 0$, a t-value is computed:

$$t = r/\sqrt{(1 - r^2)/(n - 2)} \tag{28}$$

and compared with critical levels of $\pm t_{\alpha/2,\,n-2}$ from the t
distribution for a two-tailed test. Weir (1960) has shown that a quick,
approximate, two-sided test ($H_0$: $\rho = 0$, given n > 4 and
$\alpha = .05$) is achieved simply by noting if $|r| > 2/\sqrt{n}$. Those
interested in testing $\rho = \rho_0$ or in constructing confidence
intervals should consult Bhattacharyya and Johnson (1977).
Another method of estimating the relationship between random variables,
where neither a linear relationship nor normally distributed variables
are assumed, is Spearman's rank correlation coefficient $r_s$. It is
computed from:

$$r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n^3 - n} \tag{29}$$

where: $d_i$ = difference between the ranks of the ith pair of n pairs
       of observations.

For samples of n > 100, Spearman's coefficient may be tested with good
approximation by using the procedures for product moment correlation.
Otherwise, see Zar (1974) for tables of the distribution
($4 \le n \le 100$).

The following is an example of computing the product moment
correlation (r and r*) and Spearman's rank correlation coefficient
($r_s$) using the same data (A.2) as in the regression analysis example.
The estimate for $\rho$ (i.e., r) is calculated using Equation 26:

$$r = \frac{sp_{XY}}{\sqrt{ss_X\,ss_Y}} = \frac{1505}{\sqrt{143 \cdot 16540}} = .978$$

The estimate is adjusted for small sample size in the following
manner (Equation 27):

$$r^* = r\,[1 + (1 - r^2)/2(n - 4)]$$

$$= .978\,[1 + (1 - .978^2)/2(12 - 4)]$$

$$r^* = .980$$
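The calculations for r, r*, and the t-value for testing $\rho = 0$ (Equations 26 through 28) can be sketched as follows. The function names are hypothetical helpers introduced only for illustration, and the inputs are the corrected sums from the example. Because the code does not round r before using it, the last digit can differ from the hand calculation.

```python
import math


def pearson_r(sp_xy, ss_x, ss_y):
    """Product moment correlation from corrected sums (Equation 26)."""
    return sp_xy / math.sqrt(ss_x * ss_y)


def adjusted_r(r, n):
    """Small-sample adjustment of r (Equation 27)."""
    return r * (1 + (1 - r ** 2) / (2 * (n - 4)))


def t_for_r(r, n):
    """t-value for testing rho = 0, with n - 2 degrees of freedom (Equation 28)."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))


n = 12
r = pearson_r(1505.0, 143.0, 16540.0)
r_star = adjusted_r(r, n)
t_value = t_for_r(r, n)
print(f"r = {r:.4f}, r* = {r_star:.4f}, t = {t_value:.2f}")
```

For simple linear regression, this t-value is the same statistic obtained when testing the slope against zero, so the two tests lead to the same decision.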

Spearman's rank correlation coefficient ($r_s$) is computed using
Equation 29. Ranks for the observations are found in data set 2 in
Appendix A. The squared rank differences sum to 4, so that:

$$r_s = 1 - \frac{6 \sum d_i^2}{12^3 - 12} = 1 - \frac{6 \cdot 4}{12^3 - 12} = .986$$
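A minimal sketch of Equation 29 in Python follows. The observations below are hypothetical values chosen only so that the squared rank differences sum to 4, as in the example; they are not data set A.2. The ranking helper assigns average ranks to ties, although Equation 29 is exact only when there are no ties.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rs(x, y):
    """Spearman's rank correlation coefficient (Equation 29)."""
    n = len(x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n ** 3 - n)


# Hypothetical series: 12 recoded years and a parameter with two adjacent swaps.
years = list(range(1, 13))
values = [10, 20, 40, 30, 50, 60, 70, 90, 80, 100, 110, 120]
rs = spearman_rs(years, values)
print(f"r_s = {rs:.3f}")
```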

Probability Distributions

Given any continuous random variable Y, there is a corresponding
mathematical expression or function f(Y) known as the frequency func-
tion of Y. For the theoretical or population distribution of Y the
frequency function is the analog of the frequency distribution (histo-
gram). Thus, f(Y) is a mathematical model that provides a basis for
calculating theoretical frequencies or probabilities for any or all
outcome classes of the variable. These functions are: 1) defined for all
values of the variable, 2) non-negative for all values of Y, and 3) such
that the total area under the corresponding frequency curve and above
the Y-axis is equal to one.

Let Y be a continuous random variable with a frequency function
f(Y), shown graphically in Figure 7. The function is adjusted so that
the total area under the curve is one and corresponds to probability
one for the range of all possible values for Y. In addition, f(Y) has
the property that, given any two numbers a and b, the probability that
a randomly selected element of the population will have a Y value
between a and b, inclusive, is equal to the area under the curve between
the lines Y = a and Y = b as indicated by the shaded portion in Figure 7.
Therefore, for any continuous distribution, the area under the curve
between a and b is the probability $P(a \le Y \le b)$.

For every discrete random variable there is also a frequency func-
tion or probability distribution function which has essentially the same
properties as continuous frequency functions. However, since the varia-
ble is discrete, the graph of the function does not result in a contin-
uous curve, but in a bar diagram (Figure 8). The probability of each
value that Y can take is represented by the height of the appropriate
bar.

Among the more important properties of a theoretical distribution
is a set of quantities known as the moments of the distribution. The
moments characterize the distribution. In applied statistics, the first
two moments are of most importance. The first moment about the origin is
the mean $\mu$ of the theoretical distribution and is defined as the
average value or expected value of the variable. The variance $\sigma^2$
of a random variable Y is defined as the second moment about the mean,
the average value of $(Y - \mu)^2$. The third and fourth moments, skewness
and kurtosis, respectively, refer to distribution shape and will be
discussed in the next chapter when dealing with distribution selection.

Figure 7. Frequency function for a continuous random variable (Y).

Figure 8. Frequency function for a discrete random variable (Y).

Probability or density distributions describe the relative frequency
of occurrence of values taken on by random variables. The distributions
used in practice are usually at best close approximations to the true
distribution. The more widely used distributions are the binomial,
multinomial, and Poisson distributions for discrete variables and the
normal or Gaussian distribution for continuous variables.

While density distributions describe actual frequency (in
populations) or probable occurrence (in samples) of different values of
natural phenomena, sampling distributions describe the relative
frequencies of different values of functions of random variables in
samples of specified size. Three important sampling distributions
associated with samples from normal distributions are the chi-square,
Student's t, and variance ratio (F) distributions.

Normal or Gaussian Distribution

The normal distribution provides a good approximation to many empir-
ical frequency distributions found in biological sciences. Its strength
results from the central limit theorem of Laplace (Mood and Graybill,
1963). The speed with which the distribution of means, drawn at random
from distinctly non-normal distributions, converges to the normal distri-
bution as sample size increases, provides the keystone to the support of
most statistical procedures (Gill, 1978).

For a continuous random variable Y, the normal density function is:

$$f(Y) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(1/2\sigma^2)(Y - \mu)^2} \tag{30}$$

The distribution of any normal variable depends only on the mean ($\mu$)
and variance ($\sigma^2$).

The normal density function is a symmetric, bell-shaped curve
(Figure 9). The mean is a location parameter and the standard deviation,
a scale parameter, is the distance from the mean to the inflection point
of the curve. An infinite number of normal curves exist, differing in
location, scale or both.
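The normal density (Equation 30) and the idea that probability is the area under the frequency curve can be illustrated numerically. The sketch below integrates the density with a simple trapezoidal rule; the mean and standard deviation are hypothetical values chosen only for the illustration, and the helper names are not from the original text.

```python
import math


def normal_pdf(y, mu, sigma):
    """Normal density function f(Y) of Equation 30."""
    return math.exp(-((y - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def area(f, a, b, steps=10000):
    """Area under f between a and b by the trapezoidal rule."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h


mu, sigma = 274.0, 8.0   # hypothetical location and scale parameters


def density(y):
    return normal_pdf(y, mu, sigma)


p_one_sigma = area(density, mu - sigma, mu + sigma)
p_two_sigma = area(density, mu - 2 * sigma, mu + 2 * sigma)
print(f"P(within 1 sd) = {p_one_sigma:.4f}, P(within 2 sd) = {p_two_sigma:.4f}")
```

Changing `mu` or `sigma` shifts or stretches the curve but leaves these two probabilities unchanged, which is why a single table of the standard normal distribution serves for every normal variable.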
Hypothesis Testing

The most frequent application of statistics in biological research
is to test some scientific hypothesis. To decide whether or not any dis-
agreement between the observed and the theoretical values is sufficient
to warrant rejection of the theory, data are collected in a suitable
manner and a statistical test on the hypothesis is used. A statistical
hypothesis is an expression, in some manner, of the theory of concern.
Statistical methods are important in biology because results of experi-
ments are usually not clearcut and therefore need statistical tests to
support decisions between alternative hypotheses. A statistical test
examines a set of sample data and, on the basis of an expected distri-
bution of the data, leads to a decision on whether to accept the hypoth-
esis underlying the expected distribution, or whether to reject that hy-
pothesis and accept an alternative one. The nature of the tests varies
with the data and the hypothesis, but the same general philosophy of

hypothesis testing is common to all tests.

Figure 9. The normal density function, with mean ($\mu$) and standard
deviation ($\sigma$).

The first step is to state the null hypothesis ($H_0$). The null
hypothesis is a hypothesis of no differences (such as $H_0$:
$\mu_1 = \mu_2$). It is formulated for the express purpose of being
rejected. If it is rejected, the alternative hypothesis ($H_1$) is
accepted. Usually, hypotheses tested by random samples cannot be
absolutely disproved because the range of theoretical distributions of
sampling variables extends to infinity. If one attempts to propose and
test a direct hypothesis, such as $H_0$: $\mu_1 \neq \mu_2$ against the
alternative $H_1$: $\mu_1 = \mu_2$, statistical assessment of the
probability of the evidence is not possible because the ability to obtain
numerical values from sampling distributions, given $H_0$, depends on
knowing the value of $\mu_1 - \mu_2$, the very value being questioned in
the hypothesis. On the other hand, for an indirect hypothesis, such as
$H_0$: $\mu_1 = \mu_2$, the value of $\mu_1 - \mu_2$, given $H_0$, is zero,
and numerical values can be obtained from the sampling distribution.

Rejection of the indirect hypothesis proposed usually is a strong
decision because the experimenter chooses a small degree of doubt
(probability of being wrong). Therefore, when sufficiently conclusive
results occur in a sample, one may be confident (but not certain) that
the results represent the true status of the population. Acceptance of
the indirect hypothesis proposed normally is a weak decision, because in
that case the experimenter usually cannot completely control the
probability of being wrong. Acceptance of a hypothesis, such as $H_0$:
$\mu_1 = \mu_2$, should not be interpreted firmly, such as "no mean
difference exists," but in a more qualified way. For example, "the
experimental difference in means was not sufficient to provide high
confidence that the true means differ." It is important to guard against
overconfidence in the results of any one isolated case by remembering
that "the one chance in a million will undoubtedly occur sometime."
However, one should not discard a significant result just because it
leads to awkward conclusions or goes against one's personal bias.

Given that a null hypothesis is formed and data collected, the
question arises "what is the probability of having obtained the observed
outcome or an even more extreme outcome if there is actually no
difference in populations?"1 To answer this question, the relative
frequency distribution of all possible outcomes, given the two
populations are equivalent, must be examined. The specific observed
outcome may then be compared with this distribution. Figure 10 shows a
typical outcome distribution. The proportion of the probability in the
tails is designated as $\alpha$ (alpha), the level of significance. The
points on the outcome scale, such that no more than $100(\alpha/2)\%$ of
the outcomes are beyond them in each tail, are referred to as the
critical values. The total area beyond these values is called the
critical region or rejection region, and is shaded in Figure 10. This
region contains the most rare $100\alpha\%$ of the outcomes, given $H_0$
is true. The area inside the critical values is called the acceptance
region (unshaded area). Given this distribution, the probability that an
outcome will fall in the critical region, if $H_0$ is true, is less than
or equal to $\alpha$. In other words, if $H_0$ is true and a study is
repeated numerous times, in about $100\alpha\%$ of cases the observed
outcome would fall into this region. The value chosen for $\alpha$
provides an arbitrary means of deciding whether an observed outcome is
rare or not rare under $H_0$.

 

1Here we consider the null hypothesis to be concerned with possible
differences between two populations. This is the case with trend detec-
tion.

Figure 10. An outcome distribution.

In testing a null hypothesis, two possibilities arise. The first is
that the observed outcome falls in the critical region, in which case
$H_0$ is rejected. The probability of such a result, or a more extreme
result, is considered to be too small ($\le \alpha$) to be attributed
solely to chance. Hence, $H_1$ (i.e., mean difference) would be
statistically significant at the $\alpha$ (or $100\alpha\%$) level. The
second possibility is that the observed outcome falls in the acceptance
region, in which case $H_0$ is accepted. The probability of such a
result, or one even more extreme, is greater than $\alpha$. This, by the
arbitrary definition established, is not a rare outcome if $H_0$ is true.
Therefore, one would state that no statistically significant difference
between the means had been demonstrated at the $100\alpha\%$ level.

It is important to remember that a null hypothesis is not proved or
disproved by a statistical significance test, but rather, precise
probability statements may be made regarding the compatibility of a set
of observations with $H_0$. The true cause of observed differences in
effects is almost always an inextricable combination of real differences
and random processes. Although it would be useful to have the entire
distribution of outcomes under $H_0$ available for every situation, the
calculations are prohibitive. Therefore, tables have been prepared for
most tests with critical levels for a few selected $\alpha$ levels. While
the most commonly chosen values for $\alpha$ are 0.05 and 0.01, these
values are arbitrary; they have become standard mainly for the sake of
having concise tables. The choice of $\alpha$ will be discussed later in
the section.
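The decision rule described above can be sketched for one simple case. The example assumes, purely for illustration, that the outcome distribution under $H_0$ is the standard normal distribution and that the observed outcome has already been expressed as a z-value; the function name and the observed value are hypothetical.

```python
from statistics import NormalDist


def two_tailed_decision(z_observed, alpha=0.05):
    """Return (p_value, decision) for a two-tailed test on a standard normal outcome.

    The p-value is the probability, given H0, of an outcome at least as
    extreme as the one observed, counting both tails.
    """
    std_normal = NormalDist()   # mean 0, standard deviation 1
    p_value = 2 * (1 - std_normal.cdf(abs(z_observed)))
    decision = "reject H0" if p_value <= alpha else "do not reject H0"
    return p_value, decision


z_observed = 2.5   # hypothetical observed outcome
for alpha in (0.05, 0.01):
    p_value, decision = two_tailed_decision(z_observed, alpha)
    print(f"alpha = {alpha}: p = {p_value:.4f}, {decision}")
```

The same outcome leads to rejection at one level and not at the other, which shows how the arbitrary choice of the significance level enters the decision.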

The outcome distribution discussion above is based on two-tailed
tests (i.e., $H_1$: $\mu_1 \neq \mu_2$). However, in some cases, a set of
restricted alternative hypotheses may be desired, those being $H_1$:
$\mu_1 > \mu_2$ or $H_1$: $\mu_1 < \mu_2$. In these situations the entire
$100\alpha\%$ of the probability is assigned to the appropriate single
tail. The corresponding critical value is the basis for these one-tailed
tests of $H_0$. When applying these one-sided tests, the critical value
used is that for the entire $\alpha$, as compared to two-tailed tests,
where $\alpha$ is divided in half (i.e., $\alpha/2$) and distributed to
each tail.
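The difference between one-tailed and two-tailed critical values can be seen numerically. The sketch below again assumes a standard normal outcome distribution under $H_0$, which is an assumption made only for this illustration; `critical_values` is a hypothetical helper.

```python
from statistics import NormalDist


def critical_values(alpha, tails):
    """Critical value(s) for a standard normal outcome distribution.

    tails=2 splits alpha equally between the two tails; tails=1 places all
    of alpha in the upper tail (negate the value for a lower-tail test).
    """
    std_normal = NormalDist()
    if tails == 2:
        c = std_normal.inv_cdf(1 - alpha / 2)
        return (-c, c)
    if tails == 1:
        return (std_normal.inv_cdf(1 - alpha),)
    raise ValueError("tails must be 1 or 2")


two_sided = critical_values(0.05, tails=2)
one_sided = critical_values(0.05, tails=1)
print(f"two-tailed: {two_sided[0]:.3f}, {two_sided[1]:.3f}")
print(f"one-tailed: {one_sided[0]:.3f}")
```

At the same significance level the one-tailed critical value lies closer to the center of the distribution, so a smaller departure in the stated direction is enough to reject $H_0$.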

When testing hypotheses in the manner described, two kinds of
decision errors are possible. One, a true hypothesis may be rejected or,
two, a false hypothesis may be accepted. These are referred to as Type I
and Type II errors, respectively. These errors and the conditions under
which they arise are shown in Figure 11. These errors occur with certain
probabilities, since the decision to accept or reject $H_0$ is based upon
the outcome of a random process. Type I error, or $\alpha$, is equal to
P(reject $H_0$ | $H_0$ is true). Type II error, or $\beta$ (not to be
confused with the $\beta$ used in regression analysis), is equal to
P(accept $H_0$ | $H_0$ is false). The probability of Type I error is also
referred to as the size of the critical region or as the level of
significance of the test. Beta ($\beta$), the probability of Type II
error, provides a more specific criterion for the assigning of critical
regions for statistical significance tests. The general principle is:
among all critical regions of the same size, use the one for which the
probability of a Type II error is minimum.

The probability of making a Type II error, $\beta$, depends upon the
amount of real deviation from the null hypothesis. It is a function of
the true value of the parameter being tested, and therefore usually
cannot be evaluated. However, $\beta$ can be lowered in three ways:
1) increase the sample size, 2) reduce experimental error, and
3) increase the permissible probability for Type I error ($\alpha$).
Decreasing $\beta$ is equivalent to increasing $1 - \beta$, the
probability of rejecting an incorrect hypothesis.

                            True situation in population
                          ------------------------------------------------
Decision                  Hypothesis (H0) correct   Hypothesis (H0) incorrect
------------------------  ------------------------  -------------------------
Accept hypothesis (H0)    No error                  Type II error
                          probability = 1 - α       probability = β

Reject hypothesis (H0)    Type I error              No error
                          probability = α           probability = 1 - β

Figure 11. Errors in decisions based on tests of hypotheses.

The probability $1 - \beta$ is termed the power of the test to detect
that a parameter differs from a specified value. Obviously, for any given
test one would like to minimize $\beta$ and maximize $1 - \beta$. The
curve described by $1 - \beta$ as a function of the magnitude to be
detected (usually in $\sigma$ units) is the power curve (Figure 12). This
curve shows the probability of rejecting $H_0$ as a function of $\mu$. It
is interpreted in the following manner: as one moves away from the
hypothesized mean $\mu$ (toward values under $H_1$) the probability of
rejecting an incorrect hypothesis increases; as sample size increases,
less of a change is necessary to attain a given power. Hence, as stated
before, increasing sample size will increase power, which decreases Type
II error. This power curve is for a two-tailed test; power curves are
usually expressed as one-tailed, where only half of the curve is
presented.
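A power curve of the kind described above can be tabulated for one simple case. The sketch assumes a two-tailed test of a mean with a known standard deviation (a z-test), which is a simplification adopted only for illustration; the departure from the hypothesized mean is expressed in standard deviation units, and `power_two_tailed` is a hypothetical helper.

```python
import math
from statistics import NormalDist


def power_two_tailed(delta, n, alpha=0.05):
    """Power (1 - beta) of a two-tailed z-test of a mean.

    delta is the true departure from the hypothesized mean in standard
    deviation units; n is the sample size.
    """
    std_normal = NormalDist()
    c = std_normal.inv_cdf(1 - alpha / 2)   # two-tailed critical value
    shift = delta * math.sqrt(n)            # departure in standard-error units
    # Probability that the test statistic falls in either rejection tail.
    return std_normal.cdf(-c + shift) + std_normal.cdf(-c - shift)


for n in (10, 40):
    row = ", ".join(f"{power_two_tailed(d, n):.2f}" for d in (0.0, 0.25, 0.5, 1.0))
    print(f"n = {n}: power at delta = 0, 0.25, 0.5, 1.0 -> {row}")
```

With no real departure the probability of rejection equals the significance level, and for any fixed departure the larger sample gives the higher power.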

Graphically, the relation between $\alpha$ and $\beta$ is shown in
Figure 13. The two distributions represent $H_0$ and $H_1$, with means
$\mu_0$ and $\mu_d$ (the difference in means that is being tested). The
area under $H_0$ to the right of $Y_c$ (critical level) is the rejection
region. Values in this region are considered members of $H_1$ based on
the arbitrary selection of an $\alpha$ level. However, there still is
some probability (Type I error) that they belong to $H_0$. The area under
$H_1$ to the left of $Y_c$ is the probability of Type II error, $\beta$.
While these values are not considered extreme enough to support $H_1$,
there is some probability that they do belong to $H_1$. Thus, the two
types of errors result from the overlap in the distributions of the two
hypotheses, $H_0$ and $H_1$. If $\alpha$ were increased, then $\beta$
would decrease (if $\mu_d$ did not change), which was stated earlier as a
measure to decrease $\beta$. Also, if $\mu_d$ is moved to the right
(i.e., $H_1$ is changed), this would also cause a decrease in $\beta$.
Both of these situations are reversible, and the reverse changes would
cause increases in $\beta$.
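The trade-off between the two error probabilities can be worked through for a one-tailed test of a mean. All numbers below are hypothetical, the standard deviation is assumed known, and `type_ii_error` is a helper introduced only for this sketch.

```python
import math
from statistics import NormalDist


def type_ii_error(mu_0, mu_d, sigma, n, alpha):
    """Return (Y_c, beta) for an upper-tail test of H0: mu = mu_0 against mu = mu_d.

    Y_c is the critical level on the scale of the sample mean; beta is the
    area under the H1 distribution to the left of Y_c.
    """
    se = sigma / math.sqrt(n)                            # standard error of the mean
    y_c = mu_0 + NormalDist().inv_cdf(1 - alpha) * se    # cuts off alpha under H0
    beta = NormalDist(mu_d, se).cdf(y_c)                 # P(accept H0 | H1 true)
    return y_c, beta


mu_0, mu_d, sigma, n = 0.0, 1.0, 2.0, 10   # hypothetical values
for alpha in (0.01, 0.05, 0.10):
    y_c, beta = type_ii_error(mu_0, mu_d, sigma, n, alpha)
    print(f"alpha = {alpha:.2f}: Y_c = {y_c:.3f}, beta = {beta:.3f}")
```

Raising the significance level moves the critical level toward the hypothesized mean and lowers the Type II error probability; moving the alternative mean farther to the right lowers it as well.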

Figure 12. A typical power curve for n and for a larger n (n').

Figure 13. Relation between probability of errors, Type I error and
Type II error.

One would like to minimize both probabilities of making errors but
usually only direct control of $\alpha$, Type I error, is possible. In
the past, it has been common only to report the statistical significance
of results when $\alpha < 0.05$ or $\alpha < 0.01$ (the chance that a
rejection of $H_0$ is in error is less than 1 in 20 or 1 in 100,
respectively). It is unfortunate that "nonsignificant" results, or
results judged significant at values of $\alpha$ larger than 0.05, are
often regarded as unpublishable. In many early experiments or fields of
high variation and complexity, a probability of error of
$\alpha = 0.10$, 0.25 or even 0.50 may be acceptable.

Basically, research is an organized method for finding out what will
not work as well as what will work. Those who object to this idea fail
to appreciate that most positive knowledge is gained through acquisition
of negative knowledge. It is important not to place undue emphasis on
the exact level of significance achieved in a particular instance. Too
many scientists use the significance test for support instead of
illumination.

CHAPTER IV

TREND DETECTION METHODS

The statistical process of detecting trends in lake water quality
data is a step-wise procedure and is illustrated in Figure 14. The
process starts with the need to determine if some lake water quality
parameter has changed or is changing over time (hypothesis formation).
Data to answer this question are then placed in a usable form so that
appropriate statistical techniques can be applied. The data are then
plotted to yield general information about the change in the parameter
over time. Exploratory data analysis is conducted to indicate which
statistical technique should be applied. Finally, appropriate
statistical tests are applied to the data, to test the hypothesis of
concern. Thus, quantitative information is generated concerning trends
in lake water quality parameters. The time series data may also be
modeled. The model can be used to describe the pattern(s) of change and
to forecast future events. While these steps appear discrete on the
diagram, in practice, information from one phase may often be used in
another. In the following sections each of these phases will be
discussed.

Figure 14. Flow diagram for Trend Detection Methods: Hypothesis
Formation, Data Preparation, Data Analysis, Statistical Tests, and
(optional) Time Series Modeling.

Hypothesis Formation

The problem of asserting the existence or nonexistence of trends in
lake water quality data may be treated in the classical hypothesis
testing framework (discussed in Chapter III). In this case, the null
hypothesis, $H_0$, is that there is no change (no trend) in the
underlying population (lake) from which the data set (water quality
parameter) was drawn. The alternative hypothesis, $H_1$, may be either a
hypothesis that a trend does exist in the data (two-sided test) or that a
positive (or a negative) trend exists in the data (one-sided test). If
the concern is, "has there been a change in a given parameter," the
power of a test is extremely important. The power (one of four possible
outcomes, Figure 11) gives the probability, at a fixed confidence level,
that the statistical test detects a trend of a specified magnitude when,
in truth, there really is one. However, if the concern is of no change or
trend, then the confidence level ($1 - \alpha$) is crucial, for it is the
probability of concluding that $H_0$ is true, given $H_0$ is actually
true in nature.

In developing a hypothesis there are three concerns: 1) selection
of water quality parameter(s), 2) selection of a measure of central
tendency, and 3) selection of a one- or two-tailed alternative
hypothesis. The selection of a water quality parameter (discussed in
Chapter II) will determine the limnological variable or process about
which information will be generated. Hence, it is important to select
the parameter that will provide the information about the question of
concern. The selection of a measure of central tendency (discussed in
Chapter III) is dependent upon: 1) what information is desired, and
2) the characteristics of the data and the distribution of the data.
Ideally, one would like to have numerous observations in each time
interval, so that the data's distribution can be analyzed and a measure
of central tendency can be confidently chosen. However, in reality, one
usually has limited data which were unevenly sampled through time using
various methods. Hence, a subjective decision must be made, taking into
consideration all the facets of each individual question, as to which
measure of central tendency should be used in a given situation. The
third issue in forming the hypotheses is whether a one- or two-tailed
alternative is desired. If the direction of any change (increase or
decrease) is known in advance, a one-tailed $H_1$ should be used. A
one-tailed test maximizes the probability of detecting a change in the
stated direction by placing all of the rejection region ($\alpha$) in one
tail of the outcome distribution. However, if the type of change is not
known, a two-tailed test should be used. It should be stressed that,
when possible, the one-tailed alternative should be chosen. The
construction of the two hypotheses ($H_0$ and $H_1$) may be simple or
complex, depending on the question being asked and the data available.
Hypothesis formation is an extremely important phase, for as with all
statistical tests, only the hypothesis stated is being tested. Hence, if
this information is going to be used in planning and management, one
must be sure of what is being tested.

Data Preparation

 

Before any analysis can be conducted, the data must be prepared or
placed in the desired form. By this, it is meant that the data be
expressed as some measure of central tendency, such as the mean or the
median. In order to apply trend detection techniques, there can be only
one data point for each time unit. For example, when dealing with
nitrogen changes over ten years, in yearly increments, only one data
point for each year can be used. The data preparation problem arises
when numerous observations are located in the same time unit, yet one
value is needed to represent that discrete time unit. If the data are
continuous throughout the interval and equally spaced, then any measure
may be directly calculated and used. However, when unequal sampling
occurs over time, the data must be corrected. Two methods that can be
used are regression analysis and polynomial models (regression is
discussed in Chapter III, polynomial models are discussed later in this
chapter). For either, a model is constructed with the data for that time
unit, and an estimate for the parameter is calculated at the middle of
the time unit. Thus, one parameter value that represents that discrete
time unit is generated. Data preparation, as in hypothesis formation, is
highly variable and may be simple or complex depending on the situation.
When dealing with numerous observations in a time unit, the one value
used for testing the null hypothesis should best represent the parameter
in the fashion desired.
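For the simple case in which observations are evenly spaced within each time unit, the reduction to one value per time unit can be sketched as below. The observations are hypothetical, and `one_value_per_time_unit` is a helper introduced only for illustration; unevenly sampled data would instead call for the regression or polynomial approach described above.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical (year, concentration) observations; several fall in the same year.
observations = [
    (1975, 31.0), (1975, 27.0), (1975, 35.0),
    (1976, 22.0), (1976, 26.0),
    (1977, 18.0),
]


def one_value_per_time_unit(obs, summary=mean):
    """Collapse repeated observations to a single summary value per time unit."""
    grouped = defaultdict(list)
    for time_unit, value in obs:
        grouped[time_unit].append(value)
    return {t: summary(values) for t, values in sorted(grouped.items())}


yearly_means = one_value_per_time_unit(observations, mean)
yearly_medians = one_value_per_time_unit(observations, median)
print(yearly_means)
print(yearly_medians)
```

Passing a different `summary` function changes the measure of central tendency without changing the rest of the procedure, which keeps that choice explicit.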

Data Analysis

 

Once a hypothesis is formed and the data are properly arranged
(i.e., one data value per time unit) the data are ready to be explored
and analyzed. The data analysis step will provide the necessary
information to determine which method (statistical test) should be used
to test the null hypothesis. This information may also be used to
reexamine the hypothesis formation and data preparation steps. Thus,
after examining the data, one may find it necessary to restate the null
and alternative hypotheses and/or express the data in a given time unit
in a different manner. As noted before, the tasks in the first three
sections of this chapter are interdependent, and the outcome of one may
require reevaluation of one or both of the others. This same
interrelationship occurs within this section. While this step should
follow the flow diagram in Figure 14, it may be necessary, based on the
outcome in a subsection, to change the flow. For example, if the data
are non-normal when tested, one may wish to transform the data and retest
for normality.

Graphical Techniques

The graphical techniques presented help provide a general knowledge
about the parameter in the time series data. These plots provide: 1) a
visual test for trends, 2) a check on validity of assumptions (normality,
homogeneous variance), and 3) an examination of outliers. Although
qualitative in nature, these graphs provide an important starting point
for all data analysis in trend detection.

The first, and simplest, is the time series plot. The time series
plot is a plot of the data (parameter) against time. These plots may be
useful in visually detecting trends and should always be done. The plot
may be expanded to include the standard deviation or quartile range
around each value. Figure 15 shows a time series plot of the mean and
standard deviation for mean annual ammonia concentrations in Lake
Ontario from 1967-1977. From looking at the mean values only, one would
expect a decreasing trend, but by including the standard deviation range,
one may question the strength of the visual evidence. It may also be
useful to plot both the mean and median (or other combination of mea-
sures of central tendency) to provide more information about the overall
pattern and distribution of the parameter, in question, over time.

A slightly more sophisticated technique is the cumulative sum plot
(Lettenmaier, 1977). The cumulative sum is simply the area under the
local mean up to point "j" minus a fraction (j/n) of the area under the
global mean.

Figure 15. Time series plot of the mean and standard deviation of mean
annual ammonia (NH3) concentrations in Lake Ontario.

The cumulative sum is defined as

    CU_j = \sum_{i=1}^{j} x_i - \frac{j}{n} \sum_{i=1}^{n} x_i        (31)

where: CU_j = jth cusum point
       x_i = ith data point
       j = subinterval of n
       n = sample size

The spacing of the j values is determined by the sample size (n) and the
level of information desired. For example, if n = 100, one may compute
the cusum at every tenth point (j = 10, 20, ..., 100), so that 10 cusum
points are plotted. If more information (and more computational effort)
is desired, a smaller spacing may be chosen. This provides more points
and allows for a better representation of the trend in the data. Up to
n - 1 informative cusum values may be calculated for any set of data,
since the cusum at j = n is always zero. If the mean is constant (no
trend), the cusum fluctuates about zero; if there is a step change in
the mean level, the cusum is linear; if a linear change exists, the
cusum is quadratic. These three possibilities are shown in Figure 16,
with t being the trend starting point.
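
The cusum calculation of Equation 31 can be sketched in a few lines of
Python. This is a minimal illustration, not part of the original method
description; the function name `cusum`, the `step` argument, and the
example series are assumptions made here for demonstration.

```python
def cusum(x, step=1):
    """Return (j, CU_j) pairs for j = step, 2*step, ... up to n (Equation 31)."""
    n = len(x)
    total = sum(x)
    points = []
    running = 0.0
    for j, value in enumerate(x, start=1):
        running += value  # partial sum of x_1 .. x_j
        if j % step == 0:
            # partial sum minus the fraction j/n of the full sum
            points.append((j, running - j * total / n))
    return points


# Hypothetical series with a step increase after the third observation.
series = [0, 0, 0, 1, 1, 1]
for j, value in cusum(series):
    print(j, value)
```

For this hypothetical series the cusum falls linearly until the step and
then returns linearly to zero, which is the step-trend shape described
above.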

Figure 16. CUSUM plot of step, linear, and no trend.

The histogram (frequency plot) is a useful technique for examining
the normality of the data's distribution. A normal distribution will
have a bell-shaped curve, and its mean, median, and mode should all be
identical. Two statistics that describe the shape of distributions are
skewness and kurtosis, based on the third and fourth moments,
respectively. Skewness, or asymmetry, means that one tail of the
frequency curve is drawn out more than the other. Curves are skewed to
the right or left, depending upon whether the right or left tails are
drawn out. Kurtosis is a measure of the peakedness of the distribution.
A leptokurtic curve, which has a higher peak than the normal curve,
results from having more data near the mean and in the tails and fewer
data in the intermediate region. Platykurtic (flattened) curves have
more data in the intermediate region and fewer at the mean and in the
tails. Figure 17 shows a normal curve along with skewed right, skewed
left, leptokurtic, and platykurtic distributions. The calculation of
skewness and kurtosis statistics is complicated and one should consult
a statistical textbook for the procedure (e.g., Sokal and Rohlf, 1969).
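
As a rough check before turning to a textbook procedure, simple
moment-based skewness and kurtosis values can be computed as sketched
below. These are the uncorrected moment ratios (m3 / m2^1.5 and
m4 / m2^2 - 3), not the bias-corrected statistics of Sokal and Rohlf
(1969); the function name `moment_shape` and the example data are
assumptions for illustration.

```python
def moment_shape(x):
    """Return (skewness, excess kurtosis) from the central moments of x."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in x) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in x) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2 - 3.0  # zero for a normal distribution
    return skewness, kurtosis


# Hypothetical data: one large value draws out the right tail.
print(moment_shape([1, 1, 1, 1, 10]))
```

A positive skewness value indicates a distribution skewed right, a
negative value one skewed left; positive and negative kurtosis values
correspond to leptokurtic and platykurtic shapes, respectively.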

The sample residuals (errors) may be plotted to check the validity
of assumptions on homogeneous variance (Behnken and Draper, 1972). The
sample residuals are estimates of the random errors and may be
calculated from:

    e_i = X_i - \bar{X}        (32)

where: X_i = ith data point
       \bar{X} = mean of all X's

For use with regression analysis, the sample residuals are computed from:

    e_i = (Y_i - \bar{Y}) - b_1 (X_i - \bar{X})        (33)

where: Y_i = ith Y observation or data point
       \bar{Y} = mean of all Y's
       b_1 = estimated regression line slope

A plot of e_i against X_i, or of e_i against Y_i (for regression),
should produce a horizontal band of points. If the band of points
increases or decreases in width, then heterogeneous variances usually
are indicated.

Figure 17. Normal, skewed, kurtotic distributions (panels: Normal,
Skewed Right, Skewed Left, Leptokurtic, Platykurtic).

If the band of points is fairly uniform in width but is not horizontal,
transformations or the addition of higher-degree polynomial terms to
the model may be necessary. If an unusually large e_i is observed
("outlier"), the corresponding X_i or Y_i should be identified to
discover the possibility of a recording error or an unusual condition in
the system. Techniques for examining "outliers" are discussed later in
this section. While these techniques may lead one to identify a
so-called outlier, the exclusion of data should occur only when concrete
evidence for removal is present, and not merely because test statistics
or plotting techniques suggest it.
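
The residual definitions in Equations 32 and 33 can be sketched as
follows. The slope b_1 is taken here to be the ordinary least squares
estimate, which is an assumption of this sketch; the function names and
example data are likewise hypothetical.

```python
def mean_residuals(x):
    """Residuals about the mean: e_i = X_i - mean(X) (Equation 32)."""
    mean = sum(x) / len(x)
    return [v - mean for v in x]


def regression_residuals(x, y):
    """Residuals about a fitted line (Equation 33), with b1 assumed to be
    the ordinary least squares slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return [(yi - y_bar) - b1 * (xi - x_bar) for xi, yi in zip(x, y)]


# Hypothetical yearly values; print each residual beside its time index.
years = [1, 2, 3, 4, 5]
values = [2.1, 2.9, 4.2, 4.8, 6.0]
for t, e in zip(years, regression_residuals(years, values)):
    print(t, round(e, 3))
```

Plotting the printed residuals against the time index gives the band of
points discussed above.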

For sample sizes less than 50, these plotting techniques may not
provide enough information to make proper decisions. Even with large
sample sizes, the statistical tests which follow should be utilized to
provide the maximum assurance that the assumptions required for the

usage of the statistical tests are not violated.

Distribution Tests

An assumption central to most parametric statistical tests is that
the data come from a population that exhibits a normal distribution.
This normality assumption is relaxed by nonparametric statistics, which
require only a continuous distribution. In the previous section, graph-
ical techniques were used to provide qualitative information on the form
of the data's distribution. This section provides a statistical test of
the assumption of normality. Two important considerations of this test
are: 1) the test can show only that the data are nonnormal, not that
they are normal (i.e., it can support the alternative hypothesis but
cannot prove the


null hypothesis) and 2) when the sample size is small it is extremely
difficult to reject the null hypothesis even if the data deviate grossly
from normality or symmetry. Therefore, when dealing with small
sample sizes, one should use an appropriate statistical test and also
intuitive knowledge of the parameter and the water system in determining
whether to use parametric or nonparametric statistical tests for trend
detection.

The W-test is used to evaluate the assumption that a sample has a
normal distribution for sample sizes n \le 50 (Shapiro and Wilk, 1965).
The W-test has been shown to be an effective technique for evaluating
the assumption of normality against a wide spectrum of deviations from
normality such as skewness and kurtosis of the distribution (Atkinson,
1967; Chen, 1971).

The W-test is used as follows:

     Step 1. Rearrange the data to obtain the ordered array
             X_1, X_2, ..., X_n, where X_1 \le X_2 \le ... \le X_n.

     Step 2. Compute the variance (s^2) of the data (Equation 11
             from the previous chapter).

     Step 3. If n is even, set k = n/2; if n is odd, set
             k = (n - 1)/2. Then compute

                 b = \sum_{i=1}^{k} a_{n-i+1} (X_{n-i+1} - X_i)        (34)

             where the values of a_{n-i+1} for i = 1, ..., k are
             given in Table 1 of Appendix B for n = 3, ..., 50.

     Step 4. Compute

                 W = b^2 / [(n - 1) s^2]        (35)

             where (n - 1) s^2 is the sum of squared deviations
             about the mean, the denominator used by Shapiro and
             Wilk (1965). If the test statistic W is less than the
             critical value W_{\alpha, n} from Table 2 of Appendix B,
             one may reject the hypothesis of normality with
             probability of Type I error less than \alpha.
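
The steps of the W-test can be sketched as below. The coefficients
a_{n-i+1} must be supplied from the published table (Table 1 of
Appendix B); they are passed in as an argument rather than generated.
The sketch assumes the denominator of W is the sum of squared deviations
about the mean, (n - 1) s^2, as in Shapiro and Wilk (1965). The function
name and example data are hypothetical; the single coefficient used for
n = 3 is 1/sqrt(2) = 0.7071.

```python
import math


def shapiro_wilk_w(data, coeffs):
    """Compute W from ordered data and tabled coefficients.

    coeffs[i - 1] must hold a_{n-i+1} for i = 1, ..., k.
    """
    x = sorted(data)                      # Step 1: ordered array
    n = len(x)
    k = n // 2                            # n/2 if even, (n - 1)/2 if odd
    if len(coeffs) != k:
        raise ValueError("expected %d coefficients" % k)
    mean = sum(x) / n
    ss = sum((v - mean) ** 2 for v in x)  # Step 2: (n - 1) * s^2
    # Step 3: weighted sum of differences between mirrored order statistics
    b = sum(coeffs[i] * (x[n - 1 - i] - x[i]) for i in range(k))
    return b * b / ss                     # Step 4


# Hypothetical three-point sample; for n = 3 the only coefficient is 0.7071.
w = shapiro_wilk_w([4, 1, 2], [math.sqrt(0.5)])
print(round(w, 4))
```

The resulting W is then compared with the tabled critical value for the
chosen \alpha and n.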
For large sample sizes (n > 50) the Kolmogorov-Smirnov test is sug-
gested (Siegel, 1956; Sokal and Rohlf, 1969).
A quick and easy measure that displays the general shape of the
data's distribution is the mode to mean ratio (R) (Springer and Gifford,
1980). It is calculated using the following:

    R = [1 + (C.V.)^2]^{-3/2}        (36)

where: C.V. = coefficient of variation (Equation 13)

If R < 1, the distribution is skewed right. If R = 1, the distribution
is normal or symmetric. If R > 1, the distribution is skewed left.
This should not be substituted for a statistical test for normality, but
may be used as a quick check technique.

Homogeneous Variance

In the biological sciences there is a common tendency toward
positive correlation of the mean and variance over a wide range of
a specific variable (i.e., groups with large means tend to have large
variances and those with smaller means have smaller variances) (Gill,
1978). A check of the assumption of homogeneous variance is done
by testing the hypothesis H_0: \sigma_1^2 = \sigma_2^2 = ... = \sigma_t^2,
where t is the number of treatments (in this case, the number of time
units). The first test described is a quick and easy method, while the
second is more efficient and capable of dealing with uneven numbers of
data within the different time units.

Hartley's f_max test (Hartley, 1950) involves the ratio of the
largest to the smallest of the variances within groups:

    f_{max} = s_{max}^2 / s_{min}^2        (36)

where: f_{max} = test statistic
       s_{max}^2 = maximum variance of all treatments (time units)
       s_{min}^2 = minimum variance of all treatments (time units)

If the test statistic exceeds the critical value f_{max, \alpha, t, \nu}
(from Table 3 of Appendix B) for t groups with \nu = r - 1 df (r equals
the number of data points within each time step or unit), one may
reject H_0. If replication within each group is only slightly unequal,
the critical value should be corrected by using \nu = (\sum r_i / t) - 1.
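
A minimal sketch of the f_max calculation follows. The grouping of data
by time unit, the function names, and the example values are assumptions
for illustration; the critical value must still be read from the table.

```python
def sample_variance(x):
    """Unbiased sample variance with n - 1 in the denominator."""
    n = len(x)
    mean = sum(x) / n
    return sum((v - mean) ** 2 for v in x) / (n - 1)


def hartley_fmax(groups):
    """Ratio of the largest to the smallest within-group variance."""
    variances = [sample_variance(g) for g in groups]
    return max(variances) / min(variances)


# Hypothetical data: three time units with three observations each.
groups = [[1, 2, 3], [2, 4, 6], [5, 6, 8]]
print(hartley_fmax(groups))
```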

Bartlett's test (Bartlett, 1937) for homogeneity of variances is
used because it provides good efficiency and handles uneven replication
within groups.

The following is the procedure for Bartlett's test:

     Step 1. Convert the variances for X's to logarithms. It is
             convenient to add 1 to the data when zeros are present
             and multiply by 10, or a power of 10, to avoid
             technical problems with logarithms.

     Step 2. Compute

                 X^2 = 2.3026 [ \sum_{i=1}^{a} (n_i - 1) \log s^2
                                - \sum_{i=1}^{a} (n_i - 1) \log s_i^2 ]        (37)

             where: a = number of X's (groups)
                    s_i^2 = ith variance
                    n_i = number of data points in the ith group

                    s^2 = \frac{\sum_{i=1}^{a} (n_i - 1) s_i^2}{\sum_{i=1}^{a} (n_i - 1)}

The constant 2.3026 transforms the common logarithms to natural ones,
which are needed for the formula. If this value X^2 exceeds
\chi^2_{\alpha, a-1} from Table 4 and Table 5 of Appendix B, one may
reject H_0. If the computed X^2 is nearly equal to the critical value, a
correction factor should be used (see Sokal and Rohlf, 1969).
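
The Bartlett statistic can be sketched as follows, using common
logarithms and the constant 2.3026 as in the text. The function names
and example groups are hypothetical, no correction factor is applied,
and every group is assumed to have a nonzero variance (the logarithm of
zero is undefined).

```python
import math


def sample_variance(x):
    """Unbiased sample variance with n - 1 in the denominator."""
    n = len(x)
    mean = sum(x) / n
    return sum((v - mean) ** 2 for v in x) / (n - 1)


def bartlett_statistic(groups):
    """Uncorrected Bartlett X^2 for a list of groups of unequal size."""
    dfs = [len(g) - 1 for g in groups]
    variances = [sample_variance(g) for g in groups]
    # pooled variance: weighted average of the group variances
    pooled = sum(df * v for df, v in zip(dfs, variances)) / sum(dfs)
    return 2.3026 * (
        sum(dfs) * math.log10(pooled)
        - sum(df * math.log10(v) for df, v in zip(dfs, variances))
    )


# Hypothetical groups with variances 1 and 4.
print(round(bartlett_statistic([[1, 2, 3], [2, 4, 6]]), 4))
```

The statistic is zero when all group variances are equal and grows as
they diverge.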
Examination of Outliers

If it can be determined that an extreme observation is the result of
an error in recording data or in parameter value determination, or that
it is foreign to the defined population, the outlier may be excluded.
Otherwise, the observation should be used in the computations, unless
one has adopted some consistent procedure of testing outliers or of
censoring or "Winsorizing" (replacing an extreme value by the next most
extreme observation) all extreme data (Tukey, 1962). However, arbitrary
exclusion of data that do not conform to one's perceived ideas can lead
to serious bias in estimation as well as underestimation of
experimental error.

The procedures of Grubbs and Beck (1972) may be used as an approximate
test of whether the magnitude of the largest residual is so large
that the corresponding value may be excluded. The procedure is as
follows:

     Step 1. Compute residuals.

     Step 2. Select the largest residual (e_L), either
             |e_max| or |e_min|.

     Step 3. Compute the test statistic

                 A = e_L / s        (38)

             where: A = test statistic
                    s = standard deviation

If the test statistic exceeds the critical value A_{\alpha, n} from
Table 6 of Appendix B, one may exclude the corresponding data point. One
should choose a small Type I Error (\alpha = 0.01) for rejection of an
outlier, because a Type I Error would result in the exclusion of an
extreme but valid observation, which would lead to a biased estimate of
\sigma^2 (too small) and distortion of confidence statements. A Type II
Error, by contrast, would lead to an inflated estimate of \sigma^2 and
conservative confidence statements about means.
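
The three steps above can be sketched as follows, taking the residuals
about the sample mean (Equation 32) and s as the sample standard
deviation with n - 1 in the denominator. These choices, the function
name, and the example data are assumptions of the sketch; the critical
value must be read from the table.

```python
import math


def largest_residual_statistic(x):
    """Return (index, A) for the observation with the largest |residual|."""
    n = len(x)
    mean = sum(x) / n
    residuals = [v - mean for v in x]                        # Step 1
    s = math.sqrt(sum(e * e for e in residuals) / (n - 1))
    idx = max(range(n), key=lambda i: abs(residuals[i]))     # Step 2
    return idx, abs(residuals[idx]) / s                      # Step 3


# Hypothetical sample in which the last value looks extreme.
index, a_stat = largest_residual_statistic([10, 10, 10, 10, 20])
print(index, round(a_stat, 4))
```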

Graphical methods, statistical tests, and intuitive knowledge must
be used in considering whether to remove an outlier. When outliers are
removed one must: 1) have sound evidence, 2) be capable of supporting
such a removal, and 3) be held responsible for the removal. One final
point when dealing with outliers is that in some cases an extreme value
or outlier may be more interesting than the rest of the data.

Time Dependency of Data

A very important issue when dealing with time series data in lakes,
especially those with long detention times, is time correlated data
(Yule, 1921; Bartlett, 1935). This time correlation arises from the
fact that the value of a data point is dependent, to some degree, upon
data points previous to it in time. This time correlation of data is
termed autocorrelation or serial correlation. In a lake system, an
annual concentration will depend to some degree on the concentration in
the lake in the previous year or years. Thus, lakes with long detention
or residence times (i.e., not much change in yearly water composition)
will usually exhibit autocorrelated data. Autocorrelated data means
that the random error terms are correlated over time. If
autocorrelation is present, the assumption that the random error terms
(e_i) are independent is violated.

Consider the time series of data X_1, X_2, ..., X_n. The most
satisfactory estimate of the kth lag autocorrelation (\rho_k) is
(Jenkins and Watts, 1968; Box and Jenkins, 1968)

    r_k = C_k / C_0        (39)

where: r_k = estimate of \rho_k
       C_k = estimate of \gamma_k (autocovariance)
       C_0 = estimate of \gamma_0

An estimate of \gamma_k, the autocovariance of the kth lag, is

    C_k = \frac{1}{n} \sum_{i=1}^{n-k} (X_i - \bar{X}) (X_{i+k} - \bar{X})        (40)

An estimate of \gamma_0 is

    C_0 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2        (41)
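
The lag-k estimate can be sketched as below, with the autocovariance and
variance estimates both divided by n. The function name and the example
series are hypothetical.

```python
def autocorrelation(x, k):
    """Estimate the lag-k autocorrelation r_k = C_k / C_0."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x) / n
    ck = sum((x[i] - mean) * (x[i + k] - mean) for i in range(n - k)) / n
    return ck / c0


# Hypothetical annual series with a steady rise.
series = [1, 2, 3, 4, 5]
print(autocorrelation(series, 1))
```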

The Durbin-Watson test can be used to test for autocorrelation
(i.e., H_0: \rho = 0) (Durbin and Watson, 1951). The Durbin-Watson
statistic d is defined in terms of the observed residuals by

    d = \frac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2}        (42)

where: e_i = ith residual

The d statistic is compared to upper (d_U) and lower (d_L) bounds
(Table 7 in Appendix B). If d falls below d_L, the hypothesis that the
original residuals are uncorrelated is rejected, and if it falls above
d_U, the hypothesis is accepted. When d falls between d_L and d_U the
test is inconclusive and further statistics are necessary (see Durbin
and Watson, 1971). The formula above tests for positive autocorrelation,
which is by far the most common in lakes. When testing for negative
autocorrelation, the test statistic 4 - d is used. The remaining part of
the test is conducted in the same manner as for positive
autocorrelation: if 4 - d < d_L, one concludes that negative
autocorrelation (\rho < 0) exists. The two-sided test is done by
employing both of the one-sided tests separately. Thus, the Type I Error
for a two-tailed test is 2\alpha.

For more information on other methods of testing for autocorrelation,
consult Kenkel (1975) and Sen (1978, 1979).
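
A minimal sketch of the Durbin-Watson statistic follows, taking d as the
sum of squared successive differences of the residuals divided by the
sum of squared residuals (the usual definition). The function name and
the example residuals are hypothetical.

```python
def durbin_watson(residuals):
    """Durbin-Watson d from a time-ordered list of residuals."""
    num = sum(
        (residuals[i] - residuals[i - 1]) ** 2
        for i in range(1, len(residuals))
    )
    den = sum(e * e for e in residuals)
    return num / den


# Hypothetical residuals that persist in sign (positive autocorrelation).
d = durbin_watson([1, 1, -1, -1])
print(d, 4 - d)
```

Values of d well below 2 point toward positive autocorrelation and
values well above 2 toward negative autocorrelation; the decision itself
still rests on the tabled bounds.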
Transformations

To correct for violations of the assumptions detected by analyzing
the data, transformations of the data may be used. Transformations are
valid because the linear scale has a relationship with other scales of
measurement. In essence, the scale of measurement is arbitrary;
therefore, one may transform data to a different scale of measurement so
that the transformed variates more closely satisfy the assumptions. A
fortunate fact of transformations is that often several departures from
the assumptions are simultaneously cured by the same transformation to a
new scale. For example, simply by correcting for nonnormality,
homogeneous variance may also be obtained.

When a transformation is applied, tests of significance are performed
on the transformed data. When the transformations are nonlinear
(e.g., log, square root), confidence limits computed in the transformed
scale and converted back to the original scale are asymmetrical (Sokal
and Rohlf, 1969). However, the use of the standard error in the original
scale is misleading and should not be used with transformed data. The
log and square root transformations are discussed below.

The most common of all transformations is the log transformation.
The log transformation consists of converting all the data into
logarithms, usually common logarithms. There are two situations where
the log transformation is quite effective. The first is with
right-skewed distributions (very common in lake parameters). The log
transformation of skewed data will usually result in an approximately
normal distribution. The second situation is when the mean is
positively correlated with the variance (i.e., large means have large
variances). The variance should become independent of the mean when the
data undergo a logarithmic transformation. When dealing with the log
transformation, two helpful hints are: 1) if the data include zero, add
1 to all values to avoid the log of zero (i.e., negative infinity) and
2) if there are data between 0 and 1, multiply all values by 10, or some
power of 10, to avoid the negative values that will result.
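
The two hints can be sketched as a small helper. The function name and
its `offset` and `scale` arguments are hypothetical conveniences, not
part of the original method.

```python
import math


def log_transform(x, offset=0.0, scale=1.0):
    """Common-log transform of scale * (value + offset).

    Use offset=1 when the data include zeros, and a power of 10 for
    scale when values between 0 and 1 would give negative logarithms.
    """
    return [math.log10(scale * (v + offset)) for v in x]


# Hypothetical concentrations that include a zero.
print(log_transform([0, 9, 99], offset=1))
```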

When the variance is directly proportional to the mean, or for skewed
distributions, the square root transformation may be appropriate for
obtaining approximate normality as well as homogeneous variance. When
the data include zeros, one-half should be added to each value before
taking the square root.

Other transformations are available, for example reciprocal,
arcsine, and probit. For a further discussion of these and the subject
of transformations, consult Tukey (1962), Sokal and Rohlf (1969), Neter
and Wasserman (1974), and Mosteller and Tukey (1975).

A final form of data transformation is the process of removing
seasonality in the data. This is necessary only when dealing with data
in less than yearly time units. The most common methods of removing
seasonality are differencing and modeling. In the differencing
technique, a grand mean is formed for each identical time unit (e.g.,
all the same month, all the same season), and a new time series is
formed as the difference between the raw time series and the
corresponding grand mean. Durbin and Murphy (1975) and Cleveland and
Tiao (1976) consider the use of an additive-multiplicative model and an
ARIMA model, respectively. For more information on the seasonal
adjustment of data, consult Wallis (1974).
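
The differencing technique can be sketched as follows, assuming the
series is ordered in time, starts at the first time unit of a year, and
has no missing values. The function name, the `period` argument, and the
example data are hypothetical.

```python
def remove_seasonality(values, period):
    """Subtract from each value the grand mean of its time unit.

    period is the number of time units per year (12 for months,
    4 for seasons).
    """
    grand_means = []
    for unit in range(period):
        same_unit = values[unit::period]  # e.g. every January value
        grand_means.append(sum(same_unit) / len(same_unit))
    return [v - grand_means[i % period] for i, v in enumerate(values)]


# Hypothetical two-season record over two years: low season, high season.
print(remove_seasonality([1, 10, 3, 12], period=2))
```

In this hypothetical record the seasonal swing disappears and only the
year-to-year increase remains in the adjusted series.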
Statistical Tests for Trend Detection

The final step in the trend detection method is to compute a test
statistic using an appropriate statistical test. This test statistic
is compared to a table of critical levels and the strength of evidence
(i.e., level of alpha that rejects H_0) is determined. Classically, the
alpha level has been chosen a priori (i.e., before the data are
examined). In the case of detecting trends in lake parameters the
concern is with the degree of certainty that the data exhibit some type
of trend. Thus, the alpha level at which H_1 (i.e., that a trend exists)
is accepted should be reported to provide the quantitative information
needed for planning, policy and management decisions. Three cases of
data are possible, based on type of distribution and time dependency,
and these determine which statistical test is to be used. The three
cases are:

1. Normal and Independent Data

2. Symmetric and Independent Data

3. Dependent Data
For each case, statistical tests are discussed to allow computation of

the necessary statistics to test for the presence of trends.

Normal and Independent Data

All classical parametric statistics require that the data come from
a normal population and are independent of each other (i.e., independent
in time for trend analysis). However, when dealing with lake systems
the validity of these assumptions is highly questionable. Many lake
parameters will exhibit skewed distributions which may be converted to
normality through transformations of the data. A more important issue
is the small sample size of data usually available for trend detection.
As discussed before, it is difficult to disprove normality on small data
sets, even when the data grossly deviate from normality. The issue of
autocorrelation in lake data is usually more of a problem than is nor-

mality. Since most lakes do not renew their total volume of water very


quickly, one data point collected in time will be dependent upon the
previous data point in time (to some degree). Since the majority of
lake parameters are autocorrelated over time, the assumption of inde-
pendent data is violated. The violation of the independent data assump-
tion has been almost completely ignored in past lake trend analyses.
While it is unlikely that both of these assumptions would be valid,
especially independence, it may be possible in some situations. There-
fore, a statistical test is provided for this case. If doubt exists
about these assumptions, one should use a test from one of the other
cases that do not require the assumptions.

When testing for the difference between means of normal variables,
four cases, for which different methods are needed, may be distinguished:
1) known variances or large sample size, 2) unknown but equal variances,
3) unequal variances but equal coefficients of variation, and 4) unequal
variances and coefficients of variation (Gill, 1978). If the variances
of the populations are known from prior research or the sample size is
large (n > 200), case 1 should be used. Cases 3 and 4 are less powerful
than case 2 and are needed only when variances cannot be made equal
through transformations. Since cases 1, 3, and 4 are rare, only the
second case will be discussed here. For cases 1, 3, and 4 a statistical
textbook should be consulted (e.g., Gill, 1978; Sokal and Rohlf, 1969).

Case 2 is used when the variances are unknown but the hypothesis
of equal variances has been accepted (the test for homogeneous variance
is in an earlier section). The statement that the variances are unknown
implies that the true population variance is not known, while a variance
estimate, calculated from the sample data, is known and used to test
for equal variances. The variances discussed here are for the two
populations that are designated by the null hypothesis and not the
individual variances for each time unit, as was tested for in the
homogeneous variance section.

This case involves the use of the well known two-sample t-test.
The test statistic is computed from

    t = \frac{\bar{X}_1 - \bar{X}_2}{s \sqrt{(1/n_1) + (1/n_2)}}        (43)

where: \bar{X}_1 = mean of population 1
       \bar{X}_2 = mean of population 2
       s = standard deviation of population 1 and 2
       n_1 = sample size of population 1
       n_2 = sample size of population 2

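The standard deviation (s) for the total population (populations 1
and 2) is

    s = \sqrt{ \frac{ \sum_{i=1}^{r_1} X_{1i}^2 - (\sum_{i=1}^{r_1} X_{1i})^2 / r_1
                      + \sum_{i=1}^{r_2} X_{2i}^2 - (\sum_{i=1}^{r_2} X_{2i})^2 / r_2 }
                    { r_1 + r_2 - 2 } }        (44)

where r_1 and r_2 are the numbers of data points in populations 1 and 2.
The test statistic is compared to the critical values (Table 8 of
Appendix B):

    -t_{\alpha, r_1 + r_2 - 2}   for H_1: \mu_1 < \mu_2

     t_{\alpha, r_1 + r_2 - 2}   for H_1: \mu_1 > \mu_2

If the test statistic is more extreme than the appropriate critical
value, the null hypothesis is rejected and a trend is indicated with the
probability of a Type I Error less than \alpha.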
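
The pooled two-sample t statistic can be sketched as follows, using the
usual pooled standard deviation (corrected sums of squares of both
groups divided by n_1 + n_2 - 2) and a square root over the
(1/n_1 + 1/n_2) term. The function name and the example data are
hypothetical; the critical value must be read from a t table.

```python
import math


def pooled_t(x1, x2):
    """Return (t, degrees of freedom) for the equal-variance t-test."""
    n1, n2 = len(x1), len(x2)
    mean1 = sum(x1) / n1
    mean2 = sum(x2) / n2
    # corrected sums of squares: sum(X^2) - (sum X)^2 / n
    ss1 = sum(v * v for v in x1) - sum(x1) ** 2 / n1
    ss2 = sum(v * v for v in x2) - sum(x2) ** 2 / n2
    df = n1 + n2 - 2
    s = math.sqrt((ss1 + ss2) / df)
    t = (mean1 - mean2) / (s * math.sqrt(1.0 / n1 + 1.0 / n2))
    return t, df


# Hypothetical early-period and late-period observations.
t_stat, df = pooled_t([1, 2, 3], [4, 5, 6])
print(round(t_stat, 4), df)
```

A negative t here reflects a lower mean in the first period than in the
second.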


Symmetric and Independent

By relaxing the assumption of normal distribution to symmetric dis-
tribution, while still assuming independent data, nonparametric tests
may be used. Nonparametric tests should also be used if the data have a
small sample size to bypass the assumption of normality. Again, it
should be stressed that assuming independent data in lake systems is ex-
tremely questionable. The nonparametric tests suggested are the Mann-
Whitney and Spearman Rho, and these should be applied to step and linear
trends, respectively.

The Mann-Whitney test is one of the most powerful of the
nonparametric tests, especially when dealing with step changes between
the populations (Conover, 1971; Siegel, 1956). The data consist of two
random samples. Let X_1, X_2, ..., X_{n_1} denote the random sample of
size n_1 from population 1, and let Y_1, Y_2, ..., Y_{n_2} denote the
random sample of size n_2 from population 2 (defined by the null
hypothesis). The test statistic is calculated in the following manner.

     Step 1. Rank the combined sample of X's and Y's from
             smallest to largest. For ties, take the average of the
             ranks that would have been assigned to them had
             there been no ties.

     Step 2. Sum the ranks for the X's

                 S = \sum_{i=1}^{n_1} R(X_i)

     Step 3. Compute the test statistic

                 T = S - \frac{n_1 (n_1 + 1)}{2}        (45)

The lower critical values W_{\alpha, n_1, n_2} are found in Table 9 of
Appendix B. Substitute n_1 for n and n_2 for m in Table 9. The upper
critical level is computed by:

    W_{1-\alpha} = n_1 n_2 - W_{\alpha}        (46)

As an alternative to using upper critical values, the statistic T',
defined as

    T' = n_1 n_2 - T        (47)

may be used with the lower critical values whenever an upper tailed
test is desired.

The following decision rules should be used for their respective
alternative hypotheses.

    H_1: \mu_1 < \mu_2    Reject H_0 when T is less than W_{\alpha}

    H_1: \mu_1 > \mu_2    Reject H_0 when T is greater than W_{1-\alpha}
                          or if T' is less than W_{\alpha}

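
The ranking and summation steps of the Mann-Whitney statistic can be
sketched as below, with tied values given the average of their ranks.
The function names and the example samples are hypothetical; the
critical values still come from the table.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[pos]]):
            end += 1
        avg = (pos + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def mann_whitney_t(x, y):
    """Return (T, T') where T = S - n1(n1 + 1)/2 and T' = n1*n2 - T."""
    ranks = average_ranks(list(x) + list(y))    # Step 1
    n1, n2 = len(x), len(y)
    s = sum(ranks[:n1])                         # Step 2: ranks of the X's
    t = s - n1 * (n1 + 1) / 2                   # Step 3
    return t, n1 * n2 - t


# Hypothetical samples before (x) and after (y) a suspected step change.
print(mann_whitney_t([1, 2, 3], [4, 5, 6]))
```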
When testing for continuous (i.e., linear) trends, the Spearman
Rho test is suggested (Conover, 1971; Siegel, 1956). The data X_1,
X_2, ..., X_n are used to compute a test statistic:

    T = \sum_{i=1}^{n} [R(X_i) - i]^2        (48)

where: R(X_i) = rank of the ith observation X_i in the sample of
                size n

The null hypothesis is rejected if T is less than W_{\alpha, n}
(H_0: \mu_1 \ge \mu_2) or if T is greater than W_{1-\alpha}
(H_0: \mu_1 \le \mu_2), with W_{\alpha} values found in Table 10 of
Appendix B. The quantiles of the Hotelling-Pabst test statistic are
used in Table 10.
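
The statistic of Equation 48 can be sketched as follows. Tied values are
given average ranks, which is an assumption of this sketch; the function
names and example series are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[pos]]):
            end += 1
        avg = (pos + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def hotelling_pabst_t(x):
    """T = sum over i of (rank of X_i minus its time index i) squared."""
    ranks = average_ranks(x)
    return sum((r - i) ** 2 for i, r in enumerate(ranks, start=1))


# Hypothetical series: a steadily rising and a steadily falling record.
print(hotelling_pabst_t([1, 2, 3, 4]), hotelling_pabst_t([4, 3, 2, 1]))
```

A perfectly increasing series gives T = 0, and a perfectly decreasing
series gives the largest possible value, n(n^2 - 1)/3.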


Dependent Data

When dealing with lake parameters over time, time dependency is al-
most certain to occur. This autocorrelation leads to violation of the
assumption of independent data. The sensitivity of statistical proce-
dures to violations of the independence assumption is well known
(Gastwirth and Rubin, 1971, 1975; Serfling, 1968). Thus, some procedure
is needed if quantitative information is desired about trends in lake
parameters over time. Two similar approaches are those of Lettenmaier
(1976, 1977) and Sen (1963, 1965). The test statistics proposed by Sen
were shown to be asymptotically normal for large sample sizes (n > 1000),
but they were found unsuited for application to small and medium sample
sizes (Lettenmaier, 1976). The method suggested here is that proposed
by Lettenmaier (1976, 1977). In this method, the test critical levels
are corrected based on the degree of dependence. The correction values
were generated using Monte Carlo simulation. The data dependence is as-
sumed to be of the lag one Markov type. Test rejection levels were gen-
erated based on p, n, a, and trend magnitude and compared to the inde-
pendent case. For a more detailed discussion of the generation of the
correction values consult Lettenmaier (1976, 1977).

When testing the trend hypothesis with dependent data, the following
procedure should be used.

     Step 1. Calculate the test statistic from the data using the
             Mann-Whitney or the Spearman Rho test for the step or
             linear trend hypothesis, respectively.

     Step 2. The modified critical level is calculated as the
             lower or upper bound of the test statistic plus a
             scaled difference:

                 W'_{\alpha,l} = L_l + f(n, \alpha, \rho) (W_{\alpha,l} - L_l)        (49)

                 W'_{\alpha,u} = L_u + f(n, \alpha, \rho) (W_{\alpha,u} - L_u)        (50)

             where: W'_{\alpha,l} = modified lower critical level
                    W'_{\alpha,u} = modified upper critical level
                    L_l = lower bound of the test statistic outcome
                          distribution
                    L_u = upper bound of the test statistic outcome
                          distribution
                    f(n, \alpha, \rho) = correction value from Table 11
                          or 12 of Appendix B for the Mann-Whitney or
                          Spearman Rho test, respectively
                    W_{\alpha,l} = lower critical level for independent
                          data
                    W_{\alpha,u} = upper critical level for independent
                          data

             For both the Mann-Whitney and Spearman Rho tests, the
             lower bound of the test statistic (L_l) is zero. The upper
             bound (L_u) is [n(n^2 - 1)]/3 for the Spearman Rho and
             n_1 n_2 for the Mann-Whitney, where n_1 and n_2 are the
             number of data in the first and second partitions,
             respectively. The \alpha levels used in f(n, \alpha, \rho)
             to determine the correction value from Tables 11 and 12
             of Appendix B are for a two-sided test. If a one-sided
             test is desired at significance level \alpha, a value of
             2\alpha should be entered into f(n, \alpha, \rho).

     Step 3. The modified critical level is compared to the test
             statistic calculated in Step 1, with H_0 accepted or
             rejected under the same circumstances as for independent
             data.
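
Step 2 can be sketched as a one-line adjustment that moves the
independent-data critical level toward the bound of the test statistic.
The function names are hypothetical, and the critical level and
correction value in the example are made-up numbers standing in for
entries from the tables in Appendix B.

```python
def spearman_upper_bound(n):
    """Upper bound of the Spearman Rho statistic: n(n^2 - 1)/3."""
    return n * (n * n - 1) / 3


def mann_whitney_upper_bound(n1, n2):
    """Upper bound of the Mann-Whitney statistic: n1 * n2."""
    return n1 * n2


def modified_critical_level(bound, critical, correction):
    """Bound plus the scaled difference (Equations 49 and 50)."""
    return bound + correction * (critical - bound)


# Made-up inputs: a lower critical level of 10 and a correction value of
# 0.5, with the lower bound of zero common to both tests.
print(modified_critical_level(0, 10, 0.5))
```

With a correction value below 1 the lower critical level moves toward
zero, so a more extreme test statistic is needed to reject H_0 than in
the independent case.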

The power associated with the data may be calculated for the Mann-
Whitney and Spearman Rho by using an equivalent number of independent
samples, based on the autocorrelation of the data, in the normalized
power function. The procedure is as follows:

     Step 1. Calculate the number of equivalent independent
             samples based on lag 1 Markov dependence (Matalas
             and Langbein, 1962).

                 n^* = n \left[ 1 + 2 \frac{\rho^{(n+1)t} - n \rho^{2t} + (n - 1) \rho^{t}}
                                          {n (\rho^{t} - 1)^2} \right]^{-1}        (51)

             where: n^* = number of equivalent independent samples
                    n = sample size
                    \rho = lag 1 correlation coefficient
                    t = sampling interval (t = 1 for equal spaced
                        samples)

     Step 2. Calculate the trend number:

                 N_T = \frac{Tr (n^*)^{1/2}}{2 \sigma}        (52)

             where: N_T = trend number (dimensionless)
                    Tr = trend magnitude (for step trend
                         Tr = \mu_1 - \mu_2)
                    \sigma = standard deviation of time series (s, an
                         estimate of \sigma, should be used)
     Step 3. Calculate the power:

                 1 - \beta = F(N_T - t_{1 - \alpha/2, \nu})        (53)

             where: 1 - \beta = power of test
                    F = cumulative distribution function of a
                        standard Student's t distribution
                    t_{1 - \alpha/2, \nu} = critical level of Student's t
                        distribution at probability level 1 - \alpha/2
                        (two-tailed test)
                    \nu = degrees of freedom (n - 2)

             Table 8 of Appendix B lists the values for the
             Student's t distribution.

The procedure for calculating the power of the Spearman Rho test
is the same as for the Mann-Whitney, except N_T' is substituted for
N_T, where

    N_T' = \frac{Tr [n^* (n^* + 1) (n^* - 1)]^{1/2}}{(12)^{1/2} n^* \sigma}        (54)

where: Tr = trend magnitude (for linear trend Tr = A0)
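
The equivalent sample size and trend numbers can be sketched as follows.
The forms used are assumptions of this sketch: n* is taken from the
standard variance-of-the-mean result for a lag 1 Markov process, the
step trend number as Tr (n*)^(1/2) / (2 sigma), and the linear trend
number as in Equation 54. The function names and example values are
hypothetical, and the sketch requires |rho| < 1.

```python
import math


def equivalent_independent_samples(n, rho, t=1.0):
    """Equivalent number of independent samples under lag 1 dependence."""
    r = rho ** t
    factor = 1 + 2 * (r ** (n + 1) - n * r ** 2 + (n - 1) * r) / (
        n * (r - 1) ** 2
    )
    return n / factor


def step_trend_number(tr, n_star, sigma):
    """Dimensionless trend number for a step trend."""
    return tr * math.sqrt(n_star) / (2 * sigma)


def linear_trend_number(tr, n_star, sigma):
    """Dimensionless trend number for a linear trend."""
    return tr * math.sqrt(n_star * (n_star + 1) * (n_star - 1)) / (
        math.sqrt(12) * n_star * sigma
    )


# Hypothetical record: 20 annual values with lag 1 correlation 0.5.
n_star = equivalent_independent_samples(20, 0.5)
print(round(n_star, 2), round(step_trend_number(1.0, n_star, 1.0), 3))
```

With no autocorrelation (rho = 0) the equivalent sample size equals n;
positive autocorrelation reduces it, and with it the trend number and
the power.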

One problem with these procedures is that the standard deviation
and lag 1 correlation coefficient are assumed to be known. When sample
size is greater than 30 and data are independent, s may be substituted
for \sigma with little sampling error (Lettenmaier, 1977). However,
sampling error in the estimate of the lag 1 correlation coefficient is
of more concern, since errors in \rho will affect the estimate of the
equivalent independent sample size (n^*) and the variance (Bayley and
Hammersley, 1946). These errors will be translated into uncertainty in
the estimation of N_T or N_T'. Thus, the problem of uncertainty in the
estimates of the parameters (n^*, s) should be considered when the
results are being used for planning, policy and management of lake
ecosystems.


Time Series Modeling

 

After completing the trend detection phase, one may wish to model
the time series data to provide information on the pattern of change
over time and to forecast future events. The subject of time series
modeling has become increasingly important and its use is expanding.
The attempt here is to provide basic information only and not an
in-depth study. The range of books, for those interested in a more
complete review, includes: 1) introductory texts (Kendall, 1973;
Chatfield, 1975), 2) spectral analysis (Jenkins and Watts, 1968;
Bloomfield, 1976), 3) univariate forecasting (Box and Jenkins, 1970;
Granger and Newbold, 1977), and 4) a general review article on time
series analysis (Chatfield, 1977). Only the basic method of fitting
polynomials and the use of linear regression analysis will be discussed
here.

A general model for the deterministic (trend) component is the
polynomial model. Polynomial curves are convenient approximations for
nonlinear relationships. The polynomial function is of the general
form:

    Y = a + bX + cX^2 + dX^3 + ...        (55)

where: Y = dependent variable
       X = independent variable
       a, b, c, d = coefficients

The polynomial involving only the terms X and X^2 will yield a
parabola with a single turning point. As increasing powers of X are
used, the curve becomes more and more complex and will fit the data
increasingly well. However, with each added power of X, one degree of
freedom is lost and significance becomes harder to demonstrate. For
most work, a cubic polynomial is the upper limit of degree. The aim of a
polynomial model is to obtain a better fitting regression to a set of
points. This is done by adding a quadratic term (X^2) to the regression
equation and observing whether a significant portion of the residual sum
of squares is removed. The same is done for cubic and possibly higher
terms. The numeric procedures for calculating third or higher degree
polynomials usually involve matrix inversion and the use of a computer.
For a second degree polynomial, Steel and Torrie (1960) or Sokal and
Rohlf (1969) should be consulted for the necessary formulas.

The methods for developing a linear regression model were discussed
in Chapter III. The major problem with the use of linear regression on
water quality time series data is the violation of the assumptions for
the use of linear regression. As has been discussed, most water quality
parameters are autocorrelated. Thus, the assumption of independent data
is violated. While the model itself may not be able to statistically
test for a trend, the pattern or shape of the time series data can be
illustrated. Linear regression can be used for forecasting future events
if one assumes that the conditions under which the data were collected
will continue unchanged. Other areas of regression analysis which may be
of use are: 1) distribution-free regression analysis (Hollander and
Wolfe, 1973), and 2) regression with autocorrelated residuals (Neter and
Wasserman, 1974; Kendall, 1973).

Linear and polynomial regression are simple techniques that can
be used for modeling the pattern of change in a parameter over time.
However, their use in forecasting is limited. Forecasting should
involve the use of more complicated models and mathematical techniques
not explained here. Gilchrist (1976) provides an excellent review of
statistical forecasting. Also, the references cited at the beginning of
this section can be used to provide more information on methods of
forecasting with time series data.

Application

 

In order to allow for better understanding of the trend detection
methods and their usage, two examples are presented, one for a linear
trend and the other for a step trend. The linear trend uses real data,
while the step trend is a hypothetical situation. The examples were
chosen in an attempt to cover the majority of techniques. The format
is both verbal and numerical to provide a clear explanation of the tech-
niques. Some equations will not be repeated, but are referenced by the

equation number or the step number for a given procedure.

Linear Trend Example

Two problems with Lake Ontario are the increased algal densities
and nutrient loading to the lake. While phosphorus is often the limit-
ing nutrient in clean (oligotrophic) lakes, nitrogen has an increasing
effect as water quality declines. Thus, to provide information on these
two problems, the mean nitrate concentration in the springtime for Lake
Ontario is examined for possible trends in the years 1968-1979. This
information can be used to compare algal population growth and the

nitrogen loading over these same years.


The mean springtime nitrate concentrations (µg/l) from Lake Ontario
during 1968-1979 are in data set 2 in Appendix A. The data are mean
values of samples collected evenly through the spring. The variances
are assumed equal between years on the basis of limnological knowledge.
Based on the knowledge that nutrient loadings have increased in this
time period, the null and alternative hypotheses are

(i.e., a positive trend is hypothesized).

The analysis is started by graphically displaying the data in a
time series plot (Figure 18) and a cumulative sum plot (Figure 19). The
following shows the calculation for the cusum points. A value of j = 1
is chosen to provide the maximum number of cusum points (i.e., maximum
information). The cusum points are calculated in the following manner
(Equation 31):

$$cu_j = \sum_{i=1}^{j} X_i - \frac{j}{n} \sum_{i=1}^{n} X_i$$

$$cu_1 = 215 - \tfrac{1}{12}\,(215 + 237 + \cdots + 335) = -59$$

$$cu_2 = (215 + 237) - \tfrac{2}{12} \cdot 3288 = -96$$

$$\vdots$$

$$cu_{11} = (215 + 237 + \cdots + 324) - \tfrac{11}{12} \cdot 3288 = -61$$
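
The cusum calculation lends itself to a short program. The sketch below
assumes the cusum definition used in the worked values (running sum
minus j/n times the grand total); the function name and the `step`
argument are illustrative choices, not part of the original method
description.

```python
from itertools import accumulate


def cusum(values, step=1):
    """cu_j = sum of the first j values minus (j/n) times the grand total."""
    n = len(values)
    total = sum(values)
    running = list(accumulate(values))
    return [running[j - 1] - j * total / n for j in range(step, n + 1, step)]


# Nitrate values of data set A.2 (1968-1979).
nitrate = [215, 237, 242, 249, 247, 251, 287, 286, 306, 309, 324, 335]
cu = cusum(nitrate)
print([round(c) for c in cu])
```

By construction the last cusum point is zero, and a sequence that falls
steadily and then rises, as it does here, is the pattern expected when
early values lie below the overall mean and later values lie above it.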

[Figure 18. Time series plot of nitrate concentrations (data set A.2).]

[Figure 19. Cusum plot of nitrate concentrations (data set A.2).]


Both the time series plot and the cumulative sum plot seem to indicate a
linear trend, so further analysis should be conducted. The mode to mean
ratio is used to provide a quick check of normality. The mode to mean
ratio (Equation 36) is 0.97, which suggests the distribution is slightly
skewed right. The skewness is shown in the relation between the mean
and mode (data set 2 in Appendix A). The W test is not used here because
the sample size is very small and the test cannot reliably detect
non-normality under these conditions. Therefore, to be safe and not
violate the normality assumption, the distribution is assumed only to be
symmetric, which is a much easier condition to satisfy.

The next step is to estimate the autocorrelation coefficient and
test whether the autocorrelation is zero. To estimate ρ1 (Equation 39),
both c_k (k is equal to 1 in this case) and c_0 need to be calculated in
the following manner (Equations 40 and 41):

$$c_1 = \tfrac{1}{12}\,[(215 - 274)(237 - 274) + (237 - 274)(242 - 274) + \cdots + (324 - 274)(335 - 274)] = 968.67$$

$$c_0 = \tfrac{1}{12}\,[(215 - 274)^2 + (237 - 274)^2 + \cdots + (335 - 274)^2] = 1378.3$$

$$r_1 = \frac{c_1}{c_0} = \frac{968.67}{1378.3} = .703$$

Thus, the estimate of the lag 1 autocorrelation coefficient is .703.
To test H0: ρ = 0, the Durbin-Watson test is used (Equation 42), where
e_t = X_t - 274 is the residual about the mean:

$$d = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2} = \frac{[-37 - (-59)]^2 + [-32 - (-37)]^2 + \cdots + (61 - 50)^2}{(-59)^2 + (-37)^2 + \cdots + 61^2} = \frac{2630}{16540} = .159$$

Since the calculated d value is less than the lower critical level
(d_L = 0.81), H0 is rejected and the data are considered correlated. A
technical note here is that it is useful to calculate the residuals
first, since this makes the calculations easier.
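
The two calculations above can be checked with a short sketch. It
assumes a divisor of n for both c_0 and c_1, which reproduces the worked
values, and the usual form of the Durbin-Watson statistic, with squared
successive differences of the residuals in the numerator; with that form
the statistic evaluates to about 0.159 for these data. The function
names are illustrative only.

```python
def lag1_autocorrelation(x):
    """Return (c1, c0, r1) using a divisor of n for both covariances."""
    n = len(x)
    mean = sum(x) / n
    e = [v - mean for v in x]                       # residuals about the mean
    c0 = sum(d * d for d in e) / n
    c1 = sum(e[t] * e[t + 1] for t in range(n - 1)) / n
    return c1, c0, c1 / c0


def durbin_watson(x):
    """d = sum of squared successive residual differences / sum of squares."""
    n = len(x)
    mean = sum(x) / n
    e = [v - mean for v in x]
    numerator = sum((e[t] - e[t - 1]) ** 2 for t in range(1, n))
    denominator = sum(d * d for d in e)
    return numerator / denominator


# Nitrate values of data set A.2 (1968-1979).
nitrate = [215, 237, 242, 249, 247, 251, 287, 286, 306, 309, 324, 335]
c1, c0, r1 = lag1_autocorrelation(nitrate)
d = durbin_watson(nitrate)
print(round(c1, 2), round(c0, 1), round(r1, 3), round(d, 3))
```

Computing the residuals once and reusing them, as both functions do, is
the shortcut recommended in the technical note.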

Based on dependent data and a linear trend hypothesis (from a priori
information and examination of the time series and cusum plots), the
Spearman Rho test is used, with adjusted critical levels, to test the
hypothesis (H1: µ1 < µ2).

Step 1. Using Equation 48, the test statistic is the sum of the squared
differences between the rank of each observation and its time order
(data set A.2):

$$T = (1 - 1)^2 + (2 - 2)^2 + (3 - 3)^2 + (5 - 4)^2 + (4 - 5)^2 + \cdots + (12 - 12)^2 = 4$$

Step 2. Calculate the modified critical level (Equation 49):

$$W'_{\alpha L} = L_1 + f(n, \alpha, \rho)\,(W_{\alpha L} - L_1)$$

where: L_1 = 0
       f(12, .01, .7) = .11 from Table B.12 of Appendix B (generated)
       W_{αL} = W_{.005, 12} = 52 from Table B.10 of Appendix B

$$W'_{\alpha L} = 0 + (.11)(52 - 0) = 5.72$$

Since the test statistic (T = 4) is less than the critical level
(W'_{αL} = 5.72), one may reject H0 and conclude that a linear trend
exists at a 99.5% confidence level.
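
The two steps can be sketched as follows. The sketch assumes the test
statistic is the sum of squared differences between each observation's
rank and its time order, and it takes the tabulated quantities (the
factor f = .11 implied by the 5.72 result, and W = 52) as given inputs
rather than deriving them. With the ranks listed in data set A.2, the
sum evaluates to 4. The function names are illustrative, and the simple
ranking routine assumes there are no tied values, which holds for these
data.

```python
def ranks_without_ties(values):
    """Rank values 1..n from smallest to largest (assumes no tied values)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def spearman_t(values):
    """Sum of squared differences between rank and time order."""
    ranks = ranks_without_ties(values)
    return sum((r - (t + 1)) ** 2 for t, r in enumerate(ranks))


def modified_critical_level(limit, f, w_alpha):
    """W' = L + f(n, alpha, rho) * (W - L); f and W are table look-ups."""
    return limit + f * (w_alpha - limit)


# Nitrate values of data set A.2 (1968-1979).
nitrate = [215, 237, 242, 249, 247, 251, 287, 286, 306, 309, 324, 335]
T = spearman_t(nitrate)
critical = modified_critical_level(limit=0, f=0.11, w_alpha=52)
reject_h0 = T < critical
print(T, round(critical, 2), reject_h0)
```

A small T means the ranks follow the time order closely, so the lower
tail of the distribution is the rejection region for an increasing
trend.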

The power associated with the Spearman Rho test for a linear
trend in the mean springtime nitrate concentration is as follows:

Step 1. Calculate the number of equivalent independent samples
(Equation 51):

$$\frac{1}{n^*} = \frac{1}{n}\left[1 + \frac{2(\rho_1 - \rho_1^2)}{(\rho_1 - 1)^2}\right]$$

$$\frac{1}{n^*} = \frac{1}{12}\left[1 + \frac{2(\rho_1 - \rho_1^2)}{(\rho_1 - 1)^2}\right]$$

With ρ1 taken as .7, n* = 2.12, or approximately 2.

Step 2. Calculate the trend number (Equation 54):

$$N_T = \frac{Tr\,[n^*(n^* + 1)(n^* - 1)]^{1/2}}{12^{1/2}\, n^*\, \sigma}$$

where: Tr = Δµ for linear trends = 100
       σ = 38.78

$$N_T = \frac{100\,[2\,(2 + 1)(2 - 1)]^{1/2}}{12^{1/2}\,(2.12)(38.78)} = 0.86$$

Step 3. Calculate the power (Equation 53):

$$1 - \beta = F(N_T - t_{1 - \alpha/2,\,\nu})$$

Substitute α for α/2 because the test is one-tailed.

where: t_{1 - .005, 10} = 3.169

$$1 - \beta = F(.86 - 3.169) = F(-2.309)$$

From the standard normal tables F(-2.31) is equal to .010; therefore
the power is .01.
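
The three steps can be reproduced with a short sketch. It assumes the
forms of Equations 51, 54, and 53 that are consistent with the worked
numbers (in particular a factor of the square root of 12 in the
denominator of the trend number), takes the t value from tables as a
given input, and evaluates the standard normal distribution function
with the error function. All function names are illustrative.

```python
import math


def effective_n(n, rho):
    """Equivalent number of independent samples for lag 1 correlation rho."""
    return n / (1 + 2 * (rho - rho ** 2) / (rho - 1) ** 2)


def linear_trend_number(tr, n_eff, sigma):
    """Trend number for a linear trend of total size tr."""
    root = math.sqrt(n_eff * (n_eff + 1) * (n_eff - 1))
    return tr * root / (math.sqrt(12) * n_eff * sigma)


def normal_cdf(z):
    """Standard normal distribution function F(z)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def power(trend_number, t_critical):
    """1 - beta = F(N_T - t), with t taken from a t table."""
    return normal_cdf(trend_number - t_critical)


n_star = effective_n(12, 0.7)                   # about 2.12
power_as_worked = power(0.86, 3.169)            # N_T as rounded in the text
n_t_unrounded = linear_trend_number(100, n_star, 38.78)
power_unrounded = power(n_t_unrounded, 3.169)   # no intermediate rounding
print(round(n_star, 2), round(power_as_worked, 3), round(power_unrounded, 3))
```

Carrying the unrounded n* through the trend number changes the result
only slightly; either way the power remains close to .01.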

Based on the analysis, one may conclude that mean springtime nitrate
levels in Lake Ontario followed a linear trend from 1968-1979 at the
99.5% confidence level. However, due to the small sample size and the
large autocorrelation coefficient, only a power of .01 can be
associated with the test.

Step Trend Example

The following hypothetical example illustrates the procedures for
detection of step trends in lake parameters. For many years Lake A has
had excellent water quality, which allowed for tremendous recreational
usage. These conditions prompted numerous housing developments on the
lake and in the surrounding area. Nutrient loads have increased
significantly, to the point where the lake is experiencing accelerated
eutrophication. The lake association consulted local officials and the
state Department of Natural Resources for possible solutions. The lake
exhibits a short detention time; therefore, if nutrient loading is
curtailed, water quality should respond fairly quickly. A program was
developed to control the limiting nutrient of the lake, phosphorus. The
management program includes a sewage system, fertilizer control, and
land use practices. The costs involved for the plan, especially for the
construction of sewer pipes to connect to the local treatment plant, are
high. During public meetings people asked how sure one can be that these
measures will "cure" the lake.

Another lake system nearby, Lake B, experienced the same problem as
Lake A. Since both lakes are limnologically very similar and the methods
for restoration are similar, data from Lake B can be used to provide
quantitative evidence on whether the plan will be effective.

The hypothesis formed is that a step trend occurred starting at
the time of implementation of the water quality plan. Mean annual total
phosphorus is chosen as the lake parameter because of the availability
of data and the significance it plays in lake eutrophication. Thus,

H0: µ1 = µ2
Ha: µ1 > µ2

where: µ1 = mean of (X1, ..., X15) and µ2 = mean of (X16, ..., X30)

The observation that divides µ1 from µ2 depends on when the management
plan was initiated.

The mean annual total phosphorus concentrations are found in data
set 3 of Appendix A. Figures 20 and 21 show the time-series plot and
cusum plot, respectively. The sample residuals are shown in Figure 22
to check the validity of the assumption of homogeneous variance. The
residuals are calculated by using

$$e_i = X_i - \bar{X}$$

Therefore,

$$e_1 = 77 - 77.7 = -0.7$$

$$e_2 = 83 - 77.7 = 5.3$$

$$\vdots$$

$$e_{30} = 64 - 77.7 = -13.7$$

[Figure 20. Time series plot of total phosphorus concentration (data
set A.3).]

[Figure 21. Cusum plot of total phosphorus concentrations (data set
A.3).]

[Figure 22. Residual plot of total phosphorus concentrations (data set
A.3).]


The residual plot is fairly horizontal, which suggests homogeneous
variance. Also, no extreme value is indicated in the plot. The W-test is
used to test for normality. The steps are:

Step 1. Order the array X1, X2, ..., Xn:
        58.0, 59.0, ..., 94.0, 95.0

Step 2. Compute the variance (Equation 11):

        s^2 = 143.80

Step 3. Since n is even, k = n/2 = 15. Then compute (Equation 34),
        pairing the largest coefficient of Table B.1 with the widest
        difference:

        b = .4254 (95.0 - 58.0) + .2944 (94.0 - 59.0) +
            ... + .0076 (81.0 - 80.0)
        b = 61.53

Step 4. Compute the test statistic (Equation 35):

        W = b^2 / [(n - 1) s^2] = 61.53^2 / [(29)(143.80)]
          = 0.908

Since the test statistic (W = 0.908) exceeds the critical value
(W_{.01, 30} = .900), one may not reject H0 at the 1% level; therefore
one assumes normality. The mode to mean ratio is .99, which also
suggests normality.
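
The W calculation can be sketched as below. The sketch assumes the usual
Shapiro-Wilk conventions: the largest coefficient is paired with the
widest difference of the ordered sample, and b squared is divided by the
sum of squared deviations about the mean, which keeps W between 0 and 1.
The coefficients are the n = 30 column of Table B.1, and the data are
the hypothetical Lake B values of data set A.3. Under these assumptions
W evaluates to about 0.908. The function name is illustrative.

```python
# Coefficients for n = 30 from Table B.1 (i = 1, ..., 15).
A30 = [.4254, .2944, .2487, .2148, .1870, .1630, .1415, .1219,
       .1036, .0862, .0697, .0537, .0381, .0227, .0076]

# Hypothetical Lake B total phosphorus values (data set A.3), years 1-30.
LAKE_B = [77, 83, 80, 85, 89, 84, 81, 82, 87, 94, 90, 90, 92, 88, 95,
          91, 86, 79, 75, 76, 70, 64, 60, 61, 66, 58, 63, 62, 59, 64]


def shapiro_wilk_w(values, coefficients):
    """Return (b, W) for an even-sized sample and its table coefficients."""
    x = sorted(values)
    n = len(x)
    mean = sum(x) / n
    sum_sq = sum((v - mean) ** 2 for v in x)      # equals (n - 1) * s^2
    # Pair coefficient i with the i-th widest difference x(n-i+1) - x(i).
    b = sum(a * (x[n - 1 - i] - x[i]) for i, a in enumerate(coefficients))
    return b, b * b / sum_sq


b, w = shapiro_wilk_w(LAKE_B, A30)
print(round(b, 2), round(w, 3))
```

Small values of W indicate departure from normality, so the computed
value is compared with a lower-tail critical value for n = 30.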

The lag 1 autocorrelation coefficient calculations are shown in
the linear trend example, and thus repetition is not necessary. The
value of r1 (the estimate of ρ1) is 0.46. The Durbin-Watson test was
used to test whether ρ is significantly different from zero. The results
(computation not shown) show that ρ is significantly different from zero
at the 99% confidence level.


The analysis of the data showed a possible step trend and normal
but dependent data. Therefore, the Mann-Whitney test will be used to
test the hypothesis of trend.

Step 1a. Rank all values from both populations together, from
         smallest to largest (already done in the W-test).

Step 1b. Sum the ranks for µ1:

         S = 13 + 18 + ... + 30
           = 325

         (Note: for the two values of 90.0, the ranks would have been
         25 and 26; thus 25.5 is used for each.)

Step 1c. Compute the test statistic (Equation 45):

$$T = S - \frac{n_1 (n_1 + 1)}{2} = 325 - \frac{15\,(15 + 1)}{2} = 205$$

Step 2. Modify the critical level (Equation 50):

$$W'_{\alpha u} = L_u + f(n, \alpha, \rho)\,(W_{\alpha u} - L_u)$$

where: L_u = n1 · n2
           = 15 · 15
           = 225
       f(30, .01, .46) = .491 (interpolated from Table B.11)
       W_{.01, 15, 15} = 57
       W_{αu} = W_{1 - α} = n1 · n2 - W_{α, n1, n2}
              = (15 · 15) - 57
              = 168

$$W'_{\alpha u} = 225 + (.491)(168 - 225) = 197.01$$

Since the test statistic (T = 205) exceeds the modified critical level
(W'_{αu} = 197.01), H0 is rejected and one may conclude a step trend at a
99.5% confidence level.
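
The rank sum, test statistic, and modified critical level can be
sketched as follows. The sketch assumes that tied values share the
average of the ranks they span and takes the tabulated quantities
(f = .491 and W = 57) as given inputs. The function names are
illustrative, and the data are the hypothetical Lake B values of data
set A.3.

```python
def average_ranks(values):
    """Rank from smallest to largest; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1          # average of ranks start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def mann_whitney_t(first, second):
    """Return (S, T): rank sum of the first sample and T = S - n1(n1+1)/2."""
    ranks = average_ranks(list(first) + list(second))
    n1 = len(first)
    s = sum(ranks[:n1])
    return s, s - n1 * (n1 + 1) / 2


before = [77, 83, 80, 85, 89, 84, 81, 82, 87, 94, 90, 90, 92, 88, 95]
after = [91, 86, 79, 75, 76, 70, 64, 60, 61, 66, 58, 63, 62, 59, 64]

S, T = mann_whitney_t(before, after)
L_u = len(before) * len(after)                  # 225
W_upper = L_u - 57                              # 57 is the tabulated W(.01, 15, 15)
critical = L_u + 0.491 * (W_upper - L_u)        # .491 is the tabulated factor
reject_h0 = T > critical
print(S, T, round(critical, 2), reject_h0)
```

The adjustment factor pulls the critical level toward the upper limit
L_u, so dependent data require a more extreme statistic before H0 is
rejected.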

The associated power of the Mann-Whitney test is calculated in
the following manner.

Step 1. Calculate the number of equivalent independent
        samples (Equation 51):

$$\frac{1}{n^*} = \frac{1}{30}\left[1 + \frac{2(.46 - .46^2)}{(.46 - 1)^2}\right]$$

        n* = 11

Step 2. Calculate the trend number (Equation 52):

$$N_T = \frac{Tr\,(n^*)^{1/2}}{2\sigma}$$

where: Tr = |µ1 - µ2| for step trends
          = |86.47 - 68.93|
          = 17.54

$$N_T = \frac{17.54\,(11)^{1/2}}{2 \cdot 11.99} = 2.43$$

Step 3. Calculate the power (Equation 53):

$$1 - \beta = F(N_T - t_{1 - \alpha,\,\nu})$$

where: t_{1 - .005, 28} = 2.763

$$1 - \beta = F(2.43 - 2.763) = F(-.333)$$

From the standard normal tables F(-.33) is equal to .371; therefore the
power of the test is .371.
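
A brief sketch of the same three steps for the step trend follows. It
assumes the forms of Equations 51 to 53 that match the worked numbers,
rounds n* to a whole number as the example does, and takes the t value
from tables as a given input. Function names are illustrative.

```python
import math


def effective_n(n, rho):
    """Equivalent number of independent samples for lag 1 correlation rho."""
    return n / (1 + 2 * (rho - rho ** 2) / (rho - 1) ** 2)


def step_trend_number(mean_before, mean_after, n_eff, sigma):
    """Trend number for a step change between two period means."""
    return abs(mean_before - mean_after) * math.sqrt(n_eff) / (2 * sigma)


def normal_cdf(z):
    """Standard normal distribution function F(z)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


n_star = round(effective_n(30, 0.46))             # 11 equivalent samples
n_t = step_trend_number(86.47, 68.93, n_star, 11.99)
step_power = normal_cdf(n_t - 2.763)              # 2.763 is t(.995, 28)
print(n_star, round(n_t, 2), round(step_power, 2))
```

Compared with the linear trend example, the larger sample and smaller
autocorrelation leave about eleven equivalent independent samples, and
the power rises to roughly .37.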

Based on the test results one may conclude with 99.5% confidence
and a power of .37 that a step trend is present in mean annual total
phosphorus concentrations in Lake B.

CHAPTER V
CONCLUSIONS

Most phases of water quality management are concerned not only with
the present condition, but with the spectrum from past to future condi-
tions. Hence, information on the changes in water quality parameters
over time is important for water quality management. The use of statis-
tical techniques to detect trends in water quality parameters has the
potential to provide the quantitative information needed for water re-
sources management and planning decisions. The usefulness of the infor-
mation generated by the detection of trends in water quality parameters
should greatly increase in the future.

The method presented here provides:

1. formulation of a problem (hypothesis)

2. selection of water quality parameter(s) and data

3. data analysis techniques

4. statistical tests for detection of trends

The development of any quantitative methodology usually has its
roots in a need exhibited in the area of management or planning. Thus,
the use of the trend detection technique arises from a need discerned in
the management area for quantitative information on changes in water
quality parameters over time. The starting point for the use of trend
detection techniques is the determination of a desired output on which
proper management and planning decisions are to be based. This desired
output determines the parameter(s) and data chosen, and the formulation
of the null and alternative hypotheses. It is extremely important to
consider the desired output properly in the selection of data and the
formation of hypotheses, for decisions based on the results of the trend
detection analysis must be made in light of the input information
(i.e., parameter(s), data, and hypothesis).

The preliminary data analysis techniques are a necessary step for
trend detection analysis. The data analysis step provides: 1) visual
evidence of trends and validity of assumptions, 2) statistical tests to
determine the data's distribution, 3) statistical tests for the
verification of the assumptions of homogeneous variance and independent
data, 4) methods of examining "outliers," and 5) types of data
transformations. The data analysis techniques presented are the ones
generally accepted by applied scientists. However, many other techniques
are available and may be substituted, provided the alternative technique
is as powerful. The importance of preliminary data analysis cannot be
overstressed. The information provided by examining the data will:
1) determine which statistical tests to use for detecting trends in
water quality parameters, and 2) provide a "good feel" for the data to
allow for the fullest evaluation and application of the trend detection
results to water quality management and planning decisions.

The autocorrelation present in most lake data prohibits the use of
most statistical techniques, including regression, because the
independence assumption is violated. The violation of the independence
assumption, as well as the normality assumption, is a major problem with
the majority of lake analyses conducted in the past. The continued
violation of these assumptions is not necessary when there are
techniques, such as the one presented here, that accommodate the
dependent data situation. Thus, the statistical tests for dependent data
are suggested for use, almost exclusively, when dealing with lake
systems. The dependency of the data is used, along with sample size and
alpha level, to correct the critical level of the statistical test.

As with most statistical techniques, there are some limitations to
the trend detection method. The major problem, especially with trend
detection analysis for water quality parameters, is the poor power
associated with small sample sizes. The small sample size problem is
compounded by increasing autocorrelation. When the sample size is small
and a large autocorrelation of the data is present, the number of
effective independent samples is exceedingly small. Since the power is a
function of the effective independent samples, as effective samples
decrease, so does the power. This is exemplified in the linear trend
example in the application section of Chapter IV. Based on a sample size
of 12 and an autocorrelation of 0.7, the number of effective independent
samples is 2, and this results in a power of 0.01.

Recalling Figure 11, the power is the probability of accepting H1
when in reality H1 is true. In trend detection, the power means the
probability of detecting a trend when in truth a trend exists. From a
management aspect, the power is important since the information usually
desired is whether a parameter has changed over time (i.e., did phos-
phorus concentrations change?).

Another problem with the trend detection technique for dependent
data is the assumption that both the variance and autocorrelation are
known. The assumption of known variance and autocorrelation holds
when the estimates for the variance and autocorrelation are accurate
(i.e., known with certainty). Accuracy in the estimate of variance
occurs when: 1) the sample size exceeds 30, 2) the data are independent,
and 3) the sampling design is good. Autocorrelation accuracy is very
important, for the autocorrelation is used in calculating: 1) effective
independent samples, and 2) the variance estimate. Both the number of
effective independent samples and the variance estimate are used to
calculate the test statistic (NT or N+) which determines the power of
the test. The problem of not knowing the variance and autocorrelation
may be accommodated by using upper confidence limits of the estimates of
the variance and autocorrelation to provide conservative results.

Water quality management requires an understanding of aquatic
ecosystems. Basic to the understanding of aquatic systems is sound
information from which that understanding can be derived. Thus, for
water quality management, information is needed on the changes in water
quality parameters over time. The detection of trends in water quality
parameters allows for the management of water resources for their full
beneficial use. Thus, trend detection techniques for water quality
parameters are an integral part of water resources management for
protecting environmental quality.

LIST OF REFERENCES


Atkinson, J. 0., 1967. An investigation of Goodness of Fit Tests for
Normality. Masters Thesis. Iowa State Univ., Ames, Iowa.

Bartlett, M. S., 1935. Some Aspects of the Time-correlation Problem in
Regard to Tests of Significance. J. R. Stat. Soc. 98:536-543.

Bartlett, M. S., 1937. Some Examples of Statistical Methods of Research
in Agriculture and Applied Biology. J. R. Stat. Soc (suppl.)
4:137-147.

Bayley, G. V. and J. M. Hammersley, 1946. The Effective Number of Inde-
pendent Observations in an Autocorrelated Time Series. J. R. Stat.
Soc. (suppl.) 8(1):184-197.

Beckers, C. V., S. G. Chamberlain, and G. P. Grimsrud, 1972. Quantita-
tive Methods for Preliminary Design of Water Quality Surveillance
Systems. EPA-R5-72-001.

Behnken, D. W. and N. R. Draper, 1972. Residuals and their Variance
Patterns. Technometrics 14:101-12.

Bhattacharyya, G. K. and R. A. Johnson, 1977. Statistical Concepts and
Methods. J. Wiley & Sons, New York.

Bloomfield, P., 1976. Fourier Analysis of Time Series: An Introduction.
J. Wiley & Sons, New York.

Box, G. E. P. and G. M. Jenkins, 1970. Time Series Analysis: Forecasting
and Control. Holden-Day, San Francisco, Calif.

 

Breiman, L., 1973. Statistics with a View Toward Applications.
Houghton-Mifflin, Boston, Mass.

Cairns, J., Jr., K. L. Dickerson, and G. F. Westlake (editors), 1977.
Biological Monitoring of Water and Effluent Quality, STP607, Amer.
Soc. for Testing and Materials, Philadelphia, Pa.

Carlson, R. E., 1977. A Trophic State Index for Lakes. Limnol.
Oceanogr. 22(2):361-369.

Chapra, S. C., 1980. Simulation of Recent and Projected Total Phospho-
rus Trends in Lake Ontario. JGLR 6(2):101-112.

Chatfield, C., 1975. The Analysis of Time-Series: Theory and Practice.
Chapman and Hall, London.


Chatfield, C., 1977. Some Recent Developments in Time-Series Analysis.
J. R. Stat. Soc. A 140(4):492-510.

Chatterjee, S. and B. Price, 1977. Regression Analysis by Example.
J. Wiley & Sons, New York.

 

Chen, E. H., 1971. The Power of the Shapiro-Wilk W Test for Normality
in Samples from Contaminated Normal Distributions. J. Am. Stat.
Assoc. 66:760-762.

Cleveland, W. P. and G. C. Tiao, 1976. Decomposition of Seasonal Time
Series: A Model for the Census X-11 Program. J. Amer. Stat. Assoc.
71:581-587.

Cochran, W. G., 1963. Sampling Techniques. J. Wiley & Sons, New York.

 

Conover, C. J., 1971. Practical Nonparametric Statistics. J. Wiley &
Sons, New York.

 

Dobson, F. H., 1980. Lake Ontario Phosphorus Management: Observations
and Interpretations of Within-lake Trends 1965 to 1979. Paper pre-
sented at the Inter. Assoc. for Great Lakes Research Conference.
Kingston, Ontario.

Durbin, J. and M. J. Murray, 1975. Seasonal Adjustment Based on a Mixed
Additive-multiplicative Model. J. R. Stat. Soc. A 138:385-410.

Durbin, J. and G. S. Watson, 1951. Testing for Serial Correlation in
Least-squares Regression. Biometrika 38:159-178.

Edwards, M. 0., 1980. Water Data and Services Available from Partici-
pants in the National Water Data Exchange. Water Res. Bult.
16(1):1-14.

Environmental Protection Agency, 1974. National Water Quality Inventory.
1974, EPA-440/9-74-001.

Environmental Protection Agency, 1976. Quality Criteria for Water.
U. S. Environ. Prot. Ag., Washington, D. C.

Gastwirth, J. L. and H. Rubin, 1971. Effect of Dependence on the Level
of Some One-sample Tests. J. Amer. Stat. Assoc. 66:816-820.

Gastwirth, J. L. and H. Rubin, 1975. The Behavior of Robust Estimators
on Dependent Data. Ann. Statist. 3:1070-1100.

Gilchrist, W., 1976. Statistical Forecasting. J. Wiley & Sons,
New York.

Gill, J. L., 1978. Design and Analysis of Experiments in the Animal and
Medical Sciences. Iowa State Univ. Press, Ames, Iowa.

 

Granger, C. W. J., and P. Newbold, 1977. Forecasting Economic Time
Series. Academic Press, New York.


Grubbs, F. E. and G. Beck, 1972. Extension of Sample Sizes and Per-
centage Points for Significance Tests of Outlying Observations.
Technometrics 14:847-854.

Hartley, H. O., 1950. The Maximum F-ratio as a Short-cut Test for
Heterogeneity of Variance. Biometrika 37:308-312.

Hollander, M. and D. A. Wolfe, 1973. Nonparametric Statistical Methods.
J. Wiley & Sons, New York.

Hutchinson, G. E., 1957. A Treatise on Limnology, Volume 1. J. Wiley &
Sons, New York.

 

Hynes, H. B. N., 1970. The Ecology of Running Waters. Univ. of Toronto
Press, Toronto, Canada.

 

Jenkins, G. M. and D. G. Watts, 1968. Spectral Analysis and Its Appli-
cations. Holden-Day, San Francisco, Calif.

Kendall, M. G., 1973. Time-Series. Hafner Press, New York.

 

Kendall, M. G. and A. Stuart, 1967. The Advanced Theory of Statistics,
Vol. 2, Inference and Relationships. Griffin, London.

 

Kenkel, J. L., 1975. Small Sample Tests for Serial Correlation in
Models Containing Lagged Dependent Variables. Review of Econ. and
Stat. 57:383-386.

Lettenmaier, D. P., 1976. Detection of Trends in Water Quality Data
from Records with Dependent Observations. Water Res. Bult.
12(5):1037-1046.

Lettenmaier, D. P., 1977. Detection of Trends in Stream Quality Moni-
toring Network Design and Data Analysis. Technical Report 51,
C. W. Harris Hydraulic Lab, Dept. of Civil Eng., Univ of Washington.

Lund, J. W. G., 1949. Studies on Asterionella, I. The Origin and
Nature of the Cells Producing Seasonal Maxima. J. Ecol.
37:389-419.

 

Lund, J. W. G., 1950. Studies on Asterionella formosa Hass, II.
Nutrient Depletion and the Spring Maximum. J. Ecol. 38:1-35.

Matalas, N. C. and W. B. Langbein, 1962. Information Content of the
Mean. J. Geophys. Res. 67 (9):3441-3448.

McNeely, R. N., V. P. Neimanis, and L. Dwyer, 1979. Water Quality
Source Book: A Guide to Water Quality Parameters. Inland Waters
Directorate, Ottawa, Canada.

Mosteller, F. and J. W. Tukey, 1977. Data Analysis and Regression.
Addison-Wesley, Reading, Mass.

Mood, A. M. and F. A. Graybill, 1963. Introduction to the Theory of
Statistics. McGraw-Hill, New York.


Neter, J. and W. Wasserman, 1974. Applied Linear Statistical Models.
R. D. Irwin, Inc., Homewood, Ill.

 

Olkin, I. and J. W. Pratt, 1958. Unbiased Estimation of Certain Corre-
lation Coefficients. Ann. Math. Stat. 29:201-211.

Reckhow, K. H., 1980. Lake Data Analysis and Nutrient Budget Modeling.
Corvallis Environ. Res. Lab., Environmental Protection Agency
(in press).

Reckhow, K. H. and S. C. Chapra, 1980. Engineering Approaches for Lake
Management: Data Analysis and Modeling. Ann Arbor Science, Ann
Arbor, Mich. (in press).

 

 

Rockwell, D. C., C. V. Marion, M. F. Palmer, D. S. Devault, and R. J.
Bowden, 1979. Environmental Trends in Lake Michigan. In: Phosphorus
Management Strategies for Lakes. Ed. R. C. Loehr. Ann Arbor
Science, Ann Arbor, Mich.

Sanders, T. G., D. P. Adrian, and B. B. Berger, 1976. Designing a
River Basin Sampling System, Report No. 62, Dept. of Civil Eng.,
Univ. of Mass.

Serfling, R. J., 1968. The Wilcoxon Two-sample Statistic on Strongly
Mixing Processes. Ann. Math. Statist. 39:1202-1209.

Sen, P. N., 1963. On the Properties of U-Statistics When the Observa-
tions Are Not Independent. Calcutta Stat. Ass. Bult. 12(47):69-92.

Sen, P. N., 1965. Some Nonparametric Tests for M-Dependent Time Series.
J. Amer. Stat. Ass. 60(1):134-147.

Sen, Z., 1978. Autorun Analysis of Hydrologic Time Series. J. Hydrol-
ogy 36:75-85.

Sen, Z., 1979. Application of the Autorun Test to Hydrologic Data.
J. Hydrology 42:1-7.

Sherwani, J. K. and D. H. Moreau, 1975. Strategies for Water Quality
Monitoring. Environ. Sci. and Eng. Pub. No. 398. Univ. of North
Carolina at Chapel Hill.

Shapiro, S. S. and M. B. Wilk, 1965. An Analysis of Variance Tests for
Normality, J. Am. Stat. Ass. 67:215-216.

Siegel, S., 1956. Nonparametric Statistics for the Behavioral Sciences.
McGraw-Hill, New York.

Sokal, R. R. and F. J. Rohlf, 1969. Biometry. Freeman, San Francisco,
Calif.

Springer, E. P. and G. F. Gifford, 1980. Spatial Variability of Range-
land Infiltration Rates, Water Res. Bult. 16(3):550-552.


Steel, R. G. D. and J. H. Torrie, 1960. Principles and Procedures in
Statistics. McGraw-Hill, New York.

 

Stumm, W. and J. J. Morgan, 1970. Aquatic Chemistry, An Introduction
Emphasizing Chemical Equilibria in Natural Waters. Wiley-
Interscience, New York.

Trussell, R. P., 1972. The Percent Un-ionized Ammonia in Aqueous Ammo-
nia Solutions at Different pH Levels and Temperatures. J. Fish.
Res. Bd. Canada 29:1505-1507.

Tukey, J. W., 1962. The Future of Data Analysis. Ann. Math. Stat.
33:1-67.

Tukey, J. W., 1977. Exploratory Data Analysis. Addison-Wesley, Reading,
Mass.

 

Vollenweider, R. A., 1968. The Scientific Basis of Lake and Stream
Eutrophication, with Particular Reference to Phosphorus and Nitro-
gen as Eutrophication Factors. OECD Paris-Tech. Rep. DAS/DSI/68.

Walker, W. W., Jr., 1979. Use of Hypolimnetic Oxygen Depletion Rate
as a Trophic State Index for Lakes. Water Resour. Res.
15(6):1463-1470.

Wallis, K. F., 1974. Seasonal Adjustment and Relations Between Vari-
ables. J. Amer. Stat. Ass. 69:18-31.

Ward, R. C., 1973. Data Acquisition Systems in Water Quality Manage-
ment. EPA-R5-73-014.

Weber, C. I., 1980. Federal and State Biomonitoring Programs. In:
Biological Monitoring for Environmental Effects. Ed. D. L. Worf.
D. C. Heath, Lexington, Mass.

Weir, J. B., 1960. Standardized t. Nature (London) 185:558.

Wetzel, R. G., 1975. Limnology. W. B. Saunders, Philadelphia, Pa.

Wetzel, R. G. and P. H. Rich, 1973. Carbon in Freshwater Systems. In:
Carbon and the Biosphere. Ed. G. M. Woodwell and E. V. Pecan.
Tech. Info. Center, U. S. Atomic Energy Comm.

Wolman, G. M., 1971. The Nation's Rivers. Science 174(4012):905-918.

Worf, D. L. (Editor), 1980. Biological Monitoring for Environmental
Effects. D. C. Heath, Lexington, Mass.

Yule, G. U., 1921. On the Time-correlation Problem with Especial Refer-
ence to the Variate-difference Correlation Method. J. Roy. Stat.
Soc. 84:496-537.

Zar, J. H., 1974. Biostatistical Analysis. Prentice-Hall, Englewood
Cliffs, N. J.

APPENDIX A

Data Set A.1. Mean total phosphorus concentrations (µg/l) in Lake
              Ontario's offshore waters in springtime from 1969-1979
              (Dobson, 1980).

Year    Total Phosphorus (µg P/l)
1969    20.
1970    20.
1971    23.
1972    21.
1973    22.
1974    21.
1975    21.
1976    21.
1977    20.
1978    17.
1979    15.

ΣX = 226.
n = 11

Mean                        = 20.63
Median                      = 21.1
Midrange                    = 19.45
Geometric mean              = 20.51
Harmonic mean               = 20.39
Range                       =
Variance                    =
Standard deviation          =
Standard error of mean      =
Coefficient of variation    = 0.105

Data Set A.2. Mean nitrate concentrations (µg/l) for Lake Ontario in
              spring from 1968-1979 (from Great Lakes Environmental
              Research Lab).

Year    Time rank    Nitrate (µg/l)    Nitrate rank
1968        1            215.               1
1969        2            237.               2
1970        3            242.               3
1971        4            249.               5
1972        5            247.               4
1973        6            251.               6
1974        7            287.               8
1975        8            286.               7
1976        9            306.               9
1977       10            309.              10
1978       11            324.              11
1979       12            335.              12

Mean                        = 274.0
Median                      = 267.5
Variance                    = 1503.64
Standard deviation          = 38.78
Standard error of mean      = 11.19
Coefficient of variation    = 0.142
n                           = 12

Data Set A.3. Hypothetical mean annual total phosphorus concentration
              for Lake B, over 30 years.

        Phosphorus                    Phosphorus
Year      (µg/l)      Rank    Year      (µg/l)      Rank
  1        77.0        13      16        91.0        27
  2        83.0        18      17        86.0        21
  3        80.0        15      18        79.0        14
  4        85.0        20      19        75.0        11
  5        89.0        24      20        76.0        12
  6        84.0        19      21        70.0        10
  7        81.0        16      22        64.0         8
  8        82.0        17      23        60.0         3
  9        87.0        22      24        61.0         4
 10        94.0        29      25        66.0         9
 11        90.0        25.5    26        58.0         1
 12        90.0        25.5    27        63.0         6
 13        92.0        28      28        62.0         5
 14        88.0        23      29        59.0         2
 15        95.0        30      30        64.0         7

Overall Mean                  = 77.70     Mean 1 (1 - 15)  = 86.47
Overall Variance              = 143.80    Mean 2 (16 - 30) = 68.93
Overall Standard Deviation    = 11.99
n                             = 30

APPENDIX B

TABLE B.1. SHAPIRO-WILK TEST FOR NORMALITY
Coefficients of Ordered Differences (a_{i,n})

i    n=11    12     13     14     15     16     17     18     19     20

1   .5601  .5475  .5359  .5251  .5150  .5056  .4968  .4886  .4808  .4734
2   .3315  .3325  .3325  .3318  .3306  .3290  .3273  .3253  .3232  .3211
3   .2260  .2347  .2412  .2460  .2495  .2521  .2540  .2553  .2561  .2565
4   .1429  .1586  .1707  .1802  .1878  .1939  .1988  .2027  .2059  .2085
5   .0695  .0922  .1099  .1240  .1353  .1447  .1524  .1587  .1641  .1686
6    ...   .0303  .0539  .0727  .0880  .1005  .1109  .1197  .1271  .1334
7    ...    ...    ...   .0240  .0433  .0593  .0725  .0837  .0932  .1013
8    ...    ...    ...    ...    ...   .0196  .0359  .0496  .0612  .0711
9    ...    ...    ...    ...    ...    ...    ...   .0163  .0303  .0422
10   ...    ...    ...    ...    ...    ...    ...    ...    ...   .0140

i    n=21    22     23     24     25     26     27     28     29     30

1   .4643  .4590  .4542  .4493  .4450  .4407  .4366  .4328  .4291  .4254
2   .3185  .3156  .3126  .3098  .3069  .3043  .3018  .2992  .2968  .2944
3   .2578  .2571  .2563  .2554  .2543  .2533  .2522  .2510  .2499  .2487
4   .2119  .2131  .2139  .2145  .2148  .2151  .2152  .2151  .2150  .2148
5   .1736  .1764  .1787  .1807  .1822  .1836  .1848  .1857  .1864  .1870
6   .1399  .1443  .1480  .1512  .1539  .1563  .1584  .1601  .1616  .1630
7   .1092  .1150  .1201  .1245  .1283  .1316  .1346  .1372  .1395  .1415
8   .0804  .0878  .0941  .0997  .1046  .1089  .1128  .1162  .1192  .1219
9   .0530  .0618  .0696  .0764  .0823  .0876  .0923  .0965  .1002  .1036
10  .0263  .0368  .0459  .0539  .0610  .0672  .0728  .0778  .0822  .0862
11   ...   .0122  .0228  .0321  .0403  .0476  .0540  .0598  .0650  .0697
12   ...    ...    ...   .0107  .0200  .0284  .0358  .0424  .0483  .0537
13   ...    ...    ...    ...    ...   .0094  .0178  .0253  .0320  .0381
14   ...    ...    ...    ...    ...    ...    ...   .0084  .0159  .0227
15   ...    ...    ...    ...    ...    ...    ...    ...    ...   .0076

125

Table 8.1. Coefficients of Ordered Differences (a, ) (cont.)
2,,72 —'-'—'

i n=31 32 33 34 35 36 37 38 39 40

1 .4220 .4188 .4156 .4127 .4096 .4068 .4040 .4015 .3989 .3964
2 .2921 .2898 .2876 .2854 .2834 .2813 .2794 .2774 .2755 .2737
3 .2475 .2463 .2451 .2439 .2427 .2415 .2403 .2391 .2380 .2368
4 .2145 .2141 .2137 .2132 .2127 .2121 .2116 .2110 .2104 .2098
5 .1874 .1878 .1880 .1882 .1883 .1883 .1883 .1881 .1880 .1878
6 .1641 .1651 .1660 .1667 .1673 .1678 .1683 .1686 .1689 .1691
7 .1433 .1449 .1463 .1475 .1487 .1496 .1505 .1513 .1520 .1526
8 .1243 .1265 .1284 .1301 .1317 .1331 .1344 .1356 .1366 .137
9 .1066 .1093 .1118 .1140 .1160 .1179 .1196 .1211 .1225 .1237
10 .0899 .0931 .0961 .0988 .1013 .1036 .1056 .1075 .1092 .1108
11 .0739 .0777 .0812 .0844 .0873 .0900 .0924 .0947 .0967 .0986
12 .0585 .0629 .0669 .0706 .0739 .0770 .0798 .0824 .0848 .0870
13 .0435 .0485 .0530 .0572 .0610 .0645 .0677 .0706 .0733 .0759
14 .0298 .0344 .0395 .0441 .0484 .0523 .0559 .0592 .0622 .0651
15 .0144 .0206 .0262 .0314 .0361 .0404 .0444 .0481 .0515 .0546
16 ... .0068 .0131 .0187 .0239 .0287 .0331 .0372 .0409 .0444
17 ... ... ... .0062 .0119 .0172 .0220 .0264 .0305 .0343
18 ... ... ... ... ... .0057 .0110 .0158 .0203 .0244
19 ... ... ... ... ... ... ... .0053 .0101 .0146
20 ... ... ... ... ... ... ... ... ... .0049

Table B.1. Coefficients of Ordered Differences ($a_{i,n}$) (cont.)

i n=41 42 43 44 45 46 47 48 49 50

1 .3940 .3917 .3894 .3872 .3850 .3830 .3808 .3789 .3770 .3751
2 .2719 .2701 .2684 .2667 .2651 .2635 .2620 .2604 .2589 .2574
3 .2357 .2345 .2334 .2323 .2313 .2302 .2291 .2281 .2271 .2260
4 .2091 .2085 .2078 .2072 .2065 .2058 .2052 .2045 .2038 .2037
5 .1876 .1874 .1871 .1868 .1865 .1862 .1859 .1855 .1851 .1847
6 .1693 .1694 .1695 .1695 .1695 .1695 .1695 .1693 .1692 .1691
7 .1531 .1535 .1539 .1542 .1545 .1548 .1550 .1551 .1553 .1554
8 .1384 .1392 .1398 .1405 .1410 .1415 .1420 .1423 .1427 .1430
9 .1249 .1259 .1269 .1278 .1286 .1293 .1300 .1306 .1312 .1317
10 .1123 .1136 .1149 .1160 .1170 .1180 .1189 .1197 .1205 .1217
11 .1004 .1020 .1035 .1049 .1062 .1073 .1085 .1095 .1105 .1113
12 .0891 .0909 .0927 .0943 .0959 .0972 .0986 .0998 .1010 .1020
13 .0782 .0804 .0824 .0842 .0860 .0876 .0892 .0906 .0919 .0932
14 .0677 .0701 .0724 .0745 .0765 .0783 .0801 .0817 .0832 .0846
15 .0575 .0602 .0628 .0651 .0673 .0694 .071 .0731 .0748 .0764
16 .0476 .0506 .0534 .0560 .0584 .0607 .0628 .0648 .0667 .0685
17 .0379 .0411 .0442 .0471 .0497 .0522 .0546 .0568 .0588 .0608
18 .0283 .0318 .0352 .0383 .0412 .0439 .0465 .0489 .0511 .0532
19 .0188 .0227 .0263 .0296 .0328 .0357 .0385 .0411 .0436 .0459
20 .0094 .0136 .0175 .0211 .0245 .0277 .0307 .0335 .0361 .0386
21 ... .0045 .0087 .0126 .0163 .0197 .0229 .0259 .0288 .0314
22 ... ... ... .0042 .0081 .0118 .0153 .0185 .0215 .0244
23 ... ... ... ... ... .0039 .0076 .0111 .0143 .0174
24 ... ... ... ... ... ... ... .0037 .0071 .0104
25 ... ... ... ... ... ... ... ... ... .0015

 

Source: Gill (1978).

Table B.2. Critical Values for W Statistic*

n α=0.9 0.5 0.10 0.05 0.02 0.01
11 .973 .940 .876 .850 .817 .79
12 .973 .943 .883 .859 .828 .807
13 .974 .945 .889 .866 .837 .81
14 .975 .947 .895 .874 .846 .825
15 .975 .950 .901 .881 .855 .835
16 .976 .952 .906 .887 .863 .844
17 .977 .954 .910 .892 .869 .851
18 .978 .956 .914 .897 .874 .858
19 .978 .957 .917 .901 .879 .863
20 .979 .959 .920 .905 .884 .868
21 .980 .960 .923 .908 .888 .873
22 .980 .961 .926 .911 .892 .878
23 .981 .962 .928 .914 .895 .881
24 .981 .963 .930 .916 .898 .884
25 .981 .964 .931 .918 .901 .888
26 .982 .965 .933 .920 .904 .891
27 .982 .965 .935 .923 .906 .894
28 .982 .966 .936 .924 .908 .896
29 .982 .966 .937 .926 .910 .898
30 .983 .967 .939 .927 .912 .900
31 .983 .967 .940 .929 .914 .902
32 .983 .968 .941 .930 .915 .904
33 .983 .968 .942 .931 .917 .906
34 .983 .969 .943 .933 .919 .908
35 .984 .969 .944 .934 .920 .910
36 .984 .970 .945 .935 .922 .912
37 .984 .970 .946 .936 .924 .914
38 .984 .971 .947 .938 .925 .916
39 .984 .971 .948 .939 .927 .917
40 .985 .972 .949 .940 .928 .919
41 .985 .972 .950 .941 .929 .9-
42 .985 .972 .951 .942 .930 .922
43 .985 .973 .951 .943 .932 .923
44 .985 .973 .952 .944 .933 .924
45 .985 .973 .953 .945 .934 .926
46 .985 .974 .953 .945 .935 .927
47 .985 .974 .954 .946 .936 .928
48 .985 .974 .954 .947 .937 .929
49 .985 .974 .955 .947 .937 .929
50 .985 .974 .955 .947 .938 .930

Source: Gill (1978).

 

 

*Nonnormality is indicated when the W statistic is smaller than the
appropriate critical value.
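
The decision rule in the footnote can be illustrated with a short sketch. It assumes the usual form of the statistic, W = b²/SS, where b sums each Table B.1 coefficient times the difference between the matching largest and smallest ordered observations, and SS is the sum of squared deviations from the mean. The sample, the function name `shapiro_wilk_w`, and the constant names are hypothetical; the n = 11 coefficients are taken from Table B.1, with .3315 and .1429 assumed where the scan is unclear.

```python
from statistics import fmean

# Coefficients a_{i,11} from Table B.1 (n = 11). The values .3315 and .1429
# are assumed readings of entries that are unclear in the scan.
A_11 = [0.5601, 0.3315, 0.2260, 0.1429, 0.0695]
W_CRIT_11_05 = 0.850  # Table B.2, n = 11, alpha = 0.05


def shapiro_wilk_w(sample, coeffs):
    """Return W = b^2 / SS with b = sum a_i * (x_(n-i+1) - x_(i))."""
    x = sorted(sample)
    n = len(x)
    if len(coeffs) != n // 2:
        raise ValueError("need n // 2 coefficients for a sample of size n")
    # Pair the i-th smallest with the i-th largest ordered observation.
    b = sum(a * (x[n - 1 - i] - x[i]) for i, a in enumerate(coeffs))
    mean = fmean(x)
    ss = sum((v - mean) ** 2 for v in x)
    return b * b / ss


# Hypothetical sample of 11 evenly spaced observations.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
w = shapiro_wilk_w(sample, A_11)
nonnormal = w < W_CRIT_11_05  # True would indicate nonnormality
print(f"W = {w:.4f}, nonnormality indicated: {nonnormal}")
```

For a real data set the coefficients and the critical value would be read from the column and row matching the actual sample size.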


TABLE B.3. UPPER PERCENTAGE POINTS OF Fmax DISTRIBUTION (1-CDF): Equal Replication

v α t=3 4 5 6 7 8 9 10 12
2‘ "('25—- "177.1 ' "2.:1'.'1—"3‘_'T.3' *1 0. 5 T.-:. 3 7 1.5 5'... 1. 91"“- ”"2“"
11) -12.3 119.1 ‘18.: 139 .02 190 311 207 312
.115 87.3 113 2112 .3111) 333 4113 473 350 7111
.111 1-38 '53:) 11130 13112 17115 31H13 2 132 31113 31015

.11
(J
m
9'.
o—
41

11.7 13.1 18.5 21.21 25.2 38.1 31.0 37.8
.111 10.3 21.11 311.9 37.7 11.4 511.31 57.1 (13..- 71).
x

1

.05 27.8 39.2 50.7 02.0 72.9 53.5 93.9 101 124
.01 55 120 151 184 210 249 281 310 301

4 .25 5.79 7 51 9.0 11 1 13.1 14.7 10.2 17 0 20.4
10 10.1 13 9 17.1 20 1 22.9 23.0 25.1 30.0 33.3

05 15.5 20 0 23.2 29 3 33.0 37.3 41.1 1 0 31.1

01 37 49 59 09 79 59 97 100 120

‘3 25 100 (3.115 7 33 S 11 9 ~10 111 1 11.1 12.3 13 ‘1
10 7 05 9.30 11 5 13 5 13 2 10 2 13 1 19.4 21 9

.05 10.5 13.7 10.3 18.7 20.3 22.9 21.7 20.3 ‘9.9
.01 22 28 33 33 42 40 50 54 00

0 .25 4.00 a 03 3 97 0 7. 7.33 3.33 8.57 0..x 10 0
.10 0.23 7.75 9 11 10.3 11.1 12.4 13 3 14.2 15 5
.05 8.35 10.4 12 1 13 7 15.0 10.3 17 5 18.0 20.7
.01 15.5 19.1 22 25 22 30 32 31 37

7 .25 3.57 4 11 5.14 5.77 0 35 0.88 7.37 7 82 8.00
.10 5.32 0.52 7.52 3.41 9.20 9.93 10.0 11.2 2 4
.05 0.34 5.44 9.70 10.8 11.5 12.7 13.5 14.3 .3
.01 12.1 14.5 10.3 15.4 20 22 23 24 27

8 .25 3.20 3.97 1.57 5.09 5.55 5.93 6.37 6.73 7.40
.10 3.71 S ()3 (1.43 7.18 7.811 3.31) 8.3.1 11.31) 111.3
.05 0.00 7.10 5.12 9.03 9.75 10.5 11.1 11.7 12.7
.01 9.9 11 7 13.2 14.5 15.8 10.9 17.9 13.9 21 £

9 .25 3.03 3.04 4.1 4.59 4.95 5.34 5.00 5.90 0.51
.10 4.20 5.07 3.74 0.32 0.52 7.25 7.70 5.09 5.75
.05 5.34 0.31 7.11 7.50 5.41 5.95 9.45 9.91 10.7

.01 3.5 0.9 11.1 12.1 13.1 13.9 11.7 13.3 10.1.

Table B.3. Upper Percentage Points of Fmax Distribution (1-CDF): Equal Replication (cont.)

v α t=3 4 5 6 7 8 9 10 12
10 .25 2.85 3.39 3 8 4.21 4 S 4.86 5.13 5.39 5 85
.10 3.93 4 6) 5.1 5.68 6 11 6.49 6 84 7.16 7 "4

.05 4.85 5 67 6.34 6. 2 7. 2 7.87 8 28 8 66 9.34
.01 7.4 8 6 9.6 10.4 11.1 11.8 12.4 12.9 13 9

12 .25 2.58 3.02 3.38 3.68 3.95 4.18 4.40 4.60 4.95
.10 3.45 4.00 4.44 4.81 5.13 5.42 5.08 5.92 6 35

‘ .05 4.16 4.79 5.30 5.72 6.09 0. 2 6.72 7.00 7 48
.01 6.1 6 9 7.6 8.2 8.7 9.1 9.5 9.9 10 6
15 .25 2.32 2.67 2.95 3.18 3.38 3.56 3.72 3.87 4.13
.10 5.00 3.41 3.74 4.02 4.25 4 46 4.05 4.82 5.13

.05 3.54 4.01 4.37 4.63 4.95 5.19 5.40 5.59 5.93

.01 4.9 5.5 6.0 6.4 6.7 7.1 7.3 7.5 8.0

20 .23 2.07 2.33 2.53 2.70 2.85 2.98 3.09 3.20 3 38
.10 2.57 2.87 3 10 3.29 3 46 3 60 3.73 3.85 4.06

.05 2.95 3.29 3.54 3.76 3 94 4.10 4.24 4.37 4.59

.01 3.8 4.5 4.0 4.9 S 5.3 5.5 5.6 5.9
30 .25 1.80 1.98 2.12 2.24 2.34 2.42 2.49 2.56 2.68
.10 2.14 2 54 2.50 2.62 2.73 2.8- 2.90 2.97 3.10

.05 2.40 2.61 2.7 2.91 3.02 3.12 3.2 3.29 5.39

.01 3.0 3.3 3.4 3.6 3.7 3.8 3.9 4.0 4.2
60 .2 1.51 1.62 1.70 1.76 1.81 1.86 1.90 1.93 2.00
.10 1.71 1.82 1.90 1.96 2.02 2.07 2.11 2 14 2.21
.05 1.35 1.96 2.04 2.11 2.1 2.22 2.20 2.30 2.56

.01 2.2 2.5 2.4 2.4 2 2.5 2.6 2.6 2.7
1.00 1.00 1.00 1.00 1.00 1.00 1.00 1 00 1 00

 

Source: Gill (1978)


 

 

 

Table B.4. Upper Percentage Points of Chi-Square Distribution

v α: 0.3 0.2 0.1 0.05 0.025 0.01 0.005 0.001
1 1.076 1.662 2.706 3.861 5.026 _6.635 7.879 10.83
2 2.608 3.219 6.605 5.991 7.378 9.210 10.60 13.82
3 3.665 6.662 6.251 7.815 9.368 11.36 12.86 16.27
6 6.878 5.989 7.779 9.688 11.16 13.28 16.86 18.67
5 6.066 7.289 9.236 11.07 12.83 15.09 16-75 20.52
6 7.231 ' 8.558 10.66 12.59 16.65 16.81 18.55 22.66
7 8.383 9.803 12.02 16.07 16.01 18.68 20.28 26.32
8 9.526 11.03 13.36 15.51 17.53 20.09 21.96 26.12
9 10.66 12.26 16.68 16.92 19.02 21.67 23.59 27.88
10 11.78 13.66 15.99 18.31 20.68 23.21 25.19 29.59
11 12.90 16.63 17.28 19.68 21.92 26.72 26.76 31.26
12. 16.01 15.81 18.55 21.03 23.36 26.22 28.30 32.91
13 15.12 16.98 19.81 22.36 26.76 27.69 29.82 36.53
16 16.22 18.15 21.06 23.68 26.12 29.16 31.32 36.12
15 17.32 19.31 22.31 _ 25.00 27.69 30.58 32.80 37.70
16 18.62 20.67 23.56 26.30 28.85 32.00 36.27 39.25
17 - 19.51 21.61 26.77 27.59 30.19 33.61 35.72 60.79
18 20.60 22.76 25.99 28.87 31.53 36.81 37.16 62.31
19 21.69 23.90 27.20 30.16 32.85 36.19 38.58 63.82
20 . 22.77 25.06 28.61 - 31.61 36.17 37.57 60.00 65.31
21 23.86 26.17 29.62 32.67 35.68 38.93 61.60 66.80
22 26.96 27.30 30.81 33.92 36.78 60.29 62.80 68.27
23 26.02 28.63 32.01 35.17 38.08 61.66 66.18 69.73
26 27.10 29.55 , 33.20 36.62 39.36 62.98 65.56 51.18
25 28.17 30.68 36.38 37.65 60.65 66.31 66.93 52.62
26 29.25 31.79 35.56 38.89 61.92 65.66 68.29 56.05
27 30.32 32.91 36.76 60.11 63.19 66.96 69.66 55.68
28 31.39 36.03 37.92 61.36 66.66 68.28 50.99 56.89
29 32.66 35.16 39.09 62.56 65.72 69.59 52.36 58.30
30 33.53 36.25 60.26 63.77 66.98 50.89 53.67 59.70

$P[Q > \chi^2_\alpha] = \alpha$. For two-tailed procedures, table should be entered at percentage corresponding to $\alpha/2$.

Table B.4. Upper Percentage Points of Chi-Square Distribution (cont.)

v α: 0.3 0.2 0.1 0.05 0.025 0.01 0.005 0.001
31 36.60 37.36 61.62 66.99 68.23 2.19 55.00 61.10

2 35.66 38.67 62.58 66.19 69.68 53.69 56.33 62.69
33 36.73 39.57 63.75 67.60 50.73 56.78 57.65 63.87
36 37.80 60.68 66.90 68.60 51.97 56.06 58.96 65.25
35 38.86 61.78 66.06 69.80 53.20 57.36 60.27 66.62
36 39.92 62.88 67.21 51.00 56.66 58.62 61.58 67.99

7 60.98 63.98 68.36 ’2.19 55.67 59.89 62.88 69.35
38 62.05 65.08 69.51 53.38 56.90 61.16 66.18 70.70
39 63.11 66.17 50.66 56.57 58.12 62.63 65.68 72.05
60 66.16 67.27 51.81 55.76 59.36 63.69 66.77 73.60
61 65.22 68.36 52.95 56.96 60.56 66.95 68.05 76.76
62 66.28 69.66 56.09 58.12 61.78 66.21 69.36 76.08
63 67.36 50.55 55.23 59.30 62.99 67.66 70.62 77.62
66 68.60 51.66 56.37 60.68 66.20 68.71 71.89 78.75
65 69.65 52.73 57.51 61.66 65.61 69.96 73.17 80.08
66 50.51 53.82 58.66 62.83 66.62 71.20 76.66 81.60
67 51.56 56.91 59.77 66.00 67.82 72.66 75.70 82.72
68 52.62 55.99 60.91 65.17 69.02 73.68 76.97 86.06
69 53.67 57.08 62.06 66.36 70.22 76.92 78.23 85.35
50 56.72 58.16 63.17 67.50 71.62 76.15 79.69 86.66
51 55.78 59.25 66.30 68.67 72.62 77 39 80.75 87.97
52 56.83 60.33 65.62 69.83 73.81 78.62 82.00 89.27
53 57.88 61.61 66.55 70.99 75.00 79.86 83.25 90.57
56 58.93 62.50 67.67 72.15 76.19 81.07 86.50 91.87
55 59.98 63.58 68.80 73.31 77.38 82 29 85.75 93.17
56 61.03 66.66 69.92 76.67 78.57 83.51 86.99 96.66
57 62.08 65.76 71.06 75.62 79.75 86.73 88.26 95.75
58 63.13 66.82 72.16 76.78 80.96 85.95 89.68 97.06
59 66.18 67.89 73.28 77.93 82.12 87.17 90.72 98.32
60 65.23 68.97 76.60 79.08 83.30 88 38 91.95 99.61
61 66.27 70.05 75.51 80.23 86.68 89.59 93.19 100.9
62 67.32 71.13 76.63 81.38 85.65 90.80 96.62 102.2
63 68.37 72.20 77.75 82.53 86.83 2 01 95.65 103.6
66 69.62 73.28 78.86 83.68 88.00 93 22 96.88 106.7
65 70.66 76.35 79.97 86.82 89.18 96.62 98.11 106.0
66 71.51 75.62 81.09 85.96 90.35 93 63 99.33 107.3
67 72.55 76.50 82.20 87.11 91.52 96.83 100.6 108.5
68 73.60 77.57 83.31 88.25 92.69 98.03 101.8 109.8
69 76.66 78.66 86.62 89.39 93.86 99.23 103.0 111.1
70 75.69 79.71 85.53 90.53 95.02 100.6 106.2 112.3


Table B.4. Upper Percentage Points of Chi-Square Distribution (cont.)

v α: 0.3 0.2 0.1 0.05 0.025 0.01 0.005 0.001
71 76.73 80.79 86.66 91.67 96.19 101.6 105.6 113.6
72 77.78 81.86 87.74 92.81 97.35 102.8 106.6 116.8
73 78.82 82.93 88.85 93.95 98.52 106.0 107.9 116.1
76 79.86 83.00 89.96 95.08 99.68 105.2 109.1 117.3
75 80.91 85.07 91.06 96.22 100.8 106.6 110.3 118.6
76 81.95 86.13 92.17 97.35 102.0 107.6 111.5 119.8
77 82.99 87.20 93.27 98.68 103.2 108.8 112.7 121.1
78 86.06 88.27 94.37 99.62 104.3 100.0 113.9 122.3
79 85.08 89.36 95.68' 100.7 105.5 111.1 115.1 123.6
80 86.12 90.61 96.58 101.9 106.6 112.3 116.3 126.8
81 . 87.16 91.47 97.68 103.0 107.8 113.5 117.5 126.1
82 88.20 92.56 98.78 106.1 108.9 116.7 118.7 127.3
83 89.26 93.60 99.88 105.3 110.1 115.9 119.9 128.6
86 90.28 96.67 101.0 106.6 111.2 117.1 121.1 129.8
85 91.32 95.73 102.1 107.5 112.6 118.2 122.3 131.0
86 92.36 96.80 103.2 108.6 113.5 119.6 123.5 132.3
87 93.60 97.86 106.3 109.8 116.7 120.6 126.7 133.5
88 96.66 98.93 105.6 110.9 115.8 121.8 125.9 136.7
89 95.48 99.99 106.5 112.0 117.0 122.9 127.1 136.0
90 96.52 101.1 107.6 113.1 118.1 126.1 128.3 137.2
91 97.56 102.1 108.7 116.3 119.3 125.3 129.5 138.6
92 98.60 103.2 109.8 115.6 120.6 126.5 130.7 139.7
93 99.66 106.2 110.8 116.5 121.6 127.6 131.9 160.9
96 100.7 105.3 111.9 117.6 122.7 128.8 133.1 162.1
95 101.7 106.6 113.0 118.8 123.9 130.0 136.2 163.3
96 102.8 107.4 116.1 119.9 125.0 131.1 135.6 166.6
97 103.8 108.5 115.2 121.0 126.1 132.3 136.6 145.8
98 106.8 109.5 116.3 122.1 127.3 133.5 137.8 167.0
99 105.9 110.6 117.6 123.2 128.6 136.6 139.0 168.2

100* 106.9 111.7 118.5 126.3 129.6 135.8 160.2 169.6

Source: Gill (1978)

*For v > 100, one may use the approximation $\chi^2_{\alpha,v} = (z_{1-\alpha} + \sqrt{2v-1})^2/2$, where $z_{1-\alpha}$ is an upper percentage point from the standard normal distribution (Table A.2).
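
The large-v approximation in the footnote is easy to evaluate directly. The sketch below assumes the footnote reads (z + sqrt(2v - 1))²/2 with z the upper α point of the standard normal distribution; the function name `chi2_upper_approx` is hypothetical, and Python's `statistics.NormalDist` stands in for a lookup in Table A.2. It is evaluated at v = 100 only so that the result can be compared with the last row of Table B.4.

```python
from math import sqrt
from statistics import NormalDist


def chi2_upper_approx(alpha, v):
    """Approximate upper-alpha chi-square point for v degrees of freedom."""
    # Upper alpha point of the standard normal distribution.
    z = NormalDist().inv_cdf(1.0 - alpha)
    return (z + sqrt(2.0 * v - 1.0)) ** 2 / 2.0


approx_100 = chi2_upper_approx(0.05, 100)  # compare with the v = 100 row
approx_150 = chi2_upper_approx(0.05, 150)  # beyond the range of the table
print(f"v=100: {approx_100:.2f}, v=150: {approx_150:.2f}")
```

Passing a lower-tail normal quantile instead (for example `inv_cdf(alpha)`) gives the corresponding lower percentage point in the same way.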


Table B.5. Lower Percentage Points of Chi-Square Distribution

v 1-α: 0.999 0.995 0.99 0.975 0.95 0.9 0.8 0.7
1 0.002* 0.039* 0.157* 0.982* 0.006 0.016 0.064 0.168
2 0.002 0.010 0.020 0.051 0.103 0.211 0.666 0.713
3 0.026 0.072 0.115 0.216 0.352 0.586 1.005 1.626
6 0.091 0.207 0.297 0.686 0.710 1.066 1.669 2.195
5 0.210 0.612 0.554 0.831 1.165 1.610 2.363 3.000
6 0.381 0.676 0.872 1.237 1.635 2.206 3.070 3.828
7 0.598 0.989 1.239 1.690 2.167 2.833 3.822 6.671
8 0.857 1.366 1.646 . 2.180 2.733 3.600 6.596 5.527
9 1.152 1.735 2.088 2.700 3.325 6.168 5.380 6.393
10 1.679 2.156 2.558 3.267 3.940 4.865 6.179 7.267
11 1.834 2.603 3.053 3.816 6.575 5.578 6.989 8.148
12 2.216 3.076 3.571 6.604x 5.226 6.306 7 807 9.036
13 2.617 3.565 6.107 5.009‘ 5.892 7.062 8.636 9.926
14 3.061 6.075 6.660 5.629 6.571 7.790 9.667 10.82
15 3.683 6.601 5.229 6.262 7.261 8.567 10.31 11.72
16 3.962 5.162 5.812 6.908 7.962 9.312 11.15 12.62
17 4.616 5.697 6.608 7.566 8.672 10.09 12.00 13.53
18 6.905 6.265 7.015 8.231 9.390 10.86 12.86 16.64
19 5.607 6.864 7.633 8.907 10.12 11.65 13.72 15.35
2 5.921 7.636 8.260 9.591 10.85 12. 6 16.58 16.27
21 6.447 8.034 8.897 10.28 11.59 13.26 15.66 17 18
22 6.983 8.663 9.562 10.98 12.36 14.06 16.3 18.10
23 7.529 9.260 10.20 11.69 13.09 16.85 17.19 19. 2
26 8.085 9.886 10.86 12.40 13.85 15.66 18.06 19.96
25 8.669 10.52 11.52 13.12 16 61 16.67 18.96 20.87
26 9.222 11.16 12.20 13.86 15.38 7.29 19.82 21.79
27 9.803 11.81 12.88 16.57 16.15 18.11 20.70 22.72
28 10.39 12.66 13.56 15.31 16.93 18.96 21.59 23.65
29 10.99 13.12 14.26 16.05 17.71 19.77 22.68 6.58
30 11.59 13.79 16.95 16.79 18.69 20.60 23.36 25.51

 

*Divide these entries by 1000.

$P[Q > \chi^2_{1-\alpha}] = 1-\alpha$. For two-tailed procedures, table should be entered at percentage corresponding to $1-\alpha/2$.

Table B.5. Lower Percentage Points of Chi-Square Distribution (cont.)

v 1-α: 0.999 0.995 0.99 0.975 0.95 0.9 0.8 0.7

31 12.20 16.46 15.66 17.56 19.28 21.43 24.26 26.66
32 12.81 15.13 16.56 18.29 20.07 22.27 25.15 27.37
33 13.43 15.82 17.07 19.05 20.87 23.11 26.06 28.31
36 16.06 16.50 17.79 19.81 21.66 23.95 26.96 29.26
35 16.69 17.19 18.51 20.57 22.66 26.80 27.86 30.18
36 15.32 17.89 19.23 21.36 23.27 25.64 28.76 31.12
37 15.97 18.59 19.96 22.11 26.07 26.69 2 .66 32.05
38 16.61 19.29 20.69 22.88 24.88 27. 6 30.56 32.99
39 17.26 20.00 21.63 23.65 25.70 28.20 31.66 33.93
60 17I92 20.71 22.16 24.63 26.51 29 05 2.36 36.87
61 18.58 21.62 22.91 25.21 27.33 29.91 33.25 35.81
62 19.26 22.16 23.65 26.00 28.16 30.77 36.16 36.76
63 19.91 22.86 24.60 26.79 28.96 31.63 35.07 37.70
66 20.58 23.58 25.15 27.57 29.79 2 69 35.97 38.64
65 21.25 26.31 25.90 28.37 30.61 33.35 36.88 39.58
66 21.93 25.06 26.66 29.16 31.66 36.22 37.80 40.53
67 22.61 25.77 27.62 29.96 32.27 35.08 38.71 61.67
68 23.29 26.51 28.18 30.75 33.10 35.95 39.62 62-62
69 23.98 27.25 28.96 31.55 33.93 36.82 40.53 63.37
50 26.67 27.99 29.71 32.36 34.76 37.69 61.65 44.31
51 25.37 28.73 30.68 33.16 35.60 38.56 62.36 65.26
52 26.07 29.68 31.25 33.97 36.44 39.43 63.28 46.21
53 26.76 30.23 32.02 36.78 37.28 60.31 64.20 47.16
54 27.67 30.98 32.79 35.59 38.12 61.18 45.12 68.11
55 28.17 31.73 33.57 36.40 38.96 62.06 66.04 49.06
56 28.88 32.69 36.35 37.21 39.80 42.96 66.96 50.01
57 29.59 33.25 _ 35.13 38.03 60.65 63.82 67.88 50.96
58 30.30 36.01 35.91 38.84 41.69 66.70 68.80 51.91
59 31.02 36.77 36.70 39.66 62.36 65.58 69.72 52.86
60 31.74 35.53 37.68 40.68 63.19 66.46 50.64 53.81
61 32.66 36.30 38.27 61.30 66.06 47.36 51.56 56.76
62 33.18 37.07 39.06 62.13 66.89 68.23 52.69 55.71
63 33.91 37.86 39.86 62.95 65.76 69.11 53.41 56.67
66 36.63 38.61 60.65 63.78 46.59 50.00 56.36 57.62
65 35.36 39.38 61.66 66.60 67.45 50.88 55.26 58.57
66 36.09 60.16 62.26 45.63 48.31 51.77 56.19 59.53
67 36.83 60.96 63.06 66.26 69.16 52.66 57.11 60.68
68 37.56 61.71 63.86 67.09 50.02 53.55 58.04 61.66
69 38.30 42.69 66.64 67.92 50.88 56.46 58.97 62.39
70 39.06 63.28 65.64 48.76 51.76 55.33 59.90 63.35


Table B.5. Lower Percentage Points of Chi-Square Distribution (cont.)

v 1-α: 0.999 0.995 0.99 0.975 0.95 0.9 0.8 0.7
71 39.78 46.06 46.25 69.59 52.60 56.22 60.83 64.30
72 40.52 44.84 47.05 50.43 53.46 57.11 61.76 65.26
73 41.26 65.63 67.86 51.26 54.33 58.01 62.69 66.21
76 62.01 66.42 48.67 52.10 55.19 58.90 63.62 67.17
75 62.76 67.21 69.68 52.96 56.05 59.79 64.55 68.13
76 43.51 68.00 50.29 53.78 56.92 60.69 65.48 69.08
77 44.26 48.79 51.10 , 56.62 57.79 61.59 66.6 70.04
78 45.01 69.58 51.91 55.47 58.65 62.48 67.34 71.00
79 65.76 50.38 52.72 56.31 59.52 63.38 68.27 71.96
80 66.52 51.17 53.54 57.15 60.39 66.2 69.21 72.92
81 47.28 51.97 56.36 58.00 61.26 65.18 70.16 73.87
82 68.04 52.77 55.17 58.84 62.13 66.08 71.07 76.83
83 48.80 53.57 55.99 59.69 63.00 66.98 72.01 75.79
84 49.56 56.37 56.81 60.56 63.88 67.88 72.96 76.75
85 50.32 55.17 57.63 61.39 66.75 68.78 73.88 77.71
86 51.08 55.97 58.46 62.26 65.62 69.68 74.81 78.67
87 51.85 56.78 59.28 63.09 66.50 70.58 75.75 79.63
88 52.62 57.58 60.10 63.96 67.37 71.48 76.69 80.59
89 53.39 58.39 60.93 66.79 68.25 72.39 77.62 81 55
90 54.16 59.20 61.75 65.65 69.13 73.29 78.56 82.51
91 56.93 60.00 62.58 66.50 70.00 76.20 79.50 83 67
92 55.70 60.81 63.41 67.36 70.88 75.10 80.63 84.63
93 56.67 61.63 66.26 68.21 71.76 76.01 81 37 85 39
96 57.25 62.66 65.07 69.07 72.64 76.91 82.31 86 36
95 58.02 63.25 65.90 69.92 73.52 77.82 83 25 87.32
96 58.80 64.06 66.73 70.78 76.60 78.73 86 19 88 28
97 59.58 66.88 67.56 71.64 75.28 79.63 85.13 89.26
98 60.36 65.69 68.60 72.50 76.16 80.54 86.07 90.2
99 61.16 66.51 69.23 73.36 77.05 81.65 87.01 91.17

100* 61.92 67.33 70.06 76.22 77.93 82.36 87.95 92 13

Source: Gill (1978)

*For v > 100, one may use the approximation $\chi^2_{1-\alpha,v} = (z_{\alpha} + \sqrt{2v-1})^2/2$, where $z_{\alpha}$ is a lower percentage point from the standard normal distribution (Table A.2).


TABLE B.6. CRITICAL VALUES FOR TESTING ONE OUTLYING OBSERVATION

n α=0.01 0.05 0.10 n α=0.01 0.05 0.10
. 32 3.135 2.773 2.531
.. . 36 3.166 2.799 2.616
3 1.155 1.153 1 48 36 3.191 2.823 2.639
6 1.692 1.663 1 625 38 3.216 2.566 2.661
5 1.76 1.672 1 02 60 3.260 2.866 2.682
6 1.966 1.822 1.729 62 3.261 2.887 2.700
7 2.097 1.938 1.828 66 3.282 2 905 2.719
8 2.221 2.032 1.909 66 3.302 2 923 2.736
9 2.323 2.110 1.97 68 3.319 2.96 2.753
10 2.610 2.176 2.036 50 3.336 2.956 2.765
11 2 685 2.236 2.088 55 3.376 2.992 2.806
12 2 550 2.285 2.136 60 3.611 3.025 2.837
1 2.607 2.331 2.175 65 3.662 3.055 2.866
16 2.659 2.371 2.213 70 3.671 3.082 2.893
15 2.705 2.609 2.267 75 3.696 3.107 2.917
16 2.767 2.663 2 279 80 3.521 3.130 2.960
17 2.785 2.675 2 309 85 3.563 3.151 2.961
18 2.821 2.506 2 335 90 3.563 3.171 2.981
19 2.856 2.532 2.361 95 3.582 3.189 3.000
20 2.886 2.557 2.385 100 3.600 3.207 3.017
21 2.912 2.580 2.608 105 3.617 3.226 3.033
22 2.939 2.603 2.629 110 3.632 3.239 3.069
23 2.963 2.626 2.668 115 3.667 3.256 3.066
26 2.987 2.666 2.667 120 3.662 3.267 3.078
25 3.009 2.663 2.686 125 3.675 3.281 3.092
26 3.029 2.681 2.502 130 3.688 3.296 3.106
27 3.069 2.698 2.519 135 3.700 3.306 3.116
28 3.068 2.716 2.536 160 3.712 3.318 3.129
2, 3.085 2.730 2.569 165 3.723 3.328 3.160
30 3.103 2.765 2.563
Source: Gill (1978)
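
As an illustration of how a critical value from Table B.6 might be used, the sketch below assumes the test statistic is the deviation of the suspect (largest) observation from the sample mean, divided by the sample standard deviation. That form is an assumption, since the statistic is not defined alongside the table here. The sample and the name `outlier_statistic` are hypothetical; the critical value 2.032 is the tabled entry for n = 8 and α = 0.05.

```python
from statistics import fmean, stdev

T_CRIT_N8_05 = 2.032  # Table B.6, n = 8, alpha = 0.05


def outlier_statistic(sample):
    """Return (suspect, T) with T = (largest value - mean) / s (assumed form)."""
    mean = fmean(sample)
    s = stdev(sample)  # sample standard deviation, n - 1 divisor
    suspect = max(sample)
    return suspect, (suspect - mean) / s


# Hypothetical sample of 8 observations with one suspiciously large value.
sample = [10, 11, 12, 11, 10, 12, 11, 20]
suspect, t_stat = outlier_statistic(sample)
is_outlier = t_stat > T_CRIT_N8_05
print(f"suspect = {suspect}, T = {t_stat:.3f}, flagged: {is_outlier}")
```

A suspiciously small observation could be tested the same way by using the mean minus the smallest value in the numerator.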


TABLE B.7. Significance points of the Durbin-Watson statistics dL and dU: α = .01.

[Table values are illegible in the scanned source.]

TABLE B.7. Significance points of the Durbin-Watson statistics dL and dU: α = .05.

[Table values are illegible in the scanned source.]

Source: Kendal (1973)

TABLE B.8. UPPER PERCENTAGE POINTS OF STUDENT'S t DISTRIBUTION (1-CDF)

v α: 0.25 0.20 0.15 0.10 0.05 0.025 0.01 0.005 0.0005

1 1.000 1.376 1.963 3.078 6.314 12.71 31.82 63.66 636.6

2 0.816 1.061 1.386 1.886 2.920 4.303 6.965 9.925 31.60
3 0.765 0.978 1.250 1.638 2.353 3.182 4.541 5.841 12.92
4 0.741 0.941 1.190 1.533 2.132 2.776 3.747 4.604 8.610
5 0.727 0.920 1.156 1.476 2.015 2.571 3.365 4.032 6.869
6 0.718 0.906 1.134 1.440 1.943 2.447 3.143 3.707 5.959
7 0.711 0.896 1.119 1.415 1.895 2.365 2.998 3.500 5.408
8 0.706 0.889 1.108 1.397 1.860 2.306 2.896 3.355 5.041
9 0.703 0.883 1.100 1.383 1.833 2.262 2.821 3.250 4.781
10 0.700 0.879 1.093 1.372 1.812 2.228 2.764 3.169 4.587
11 0.698 0.876 1.088 1.363 1.796 2.201 2.718 3.106 4.437
12 0.696 0.873 1.083 1.356 1.782 2.179 2.681 3.054 4.318
13 0.694 0.870 1.079 1.350 1.771 2.160 2.650 3.012 4.221
14 0.692 0.868 1.076 1.345 1.761 2.145 2.624 2.977 4.140
15 0.691 0.866 1.074 1.341 1.753 2.132 2.602 2.947 4.073
16 0.690 0.865 1.071 1.337 1.746 2.120 2.583 2.921 4.015
17 0.689 0.863 1.069 1.333 1.740 2.110 2.567 2.898 3.965
18 0.688 0.862 1.067 1.330 1.734 2.101 2.552 2.878 3.922
19 0.688 0.861 1.066 1.328 1.729 2.093 2.539 2.861 3.883
20 0.687 0.860 1.064 1.325 1.725 2.086 2.528 2.845 3.850
21 0.686 0.859 1.063 1.323 1.721 2.080 2.518 2.831 3.819
22 0.686 0.858 1.061 1.321 1.717 2.074 2.508 2.819 3.792
23 0.685 0.858 1.060 1.319 1.714 2.069 2.500 2.807 3.767
24 0.685 0.857 1.059 1.318 1.711 2.064 2.492 2.797 3.745
25 0.684 0.856 1.058 1.316 1.708 2.060 2.485 2.787 3.725
26 0.684 0.856 1.058 1.315 1.706 2.056 2.479 2.779 3.707
27 0.684 0.855 1.057 1.314 1.703 2.052 2.473 2.771 3.690
28 0.683 0.855 1.056 1.313 1.701 2.048 2.467 2.763 3.674
29 0.683 0.854 1.055 1.311 1.699 2.045 2.462 2.756 3.659
30 0.683 0.854 1.055 1.310 1.697 2.042 2.457 2.750 3.646

$P[T > t_\alpha] = \alpha$. For two-tailed procedures, table should be entered at column headed by desired value of $\alpha/2$. In all cases, v = degrees of freedom.
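
The two-tailed rule can be illustrated with a confidence interval for a mean. This sketch uses the summary values that precede Appendix B (overall mean 77.70, overall standard deviation 11.99, n = 30) purely as an example, and the name `two_sided_interval` is hypothetical. For a two-tailed α of 0.05 with v = 29, the table is entered at the 0.025 column, which gives 2.045.

```python
from math import sqrt

# Table B.8, v = 29, column 0.025 (alpha / 2 for a two-tailed alpha of 0.05).
T_025_29 = 2.045


def two_sided_interval(mean, sd, n, t_crit):
    """Return (low, high) = mean -/+ t_crit * sd / sqrt(n)."""
    half_width = t_crit * sd / sqrt(n)
    return mean - half_width, mean + half_width


low, high = two_sided_interval(77.70, 11.99, 30, T_025_29)
print(f"95% interval for the mean: ({low:.2f}, {high:.2f})")
```

A one-tailed procedure at the same α would instead use the 0.05 column for v = 29.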


TABLE B.8. UPPER PERCENTAGE POINTS OF STUDENT'S t DISTRIBUTION (1-CDF) (cont.)

v α: 0.25 0.20 0.15 0.10 0.05 0.025 0.01 0.005 0.0005
31 0.682 0.854 1.054 1.310 1.696 2.040 2.453 2.744 3.634
32 0.682 0.853 1.054 1.309 1.694 2.037 2.449 2.738 3.622
33 0.682 0.853 1.053 1.308 1.692 2.034 2.445 2.733 3.611
34 0.682 0.852 1.053 1.307 1.691 2.032 2.441 2.728 3.601
35 0.682 0.852 1.052 1.306 1.690 2.030 2.438 2.724 3.592
36 0.681 0.852 1.052 1.306 1.688 2.028 2.434 2.720 3.582
37 0.681 0.852 1.051 1.305 1.687 2.026 2.431 2.716 3.574
38 0.681 0.851 1.051 1.304 1.686 2.024 2.428 2.712 3.566
39 0.681 0.851 1.050 1.304 1.685 2.023 2.426 2.708 3.559
40 0.681 0.851 1.050 1.303 1.684 2.02 2.423 2.704 3.551
42 0.680 0.850 1.049 1.302 1.682 2.018 2.418 2.698 3.538
44 0.680 0.850 1.049 1.301 1.680 2.015 2.414 2.692 3.526
46 0.680 0.850 1.048 1.300 1.679 2.013 2.410 2.687 3.515
48 0.680 0.849 1.048 1.299 1.677 2.011 2.406 2.682 3.505
50 0.679 0.849 1.047 1.299 1.676 2.009 2.403 2.678 3.490
60 0.679 0.848 1.046 1.296 1.671 2.000 2.390 2.660 3.461
70 0.678 0.847 1.044 1.294 1.667 1.994 2.381 2.648 3.436
80 0.678 0.846 1.043 1.292 1.664 1.990 2.374 2.639 3.417
90 0.677 0.846 1.042 1.291 1.662 1.987 2.368 2.632 3.402
100 0.677 0.845 1.042 1.290 1.660 1.984 2.364 2.626 3.391
120 0.676 0.845 1.041 1.289 1.658 1.980 2.358 2.618 3.374
140 0.676 0.844 1.040 1.288 1.656 1.977 2.353 2.611 3.362
160 0.676 0.844 1.040 1.287 1.654 1.975 2.350 2.607 3.353
180 0.676 0.844 1.039 1.286 1.653 1.973 2.347 2.604 3.346
200 0.676 0.843 1.039 1.286 1.652 1.972 2.345 2.601 3.340
300 0.676 0.843 1.038 1.285 1.650 1.968 2.338 2.592 3.323
400 0.676 0.843 1.038 1.284 1.649 1.966 2.335 2.588 3.315
500 0.676 0.843 1.037 1.284 1.648 1.965 2.334 2.586 3.310
1000 0.675 0.842 1.037 1.283 1.647 1.962 2.330 2.581 3.301

∞ 0.6745 0.8416 1.0364 1.2816 1.6448 1.9600 2.3263 2.5758 3.2905

Source: Gill (1978)


TABLE B.9. QUANTILES OF THE MANN-WHITNEY TEST STATISTIC

[Table values for the earlier pages of this table are illegible in the scanned source.]

Table B.9. (CONTINUED)

n p m = 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
.001 3 6 10 14 18 22 26 30 35 39 44 48 53 58 62 67 71
.005 7 1 l 16 20 25 30 35 40 45 50 55 61 66 71 76 82 87
17 .01 9 14 19 24 29 34 39 45 50 56 61 67 72 78 83 89 94
.025 12 18 23 29 35 40 46 52 58 64 70 76 82 88 94 100 106
.05 1 16 21 27 34 40 46 52 58 65 71 78. 84 90 97 103 110 116
.10 1 19 26 32 39 46 53 59 66 73 80 86 93 100 . 107 114 121 128
.001 4 7 1 1 15 19 24 28 33 38 43 47 52 57 62 67 72 77
005 7 12 17 22 27 32 38 43 48 54 59 65 71 76 82 88 93

10 15 20 25 31 37 42 48 54 60 66 71 77 83 89 95 101
.025 13 19 25 31 37 43 49 56 62 68 75 81 87 94 100 107 113
.05 17 23 29 36 42 49 56 62 69 76 83 89 96 103 110 117 124

tdoab— UnwoowA— aommw— accumu—
to

28 35 42 49 56 63 70 78 85 92 99 107 114 121 129 136

8 12 16 21 26 30 35 41 46 51 56 61 67 72 78 83
13 18 23 29 34 40 46 52 58 64 7 75 82 88 94 100
16 21 27 33 39 45 51 57 64 70 76 83 S9 95 102 108

Soon

.025 14 20 26 33 39 46 53 59 66 73 79 86 93 100 107 114 120
.05 1 18 24 31 38 45 52 59 66 73 81 88 95 102 110 117 124 I31
.10 1 22 29 37 44 52 59 67 74 82 90 98 105 113 121 129 136 144
.001 4 8 13 17 22 27 33 38 43 49 55 60 66 71 77 83 89
.005 9 14 19 25 31 37 43 49 55 61 68 74 80 87 93 100 106
20 .01 11 17 23 29 35 41 48 S4 61 68 74 81 88 94 101 108 115
.025 15 21 28 35 42 49 56 63 70 77 84 91 99 106 1 13 120 12
.05 1 19 26 33 40 48 55 63 70 78 85 93 101 108 1 16 124 131 139

O
oomuu—O OOUnutJ—O “mm—CO qhw—oc

_-
C‘
I.)
u

31 39 47 55 63 71 79 87 95 103 111 120 128 136 144 152

 

For n or m greater than 20, the pth quantile w_p of the Mann-Whitney test statistic may be approximated by

$$w_p = \frac{nm}{2} + x_p \sqrt{\frac{nm(n + m + 1)}{12}}$$

where x_p is the pth quantile of a standard normal random variable.
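As an illustration only, the large-sample approximation stated for Table B.9 can be evaluated with a few lines of Python. The function name `mann_whitney_quantile` is hypothetical, and the standard normal quantile x_p is taken from the standard library's `statistics.NormalDist`; this is a sketch of the formula, not a replacement for the tabulated values.

```python
from math import sqrt
from statistics import NormalDist


def mann_whitney_quantile(n: int, m: int, p: float) -> float:
    """Approximate w_p = nm/2 + x_p * sqrt(nm(n + m + 1)/12).

    The function name is illustrative; the note to Table B.9 states
    the approximation for n or m greater than 20.
    """
    x_p = NormalDist().inv_cdf(p)  # pth quantile of the standard normal
    return n * m / 2 + x_p * sqrt(n * m * (n + m + 1) / 12)


if __name__ == "__main__":
    # Hypothetical sample sizes, chosen only to show the call.
    print(round(mann_whitney_quantile(25, 30, 0.05), 1))
```

Evaluated at n = 20, m = 19, p = .025, just inside the tabulated range, the formula gives about 120.2, close to the tabulated quantile of 120.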


Table B.10. QUANTILES OF THE HOTELLING-PABST TEST STATISTIC

 

 n   p = .001   .005   .010   .025   .050   .100   (1/3)n(n^2 - 1)
 4                                      2      2                20
 5                        2      2      4      6                40
 6                 2      4      6      8     14                70
 7          2      6      8     14     18     26               112
 8          6     12     16     24     32     42               168
 9         12     22     28     38     50     64               240
10         22     36     44     60     74     92               330
11         36     56     66     86    104    128               440
12         52     78     94    120    144    172               572
13         76    110    130    162    190    226               728
14        106    148    172    212    246    290               910
15        142    194    224    270    312    364              1120
16        186    250    284    340    390    450              1360
17        238    314    356    420    480    550              1632
18        300    390    438    512    582    664              1938
19        372    476    532    618    696    790              2280
20        454    574    638    738    826    934              2660
21        546    686    758    870    972   1092              3080
22        652    810    892   1020   1134   1270              3542
23        772    950   1042   1184   1312   1464              4048
24        904   1104   1208   1366   1510   1678              4600
25       1050   1274   1390   1566   1726   1912              5200
26       1212   1462   1590   1786   1960   2168              5850
27       1390   1666   1808   2024   2216   2444              6552
28       1586   1890   2046   2284   2494   2744              7308
29       1800   2134   2306   2564   2796   3068              8120
30       2032   2398   2584   2868   3120   3416              8990

For n greater than 30, the quantiles of T may be approximated by

$$w_p \cong \frac{1}{6}\,n(n^2 - 1) + x_p\,\frac{1}{6}\,\frac{n(n^2 - 1)}{\sqrt{n - 1}}$$

where x_p is the pth quantile of a standard normal random variable.
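A minimal sketch of the approximation for the Hotelling-Pabst quantiles follows. The function name `hotelling_pabst_quantile` is an assumption made for illustration, and x_p is obtained from the standard library's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def hotelling_pabst_quantile(n: int, p: float) -> float:
    """Approximate w_p = n(n^2 - 1)/6 + x_p * n(n^2 - 1) / (6 sqrt(n - 1)).

    The function name is illustrative; the note to Table B.10 states
    the approximation for n greater than 30.
    """
    center = n * (n**2 - 1) / 6  # n(n^2 - 1)/6, shared by both terms
    x_p = NormalDist().inv_cdf(p)  # pth quantile of the standard normal
    return center + x_p * center / sqrt(n - 1)


if __name__ == "__main__":
    # Hypothetical sample size, chosen only to show the call.
    print(round(hotelling_pabst_quantile(40, 0.05)))
```

At n = 30 and p = .050, the last tabulated row, the formula gives about 3122 against the tabulated 3120, which suggests the approximation is already close at the edge of the table.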

 

SOURCE. Conover (1971)


Table B.11. Estimated Correction Factor for the Mann-Whitney Test.

N = 30
  ρ     α = .01   α = .05   α = .1
 .1      .903      .038      .980
 .2      .715      .876      .822
 .3      .688      .769      .790
 .4      .516      .609      .716
 .5      .474      .556      .632
 .6      .386      .459      .519
 .7      .283      .336      .405

N = 50
  ρ     α = .01   α = .05   α = .1
 .1      .895      .910      .928
 .2      .781      .854      .857
 .3      .714      .787      .828
 .4      .603      .706      .769
 .5      .478      .633      .694
 .6      .383      .510      .580
 .7      .192      .371      .465

Table B.11. Continued

N = 100
  ρ     α = .01   α = .05   α = .1
 .1      .885      .945      .963
 .2      .851      .897      .923
 .3      .815      .863      .878
 .4      .725      .743      .830
 .5      .612      .653      .756
 .6      .510      .597      .688
 .7      .374      .500      .584


Table B.12. Estimated Correction Factor for the Spearman Rho Test.

N = 30
  ρ     α = .01   α = .05   α = .1
 .1      .923      .930      .939
 .2      .786      .854      .875
 .3      .785      .806      .830
 .4      .682      .729      .762
 .5      .548      .639      .686
 .6      .463      .539      .588
 .7      .350      .418      .476

N = 50
  ρ     α = .01   α = .05   α = .1
 .1      .925      .929      .939
 .2      .869      .895      .911
 .3      .777      .786      .857
 .4      .698      .798      .814
 .5      .629      .711      .756
 .6      .546      .636      .676
 .7      .389      .507      .579

Table B.12. Continued

N = 100
  ρ     α = .01   α = .05   α = .1
 .1      .944      .954      .970
 .2      .954      .941      .942
 .3      .874      .905      .923
 .4      .817      .866      .887
 .5      .768      .815      .850
 .6      .686      .756      .786
 .7      .593      .669      .716