LIBRARY
Michigan State University

This is to certify that the thesis entitled

AN EMPIRICAL STUDY OF FACTORS AFFECTING BLUE-GREEN VERSUS NONBLUE-GREEN ALGAL DOMINANCE IN LAKES

presented by

JONATHAN TAYLOR SIMPSON

has been accepted towards fulfillment of the requirements for the Master of Science degree in Resource Development.

Kenneth H. Reckhow
Major professor

Date: February 2L 1980
AN EMPIRICAL STUDY OF FACTORS AFFECTING BLUE-GREEN VERSUS NONBLUE-GREEN ALGAL DOMINANCE IN LAKES

By
Jonathan Taylor Simpson

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE

Department of Resource Development

1979

ABSTRACT

In many lakes, the use and enjoyment of the water is limited due to the dominance of undesirable blue-green algae. Exploratory data analysis techniques were applied to 90 north temperate lakes included in the EPA National Eutrophication Survey to examine empirical relationships between: 1) the chemical and physical variables that affect algal dominance in lakes; and 2) the dominant algal type.

Single variable box plots and bivariate-discriminant plots document the importance of the inorganic nitrogen concentration and hydraulic detention time in determining blue-green versus nonblue-green algal dominance in eutrophic lakes. The multivariate statistical technique of discriminant analysis was applied to 68 high alkalinity lakes in the data set to: 1) further identify variable relationships; and 2) construct a simple predictive model for algal dominance. Applications of the results are discussed in an ecological and management context.

ACKNOWLEDGMENTS

This research was supported by funds provided by the United States Department of the Interior, Office of Water Research and Technology as authorized under the Water Resources Research Act of 1964, as amended.

I wish to thank committee members Dr. Kenneth H. Reckhow, Dr. Darrell L. King, Dr. Eckhart Dersch and Dr. Clarence D. McNabb for their comments and criticisms regarding this work. In addition I would like to acknowledge the contributions made by Mrs.
Janine Niemer (typing) and Mr. J. Paul Schnieder (graphics consultant).

A special note of thanks must go to my friends and fellow graduate students Craig N. Spencer and Michael N. Beaulac for their professional and personal concern throughout these past two years. I look forward to a lifetime of interaction with them.

Particular gratitude is extended to my major professor, Dr. Kenneth H. Reckhow, and to Darrell L. King. Dr. King, with his unique spirit, was unselfish with his time and patience and provided much of the limnological insight that made up the "glue" for this paper. Academic advisor, instructor, co-researcher, editor and personal friend describe the roles Dr. Reckhow played during my graduate career. His guidance was an essential element in this research from beginning to end. The contributions he made are proudly acknowledged.

Finally, I'd like to thank my parents, Mr. & Mrs. John T. Simpson, for their love and support throughout my college years. It is to them that I dedicate this thesis.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

CHAPTER
I. INTRODUCTION
    Problem Statement
    Blue-Green Algae: An Undesirable Algal-Type
    Study Objectives
    Concluding Comments
II. A REVIEW OF LIMNOLOGICAL RELATIONSHIPS AFFECTING ALGAL GROWTH AND DOMINANCE
    The Paradox of the Plankton
    Turbulence and Hydraulic Detention Time
    Nutrient Limitation
    Biological Mechanisms
    Concluding Comments
III. THE NATIONAL EUTROPHICATION SURVEY
    Objectives
    Lake Selection Criteria
    Field Sampling Methods and Analysis
    Definition of the Independent and Dependent Variables
IV. AN INTRODUCTION TO EXPLORATORY DATA ANALYSIS
    Introduction
    Analysis of Data Variability Using Box Plots
    Analysis of Bivariate Relationships
    Discriminant Analysis
V. AN EMPIRICAL ANALYSIS OF EPA-NES DATA
    Introduction
    Preliminary Statistics
    Analysis of EPA-NES Data Using Box Plots and the Snedecor-Cochran Statistic
    Analysis of the EPA-NES Data Using Bivariate-Discriminant Plots
    Discriminant Analysis of High Alkalinity EPA-NES Lakes
    A Case Study and Summary
VI. CONCLUSIONS
BIBLIOGRAPHY

LIST OF TABLES

TABLE
1.--Concentration of essential elements for plant growth in living tissues of freshwater plants (demand), in mean world river water (supply) and the plant:water ratio of concentrations (demand:supply) (from Vallentyne, 1974)
2.--EPA-NES sample analysis summary (modified from USEPA, 1975)
3.--Analytical methods and precision of laboratory analysis (modified from USEPA, 1975)
4.--Key to EPA-NES data tables
5.--Statistics for the EPA-NES data set
6.--Matrix of correlation coefficients for the EPA-NES variables
7.--Snedecor-Cochran statistic for select EPA-NES variables
8.--Discriminant analysis results
9.--Minimum and maximum values for the data set used to develop the discriminant models
10.--Higgins Lake data

LIST OF FIGURES

FIGURE
1.--Factors affecting algal dominance
2.--The basic configuration of a box plot
3.--Box plots possessing significantly different medians
4.--Box plots of total phosphorus by algal-type dominance
5.--Box plots of carbon dioxide by algal-type dominance
6.--Box plots of inorganic nitrogen by algal-type dominance
7.--Box plots of alkalinity by algal-type dominance
8.--Box plots of pH by algal-type dominance
9.--Box plots of detention time by algal-type dominance
10.--Bivariate-discriminant plot of inorganic nitrogen versus detention time by algal-type dominance
11.--Bivariate-discriminant plot of inorganic nitrogen versus influent phosphorus by algal-type dominance
12.--Bivariate-discriminant plot of detention time versus total phosphorus by algal-type dominance
13.--Bivariate-discriminant plot of inorganic nitrogen versus total phosphorus by algal-type dominance
14.--Bivariate-discriminant plot of inorganic nitrogen versus alkalinity by algal-type dominance
15.--Bivariate-discriminant plot of carbon dioxide versus alkalinity by algal-type dominance
16.--The algal-type dominance discriminant function
17.--Discriminant score assessing the probabilities of algal-type dominance classification

LIST OF ABBREVIATIONS

EPA-NES    Environmental Protection Agency-National Eutrophication Survey
CaCO3      Calcium carbonate
CO2        Carbon dioxide
SPSS       Statistical Package for the Social Sciences
U.S.       United States

MEASUREMENTS

cms    cubic meter per second
cm     centimeter
°C     degrees centigrade
g      gram
km     kilometer
l      liter
m      meter
µg     microgram
mg     milligram
yr     year

CHAPTER I

INTRODUCTION

Problem Statement

In recent times there have been increased demands placed on our nation's lakes, streams, and reservoirs as recreation centers, and as sources for domestic, industrial and agricultural water supplies. These demands are due to: 1) the increased wealth, mobility and leisure time of our growing population; and 2) dwindling supplies of groundwater conveniently available to the population centers.
However, the subsequent introduction of cultural discharges to a lake in the form of untreated or inadequately treated industrial and municipal wastes, agricultural and urban runoff, and septic tank leachate has often resulted in serious water quality deterioration.

Barlowe (1976) states that "effective resource planning and policy-making calls for broad comprehension of the relevant information concerning resource situations, the problems that exist or are expected to arise and the possible solutions of these problems." In addition to purely physical and biological knowledge, a holistic approach to resource problems and policy necessarily includes tests for economic and institutional feasibility as well. Thus, a lake restoration program must be economically acceptable to the affected parties as well as meeting the rational goal of anticipated benefits equaling or exceeding expected costs. Institutional acceptability requires the examination of the legal, political and social constraints within the bounds of administrative workability. From this perspective, water quality problems can become very complex and demanding to those working within the field.

It is within this context that the need for adequate information to make sound decisions is essential if we are to realize the main objective of proper resource development, that is, "policy which enables a nation to provide citizens with opportunities for obtaining high levels of life both in the near and more distant future" (Barlowe, 1976). In addition, this information must be furnished at a reasonable cost to satisfy those who must pay for it.

The Federal Water Pollution Control Act of 1972 (Public Law 92-500) established as a national goal, restoration and maintenance of the chemical, physical, and biological integrity of the nation's waters.
In response to the Federal commitment for comprehensive national, regional, and state water management practices, the United States Environmental Protection Agency originated the National Eutrophication Survey (EPA-NES). The purpose of the survey was to develop information on select freshwater lakes and reservoirs so that the problems of water quality can be more adequately addressed.

It is important to be aware, however, that water quality itself may be viewed from many perspectives, according to the desires, uses and goals of the interested population. For example, a bass fisherman's designation of a "quality" lake may be altogether different from the designation assigned to the same lake by a public health specialist. In other words, levels of water quality may be defined by many quantitative and qualitative terms. Nutrient concentrations, algal biomass, bacterial pollution, blue-green algal dominance, concentration of suspended sediment, loading of organic matter, oxygen depletion, and toxic pollution may each be considered a measure of water quality. Each definition can represent a real problem to those affected by it.

Vollenweider (1968) points out the difficult, yet important, task of separating the problem of eutrophication from other problems of water pollution. He defines eutrophication (in the true sense) as a term that may be applied to anything, including both external and internal sources, which plays a part in: 1) accelerating nutrient loading; and 2) increasing (ambient) nutrient levels and water productivity to the point that nuisance conditions exist.

Vollenweider's definition places emphasis on the causes of eutrophication, namely increased nutrient enrichment by cultural sources. However, nutrient enrichment per se carries no inherently clear message to lakeside property owners, policy-makers, zoning boards, etc.
What is much clearer to the non-professional are the obnoxious symptoms (e.g., algal blooms) and effects (e.g., lower property values around the lake) exhibited by increased eutrophy in a lake.

An "abundance of primary production" is perhaps the best elucidation of the term "eutrophication". However, in-lake restoration techniques differ according to the type of eutrophic problem a lake faces. King (1979) identified three basic nuisance problems that may be found within a eutrophic lake:

1) an abundance of algae and/or the dominance of an undesirable algal-type;
2) an abundance of macrophyte growth;
3) an undesirably low concentration of dissolved oxygen in all or parts of the lake.

Although each of the above problems is either directly or indirectly related to primary production (stimulated by nutrient loading), in order to optimize in-lake restoration strategies, it is useful to treat each of the listed problems as a separate management problem worthy of investigation. It is the purpose of this paper to concentrate on the dominance of an undesirable algal type as a basic nuisance problem.

Blue-Green Algae: An Undesirable Algal-Type

Palmer (1962) states that of the approximately 18,000 algal species identified, only a small number are notable nuisance species. In particular, the dominance of blue-green algae in a lake is a very visible and objectionable sign of eutrophic conditions. Blue-greens are prokaryotic in cell structure and thus resemble bacteria in many respects. This characteristic and their relatively large size and/or organizational type (i.e., coenobial or filamentous) make them undesirable as a food source to zooplankton and other higher organisms. Certain common blue-green genera such as Anabaena, Microcystis, Aphanizomenon and Oscillatoria are buoyant due to pseudovacuoles and may collect in large, unsightly mats or "blooms".
Their death and subsequent decomposition on the lake surface produces a distinct "septic" odor which detracts from a variety of lake uses. Most species of blue-greens are also notorious slime producers and some genera such as Anabaena and Oscillatoria are troublesome filter cloggers (Palmer, 1962).

Of all the species of freshwater algae, only the blue-greens exhibit toxic properties (Palmer, 1962). Animal ingestion of some blue-greens (or their toxic by-products) reportedly resulted in prostration and convulsions followed by death. Incidents of animal poisoning have been documented in many states including Michigan (Stewart et al., 1950). It is very possible many similar cases go unreported because of unfamiliarity with blue-green toxicity.

The selection advantages possessed by the blue-greens, as compared to other freshwater algae, include an ability to function at high light intensity and temperatures (Jackson, 1965; Fogg, 1965) and at low free carbon dioxide concentrations (King, 1970). There is also evidence that some blue-greens can chemically inhibit growth of other algae (Boyd, 1973). An additional advantage afforded some species of blue-greens is the ability to fix elemental nitrogen (i.e., transforming it into a biologically useable form) (Dugdale and Neess, 1961).

The above mentioned physiological advantages, coupled with buoyancy and their incompatibility with the traditional food chain, enhance the likelihood that blue-green algae may become the dominant algal type in culturally impacted lakes. Because of the obnoxious and toxic properties exhibited by most species of blue-green algae, dominance of this particular algal type in a lake may be defined as a symptom or index of poor water quality. The research contained in this paper hinges on this definition and thus the desirability to develop management strategies that affect blue-green algal dominance.

Shapiro et al.
(1977) state that environmental manipulation which causes a dominance shift from blue-green algae to other types of freshwater algae has great potential as a lake restoration technique. They predict that someday "it will be possible to manipulate small or moderate size lakes to improve them. Part of the manipulation may involve converting the population of algae from forms inedible by zooplankton to forms that can be eaten by them. In other words, we consider it possible to bring about a blue-green to green shift in whole lakes."

Obviously, the feasibility of a management strategy that manipulates algal dominance requires careful examination and assessment of factors such as essential and limiting growth requirements, and relative competitive abilities among the algal-types. In general, phosphorus is thought to be the most important factor in stimulating and maintaining eutrophic symptoms. While this statement may be true in terms of total algal biomass, the qualitative makeup of the population is a more complex issue. For example, King (1970, 1972) presents evidence that (especially in lakes undergoing cultural eutrophication) factors such as nitrogen, carbon, and light may be very important in producing at least one eutrophic symptom, the dominance of blue-green algae.

Study Objectives

The objective of this research was to statistically examine data from a cross section of north temperate lakes to identify which factors are most important in determining whether or not a lake is dominated by blue-green algae. Among the variables examined were chemical and physical parameters found, in theoretical and experimental work, to have an influence on algal dominance. Empirical relationships were studied using exploratory data analysis techniques such as bivariate plotting and correlation analysis.
The multivariate statistical technique of discriminant analysis was also used as a tool to describe the relationships among variables, and to develop a predictive model.

Chapter II briefly reviews limnological relationships and conditions that tend to favor the dominance of certain algal types. Chapter III describes the Environmental Protection Agency's National Eutrophication Survey (EPA-NES) which provided the data base used in this study. Included in this chapter are the materials, methods and lake selection criteria used by the EPA and this investigator. Deficiencies in sampling design and technique are noted along with the assumptions used for this investigation.

Chapter IV describes the exploratory data analysis techniques undertaken to familiarize the investigator with the data and variable inter-relationships. This chapter also introduces discriminant analysis as an exploratory aid and as a classification tool. Chapter V applies the data analysis techniques described in Chapter IV to an EPA-NES data set. This chapter provides documentation of the factors that appear to most affect algal-type dominance. Discriminant analysis is then applied to the EPA-NES data to: 1) further define the differences between blue-green dominated lakes and lakes dominated by more desirable algae; and 2) develop a simple predictive model for algal dominance. Chapter VI summarizes the results of this research, within a management context.

Concluding Comments

This research is intended as a heuristic empirical examination of the factors involved in algal dominance using a large and uniformly collected data set (EPA-NES). From this analysis, it is possible to draw at least tentative conclusions regarding a sub-section of the entire cultural eutrophication problem facing lakes in the U.S.

In a larger sense, this research represents a different approach to the investigation of algal dominance in lakes.
Previous studies of this topic have generally been experimental in nature, and thus based upon small-scale, human-constructed systems. This research, in contrast, is based on unperturbed, natural (in the experimental sense), full-scale lake systems. Each approach has its merits and drawbacks; however, by pursuing both methods, knowledge and mutual corroboration can be gained that ultimately may lead to an understanding of this complex issue.

CHAPTER II

A REVIEW OF LIMNOLOGICAL RELATIONSHIPS AFFECTING ALGAL GROWTH AND DOMINANCE

The Paradox of the Plankton

Even today the "paradox of the plankton"¹ and the apparent deviation from the general theory of competitive exclusion remain an example of the complexity of lake systems. This phenomenon is perhaps not so remarkable, however, if one considers the spatial and temporal heterogeneities that occur in a lake and our inability to sample throughout time and space.

For example, Hutchinson (1961) suggests that the competitive exclusion theory is unsuitable within a dynamic environmental context such as a lake. He proposes that the multitude of external and internal interactions present within a lake system are constantly creating "new" environments that each species must contend with, and these new environments may or may not be optimal habitats for growth or even survival for any particular species.

It has also been established that different plankton species may be found at intermediate depths within the photic zone (Lund, 1965; Moss, 1969; Baker and Brook, 1971). Richerson et al. (1970) suggest that the water column is, in fact, a three-dimensional array of habitats.

¹ The "paradox of the plankton" is a phrase coined by G. E. Hutchinson (1961) to describe the phenomenon of species co-existing in an apparently isotropic environment where the species must compete for the same limited supply of nutrients.
For example, Moss (1972), in the summer stratified waters of Gull Lake, observed stratified populations of Synechococcus sp., Cryptomonas sp., and Rhodomonas sp. within the water column. This phenomenon of species stratification perhaps indicates that competitive exclusion does indeed occur in selected habitats within a lake system.

In the water column of most temperate lakes many species of algae may be found co-existing at any one time; however, there is usually an overall dominance of one or two species (Fogg, 1965). Qualitative changes in dominance occur on a seasonal basis and generally in a fairly predictable pattern from year to year in natural lakes (Wetzel, 1975).

Figure 1 is a simple schematic representation of lake and watershed characteristics that affect algal dominance. The natural base character of a lake (e.g., alkalinity, ionic makeup, base sediment, etc.) is generally a result of the surrounding watershed (King, 1979). The geomorphometry, geology, climate, terrestrial plant development, etc. also determine the natural nutrient loading to the lake. Cultural development within the watershed, and the importation, disposal, and subsequent migration to the lake of nutrients (e.g., contained in sewage, fertilizers, detergents, etc.) represent the artificial load. Seasonal conditions control and modify watershed inputs, and internal physical, chemical and biological factors operate to produce dynamic habitats within the water column. The dominant algal type is theoretically the species that can best compete in the niches defined by the habitat(s).
[Figure 1: Factors affecting algal dominance]

[Table 2: EPA-NES sample analysis summary (modified from USEPA, 1975)]

[Table 3: Analytical methods and precision of laboratory analysis (modified from USEPA, 1975)]

[Figure 2: The basic configuration of a box plot. Labeled features: 75% value, median value, 25% value, minimum value, interquartile range]

[...] overlap in the display, the medians are roughly significantly different at the 95% confidence level. The height of the notch above and below the median is ± Cs, where C is a constant between 1.96 and 1.39 and s is the standard deviation of the median (see McGill et al., 1978 and Reckhow, 1979 for details). Figure 3 displays, for variable y, a pair of box plots: Group A and Group B. Note that the notches do not overlap and therefore group medians may be considered significantly different (at the predefined confidence level).

Box plots do not represent the only method of estimating the discriminating power of independent variables, however.
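Before turning to that alternative, the notch comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the half-width rule 1.57 × IQR / √n is the approximation commonly attributed to McGill et al. (1978) and stands in for the product Cs; the function names and the two sample groups are hypothetical and are not EPA-NES data.

```python
import math
import statistics


def notch_interval(values, c=1.57):
    """Return (lower, median, upper) notch limits for one box plot.

    Assumption: half-width = c * IQR / sqrt(n), a common approximation
    of "C times the standard deviation of the median".
    """
    n = len(values)
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    half = c * (q3 - q1) / math.sqrt(n)
    return med - half, med, med + half


def notches_overlap(group_a, group_b):
    """True if the two notch intervals share any common value."""
    lo_a, _, hi_a = notch_interval(group_a)
    lo_b, _, hi_b = notch_interval(group_b)
    return lo_a <= hi_b and lo_b <= hi_a


# Hypothetical values of some variable y for two groups of lakes.
group_a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
group_b = [11, 12, 13, 14, 15, 16, 17, 18, 19]

if not notches_overlap(group_a, group_b):
    print("Notches do not overlap: medians differ at roughly the 95% level")
```

Non-overlapping notches are read here, as in the text, as a rough indication only; the comparison is not a formal hypothesis test.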
Snedecor and Cochran (1967) suggest that the following function be used to measure the classification effectiveness of an independent variable, given two predefined dependent variable groups:

$$\frac{|\mu_1 - \mu_2|}{\sigma} \qquad (4.1)$$

where:
$\mu_1$ = variable mean value for group 1
$\mu_2$ = variable mean value for group 2
$\sigma$ = variable standard deviation (both groups combined)

"The discriminating power of a variable increases as relationship (4.1) increases (greater distance between the means), and a value for relation (4.1) of 1.5 indicates a high degree of classification success (94% if the variable is normally distributed)" (Reckhow, 1977).

[Figure 3: Box plots possessing significantly different medians. Group A and Group B; the notches mark the statistical significance of the median]

Analysis of Bivariate Relationships

Another phase of exploratory data analysis is the examination of relationships between variables of interest. Careful study of variable relationships facilitates the construction of a conceptual model (such as Figure 1) and may suggest improvement of variable expression. The association between two variables is often expressed in terms of a correlation coefficient. A correlation coefficient is a summary of the linear relationship and indicates the degree to which variation of one variable relates to the variation of the other. The correlation coefficient (r) may be calculated by the following equation:

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}} \qquad (4.2)$$

where:
$\sum_{i=1}^{n}$ = the sum of the elements, from i=1 to i=n
$\bar{x}, \bar{y}$ = sample mean for variables x and y respectively
$x_i, y_i$ = elements from variables x and y respectively

The value r may range from -1 to +1; the stronger the linear association, the higher the absolute value of r.
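The two summary measures above, relationship (4.1) and Equation 4.2, can be sketched directly from their definitions. The sketch below is illustrative only: the function names and data are hypothetical, and the use of the sample (n - 1) form for the combined standard deviation is an assumption, since the text does not say which form was used.

```python
import math


def mean(values):
    return sum(values) / len(values)


def snedecor_cochran(group1, group2):
    """Relationship (4.1): |mean1 - mean2| / combined standard deviation.

    Assumption: the combined standard deviation is the sample (n - 1)
    standard deviation of both groups pooled into one list.
    """
    combined = list(group1) + list(group2)
    m = mean(combined)
    sd = math.sqrt(sum((v - m) ** 2 for v in combined) / (len(combined) - 1))
    return abs(mean(group1) - mean(group2)) / sd


def pearson_r(x, y):
    """Equation 4.2: the linear correlation coefficient r."""
    mx, my = mean(x), mean(y)
    numerator = sum((a - mx) * (b - my) for a, b in zip(x, y))
    denominator = math.sqrt(
        sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)
    )
    return numerator / denominator


# Hypothetical values of one variable in two groups of lakes.
statistic = snedecor_cochran([1, 2, 3], [7, 8, 9])

# Hypothetical paired observations with a perfectly linear relationship.
r = pearson_r([1, 2, 3], [2, 4, 6])
print(round(statistic, 3), round(r, 3))
```

Both functions take plain lists, so the same calls would apply to any pair of columns from a lake data table.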
Reckhow (1979) notes two important limitations of the correlation coefficient: 1) since r evaluates a linear relationship, it will not reflect an intrinsically nonlinear association; and 2) r may be biased towards a higher absolute value if the data set is not normally distributed.¹ For these reasons it is often desirable to evaluate a correlation coefficient with the aid of a bivariate plot. Plots may visibly provide evidence of nonlinearity and the need for re-expression. One useful modification of bivariate plotting is to identify different groups of interest within the plot. By constructing these bivariate-discriminant plots, variable relationships can be examined within the groups and between the groups as well as within the total data set.

Discriminant Analysis

Many research situations require an investigation of the functional relationship between independent and dependent variables. Regression analysis is a common method of optimizing a linear fit (y versus x1, x2, ..., xn) to the data points. That is, multiple regression "best" combines the independent variables in a linear equation to describe or predict the dependent variable. This analysis is not especially appropriate if the dependent variable has discrete states, however. For example, regression analysis can be aptly used to develop a function to predict phosphorus concentration, which can assume a continuous set of values (Reckhow, 1979). On the other hand, if the dependent variable has distinct or discrete states (e.g., blue-green and nonblue-green algal dominance), the multivariate statistical procedure of discriminant analysis is better suited to describe the functional relationship between independent and dependent variables.

¹ In some instances non-normal distributions may be normalized via a data transformation. For example, a commonly observed distribution among water quality data is the log-normal.
That is, a logarithmic transformation of the data results in a normal distribution.

The concept of discriminant analysis is fairly old (introduced by R. A. Fisher in 1936) but infrequently used in comparison to multiple regression analysis. Since the procedure may be unfamiliar to the reader, discriminant analysis and its usefulness in exploratory data analysis and as a modeling technique is described in more detail below.

The principal objective of discriminant analysis is to discriminate, classify or otherwise distinguish between two or more groups of cases using a set of independent variables (Morrison, 1969). This multivariate statistical procedure estimates a linear combination of the independent variables that "best" classifies the cases (e.g., the EPA-NES lakes) into one of the predefined dependent variable classes (e.g., algal-type dominance). In other words, the discriminant function attempts to maximize statistical distinction along a single dimension. The discriminant function is of the form:

$$D_i = d_{i1} z_1 + d_{i2} z_2 + \dots + d_{ip} z_p + d_0 \qquad (4.3)$$

where:
$D_i$ = the discriminant score
$d_{ij}$ = the discriminant coefficients
$z_j$ = the raw score for the discriminating variable (e.g., lake parameter values such as nitrogen concentration, mean depth, etc.)
$d_0$ = a constant

Thus, for a two group example:

if $D_i < D_{i(\mathrm{crit})}$, a case is classified as a member of group 1 (e.g., blue-green dominated lake)
if $D_i > D_{i(\mathrm{crit})}$, a case is classified as a member of group 2 (e.g., nonblue-green dominated lake)

where:
$D_{i(\mathrm{crit})}$ = the critical value for the discriminant score that separates the groups

$D_{i(\mathrm{crit})}$ is defined by the classification boundary; if there are two independent variables the boundary is a straight line.¹ The boundary defined by three independent variables is a two-dimensional plane in a three-dimensional space, etc. In other words, the classification boundary is an (n-1)-dimensional hyperplane in n-space (Morrison, 1969).
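The two-group rule built on Equation 4.3 can be illustrated with a short Python sketch. Everything numeric here is hypothetical: the coefficients, the constant, the critical value and the two example lakes are made up for illustration and are not the values estimated in this study. Assigning a score exactly equal to the critical value to group 2 is an arbitrary choice, since the rule above leaves ties undefined.

```python
def discriminant_score(coefficients, constant, raw_scores):
    """Equation 4.3: linear combination of the raw scores plus a constant."""
    return sum(d * z for d, z in zip(coefficients, raw_scores)) + constant


def classify(score, critical_value):
    """Group 1 if the score is below the critical value, otherwise group 2.

    Assumption: a tie (score == critical_value) is assigned to group 2.
    """
    return 1 if score < critical_value else 2


# Hypothetical coefficients for two discriminating variables.
coefficients = [2.0, -1.0]
constant = 0.5
critical_value = 0.0

lake_a = [1.0, 3.0]  # hypothetical raw scores for one lake
lake_b = [2.0, 1.0]  # hypothetical raw scores for another lake

score_a = discriminant_score(coefficients, constant, lake_a)
score_b = discriminant_score(coefficients, constant, lake_b)
print(classify(score_a, critical_value), classify(score_b, critical_value))
```

With two discriminating variables, the set of raw scores for which the score equals the critical value is the straight-line boundary described above.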
The discriminant analysis procedure assumes that the independent variables are normally distributed and that the independent variable variances are the same for all groups. In succinct terms, the discriminant procedure assumes that the group covariance matrices are equal (Nie et al., 1974).

Two research objectives may be met using discriminant analysis: 1) the development of a predictive model that can be used to estimate group memberships of unknown cases; and 2) the identification of the relative importance of the independent variables in the discriminant function.

Obviously, the value of assessing the relative importance of discriminating variables hinges on how well they actually discriminate. Likewise, a predictive model is only as valuable as the discriminating power of its independent variables. Therefore, both objectives are necessarily developed concurrently in the discriminant analysis procedure, regardless of the initial priorities.

1The line is straight only if the assumption of equal covariance matrices between groups is not violated (Reckhow, personal communication).

The discriminatory power contained in the discriminant function can be assessed by classifying known cases and then determining the number correctly classified. This classification step is necessary to demonstrate that classification results are better than one would expect by chance. If the percentage of correct predictions is judged to be significant, one may then begin to draw meaningful information from the investigations into variable relationships and the relative importance of the independent variables, as well as the function's potential success as a predictive model (Frank et al., 1965). On the other hand, if the predictive capabilities of the discriminant function are no greater than can be expected by chance, investigations into relative variable importance and the function's use as a model are of no consequence.
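The comparison between classification results and chance can be sketched as follows. The example is hedged in two ways: the group labels and predictions are hypothetical, and the "chance" benchmark used here (always assigning the largest group) is only one simple choice and is an assumption of this sketch rather than a criterion stated in the text.

```python
from collections import Counter


def classification_table(actual, predicted):
    """Count cases by (actual group, predicted group)."""
    return Counter(zip(actual, predicted))


def percent_correct(actual, predicted):
    """Percentage of cases assigned to their true group."""
    hits = sum(a == p for a, p in zip(actual, predicted))
    return 100.0 * hits / len(actual)


def majority_baseline(actual):
    """Percentage correct when every case is assigned to the largest group."""
    return 100.0 * max(Counter(actual).values()) / len(actual)


# Hypothetical labels: BG = blue-green, NBG = nonblue-green.
actual = ["BG"] * 6 + ["NBG"] * 4
predicted = ["BG"] * 5 + ["NBG"] + ["NBG"] * 3 + ["BG"]

table = classification_table(actual, predicted)
print(table[("BG", "BG")], table[("BG", "NBG")])
print(percent_correct(actual, predicted), majority_baseline(actual))
```

A function whose percentage correct does not clearly exceed such a benchmark offers little basis for interpreting its coefficients.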
Assuming that a discriminant function's predictive capabilities are judged to be significant, one can logically proceed to the second objective: identification of the relative importance of the independent variables in the function. This can be accomplished by examining the function's standardized discriminant coefficients (Nie et al., 1974). As in regression or factor analysis, the weight of the coefficient represents the relative contribution of the independent variable in the function. The sign of the coefficient determines the direction of the independent variable's effect on the dependent variable. However, if the variables are correlated, the standardized discriminant coefficients are much less meaningful (Reckhow, personal communication). When this situation occurs, the relative importance of the independent variables should be judged primarily on the examination of box plots and the Snedecor-Cochran statistic (Equation 3.1).

In many research situations, more independent variables are available to the researcher than are practically or statistically needed to achieve satisfactory discrimination results. Therefore, step-wise discriminant analysis techniques that enter variables into the function one at a time are useful in determining the "best" discriminating variables. In step-wise discriminant analysis, the independent variable that enters the function first is the best discriminating variable as judged by group (dependent variable) mean value separation. Subsequent variables are included in the function based on their discriminatory power in combination with the previously selected variable(s).

One common statistic used for step-wise variable selection is the partial F ratio. This "F statistic" is a test for statistical significance of the additional dependent variable separation (discrimination) created by a new variable beyond that already achieved by previously entered variables.
Thus, the variable with the highest partial F ratio (conditioned on the variables already present in the function) is selected for inclusion at each step.1 The selection procedure may be halted when the partial F ratio is deemed too small to be of significance. Wilks' Lambda is also a common measure of group discrimination and is an inverse function of the "F statistic". That is, the variables which maximize the F ratio minimize Wilks' Lambda (Nie et al., 1974).

1It should be noted that the methods of variable inclusion in the discriminant function do not "order" the relative discriminant importance of the variables. The procedure can only approximate the "best" discriminating variables, since it does not assess every possible subset of variables but rather sequentially selects the "next best" discriminator conditioned on the variables previously selected (Nie et al., 1974).

Morrison (1974) notes that the "F statistic" is the multidimensional analog of the traditional "t-test" for statistical difference between group means. He points out that the concept of statistical significance must be approached with caution since significance levels vary with sample size. Therefore, statistics like the partial F ratio and Wilks' Lambda are, in themselves, poor indicators of the extent to which independent variables can discriminate. Thus, the question "How well do the independent variables discriminate?" must be answered with tests of correct classification.

One issue of concern is the application of discriminant analysis to a situation where the majority of a population (or sample) belongs to one dependent variable group. In that case, the population (or sample) distribution alone favors the classification of a case into that one group over any other group. In this instance, one may either eliminate cases from the majority group(s) until the groups are approximately even or incorporate prior probabilities into the discriminant function.
These probabilities, however, affect only the constant value and have no effect on the discriminant coefficients (Frank et al., 1965). Thus, if the research purpose is only to assess variable importance in the discriminant function (via the examination of standardized discriminant coefficients), prior probabilities may be disregarded and all available cases used to calculate the discriminant function.

The discriminant information contained in the discriminant function can be re-expressed as classification functions, one for each group. These functions are derived from the pooled within-group covariance matrix and the centroids for the discriminating variables. The classification functions are of the form:

\[ C_i = c_{i1} v_1 + c_{i2} v_2 + \dots + c_{ip} v_p + c_0 \tag{4.4} \]

where:
C_i = the classification score for group i
c_ij = the classification coefficients
v_j = the raw score for the discriminating variable
c_0 = a constant

An individual case is classified into the group that yields the highest classification score of all the group classification functions (Nie et al., 1974).

In the usual procedure of classification function development, all individual cases are assumed to have equal probability of group membership. However, if sample or actual population distributions are known, or if the risks associated with misclassification represent a special concern, an adjustment of classification probabilities should be performed (Morrison, 1974). Such is the case, as previously mentioned, when group memberships are of grossly different size.

An important consideration when using the discriminant or classification functions in a predictive capacity is the uncertainty inherent in the prediction. Reckhow (1978) presents probability equations that take into account variable uncertainties. He also provides a method which expresses the classification equations in a probabilistic form. This method is presented below:

\[ P_i = \frac{1}{e^{-(\Sigma\,\mathrm{c.f.}_i)} + 1} \tag{4.5} \]

where:
P_i = the probability associated with group i classification
\Sigma c.f._i = the sum of group i classification functions

The Statistical Package for the Social Sciences (SPSS) (Nie et al., 1974) also presents a method by which classification probabilities can be estimated:

\[ P(G_j/X) = \frac{P_j \, |D_j|^{-1/2} \, e^{-\chi_j^2/2}}{\sum_{i=1}^{g} P_i \, |D_i|^{-1/2} \, e^{-\chi_i^2/2}} \tag{4.6} \]

where:
P(G_j/X) = the probability associated with group j classification
P_j = the prior probability for group j
D_j = the group covariance matrix for group j
g = the number of groups
\chi_j^2 = the chi-square distance from each group centroid

Although uncertainty is inherent in any classification prediction, when constructing the discriminant or classification functions, the independent variable uncertainty (in the model-development data set) is automatically incorporated in the model (Reckhow, 1977; Walker, 1977). Therefore, if the data for a specific application case are gathered and assessed in a manner similar to that for the development data set, the probabilities (with that level of uncertainty) require no further error analysis.

Discriminant analysis computer programs usually present classification information in a table or matrix (Morrison, 1969). This table describes how many cases from the original data are assigned, via the classification function, to the dependent variable groups. Thus, the percentage correctly classified is the percentage of cases assigned to their true original groups. This method, however, produces an upward bias of classification "success" since the functions are optimally fitted to the data set when they are constructed (Morrison, 1969). That is, the discriminant function automatically maximizes the percentage of cases correctly classified.

One method to reduce classification bias is to use a percentage of the data cases to construct the discriminating function(s) and then use the remaining percentage to test for significant classification results.
Note that a large data set is generally required to apply this method. The "jackknife" technique is similar to this in theory, except that, in the jackknife approach, an individual case is omitted and a discriminant function is constructed using the remaining cases (Mosteller and Tukey, 1977). The omitted case is then classified based on the computed discriminant function. The process is repeated until every case in the data set has been omitted and classified according to its own "unbiased" discriminant function. The overall result of the jackknife method is the construction of an "unbiased" correct classification table that can be used to assess the discriminatory power of the function's discriminating variables.

Reckhow (1978) states that although the discriminant function(s) and the classification functions convey nearly the same discriminant-type information, they may be best used for different purposes. He suggests that the discriminant function, when displayed in a graphical form, is more useful for multi-lake analysis or cross-sectional comparisons, while the classification functions, when expressed in a probabilistic form, are more useful for single-lake, longitudinal studies.

Evaluation techniques and exploratory data analysis must all be considered in conjunction with the goals and objectives of the research when assessing the usefulness of discriminant function(s) or classification functions as a predictive model. Clearly, a predictive model that requires independent variable values that cannot be obtained or estimated within normal budgetary constraints is useless as a management tool. Reckhow (1979) also points out that:

"It is critical that an appropriate model be selected (assuming that an appropriate model can be found) for the intended purpose, and that the individual applying the model has a clear understanding of the model's limitations."
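Returning to the jackknife procedure described earlier in this section, the omit-one-case loop can be sketched as below. This is an illustrative sketch only: the inner classifier is an assumed Fisher-style two-group linear discriminant with a midpoint critical score (not the SPSS routine), the function names are hypothetical, the data are invented, and each group is assumed to retain at least two cases after one case is omitted.

```python
import numpy as np


def fit(z, groups):
    """Two-group linear discriminant: coefficients and critical score."""
    z1, z2 = z[groups == 1], z[groups == 2]
    m1, m2 = z1.mean(axis=0), z2.mean(axis=0)
    # Pooled within-group covariance (equal-covariance assumption).
    pooled = ((len(z1) - 1) * np.cov(z1, rowvar=False)
              + (len(z2) - 1) * np.cov(z2, rowvar=False)) / (len(z) - 2)
    d = np.linalg.solve(np.atleast_2d(pooled), m2 - m1)
    # Critical score midway between the two group mean scores.
    return d, 0.5 * float((m1 + m2) @ d)


def jackknife_percent_correct(z, groups):
    """Omit each case in turn, refit, and classify the omitted case."""
    z = np.asarray(z, dtype=float)
    groups = np.asarray(groups)
    hits = 0
    for i in range(len(z)):
        keep = np.arange(len(z)) != i          # every case except case i
        d, d_crit = fit(z[keep], groups[keep])
        predicted = 2 if float(z[i] @ d) > d_crit else 1
        hits += int(predicted == groups[i])
    return 100.0 * hits / len(z)


# Hypothetical, well-separated two-variable data.
cases = [[1.0, 2.0], [1.5, 1.8], [0.8, 2.2], [1.2, 2.5],
         [3.0, 0.5], [3.5, 1.0], [2.8, 0.2], [3.2, 0.9]]
labels = [1, 1, 1, 1, 2, 2, 2, 2]
print(jackknife_percent_correct(cases, labels))
```

Because each case is classified by a function that never saw it, the resulting percentage avoids the upward bias of classifying the same cases used to fit the function.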
He further states that:

"It should be the modeler's responsibility to clearly document the proper use and limitation on the use of his/her model. For an empirical model, this documentation should include a statement of all limitations ...associated with the data set used to develop the model. In addition, any biases noted within the range of application should also be indicated. Another important but generally neglected statement of information about the model concerns the type of issues and decisions appropriate for model application and the value of the information provided by the model toward this application."

CHAPTER V

AN EMPIRICAL ANALYSIS OF EPA-NES DATA

Introduction

In the analysis of a large multivariate data set it is important that the variables themselves (e.g., data distributions) and the relationships among the independent variables and between the dependent and independent variables (e.g., correlations) are well understood. This chapter documents the exploratory data analysis that was used in this research to examine relationships between the chemistry and physics of select EPA-NES lakes and the dominant algal type in those lakes. Discriminant analysis was also used in the exploration of multivariate relationships and in the construction of a predictive model.

The Statistical Package for the Social Sciences (SPSS), an integrated system of computer programs, was used for the calculation of the statistical material described in this chapter. SPSS was accessed via the Control Data Corporation Computer (CDC 6500) on the Michigan State University Campus.

Preliminary Statistics

Tables 5 and 6 present summary statistics that can be used to analyze data distributions and variable relationships. Table 5 presents the mean, median, standard deviation, minima, and maxima of the independent variables derived from the ninety EPA-NES lakes preselected for use in this research.
Table 4 is the key to the data tables and Table 6 presents a matrix of independent variable correlation coefficients.

Not surprisingly, there are relationships among some of the physical lake variables; notably, mean depth, lake volume and hydraulic detention time. There is also high correlation between average influent phosphorus concentration and lake total phosphorus concentration. Other empirical studies have also verified this close relationship (Vollenweider, 1968; Reckhow, 1977). It is interesting to note, however, that average influent nitrogen concentration and lake inorganic nitrogen concentration do not correlate nearly as well.

Table 6 also indicates relatively strong relationships between pH and: 1) carbon dioxide concentration; and 2) alkalinity. This is to be expected because of the intimate relationship among the CO2, bicarbonate and carbonate equilibrium concentrations and pH (Wetzel, 1975).

Analysis of EPA-NES Data Using Box Plots and the Snedecor-Cochran Statistic

As discussed in Chapter IV, summary statistics, like those presented in Table 5, can be misleading if the data set has a skewed distribution. Therefore, graphical presentations of data, such as box plots, are desirable since they often convey more information than simple summary statistics alone.
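The effect of skewness on summary statistics can be illustrated with a short Python sketch. The sample below is hypothetical (it is not drawn from the EPA-NES data), and the quartiles use the default method of the standard library's `statistics.quantiles`, which may differ slightly from the hinge definition used for hand-drawn box plots.

```python
import statistics


def five_number_summary(values):
    """Minimum, lower quartile, median, upper quartile and maximum."""
    ordered = sorted(values)
    q1, median, q3 = statistics.quantiles(ordered, n=4)
    return ordered[0], q1, median, q3, ordered[-1]


# Hypothetical right-skewed sample: one very large value dominates the mean.
sample = [0.01, 0.05, 0.1, 0.3, 0.5, 1.0, 21.0]

print("mean:", statistics.mean(sample))
print("five-number summary:", five_number_summary(sample))
```

In a sample like this the mean lies above the upper quartile, so it describes almost none of the cases, whereas the quantities displayed by a box plot still locate the bulk of the data.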
The box plot is useful in two aspects of data analysis: 1) it can lead to a thorough examination of data distributions; and 2) it

Table 4: Key to EPA-NES data tables

Symbol   Description
AL       Lake area (km^2)
z        Mean depth (m)
V        Lake volume (10^6 m^3)
T        Hydraulic detention time (yr)
AB       Basin (watershed) area (km^2)
Q        Mean annual total water inflow (cms)
qs       Areal water loading (m/yr)
Prec     Mean annual total precipitation (cm/yr)
Temp     Median summer water temperature (°C)
DO       Median summer dissolved oxygen concentration (mg/l)
pH       Median summer pH (unitless)
AlK      Median summer alkalinity (mg CaCO3/l)
P        Median summer total phosphorus concentration (mg/l)
N        Median summer inorganic nitrogen concentration (mg/l)
CO2      Median summer free carbon dioxide concentration (umoles/l)
Lp       Total annual phosphorus load (g/m^2-yr)
LN       Total annual nitrogen load (g/m^2-yr)
INP      Average influent phosphorus concentration (mg/l)
INN      Average influent nitrogen concentration (mg/l)

Table 5: Statistics for the EPA-NES data set

Variable            Mean      Median    Standard Deviation   Minimum   Maximum
AL (km^2)           11.02     3.94      16.80                0.10      81.12
z (m)               5.44      4.15      5.01                 0.90      31.60
V (10^6 m^3)        83.37     15.45     203.17               0.20      1170.60
T (yr)              1.569     0.302     3.639                0.003     21.000
AB (km^2)           2231.32   176.50    10521.71             4.00      96324.00
Q (cms)             13.59     1.40      40.84                0         287.10
qs (m/yr)           1.8       0.5       3.5                  0.0       20.0
Prec (cm/yr)        91.1      90.9      16.8                 58.0      156.0
Temp (°C)           21.9      21.8      4.2                  10.4      30.1
DO (mg/l)           6.0       6.8       2.5                  0.0       10.4
pH (unitless)       7.89      7.96      6.73                 5.2       9.2
AlK (mg CaCO3/l)    120       120       76                   10        334
P (mg/l)            0.139     0.051     0.266                0.005     1.420
N (mg/l)            0.75      0.29      1.04                 0.05      4.62
CO2 (umoles/l)      103.7     53.8      137.4                1.2       600.0
Lp (g/m^2-yr)       8.28      1.60      19.14                0.03      129.43
LN (g/m^2-yr)       140.3     35.4      305.1                1.5       2382.8
INP (mg/l)          0.32      0.11      0.79                 0.01      5.70
INN (mg/l)          3.68      2.45      3.83                 0.51      25.25

[Table 6: Matrix of independent variable correlation coefficients for the EPA-NES data set]

[Figures 10-15: Bivariate-discriminant plots categorized by algal-type dominance (blue-green dominated lakes versus nonblue-green dominated lakes)]

Empirical analysis of the EPA-NES data set appears to confirm this conclusion, although, based on Figure 14, inorganic nitrogen is important only in the higher alkalinity lakes.

One theory that would explain this phenomenon is presented by King (1979).
Assuming a natural phosphorus limit at the onset of cultural eutrophication, King (1970, 1972) states that the alkalinity and the phosphorus concentration are the primary determinants of whether or not aquatic growth progresses to the point where carbon or nitrogen becomes the most important limiting factor of growth. Recall that the alkalinity can serve as a reserve carbon source for photosynthesis. A lake with low alkalinity (i.e., a low carbon reserve) would require little phosphorus to initiate carbon limiting conditions and stress the algae, causing a dominance transfer to buoyant blue-greens. Lakes with a high alkalinity (i.e., a large carbon reserve) would not be expected to reach the carbon limit so easily; therefore, nitrogen would be more likely to be of primary qualitative importance when phosphorus limits are nullified (via cultural loading).

Figure 14 appears to empirically confirm that the nitrogen concentration is an important variable in determining algal dominance in the high alkalinity EPA-NES lakes. Further investigation of bivariate-discriminant plots failed to define any strong independent/dependent variable relationships in the low alkalinity EPA-NES lakes, however. King (1970, 1972) points to the CO2 concentration as being very important in these lakes, but examination of the bivariate-discriminant plot of CO2 versus alkalinity (Figure 15) reveals no strong relationship. Note, however, that there are only a few low alkalinity lakes on which to base this conclusion, and recall also the uncertainty surrounding the calculation of the CO2 value. These may be reasons why the CO2 variable was not found to be important in the low alkalinity EPA-NES lakes.

Since further investigation was unable to identify strong discriminating variables for the few low alkalinity lakes, only high alkalinity lakes were used in the discriminant analysis procedure described below (i.e., the model-building phase).
Discriminant Analysis of High Alkalinity EPA-NES Lakes

The step-wise discriminant analysis program used in this investigation of the 68 high alkalinity EPA-NES lakes is from the Statistical Package for the Social Sciences (SPSS). The linear discriminant function was constructed using the overall multivariate F ratio for the test of difference between group centroids. The variables were chosen to maximize the statistical effect of separation of the pre-defined algal groups: blue-green dominated lakes and nonblue-green dominated lakes. Additional details regarding the step-wise procedure and function calculation may be found in the SPSS user's manual (Nie et al., 1974).

The results of the discriminant analysis are presented in Table 8. Inorganic nitrogen (N), as expected, was the first variable to enter the function (i.e., it is the variable that best discriminates among the pre-defined groups). Hydraulic detention time (T) and influent phosphorus concentration (INP) entered next but displayed much less discriminatory power, as observed via the standardized discriminant coefficients.1 The discriminatory effect displayed by the entry of subsequent variables was deemed insignificant by this investigator, and those variables were eliminated from the function.

Table 8: Discriminant analysis results

            Step-Wise Analysis
Variable    Step    Wilks' Lambda    Standardized Coefficients
log N       1       .572             1.240
log INP     2       .538             -0.338
log T       3       .524             -0.262

Discriminant analysis of the variables (log) N, (log) T, and (log) INP yielded the following discriminant function:

\[ \mathrm{d.f.} = \frac{10284 \, N^{2.463}}{T^{.129} \, INP^{.726}} \tag{5.2} \]

\[ \mathrm{d.s.} = \log \mathrm{d.f.} \tag{5.3} \]

where:
d.f. = the discriminant function
d.s. = the discriminant score

The function is the first principal component and maximizes the group separation on a single axis (Reckhow, 1979).
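Because Equation 5.3 takes the logarithm of a ratio of powers, the discriminant score is linear in the log-transformed variables, which ties the multiplicative form of Equation 5.2 back to the linear form of Equation 4.3. The short derivation below uses the placeholder symbols k, a, b and c for the constant and the exponents on N, T and INP; these symbols are introduced here for illustration only and are not values taken from the analysis.

```latex
\begin{align*}
\mathrm{d.s.} &= \log \mathrm{d.f.}
   = \log\!\left( \frac{k \, N^{a}}{T^{b} \, INP^{c}} \right) \\
  &= \log k + a \log N - b \log T - c \log INP
\end{align*}
```

With positive a, b and c, the score rises with log N and falls with log T and log INP, matching the signs of the standardized coefficients in Table 8.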
1The examination of standardized discriminant coefficients was considered a valid method of assessing discriminatory power because of the relatively low correlation between the independent variables contained in the function.

The discriminant analysis information may also be expressed in terms of classification functions, one for each group (Nie et al., 1974). The following are the SPSS derived classification functions (c.f.) for the two algal groups.

1) Classification function for the blue-green group:

\[ \mathrm{c.f.}(1) = -3.85(\log N) - .56(\log T) - 2.7(\log INP) - 2.21 \tag{5.4} \]

2) Classification function for the nonblue-green group:

\[ \mathrm{c.f.}(2) = 1.12(\log N) - 1.10(\log T) - 4.08(\log INP) - 2.29 \tag{5.5} \]

A case is classified into the group that yields the highest classification function score.

Figure 16 is a plot of the discriminant function (Equation 5.2) with the log of the numerator on the x-axis and the log of the denominator on the y-axis. The classification boundary occurs approximately where d.s. = .17. For example, if d.s. < .17, a case is classified in the "blue-green dominated" category. Alternatively, if d.s. > .17, a case is classified in the "nonblue-green dominated" category. The classification phase of the SPSS discriminant analysis program resulted in 86.8% of the EPA-NES high alkalinity lakes being correctly classified.1 As observed in Figure 16, group separation is distinct; however, misclassifications do exist. Investigation of these misclassified lakes revealed no apparent characteristics that would indicate non-representativeness in either the data collection or assessment.

1Most likely there is an upward bias of classification results since the discriminant function was optimally fitted to the data set when the function was constructed (Morrison, 1969). The jackknife technique, a method to assess classification bias, is not available with SPSS, and thus could not be used.

[Figure 16: The algal-type dominance discriminant function]

The presence of misclassified lakes is not really surprising since the discriminant function represents a simplification of the real world and cannot be expected to account for all the variability found in multi-lake analyses. Recall, too, the dynamic character of algal populations and that an EPA-NES lake was originally classified into the pre-defined algal groups based on the dominant genera occupying the water column at the time the sample was taken. Importantly, successional dynamics are unique to each lake, and an algal type that dominates at any particular time may not be dominant two weeks later. The waxing and waning of populations occurs continuously but at different rates in individual lakes (Fogg, 1965).

Since an equation (or model) such as Equation 5.2 represents a simplification of the real world, the practical application of the discriminant function (or the classification functions) requires an evaluation of the uncertainty of the results. A presentation of this uncertainty for a given prediction may be considered as a measure of the value of the information provided by the model (Reckhow, 1979). It is possible to express the uncertainty inherent in the model probabilistically, and then present the probabilities graphically. The SPSS discriminant analysis program calculates classification probabilities using the following function:

\[ P(G_j/X) = \frac{P_j \, |D_j|^{-1/2} \, e^{-\chi_j^2/2}}{\sum_{i=1}^{g} P_i \, |D_i|^{-1/2} \, e^{-\chi_i^2/2}} \tag{5.6} \]

where:
P_j = the prior probability for group j (.50 in this study)
D_j = the group covariance matrix for group j
\chi_j^2 = the chi-square distance from each group centroid
g = the number of groups

Classification probabilities for the EPA-NES lakes were calculated and matched with the associated discriminant scores (calculated from Equation 5.2) to construct Figure 17. Because the discriminant function incorporates the uncertainties in the model development data set into the probability estimates, Figure 17 can be used without additional uncertainty estimates if the techniques used to survey an application lake are similar to the methods used by the EPA-NES (Walker, 1977; Reckhow, 1979).

[Figure 17: Discriminant scores assessing the probabilities of algal-type dominance classification]

Since the discriminant function can be used as a predictive model, a few limitations should be mentioned. First, a model should not be applied to a lake with variable values greater than the maximum values or less than the minimum values contained in the data set used to construct the model. Table 9 presents the ranges of the independent variable values used to construct the discriminant function. Further, the functions were constructed only from lakes within the north temperate climatic zone and thus should only be applied to lakes within this zone. Also, recall that the EPA and this investigator placed selection criteria for lake inclusion in the EPA-NES and the model-building data set respectively (see Chapter III). Each of these criteria may also